causalkit.data.causaldata.CausalData

class causalkit.data.causaldata.CausalData(df, treatment, outcome, confounders=None)

Container for causal inference datasets.

Wraps a pandas DataFrame and stores the names of treatment, outcome, and optional confounder columns. The stored DataFrame is restricted to only those columns.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the data. Cannot contain NaN values. Only columns specified in outcome, treatment, and confounders will be stored.

  • treatment (str) – Column name representing the treatment variable.

  • outcome (str) – Column name representing the outcome (target) variable.

  • confounders (Union[str, List[str]], optional) – Column name(s) representing the confounders/covariates.
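
The constructor's documented behavior is to reject NaN values, keep only the outcome, treatment, and confounder columns, and store a copy. A minimal sketch of that restriction with plain pandas follows. `restrict_frame` is a hypothetical helper, not causalkit's implementation. For brevity it takes confounders as a list. The choice of `ValueError` for errors and the decision to check NaNs only on the kept columns are both assumptions.

```python
import pandas as pd
from typing import Sequence


def restrict_frame(df: pd.DataFrame, treatment: str, outcome: str,
                   confounders: Sequence[str] = ()) -> pd.DataFrame:
    """Hypothetical sketch of the documented column restriction (not causalkit code)."""
    # Documented column order: [outcome, treatment] + confounders.
    cols = [outcome, treatment] + list(confounders)
    missing = [c for c in cols if c not in df.columns]
    if missing:
        # Assumption: unknown column names are rejected with a ValueError.
        raise ValueError(f"Columns not found in DataFrame: {missing}")
    subset = df[cols]
    # Assumption: the NaN check is applied to the kept columns only.
    if subset.isna().to_numpy().any():
        raise ValueError("Selected columns contain NaN values")
    # Return an independent copy so later edits to the original df do not leak in.
    return subset.copy()


# The 'note' column contains None, but it is dropped before the NaN check.
df = pd.DataFrame({"outcome": [1.0, 0.0], "treatment": [1, 0],
                   "age": [30, 40], "note": ["a", None]})
print(restrict_frame(df, "treatment", "outcome", ["age"]).columns.tolist())  # ['outcome', 'treatment', 'age']
```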

df

A copy of the original data restricted to [outcome, treatment] + confounders.

Type:

pd.DataFrame

treatment

Name of the treatment column.

Type:

str

outcome

Name of the outcome (target) column.

Type:

str

confounders

Names of the confounder columns (may be empty).

Type:

list[str]
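
The confounders parameter accepts either a single column name or a list, but the stored attribute is always a list[str]. A minimal sketch of that normalization follows. It assumes None maps to an empty list, which is consistent with "may be empty". `normalize_confounders` is a hypothetical name, not part of causalkit.

```python
from typing import List, Optional, Union


def normalize_confounders(confounders: Optional[Union[str, List[str]]]) -> List[str]:
    """Hypothetical helper: map the `confounders` argument to a list[str]."""
    if confounders is None:
        # No confounders given: the stored attribute may be empty.
        return []
    if isinstance(confounders, str):
        # A single column name is wrapped in a one-element list.
        return [confounders]
    # Any other iterable of names is copied into a new list.
    return list(confounders)


print(normalize_confounders("age"))  # ['age']
```

The `str` check comes before the generic `list(...)` call on purpose. Calling `list("age")` would split the name into characters.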

Examples

>>> from causalkit.data import generate_rct_data
>>> from causalkit.data import CausalData
>>>
>>> # Generate data
>>> df = generate_rct_data()
>>>
>>> # Create CausalData object
>>> causal_data = CausalData(
...     df=df,
...     treatment='treatment',
...     outcome='outcome',
...     confounders=['age', 'invited_friend']
... )
>>>
>>> # Access data
>>> causal_data.df.head()
>>>
>>> # Access columns by role
>>> causal_data.target
>>> causal_data.confounders
>>> causal_data.treatment

__init__(df, treatment, outcome, confounders=None)

Initialize a CausalData object.

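Parameters: the same as those of the class constructor (df, treatment, outcome, confounders), documented above.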

Methods

__init__(df, treatment, outcome[, confounders])

Initialize a CausalData object.

get_df([columns, include_treatment, ...])

Get a DataFrame from the CausalData object with specified columns.

Attributes

confounders

List of confounder column names.

outcome

Get the outcome variable.

target

Get the outcome (target) variable.

treatment

Get the treatment variable.

property target: Series

Get the outcome (target) variable.

Returns:

The outcome column as a pandas Series.

Return type:

pd.Series

property outcome: Series

Get the outcome variable as a pandas Series.

property confounders: List[str]

List of confounder column names.

property treatment: Series

Get the treatment variable.

Returns:

The treatment column as a pandas Series.

Return type:

pd.Series
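
The role properties above return data (a Series) for target, outcome, and treatment, but only names for confounders. A minimal sketch of that pattern follows: column names are stored privately and resolved against the stored DataFrame on access. `RoleView` is a hypothetical class, not causalkit's implementation. Treating `outcome` as an alias of `target` is an assumption.

```python
import pandas as pd
from typing import List


class RoleView:
    """Hypothetical sketch of role-based properties (not causalkit code)."""

    def __init__(self, df: pd.DataFrame, treatment: str, outcome: str,
                 confounders: List[str]):
        # Keep the role names private; the public properties resolve them.
        self._treatment = treatment
        self._outcome = outcome
        self._confounders = list(confounders)
        self.df = df[[outcome, treatment] + self._confounders].copy()

    @property
    def target(self) -> pd.Series:
        # The outcome column as a Series.
        return self.df[self._outcome]

    @property
    def outcome(self) -> pd.Series:
        # Assumption: an alias of `target`.
        return self.target

    @property
    def treatment(self) -> pd.Series:
        # The treatment column as a Series.
        return self.df[self._treatment]

    @property
    def confounders(self) -> List[str]:
        # Names only, returned as a fresh list so callers cannot mutate internal state.
        return list(self._confounders)


view = RoleView(pd.DataFrame({"outcome": [1.0, 0.0], "treatment": [1, 0], "age": [30, 40]}),
                "treatment", "outcome", ["age"])
print(view.treatment.tolist())  # [1, 0]
```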

get_df(columns=None, include_treatment=True, include_target=True, include_confounders=True)

Get a DataFrame from the CausalData object with specified columns.

Parameters:
  • columns (List[str], optional) – Specific column names to include in the returned DataFrame. If provided, these columns are included in addition to the columns selected by the include_* parameters. If None, the columns are determined solely by the include_* parameters; if None and all include_* parameters are False, the entire DataFrame is returned.

  • include_treatment (bool, default True) – Whether to include treatment column(s) in the returned DataFrame.

  • include_target (bool, default True) – Whether to include target column(s) in the returned DataFrame.

  • include_confounders (bool, default True) – Whether to include confounder column(s) in the returned DataFrame.

Returns:

DataFrame containing the specified columns.

Return type:

pd.DataFrame
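
The column-selection rules for get_df can be sketched as a standalone function. `select_columns` is hypothetical, not causalkit's implementation. Two details are assumptions: explicit columns come first, and duplicates are dropped while keeping the first occurrence.

```python
import pandas as pd
from typing import List, Optional


def select_columns(df: pd.DataFrame, treatment: str, target: str,
                   confounders: List[str], columns: Optional[List[str]] = None,
                   include_treatment: bool = True, include_target: bool = True,
                   include_confounders: bool = True) -> pd.DataFrame:
    """Hypothetical sketch of get_df's documented selection rules."""
    any_flag = include_treatment or include_target or include_confounders
    if columns is None and not any_flag:
        # Documented rule: no explicit columns and no include flags -> whole DataFrame.
        return df.copy()
    # Explicit columns are added on top of the role columns chosen by the flags.
    selected: List[str] = list(columns or [])
    if include_target:
        selected.append(target)
    if include_treatment:
        selected.append(treatment)
    if include_confounders:
        selected.extend(confounders)
    # Drop duplicates, keeping first-seen order (assumption).
    return df[list(dict.fromkeys(selected))].copy()


df = pd.DataFrame({"outcome": [1.0, 0.0], "treatment": [1, 0],
                   "age": [30, 40], "invited_friend": [0, 1]})
only_age = select_columns(df, "treatment", "outcome", ["age", "invited_friend"],
                          columns=["age"], include_treatment=False,
                          include_target=False, include_confounders=False)
print(only_age.columns.tolist())  # ['age']
```

Under these rules, passing columns=['age'] with the default flags still returns every stored column, because the flags add the role columns on top of the explicit list.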

Examples

>>> from causalkit.data import generate_rct_data
>>> from causalkit.data import CausalData
>>>
>>> # Generate data
>>> df = generate_rct_data()
>>>
>>> # Create CausalData object
>>> causal_data = CausalData(
...     df=df,
...     treatment='treatment',
...     outcome='outcome',
...     confounders=['age', 'invited_friend']
... )
>>>
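>>> # Get only 'age' (turn off the role-based include flags)
>>> causal_data.get_df(columns=['age'], include_treatment=False,
...                    include_target=False, include_confounders=False)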
>>>
>>> # Get all columns
>>> causal_data.get_df()

__repr__()

String representation of the CausalData object.

Return type:

str