causalkit.data.causaldata.CausalData
- class causalkit.data.causaldata.CausalData(df, treatment, outcome, confounders=None)
Container for causal inference datasets.
Wraps a pandas DataFrame and stores the names of treatment, outcome, and optional confounder columns. The stored DataFrame is restricted to only those columns.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the data. Cannot contain NaN values. Only columns specified in outcome, treatment, and confounders will be stored.
treatment (str) – Column name representing the treatment variable.
outcome (str) – Column name representing the outcome (target) variable.
confounders (Union[str, List[str]], optional) – Column name(s) representing the confounders/covariates.
- df
A copy of the original data restricted to [outcome, treatment] + confounders.
- Type:
pd.DataFrame
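The constructor behaviour described above (accepting one confounder name or a list, rejecting NaN values, and keeping only the role columns in the order [outcome, treatment] + confounders) can be sketched with plain pandas. This is an illustrative sketch, not causalkit's actual implementation: the helper name `restrict_to_roles`, the `ValueError` exceptions, and the choice to check for NaN only in the stored columns are all assumptions.

```python
from typing import List, Optional, Union

import pandas as pd


def restrict_to_roles(
    df: pd.DataFrame,
    treatment: str,
    outcome: str,
    confounders: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented constructor behaviour."""
    # Accept a single column name or a list of names.
    if confounders is None:
        confounder_list: List[str] = []
    elif isinstance(confounders, str):
        confounder_list = [confounders]
    else:
        confounder_list = list(confounders)

    # Documented column order: [outcome, treatment] + confounders.
    wanted = [outcome, treatment] + confounder_list

    # Assumption: a missing column is reported with a ValueError.
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in df: {missing}")

    # Copy so later changes do not touch the caller's DataFrame.
    restricted = df[wanted].copy()

    # Assumption: NaN values are checked only in the stored columns.
    if restricted.isna().any().any():
        raise ValueError("df cannot contain NaN values")
    return restricted


raw = pd.DataFrame(
    {
        "outcome": [1.0, 0.0, 1.0],
        "treatment": [1, 0, 1],
        "age": [34, 51, 27],
        "unused": ["a", "b", "c"],
    }
)
stored = restrict_to_roles(raw, treatment="treatment", outcome="outcome", confounders="age")
print(list(stored.columns))  # ['outcome', 'treatment', 'age']
```

The `unused` column is dropped because it has no role, which matches the statement that only the outcome, treatment, and confounder columns are stored.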
Examples
>>> from causalkit.data import generate_rct_data
>>> from causalkit.data import CausalData
>>>
>>> # Generate data
>>> df = generate_rct_data()
>>>
>>> # Create CausalData object
>>> causal_data = CausalData(
...     df=df,
...     treatment='treatment',
...     outcome='outcome',
...     confounders=['age', 'invited_friend']
... )
>>>
>>> # Access data
>>> causal_data.df.head()
>>>
>>> # Access columns by role
>>> causal_data.target
>>> causal_data.confounders
>>> causal_data.treatment
Methods
__init__(df, treatment, outcome[, confounders])
Initialize a CausalData object.
get_df([columns, include_treatment, ...])
Get a DataFrame from the CausalData object with specified columns.
Attributes
confounders
List of confounder column names.
target
Get the outcome (target) variable.
treatment
Get the treatment variable.
- property target: Series
Get the outcome (target) variable.
- Returns:
The outcome column as a pandas Series.
- Return type:
pd.Series
- property treatment: Series
Get the treatment variable.
- Returns:
The treatment column as a pandas Series.
- Return type:
pd.Series
- get_df(columns=None, include_treatment=True, include_target=True, include_confounders=True)
Get a DataFrame from the CausalData object with specified columns.
- Parameters:
columns (List[str], optional) – Specific column names to include in the returned DataFrame. If provided, these columns will be included in addition to any columns specified by the include parameters. If None, columns will be determined solely by the include parameters. If None and no include parameters are True, returns the entire DataFrame.
include_treatment (bool, default True) – Whether to include treatment column(s) in the returned DataFrame.
include_target (bool, default True) – Whether to include target column(s) in the returned DataFrame.
include_confounders (bool, default True) – Whether to include confounder column(s) in the returned DataFrame.
- Returns:
DataFrame containing the specified columns.
- Return type:
pd.DataFrame
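The way `columns` combines with the three `include_*` flags can be easier to follow as a small column-resolution function. The sketch below works on column names only and follows the parameter descriptions above. The function name `resolve_columns`, the output ordering (target, treatment, confounders, then extra columns), and the removal of duplicate names are assumptions, not confirmed details of causalkit.

```python
from typing import List, Optional


def resolve_columns(
    all_columns: List[str],
    treatment: str,
    target: str,
    confounders: List[str],
    columns: Optional[List[str]] = None,
    include_treatment: bool = True,
    include_target: bool = True,
    include_confounders: bool = True,
) -> List[str]:
    """Hypothetical sketch of which column names get_df would return."""
    any_include = include_treatment or include_target or include_confounders

    # Documented fallback: no explicit columns and no include flag set
    # means the entire stored DataFrame is returned.
    if columns is None and not any_include:
        return list(all_columns)

    selected: List[str] = []
    if include_target:
        selected.append(target)
    if include_treatment:
        selected.append(treatment)
    if include_confounders:
        selected.extend(confounders)
    # Explicit columns are added on top of the flag-selected ones.
    if columns is not None:
        selected.extend(columns)

    # Assumption: duplicates are dropped, keeping first-seen order.
    return list(dict.fromkeys(selected))


stored_columns = ["outcome", "treatment", "age", "invited_friend"]
roles = dict(treatment="treatment", target="outcome", confounders=["age", "invited_friend"])

# Defaults: every role column is selected.
print(resolve_columns(stored_columns, **roles))
# ['outcome', 'treatment', 'age', 'invited_friend']

# Only the explicitly requested column.
print(
    resolve_columns(
        stored_columns,
        **roles,
        columns=["age"],
        include_treatment=False,
        include_target=False,
        include_confounders=False,
    )
)
# ['age']
```

If the library follows the parameter descriptions literally, passing `columns=['age']` while leaving the include flags at their defaults would still return all role columns, so narrowing the result to a single column would require setting the flags to False.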
Examples
>>> from causalkit.data import generate_rct_data
>>> from causalkit.data import CausalData
>>>
>>> # Generate data
>>> df = generate_rct_data()
>>>
>>> # Create CausalData object
>>> causal_data = CausalData(
...     df=df,
...     treatment='treatment',
...     outcome='outcome',
...     confounders=['age', 'invited_friend']
... )
>>>
>>> # Get specific columns
>>> causal_data.get_df(columns=['age'])
>>>
>>> # Get all columns
>>> causal_data.get_df()