Exploratory Data Analysis (EDA)#
The eda module provides quick, practical diagnostics to check whether your binary-treatment causal problem is suitable for effect estimation. It focuses on interpretability and helps you answer questions like: Do treatment and control groups overlap? Are confounders balanced? Which features drive treatment assignment and the outcome?
What you can do
Inspect outcome by treatment: summary statistics and simple plots
Estimate propensity scores with cross‑validation and check diagnostics (ROC AUC, positivity/overlap, score overlap plot)
Assess covariate balance via means and standardized mean differences (SMD)
Fit a simple outcome model (confounders only) and inspect predictive accuracy and feature attributions
Typical workflow
from causalkit.eda import CausalEDA
from causalkit.data import CausalData # optional import if you need to construct CausalData
# Prepare your dataset (must be CausalData or compatible with df/treatment/outcome/confounders)
causal_data = ... # your CausalData object
eda = CausalEDA(causal_data)
# 1) Quick dataset checks
shape = eda.data_shape() # {'n_rows': ..., 'n_columns': ...}
stats = eda.outcome_stats() # outcome summary by treatment
fig1, fig2 = eda.outcome_plots() # histogram + boxplot by treatment
# 2) Propensity and overlap
ps_model = eda.fit_propensity()
auc = ps_model.roc_auc # treatment predictability
positivity = ps_model.positivity_check()
ps_model.ps_graph() # overlap of propensity scores
shap_t = ps_model.shap # features driving treatment
# 3) Balance
balance = eda.confounders_means() # means, abs diff, SMD
# 4) Outcome model (confounders only)
out = eda.outcome_fit()
metrics = out.scores # RMSE/MAE
shap_y = out.shap # features driving outcome
Notes
By default, CatBoost models are used and categorical features are handled natively; you can provide your own models if needed.
See docs/examples/basic_example.ipynb for an end‑to‑end demonstration that uses these EDA tools before inference.