Inference

The causalis.inference module provides estimators to quantify causal effects from observational or experimental data.
It offers ready-to-use functions for common targets such as:

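ATE (Average Treatment Effect): overall effect of a binary treatment on the outcome
ATT (Average Treatment effect on the Treated): average effect among the units that actually received the treatment
CATE / GATE (Conditional / Group Average Treatment Effect): how effects vary across individuals (CATE) or predefined groups (GATE)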
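For reference, these targets have standard potential-outcome definitions. Here $Y(1)$ and $Y(0)$ are the potential outcomes under treatment and control, $D$ is the binary treatment, $X$ are the confounders, and $G$ is whatever grouping you choose for GATE:

```latex
\begin{aligned}
\text{ATE}     &= \mathbb{E}[Y(1) - Y(0)] \\
\text{ATT}     &= \mathbb{E}[Y(1) - Y(0) \mid D = 1] \\
\text{CATE}(x) &= \mathbb{E}[Y(1) - Y(0) \mid X = x] \\
\text{GATE}(g) &= \mathbb{E}[Y(1) - Y(0) \mid G = g]
\end{aligned}
```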

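Under the hood, the ATE/ATT routines rely on Double Machine Learning (DML) with the interactive regression model (IRM), using sensible defaults (CatBoost learners) and cross-fitting: nuisance models are trained on some folds and evaluated on the held-out fold, which reduces the overfitting bias that flexible nuisance models would otherwise introduce.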
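To make the mechanics concrete, here is a minimal NumPy sketch of a cross-fitted AIPW estimator, which uses the same doubly robust score that IRM-style DML relies on for the ATE. It is not causalis' implementation: the helpers `fit_ols`, `fit_logit` and `aipw_ate_crossfit` are hypothetical, and simple linear/logistic models stand in for CatBoost to keep the example dependency-light. The simulated data has a true ATE of 2.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta


def fit_logit(X, d, n_iter=25):
    """Logistic regression via Newton-Raphson; returns a P(D = 1 | X) function."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        grad = A.T @ (d - p)
        hess = A.T @ (A * (p * (1 - p))[:, None])
        w = w + np.linalg.solve(hess, grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ w))


def aipw_ate_crossfit(y, d, X, n_folds=5, trim=1e-2, seed=0):
    """Cross-fitted AIPW estimate of the ATE and its standard error."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
    g0 = np.empty(len(y))
    g1 = np.empty(len(y))
    m = np.empty(len(y))
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisances are fit on the other folds and evaluated on fold k (cross-fitting).
        g0[test] = fit_ols(X[train & (d == 0)], y[train & (d == 0)])(X[test])
        g1[test] = fit_ols(X[train & (d == 1)], y[train & (d == 1)])(X[test])
        m[test] = np.clip(fit_logit(X[train], d[train])(X[test]), trim, 1 - trim)
    # Doubly robust (AIPW) score for the ATE, averaged over units.
    phi = g1 - g0 + d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m)
    theta = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(len(y))
    return theta, se


# Simulated data with a known effect: true ATE = 2.
rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
d = (rng.uniform(size=n) < p_true).astype(float)
y = 1.0 + X[:, 0] + 0.5 * X[:, 1] + 2.0 * d + rng.normal(size=n)

theta, se = aipw_ate_crossfit(y, d, X)
print(f"ATE estimate: {theta:.3f} (SE {se:.3f})")
```

Because each unit's nuisance predictions come from models that never saw that unit, overfitting in the outcome or propensity model does not leak directly into the score, and the doubly robust form makes the estimate insensitive to small errors in either nuisance model.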

A Very Short ATE Example

Below is the minimal flow using the internal IRM ATE estimator.
It expects a CausalData object with one binary treatment, one outcome, and a list of confounders.

from causalis.inference.ate import dml_ate

# Assume you already constructed a CausalData object: `causal_data`
# (see the User Guide pages for data preparation and EDA)

results = dml_ate(causal_data, n_folds=5, confidence_level=0.95)

print("ATE (coefficient):", results["coefficient"])      # float
print("Std. error:", results["std_error"])               # float
print("P-value:", results["p_value"])                    # float
print("95% CI:", results["confidence_interval"])         # (lower, upper)

What It Returns

{
  'coefficient': float,                  # estimated average treatment effect
  'std_error': float,                    # standard error
  'p_value': float,                      # p-value for H0: effect == 0
  'confidence_interval': (float, float), # (lower, upper) at the requested level
  'model': IRM,                          # fitted IRM object for advanced inspection
  'diagnostic_data': dict                # diagnostic information (populated when store_diagnostic_data=True)
}
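DML estimators usually derive the interval and p-value from the coefficient and standard error with a normal (Wald) approximation. Assuming causalis follows that convention, the sketch below reproduces both fields; `wald_summary` is a hypothetical helper, not part of causalis, and the inputs are made-up numbers rather than real `dml_ate` output.

```python
from statistics import NormalDist


def wald_summary(coef, se, confidence_level=0.95):
    """Two-sided Wald CI and p-value for H0: effect == 0 (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # about 1.96 for 95%
    z = coef / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (coef - z_crit * se, coef + z_crit * se), p_value


# Hypothetical inputs, not output from dml_ate.
ci, p = wald_summary(1.0, 0.5)
print(ci, p)  # roughly (0.020, 1.980) and 0.0455
```

With real results, `wald_summary(results["coefficient"], results["std_error"])` should match `results["confidence_interval"]` and `results["p_value"]` if the normal-approximation assumption holds.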

Advanced Usage

Custom Machine Learning Models

You can pass your own ML models to replace the default CatBoost learners: ml_g is the outcome model (a regressor) and ml_m is the propensity model (a classifier):

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

from causalis.inference.ate import dml_ate

results = dml_ate(
    causal_data,
    ml_g=RandomForestRegressor(n_estimators=100, max_depth=5),   # outcome model E[Y | D, X]
    ml_m=RandomForestClassifier(n_estimators=100, max_depth=5),  # propensity model P(D = 1 | X)
    n_folds=5,
    confidence_level=0.95
)

Additional Parameters

The dml_ate function provides several additional parameters for fine-tuning:

results = dml_ate(
    causal_data,
    n_folds=5,                      # number of cross-fitting folds
    n_rep=1,                        # number of repetitions (currently only 1 is supported)
    score="ATE",                    # "ATE" or "ATTE" (effect on the treated)
    confidence_level=0.95,          # confidence level for the CI
    normalize_ipw=False,            # whether to normalize the IPW weights
    trimming_rule="truncate",       # how extreme propensity scores are trimmed
    trimming_threshold=1e-2,        # clip propensities to [threshold, 1 - threshold]
    random_state=42,                # random seed for reproducibility
    store_diagnostic_data=True      # store comprehensive diagnostics
)
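The two propensity-related options can be illustrated in plain Python. This sketch assumes that trimming_rule="truncate" clips scores into [trimming_threshold, 1 - trimming_threshold] and that normalize_ipw=True rescales each arm's inverse-probability weights to average 1, as DoubleML does; both helpers are hypothetical and not part of causalis.

```python
def truncate_propensity(m_hat, threshold=1e-2):
    """Clip propensity scores into [threshold, 1 - threshold]."""
    return [min(max(p, threshold), 1 - threshold) for p in m_hat]


def ipw_weights(d, m_hat, normalize=False):
    """Inverse-probability weights for treated (d=1) and control (d=0) units.

    With normalize=True, each arm's weights are rescaled so they average 1
    over the sample (a Hajek-style normalization).
    """
    w1 = [di / p for di, p in zip(d, m_hat)]
    w0 = [(1 - di) / (1 - p) for di, p in zip(d, m_hat)]
    if normalize:
        s1 = sum(w1) / len(w1)
        s0 = sum(w0) / len(w0)
        w1 = [w / s1 for w in w1]
        w0 = [w / s0 for w in w0]
    return w1, w0


# Hypothetical scores: 0.001 would imply a weight of 1000; truncation caps it at 100.
m_hat = truncate_propensity([0.001, 0.3, 0.7, 0.999])  # -> 0.01, 0.3, 0.7, 0.99
w1, w0 = ipw_weights([1, 0, 1, 0], m_hat, normalize=True)
print(m_hat, w1, w0)
```

Truncation bounds the largest possible weight at 1 / trimming_threshold, while normalization keeps the weights of each arm on a common scale; both trade a little bias for lower variance when overlap is poor.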

Diagnostic Data

By default, dml_ate stores detailed diagnostic information useful for validation and refutation tests; the store_diagnostic_data flag controls this:

results = dml_ate(causal_data, store_diagnostic_data=True)

# Access diagnostic data
diagnostics = results["diagnostic_data"]

# Available diagnostic information:
# - m_hat: estimated propensity scores
# - g0_hat: estimated outcome under control
# - g1_hat: estimated outcome under treatment
# - y: observed outcomes
# - d: observed treatment
# - x: confounders
# - psi: influence function values
# - psi_a, psi_b: components of the linear score psi = psi_a * theta + psi_b
# - folds: fold assignments
# - score: score type used
# - normalize_ipw: whether IPW was normalized
# - trimming_threshold: trimming threshold used
# - p1: treatment prevalence
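The score components are enough to recompute the point estimate and its standard error. The sketch assumes DoubleML's linear-score convention psi = psi_a * theta + psi_b, where theta solves mean(psi_a) * theta + mean(psi_b) = 0; `theta_and_se_from_score` is a hypothetical helper, and the inputs are made-up values shaped like an ATE score (psi_a = -1 for every unit).

```python
import math


def theta_and_se_from_score(psi_a, psi_b):
    """Solve mean(psi_a) * theta + mean(psi_b) = 0 and return (theta, std_error).

    Assumes a linear score psi = psi_a * theta + psi_b, as in DoubleML.
    """
    n = len(psi_a)
    mean_a = sum(psi_a) / n
    theta = -(sum(psi_b) / n) / mean_a
    psi = [a * theta + b for a, b in zip(psi_a, psi_b)]
    var = (sum(p * p for p in psi) / n) / mean_a ** 2  # sandwich-style variance
    return theta, math.sqrt(var / n)


# Hypothetical score components, not real dml_ate diagnostics.
theta, se = theta_and_se_from_score([-1.0, -1.0, -1.0, -1.0], [1.0, 3.0, 2.0, 2.0])
print(theta, se)
```

With real output, passing diagnostics["psi_a"] and diagnostics["psi_b"] should approximately reproduce results["coefficient"] and results["std_error"]; small differences can appear if the library applies a different variance correction.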

This diagnostic data enables comprehensive refutation tests, including overlap diagnostics, score validation, and sensitivity analysis.
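As an illustration of a basic overlap diagnostic, the hypothetical helper below summarizes estimated propensity scores. With real results one would pass diagnostics["m_hat"] and diagnostics["d"], assuming they are 1-D sequences aligned by unit; the 0.05 cutoff for "extreme" scores is an arbitrary illustrative choice.

```python
def overlap_summary(m_hat, d, threshold=0.05):
    """Basic positivity/overlap checks on estimated propensity scores."""
    n = len(m_hat)
    treated = [p for p, di in zip(m_hat, d) if di == 1]
    control = [p for p, di in zip(m_hat, d) if di == 0]
    return {
        # share of units whose propensity is close to 0 or 1
        "share_extreme": sum(p < threshold or p > 1 - threshold for p in m_hat) / n,
        # propensity range within each arm; little overlap means fragile estimates
        "treated_range": (min(treated), max(treated)),
        "control_range": (min(control), max(control)),
        # largest inverse-probability weight implied by the scores
        "max_ipw_weight": max(
            max(1 / p for p in treated),
            max(1 / (1 - p) for p in control),
        ),
    }


# Hypothetical propensities and treatment indicators.
summary = overlap_summary([0.02, 0.3, 0.6, 0.97, 0.5], [0, 0, 1, 1, 1])
print(summary)
```

A large share of extreme scores or a very large maximum weight suggests that the ATE is identified mostly through a few units, which is a signal to revisit the confounder set or the trimming threshold.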