causalkit.data.generators.generate_rct_data

causalkit.data.generators.generate_rct_data(n_users=20000, split=0.5, random_state=42, target_type='binary', target_params=None)

Create synthetic RCT data using CausalDatasetGenerator as the core engine.

Treatment is randomized to approximately match split (independent of covariates).
Outcome distribution is controlled by target_type and target_params.
Returns a legacy-compatible schema with ancillary covariates derived from the outcome (age, cnt_trans, platform_Android, platform_iOS, invited_friend), plus a UUID user_id.
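The randomization and outcome steps described above can be sketched with the standard library alone. This is a minimal illustration, not causalkit's implementation: the helper names `simulate_binary_rct` and `arm_summary`, the use of "A"/"B" as arm labels with "B" as the treatment arm, and the per-user Bernoulli draws are all assumptions. Only the probabilities (the binary defaults) come from this page.

```python
import random

def simulate_binary_rct(n_users=20000, split=0.5, p=None, seed=42):
    """Hypothetical helper: randomize users, then draw a binary outcome per arm."""
    # Documented binary defaults; treating "B" as the treatment arm is an assumption.
    p = p or {"A": 0.10, "B": 0.12}
    rng = random.Random(seed)
    rows = []
    for _ in range(n_users):
        # Independent Bernoulli(split) draw, so the treated share only
        # approximately matches `split` in a finite sample.
        arm = "B" if rng.random() < split else "A"
        outcome = 1 if rng.random() < p[arm] else 0
        rows.append((arm, outcome))
    return rows

def arm_summary(rows):
    """Return {arm: (n_users_in_arm, conversion_rate)} for each arm in rows."""
    summary = {}
    for arm in sorted({a for a, _ in rows}):
        outcomes = [y for a, y in rows if a == arm]
        summary[arm] = (len(outcomes), sum(outcomes) / len(outcomes))
    return summary

rows = simulate_binary_rct()
summary = arm_summary(rows)
treated_share = summary["B"][0] / len(rows)
```

Because assignment does not depend on any covariate, the difference between the two conversion rates in `summary` is an unbiased estimate of the treatment effect, which is the property an RCT generator is meant to preserve.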

Parameters:

n_users (int) – Total number of users in the dataset.
split (float) – Proportion of users in the treatment group (e.g., 0.5 => 50/50).
random_state (int, optional) – Seed for reproducibility.
target_type ({"binary", "normal", "nonnormal"}) – Outcome family. "nonnormal" is approximated via a Poisson mean process.
target_params (dict, optional) –
If None, the following defaults are used:
- binary: {"p": {"A": 0.10, "B": 0.12}}
- normal: {"mean": {"A": 0.00, "B": 0.20}, "std": 1.0}
- nonnormal: {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}}
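The default lookup and the target_type check can be pictured as follows. This is a sketch under stated assumptions, not the library's code: `DEFAULT_TARGET_PARAMS` and `resolve_target_params` are hypothetical names, and the page does not say whether a partial target_params dict is merged with the defaults, so the sketch simply uses a supplied dict as-is.

```python
import copy

# Default values copied from the parameter description; the constant name is hypothetical.
DEFAULT_TARGET_PARAMS = {
    "binary": {"p": {"A": 0.10, "B": 0.12}},
    "normal": {"mean": {"A": 0.00, "B": 0.20}, "std": 1.0},
    "nonnormal": {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}},
}

def resolve_target_params(target_type, target_params=None):
    """Hypothetical helper: validate target_type and fall back to defaults."""
    if target_type not in DEFAULT_TARGET_PARAMS:
        allowed = sorted(DEFAULT_TARGET_PARAMS)
        raise ValueError(f"target_type must be one of {allowed}, got {target_type!r}")
    if target_params is None:
        # Deep copy so callers cannot mutate the shared defaults.
        return copy.deepcopy(DEFAULT_TARGET_PARAMS[target_type])
    # Assumption: a user-supplied dict replaces the defaults entirely.
    return target_params

binary_params = resolve_target_params("binary")
# Absolute lift implied by the binary defaults (B minus A).
abs_lift = binary_params["p"]["B"] - binary_params["p"]["A"]
```

Under the binary defaults the implied absolute lift is about 0.02 (two percentage points), and under the normal defaults the arms differ by 0.20 in mean with a standard deviation of 1.0.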

Returns:

Columns: user_id, treatment, outcome, age, cnt_trans, platform_Android, platform_iOS, invited_friend.

Return type:

pd.DataFrame

Raises:

ValueError – If target_type is not one of {"binary", "normal", "nonnormal"}.