causalkit.data.generators.generate_rct_data
- causalkit.data.generators.generate_rct_data(n_users=20000, split=0.5, random_state=42, target_type='binary', target_params=None)
Create synthetic RCT data using CausalDatasetGenerator as the core engine.
Treatment is assigned at random, independently of covariates, so that the treated share approximately matches split.
Outcome distribution is controlled by target_type and target_params.
Returns a legacy-compatible schema with ancillary covariates derived from the outcome (age, cnt_trans, platform_Android, platform_iOS, invited_friend), plus a UUID user_id.
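The randomization and outcome steps described above can be illustrated with a minimal standard-library sketch. This is not causalkit's implementation: the helper name sketch_rct and its arguments are hypothetical, the mapping of "A" to the control arm and "B" to the treated arm is an assumption, and only the binary outcome family is shown.

```python
import random
import uuid


def sketch_rct(n_users=1000, split=0.5, p_a=0.10, p_b=0.12, seed=42):
    """Illustrative only: a toy randomized experiment with a binary outcome.

    Hypothetical helper, not part of causalkit. Assumes arm "A" is control
    (success probability p_a) and arm "B" is treated (success probability p_b).
    """
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    rows = []
    for _ in range(n_users):
        # Each user gets an independent Bernoulli(split) draw, so assignment
        # cannot depend on any covariate.
        treatment = 1 if rng.random() < split else 0
        p = p_b if treatment else p_a
        outcome = 1 if rng.random() < p else 0
        # Build the UUID from seeded random bits so user_id is reproducible too.
        user_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        rows.append({"user_id": user_id, "treatment": treatment, "outcome": outcome})
    return rows


rows = sketch_rct(n_users=20000)
treated_share = sum(r["treatment"] for r in rows) / len(rows)
treated = [r for r in rows if r["treatment"] == 1]
treated_rate = sum(r["outcome"] for r in treated) / len(treated)
```

Because each assignment is an independent draw, treated_share is only approximately equal to split, which is consistent with the "approximately match" wording above.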
- Parameters:
n_users (int) – Total number of users in the dataset.
split (float) – Proportion of users in the treatment group (e.g., 0.5 => 50/50).
random_state (int, optional) – Seed for reproducibility.
target_type ({"binary", "normal", "nonnormal"}) – Outcome family. "nonnormal" is approximated via a Poisson mean process.
target_params (dict, optional) – Parameters of the outcome distribution. If None, the default for the selected target_type is used:
- binary: {"p": {"A": 0.10, "B": 0.12}}
- normal: {"mean": {"A": 0.00, "B": 0.20}, "std": 1.0}
- nonnormal: {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}}
- Returns:
Columns: user_id, treatment, outcome, age, cnt_trans, platform_Android, platform_iOS, invited_friend.
- Return type:
pd.DataFrame
- Raises:
ValueError – If target_type is not one of {"binary", "normal", "nonnormal"}.
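The documented defaults and the ValueError condition can be sketched as a small lookup. The helper resolve_target_params and the constant DEFAULT_TARGET_PARAMS are hypothetical names used for illustration, not causalkit's API. The sketch also assumes that a user-supplied target_params replaces the defaults wholesale rather than being merged with them, which the documentation does not specify.

```python
import copy

# Default values copied from the parameter description above.
DEFAULT_TARGET_PARAMS = {
    "binary": {"p": {"A": 0.10, "B": 0.12}},
    "normal": {"mean": {"A": 0.00, "B": 0.20}, "std": 1.0},
    "nonnormal": {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}},
}


def resolve_target_params(target_type, target_params=None):
    """Hypothetical helper: validate target_type and pick the parameters to use."""
    if target_type not in DEFAULT_TARGET_PARAMS:
        allowed = ", ".join(sorted(DEFAULT_TARGET_PARAMS))
        raise ValueError(
            f"target_type must be one of {{{allowed}}}, got {target_type!r}"
        )
    if target_params is None:
        # Deep-copy so callers cannot mutate the shared defaults.
        return copy.deepcopy(DEFAULT_TARGET_PARAMS[target_type])
    # Assumption: explicit parameters replace the defaults entirely.
    return copy.deepcopy(target_params)
```

Validating target_type before touching target_params means an unsupported family fails fast with a message listing the accepted values, regardless of what was passed in target_params.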