causalkit.data.generators.generate_rct_data

causalkit.data.generators.generate_rct_data(n_users=20000, split=0.5, random_state=42, target_type='binary', target_params=None)

Create synthetic RCT data using CausalDatasetGenerator as the core engine.

  • Treatment is assigned at random, independently of covariates, so the treated share approximately matches split.

  • Outcome distribution is controlled by target_type and target_params.

  • Returns a legacy-compatible schema with ancillary covariates derived from the outcome (age, cnt_trans, platform_Android, platform_iOS, invited_friend), plus a UUID user_id.
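The randomization described above can be illustrated with a minimal standard-library sketch. It assumes one independent Bernoulli draw per user, which is one way to get a treated share that only approximately matches split; the actual mechanism inside CausalDatasetGenerator is not documented here and may differ. The helper name assign_treatment is hypothetical and not part of causalkit.

```python
import random


def assign_treatment(n_users, split=0.5, seed=42):
    """Hypothetical helper (not part of causalkit).

    Draws one independent Bernoulli(split) indicator per user, so the
    assignment cannot depend on any covariate.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < split else 0 for _ in range(n_users)]


treatment = assign_treatment(20000, split=0.5, seed=42)

# The realized share is close to, but generally not exactly, `split`.
treated_share = sum(treatment) / len(treatment)
```

Reusing the same seed reproduces the same assignment, which is the role random_state plays in the signature above.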

Parameters:
  • n_users (int) – Total number of users in the dataset.

  • split (float) – Proportion of users in the treatment group (e.g., 0.5 => 50/50).

  • random_state (int, optional) – Seed for reproducibility.

  • target_type ({"binary","normal","nonnormal"}) – Outcome family. “nonnormal” is approximated via a Poisson mean process.

  • target_params (dict, optional) –

    If None, defaults are used:
    • binary: {"p": {"A": 0.10, "B": 0.12}}

    • normal: {"mean": {"A": 0.00, "B": 0.20}, "std": 1.0}

    • nonnormal: {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}}
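To make the default handling concrete, here is a rough sketch of how a None value for target_params could fall back to the defaults listed above, and how a binary or normal outcome could then be drawn per group. The helper names resolve_target_params and draw_outcome are hypothetical, and the sketch treats "A" and "B" as plain group labels without assuming which one is the treatment arm. The nonnormal family is left out because its sampling process is not fully specified here.

```python
import random

# Defaults copied from the parameter description above.
DEFAULT_TARGET_PARAMS = {
    "binary": {"p": {"A": 0.10, "B": 0.12}},
    "normal": {"mean": {"A": 0.00, "B": 0.20}, "std": 1.0},
    "nonnormal": {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}},
}


def resolve_target_params(target_type, target_params=None):
    """Hypothetical helper: use the documented defaults when None is passed."""
    if target_params is None:
        return DEFAULT_TARGET_PARAMS[target_type]
    return target_params


def draw_outcome(group, target_type, params, rng):
    """Hypothetical helper: draw one outcome for a user in group 'A' or 'B'."""
    if target_type == "binary":
        return 1 if rng.random() < params["p"][group] else 0
    if target_type == "normal":
        return rng.gauss(params["mean"][group], params["std"])
    raise NotImplementedError("nonnormal sampling is not sketched here")


rng = random.Random(42)
params = resolve_target_params("binary")
outcomes_b = [draw_outcome("B", "binary", params, rng) for _ in range(10000)]

# Empirical conversion rate for group B; it should land near p["B"].
rate_b = sum(outcomes_b) / len(outcomes_b)
```

Passing an explicit dict with the same keys, for example {"p": {"A": 0.05, "B": 0.06}}, would replace the defaults for that call.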

Returns:

Columns: user_id, treatment, outcome, age, cnt_trans, platform_Android, platform_iOS, invited_friend.

Return type:

pd.DataFrame

Raises:

ValueError – If target_type is not one of {"binary", "normal", "nonnormal"}.
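The check behind this error can be sketched as below. This is only an illustration of the documented behavior: check_target_type is a hypothetical name, and the wording of the real error message is not specified on this page.

```python
VALID_TARGET_TYPES = ("binary", "normal", "nonnormal")


def check_target_type(target_type):
    """Hypothetical helper mirroring the documented ValueError."""
    if target_type not in VALID_TARGET_TYPES:
        raise ValueError(
            f"target_type must be one of {VALID_TARGET_TYPES}, got {target_type!r}"
        )
    return target_type
```

Callers who accept target_type from configuration or user input may want to run a check like this up front rather than catching the error after data generation has started.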