Examples

This page provides examples of using CausalKit for various causal inference tasks.

A/B Testing Example

This example demonstrates how to generate A/B test data and analyze the results:

import causalkit
from causalkit.data import generate_ab_test_data
from causalkit.inference import compare_ab

# Generate synthetic A/B test data
df = generate_ab_test_data(
    n_samples={"A": 5000, "B": 5000},
    conversion_rates={"A": 0.10, "B": 0.12},
    random_state=42
)

# Extract control and treatment data
control = df[df['group'] == 'A']['conversion'].values
treatment = df[df['group'] == 'B']['conversion'].values

# Compare the results
compare_ab(control, treatment)
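
The internals of compare_ab are not shown on this page. As a rough illustration of the kind of comparison an A/B analysis performs on binary conversions, the following standalone sketch runs a two-proportion z-test using only the standard library. The function name two_proportion_ztest and the counts are hypothetical, chosen to match the nominal 10% and 12% rates above; this is not necessarily the test CausalKit applies.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"diff": p_b - p_a, "z": z, "p_value": p_value}


# Hypothetical counts: 500 of 5000 convert in A, 600 of 5000 in B
result = two_proportion_ztest(conv_a=500, n_a=5000, conv_b=600, n_b=5000)
print(result)
```

With these counts the absolute lift is 2 percentage points and the p-value falls well below 0.05, so the difference would usually be called statistically significant.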

Randomized Controlled Trial (RCT) Example

This example shows how to generate RCT data and analyze it:

import numpy as np
import pandas as pd
from causalkit.data import generate_rct_data
from causalkit.inference import compare_ab_with_plr

# Generate RCT data
df = generate_rct_data(
    n_users=10000,
    split=0.5,
    target_type="continuous",
    random_state=42
)

# Extract control and treatment data
control = df[df['treatment'] == 0]['outcome'].values
treatment = df[df['treatment'] == 1]['outcome'].values

# Compare using PLR
compare_ab_with_plr(control, treatment)
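
To give some intuition for what a partially linear regression (PLR) estimates, here is a minimal standalone sketch of the partialling-out idea: residualize both the outcome and the treatment on a covariate, then regress one set of residuals on the other. It assumes a single covariate and plain least squares for the nuisance fits, with no machine-learning learners and no cross-fitting, so it is a simplification and not CausalKit's implementation. The helper names residualize and plr_effect and the simulated data are hypothetical.

```python
import random


def residualize(values, x):
    """Residuals from a simple least-squares regression of values on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_v = sum(values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxv = sum((xi - mean_x) * (vi - mean_v) for xi, vi in zip(x, values))
    slope = sxv / sxx
    return [(vi - mean_v) - slope * (xi - mean_x) for xi, vi in zip(x, values)]


def plr_effect(y, d, x):
    """Partialling-out estimate of theta in y = theta * d + g(x) + noise."""
    y_res = residualize(y, x)
    d_res = residualize(d, x)
    numerator = sum(dr * yr for dr, yr in zip(d_res, y_res))
    denominator = sum(dr * dr for dr in d_res)
    return numerator / denominator


# Hypothetical simulated data: true treatment effect 2.0, one covariate x
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
d = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y = [2.0 * di + 3.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = plr_effect(y, d, x)
print(round(theta_hat, 3))
```

The estimate should land close to the true effect of 2.0. In a randomized trial the treatment is independent of the covariates, so partialling out mainly reduces variance rather than removing bias.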

Traffic Splitting Example

This example demonstrates how to split traffic for an experiment:

import pandas as pd
import numpy as np
from causalkit.design.traffic_splitter import split_traffic

# Create a sample DataFrame (seeded so the example is reproducible)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'user_id': range(1000),
    'age': rng.normal(30, 5, 1000),
    'gender': rng.choice(['M', 'F'], 1000),
    'country': rng.choice(['US', 'UK', 'CA', 'AU'], 1000)
})

# Split into control and treatment groups with stratification
control_df, treatment_df = split_traffic(
    df,
    split_ratio=0.5,
    stratify_column='country',
    random_state=42
)

# Verify the distribution of the stratification variable
print("Control group country distribution:")
print(control_df['country'].value_counts(normalize=True))

print("\nTreatment group country distribution:")
print(treatment_df['country'].value_counts(normalize=True))
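
For readers curious about what stratification does, the following standalone sketch splits records within each stratum separately so that both groups keep the same mix of the stratification variable. It uses only the standard library and is an illustration under assumed semantics (split_ratio taken as the control share), not the implementation of split_traffic. The function name stratified_split is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, split_ratio=0.5, seed=42):
    """Split rows into (control, treatment), preserving the mix of `key`."""
    rng = random.Random(seed)

    # Group rows by the value of the stratification variable
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    control, treatment = [], []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Shuffle within the stratum, then cut at the requested ratio
        rng.shuffle(members)
        cut = round(len(members) * split_ratio)
        control.extend(members[:cut])
        treatment.extend(members[cut:])
    return control, treatment


# Hypothetical users: 250 per country
countries = ["US", "UK", "CA", "AU"]
rows = [{"user_id": i, "country": countries[i % 4]} for i in range(1000)]

control_rows, treatment_rows = stratified_split(rows, key="country")
print(len(control_rows), len(treatment_rows))
```

Because the cut is made inside each stratum, every country contributes the same share to both groups, which is the property the value_counts check above is meant to verify.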

More Examples

For more examples, check out the Jupyter notebooks in the examples directory of the repository:

A/B Testing with T-Test
Traffic Splitting
Double Machine Learning with PLR
Data Functions - Demonstrates how to use the data generation functions and the causaldata class for managing causal inference data
Design Functions - Shows how to use traffic splitting for A/B testing and calculate minimum detectable effect (MDE) for experimental design
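
As a quick illustration of the minimum detectable effect mentioned in the last item, the sketch below applies the common normal-approximation formula for comparing two conversion rates with equal group sizes. It assumes both groups have roughly the baseline variance and is not CausalKit's MDE function; the name mde_two_proportions and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def mde_two_proportions(baseline_rate, n_per_group, alpha=0.05, power=0.8):
    """Approximate absolute MDE for a two-sided test on two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Standard error of the difference, using the baseline rate for both groups
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    return (z_alpha + z_power) * se


mde = mde_two_proportions(baseline_rate=0.10, n_per_group=5000)
print(f"{mde:.4f}")
```

With a 10% baseline and 5000 users per group, the approximation gives an MDE of roughly 1.7 percentage points at 5% significance and 80% power.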