Getting Started with CausalKit

This guide will help you get started with CausalKit by walking through some basic examples.

Basic Workflow

A typical workflow with CausalKit involves:

Generating or loading data
Designing and implementing an experiment
Analyzing the results

Let's walk through each step with examples.

Data Generation

CausalKit provides several functions for generating synthetic data for causal inference tasks.

Generating A/B Test Data

from causalkit.data import generate_ab_test_data

# Generate A/B test data with default parameters
df = generate_ab_test_data()

# Generate A/B test data with custom parameters
df_custom = generate_ab_test_data(
    n_samples={"A": 5000, "B": 5000},
    conversion_rates={"A": 0.10, "B": 0.12},
    random_state=42
)

print(df_custom.head())
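
To make the parameters concrete, here is a minimal, standard-library-only sketch of the kind of data an A/B generator produces. It assumes each row is one user with a group label and a 0/1 conversion drawn at that group's rate. The function `simulate_ab_conversions` and its output fields are hypothetical illustrations, not CausalKit's API or its actual output schema.

```python
import random

def simulate_ab_conversions(n_samples, conversion_rates, random_state=None):
    """Hypothetical sketch: one row per user, with a Bernoulli conversion
    drawn at the group's conversion rate. Not CausalKit's implementation."""
    rng = random.Random(random_state)  # seeded generator for reproducibility
    rows = []
    for group, n in n_samples.items():
        rate = conversion_rates[group]
        for _ in range(n):
            rows.append({"group": group, "converted": int(rng.random() < rate)})
    return rows

rows = simulate_ab_conversions(
    n_samples={"A": 5000, "B": 5000},
    conversion_rates={"A": 0.10, "B": 0.12},
    random_state=42,
)

# The observed conversion rate per group should be near the configured rate
for group in ("A", "B"):
    converted = [r["converted"] for r in rows if r["group"] == group]
    print(group, len(converted), sum(converted) / len(converted))
```

Passing the same `random_state` reproduces the same rows, which is why the custom call above fixes it to 42.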

Generating Randomized Controlled Trial (RCT) Data

from causalkit.data import generate_rct_data

# Generate RCT data with default parameters
df = generate_rct_data()

# Generate RCT data with custom parameters
df_custom = generate_rct_data(
    n_users=10000,
    split=0.5,
    target_type="binary",
    random_state=42
)

print(df_custom.head())
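
Unlike the A/B generator, `generate_rct_data` takes a total user count and a `split` rather than fixed group sizes. One plausible reading, sketched below under that assumption, is that each user is independently assigned to treatment with probability `split`, so group sizes vary slightly between seeds. The helper `simulate_rct`, the baseline rate, and the lift are hypothetical values chosen for illustration; CausalKit's generator may assign users or produce outcomes differently.

```python
import random

def simulate_rct(n_users, split=0.5, base_rate=0.10, lift=0.02, random_state=None):
    """Hypothetical sketch of binary-outcome RCT data.

    Each user is assigned to treatment with probability `split`; the outcome is
    Bernoulli with rate `base_rate` in control and `base_rate + lift` in treatment.
    base_rate and lift are illustrative assumptions, not CausalKit defaults.
    """
    rng = random.Random(random_state)
    users = []
    for user_id in range(n_users):
        treatment = int(rng.random() < split)   # independent coin flip per user
        rate = base_rate + lift * treatment     # treated users get the lift
        outcome = int(rng.random() < rate)      # binary target
        users.append({"user_id": user_id, "treatment": treatment, "target": outcome})
    return users

users = simulate_rct(n_users=10000, split=0.5, random_state=42)
n_treated = sum(u["treatment"] for u in users)
print(n_treated, len(users) - n_treated)  # group sizes vary around n_users * split
```

Independent assignment means no user's group depends on anyone else's; if exactly equal arms are required, shuffling a fixed list of labels is the usual alternative.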

Experimental Design

Splitting Traffic

CausalKit provides utilities for splitting traffic data for experiments.

import numpy as np
import pandas as pd
from causalkit.design.traffic_splitter import split_traffic

# Create a sample DataFrame
df = pd.DataFrame({
    'user_id': range(1000),
    'feature_1': np.random.normal(0, 1, 1000),
    'feature_2': np.random.choice(['A', 'B', 'C'], 1000)
})

# Split into training and test sets (70% / 30%)
train_df, test_df = split_traffic(df, split_ratio=0.7, random_state=42)

# Split into training, validation, and test sets (60% / 20% / 20%)
train_df, val_df, test_df = split_traffic(df, split_ratio=[0.6, 0.2], random_state=42)

# Stratified split based on a categorical feature
train_df, test_df = split_traffic(df, split_ratio=0.7, stratify_column='feature_2', random_state=42)
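
As a rough mental model of what `split_ratio` implies, the sketch below shuffles row indices with a seeded generator and cuts them at cumulative boundaries; a list such as `[0.6, 0.2]` leaves the remaining 20% for a final group, matching the 60% / 20% / 20% comment above. The stratified variant applies the same cut inside each level of the stratification column, so every group keeps roughly the original category mix. `split_indices` and `stratified_split_indices` are hypothetical helpers, not CausalKit functions, and `split_traffic` may round or order rows differently.

```python
import random
from collections import defaultdict

def split_indices(indices, split_ratio, rng):
    """Hypothetical sketch: shuffle indices, then cut at cumulative ratio boundaries.

    A single ratio gives two groups; a list of k ratios gives k + 1 groups,
    with the remainder forming the last group.
    """
    ratios = [split_ratio] if isinstance(split_ratio, (int, float)) else list(split_ratio)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for ratio in ratios:
        cumulative += ratio
        end = int(round(cumulative * len(shuffled)))
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # whatever is left is the last group
    return parts

def stratified_split_indices(labels, split_ratio, random_state=None):
    """Hypothetical sketch: split each stratum separately, then merge the groups."""
    rng = random.Random(random_state)
    by_stratum = defaultdict(list)
    for i, label in enumerate(labels):
        by_stratum[label].append(i)
    merged = None
    for members in by_stratum.values():
        parts = split_indices(members, split_ratio, rng)
        merged = parts if merged is None else [m + p for m, p in zip(merged, parts)]
    return merged

# Three-way split of 1000 rows (60% / 20% / 20%)
train_idx, val_idx, test_idx = split_indices(range(1000), [0.6, 0.2], random.Random(42))
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200

# Stratified 70% / 30% split over a hypothetical category column
labels = ["A", "B", "C"] * 333 + ["A"]
strat_train, strat_test = stratified_split_indices(labels, 0.7, random_state=42)
print(len(strat_train), len(strat_test))  # 700 300
```

Cutting a shuffled index list at cumulative boundaries makes group sizes deterministic for a given ratio; stratifying keeps each category's share close to equal across groups, although very small strata can round unevenly.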

Analysis

Comparing A/B Test Results

import numpy as np
from causalkit.inference import compare_ab

# Generate some sample data
control = np.random.normal(10, 2, 1000)  # Control group data
treatment = np.random.normal(10.5, 2, 1000)  # Treatment group data

# Compare the results
compare_ab(control, treatment)
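
See the API Reference for the statistics `compare_ab` actually reports. As an assumption-laden illustration of what a two-sample comparison of means typically involves, the sketch below computes the difference in means, Welch's t statistic, a normal-approximation 95% confidence interval, and a two-sided normal p-value using only the standard library. `welch_summary` is a hypothetical helper, and CausalKit's function may use a different test or interval.

```python
import math
import random
import statistics

def welch_summary(control, treatment, z=1.96):
    """Hypothetical sketch: difference in means with Welch's standard error."""
    mean_c, mean_t = statistics.fmean(control), statistics.fmean(treatment)
    var_c = statistics.variance(control)    # sample variance (n - 1 denominator)
    var_t = statistics.variance(treatment)
    se = math.sqrt(var_c / len(control) + var_t / len(treatment))
    diff = mean_t - mean_c
    t_stat = diff / se
    return {
        "diff": diff,
        "t_stat": t_stat,
        "ci_95": (diff - z * se, diff + z * se),  # normal approximation, fine for large n
        "p_value_normal": math.erfc(abs(t_stat) / math.sqrt(2)),  # two-sided
    }

# Same setup as above, drawn with the standard library instead of NumPy
rng = random.Random(42)
control = [rng.gauss(10, 2) for _ in range(1000)]
treatment = [rng.gauss(10.5, 2) for _ in range(1000)]
print(welch_summary(control, treatment))
```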

Advanced Analysis with PLR

from causalkit.inference import compare_ab_with_plr

# Compare using Partially Linear Regression (PLR),
# reusing `control` and `treatment` from the previous example
compare_ab_with_plr(control, treatment)
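
The partially linear model behind PLR writes the outcome as Y = θ·D + g(X) + ε, with treatment D and covariates X, and estimates θ by regressing the outcome residual Y − E[Y | X] on the treatment residual D − E[D | X]. The minimal sketch below illustrates that partialling-out step with one binary covariate, estimating both conditional means by within-group averages. It is a simplification: double machine learning implementations of PLR (for example DoubleML's `DoubleMLPLR`) fit flexible models for these nuisance functions and use cross-fitting, both omitted here. `plr_theta` and the simulated data, including the true effect of 0.5, are hypothetical, and the sketch does not show how `compare_ab_with_plr` handles covariates.

```python
import random
from collections import defaultdict

def plr_theta(y, d, x):
    """Hypothetical partialling-out estimate of theta in Y = theta*D + g(X) + noise.

    E[Y|X] and E[D|X] are estimated by within-stratum means of a discrete
    covariate X (no ML models, no cross-fitting).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # x -> [sum of y, sum of d, count]
    for yi, di, xi in zip(y, d, x):
        s = sums[xi]
        s[0] += yi
        s[1] += di
        s[2] += 1
    num = den = 0.0
    for yi, di, xi in zip(y, d, x):
        sum_y, sum_d, count = sums[xi]
        y_res = yi - sum_y / count  # outcome residual: Y - E[Y|X]
        d_res = di - sum_d / count  # treatment residual: D - E[D|X]
        num += d_res * y_res
        den += d_res * d_res
    return num / den  # slope of y_res on d_res (no intercept)

rng = random.Random(0)
n = 20000
x = [rng.randint(0, 1) for _ in range(n)]                             # binary covariate
d = [int(rng.random() < 0.3 + 0.4 * xi) for xi in x]                  # X raises P(treated)
y = [0.5 * di + 2.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]   # true theta = 0.5

theta = plr_theta(y, d, x)
y_treated = [yi for yi, di in zip(y, d) if di == 1]
y_control = [yi for yi, di in zip(y, d) if di == 0]
naive = sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)
print(theta, naive)  # naive difference is biased because X drives both D and Y
```

In this simulated setup the naive difference in means absorbs the effect of X, while the residual-on-residual slope recovers a value near the true θ.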

Next Steps

Now that you're familiar with the basic functionality of CausalKit, you can:

Explore the API Reference for detailed documentation of all functions
Check out the Examples for more complex use cases
Read about advanced topics in causal inference in the user guide

For any questions or issues, please visit the GitHub repository.