Analysis Module¶
The causalkit.inference module provides statistical analysis tools for causal inference.
Overview¶
This module includes functions for:
- Performing t-tests on causaldata objects to compare target variables between treatment groups
- Calculating p-values, absolute differences, and relative differences with confidence intervals
T-Test Analysis¶
The ttest function performs a t-test on a causaldata object to compare the target variable between treatment groups. This is particularly useful for analyzing the results of A/B tests or randomized controlled trials (RCTs).
Key Features¶
- Compares means between treatment and control groups
- Calculates p-values to determine statistical significance
- Provides absolute difference between group means with confidence intervals
- Calculates relative difference (percentage change) with confidence intervals
- Supports customizable confidence levels
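To illustrate how the confidence level affects interval width, here is a minimal sketch using only the Python standard library. It uses standard-normal critical values, which closely approximate Student's t quantiles when samples are large; it is not the library's own code, and `critical_value` and `standard_error` are hypothetical names chosen for the illustration.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Two-sided standard-normal critical value for a level in (0, 1)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    # A two-sided interval leaves (1 - level) / 2 probability in each tail.
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


# Hypothetical standard error of a difference in means (assumed value).
standard_error = 0.04

for level in (0.90, 0.95, 0.99):
    half_width = critical_value(level) * standard_error
    print(f"{level:.0%} interval: difference ± {half_width:.4f}")
```

A higher confidence level gives a larger critical value and therefore a wider interval around the same point estimate.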
When to Use T-Tests¶
T-tests are appropriate when:
- You have a binary treatment variable (e.g., control vs. treatment)
- Your target variable is continuous or binary
- You want to determine if there's a statistically significant difference between groups
- You need to quantify the magnitude of the effect with confidence intervals
Example Usage¶
from causalkit.data import generate_rct_data, CausalData
from causalkit.inference import ttest
# Generate sample RCT data
df = generate_rct_data(
    n_users=10000,
    split=0.5,
    target_type="normal",
    target_params={"mean": {"A": 10.0, "B": 10.5}, "std": 2.0},
    random_state=42
)
# Create causaldata object
ck = CausalData(
    df=df,
    target='target',
    treatment='treatment'
)
# Perform t-test with 95% confidence level
results = ttest(ck, confidence_level=0.95)
# Print results
print(f"P-value: {results['p_value']:.4f}")
print(f"Absolute difference: {results['absolute_difference']:.4f}")
print(f"Absolute CI: {results['absolute_ci']}")
print(f"Relative difference: {results['relative_difference']:.2f}%")
print(f"Relative CI: {results['relative_ci']}")
Interpreting Results¶
- p_value: The probability of observing a difference at least as extreme as the one in the data if there were no true difference between groups. A small p-value (typically < 0.05) suggests that the observed difference is statistically significant.
- absolute_difference: The raw difference between the treatment and control means.
- absolute_ci: Confidence interval for the absolute difference. If this interval does not include zero, the difference is statistically significant at the chosen confidence level.
- relative_difference: The percentage change relative to the control group mean.
- relative_ci: Confidence interval for the relative difference.
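The quantities above can be sketched by hand to see how they relate to one another. The following is a minimal illustration using only the Python standard library, not causalkit's implementation: it swaps in a large-sample normal approximation in place of Student's t distribution, uses an unequal-variance standard error, and derives the relative interval by simply rescaling the absolute interval by the control mean (which ignores uncertainty in that mean). The function name `two_sample_summary` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def two_sample_summary(control, treatment, confidence_level=0.95):
    """Compare two group means with a large-sample normal approximation."""
    mean_c, mean_t = fmean(control), fmean(treatment)
    diff = mean_t - mean_c

    # Standard error of the difference, allowing unequal group variances.
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )

    # Two-sided p-value under the null hypothesis of no difference.
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Interval for the absolute difference at the requested level.
    z_crit = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    lower, upper = diff - z_crit * se, diff + z_crit * se

    # Express everything as a percentage of the control mean.
    # Assumption: mean_c > 0, and its own sampling error is ignored.
    scale = 100 / mean_c
    return {
        "p_value": p_value,
        "absolute_difference": diff,
        "absolute_ci": (lower, upper),
        "relative_difference": diff * scale,
        "relative_ci": (lower * scale, upper * scale),
    }


# Usage with simulated data (assumed parameters, for illustration only).
rng = random.Random(42)
control = [rng.gauss(10.0, 2.0) for _ in range(5000)]
treatment = [rng.gauss(10.5, 2.0) for _ in range(5000)]
summary = two_sample_summary(control, treatment)
print(f"P-value: {summary['p_value']:.4f}")
print(f"Absolute difference: {summary['absolute_difference']:.4f}")
print(f"Relative difference: {summary['relative_difference']:.2f}%")
```

Because the absolute interval and the p-value come from the same standard error, an interval that excludes zero corresponds to a p-value below one minus the confidence level.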
API Reference¶
T-test inference for causaldata objects.
ttest(data, confidence_level=0.95)¶
Perform a t-test on a causaldata object to compare the target variable between treatment groups.
Parameters¶
- data : CausalData. The causaldata object containing treatment and target variables.
- confidence_level : float, default 0.95. The confidence level for calculating confidence intervals (between 0 and 1).
Returns¶
Dict[str, Any]: A dictionary containing:
- p_value: The p-value from the t-test
- absolute_difference: The absolute difference between treatment and control means
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference confidence interval
- relative_difference: The relative difference (percentage change) between treatment and control means
- relative_ci: Tuple of (lower, upper) bounds for the relative difference confidence interval
Raises¶
ValueError: If the causaldata object doesn't have both treatment and target variables defined, or if the treatment variable is not binary.
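A caller may want to check the binary-treatment condition before running the test, so that malformed data fails early with a clear message. The sketch below is a hypothetical caller-side pre-check, not the validation that ttest itself performs; `check_binary_treatment` is an illustrative name.

```python
def check_binary_treatment(treatment_values):
    """Raise ValueError unless exactly two distinct treatment labels occur."""
    labels = set(treatment_values)
    if len(labels) != 2:
        raise ValueError(
            f"treatment must have exactly 2 distinct values, found {len(labels)}"
        )
    return labels


# Usage: a three-arm assignment is rejected before any analysis runs.
try:
    check_binary_treatment(["A", "B", "C", "A"])
except ValueError as err:
    print(f"Invalid treatment column: {err}")
```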
Examples¶
from causalkit.data import generate_rct_data, CausalData
from causalkit.inference import ttest
# Generate data
df = generate_rct_data()
# Create causaldata object
ck = CausalData(
    df=df,
    target='target',
    treatment='treatment'
)
# Perform t-test
results = ttest(ck)
print(f"P-value: {results['p_value']:.4f}")
print(f"Absolute difference: {results['absolute_difference']:.4f}")
print(f"Absolute CI: {results['absolute_ci']}")
print(f"Relative difference: {results['relative_difference']:.2f}%")
print(f"Relative CI: {results['relative_ci']}")
Source code in causalkit/inference/ttest.py