Impact of 401(k) on Financial Assets#
Data explanation#
The data come from the 1991 Survey of Income and Program Participation (SIPP).
net_tfa — Net Total Financial Assets. Calculated as the sum of all liquid and interest-earning assets (IRA balances, 401(k) balances, checking accounts, U.S. savings bonds, other interest‐earning accounts, stocks, mutual funds, etc.) minus non‐mortgage debts.
e401 — 401(k) Eligibility Indicator. Equals 1 if the individual’s employer offers a 401(k) plan; otherwise 0.
p401 — 401(k) Participation Indicator. Equals 1 if the individual participates in a 401(k) plan; otherwise 0.
age — Age. Age of the individual in years.
inc — Annual Income. Annual income of the individual, measured in U.S. dollars for the year 1990.
educ — Years of Education. Number of completed years of formal education.
fsize — Family Size. Total number of persons living in the household.
marr — Marital Status. Equals 1 if the individual is married; otherwise 0.
twoearn — Two-Earner Household. Equals 1 if there are two wage earners in the household; otherwise 0.
db — Defined-Benefit Pension Plan. Equals 1 if the individual is covered by a defined-benefit pension plan; otherwise 0.
pira — IRA Participation. Equals 1 if the individual contributes to an Individual Retirement Account (IRA); otherwise 0.
hown — Home Ownership. Equals 1 if the household owns its home; 0 if renting.
We download it with the fetch_401K function from doubleml.datasets.
This dataset has a design limitation: the confounders were measured in the same year as the treatment and outcome, as if they were fixed demographic factors, rather than strictly before treatment. Inference may therefore still be biased.
from doubleml.datasets import fetch_401K
df = fetch_401K(return_type='DataFrame')
# drop 'tw' (total wealth) and 'nifa' (non-401(k) financial assets): wealth measures that overlap with the outcome
df = df.drop(columns=['tw', 'nifa'])
# drop e401 (401(k) eligibility): it is an instrument for participation, not a confounder
df = df.drop(columns=['e401'])
df.columns
Index(['net_tfa', 'age', 'inc', 'fsize', 'educ', 'db', 'marr', 'twoearn',
'p401', 'pira', 'hown'],
dtype='object')
from causalis.data import CausalData
causal_data = CausalData(df=df, treatment='p401',
outcome='net_tfa',
confounders=['age', 'inc', 'fsize', 'educ', 'db', 'marr', 'twoearn', 'pira', 'hown'])
EDA#
from causalis.eda import CausalEDA
eda = CausalEDA(causal_data)
# shape of data
eda.data_shape()
{'n_rows': 9915, 'n_columns': 11}
# 1) Outcome statistics by treatment
eda.outcome_stats()
| treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7321 | 10890.477539 | 55256.829173 | -502302.0 | -5427.0 | -1184.0 | 200.0 | 7399.0 | 33500.0 | 1462115.0 |
| 1 | 2594 | 38262.058594 | 79087.535303 | -283356.0 | -1300.0 | 3000.0 | 15249.0 | 45985.5 | 98887.4 | 1536798.0 |
Participants’ average net financial assets are about $27k higher, but this gap cannot be causally attributed to participation alone. The classes are also imbalanced: only about 26% of individuals are treated.
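The treatment share quoted above is easy to verify (a minimal check, assuming df from the loading cell):

# Share of 401(k) participants in the sample
print(df['p401'].mean())  # ≈ 0.26, i.e. about 26% treated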
eda.outcome_hist()
eda.outcome_boxplot()
The outcome has a heavy right tail.
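A quick sketch to make the tail claim concrete (assumes df from above):

# Skewness and upper quantiles of net_tfa
print(df['net_tfa'].skew())                        # strongly positive => heavy right tail
print(df['net_tfa'].quantile([0.50, 0.90, 0.99]))  # median vs. upper tail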
# Shows means of confounders for control/treated groups, absolute differences, and SMD values
confounders_balance_df = eda.confounders_means()
display(confounders_balance_df)
| confounder | mean_t_0 | mean_t_1 | abs_diff | smd |
|---|---|---|---|---|
| inc | 32889.802486 | 49366.975713 | 16477.173227 | 0.662197 |
| hown | 0.589127 | 0.765227 | 0.176100 | 0.383456 |
| pira | 0.200246 | 0.360447 | 0.160201 | 0.362421 |
| db | 0.228248 | 0.391673 | 0.163426 | 0.358968 |
| twoearn | 0.337659 | 0.502699 | 0.165040 | 0.339096 |
| educ | 12.991121 | 13.813416 | 0.822294 | 0.299061 |
| marr | 0.574512 | 0.690439 | 0.115928 | 0.242175 |
| age | 40.900970 | 41.509638 | 0.608668 | 0.060108 |
| fsize | 2.848108 | 2.915960 | 0.067852 | 0.044731 |
Treatment and control are imbalanced on all confounders except age and fsize, the only two with |SMD| below the common 0.1 rule of thumb; we nonetheless retain age and fsize in the model to gain efficiency.
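For reference, the smd column is presumably the standard pooled-standard-deviation version of the statistic; here is a minimal sketch of that formula (the exact definition inside causalis is an assumption):

import numpy as np

def smd(x, t):
    # Standardized mean difference with a pooled-SD denominator
    x1, x0 = x[t == 1], x[t == 0]
    return (x1.mean() - x0.mean()) / np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)

print(round(smd(df['inc'], df['p401']), 3))  # should be close to the 0.662 reported above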
# Propensity model fit
ps_model = eda.fit_propensity()
# ROC AUC - shows how predictable treatment is from confounders
roc_auc_score = ps_model.roc_auc
print("ROC AUC from PropensityModel:", round(roc_auc_score, 4))
ROC AUC from PropensityModel: 0.7145
# Positivity check - assess overlap between treatment groups
positivity_result = ps_model.positivity_check()
print("Positivity check from PropensityModel:", positivity_result)
Positivity check from PropensityModel: {'bounds': (0.05, 0.95), 'share_below': 0.08371154815935451, 'share_above': 0.0, 'flag': True}
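The flag means a non-trivial share of units (about 8.4%) have estimated propensity scores below the 0.05 bound, so overlap is thin in part of the sample. The numbers can be reproduced independently (a sketch; the learner inside causalis is unknown, so the gradient-boosting classifier here is an assumption):

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

# Out-of-fold propensity scores from the confounders
X = df[['age', 'inc', 'fsize', 'educ', 'db', 'marr', 'twoearn', 'pira', 'hown']]
m_hat = cross_val_predict(GradientBoostingClassifier(), X, df['p401'],
                          cv=5, method='predict_proba')[:, 1]
print('share below 0.05:', (m_hat < 0.05).mean())
print('share above 0.95:', (m_hat > 0.95).mean())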
# SHAP values - feature importance for treatment assignment from confounders
shap_values_df = ps_model.shap
display(shap_values_df)
| | feature | shap_mean | shap_mean_abs | exact_pp_change_abs | exact_pp_change_signed |
|---|---|---|---|---|---|
| 0 | num__inc | 0.006932 | 0.583319 | 0.126203 | 0.001334 |
| 1 | num__fsize | 0.005461 | 0.080593 | 0.015778 | 0.001050 |
| 2 | num__age | -0.004583 | 0.154312 | 0.030720 | -0.000879 |
| 3 | num__marr | -0.004524 | 0.030119 | 0.005827 | -0.000868 |
| 4 | num__educ | -0.003760 | 0.121098 | 0.023929 | -0.000722 |
| 5 | num__hown | 0.001024 | 0.187255 | 0.037551 | 0.000197 |
| 6 | num__db | -0.000816 | 0.272325 | 0.055610 | -0.000157 |
| 7 | num__twoearn | 0.000228 | 0.043159 | 0.008376 | 0.000044 |
| 8 | num__pira | 0.000038 | 0.129544 | 0.025647 | 0.000007 |
# Propensity score overlap graph
ps_model.plot_m_overlap()
# Outcome model fit
outcome_model = eda.outcome_fit()
# RMSE and MAE of regression model
print(outcome_model.scores)
{'rmse': 60189.944724760215, 'mae': 20331.55384475013}
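For context, it helps to compare the RMSE with the outcome's unconditional spread (a quick sketch): an RMSE near the raw standard deviation means the confounders explain only a modest share of the variance in net_tfa.

# Compare model error to the outcome's raw spread
print(outcome_model.scores['rmse'], df['net_tfa'].std())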
# 2) SHAP values - feature importance for outcome prediction from confounders
shap_outcome_df = outcome_model.shap
display(shap_outcome_df)
| | feature | shap_mean |
|---|---|---|
| 0 | inc | 731.774932 |
| 1 | pira | -683.388458 |
| 2 | age | 370.619026 |
| 3 | marr | -189.688960 |
| 4 | educ | -133.732826 |
| 5 | twoearn | -106.324198 |
| 6 | fsize | 68.288189 |
| 7 | db | -32.271609 |
| 8 | hown | -25.276096 |
Inference#
from causalis.inference.ate import dml_ate
# Estimate Average Treatment Effect (ATE)
ate_result = dml_ate(causal_data, n_folds=4, confidence_level=0.95)
print(ate_result.get('coefficient'))
print(ate_result.get('p_value'))
print(ate_result.get('confidence_interval'))
11250.6922050985
4.0745185003743245e-13
(8210.458854591107, 14290.925555605892)
The Average Treatment Effect is highly significant: about $11,251, with a 95% CI of roughly ($8,210, $14,291).
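As an external cross-check, the same ATE can be estimated with the doubleml package's interactive regression model (IRM). This is a sketch under assumptions: the random-forest learners are a guess, and causalis may use different defaults, so only rough agreement should be expected.

from doubleml import DoubleMLData, DoubleMLIRM
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Same outcome, treatment, and confounders as in causal_data
dml_data = DoubleMLData(df, y_col='net_tfa', d_cols='p401',
                        x_cols=['age', 'inc', 'fsize', 'educ', 'db',
                                'marr', 'twoearn', 'pira', 'hown'])
dml_irm = DoubleMLIRM(dml_data,
                      ml_g=RandomForestRegressor(n_estimators=200, min_samples_leaf=5),
                      ml_m=RandomForestClassifier(n_estimators=200, min_samples_leaf=5),
                      n_folds=4)
dml_irm.fit()
print(dml_irm.summary)  # coefficient and 95% CI should land in the same ballpark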
Refutation#
Overlap validation#
from causalis.refutation import *
rep = run_overlap_diagnostics(res=ate_result)
rep["summary"]
| | metric | value | flag |
|---|---|---|---|
| 0 | edge_0.01_below | 0.000000 | GREEN |
| 1 | edge_0.01_above | 0.000000 | GREEN |
| 2 | edge_0.02_below | 0.026828 | GREEN |
| 3 | edge_0.02_above | 0.000000 | GREEN |
| 4 | KS | 0.307200 | YELLOW |
| 5 | AUC | 0.706812 | GREEN |
| 6 | ESS_treated_ratio | 0.365399 | GREEN |
| 7 | ESS_control_ratio | 0.893936 | GREEN |
| 8 | tails_w1_q99/med | 33.171484 | YELLOW |
| 9 | tails_w0_q99/med | 5.250310 | GREEN |
| 10 | ATT_identity_relerr | 0.069338 | YELLOW |
| 11 | clip_m_total | 0.009783 | GREEN |
| 12 | calib_ECE | 0.026384 | GREEN |
| 13 | calib_slope | 0.765845 | YELLOW |
| 14 | calib_intercept | -0.197527 | GREEN |
Apart from a few YELLOW flags (the KS distance, the treated-weight tail ratio, and the calibration slope), no metric is RED, so we find no strong evidence of a violation of the overlap (positivity) assumption.
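When scanning such reports, a small usage sketch that lists only the non-GREEN rows can save time:

summ = rep["summary"]
# Show only the diagnostics that are not GREEN
print(summ[summ["flag"] != "GREEN"])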
Score validation#
from causalis.refutation.score.score_validation import run_score_diagnostics
rep_score = run_score_diagnostics(res=ate_result)
rep_score["summary"]
| | metric | value | flag |
|---|---|---|---|
| 0 | se_plugin | 1551.167968 | NA |
| 1 | psi_p99_over_med | 36.192430 | RED |
| 2 | psi_kurtosis | 263.310341 | RED |
| 3 | max_\|t\|_g1 | 3.951050 | YELLOW |
| 4 | max_\|t\|_g0 | 2.472043 | YELLOW |
| 5 | max_\|t\|_m | 1.875749 | GREEN |
| 6 | oos_tstat_fold | 0.000028 | GREEN |
| 7 | oos_tstat_strict | 0.000028 | GREEN |
psi_p99_over_med and psi_kurtosis are RED. This is driven by the heavy right tail of the outcome: a few extreme net_tfa values produce extreme influence-function (psi) values. Beyond that, we find no evidence of anomalous score behavior.
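One way to confirm that the heavy outcome tail, rather than a deeper problem, drives these flags is to winsorize net_tfa and re-estimate. This is a rough sketch, not part of the original analysis; the 1%/99% cutoffs are an arbitrary choice.

from causalis.data import CausalData
from causalis.inference.ate import dml_ate

# Winsorize the outcome at the 1st/99th percentiles and re-run the ATE
df_w = df.copy()
low, high = df_w['net_tfa'].quantile([0.01, 0.99])
df_w['net_tfa'] = df_w['net_tfa'].clip(low, high)
cd_w = CausalData(df=df_w, treatment='p401', outcome='net_tfa',
                  confounders=['age', 'inc', 'fsize', 'educ', 'db',
                               'marr', 'twoearn', 'pira', 'hown'])
print(dml_ate(cd_w, n_folds=4, confidence_level=0.95).get('coefficient'))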
SUTVA#
print_sutva_questions()
1.) Are your clients independent (i)?
2.) Do you measure confounders, treatment, and outcome in the same intervals?
3.) Do you measure confounders before treatment and outcome after?
4.) Do you have a consistent label of treatment, such as if a person does not receive a treatment, he has a label 0?
1.) Yes
2.) Yes
3.) No. This is the design problem noted above: confounders were measured in the same year as treatment and outcome.
4.) Yes
In conclusion, the confounders were not measured before treatment, so treatment may itself have affected the confounders.
Unconfoundedness#
from causalis.refutation.unconfoundedness.uncofoundedness_validation import run_unconfoundedness_diagnostics
rep_uc = run_unconfoundedness_diagnostics(res=ate_result)
rep_uc['summary']
| | metric | value | flag |
|---|---|---|---|
| 0 | balance_max_smd | 7.606060e-02 | GREEN |
| 1 | balance_frac_violations | 0.000000e+00 | GREEN |
| 2 | ess_treated_ratio | 3.653986e-01 | RED |
| 3 | ess_control_ratio | 8.939358e-01 | GREEN |
| 4 | w_tail_ratio_treated | 1.218068e+13 | RED |
| 5 | w_tail_ratio_control | 2.482226e+00 | GREEN |
| 6 | top1_mass_share_treated | 2.137807e-01 | GREEN |
| 7 | top1_mass_share_control | 3.937273e-02 | GREEN |
| 8 | ks_m_treated_vs_control | 3.071998e-01 | YELLOW |
| 9 | pct_m_outside_overlap | 9.783157e-01 | GREEN |
w_tail_ratio_treated and ess_treated_ratio are RED. This is acceptable here: a few treated units with propensity scores near the boundary receive extreme inverse-propensity weights, which inflates the tail ratio and deflates the treated effective sample size; it echoes the overlap findings above rather than signaling a new unconfoundedness problem.
from causalis.refutation.unconfoundedness.uncofoundedness_validation import (
sensitivity_analysis, sensitivity_benchmark
)
sensitivity_analysis(ate_result, cf_y=0.01, cf_d=0.01, rho=1.0, level=0.95, use_signed_rr=True)
{'theta': 11250.6922050985,
'se': 1551.1679676200004,
'level': 0.95,
'z': 1.959963984540054,
'sampling_ci': (8210.458854591107, 14290.925555605892),
'theta_bounds_confounding': (10298.960958424657, 12202.423451772342),
'bias_aware_ci': (7258.727607917263, 15242.656802279736),
'max_bias': 951.7312466738426,
'sigma2': 3589568447.751854,
'nu2': 0.37645320639967106,
'params': {'cf_y': 0.01, 'cf_d': 0.01, 'rho': 1.0, 'use_signed_rr': True}}
Even allowing for an unobserved confounder of this strength (cf_y = cf_d = 0.01), the bias-aware CI stays above zero.
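A natural follow-up is to sweep the assumed confounding strength and see where the bias-aware CI first crosses zero (a sketch reusing the exact call above):

# Increase cf_y = cf_d until the bias-aware CI includes zero
for cf in [0.01, 0.02, 0.05, 0.10]:
    out = sensitivity_analysis(ate_result, cf_y=cf, cf_d=cf, rho=1.0,
                               level=0.95, use_signed_rr=True)
    print(cf, out['bias_aware_ci'])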
sensitivity_benchmark(ate_result, benchmarking_set=['inc'])
/Users/ioannmartynov/PycharmProjects/Ckit/causalis/data/causaldata.py:108: UserWarning: Found 112 duplicate rows out of 9915 total rows in the DataFrame. This leaves 9803 unique rows for analysis. Duplicate rows may affect the quality of causal inference results. Consider removing duplicates if they are not intentional.
self._check_duplicate_rows(self.df)
| | cf_y | cf_d | rho | theta_long | theta_short | delta |
|---|---|---|---|---|---|---|
| p401 | 0.000041 | 0.000004 | -1.0 | 11250.692205 | 14355.071728 | -3104.379523 |
And even an unobserved confounder as strong as inc (income) would leave the estimate’s bounds above zero.
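The same benchmarking can be repeated for the other strong observed confounders to bracket how much confounding strength is plausible (a sketch reusing the call above):

# Benchmark the remaining binary wealth-related confounders
for b in ['pira', 'db', 'hown']:
    display(sensitivity_benchmark(ate_result, benchmarking_set=[b]))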
Conclusion#
There is a problem with the design: the confounders were not measured before treatment, so treatment may have affected them. Nevertheless, the estimate is robust across the refutation checks, and in practice 401(k) participation appears to increase net financial assets. Keep in mind that because of the design issue, the true CI bounds may differ from our estimates.