causalkit.refutation.sensitivity_analysis_set#

causalkit.refutation.sensitivity_analysis_set(effect_estimation, benchmarking_set, level=0.95, null_hypothesis=0.0, **kwargs)[source]#

Benchmark one or more observed confounders (or groups of confounders) to assess the robustness of the estimated effect, mirroring DoubleML’s sensitivity_benchmark for DoubleMLIRM.

Parameters:
  • effect_estimation (Dict[str, Any]) – A dictionary containing the effect estimation results with a fitted DoubleML model under the ‘model’ key.

  • benchmarking_set (Union[str, List[str], List[List[str]]]) – One or more names of observed confounders to benchmark (e.g., [“inc”], [“pira”], [“twoearn”]). Accepts:
      – a single string (benchmarks that one confounder),
      – a list of strings (interpreted as multiple single-variable benchmarks, each run separately), or
      – a list of lists/tuples of strings specifying explicit benchmarking groups (each inner list is benchmarked together in a single run).
    Each accepted form is illustrated in the usage sketch after this parameter list.

  • level (float, default 0.95) – Confidence level used by the benchmarking procedure.

  • null_hypothesis (float, default 0.0) – The null hypothesis value for the target parameter.

  • **kwargs (Any) – Additional keyword arguments passed through to the underlying DoubleML sensitivity_benchmark method.
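
A minimal usage sketch of the accepted benchmarking_set forms. It fits a DoubleMLIRM model directly with the doubleml package on its 401(k) example data and builds the effect_estimation dict by hand with the fitted model under the ‘model’ key; the import path, learner choices, and column selection are illustrative assumptions, and in practice effect_estimation would typically come from causalkit’s own estimation step.

import doubleml as dml
from doubleml.datasets import fetch_401K
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

from causalkit.refutation import sensitivity_analysis_set  # import path assumed from the qualified name above

# Fit a DoubleMLIRM model on DoubleML's 401(k) example data, whose covariates
# include the confounders used in the examples above ("inc", "pira", "twoearn").
df = fetch_401K(return_type="DataFrame")
dml_data = dml.DoubleMLData(
    df,
    y_col="net_tfa",
    d_cols="e401",
    x_cols=["age", "inc", "educ", "fsize", "marr", "twoearn", "db", "pira", "hown"],
)
model = dml.DoubleMLIRM(
    dml_data,
    ml_g=RandomForestRegressor(n_estimators=100),
    ml_m=RandomForestClassifier(n_estimators=100),
)
model.fit()
model.sensitivity_analysis()  # baseline sensitivity analysis, as in the usual DoubleML workflow

# Assumption: a minimal effect_estimation dict with the fitted model under the
# "model" key, as described in the Parameters section above.
effect_estimation = {"model": model}

# 1) Single string: benchmarks that one confounder; returns a single result object.
res_single = sensitivity_analysis_set(effect_estimation, benchmarking_set="inc")

# 2) List of strings: each confounder is benchmarked separately; returns a dict
#    keyed by confounder name.
res_each = sensitivity_analysis_set(effect_estimation, benchmarking_set=["inc", "pira"])

# 3) List of lists: each inner list is one benchmarking group run together.
res_groups = sensitivity_analysis_set(
    effect_estimation,
    benchmarking_set=[["inc", "twoearn"], ["pira"]],
    level=0.95,
)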

Returns:

  • If a single confounder/group is provided, returns the object from a single call to model.sensitivity_benchmark(benchmarking_set=[…]).

  • If multiple confounders/groups are provided, returns a dict mapping each confounder (str) or group (tuple[str, …]) to its corresponding result object.

Return type:

Any
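
Continuing the sketch above (reusing res_each and res_groups), the returned mappings can be consumed as follows; single-variable results are keyed by the confounder name and grouped results by a tuple of the group’s names, as described under Returns.

# List-of-strings input yields a dict keyed by confounder name (str).
for confounder, benchmark in res_each.items():
    print(confounder)
    print(benchmark)

# Grouped input yields a dict keyed by tuples of the group's variable names.
print(res_groups[("inc", "twoearn")])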

Raises:
  • TypeError – If inputs have invalid types or if the model does not support sensitivity benchmarking.

  • ValueError – If required inputs are missing or invalid (e.g., empty benchmarking_set, invalid level).

  • RuntimeError – If the underlying sensitivity_benchmark call fails.