Mechanism-Bottleneck Causal Networks (MB-CNet)
MB-CNet is ICg-CaST’s primary methodological contribution. It is a two-stage model whose risk predictions are forced to flow through a hidden layer pinned to the qAOP latent state vector. The bottleneck pin is what converts the package’s coherence story from a post-hoc evaluation (predict, then ask whether the explanation matches the biology) into a by-construction constraint (any risk prediction must factor through bottleneck units, so do-operations on those units are well-defined interventions on the model itself, not on the input features).
Source: src/icg_cast/bottleneck.py.
Architecture
stage 1: g_phi : omics_features -> hat{qAOP_state} (multi-output regressor)
stage 2: h_theta: hat{qAOP_state} -> risk probability (calibrated classifier)
The default bottleneck units are the nine state_final_* qAOP states
(DEFAULT_BOTTLENECK_UNITS in bottleneck.py):
state_final_DNA_adducts
state_final_ROS
state_final_inflammation
state_final_epigenetic_age
state_final_proliferation
state_final_mutation_rate
state_final_clone_fraction
state_final_driver_count_proxy
state_final_immune_clearance
Stage 1 is a MultiOutputRegressor (default: RandomForestRegressor). Pass a
custom sklearn-compatible stage1_estimator to swap in a stronger latent-state
recoverer while preserving the bottleneck contract:
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
model = MechanismBottleneckClassifier(
stage1_estimator=MultiOutputRegressor(HistGradientBoostingRegressor()),
)
For the default stage-1 pipeline, NaN-valued omics features are imputed with
SimpleImputer(strategy="mean", add_indicator=True). The added indicators make
modality dropout observable to the stage-1 regressor instead of silently
collapsing masked values onto the feature mean. After fitting, call
model.missingness_report() to audit per-feature missing fractions by modality.
Stage 2 has three available kinds:
calibrated_logistic(v0.1 default) — unconstrained logistic regression with isotonic calibration.sign_constrained—SignConstrainedLogisticRegression; coefficients per bottleneck unit are bounded to the half-line implied by coefficient-cardeffect_directionmetadata.STRUCTURAL_SIGNSremains as a derived compatibility export, not the source of truth.sign_constrained_augmented— the same sign constraints, but stage 2 is additionally trained on intervention-implied synthetic samples drawn throughaugment_with_interventionsusing the simulator’sstarter_kit_latent_riskstructural equation. This closes the loophole where the constraint can drive a coefficient to zero without ever responding to the intervention.
The module uses scikit-learn only — no torch, jax, or
pytorch-geometric. The differentiable, end-to-end, neural-ODE / UDE version
is explicitly deferred (PLAN.md §7.4).
Do-operations on bottleneck units
from icg_cast.bottleneck import MechanismBottleneckClassifier
model = MechanismBottleneckClassifier(stage2_kind="sign_constrained_augmented")
model.fit(X_train.join(S_train), y_train)
base = model.predict_proba(X_test)[:, 1]
model.intervene(unit="state_final_DNA_adducts", scale=0.5) # do(DNA_adducts := 0.5 x)
after = model.predict_proba(X_test)[:, 1]
model.clear_interventions()
Because the intervention is applied to the bottleneck row rather than to the input features, the resulting prediction is the model’s analogue of a structural do-operation on the qAOP state. Conformity to the simulator’s DGP-implied direction is what ICg-Bench’s task_intervention_conformity measures.
Survival outcomes
src/icg_cast/survival.py supplies the time-to-threshold variant of the outcome:
time_to_event(trajectory, column="latent_risk", threshold=0.5)— returns(time_index, event_observed), right-censored at the trajectory horizon.add_survival_columns(cohort, trajectories, threshold, horizon)— appendstime_to_high_risk_thresholdandevent_observedto a cohort.restricted_mean_survival(times, events, horizon)— RMST via the step-function Kaplan-Meier integral. Nolifelinesdependency.counterfactual_rmst_difference(model, cohort, intervention, horizon)— bootstrap-CI estimate ofRMST(after) - RMST(before)under a callable intervention on the cohort.
The binary future_cancer_transition_event column from
simulate_cohort is preserved unchanged; the survival columns are purely
additive.
Acceptance criteria
bottleneck_recovery.csvreports per-state recovery R² — see outputs/bottleneck_v0_5/per_state_recovery_r2.csv.Mean recovery R² ≥ 0.60 across the 10 qAOP states on the default cohort.
Intervention-conformity score ≥ 0.85 across the seven
do_*interventions defined incli.py(_INTERVENTIONS).AUROC within 0.03 of the best unconstrained multi-omics baseline.
bottleneck.pydoes not import torch, jax, or any heavy ML dependency.Survival outcomes reproduce the binary event under threshold equivalence; RMST is finite for every archetype on the default cohort.
When a result falls below an acceptance threshold it is reported as
failed_directionality_test (or the corresponding flag), not as a software
failure — see PLAN.md §7.3.
CLI usage
# train one cohort/variant/seed
icg-cast bench run --cohort linear_lowhet --variant sign_constrained_augmented --seed 7
# audit which structural signs actually bind (relax one at a time)
icg-cast bench audit --cohort linear_lowhet --variant sign_constrained --seed 7
# full sweep + figures (delegated to scripts/bottleneck_proof_of_concept.py)
icg-cast bench sweep
icg-cast bench plots
See also docs/benchmark.md for the ICg-Bench scoring tasks that MB-CNet is the reference participant for.