Theory

ICg-CaST implements a synthetic causal scaffold for AI-integrated carcinogenomics. The core hypothesis is that chemical exposure archetypes perturb Key Characteristics of Carcinogens (KCCs), these perturbations move quantitative adverse-outcome-pathway (qAOP) state variables, the states generate multi-omic readouts, and those readouts are associated with a future synthetic cancer-transition endpoint.

This is a theory-development simulator, not a clinical, regulatory, or chemical safety classifier.

Modeled Layers

The package currently represents five linked layers:

Exposure archetype and dose.
Ten KCC coordinates with archetype-specific priors and stochastic variation.
Host susceptibility factors for repair, antioxidant capacity, immune surveillance, detox balance, and baseline proliferation.
Monthly qAOP-like state trajectories for DNA adducts, ROS, inflammation, epigenetic age, proliferation, mutation rate, clone fraction, driver-count proxy, immune clearance, and latent risk.
Multi-omic readouts including transcript modules, epigenomic modules, mutational signature activities, 96-channel synthetic mutation contexts, and mutation burden.

The event label is generated from the latent state trajectory. Modeling helpers therefore explicitly exclude latent-risk summaries and future endpoint columns from feature sets.

Causal Interpretation

The simulator encodes a proposed direction of mechanism:

exposure -> KCC perturbation -> qAOP states -> omic readouts -> synthetic future transition

The baseline models learn associations from generated features to the synthetic endpoint. Their performance is useful for checking whether the simulated signal is recoverable, but it is not evidence that a real exposure causes cancer.

Counterfactual Checks

Counterfactual tests perturb feature groups that correspond to mechanism-level interventions:

DNA repair rescue.
ROS and inflammation blockade.
Epigenetic memory reset.
Proliferation suppression.
Immune surveillance restore.

These tests are model stress tests. They ask whether a trained model responds in the expected direction when mechanism-linked features are shifted. They do not estimate real treatment effects.