Validation

ICg-CaST separates predictive performance from mechanistic coherence. Synthetic AUROC is useful software evidence, but it is not biological validation.

The helpers in src/icg_cast/validation/ group the three families of checks below:

from icg_cast.validation import (
    biological_coherence_score,
    calibration_curve,
    expected_calibration_error,
    human_relevance_transfer_index,
    pathway_attribution_consistency,
)

Predictive Metrics

Baseline training and evaluation report:

ROC AUC.
Average precision.
Brier score.
Event rate.
Mean predicted risk.
Calibration bins and expected calibration error.

validation.calibration adds two leaner entry points:

expected_calibration_error(y, proba, n_bins=10) returns the ECE scalar.
calibration_curve(y, proba, n_bins=10) returns (mean_predicted, observed_fraction, counts) arrays for reliability plots.

Mechanistic Checks

The package includes counterfactual directionality tests for mechanism-linked feature perturbations. A model can score well predictively while failing a directionality test. Such failures are reported as biological-coherence diagnostics, not as software errors.

The biological-coherence score is:

correct_direction_count / tested_intervention_count

validation.biological_coherence provides:

biological_coherence_score(counterfactual_table) returns the scalar directly.
pathway_attribution_consistency(importance, pathway_map) aggregates per-feature permutation importance into per-pathway shares, so feature weight can be inspected at the modality / pathway level.

For by-construction (rather than post-hoc) coherence, see docs/bottleneck.md and the task_intervention_conformity task in docs/benchmark.md.

Cross-Species Human Relevance

validation.cross_species.human_relevance_transfer_index implements the HRTI estimate from PLAN.md §9.4:

HRTI = conserved_human_KE_activation
      / (conserved_human_KE_activation + rodent_specific_KE_activation)

It takes an explicit table with key_event, conservation, human_activation, and rodent_activation columns and returns an HRTIResult with the score, contributing counts, and per-key-event reason strings. It does not wrap a classifier and does not look up KE conservation databases automatically — the caller supplies the conservation labels. This is intentional: HRTI is a transparent ratio, not a regulatory conclusion.

Simulator Sanity Checks

Internal consistency checks should focus on synthetic-world expectations:

Inert controls should usually have lower risk than active archetypes.
Genotoxic archetypes should elevate DNA-damage and mutational features.
ROS archetypes should elevate oxidative and inflammatory features.
Receptor-mediated archetypes should elevate proliferative modules.
Immunosuppressive archetypes should reduce clearance-related features.

These are checks on the simulator assumptions, not real-world claims.

External Validation

Real-data validation is future work and requires local files, provenance, appropriate permissions, and domain review. Human genomic data can be identifying. Controlled-access datasets require the proper approvals before any analysis.