Coefficient registry

Every numeric coefficient that drives the qAOP dynamics, the latent_risk equation, the chemical archetype tables, and the host susceptibility distributions is declared in materials/coefficient_cards.yaml and loaded through src/icg_cast/coefficients/.

Inline numeric literals in those code sites are forbidden. A pytest spot-check in tests/test_coefficient_registry.py catches the most obvious slips.
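A spot-check of this kind can be approximated with the standard-library ast module. The sketch below is not the contents of tests/test_coefficient_registry.py; the function name and the ALLOWED set are assumptions made for illustration.

```python
import ast

# Assumption: small structural integers are tolerated; anything else is
# treated as a candidate coefficient that belongs in the registry.
ALLOWED = {0, 1, 2}


def find_numeric_literals(source: str) -> list:
    """Return sorted (line, value) pairs for numeric literals outside ALLOWED."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # type(...) rather than isinstance so that True/False are not counted.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            if node.value not in ALLOWED:
                hits.append((node.lineno, node.value))
    return sorted(hits)


sample = "decay = 0.68\nstate = state * decay + 1\n"
print(find_numeric_literals(sample))  # [(1, 0.68)]
```

A test could read each covered source file, pass its text to a function like this, and assert that the result is empty. Like the real spot-check, a scan of this kind catches only the obvious slips; a literal hidden behind a string parse or a computed constant would pass.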

Card schema

- name: dynamics.dna_adducts.decay
  default_value: 0.68
  units: "month^-1"
  evidence_level: E4
  effect_direction: 1
  prior_distribution: auto
  prior_params: {}
  source: "starter kit (PLAN.md sections 6 and 7)"
  notes: "first-order monthly persistence of the DNA-adduct burden between exposures"
  last_reviewed: "2026-05-13"

Fields:

| Field | Required | Meaning |
| --- | --- | --- |
| name | yes | Dotted namespace; must be unique. |
| default_value | yes | Scalar (float/int), vector (list of numbers), or string label. |
| effect_direction | no | Net direction of effect on downstream risk: 1 harmful/increases risk, -1 protective/decreases risk, 0 neutral/unknown. Used by MB-CNet sign constraints. |
| units | no | Free-text units description. |
| evidence_level | no | One of E1..E5. Default E5 (no source). |
| prior_distribution | no | One of auto, fixed, normal, lognormal, signed_lognormal, logit_normal, or dirichlet. Default auto. |
| prior_params | no | Optional sampler overrides such as sigma, low, high, or concentration. |
| source | no | DOI, dataset name, or "starter kit". |
| notes | no | Free-text explanation. |
| last_reviewed | no | ISO date string. |

Top-level defaults apply to every card unless the card overrides them.
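The layering of defaults and the per-field constraints can be sketched in plain Python. This is an illustration under stated assumptions, not the loader in src/icg_cast/coefficients/: the function names and the file_defaults argument are hypothetical, and the YAML key that holds the top-level defaults is not shown in this document.

```python
from typing import Optional

EVIDENCE_LEVELS = {"E1", "E2", "E3", "E4", "E5"}
PRIORS = {"auto", "fixed", "normal", "lognormal", "signed_lognormal",
          "logit_normal", "dirichlet"}
# Fallbacks stated in the field descriptions above.
BUILTIN_DEFAULTS = {"evidence_level": "E5", "prior_distribution": "auto"}


def resolve_card(card: dict, file_defaults: Optional[dict] = None) -> dict:
    """Layer built-in defaults, file-level defaults, then the card itself."""
    for field in ("name", "default_value"):
        if field not in card:
            raise ValueError(f"card is missing required field {field!r}")
    merged = {**BUILTIN_DEFAULTS, **(file_defaults or {}), **card}
    if merged["evidence_level"] not in EVIDENCE_LEVELS:
        raise ValueError(f"{merged['name']}: unknown evidence_level")
    if merged["prior_distribution"] not in PRIORS:
        raise ValueError(f"{merged['name']}: unknown prior_distribution")
    # effect_direction is optional; when present it must be -1, 0, or 1.
    if merged.get("effect_direction", 0) not in (-1, 0, 1):
        raise ValueError(f"{merged['name']}: effect_direction must be -1, 0, or 1")
    return merged


def resolve_cards(cards: list, file_defaults: Optional[dict] = None) -> list:
    """Resolve every card and reject duplicate names."""
    resolved = [resolve_card(c, file_defaults) for c in cards]
    names = [c["name"] for c in resolved]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate card names: {duplicates}")
    return resolved


card = {"name": "dynamics.dna_adducts.decay", "default_value": 0.68,
        "evidence_level": "E4"}
print(resolve_card(card, {"source": "starter kit"}))
```

In this sketch a value set on the card always wins, a file-level default wins over the built-in fallback, and a card that omits evidence_level ends up at E5.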

Evidence levels

| Level | Meaning |
| --- | --- |
| E1 | Published quantitative literature value. |
| E2 | Published qualitative direction or magnitude. |
| E3 | AOP-Wiki / AOP-DB / KER weight-of-evidence. |
| E4 | Expert estimate, plausible biological order of magnitude. |
| E5 | No source (“hand-tuned to produce interesting cohorts”). |

A coefficient is flagged “load-bearing” when the sensitivity audit (icg-cast coeffs sensitivity) shows an effect of more than 20% on any downstream metric. Load-bearing coefficients must be backed by evidence level E2 or better (that is, E1 or E2).
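The load-bearing rule amounts to a threshold check followed by an evidence-level check. The sketch below assumes the sensitivity audit can be summarized as a relative change per coefficient and metric; that input shape, the function name, the metric names, and the numbers in the example are all made up for illustration and are not output of icg-cast coeffs sensitivity.

```python
LOAD_BEARING_THRESHOLD = 0.20  # the ">20% effect" rule
ACCEPTABLE_LEVELS = {"E1", "E2"}  # "at least E2"


def load_bearing_violations(effects: dict, evidence: dict) -> list:
    """Return load-bearing coefficient names whose evidence is weaker than E2.

    effects:  name -> {metric name: relative change, e.g. 0.25 for 25%}
    evidence: name -> evidence level; a missing entry is treated as E5
    """
    violations = []
    for name, per_metric in effects.items():
        worst = max((abs(v) for v in per_metric.values()), default=0.0)
        if worst > LOAD_BEARING_THRESHOLD and evidence.get(name, "E5") not in ACCEPTABLE_LEVELS:
            violations.append(name)
    return sorted(violations)


# Illustrative numbers only.
effects = {
    "dynamics.dna_adducts.decay": {"metric_a": 0.31, "metric_b": 0.04},
    "cohort.example": {"metric_a": 0.02},
}
evidence = {"dynamics.dna_adducts.decay": "E4", "cohort.example": "E5"}
print(load_bearing_violations(effects, evidence))  # ['dynamics.dna_adducts.decay']
```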

Coefficient uncertainty

The point value in default_value is the center of the card’s prior. In auto mode, strings, seeds, hard minima, hard maxima, and minimum counts stay fixed; bounded scalar/vector values use logit-normal priors; positive scalars use log-normal priors; signed scalars use signed log-normal priors; and probability vectors use Dirichlet priors. Evidence level controls the default spread: E1 is tightest and E5 is widest.
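The auto rules can be sketched with the standard-library random module. Everything numeric here is an assumption: the SIGMA table, the mapping from spread to Dirichlet concentration, and the (0, 1) bounds for logit-normal values are placeholders, and only the ordering from E1 (tightest) to E5 (widest) comes from the text. The sketch treats default_value as the median of the log-normal draws and as the mean of the Dirichlet draws, and it does not model the values that stay fixed because they are seeds, hard limits, or minimum counts.

```python
import math
import random

# Hypothetical log-scale spreads; only the E1 < ... < E5 ordering is sourced.
SIGMA = {"E1": 0.05, "E2": 0.10, "E3": 0.20, "E4": 0.35, "E5": 0.50}


def auto_prior(value, bounded=False):
    """Choose a prior family for a default_value under the auto rules."""
    if isinstance(value, str):
        return "fixed"
    if isinstance(value, (list, tuple)):
        simplex = min(value) >= 0 and math.isclose(sum(value), 1.0)
        # Assumption: vectors that are not probability vectors lie in (0, 1).
        return "dirichlet" if simplex else "logit_normal"
    if bounded:
        return "logit_normal"
    if value == 0:
        return "fixed"  # assumption: nothing to scale around
    return "lognormal" if value > 0 else "signed_lognormal"


def _logit_draw(v, sigma, rng):
    """Perturb v in (0, 1) on the logit scale so the result stays in (0, 1)."""
    z = math.log(v / (1.0 - v)) + rng.gauss(0.0, sigma)
    return 1.0 / (1.0 + math.exp(-z))


def draw(value, level, rng, bounded=False):
    """Draw one realization centered on value, with spread set by level."""
    kind, sigma = auto_prior(value, bounded), SIGMA[level]
    if kind == "fixed":
        return value
    if kind == "dirichlet":
        # Assumed mapping: a smaller sigma gives a larger concentration.
        gammas = [rng.gammavariate(max(v / sigma**2, 1e-9), 1.0) for v in value]
        total = sum(gammas)
        return [g / total for g in gammas]
    if kind == "logit_normal":
        if isinstance(value, (list, tuple)):
            return [_logit_draw(v, sigma, rng) for v in value]
        return _logit_draw(value, sigma, rng)
    # lognormal and signed_lognormal: scale the magnitude, keep the sign.
    return math.copysign(abs(value) * math.exp(rng.gauss(0.0, sigma)), value)


rng = random.Random(42)
print(draw(0.68, "E4", rng))             # positive scalar
print(draw(-0.3, "E5", rng))             # signed scalar, stays negative
print(draw([0.5, 0.3, 0.2], "E2", rng))  # probability vector, sums to 1
```

Because every draw goes through one seeded generator, repeating the calls with the same seed reproduces the same realization, which is the property the prior-sample mode relies on.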

Use point mode for exact reproducibility against previous demos:

from icg_cast import SimConfig, simulate_cohort

cohort, _ = simulate_cohort(SimConfig(coefficient_mode="point"))

Use prior-sample mode to draw one seedable coefficient realization for the whole cohort:

from icg_cast import SimConfig, simulate_cohort

cohort, _ = simulate_cohort(
    SimConfig(coefficient_mode="prior_sample", coefficient_seed=42)
)

The resulting cohort records coefficient_seed; in point mode, where no draw takes place, it is written as -1.

Python API

from icg_cast.coefficients import registry

r = registry()
decay = r.get("dynamics.dna_adducts.decay")          # 0.68
kcc   = r.get_vector("archetypes.pah_tobacco_like.kcc")
sig   = r.get_str("archetypes.pah_tobacco_like.signature")

# Audit: find coefficients with no source
unsourced = r.filter(evidence_level="E5")

# Loading a custom YAML (useful for tests or alternate priors)
from icg_cast.coefficients import load_registry
r2 = load_registry("path/to/custom.yaml")

# Seedable prior draw
from icg_cast.coefficients import sampled_registry
r3 = sampled_registry(seed=42)

The default registry is cached at module level; the same object is returned on every registry() call. Set ICG_CAST_COEFFICIENTS_PATH to override the file location for one process.
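Module-level caching with an environment override can be sketched as follows. This is not the code in src/icg_cast/coefficients/: the Registry class is a hypothetical stand-in that only remembers its path, and using functools.lru_cache is one possible way to get the "same object on every call" behavior.

```python
import os
from functools import lru_cache

DEFAULT_PATH = "materials/coefficient_cards.yaml"
ENV_VAR = "ICG_CAST_COEFFICIENTS_PATH"


class Registry:
    """Hypothetical stand-in; a real registry would parse the YAML at path."""

    def __init__(self, path: str):
        self.path = path


@lru_cache(maxsize=1)
def registry() -> Registry:
    """Build the default registry once and return the cached object afterwards."""
    return Registry(os.environ.get(ENV_VAR, DEFAULT_PATH))


first = registry()
second = registry()
print(first is second)  # True
```

In this sketch the environment variable is read on the first call only, so it has to be set before anything asks for the registry. Whether the real package re-reads the variable later is not stated here.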

CLI

# How many cards at each evidence level?
icg-cast coeffs audit

# List every unsourced (E5) coefficient
icg-cast coeffs list --evidence E5

# Inspect a namespace
icg-cast coeffs list --prefix dynamics.latent_risk

# Machine-readable
icg-cast coeffs list --prefix archetypes --json

# Draw one coefficient-prior realization for a simulated cohort
icg-cast simulate --coefficient-mode prior_sample --coefficient-seed 42

# Run the full demo under coefficient uncertainty
icg-cast make-demo --coefficient-mode prior_sample --coefficient-seed 42

coeffs audit is the smallest useful command to run on any branch that touches the registry. It surfaces the registry’s evidence-level distribution and is intended to be wired into review prompts.
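The evidence-level distribution that the audit reports is a simple tally. The sketch below shows the idea on a list of card dictionaries; the function name and the printed layout are assumptions, not the actual output format of icg-cast coeffs audit, and the second card is hypothetical.

```python
from collections import Counter

LEVELS = ("E1", "E2", "E3", "E4", "E5")


def audit(cards: list) -> dict:
    """Count cards per evidence level, treating a missing level as E5."""
    counts = Counter(card.get("evidence_level", "E5") for card in cards)
    return {level: counts.get(level, 0) for level in LEVELS}


cards = [
    {"name": "dynamics.dna_adducts.decay", "evidence_level": "E4"},
    {"name": "example.unsourced"},  # hypothetical card with no evidence_level
]
for level, n in audit(cards).items():
    print(f"{level}  {n}")
```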

Coverage status

| Site | Status |
| --- | --- |
| simulator.py (qAOP dynamics, susceptibility, cohort sampling) | covered |
| constants.py (ARCHETYPE_KCC, ARCHETYPE_SIGNATURE) | covered |
| omics.py (transcriptomic / epigenomic module weights, signature mixing, total-mutation Poisson rate) | covered |
| signatures.py (toy SBS recipe parameters: background gamma, per-context boosts, lower clip) | covered |
| coefficient priors (prior_distribution, prior_params, seedable draws) | covered |

Current registry breakdown (run icg-cast coeffs audit to refresh):

| Namespace | Cards | Notes |
| --- | --- | --- |
| dynamics.* | 74 | qAOP recurrence + latent_risk |
| susceptibility.* | 20 | host distribution parameters |
| archetypes.* | 18 | 8 KCC vectors + 8 signature labels + prior + noise |
| cohort.* | 1 | high-risk quantile |
| omics.transcript.* | 51 | 18 modules × ~3 inputs + measurement noise |
| omics.epi.* | 24 | 8 modules × ~3 inputs + measurement noise |
| omics.signature_mix.* | 10 | aging baseline + primary blend + oxidative blend |
| omics.mut_total.* | 6 | Poisson rate intercept + 4 slope terms + min clip |
| signatures.* | 14 | background gamma + per-signature boost recipe |
| total | 218 | |

Of these, 9 are E4 (archetype KCC vectors and the mutation-rate scale, where there is at least a literature order of magnitude) and the remaining 209 are E5. Calibration adapters can upgrade the COSMIC-tied signature, ToxCast-tied KCC, AOP-Wiki-tied coupling, and LINCS-tied transcriptomic module coefficients out of E5.

How to edit a coefficient

  1. Edit materials/coefficient_cards.yaml. Update default_value, evidence_level, prior_distribution, prior_params, source, notes, and last_reviewed together.

  2. Run pytest tests/test_coefficient_registry.py -q to verify the schema still parses.

  3. Run the full suite (pytest -q) to confirm determinism-sensitive tests still pass. A coefficient change is expected to change cohort numerics, so deterministic tests may need their reference values updated in the same PR.

  4. Run icg-cast coeffs audit and paste the table into the PR description.
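Step 1 asks for the card fields to be updated together, and the easiest one to forget is last_reviewed. A reviewer-side check along the following lines could flag that case. The helper is hypothetical and not part of icg-cast, and the new default_value in the example is invented.

```python
def stale_review_date(old: dict, new: dict) -> bool:
    """True if any card field changed but last_reviewed was left untouched."""
    fields = (set(old) | set(new)) - {"last_reviewed"}
    changed = any(old.get(f) != new.get(f) for f in fields)
    return changed and old.get("last_reviewed") == new.get("last_reviewed")


old = {"name": "dynamics.dna_adducts.decay", "default_value": 0.68,
       "last_reviewed": "2026-05-13"}
new = {**old, "default_value": 0.70}  # illustrative edit, date not bumped
print(stale_review_date(old, new))  # True
```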