# Coefficient registry Every numeric coefficient that drives the qAOP dynamics, the `latent_risk` equation, the chemical archetype tables, and the host susceptibility distributions is declared in [materials/coefficient_cards.yaml](../materials/coefficient_cards.yaml) and loaded through [src/icg_cast/coefficients/](../src/icg_cast/coefficients/). Inline numeric literals in those code sites are forbidden. A pytest spot-check in [tests/test_coefficient_registry.py](../tests/test_coefficient_registry.py) catches the most obvious slips. ## Card schema ```yaml - name: dynamics.dna_adducts.decay default_value: 0.68 units: "month^-1" evidence_level: E4 effect_direction: 1 prior_distribution: auto prior_params: {} source: "starter kit (PLAN.md sections 6 and 7)" notes: "first-order monthly persistence of the DNA-adduct burden between exposures" last_reviewed: "2026-05-13" ``` Fields: | Field | Required | Meaning | |------------------|:--------:|---------| | `name` | yes | Dotted namespace; must be unique. | | `default_value` | yes | Scalar (float/int), vector (list of numbers), or string label. | | `effect_direction` | no | Net direction of effect on downstream risk: `1` harmful/increases risk, `-1` protective/decreases risk, `0` neutral/unknown. Used by MB-CNet sign constraints. | | `units` | no | Free-text units description. | | `evidence_level` | no | One of `E1`..`E5`. Default `E5` (no source). | | `prior_distribution` | no | One of `auto`, `fixed`, `normal`, `lognormal`, `signed_lognormal`, `logit_normal`, or `dirichlet`. Default `auto`. | | `prior_params` | no | Optional sampler overrides such as `sigma`, `low`, `high`, or `concentration`. | | `source` | no | DOI, dataset name, or `"starter kit"`. | | `notes` | no | Free-text explanation. | | `last_reviewed` | no | ISO date string. | Top-level `defaults` apply to every card unless the card overrides them. ## Evidence levels | Level | Meaning | |-------|---------| | `E1` | Published quantitative literature value. | | `E2` | Published qualitative direction or magnitude. | | `E3` | AOP-Wiki / AOP-DB / KER weight-of-evidence. | | `E4` | Expert estimate, plausible biological order of magnitude. | | `E5` | No source ("hand-tuned to produce interesting cohorts"). | A coefficient is flagged "load-bearing" when the sensitivity audit (`icg-cast coeffs sensitivity`) shows >20% effect on any downstream metric; load-bearing coefficients must reach at least `E2`. ## Coefficient uncertainty The point value in `default_value` is the center of the card's prior. In `auto` mode, strings, seeds, hard minima, hard maxima, and minimum counts stay fixed; bounded scalar/vector values use logit-normal priors; positive scalars use log-normal priors; signed scalars use signed log-normal priors; and probability vectors use Dirichlet priors. Evidence level controls the default spread: `E1` is tightest and `E5` is widest. Use point mode for exact reproducibility against previous demos: ```python from icg_cast import SimConfig, simulate_cohort cohort, _ = simulate_cohort(SimConfig(coefficient_mode="point")) ``` Use prior-sample mode to draw one seedable coefficient realization for the whole cohort: ```python cohort, _ = simulate_cohort( SimConfig(coefficient_mode="prior_sample", coefficient_seed=42) ) ``` The resulting cohort includes `coefficient_seed`; point mode writes `-1`. ## Python API ```python from icg_cast.coefficients import registry r = registry() decay = r.get("dynamics.dna_adducts.decay") # 0.68 kcc = r.get_vector("archetypes.pah_tobacco_like.kcc") sig = r.get_str("archetypes.pah_tobacco_like.signature") # Audit: find coefficients with no source unsourced = r.filter(evidence_level="E5") # Loading a custom YAML (useful for tests or alternate priors) from icg_cast.coefficients import load_registry r2 = load_registry("path/to/custom.yaml") # Seedable prior draw from icg_cast.coefficients import sampled_registry r3 = sampled_registry(seed=42) ``` The default registry is cached at module level; the same object is returned on every `registry()` call. Set `ICG_CAST_COEFFICIENTS_PATH` to override the file location for one process. ## CLI ```bash # How many cards at each evidence level? icg-cast coeffs audit # List every unsourced (E5) coefficient icg-cast coeffs list --evidence E5 # Inspect a namespace icg-cast coeffs list --prefix dynamics.latent_risk # Machine-readable icg-cast coeffs list --prefix archetypes --json # Draw one coefficient-prior realization for a simulated cohort icg-cast simulate --coefficient-mode prior_sample --coefficient-seed 42 # Run the full demo under coefficient uncertainty icg-cast make-demo --coefficient-mode prior_sample --coefficient-seed 42 ``` `coeffs audit` is the smallest useful command to run on any branch that touches the registry. It surfaces the registry's evidence-level distribution and is intended to be wired into review prompts. ## Coverage status | Site | Status | |------|--------| | `simulator.py` (qAOP dynamics, susceptibility, cohort sampling) | covered | | `constants.py` (`ARCHETYPE_KCC`, `ARCHETYPE_SIGNATURE`) | covered | | `omics.py` (transcriptomic / epigenomic module weights, signature mixing, total-mutation Poisson rate) | covered | | `signatures.py` (toy SBS recipe parameters: background gamma, per-context boosts, lower clip) | covered | | coefficient priors (`prior_distribution`, `prior_params`, seedable draws) | covered | Current registry breakdown (run `icg-cast coeffs audit` to refresh): | Namespace | Cards | Notes | |---|---:|---| | `dynamics.*` | 74 | qAOP recurrence + `latent_risk` | | `susceptibility.*` | 20 | host distribution parameters | | `archetypes.*` | 18 | 8 KCC vectors + 8 signature labels + prior + noise | | `cohort.*` | 1 | high-risk quantile | | `omics.transcript.*` | 51 | 18 modules × ~3 inputs + measurement noise | | `omics.epi.*` | 24 | 8 modules × ~3 inputs + measurement noise | | `omics.signature_mix.*` | 10 | aging baseline + primary blend + oxidative blend | | `omics.mut_total.*` | 6 | Poisson rate intercept + 4 slope terms + min clip | | `signatures.*` | 14 | background gamma + per-signature boost recipe | | **total** | **218** | | Of these, **9 are `E4`** (archetype KCC vectors and the mutation-rate scale, where there is at least a literature order of magnitude) and the remaining 209 are `E5`. Calibration adapters can upgrade the COSMIC-tied signature, ToxCast-tied KCC, AOP-Wiki-tied coupling, and LINCS-tied transcriptomic module coefficients out of `E5`. ## How to edit a coefficient 1. Edit [materials/coefficient_cards.yaml](../materials/coefficient_cards.yaml). Update `default_value`, `evidence_level`, `prior_distribution`, `prior_params`, `source`, `notes`, and `last_reviewed` together. 2. Run `pytest tests/test_coefficient_registry.py -q` to verify the schema still parses. 3. Run the full suite (`pytest -q`) to confirm determinism-sensitive tests still pass — a coefficient change is expected to change cohort numerics, so deterministic tests may need their reference values updated in the same PR. 4. Run `icg-cast coeffs audit` and paste the table into the PR description.