Skip to content

abgnydn/webgpu-dna

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

216 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WebGPU Geant4-DNA

CI License: MIT Live demo Geant4-DNA validated Tests DOI

A WebGPU port of Geant4-DNA — the CNRS/IN2P3-coordinated Monte Carlo track-structure toolkit for radiobiology — running entirely in the browser.

One GPU thread per primary electron, full particle history in a single fused compute dispatch, Karamitros 2011 Independent-Reaction-Time chemistry in a Web Worker, and SSB/DSB scoring on a 21×21 B-DNA fiber grid at 10 keV.

WebGPU Geant4-DNA — in-browser Monte Carlo track-structure

Validation numbers live in § Numbers at the bottom of this file. That's the single source of truth.

Quick start

npm install
npm run dev            # http://localhost:8765
npm run test           # 46 tests, ~200 ms
npm run lint
npm run build          # dist/

Requires a WebGPU-capable browser. Shipped on-by-default in Chrome / Edge 113+ desktop, Chrome 121+ Android (Android 12+ on Qualcomm / ARM GPUs), Safari 26+ (macOS Tahoe, iOS / iPadOS / visionOS 26, Sep 2025), Firefox 141+ on Windows, and Firefox 145+ on macOS 26 Tahoe (Apple Silicon only). Firefox Linux, Firefox Android, and older Firefox still need dom.webgpu.enabled in about:config. Full matrix: caniuse.com/webgpu.

Each experiment in §Numbers can be re-run on a contributor's machine via npm run experiments -- <id> (e.g. E5, E10, B1, E15).

What's implemented

  • Physics: Born ionization (5 shells, data-driven CDF sampling), Born excitation (5 water levels, dissociative branching 0.65 / 0.55 / 0.80; matches G4EmDNAPhysics_option2), Champion tabulated elastic (total XS + angular CDF, 7.4 eV – 10 MeV — matches G4EmDNAPhysics_option2, which uses Champion across the whole range), Sanche 9-mode vibrational (2–100 eV), full primary-momentum conservation.
  • Chemistry: Karamitros 2011 9-reaction IRT in a Web Worker (Smoluchowski TDC + Onsager-screened PDC for charged pairs, G4EmDNAChemistry_option1). 2.0 nm mother displacement, species-specific product displacement, e⁻aq thermalization at 1.7 eV, H₂O₂ / OH⁻ tracked as reactive products with full re-pairing.
  • DNA scoring: Event-level direct SSB from rad_buf ionization sites, indirect SSB scored during the IRT timeline (every OH-death event + 1 μs survivors), greedy ±10 bp DSB clustering, kernel-level backbone hit counter as a cross-check.
  • Grid target: 21×21 parallel B-DNA fibers × 3 μm × 150 nm spacing = 3.89 Mbp.
  • Full electron cascade (v0.6.0): the secondary shader tracks the tertiary (gen3+) electron cascade, which resolved the cascade-ion deficit (0.766→0.931×) and closed the chem6 1 µs chemistry gap (RMS 19.7→7.6%) [E20–E25]. v0.7.0 made the excitation parameter-free: it now uses the real Born excitation cross section (matching G4EmDNAPhysics_option2, which both Geant4 oracles register — no physics-list seam), replacing the empirical SIGMA_EXC_SCALE fudge. This closed the chronic sub-keV CSDA deficit (100 eV 0.78→0.96×) [E29]. With RECOMB_BOOST removed in v0.5.0, the pipeline now has no tuning scalars in the track-structure physics or the chemistry. The only calibrated knobs — two SSB-scoring probabilities — live in the DNA-damage layer; the full audited inventory is in TUNABLES.md. See PHYSICS_DIAGNOSIS.md.

Project layout

src/
├── shaders/       WGSL compute shaders (helpers, primary, secondary, chemistry)
├── physics/       Constants, types, DNA geometry, cross-section loader
├── gpu/           Device init, buffers, pipelines, Phase A/B/C dispatch
├── chemistry/     IRT worker wiring, GPU chemistry schedule, reactions
├── scoring/       SSB/DSB scoring, ESTAR reference, dose projections
├── ui/            Results table, canvas dose projections
├── app.ts         runValidation orchestrator
└── main.ts        entry point

tests/unit/        Vitest unit tests (46 across 7 files)
tests/fixtures/    Geant4-DNA reference numbers (JSON)
public/            Generated cross_sections.wgsl, irt-worker.js, monolithic reference HTML
tools/             Python + Node helpers (G4EMLOW converter, IRT driver)
validation/        Geant4-DNA comparison harness (compare.py, analyze_g4.py)

Deep-dive: ARCHITECTURE.md. Standing physics diagnoses: PHYSICS_DIAGNOSIS.md. Research protocol: RESEARCH.md. Engineering standards (the 15-principle canonical discipline shared with the sibling WebGPU/WGSL research projects): RESEARCH_STANDARDS.md. Forward roadmap with multi-agent wall-clock estimates: ROADMAP.md. Recipe for adding a new physics model: EXTENDING.md. Complete tunables inventory (every non-physical scalar, classed source / methodology / calibrated): TUNABLES.md. Design docs for two earlier structural-fix hypotheses, both now superseded: H2OP_TRACKING_DESIGN.md (H₂O⁺ tracking, refuted via Geant4 source archaeology) and CROSS_PRIMARY_IRT_DESIGN.md (cross-primary IRT — built, but E17 found it a coupled tradeoff, not the chem6-gap fix; v0.6.0's full electron cascade closed that gap browser-native instead). How the GPU-free half of validation runs on free infra (GitHub Actions for the IRT chemistry, Oracle Always Free for Geant4): FREE_COMPUTE.md.

Deployment

Production (webgpudna.com) is Cloudflare Pages, deployed manually:

npm run build                                                   # → dist/
wrangler pages deploy dist --project-name=webgpudna --branch=main

The production Pages project is webgpudna (no hyphen) — it owns webgpudna.com. Do not use the webgpu-dna (hyphenated) project; that one is stale and only serves webgpu-dna.pages.dev.

Regenerating cross sections

The committed public/cross_sections.wgsl (1.3 MB) is generated from the G4EMLOW reference data (245 MB, not committed). To rebuild:

# Download G4EMLOW from https://geant4-data.web.cern.ch/datasets/
# (current: G4EMLOW.8.8.tar.gz, shipped with Geant4 11.4.1). Extract so that
# data/g4emlow/dna/ exists, then:
npm run convert

License

MIT for the simulation code. The Geant4-DNA cross-section data is distributed under the Geant4 Software License (BSD-like).


Numbers

This section is the single source of truth for every quantitative claim about the project. Anywhere else (CLAUDE.md, index.html, blog posts, slides) is allowed to summarize but not to introduce new numbers — if a number isn't here, it's not measured.

Every row is backed by a committed JSON artifact under experiments/results/. The [Eᵢ] tag in the right column links to the latest run. Re-run any with npm run experiments -- <id>.

All Geant4-side numbers were produced by a freshly-built Geant4 11.4.1 / G4EMLOW 8.8 install (~/Downloads/geant4-v11.4.1-install/) running dnaphysics on validation/run_validation.mac, single-thread, on the same Apple M2 Pro that ran WebGPU. Production-realistic Geant4 MT-8 comparison ships separately as E15c.

Reference snapshot for the WebGPU side: N = 4096 primaries at 10 keV unless otherwise stated, DNA_Opt2 physics list, 30 μm cube, v0.7.0 full cascade + real Born excitation, no tuning scalars in the track-structure physics (SIGMA_EXC_SCALE removed, RECOMB_BOOST = 1.0), SSB_R_DAMAGE_NM = 0.29, SSB_R_DAMAGE_INDIRECT_NM = 1.0, SSB_P_INDIRECT = 0.05.

The pipeline is parameter-free in RECOMB_BOOST, and v0.6.0 tracks the full electron cascade. RECOMB_BOOST was 2.0 (a tuning scalar with no Geant4 physical basis — the H₂O⁺ refutation); E10r showed it was not load-bearing and the RECOMB→1.0 flip (v0.5.0) removed it. Then v0.6.0 tracks the full tertiary (gen3+) electron cascade — previously the secondary shader absorbed tertiary electrons in place — which resolves the cascade-ion deficit (ions 0.766→0.931×, [E25]) and closes the long-standing chem6 1 µs chemistry gap (5-species RMS 19.7→7.6%; H₂/H₂O₂ deficits closed). v0.6.1 then lowered SIGMA_EXC_SCALE 0.5→0.39 (≈Born) — the full cascade unlocked it, nudging every axis better still (cascade 0.937×, RMS 6.8%, SSB 2.72, E28). The primary track matches Geant4 to 0.1% (195.4 vs 195.6 ionisations/primary, E20) — a statistical match of the per-event means, not literally bit-identical (fp32 WGSL vs fp64 Geant4). README §Numbers, the paper, and the shipped demo report v0.7.0. v0.7.0 then removed the last scalar: the excitation now uses the real Born cross section (matching option2, the physics list both Geant4 oracles register), so the track-structure physics is parameter-free (the DNA-damage scoring layer still has two calibrated probabilities — SSB_P_DIRECT + SSB_P_INDIRECT; full inventory in TUNABLES.md) — see GEANT4_DIVERGENCES.md.

Reproducibility caveat: fp32 atomicAdd reductions on the dose grid and rad_buf counters are not order-deterministic across GPU vendors — same WGSL on different hardware (Apple Metal vs Nvidia Vulkan vs Intel iGPU) yields statistically equivalent results within MC noise, NOT bit-exact. The same machine + same seed + same shader hash IS bit-exact across re-runs. Every artifact emits env.shaderHashes.{helpers,primary,secondary,chemistry}_wgsl (added 2026-05-12) so you can group rows by shader version when the joint-fix scales or other shader-side tunables shift the baseline.

Citing this work: see CITATION.cff. The current release is v0.7.0 — real Born excitation, parameter-free track structure (GitHub Release). Cite the Zenodo concept DOI 10.5281/zenodo.20506339, which always resolves to the latest version; per-version DOIs (e.g. the v0.6.0 version DOI 10.5281/zenodo.20606566) are listed on the Zenodo record.

Where we deliberately differ from Geant4-DNA DNA_Opt2 (Emfietzoglou excitation, the σ_exc/recomb tuning knobs, per-primary IRT, fp32 atomics, fiber-grid geometry) — with the rationale and measured cost of each — is catalogued in GEANT4_DIVERGENCES.md. Every cost figure there links back to its row in this section.

Reproducibility tiers

The Repro column rates how far a third party can reproduce each row's comparison (not merely re-run the WebGPU side). This separates three things that otherwise blend under one authoritative format — "measured and reproducible", "measured once on the author's machine", and "calibrated to the target":

  • T1 — repo-reproducible. Runs from committed files plus the freely downloadable G4EMLOW dataset. The reference is committed (validation/*.csv, tests/fixtures/); no GPU and no unpublished data needed. (Cross-section levels; the CSDA/MFP rows whose Geant4 reference CSV is committed.)
  • T2 — Release-asset reproducible. Needs the WebGPU rad_buf dumps, published as GitHub Release assets (validation-inputs-v1), plus committed/hardcoded reference numbers. GPU-free — the IRT chemistry runs on CPU, which is why the CI workflow can run it. (Most chemistry and DNA-damage rows.)
  • T3 — author-local. Needs something not in the repo or the releases: the 6.76 GB Geant4 dna.root ntuple (per-trackID splits), a local Geant4 build's wall-clock, or a live run on specific GPU/browser hardware. (Performance rows; the multi-energy GPU sweeps; ntuple-split rows.) Note: for the ntuple rows the reproducer is committedvalidation/run_validation.mac generates the ntuple and validation/analyze_g4.py extracts the per-trackID split — so they are reproducible by re-running Geant4; only the 6.76 GB bulk file is unpublished (the derived numbers themselves live in each row's artifact JSON).

Every artifact was produced on one machine (Ahmets-MBP.localdomain, Apple M2 Pro); T1/T2 are what someone else could reproduce today. ⚠ A separate axis the tier does not capture: some rows are calibrated fits, not predictions (notably the L5 SSB ratio — P_indirect is tuned to PARTRAC's band). Those are flagged in-row.

GPU coverage in CI: the gating ci.yml (typecheck + lint + vitest) is GPU-free, so it never executed the WGSL physics — the long-standing gap behind these tiers. A webgpu-smoke job addresses it by running the real shipped shaders on a software WebGPU adapter (Mesa lavapipe / SwiftShader in headless Chromium): it compiles the production primary/secondary/chemistry bundles (assembled exactly as src/shaders/loader.ts) and runs a compute dispatch, so once active a WGSL regression vitest cannot see fails CI. ⚠ The workflow is currently staged in ci/webgpu-smoke.yml pending activation — the authoring token lacks the GitHub workflow scope, so it must be moved into .github/workflows/ by a maintainer (see ci/README.md). It is a smoke test, not a physics re-validation (a software adapter at small N is too noisy for the marquee ratios), and ships non-blocking (continue-on-error) until its first green run on a real runner. See experiments/tools/webgpu-smoke.mjs.

Level 0 — Environment / infrastructure (2 of 2 pass)

ID Status Repro Result Artifact
B0 T3 Browser env capture: apple/metal-3 adapter, headless Chromium, maxBuffer 4 GB B0
B1 T3 Harness liveness: Vite + Playwright + WebGPU, first row at E=100 eV in 2.9 s B1

Level 1 — Cross sections vs G4EMLOW 8.8 (9 of 9 pass)

ID Status Repro Result Artifact
E1 T1 Born σ_ion total: 58 rows, peak ratio 0.9987, median 8.46e-4 E1
E1b T1 Per-shell Born σ_ion (5 shells: 1b₁, 3a₁, 1b₂, 2a₁, 1a₁), all peak ratios 0.997-1.000 E1b
E1c T1 Shell-fraction closure Σ XSF_i = 1.0 within 5e-3 across 96/96 active energy bins E1c
E2 T1 Emfietzoglou σ_exc total: 74 rows, peak 0.9970, median 2.42e-4 E2
E2b T1 Per-level σ_exc (5 levels: A¹B₁, B¹A₁, Ryd A+B, Ryd C+D, Diffuse), all 0.997-1.000 E2b
E3 T1 Champion σ_el: 58 rows, peak 0.9751, max 3.26e-3 (retroactive 334× scale-factor catcher) E3
E3b T1 Champion angular CDF (XAC inverted lookup), 25/25 energies within |Δcos(θ)| < 0.10 (~6° accuracy) E3b
E4 T1 Sanche σ_vib total: 38 rows, peak 1.0000, max 6e-16 (bit-exact) E4
E4b T1 Sanche per-mode XVMF: 342 (energy, mode) pairs, max sum-dev 4e-8 E4b

Level 2 — Track structure (3 pass / 2 honest-negative / 1 partial)

ID Status Repro Result Artifact
E5 T1 CSDA @ 10 keV: 2714.4 vs 2747.5 nm Geant4 → 0.988× (3.59σ), energy conservation 100.0% E5
E5b ✗ honest negative (pre joint-fix baseline) T3 CSDA across all 8 ESTAR energies, PRE joint-fix — ratio grows monotonically: 0.587× @ 100 eV → 0.992× @ 20 keV (0.705 / 0.776 / 0.864 / 0.965 / 0.975 / 0.988 / 0.992× at 300/500/1000/3000/5000/10000/20000 eV). The 0.988× @ 10 keV in E5 is the tail of a much larger sub-keV deficit driven by σ_exc inflation. Joint fix closure measured in E5d. E5b
E5c ✗ honest negative T2 W-value vs ICRU 31 (NEW 2026-05-12) — Pre joint-fix: W_cascade = 26.89 eV vs ICRU 31's 21.4 eV → 1.257× (+25.7%). Post joint-fix (corrected H3O+ + H2-marker): 29.02 eV → 1.356× (+35.6%). Joint fix slightly increases W because RECOMB_BOOST reduces cascade-ion count — see E7b for the structural tradeoff. E5c
E5d ✓ pass — marquee closure T3 POST joint-fix CSDA at all 8 ESTAR energies (NEW 2026-05-12)8 of 8 energies improved monotonically: 100 eV 0.588× → 0.736× (+14.8 pp); 300 eV 0.705× → 0.810×; 500 eV 0.776× → 0.857×; 1 keV 0.864× → 0.912×; 3 keV → 0.983×; 5 keV → 0.984×; 10 keV → 0.994×; 20 keV → 0.996×. The lift is inversely proportional to the original deficit size — the cleanest possible signature of a correct physics fix. E5d
E6c ✓ pass T2 Effective σ-per-process under joint fix — σ_exc effective ratio 2.55× → 1.27× Geant4 (inside [1.0, 1.5] target band), driven by SIGMA_EXC_SCALE = 0.5. σ_ion +6.1% and σ_el +5.7% data tables unchanged. The 8/8 CSDA lift in E5d is the integrated empirical signature of this σ_exc shift. E6c
E7b ✗ honest negative (superseded by E7d) T2 Cascade ions @ RECOMB=2.0 — H3O+-corrected 344.6 / 0.677× (the joint fix's RECOMB_BOOST=2.0 was reducing cascade ions by destroying autoionisation ions). This surfaced the mechanism; E7d (RECOMB→1.0) recovered it to 389.9 / 0.766× (the v0.5.0 value — superseded by v0.6.0's full cascade, 474.0 / 0.931×, E25). E7b
E7c ✗ honest negative (asymmetric variant refuted) T3 Asymmetric RECOMB_BOOST attempt — applied RECOMB_BOOST=2.0 ONLY to sub-cutoff and autoionization branches (not tracked-secondary). Rationale: tracked-sec eaq thermalizes 5-10 nm from H2O+ where time-integrated recomb adds little. Result: chemistry reverts close to baseline. Cascade ions: 381.1 (✓ recovered, vs pre-fix 371.9). RMS dev vs chem6: 27.9% (was 19.0% in v1 — chemistry benefit LOST). The tracked-secondary path is the dominant lever for BOTH cascade AND chemistry effects — they're not separable with this knob set. Production shaders kept at v1 (uniform boost) because the chemistry-vs-chem6 closure is the project's marquee thesis. E7c
E6 T1 MFP across 6 energy bins: ratios [0.893, 0.950], median 0.941 (-5.0% to -10.7%) E6
E6b T2 Per-process σ: σ_ion +6.1%, σ_el +5.7%, σ_exc 2.55× (intentional Emfietzoglou inflation) E6b
E7 ✗ honest negative (pre-joint-fix) T2 Cascade ions per primary reconstructed from rad_buf H3O+: WGSL 371.9 vs Geant4 509.2 → 0.730× (263σ, 27% deficit) — real physics gap, tied to σ_exc inflation channeling energy away from ionization. This is the pre-joint-fix value; production (post-fix) is 344.6 / 0.677×, see E7b. E7
E8 partial pass (7/8) T3 Secondary KE spectrum at creation: sec/primary WGSL 143.4 vs G4 144.9 (1.0% match). 7/8 log-bins in 6-800 eV agree within 0.1-3.1%; only 438-806 eV tail shows 43% deficit (~2.5σ) E8

Level 3 — Pre-chemistry (1 of 1 honest negative)

ID Status Repro Result Artifact
E9 ✗ honest negative T2 Pre-chem G(species) @ 0.1 ps vs Geant4 chem6 at matched 10 keV: OH 0.87× / eaq 0.90× / H 0.88× / H₂ 0.51× / H₂O₂ 0.58×. Localizes the E10c 1 μs deficit to pre-chemistry, NOT IRT reaction rates. See PHYSICS_DIAGNOSIS.md §1. E9

Level 4 — Chemistry (IRT)

ID Status Repro Result Artifact
E10 T2 IRT G-values vs Karamitros 2011 across 5 energies — surfaces G(e⁻aq) V-shape at 1→3 keV (1.163→1.026→1.147, 11.8% drop, real track-end / spur-structure physics). POST joint-fix (2026-05-13): 25/25 rows pass; V-shape preserved (OH ✓, eaq ✓ monotonic on either side). E10
E10b T2 V-shape robustness via primary-bootstrap (B=20 unique-pids resamples, m/n corrected SE): the 12.5% drop at 1→3 keV is far above the bootstrap SE of the mean (≈8e-4 over 4096 primaries; z≈126). ⚠ z is the statistical precision of the mean only — it excludes all systematic uncertainty (cross-section tables, IRT model, displacement σ, fp32 atomics), so it is not a 126σ physical-significance claim. The evidence the dip is real physics is that chem6 independently reproduces it (E10d), not the z. E10b
E10c ✗ honest negative T2 G(species) @ 1 μs vs Geant4 chem6 at matched 10 keV: OH 0.91× / eaq 0.83× / H 1.00× / H₂ 0.75× / H₂O₂ 0.71×. Closes "is the 0.62× vs Karamitros real LET physics or our chemistry bug?" — answer is both (~30% real LET + ~10-29% real implementation gap, biggest on H₂/H₂O₂) E10c
E10d partial pass (24/25) T2 chem6 matched-LET sweep across 5 V-shape energies (1/3/5/10/20 keV): 24 of 25 species×energy cells in 30% band. chem6 independently reproduces the V-shape (1.36 → 1.26 → 1.41 from 1 to 5 keV) — confirms it's real LET physics. POST joint-fix (2026-05-13): same 24/25, V-shape preserved (1.37 → 1.26 → 1.39) — joint fix doesn't break the LET-physics signal. E10d
E10e ✗ refuted T2 Cross-event recomb hypothesis: synthetic Node experiment over rad_E10000_N4096.bin shows nearest-eaq P_recomb = 0.230 vs geminate point-estimate 0.221 (ΔP = +0.009). Only +0.44 H₂/primary vs target deficit of 12.4 — 3.5% of the gap. Geminate eaq is the nearest one in ~98% of cases at 10 keV. E10e
E10f ✗ interpretation superseded (v0.6.0) T2 Per-primary IRT partitioning: at 1 μs ΔG(H₂) = +0.149 (looked like 96% of the gap, H₂-only). Superseded: E17 showed cross-primary pooling is a coupled H₂↑/OH↓ tradeoff (not the cause), and v0.6.0 showed the gap was the untracked tertiary cascade, closed browser-native (E25). E10f
E10g ✓ noisy / informational T2 Recomb-rate sensitivity sweep: linearly interpolating gives x ≈ 0.035 closes G(H₂)@0.1ps. Maps to ~25% additional effective recomb fraction (per Geant4's 13.65% H₂Ovib branching). E10g
E10h ✗ noisy T2 Recomb boost with proper H₂Ovib branching alone: best X=0.15 reduces RMS dev 30% → 22% but G(eaq) drops to 0.77× (WORSE than baseline 0.90×). Recomb boost is necessary but not sufficient — closing all 5 species needs a joint fix. E10h
E10i ✗ noisy (partial closure) T3 Joint fix end-to-end Playwright validation: (σ_exc_scale = 0.5, recomb_boost = 2.0) lifts RMS dev 30.3% → 19.0%, CSDA @ 100 eV 0.587× → 0.74×, G(H₂) 0.51× → 0.78×. G(H), G(H₂O₂) close; G(OH)/G(eaq) take 5-9% collateral damage. Two-knob structural limit. E10i
E10j ⚠ noisy (audit closure) T2 POST joint-fix G-values at 1 μs vs chem6 — closes the audit gap where the prior §Numbers row mixed pre-fix and post-fix numbers. Result: G(OH) 0.895× (was 0.907×), G(eaq) 0.815× (was 0.830×), G(H) 1.096× (was 0.992× — joint fix overshoots H slightly), G(H₂O₂) 0.693× (was 0.711×), G(H₂) 0.860× (was 0.752× — big improvement). Per-primary IRT partitioning still dominates the 1 μs gap. E10j
E11 ✗ honest negative T3 GPU chem backend vs IRT worker on the same rad bin: GPU matches within 5% at t ≤ 100 ps; diverges upward at 1 μs (G(OH) 2.33× IRT, G(eaq) 2.19×). GPU is 13.6× faster (14.2 s vs 194 s) but inaccurate at long times — quantifies why DEFAULT_CHEM_BACKEND = 'worker'. E11
E10r ✓ informative — RECOMB_BOOST is not load-bearing T2 Parameter-free chemistry (RECOMB_BOOST 2.0→1.0) @ 1 μs vs chem6: G(OH) 0.914×, G(eaq) 0.858×, G(H) 0.928× (the 1.096× overshoot disappears), G(H₂) 0.741×, G(H₂O₂) 0.693×. 5-species RMS @1μs 19.7% at this stage (pre-cascade). The knob mainly propped up H₂; removing it left the H₂/H₂O₂ deficits, which v0.6.0's full cascade later closed (RMS → 7.6%, E25). E10r

Level 5 — DNA damage (3 pass / 1 fail closed)

ID Status Repro Result Artifact
E12 ✓ (absolute yield explained by E12-local) T2 SSB/DSB vs experiment-calibrated cellular yields (~35 DSB, ~1000 SSB per cell·Gy, low-LET [Ward 1988]). The raw 223×/796× over-yield is a point-source dose-normalisation artifact, not a physics error: see E12-local. DSB/SSB = 0.083 (2.4–3.6× experiment's 0.023–0.035) is the one residual — that ratio is the tuned-P_indirect issue (E13c), unaffected by dose. E12
E12-local ✓ geometry defense vindicated (offline re-measurement, 2026-06-03) T2 The validation dumps use a point source (primary.wgsl start_half=0), so 98.1% of energy deposits in the central 3 µm fibre-core cube (measured from the rad_buf dump) → local dose ≈238 Gy, not the 0.243 Gy box average (concentration factor C≈981). Re-normalised by local dose, absolute yields land at SSB_dir 0.34×, DSB 0.82×, SSB_total 1.28× of experiment — within a factor of ~3, not 2–3 orders. Resolves E12's absolute-yield gap. Caveat: energy∝event-count proxy (98% of both events and ions in-core, so robust); a cleaner E12-bulk would spread tracks (start_half=box) so box-avg ≈ local and no C-correction is needed. E12-local
E13 ✗ initial fail T2 Indirect/direct SSB ratio: WGSL 0/24 = 0 vs PARTRAC 2-3. Diagnosis in PHYSICS_DIAGNOSIS.md §3 (3 causes, 3 fixes) E13
E13b T2 Parametric SSB_R_DAMAGE_NM sweep (Node-side replica of scoreIndirectSSB over existing rad_buf): r=0.29 → SSB_ind=8; r=1.0 → 174; r=2.0 → 394. Confirms 0.29 nm is the bottleneck E13b
E13c ⚠ calibrated fit (not a prediction; was mislabeled "marquee closure") T2 (but a calibrated fit) The indirect/direct ratio is reach × tuned probability, not a physics prediction: SSB_dir=26=⌊173×0.15⌋, SSB_ind=64≈1423×0.05. SSB_P_INDIRECT was tuned 0.4 → 0.05 specifically to land the ratio in PARTRAC's 2–3 band — so the 2.46 is circular, and PARTRAC is itself a simulation. What L5 does show: the clustering kernel discriminates strand-0/strand-1 coincidences PARTRAC-like. What it does not show: an independent prediction of the ratio or of absolute yields (see E12). Post joint-fix: SSB_dir=26, SSB_ind=64, DSB=9, ratio=2.46. E13c

Level 6 — Performance (3 pass / 2 honest-negative)

ID Status Repro Result Artifact
E15 ✗ honest negative T3 Phase A α/β decomposition via WebGPU timestamp-disciplined N-sweep: α = 10.5 ms (single-workgroup compute floor — original 10-500 μs hypothesis falsified), β = 1.207 μs/primary, R² = 0.908. Peak throughput 538,947 primaries/sec @ N=16384, 10 keV on apple/metal-3 E15
E15b ✓ (v0.5.0 truncated; see E15d) T3 Same-machine vs Geant4 11.4.1 single-thread (3 trials, M2 Pro): 455× physics tracking (Phase A+B 635 ms vs Geant4 median 289.1 s) — but this compared our v0.5.0 truncated cascade to Geant4's full cascade; v0.6.0's full cascade is ~241× (fair, [E15d]). WGSL is dispatch-only vs G4 whole-process; init is a measured ~2 s; the real asymmetry is G4's 6.8 GB ntuple I/O. End-to-end like-for-like is 1.48× (IRT chem on CPU dominates). E15b
E15c T3 Production-realistic: WGSL vs Geant4 MT-8 (3 trials, M2 Pro 8 threads). Geant4 MT-8 median 178.0 s → 280× speedup vs WGSL Phase A+B. Geant4's MT scaling is only 1.6× over ST (well below theoretical 8×) due to per-event scheduling + memory contention E15c
E15d T3 Phase A α/β + peak throughput across all 8 ESTAR energies: β scales monotonically 0.23 → 2.05 μs/primary from 100 eV to 20 keV; peak throughput 2.1M → 0.29M primaries/sec E15d
E16 ✗ honest negative T3 Kernel-fusion thesis closure: T_fused = 17.75 ms vs modeled T_naive = 414 × 1.70 = 704 ms → 40× speedup. L6 protocol's "≥100×" hypothesis falsified at the measured magnitude — the thesis is supported in spirit (40× is substantial, consistent with kernelfusion.dev's 71× Apple Silicon benchmark) but absolute factor is half the protocol claim E16

Headline summary @ 10 keV, N=4096

After all 2026-05-12 fixes (L5 indirect SSB closure, joint physics tuning):

Metric This build Reference Ratio
CSDA range (nm) @ 10 keV (v0.7.0 Born) 2739.6 2747.5 (Geant4 11.4.1) 0.997× [E5]
CSDA @ 100 eV (vs Geant4) — v0.7.0 Born excitation 25.1 nm 26.21 nm 0.956× (was 0.782× @ scaled-Emf; real Born XS closes the sub-keV deficit, E29) [E5d]
CSDA @ 300 eV — v0.7.0 Born 35.4 nm 35.91 nm 0.986× (was 0.852×, E29) [E5d]
CSDA @ 500 eV — v0.7.0 Born 47.8 nm 48.07 nm 0.994× (was 0.894×, E29) [E5d]
CSDA @ 1 keV — v0.7.0 Born 89.2 nm 90.32 nm 0.987× (was 0.933×, E29). 3/5/20 keV: 1.002/1.005/0.993× — all 8 energies now 0.956–1.005× [E5d]
Energy conservation 100.0 % 99.99 % 1.000× [E5]
Ions / primary (full cascade) — production (v0.7.0, Born excitation) 479.6 509.2 (Geant4) 0.942× [E29] — full tertiary cascade ([E25], recovered from 0.766×) under real Born excitation (v0.7.0; supersedes the v0.6.1 σ_exc→0.39 step). Primary-track ionisations match to 0.1% (195.4 vs Geant4 195.6, [E20]) — a statistical match of the per-event means, not bit-identical
G(OH) @ 1 μs vs chem6 — pre joint-fix 1.551 1.710 0.907× (4.8σ) [E10c]
G(OH) @ 1 μs vs chem6 — production (v0.7.0, Born) 1.594 1.710 0.932× [E25] (was 0.914× pre-cascade)
G(e⁻aq) @ 1 μs vs chem6 — pre joint-fix 1.406 1.694 0.830× (9.7σ) [E10c]
G(e⁻aq) @ 1 μs vs chem6 — production (v0.7.0, Born) 1.584 1.694 0.937× [E25] (was 0.858× pre-cascade)
G(H) @ 1 μs vs chem6 — pre joint-fix 0.708 0.710 0.997× ✓ [E10c]
G(H) @ 1 μs vs chem6 — production (v0.7.0, Born) 0.666 0.710 0.939× (overshoot gone) [E25] (slight overshoot; was 0.928× pre-cascade)
G(H₂) @ 1 μs vs chem6 — production (v0.7.0, Born) 0.604 0.622 0.970× [E25] — the long-standing H₂ deficit (0.741×) is closed by the tertiary cascade
G(H₂O₂) @ 1 μs vs chem6 — production (v0.7.0, Born) 0.760 0.850 0.894× (5-species RMS 7.0% — vs a single chem6 run; the reference is point-values with no stated MC uncertainty, so read 7.0% as agreement to one realization) [E25] — the H₂O₂ deficit (0.693×) largely closed
Implicit W-value (E_total / N_ions, full cascade) ~22.1 eV (v0.6.0) 21.4 eV (ICRU 31, low-LET liquid water) ~1.03× — the tertiary cascade recovers the missing ions, closing most of the old 1.257× gap (same physics as the cascade-ion recovery, [E25]) [E5c]
G(H₂) @ 0.1 ps (pre-chem, joint fix applied) 0.197 0.251 (chem6) 0.78× (was 0.51× pre-fix) [E10i]
G(H₂O₂) @ 0.1 ps (joint fix applied) 0.041 0.053 (chem6) 0.77× (was 0.58×) [E10i]
RMS deviation across 5 species @ 0.1 ps 19.0 % (vs chem6) down from 30.3 % baseline [E10i]
G(e⁻aq) V-shape drop 1→3 keV 12.5 % 0 (smooth-monotonic null) robust to primary resampling (bootstrap z≈126 = precision of the mean only, not systematic significance); independently reproduced by chem6 (E10d) [E10b]
SSB direct / indirect / DSB @ 10 keV (production, v0.6.0 full cascade) 32 / 81 / 17 indirect/direct ratio PARTRAC = 2-3 3.26 ratio (v0.7.0 Born physics; the calibrated P_indirect was tuned for the prior physics so the ratio drifted out of band — reported honestly, not re-tuned. This is the acknowledged 'calibrated fit' caveat in action [E29]). ⚠ Treat SSB/DSB as methodology, not validated absolute physics: the scoring layer has two calibrated probabilities (SSB_P_DIRECT=0.15 and SSB_P_INDIRECT=0.05, TUNABLES.md), so the ratio is a calibrated fit, not a prediction — but it is robust to the target geometry: a 4× fibre-spacing sweep (75→300 nm) holds the ratio at 2.24–2.53, all in-band, while absolute counts scale ~4× [E27]. Absolute yields per local dose: SSB 0.34–1.28× / DSB 0.82× exp (C=991 exact, [E12-local-exact]) [E25] [E12-local-exact]
Phase A wall-clock @ N=4096, 10 keV 14.4 ms n/a [E15]
Phase A peak throughput 538,947 primaries/sec @ N=16384 n/a [E15]
Phase A + B vs Geant4 11.4.1 single-thread (v0.6.0 full cascade) ~1.2 s 289.1 s (median/3) ~241× — now a fair both-full-cascade comparison (the old 455× compared our truncated cascade to Geant4's full one). Tracking the full cascade roughly doubled Phase A+B (635 ms→~1.2 s). This is the tracking-only figure (WGSL dispatch vs G4 whole-process incl. its 6.8 GB ntuple I/O); the honest end-to-end like-for-like is 1.48× — two rows down. [E15d]
Phase A + B vs Geant4 MT-8 (v0.6.0 full cascade) ~1.2 s 178.0 s (median/3) ~148× (was 280× truncated). MT-8 scales only 1.62× vs ST [E15d] [E15c]
Geant4 init + DNA table-build (E15-fair) 2.1 s (16-primary probe = 3.2 s wall) retracts the earlier "~160 s serial / ~200×" estimate — init is negligible, the 289 s is ~99% event-loop [E15-fair]
End-to-end pre-DNA pipeline vs Geant4 ST 194.6 s 289.1 s 1.48× — the honest like-for-like number (both whole-pipeline; IRT chem on CPU dominates) [E15b]
Kernel-fusion speedup (fused vs naive, Phase A only) 17.75 ms 704 ms (modeled) 40× — ⚠ applies to Phase A only (now ~1.2% of the ~1.2 s v0.6.0 cascade pipeline; was 2% of 635 ms); fusion's contribution to the full tracking pipeline is ~1.6–2× — Phase A is unchanged by the cascade, which is all Phase B [E16]
Unit tests 46 / 46 npm run test, ~200 ms

Substantive research findings

Each is a falsifiable claim only visible because of the protocol — not from reading the code:

  1. CSDA deficit was energy-dependent — 0.587× @ 100 eV → 0.992× @ 20 keV pre-fix; closed monotonically by the joint fix. Joint-fix shifts: 100 eV +14.8 pp / 300 eV +10.5 pp / 500 eV +8.1 pp / 1 keV +4.8 pp / high-E ~+0.5 pp. The lift is inversely proportional to the original deficit size — exactly what σ_exc-inflation theory predicts, confirming the diagnosis. [E5, E5b, E5d]
  2. G(e⁻aq) is non-monotonic between 1 and 3 keV — a 12.5% drop (1.163 → 1.026 → 1.147) that is far above the primary-resampling noise of the mean (bootstrap z≈126 — precision of the mean only, not a systematic-inclusive significance) and, more importantly, is independently reproduced by chem6 (the actual evidence it is real track-end / spur-structure physics). [E10, E10b, E10d]
  3. MFP is consistently 5-11% lower than Geant4 across all 6 energy bins (median 0.941). [E6]
  4. σ_ion is 6.1% high and σ_el is 5.7% high vs Geant4 11.4.1. Per E6b decomposition, the MFP shortfall is ~49% from σ_ion, ~31% from σ_el, ~20% from intentional σ_exc inflation. [E6b]
  5. The cascade-ion deficit is RESOLVED in v0.6.0 by tracking the full electron cascade — and it was a clean win on every axis. The primary track matches Geant4 to 0.1% (195.4 ionisations/primary vs Geant4 195.6, by trackID in the 6.8 GB ntuple — E20) — a statistical match of the per-trackID means, not bit-identical (fp32 vs fp64). The old 23% deficit was 80% the untracked tertiary (gen3+) cascade: our secondary shader absorbed tertiary electrons in place rather than tracking them (E21). Tracking them recovers cascade ions 0.766→0.931× and improves chemistry (RMS vs chem6 19.7→7.6%, closing the long-standing H₂/H₂O₂ gap) with SSB holding in-band — a clean win (E25). v0.6.1 then lowered σ_exc 0.5→0.39, which the full cascade unlocked: every axis improved again (cascade 0.937×, RMS 6.8%, E28). The investigation also caught a normalization bug in my own analysis (E22–E24 chased a phantom "over-recombination" that was a n_therm units error; corrected in E25) — verify-before-asserting rescuing a real result. [E20, E21, E25]
  6. WebGPU tracking is ~241× faster than Geant4 11.4.1 single-thread (v0.6.0, full cascade); the honest like-for-like figure is 1.48× end-to-end. Update (v0.6.0): tracking the full electron cascade roughly doubled Phase A+B (635 ms→~1.2 s, E15d), so the tracking speedup is now ~241×/~148× — lower than the v0.5.0 455×/280×, but a fair both-full-cascade comparison (the old figure compared our truncated cascade to Geant4's full one). The end-to-end 1.48× and the methodology notes below are unaffected. Two earlier corrections, the first of which I over-corrected once and then measured: (a) The init confound is NOT material — measured. E15-fair: a 16-primary init-probe runs in 3.2 s, so Geant4 process init + DNA physics-table construction is only ~2.1 s (0.7% of the 289 s) — the 289 s is ~99% genuine event-loop. (An earlier draft of this note claimed ~160 s of serial overhead / ~200× pure-tracking from a 2-point Amdahl read of the MT-8 1.62× scaling; that estimate was wrong and is retracted — init is negligible, so event-loop-only the speedup is 452×, statistically the same as 455×.) The one real residual asymmetry is per-event ntuple I/O: Geant4 writes 6.8 GB to dna.root for the full run — measured (16→256-primary probes give 1.65 MB/primary, ~0.1 MB fixed, near-perfectly linear), and the likely cause of the poor MT-8 scaling via row-wise merge — while the WGSL 635 ms excludes its ~87 MB dump write. (A 256-primary run also lands at 19.7 s vs the model's 20.0 s, independently confirming the ~2 s init + 0.070 s/primary.) A no-ntuple Geant4 build (E15-fairer) would isolate how much wall-time the I/O adds; so 452× is a mild over-estimate of a compute-only comparison, but nowhere near as low as the retracted ~200×. (b) Kernel fusion contributes ~2× to the pipeline, NOT 40×. The fused phase (Phase A) is 14.4 ms = 2% of the 635 ms; Phase B is an un-fused 2000-dispatch wavefront (620 ms). The earlier "455× = 10× GPU × 40× fusion (multiplicative)" claim was wrong — you cannot multiply a Phase-A-only 40× through a 98%-un-fused pipeline. Fused-vs-naive on the same GPU is 1324 ms → 635 ms = 2.08× (E16's 40× is Phase-A-only). [E15b, E15c, E16, E15-fair]
  7. The G(OH) deficit vs Karamitros 2011 confounds two effects: ~70% is real LET physics (chem6 reproduces the same trend); ~30% is a real WGSL-vs-chem6 implementation gap. G(H₂)/G(H₂O₂) are the biggest implementation gaps. [E10c, E10d]
  8. The chem6 1 µs gap was the untracked tertiary electron cascade — closed in v0.6.0 (E20–E25), superseding the earlier "inter-track partitioning" attribution. E10f measured that cross-primary pooling adds ΔG(H₂)=+0.149 and read it as "96% of the 1 μs gap"; that was looking at H₂ alone. E17 later showed cross-primary pooling is a coupled tradeoff (it boosts H₂ but over-recombines OH/eaq — no density matches chem6), and v0.6.0 showed the real cause was the untracked gen3+ cascade: tracking it closes the gap browser-native (RMS 19.7→7.6%, H₂ 0.74→0.99×). The chem6 gap did not require the native runtime. [E10f, E17, E20, E21, E25]
  9. L5: absolute yields validate to within ~3× of experiment once normalised by local dose; the DSB/SSB ratio is a calibrated fit. Two separable claims, both measured. (a) Absolute yields — vindicated. The 223×/796× box-normalised over-yield is a point-source dose artifact: 99.1% of energy deposits in the central 3 µm fibre core (exact voxel dose, C=991, local dose 241 Gy [E12-local-exact]), not the 0.243 Gy box average. Re-normalised, SSB_dir 0.34× / DSB 0.82× / SSB_total 1.28× experiment (Ward 1988) — within a factor of ~3. (b) DSB/SSB ratio — calibrated, but stable. The indirect/direct ratio (2.32 parameter-free; was 2.46 @ RECOMB=2.0) is reach × tuned probability with P_indirect tuned to PARTRAC's 2–3 band. It is calibrated, not predicted — but it held in-band across the RECOMB→1.0 flip with no recalibration [E13d], so it's at least robust to that change. [E12-local-exact, E13d]

Ongoing physics work

Documented in PHYSICS_DIAGNOSIS.md. The major gaps are closed: the cascade-ion deficit and the chem6 1 µs H₂/H₂O₂ deficit (v0.6.0, full cascade), and the chronic sub-keV CSDA deficit (v0.7.0, real Born excitation — 100 eV 0.78→0.96×, all 8 energies now 0.956–1.005×). The track-structure physics is now parameter-free (RECOMB_BOOST and SIGMA_EXC_SCALE both gone).

Validated envelope (be explicit about scope): the comparisons here are for electrons, 100 eV – 20 keV, low LET (10 keV primaries), against G4EmDNAPhysics_option2 + G4EmDNAChemistry_option3 — the physics list both the cascade (dnaphysics) and chemistry (chem6) oracles register, so there is no physics-list seam (E29). Per-primary IRT is valid at this low LET; it is a coupled tradeoff, not the chem6-gap fix, at high LET (E17). Out of scope (future work): protons / heavier ions (the main clinical use of Geant4-DNA), and a realistic chromatin geometry.

Remaining open items:

  • Residual ~5.8 % cascade-ion deficit (0.942× vs Geant4; entirely the secondary cascade; primary matches to 0.1%). Small and well-bounded.
  • DNA damage is methodology, not validated absolute physics. The 21×21 fibre grid is a track-core stand-in, not chromatin, and P_indirect is a calibrated fit. The SSB ratio is robust to grid spacing (E27) but the calibration is physics-dependent — v0.7.0's Born excitation shifted it 2.72→3.26 (we report this honestly rather than re-tune). Treat SSB/DSB as indicative; the real validation needs molecularDNA geometry (E14, deferred — needs the example built).

About

WebGPU port of Geant4-DNA: in-browser Monte Carlo track-structure simulation with IRT chemistry and B-DNA SSB/DSB scoring

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors