gHashTag · May 18, 2026
@@ -1,6 +1,15 @@
-# NOW — Trinity t27 sync
+# NOW -- Trinity t27 sync
 
-Last updated: 2026-05-16
+Last updated: 2026-05-18
+
+## docs(TRI-NET) -- cross-line package P0/P1/P2 (this PR, Closes #696)
+
+- **NEW** docs: `docs/GF16_BFLOAT16_NMSE_PROTOCOL.md`, `docs/TRI_NET_API.md`, `docs/TRI_NET_WHITEPAPER.md`, `docs/22FDX_TOPS_W_PROJECTION.md`, `docs/ZENODO_BUNDLES.md`, `docs/SCIENTIFIC_IMPROVEMENT_PLAN.md` (2026 t27-side roadmap, R5-honest labels)
+- **NEW** specs: `specs/benchmarks/gf16_bfloat16_nmse.t27`, `specs/api/tri_net_api.t27` (both contain `test`+`invariant`+`bench` per L4)
+- **NEW** schemas: `schemas/nmse-protocol-v1.json`, `schemas/tri-net-api-v1.json` (draft-07)
+- Docs-only; no `gen/`/`coq/`/`bootstrap/` edits; no new `*.sh`; R5-HONEST preserved (projections labelled; no DOIs quoted before upload)
+- Full per-deliverable detail in `docs/NOW.md`
+- Closes #696
 
 ## Wave-42 Lane II — StochRound.v Stochastic Rounding Coq
 

@@ -0,0 +1,176 @@
+# 22FDX TOPS/W Projection Methodology
+
+> **READ FIRST:** Every number in this document is a **projection**, not a
+> measurement on silicon. No die targeting 22FDX has been received, brought
+> up, or characterised. The purpose of this document is to make the
+> projection method itself inspectable, so that when (and if) silicon
+> arrives, the gap between projection and measurement is auditable
+> line-by-line.
+>
+> **R5-HONEST:** Any reader who reaches a section break without seeing the
+> word "projection" in the previous paragraph should treat that omission as
+> a bug and file an issue.
+
+---
+
+## 1. Why 22FDX
+
+GlobalFoundries' 22FDX (22 nm fully-depleted SOI with adaptive body bias)
+is named here because it is the smallest publicly documented process at which
+a TRI-NET-class ternary mesh would still benefit from body-bias techniques
+(W47 RBB, W48 FBB-active, W49 CapBoost in `trios-coq/Physics/`). It is
+*not* selected because we have access to 22FDX shuttles; we do not, and
+this document does not assume we will.
+
+Other plausible PDKs (Sky130, IHP-SG13G2, IHP-SG13S, TSMC N28HPC+,
+SMIC 28HKC) would change the absolute numbers but not the projection
+method. The method here is the contribution; the chosen PDK is the
+worked example.
+
+---
+
+## 2. What we are projecting
+
+We project the TOPS/W envelope for a single tile of the gamma surface
+(32 PEs) running INT1.58 inference, **assuming** all of the following:
+
+- LUT-NPU operator `OP_LUT_NPU = 0xE3` carries the inner loop;
+- AVS-48 voltage stacking microcode is engaged
+  (`L2_BG_AVS96_STEP_GATE` extension on top);
+- Sub-V_T weak-inversion clock domain at V = 0.30 V available
+  (`OP_SUBTH_CLK = 0xE4`);
+- Triple-Deck RBB / FBB / CapBoost engaged (W47..W49 Coq lemmas);
+- Activity factor 0.5 (industry-conservative, see references).
+
+These are *spec-level* assumptions backed by Coq lemmas in this repo.
+None of them is a silicon claim.
+
+---
+
+## 3. Confidence-level scheme
+
+Every projected figure is tagged with a confidence band:
+
+| Band   | Meaning                                                                              |
+|--------|--------------------------------------------------------------------------------------|
+| `C1`   | Algebra-bound: derived directly from a Coq-proven identity; no PDK assumption.       |
+| `C2`   | Toolchain-bound: derived from synthesis on an open PDK (Sky130 / Yosys) at this repo.|
+| `C3`   | Scaling-bound: PDK-to-PDK scaling rules applied to a `C2` number; cite the rule.     |
+| `C4`   | Vendor-cited: backed by a published 22FDX datapoint from GF or a peer-reviewed paper.|
+| `C5`   | Speculative: no `C1..C4` backing; included only to show envelope and labelled red.   |
+
+A reader is entitled to ignore any `C5` row and most `C3` rows when
+forming an opinion about silicon.
+
+---
+
+## 4. Baseline (current in-repo)
+
+The `STATUS.md` ladder shows the gamma mesh at `SIM` level (Verilog
+generated, simulation passes). No `SYNTH` row exists for a 22FDX cell
+library because no 22FDX cell library is integrated into the build. The
+publicly cited baseline numbers used as projection anchors are:
+
+| Anchor               | Value                | Source / band                                    |
+|----------------------|----------------------|--------------------------------------------------|
+| W34 baseline TOPS/W  | 225                  | `NOW.md` Wave-35 row; band `C2` (open)           |
+| W35 LUT-NPU lift     | x1.20                | `NOW.md` Wave-35; Coq Qed; band `C1`             |
+| W36 AVS-48 lift      | TOPS/W >= 297        | `NOW.md` Wave-36 W-104-B; band `C1`              |
+| W37 Sub-V_T lift     | TOPS/W >= 350        | `NOW.md` Wave-37 W-104-C; band `C1`              |
+| W47 RBB lift         | TOPS/W +1.5%         | Coq lemma `rbb_*`; band `C1`                     |
+| W48 FBB-active lift  | TOPS/W +1.5..1.9%    | Coq lemma `fbb_active_tops_w_lift_*`; band `C1`  |
+| W49 CapBoost lift    | TOPS/W +0.7..0.9%    | Coq lemma `cap_boost_tops_w_lift_*`; band `C1`   |
+
+Apart from the W34 baseline (`C2`, open-PDK synthesis), every row is derived
+from in-repo Coq lemmas and is **algebra-bound** (`C1`). They are NOT silicon
+measurements.
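+
+The table states each lift per wave but does not fix how the lifts
+combine. A minimal sketch, assuming (for illustration only, not as a
+Coq-proven identity) that the W47..W49 lifts compose multiplicatively on
+top of the W37 anchor. It also assumes that a combined figure carries the
+set of bands it drew on, in the spirit of the `C3+C4 mix` label in
+section 5. `Figure` and `lift` are hypothetical names:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Figure:
+    """A projected TOPS/W interval plus the confidence bands it depends on."""
+    lo: float
+    hi: float
+    bands: frozenset
+
+    def lift(self, pct_lo: float, pct_hi: float, band: str) -> "Figure":
+        # Apply a percentage lift range; bands accumulate and never upgrade.
+        return Figure(self.lo * (1 + pct_lo / 100),
+                      self.hi * (1 + pct_hi / 100),
+                      self.bands | {band})
+
+    def label(self) -> str:
+        return "+".join(sorted(self.bands))
+
+
+# W37 anchor (TOPS/W >= 350, band C1), used here as a point lower bound.
+w37 = Figure(350.0, 350.0, frozenset({"C1"}))
+stacked = (w37.lift(1.5, 1.5, "C1")    # W47 RBB
+              .lift(1.5, 1.9, "C1")    # W48 FBB-active
+              .lift(0.7, 0.9, "C1"))   # W49 CapBoost
+print(f"{stacked.lo:.1f} .. {stacked.hi:.1f} TOPS/W (band {stacked.label()})")
+```
+
+Bands only accumulate, so a single `C3` step anywhere in a chain moves
+the printed label away from a pure `C1` claim.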
+
+---
+
+## 5. 22FDX scaling assumptions (`C3` to `C4`)
+
+To reach a 22FDX projection from a Sky130-class baseline, we apply these
+scaling rules:
+
+| Rule                                                           | Band | Notes                                              |
+|---------------------------------------------------------------|------|----------------------------------------------------|
+| Dynamic-energy / op scales with `(V_22 / V_130)^2`            | `C4` | textbook CMOS, cite Rabaey 2003                    |
+| Cap / op shrinks with `(L_22 / L_130)`, capped at 2x          | `C3` | conservative; finFET-scaling literature is mixed   |
+| Leakage / op increases at low V_DD; offset by RBB at idle     | `C3` | Tschanz JSSC 2002; matches W47 RBB lemma           |
+| Forward body bias at active path reduces delay ~12%           | `C4` | Mukhopadhyay 2009 + W48 FBB lemma                  |
+| 22FDX V_DD nominal 0.8 V; subthreshold 0.4 V                  | `C4` | GF 22FDX datasheet                                 |
+| f_max derating at subthreshold: x0.5 vs nominal               | `C1` | W37 lemma `subth_freq_derating_factor_2`           |
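+
+The first, second and last rows reduce to simple multipliers. A minimal
+sketch of just those three rules, using only the 22FDX voltages from the
+table (0.8 V nominal, 0.4 V subthreshold). It reads "capped at 2x" as "at
+most a 2x capacitance reduction", which is our interpretation. The
+function names are illustrative, and leakage (the `C3` row) is
+deliberately left out:
+
+```python
+def dynamic_energy_ratio(v_target: float, v_ref: float) -> float:
+    """E_dyn/op ~ C * V^2, so at fixed C the ratio is (V_t / V_r)^2 (C4 row)."""
+    return (v_target / v_ref) ** 2
+
+
+def cap_ratio(l_target_nm: float, l_ref_nm: float, max_shrink: float = 2.0) -> float:
+    """C/op follows L_t / L_r but may not shrink by more than max_shrink (C3 row)."""
+    return max(l_target_nm / l_ref_nm, 1.0 / max_shrink)
+
+
+def tops_per_watt_multiplier(energy_ratio: float) -> float:
+    """Dynamic-only view: TOPS/W scales as 1 / (energy per op)."""
+    return 1.0 / energy_ratio
+
+
+V_NOMINAL, V_SUBTH = 0.8, 0.4   # 22FDX row of the table above
+SUBTH_FREQ_DERATING = 0.5       # W37 lemma subth_freq_derating_factor_2 (C1 row)
+
+e = dynamic_energy_ratio(V_SUBTH, V_NOMINAL)
+print("dynamic energy/op, subthreshold vs nominal:", e)
+print("dynamic-only TOPS/W multiplier:", tops_per_watt_multiplier(e))
+print("f_max multiplier at subthreshold:", SUBTH_FREQ_DERATING)
+print("capacitance ratio, 130 nm -> 22 nm, capped:", cap_ratio(22, 130))
+```
+
+Take the leakage row at face value (leakage per op rises at low V_DD).
+Under that reading, the dynamic-only multiplier is an upper bound on the
+subthreshold TOPS/W gain, not an estimate of it.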
+
+A worked propagation of these rules onto the in-repo anchors yields a
+**projected** TOPS/W envelope at 22FDX of:
+
+```
+   nominal V_DD, no body bias:       350 - 420  TOPS/W   (band C3)
+   nominal V_DD, with TripleDeck:    400 - 490  TOPS/W   (band C3)
+   subthreshold V_DD, full stack:    600 - 800  TOPS/W   (band C3+C4 mix)
+```
+
+No assertion is made that 22FDX silicon would deliver any of these. The
+purpose of the table is to **make the method auditable** before silicon
+exists. When silicon exists, this table is what gets falsified
+line-by-line.
+
+---
+
+## 6. Falsification policy
+
+Three of the section-4 anchors are tied to falsification witnesses in the
+Coq ledger:
+
+- W34 baseline: `Trinity-loss sparsity >= 0.5 @ batch=1` (W-104-A).
+- W36 AVS-48: `eta >= 0.93 => TOPS/W >= 297` (W-104-B; `avs_w104_b_witness`).
+- W37 Sub-V_T: `V=0.30 + AVS48 + LUT-NPU => TOPS/W >= 350` (W-104-C;
+  `subth_w104_c_witness`).
+
+A 22FDX measurement that falsifies any of these will be reported and the
+Coq lemma adjusted (or, more likely, the assumption set behind the lemma
+narrowed). That is the deal.
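+
+Each witness has the shape "premises imply a bound". A measurement can
+falsify it only if every premise held during the run and the bound was
+missed; a run outside the premises neither supports nor falsifies it. A
+minimal sketch of that logic for W-104-B and W-104-C (W-104-A has the
+same shape). The `Measurement` record, its field names and the voltage
+tolerance are hypothetical. The numbers in `run` are invented purely to
+exercise the logic; they are not a measurement:
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class Measurement:              # hypothetical record of one silicon run
+    tops_per_watt: float
+    eta: Optional[float]        # AVS-48 efficiency, if it was measured
+    v_dd: float
+    avs48: bool
+    lut_npu: bool
+
+
+def verdict(premises_hold: bool, bound_met: bool) -> str:
+    if not premises_hold:
+        return "inapplicable"   # the witness was not exercised by this run
+    return "consistent" if bound_met else "FALSIFIED"
+
+
+def w104_b(m: Measurement) -> str:
+    # eta >= 0.93  =>  TOPS/W >= 297
+    return verdict(m.eta is not None and m.eta >= 0.93, m.tops_per_watt >= 297)
+
+
+def w104_c(m: Measurement, v_tol: float = 0.005) -> str:
+    # V = 0.30 + AVS48 + LUT-NPU  =>  TOPS/W >= 350
+    premises = abs(m.v_dd - 0.30) <= v_tol and m.avs48 and m.lut_npu
+    return verdict(premises, m.tops_per_watt >= 350)
+
+
+# Invented numbers, only to exercise the logic; not a measurement.
+run = Measurement(tops_per_watt=320.0, eta=0.95, v_dd=0.30, avs48=True, lut_npu=True)
+print(w104_b(run), w104_c(run))
+```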
+
+---
+
+## 7. What this document does NOT do
+
+- It does not state a measured 22FDX TOPS/W number.
+- It does not commit to a 22FDX tape-out.
+- It does not compare 22FDX projections against any commercial product.
+- It does not name a date for silicon.
+
+If the reader sees any of those done elsewhere on the basis of this
+document, that is a misreading and should be reported as an issue.
+
+---
+
+## 8. Cross-links
+
+- `NOW.md` Waves W34..W49 -- the running ledger of lifts.
+- `STATUS.md` -- readiness ladder; no SYNTH or GDS at 22FDX.
+- `BENCHMARKS.md` -- restrained posture; what is and isn't measured.
+- `COMPETITORS.md` -- no parity claim against any commercial NPU.
+- `trios-coq/Physics/` -- Coq lemmas that anchor the `C1` rows.
+- `docs/TRI_NET_WHITEPAPER.md` -- the line's positioning.
+- `tt-trinity-euler` / `tt-trinity-gamma` (chip repos) -- silicon
+  targeting decisions live there, not here.
+- `docs/SCIENTIFIC_IMPROVEMENT_PLAN.md` -- EN-02 names this projection
+  table as the t27-side toolchain deliverable for energy work.
+
+---
+
+## 9. References (external)
+
+- Rabaey, Chandrakasan, Nikolic, *Digital Integrated Circuits*, 2003.
+- Tschanz et al., "Adaptive Body Bias for Reducing Impacts of Die-to-Die
+  and Within-Die Parameter Variations on Microprocessor Frequency and
+  Leakage", JSSC 2002.
+- Mukhopadhyay et al., "Modeling and Analysis of Loading Effect in
+  Leakage of Nano-Scaled Bulk-CMOS Logic Circuits", 2009.
+- Larsson and Svensson, "Noise in Digital Dynamic CMOS Circuits", 1994.
+- Jiang et al., capacitive supply decoupling, 2018.
+- GlobalFoundries 22FDX product brief (vendor page; cite at use).
+
+---
+
+**phi^2 + 1/phi^2 = 3  |  TRINITY**
@@ -0,0 +1,202 @@
+# GF16 vs bfloat16 NMSE Protocol
+
+> **Status:** SPEC. Protocol-level only. This document standardises *how* a
+> GF16-vs-bfloat16 NMSE comparison is run and reported in the TRI-NET line.
+> It does **not** publish silicon numbers -- those belong with the chip repos
+> when (and only when) silicon evidence is available.
+>
+> **Source of truth:** `specs/benchmarks/gf16_bfloat16_nmse.t27` is the
+> machine-readable spec; this document is the human-readable mirror. If the
+> two disagree, the `.t27` spec wins and the disagreement is an issue.
+>
+> **R5-HONEST:** No row in this document asserts a measured silicon result.
+> Reference distributions and tolerance windows are stated as protocol
+> parameters, not outcomes.
+
+---
+
+## 1. Scope and intent
+
+The TRI-NET numeric kernel is **GoldenFloat GF16** (primary path; see
+`FORMAT_REGISTRY.md`). The dominant industry alternative at the same width
+for inference workloads is **bfloat16** (BF16). Multiple parties have asked
+for a like-for-like NMSE (normalised mean-squared error) comparison. The
+purpose of this document is to define **one protocol** for that comparison
+so that:
+
+1. results produced under it are reproducible from this repo;
+2. results from chip repos (`tt-trinity-phi`, `tt-trinity-euler`,
+   `tt-trinity-gamma`) can be compared with the same methodology;
+3. nothing in the protocol presumes a TOPS race -- only numeric fidelity.
+
+Out of scope: latency, throughput, energy. Those have their own (also
+restrained) treatments in `BENCHMARKS.md`.
+
+---
+
+## 2. Quantities
+
+Let `x` be a reference real value (drawn from a defined distribution,
+section 4) and let `Q_F(x)` be the result of the round trip
+`real -> F -> real` through numeric format `F`. Define
+
+```
+NMSE(F)
+    = E[ (x - Q_F(x))^2 ] / E[ x^2 ]
+```
+
+where the expectation is taken over the protocol's reference distribution.
+Two formats are compared by reporting `NMSE(GF16)` and `NMSE(BF16)` against
+the same sampled `x` set and the same RNG seed.
+
+**Important:** the ratio `NMSE(GF16) / NMSE(BF16)` is the headline number.
+A ratio of 1.0 means equal numeric fidelity at the protocol's distribution;
+< 1.0 means GF16 is closer to the reference for that distribution; > 1.0
+means BF16 is closer. No claim is attached to either direction without a
+specific distribution and seed.
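+
+The definition maps directly onto a sample estimator. A minimal sketch:
+the expectations become sums over one shared sample set (the common `1/n`
+factor cancels), and `Q_F` is passed in as a function. `grid` is a
+hypothetical stand-in quantizer used only to exercise the estimator. It
+is not GF16 or BF16:
+
+```python
+import random
+from typing import Callable, Sequence
+
+
+def nmse(samples: Sequence[float], q: Callable[[float], float]) -> float:
+    """Sample estimate of E[(x - Q_F(x))^2] / E[x^2]."""
+    err = sum((x - q(x)) ** 2 for x in samples)
+    power = sum(x * x for x in samples)
+    return err / power
+
+
+def nmse_ratio(samples: Sequence[float], q_a, q_b) -> float:
+    """Headline number NMSE(A) / NMSE(B), always on the *same* samples."""
+    return nmse(samples, q_a) / nmse(samples, q_b)
+
+
+def grid(step: float) -> Callable[[float], float]:
+    # Hypothetical uniform-grid quantizer, for illustration only.
+    return lambda x: round(x / step) * step
+
+
+rng = random.Random(27)   # arbitrary demo seed
+xs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
+print(nmse_ratio(xs, grid(2 ** -6), grid(2 ** -4)))
+```
+
+For a uniform grid the quantization error is roughly uniform on
+`[-step/2, step/2]`. Shrinking the step by 4x should therefore move the
+printed ratio close to 1/16.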
+
+---
+
+## 3. Format definitions used by the protocol
+
+### 3.1 GF16
+
+GF16 is defined by `specs/numeric/gf16.t27` and recorded in the SSOT
+`conformance/FORMAT-SPEC-001.json`. Bit layout (mirror of
+`FORMAT_REGISTRY.md` section 1):
+
+```
+GF16 = [ S(1) | E(6) | M(9) ]
+value = (-1)^S * 2^(E - 31) * (1 + M / 2^9)
+```
+
+The protocol uses the canonical round-to-nearest, ties-to-even rounding
+rule defined in the spec.
+
+### 3.2 bfloat16
+
+BF16 is defined externally. It keeps the sign bit and the 8-bit exponent of
+the IEEE-754 binary32 layout and shortens the mantissa field to 7 bits:
+
+```
+BF16 = [ S(1) | E(8) | M(7) ]
+value = (-1)^S * 2^(E - 127) * (1 + M / 2^7)
+```
+
+The protocol uses round-to-nearest, ties-to-even, with IEEE-754 subnormal
+handling. The only permitted deviation is flush-to-zero: a BF16
+implementation that flushes subnormals to zero must declare that fact in
+the manifest (section 6).
+
+### 3.3 Why the comparison is meaningful
+
+GF16 and BF16 occupy the same memory footprint (16 bits). They differ in
+how those 16 bits are split: GF16 gives 9 bits to the mantissa, BF16 gives
+7. GF16's exponent field is 6 bits with bias 31; BF16's is 8 bits with
+bias 127. The expected outcome is therefore that **GF16 wins on values
+near 1.0, and generally wherever both formats represent the value as a
+normal number, because of its two extra mantissa bits. BF16 should win on
+very large / very small values that fall outside GF16's narrower exponent
+range**. The protocol must not preempt this with a distribution chosen to
+favour either side.
+---
+
+## 4. Reference distributions
+
+A run reports NMSE under **each** of these distributions independently.
+A single number reported without naming a distribution is invalid under
+this protocol.
+
+| Tag       | Distribution                                   | Rationale                            |
+|-----------|------------------------------------------------|--------------------------------------|
+| `D_NORM`  | `x ~ N(0, 1)`                                  | Generic weight-like distribution     |
+| `D_LOG`   | `log2\|x\| ~ U(-10, 10)`, sign uniform          | Geometric coverage of dynamic range  |
+| `D_RELU`  | `x = max(0, N(0, 1))`                          | Post-ReLU activation distribution    |
+| `D_PHI`   | `x ~ N(phi, 1/phi)`, where `phi=(1+sqrt 5)/2`  | Identity-anchored sanity (L5)        |
+| `D_DEEP`  | mixture: 0.7 `D_NORM` + 0.3 `D_LOG`            | Heuristic for transformer weights    |
+
+Each run uses 10 million samples per distribution unless explicitly
+overridden in the manifest.
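+
+A minimal sketch of seeded samplers for the five tags. It uses Python's
+`random.Random` as a stand-in RNG family, since the protocol leaves the
+family to the manifest. Two readings here are our assumptions, not the
+protocol's. First, the second argument of `N(mu, .)` is taken as a
+variance, so `D_PHI` uses standard deviation `sqrt(1/phi)`. Second,
+`D_DEEP` draws each sample from `D_NORM` with probability 0.7 and from
+`D_LOG` otherwise. The function names are illustrative:
+
+```python
+import math
+import random
+
+PHI = (1 + math.sqrt(5)) / 2
+
+
+def d_norm(rng: random.Random) -> float:
+    return rng.gauss(0.0, 1.0)
+
+
+def d_log(rng: random.Random) -> float:
+    mag = 2.0 ** rng.uniform(-10.0, 10.0)       # log2|x| ~ U(-10, 10)
+    return mag if rng.random() < 0.5 else -mag  # uniform sign
+
+
+def d_relu(rng: random.Random) -> float:
+    return max(0.0, rng.gauss(0.0, 1.0))
+
+
+def d_phi(rng: random.Random) -> float:
+    return rng.gauss(PHI, math.sqrt(1 / PHI))   # variance 1/phi (assumed)
+
+
+def d_deep(rng: random.Random) -> float:
+    return d_norm(rng) if rng.random() < 0.7 else d_log(rng)
+
+
+DISTRIBUTIONS = {"D_NORM": d_norm, "D_LOG": d_log, "D_RELU": d_relu,
+                 "D_PHI": d_phi, "D_DEEP": d_deep}
+
+
+def sample(tag: str, seed: int, n: int = 10_000_000) -> list:
+    """Draw n samples for one tag; the same (tag, seed, n) gives the same list."""
+    rng = random.Random(seed)
+    return [DISTRIBUTIONS[tag](rng) for _ in range(n)]
+```
+
+Feeding one returned list to both the GF16 and the BF16 quantizer is what
+makes the reported ratio like-for-like.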
+
+---
+
+## 5. Tolerance and identity check (L5)
+
+Before any NMSE figure is reported, a run **must** witness:
+
+```
+|phi^2 - (phi + 1)|  <  1e-15        // f64 identity check
+|phi^2 + 1/phi^2 - 3| < 1e-15        // canonical Trinity identity
+```
+
+Failing either witness aborts the run. This is L5 IDENTITY enforced at the
+benchmark boundary.
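+
+A minimal sketch of the witness as a guard that raises before any NMSE
+work begins. The function name is illustrative, and the two residuals are
+the ones in the block above, evaluated in f64:
+
+```python
+import math
+
+
+def identity_witness(tol: float = 1e-15) -> None:
+    """Raise unless both phi identities hold in f64 to within tol."""
+    phi = (1 + math.sqrt(5)) / 2
+    residuals = {
+        "phi^2 = phi + 1": abs(phi ** 2 - (phi + 1)),
+        "phi^2 + 1/phi^2 = 3": abs(phi ** 2 + 1 / phi ** 2 - 3),
+    }
+    for name, r in residuals.items():
+        if not r < tol:                     # also catches a NaN residual
+            raise RuntimeError(f"L5 identity witness failed: {name} (residual {r:.3e})")
+
+
+identity_witness()   # run first; an exception aborts the benchmark
+```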
+
+---
+
+## 6. Results manifest
+
+A run produces one JSON file conforming to `schemas/nmse-protocol-v1.json`.
+The schema requires, at minimum:
+
+- protocol version (semver);
+- toolchain seal hash (matches `bootstrap/stage0/FROZEN_HASH`);
+- RNG family and seed;
+- sample count per distribution;
+- per-distribution `NMSE_GF16`, `NMSE_BF16`, and their ratio;
+- BF16 subnormal policy (`ieee` or `ftz`);
+- runner identity (host architecture, compiler version);
+- timestamp (RFC3339).
+
+A run that omits any required field is non-conforming and must not be
+cited in TRI-NET documentation.
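+
+A minimal sketch of a pre-citation completeness check. The key names are
+hypothetical placeholders for the items listed above. The authoritative
+names and types are whatever `schemas/nmse-protocol-v1.json` defines, and
+a real check should validate against that schema rather than a
+hand-written list:
+
+```python
+from datetime import datetime
+
+# Hypothetical key names, one per required item in the list above.
+REQUIRED = ["protocol_version", "seal_hash", "rng_family", "rng_seed",
+            "samples_per_distribution", "results", "bf16_subnormal_policy",
+            "runner", "timestamp"]
+PER_DISTRIBUTION = ["nmse_gf16", "nmse_bf16", "ratio"]
+
+
+def conformance_problems(manifest: dict) -> list:
+    """Return a list of problems; an empty list means no required field is missing."""
+    problems = [f"missing {k}" for k in REQUIRED if k not in manifest]
+    if manifest.get("bf16_subnormal_policy") not in (None, "ieee", "ftz"):
+        problems.append("bf16_subnormal_policy must be 'ieee' or 'ftz'")
+    for tag, row in manifest.get("results", {}).items():
+        problems += [f"{tag}: missing {k}" for k in PER_DISTRIBUTION if k not in row]
+    ts = manifest.get("timestamp")
+    if ts is not None:
+        try:   # loose RFC3339 check; "Z" is normalised for Python < 3.11
+            datetime.fromisoformat(str(ts).replace("Z", "+00:00"))
+        except ValueError:
+            problems.append("timestamp is not RFC3339")
+    return problems
+
+
+print(conformance_problems({"protocol_version": "1.0.0",
+                            "timestamp": "2026-05-18T00:00:00Z"}))
+```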
+
+---
+
+## 7. Test obligations (L4)
+
+The companion spec `specs/benchmarks/gf16_bfloat16_nmse.t27` includes:
+
+- a `test` block that runs the identity witness;
+- an `invariant` block that asserts `NMSE >= 0` for each format;
+- a `bench` block that defines the measurement procedure.
+
+These are the L4 TESTABILITY requirements for this benchmark family.
+
+---
+
+## 8. Reporting policy
+
+When a chip-repo or third-party result is cited:
+
+- ratio reported only when both sides came from the same seed/distribution;
+- protocol version stated;
+- seal hash stated (or `unsealed` if not measured under a sealed toolchain
+  -- in which case the result is informational, not certifying);
+- no comparison against a commercial-NPU number is permitted in this
+  protocol's outputs (see `COMPETITORS.md` for why).
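+
+A minimal sketch of the first three rules as a gate applied before a
+ratio is quoted. The run-record keys are hypothetical. The sketch does
+not cover the fourth rule, which governs what a result is compared
+against rather than the data itself:
+
+```python
+def ratio_status(gf16_run: dict, bf16_run: dict) -> str:
+    """Classify a GF16/BF16 pair as 'not citable', 'informational' or 'certifying'."""
+    for key in ("rng_seed", "distribution"):       # same seed and distribution
+        if gf16_run.get(key) is None or gf16_run.get(key) != bf16_run.get(key):
+            return "not citable"
+    for key in ("protocol_version", "seal_hash"):  # must be stated on both sides
+        if gf16_run.get(key) is None or bf16_run.get(key) is None:
+            return "not citable"
+    if "unsealed" in (gf16_run["seal_hash"], bf16_run["seal_hash"]):
+        return "informational"
+    return "certifying"
+```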
+
+---
+
+## 9. Cross-links
+
+- Numeric SSOT: `conformance/FORMAT-SPEC-001.json`, `FORMAT_REGISTRY.md`.
+- Sibling repos that may emit conforming manifests:
+  - `tt-trinity-phi` (phi-anchor, 1x1) -- identity-domain NMSE only.
+  - `tt-trinity-euler` (8x2, safety) -- `D_NORM` and `D_RELU` envelopes.
+  - `tt-trinity-gamma` (8x4, 32-PE mesh) -- `D_DEEP` is the headline.
+- TRI-NET API doc: `docs/TRI_NET_API.md` -- how an external integrator
+  reads NMSE manifests programmatically.
+- Roadmap: `docs/SCIENTIFIC_IMPROVEMENT_PLAN.md` -- PUB-02 names "one
+  sealed-toolchain NMSE manifest" as a 2026 target deliverable.
+
+---
+
+## 10. Non-claims (R5-HONEST)
+
+- This document does **not** claim a measured silicon NMSE for any product.
+- This document does **not** claim GF16 is universally better than BF16.
+- This document does **not** claim a fixed `NMSE(GF16) / NMSE(BF16)` ratio.
+- It defines **how** to measure such ratios so claims, when made, are
+  reproducible.
+
+---
+
+**phi^2 + 1/phi^2 = 3  |  TRINITY**