13 changes: 11 additions & 2 deletions NOW.md
@@ -1,6 +1,15 @@
# NOW Trinity t27 sync
# NOW -- Trinity t27 sync

Last updated: 2026-05-16
Last updated: 2026-05-18

## docs(TRI-NET) -- cross-line package P0/P1/P2 (this PR, Closes #696)

- **NEW** docs: `docs/GF16_BFLOAT16_NMSE_PROTOCOL.md`, `docs/TRI_NET_API.md`, `docs/TRI_NET_WHITEPAPER.md`, `docs/22FDX_TOPS_W_PROJECTION.md`, `docs/ZENODO_BUNDLES.md`, `docs/SCIENTIFIC_IMPROVEMENT_PLAN.md` (2026 t27-side roadmap, R5-honest labels)
- **NEW** specs: `specs/benchmarks/gf16_bfloat16_nmse.t27`, `specs/api/tri_net_api.t27` (both contain `test`+`invariant`+`bench` per L4)
- **NEW** schemas: `schemas/nmse-protocol-v1.json`, `schemas/tri-net-api-v1.json` (draft-07)
- Docs-only; no `gen/`/`coq/`/`bootstrap/` edits; no new `*.sh`; R5-HONEST preserved (projections labelled; no DOIs quoted before upload)
- Full per-deliverable detail in `docs/NOW.md`
- Closes #696

## Wave-42 Lane II — StochRound.v Stochastic Rounding Coq

176 changes: 176 additions & 0 deletions docs/22FDX_TOPS_W_PROJECTION.md
@@ -0,0 +1,176 @@
# 22FDX TOPS/W Projection Methodology

> **READ FIRST:** Every number in this document is a **projection**, not a
> measurement on silicon. No die targeting 22FDX has been received, brought
> up, or characterised. The purpose of this document is to make the
> projection method itself inspectable, so that when (and if) silicon
> arrives, the gap between projection and measurement is auditable
> line-by-line.
>
> **R5-HONEST:** Any reader who reaches a section break without seeing the
> word "projection" in the previous paragraph should treat that omission as
> a bug and file an issue.

---

## 1. Why 22FDX

GlobalFoundries' 22FDX (22 nm fully-depleted SOI with adaptive body bias)
is named here because it is the smallest, fully-public PDK at which a
TRI-NET-class ternary mesh would still benefit from body-bias techniques
(W47 RBB, W48 FBB-active, W49 CapBoost in `trios-coq/Physics/`). It is
*not* selected because we have access to 22FDX shuttles; we do not, and
this document does not assume we will.

Other plausible PDKs (Sky130, IHP-SG13G2, IHP-SG13S, TSMC N28HPC+,
SMIC 28HKC) would change the absolute numbers but not the projection
method. The method here is the contribution; the chosen PDK is the
worked example.

---

## 2. What we are projecting

We project the TOPS/W envelope for a single tile of the gamma surface
(32 PEs) running INT1.58 inference, **assuming** all of:

- LUT-NPU operator `OP_LUT_NPU = 0xE3` carries the inner loop;
- AVS-48 voltage stacking microcode is engaged
(`L2_BG_AVS96_STEP_GATE` extension on top);
- Sub-V_T weak-inversion clock domain at V = 0.30 V is available
(`OP_SUBTH_CLK = 0xE4`);
- Triple-Deck RBB / FBB / CapBoost engaged (W47..W49 Coq lemmas);
- Activity factor 0.5 (industry-conservative, see references).

These are *spec-level* assumptions backed by Coq lemmas in this repo.
None of them is a silicon claim.

---

## 3. Confidence-level scheme

Every projected figure is tagged with a confidence band:

| Band | Meaning |
|--------|--------------------------------------------------------------------------------------|
| `C1` | Algebra-bound: derived directly from a Coq-proven identity; no PDK assumption. |
| `C2` | Toolchain-bound: derived from synthesis on an open PDK (Sky130 / Yosys) at this repo.|
| `C3` | Scaling-bound: PDK-to-PDK scaling rules applied to a `C2` number; cite the rule. |
| `C4` | Vendor-cited: backed by a published 22FDX datapoint from GF or a peer-reviewed paper.|
| `C5` | Speculative: no `C1..C4` backing; included only to show envelope and labelled red. |

A reader is entitled to ignore any `C5` row and most `C3` rows when
forming an opinion about silicon.

---

## 4. Baseline (current in-repo)

The `STATUS.md` ladder shows the gamma mesh at `SIM` level (Verilog
generated, simulation passes). No `SYNTH` row exists for a 22FDX cell
library because no 22FDX cell library is integrated into the build. The
publicly cited baseline numbers used as projection anchors are:

| Anchor               | Value                | Source / band                                     |
|----------------------|----------------------|---------------------------------------------------|
| W34 baseline TOPS/W  | 225                  | `NOW.md` Wave-35 row; band `C2` (open)            |
| W35 LUT-NPU lift     | x1.20                | `NOW.md` Wave-35; Coq Qed; band `C1`              |
| W36 AVS-48 lift      | TOPS/W >= 297        | `NOW.md` Wave-36 W-104-B; band `C1`               |
| W37 Sub-V_T lift     | TOPS/W >= 350        | `NOW.md` Wave-37 W-104-C; band `C1`               |
| W47 RBB lift         | TOPS/W +1.5%         | Coq lemma `rbb_*`; band `C1`                      |
| W48 FBB-active lift  | TOPS/W +1.5..1.9%    | Coq lemma `fbb_active_tops_w_lift_*`; band `C1`   |
| W49 CapBoost lift    | TOPS/W +0.7..0.9%    | Coq lemma `cap_boost_tops_w_lift_*`; band `C1`    |

Except for the W34 baseline (band `C2`), these anchors are derived from
in-repo Coq lemmas and are **algebra-bound** (`C1`). None of them is a
silicon measurement.
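
As a sanity check on how the anchors chain together, the following Python sketch walks the ladder from the W34 baseline. It assumes the W47..W49 percentage lifts compound multiplicatively on top of the W37 floor; that compounding rule and every Python name below are assumptions of this sketch, not something the Coq lemmas are quoted as stating. Every value it produces is a projection.

```python
# Anchor values copied from the table above.
W34_BASELINE = 225.0        # TOPS/W, band C2
W35_LUT_NPU_LIFT = 1.20     # multiplicative lift
W36_FLOOR = 297.0           # TOPS/W lower bound (W-104-B)
W37_FLOOR = 350.0           # TOPS/W lower bound (W-104-C)

# (name, low %, high %) for the Triple-Deck lifts.
PERCENT_LIFTS = [
    ("W47 RBB", 1.5, 1.5),
    ("W48 FBB-active", 1.5, 1.9),
    ("W49 CapBoost", 0.7, 0.9),
]


def compound(start, lifts):
    """Apply percentage lifts in sequence.

    ASSUMPTION: lifts compound multiplicatively; the document does not
    state how they combine.
    """
    lo = hi = start
    for _name, lo_pct, hi_pct in lifts:
        lo *= 1.0 + lo_pct / 100.0
        hi *= 1.0 + hi_pct / 100.0
    return lo, hi


after_w35 = W34_BASELINE * W35_LUT_NPU_LIFT      # baseline times LUT-NPU lift
implied_w36_factor = W36_FLOOR / after_w35       # step implied by the W36 floor
implied_w37_factor = W37_FLOOR / W36_FLOOR       # step implied by the W37 floor
stack_lo, stack_hi = compound(W37_FLOOR, PERCENT_LIFTS)

print(f"after W35: {after_w35:.1f} TOPS/W (projection)")
print(f"implied W36 factor: {implied_w36_factor:.3f}")
print(f"implied W37 factor: {implied_w37_factor:.3f}")
print(f"full-stack floor: {stack_lo:.1f} .. {stack_hi:.1f} TOPS/W (projection)")
```

Under those assumptions the ladder gives 270 after W35, implied step factors of 1.10 (W36) and about 1.18 (W37), and a projected full-stack floor of roughly 363 to 365 TOPS/W.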

---

## 5. 22FDX scaling assumptions (`C3` to `C4`)

To reach a 22FDX projection from a Sky130-class baseline, we apply these
scaling rules:

| Rule | Band | Notes |
|---------------------------------------------------------------|------|----------------------------------------------------|
| Dynamic-energy / op scales with `(V_22 / V_130)^2` | `C4` | textbook CMOS, cite Rabaey 2003 |
| Cap / op shrinks with `(L_22 / L_130)`, capped at 2x | `C3` | conservative; finFET-scaling literature is mixed |
| Leakage / op increases at low V_DD; offset by RBB at idle | `C3` | Tschanz JSSC 2002; matches W47 RBB lemma |
| Forward body bias at active path reduces delay ~12% | `C4` | Mukhopadhyay 2009 + W48 FBB lemma |
| 22FDX V_DD nominal 0.8 V; subthreshold 0.4 V | `C4` | GF 22FDX datasheet |
| f_max derating at subthreshold: x0.5 vs nominal | `C1` | W37 lemma `subth_freq_derating_factor_2` |

A worked propagation of these rules onto the in-repo anchors yields a
**projected** TOPS/W envelope at 22FDX of:

```
nominal V_DD, no body bias : 350 - 420 TOPS/W (band C3)
nominal V_DD, with TripleDeck: 400 - 490 TOPS/W (band C3)
subthreshold V_DD, full stack: 600 - 800 TOPS/W (band C3+C4 mix)
```

No assertion is made that 22FDX silicon would deliver any of these
projected figures. The purpose of the envelope is to **make the method
auditable** before silicon exists. When silicon exists, this envelope is
what gets falsified line-by-line.
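
The first two scaling rules in the table can be written out as per-rule factors. The Python sketch below is illustrative only: `V_130 = 1.8` V is an assumption (a typical Sky130 nominal supply, not stated in this document), reading "capped at 2x" as "capacitance may shrink by at most a factor of two" is an assumption, and all function names are hypothetical. The outputs are projection inputs, not measurements.

```python
V_22_NOMINAL = 0.8   # V, from the table above
V_22_SUBTH = 0.4     # V, from the table above
V_130 = 1.8          # V, ASSUMPTION: typical Sky130 nominal supply
L_22 = 22.0          # nm
L_130 = 130.0        # nm
CAP_SHRINK_LIMIT = 2.0  # ASSUMPTION: "capped at 2x" means at most a 2x shrink


def dynamic_energy_factor(v_new, v_old):
    """Dynamic energy per op scales with the square of the supply ratio."""
    return (v_new / v_old) ** 2


def cap_factor(l_new, l_old, shrink_limit):
    """Capacitance per op scales with the length ratio, limited by the cap."""
    raw = l_new / l_old
    return max(raw, 1.0 / shrink_limit)


energy_nominal = dynamic_energy_factor(V_22_NOMINAL, V_130)
energy_subth = dynamic_energy_factor(V_22_SUBTH, V_130)
cap = cap_factor(L_22, L_130, CAP_SHRINK_LIMIT)

print(f"voltage factor at nominal V_DD: {energy_nominal:.3f}")
print(f"voltage factor at 0.4 V:        {energy_subth:.3f}")
print(f"capped capacitance factor:      {cap:.2f}")
```

With these inputs the voltage factor is about 0.198 at nominal V_DD and about 0.049 at 0.4 V, and the capped capacitance factor is 0.5. The sketch stops at per-rule factors and does not attempt to reproduce the projection envelope above, which depends on the remaining rules and on how they are combined.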

---

## 6. Falsification policy

Anchor rows from section 4 are associated with falsification witnesses in
the Coq ledger:

- W34 baseline: `Trinity-loss sparsity >= 0.5 @ batch=1` (W-104-A).
- W36 AVS-48: `eta >= 0.93 => TOPS/W >= 297` (W-104-B; `avs_w104_b_witness`).
- W37 Sub-V_T: `V=0.30 + AVS48 + LUT-NPU => TOPS/W >= 350` (W-104-C;
`subth_w104_c_witness`).

A 22FDX measurement that falsifies any of these will be reported and the
Coq lemma adjusted (or, more likely, the assumption set behind the lemma
narrowed). That is the deal.
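
Two of the witnesses above are implications of the form "precondition => TOPS/W >= floor". A minimal Python sketch of what counts as falsification follows: a measurement falsifies the witness only when the precondition holds and the floor is missed. The `Witness` type and `is_falsified` helper are hypothetical names for illustration; only the tags, lemma names, and floors come from the list above. The floors themselves remain projection-side bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Witness:
    tag: str
    lemma: str
    floor_tops_w: float


# Floors copied from the W-104-B and W-104-C entries above.
WITNESSES = (
    Witness("W-104-B", "avs_w104_b_witness", 297.0),
    Witness("W-104-C", "subth_w104_c_witness", 350.0),
)


def is_falsified(witness, precondition_holds, measured_tops_w):
    """An implication P => Q is falsified only by P and not Q.

    If the precondition (e.g. eta >= 0.93 for W-104-B) does not hold on the
    measured part, the witness says nothing and cannot be falsified by it.
    """
    return precondition_holds and measured_tops_w < witness.floor_tops_w
```

A measurement taken with the precondition unmet is therefore not evidence against the lemma; it is evidence that the assumption set did not apply to that part.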

---

## 7. What this document does NOT do

- It does not state a measured 22FDX TOPS/W number.
- It does not commit to a 22FDX tape-out.
- It does not compare 22FDX projections against any commercial product.
- It does not name a date for silicon.

If the reader sees any of those done elsewhere on the basis of this
document, that is a misreading and should be reported as an issue.

---

## 8. Cross-links

- `NOW.md` Waves W34..W49 -- the running ledger of lifts.
- `STATUS.md` -- readiness ladder; no SYNTH or GDS at 22FDX.
- `BENCHMARKS.md` -- restrained posture; what is and isn't measured.
- `COMPETITORS.md` -- no parity claim against any commercial NPU.
- `trios-coq/Physics/` -- Coq lemmas that anchor the `C1` rows.
- `docs/TRI_NET_WHITEPAPER.md` -- the line's positioning.
- `tt-trinity-euler` / `tt-trinity-gamma` (chip repos) -- silicon
targeting decisions live there, not here.
- `docs/SCIENTIFIC_IMPROVEMENT_PLAN.md` -- EN-02 names this projection
table as the t27-side toolchain deliverable for energy work.

---

## 9. References (external)

- Rabaey, Chandrakasan, Nikolic, *Digital Integrated Circuits*, 2003.
- Tschanz et al., "Adaptive Body Bias for Reducing Impacts of Die-to-Die
and Within-Die Parameter Variations on Microprocessor Frequency and
Leakage", JSSC 2002.
- Mukhopadhyay et al., "Modeling and Analysis of Loading Effect in
Leakage of Nano-Scaled Bulk-CMOS Logic Circuits", 2009.
- Larsson and Svensson, "Noise in Digital Dynamic CMOS Circuits", 1994.
- Jiang et al., capacitive supply decoupling, 2018.
- GlobalFoundries 22FDX product brief (vendor page; cite at use).

---

**phi^2 + 1/phi^2 = 3 | TRINITY**
202 changes: 202 additions & 0 deletions docs/GF16_BFLOAT16_NMSE_PROTOCOL.md
@@ -0,0 +1,202 @@
# GF16 vs bfloat16 NMSE Protocol

> **Status:** SPEC. Protocol-level only. This document standardises *how* a
> GF16-vs-bfloat16 NMSE comparison is run and reported in the TRI-NET line.
> It does **not** publish silicon numbers -- those belong with the chip repos
> when (and only when) silicon evidence is available.
>
> **Source of truth:** `specs/benchmarks/gf16_bfloat16_nmse.t27` is the
> machine-readable spec; this document is the human-readable mirror. If the
> two disagree, the `.t27` spec wins and the disagreement is an issue.
>
> **R5-HONEST:** No row in this document asserts a measured silicon result.
> Reference distributions and tolerance windows are stated as protocol
> parameters, not outcomes.

---

## 1. Scope and intent

The TRI-NET numeric kernel is **GoldenFloat GF16** (primary path; see
`FORMAT_REGISTRY.md`). The dominant industry alternative at the same width
for inference workloads is **bfloat16** (BF16). Multiple parties have asked
for a like-for-like NMSE (normalised mean-squared error) comparison. The
purpose of this document is to define **one protocol** for that comparison
so that:

1. results produced under it are reproducible from this repo;
2. results from chip repos (`tt-trinity-phi`, `tt-trinity-euler`,
`tt-trinity-gamma`) can be compared with the same methodology;
3. nothing in the protocol presumes a TOPS race -- only numeric fidelity.

Out of scope: latency, throughput, energy. Those have their own (also
restrained) treatments in `BENCHMARKS.md`.

---

## 2. Quantities

Let `x` be a reference real value (drawn from a defined distribution,
section 4) and let `Q_F(x)` be the result of the round-trip
`real -> format -> real` through a numeric format `F`. Define

```
NMSE(F)
= E[ (x - Q_F(x))^2 ] / E[ x^2 ]
```

where the expectation is taken over the protocol's reference distribution.
Two formats are compared by reporting `NMSE(GF16)` and `NMSE(BF16)` against
the same sampled `x` set and the same RNG seed.

**Important:** the ratio `NMSE(GF16) / NMSE(BF16)` is the headline number.
A ratio of 1.0 means equal numeric fidelity at the protocol's distribution;
< 1.0 means GF16 is closer to the reference for that distribution; > 1.0
means BF16 is closer. No claim is attached to either direction without a
specific distribution and seed.
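
The definition above translates directly into code. This is a minimal Python sketch, assuming the samples fit in memory and that each format is supplied as a round-trip function `quantize(x) -> float`; the names `nmse` and `nmse_ratio` are hypothetical and are not taken from the `.t27` spec.

```python
import math


def nmse(samples, quantize):
    """NMSE = E[(x - Q(x))^2] / E[x^2] over the given reference samples."""
    samples = list(samples)
    num = math.fsum((x - quantize(x)) ** 2 for x in samples)
    den = math.fsum(x * x for x in samples)
    if den == 0.0:
        raise ValueError("reference samples have zero energy; NMSE is undefined")
    # The 1/N factors of the two expectations cancel.
    return num / den


def nmse_ratio(samples, quantize_a, quantize_b):
    """Ratio NMSE(A) / NMSE(B) on the same sample set."""
    samples = list(samples)
    nmse_b = nmse(samples, quantize_b)
    if nmse_b == 0.0:
        raise ValueError("format B is exact on these samples; ratio is undefined")
    return nmse(samples, quantize_a) / nmse_b
```

Both formats are evaluated on the same list, which is what "the same sampled `x` set" requires; passing a one-shot generator to two separate calls would silently break that.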

---

## 3. Format definitions used by the protocol

### 3.1 GF16

GF16 is defined by `specs/numeric/gf16.t27` and recorded in the SSOT
`conformance/FORMAT-SPEC-001.json`. Bit layout (mirror of
`FORMAT_REGISTRY.md` section 1):

```
GF16 = [ S(1) | E(6) | M(9) ]
value = (-1)^S * 2^(E - 31) * (1 + M / 2^9)
```

The protocol uses the canonical round-to-nearest, ties-to-even rounding
rule defined in the spec.

### 3.2 bfloat16

BF16 is defined externally by the IEEE-754 binary32 layout with mantissa
truncated to 7 bits:

```
BF16 = [ S(1) | E(8) | M(7) ]
value = (-1)^S * 2^(E - 127) * (1 + M / 2^7)
```

The protocol uses round-to-nearest, ties-to-even. IEEE-754 subnormal
handling is the default; BF16 implementations that flush subnormals to zero
must declare that fact in the manifest (section 6).

### 3.3 Why the comparison is meaningful

GF16 and BF16 occupy the same memory footprint (16 bits). They differ in
how those 16 bits are split: GF16 gives 9 bits to the mantissa, BF16 gives
7. GF16's exponent field is 6 bits with bias 31; BF16's is 8 bits with
bias 127. The expected outcome is that **GF16 wins on precision for values
near 1.0, BF16 wins on very large / very small values** that fall outside
GF16's narrower exponent range. The protocol must not preempt this with a
distribution chosen to favour either side.
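
The two layouts in this section can be exercised with one generic round-trip quantizer. The Python sketch below handles normal values only, and several choices in it are assumptions rather than statements from `specs/numeric/gf16.t27`: exponent codes 0 and all-ones are treated as reserved (by analogy with IEEE-754), values below the smallest normal are flushed to zero (the `ftz` policy of section 6), and values above the largest finite number saturate. All function names are hypothetical.

```python
import math


def quantize(x, mant_bits, e_min, e_max):
    """Round x to the nearest 2^e * (1 + m / 2^mant_bits), ties-to-even.

    e_min / e_max are the unbiased exponents of the smallest and largest
    normal binades. ASSUMPTIONS: underflow flushes to zero, overflow
    saturates to the largest finite value.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = math.copysign(1.0, x)
    a = abs(x)
    scale = 1 << mant_bits
    max_finite = math.ldexp(2 * scale - 1, e_max - mant_bits)
    if math.isinf(a):
        return sign * max_finite
    m, e = math.frexp(a)          # a = m * 2**e with 0.5 <= m < 1
    exp = e - 1                   # a = (2m) * 2**exp with 1 <= 2m < 2
    if exp < e_min:
        return sign * 0.0
    # Python's round() on a float rounds halves to the even integer.
    steps = round(2.0 * m * scale)
    if steps == 2 * scale:        # rounding carried into the next binade
        steps, exp = scale, exp + 1
    if exp > e_max:
        return sign * max_finite
    return sign * math.ldexp(steps, exp - mant_bits)


def gf16(x):
    # 9 mantissa bits, bias 31; ASSUMPTION: E = 0 and E = 63 are reserved.
    return quantize(x, 9, 1 - 31, 62 - 31)


def bf16(x):
    # 7 mantissa bits, bias 127; E = 0 and E = 255 are reserved.
    return quantize(x, 7, 1 - 127, 254 - 127)


print(gf16(0.1), bf16(0.1))      # precision near 1.0
print(gf16(1e10), bf16(1e10))    # dynamic range
```

Under these assumptions `gf16(0.1)` is 0.0999755859375 and `bf16(0.1)` is 0.10009765625, so GF16 lands closer, while `1e10` saturates in GF16 but is representable to within rounding in BF16.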

---

## 4. Reference distributions

A run reports NMSE under **each** of these distributions independently.
A single number reported without naming a distribution is invalid under
this protocol.

| Tag | Distribution | Rationale |
|-----------|------------------------------------------------|--------------------------------------|
| `D_NORM` | `x ~ N(0, 1)` | Generic weight-like distribution |
| `D_LOG`   | `log2\|x\| ~ U(-10, 10)`, sign uniform         | Geometric coverage of dynamic range  |
| `D_RELU`  | `x = max(0, N(0, 1))`                          | Post-ReLU activation distribution    |
| `D_PHI` | `x ~ N(phi, 1/phi)`, where `phi=(1+sqrt 5)/2` | Identity-anchored sanity (L5) |
| `D_DEEP` | mixture: 0.7 `D_NORM` + 0.3 `D_LOG` | Heuristic for transformer weights |

Each run uses 10 million samples per distribution unless explicitly
overridden in the manifest.
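
The five tags can be sampled with the Python standard library as sketched below. Two points are assumptions of this sketch: the second parameter of `N(phi, 1/phi)` is read as a standard deviation, and `random.Random` (Mersenne Twister) stands in for whatever RNG family a real run declares in its manifest. The function names are hypothetical.

```python
import math
import random

PHI = (1.0 + math.sqrt(5.0)) / 2.0


def sample(tag, rng):
    """Draw one reference value for the given distribution tag."""
    if tag == "D_NORM":
        return rng.gauss(0.0, 1.0)
    if tag == "D_LOG":
        # log2|x| uniform on [-10, 10], sign chosen uniformly.
        return rng.choice((-1.0, 1.0)) * 2.0 ** rng.uniform(-10.0, 10.0)
    if tag == "D_RELU":
        return max(0.0, rng.gauss(0.0, 1.0))
    if tag == "D_PHI":
        # ASSUMPTION: 1/phi is the standard deviation, not the variance.
        return rng.gauss(PHI, 1.0 / PHI)
    if tag == "D_DEEP":
        # Mixture: 0.7 D_NORM + 0.3 D_LOG.
        inner = "D_NORM" if rng.random() < 0.7 else "D_LOG"
        return sample(inner, rng)
    raise ValueError(f"unknown distribution tag: {tag}")


def draw(tag, count, seed):
    """Return `count` samples; the same (tag, count, seed) repeats exactly."""
    rng = random.Random(seed)
    return [sample(tag, rng) for _ in range(count)]
```

Seeding a fresh generator per call keeps each distribution's sample set independent of the order in which the distributions are run.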

---

## 5. Tolerance and identity check (L5)

Before any NMSE figure is reported, a run **must** witness:

```
|phi^2 - (phi + 1)| < 1e-15 // f64 identity check
|phi^2 + 1/phi^2 - 3| < 1e-15 // canonical Trinity identity
```

Failing either witness aborts the run. This is L5 IDENTITY enforced at the
benchmark boundary.
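
A minimal Python sketch of the two witnesses, using IEEE-754 double precision (Python's `float`). The guard wrapper `run_guarded` is a hypothetical name showing one way to abort before any NMSE figure is produced; it is not taken from the `.t27` spec.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0
TOLERANCE = 1e-15


def identity_residuals():
    """Return the two residuals checked by the L5 witness."""
    phi_sq = PHI * PHI
    golden = abs(phi_sq - (PHI + 1.0))          # phi^2 = phi + 1
    trinity = abs(phi_sq + 1.0 / phi_sq - 3.0)  # phi^2 + 1/phi^2 = 3
    return golden, trinity


def identity_witness():
    """True only if both residuals are strictly below the tolerance."""
    return all(r < TOLERANCE for r in identity_residuals())


def run_guarded(benchmark):
    """Abort before running `benchmark` if the identity witness fails."""
    if not identity_witness():
        raise RuntimeError("L5 identity witness failed; run aborted")
    return benchmark()
```

The tolerance leaves room for a residual of up to two f64 spacing steps near 3.0 (about 4.4e-16 each), so a correctly rounded platform passes while a broken `sqrt` or a reduced-precision build does not.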

---

## 6. Results manifest

A run produces one JSON file conforming to `schemas/nmse-protocol-v1.json`.
The schema requires, at minimum:

- protocol version (semver);
- toolchain seal hash (matches `bootstrap/stage0/FROZEN_HASH`);
- RNG family and seed;
- sample count per distribution;
- per-distribution `NMSE_GF16`, `NMSE_BF16`, and their ratio;
- BF16 subnormal policy (`ieee` or `ftz`);
- runner identity (host architecture, compiler version);
- timestamp (RFC3339).

A run that omits any required field is non-conforming and must not be
cited in TRI-NET documentation.
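
The required fields above can be assembled and checked as in the following Python sketch. Every key name here is hypothetical: the authoritative names live in `schemas/nmse-protocol-v1.json`, which this sketch does not reproduce, and a real run should validate against that schema instead of the simple presence check shown.

```python
import json
from datetime import datetime, timezone

# Hypothetical key names, one per required field listed above.
REQUIRED_FIELDS = (
    "protocol_version", "toolchain_seal_hash", "rng",
    "samples_per_distribution", "results",
    "bf16_subnormal_policy", "runner", "timestamp",
)


def build_manifest(protocol_version, seal_hash, rng_family, seed,
                   sample_count, nmse_by_distribution, bf16_policy,
                   host_arch, compiler_version):
    """Assemble a manifest dict; nmse_by_distribution maps tag -> (gf16, bf16)."""
    if bf16_policy not in ("ieee", "ftz"):
        raise ValueError("bf16_policy must be 'ieee' or 'ftz'")
    results = {}
    for tag, (nmse_gf16, nmse_bf16) in nmse_by_distribution.items():
        if nmse_bf16 == 0.0:
            raise ValueError(f"NMSE_BF16 is zero for {tag}; ratio undefined")
        results[tag] = {
            "NMSE_GF16": nmse_gf16,
            "NMSE_BF16": nmse_bf16,
            "ratio": nmse_gf16 / nmse_bf16,
        }
    return {
        "protocol_version": protocol_version,
        "toolchain_seal_hash": seal_hash,
        "rng": {"family": rng_family, "seed": seed},
        "samples_per_distribution": sample_count,
        "results": results,
        "bf16_subnormal_policy": bf16_policy,
        "runner": {"host_arch": host_arch, "compiler": compiler_version},
        "timestamp": datetime.now(timezone.utc).isoformat(),  # RFC3339
    }


def missing_fields(manifest):
    """Return the required top-level fields absent from the manifest."""
    return [name for name in REQUIRED_FIELDS if name not in manifest]


def to_json(manifest):
    """Serialise with sorted keys so two runs diff cleanly."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A caller would refuse to publish when `missing_fields` returns a non-empty list, which mirrors the non-conformance rule stated above.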

---

## 7. Test obligations (L4)

The companion spec `specs/benchmarks/gf16_bfloat16_nmse.t27` includes:

- a `test` block that runs the identity witness;
- an `invariant` block that asserts `NMSE >= 0` for each format;
- a `bench` block that defines the measurement procedure.

These are the L4 TESTABILITY requirements for this benchmark family.

---

## 8. Reporting policy

When a chip-repo or third-party result is cited:

- ratio reported only when both sides came from the same seed/distribution;
- protocol version stated;
- seal hash stated (or `unsealed` if not measured under a sealed toolchain
-- in which case the result is informational, not certifying);
- no comparison against a commercial-NPU number is permitted in this
protocol's outputs (see `COMPETITORS.md` for why).

---

## 9. Cross-links

- Numeric SSOT: `conformance/FORMAT-SPEC-001.json`, `FORMAT_REGISTRY.md`.
- Sibling repos that may emit conforming manifests:
  - `tt-trinity-phi` (phi-anchor, 1x1) -- identity-domain NMSE only.
  - `tt-trinity-euler` (8x2, safety) -- `D_NORM` and `D_RELU` envelopes.
  - `tt-trinity-gamma` (8x4, 32-PE mesh) -- `D_DEEP` is the headline.
- TRI-NET API doc: `docs/TRI_NET_API.md` -- how an external integrator
reads NMSE manifests programmatically.
- Roadmap: `docs/SCIENTIFIC_IMPROVEMENT_PLAN.md` -- PUB-02 names "one
sealed-toolchain NMSE manifest" as a 2026 target deliverable.

---

## 10. Non-claims (R5-HONEST)

- This document does **not** claim a measured silicon NMSE for any product.
- This document does **not** claim GF16 is universally better than BF16.
- This document does **not** claim a fixed `NMSE(GF16) / NMSE(BF16)` ratio.
- It defines **how** to measure such ratios so claims, when made, are
reproducible.

---

**phi^2 + 1/phi^2 = 3 | TRINITY**