
0.8.0 — Slice 5 (G1 structured SearchHit + FTS5 tokenizer) + Slice 0 close + HITL decisions #83

Merged: coreyt merged 21 commits into main from 0.8.0-slice-5-campaign on Jun 3, 2026

coreyt commented Jun 3, 2026

0.8.0 campaign — Slices 0 + 5, signed HITL decisions, corpus-line integration

Brings the local 0.8.0 campaign onto main, and integrates the out-of-band corpus-work line.

Slice 5 — G1 structured SearchHit + global FTS5 tokenizer upgrade (PASS after codex BLOCK→fix-1)

  • SearchResult.results: Vec<SearchHit{id=write_cursor, kind, body, score:f64, branch}>; Eq dropped (compiles). Both branches emit structured hits (vector=vec_distance_l2, FTS=bm25()); dedup-on-body + vector-first preserved.
  • NEW step_id 11 drop+recreate FTS5 tokenizer migration (accretion-exemption marker), SCHEMA_VERSION 10→11; re-tokenization wired at open.
  • codex §9 review found a [P1] crash-safety bug (reindex gated on the one-time 10→11 boundary → forever-empty FTS index on a crash in the step-11/reproject window). fix-1 made it crash-retryable + idempotent via an atomic completion marker in _fathomdb_open_state; codex re-review: PASS.
  • Recall floor held across the migration: 1.000 → 1.000. Py + TS SDK parity in lockstep; X1 functional search harnesses (Py+TS, cross-binding equivalence) stood up.
  • AC-037 no-egress gate CONFIRMED GREEN on windchill3 (post-merge code; no off-loopback connects).
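
The hit shape and merge rule in the bullets above can be sketched as follows. This is a minimal illustration in Python, not the engine's Rust code: the field names come from this PR, while the `merge_hits` helper and the `"vector"` / `"fts"` branch labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SearchHit:
    id: int       # write_cursor of the source row (interim id, per this PR)
    kind: str
    body: str
    score: float  # vec_distance_l2 for vector hits, bm25() for FTS hits
    branch: str   # which branch produced the hit; label values are assumed


def merge_hits(vector_hits: Iterable[SearchHit],
               fts_hits: Iterable[SearchHit]) -> List[SearchHit]:
    """Vector-first ordering with dedup on body (hypothetical helper)."""
    seen_bodies = set()
    merged = []
    # Vector hits are visited first, so on a body collision the vector hit wins.
    for hit in [*vector_hits, *fts_hits]:
        if hit.body in seen_bodies:
            continue
        seen_bodies.add(hit.body)
        merged.append(hit)
    return merged
```

Scores from the two branches are on different scales (a distance versus a BM25 value), which is why each hit carries its `branch` tag rather than a single comparable score.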

Slice 0 — plan + STATUS board + DOC-INDEX + ADRs (design-adr, CLOSED)

HITL decisions recorded (2026-06-02)

Retrieval ADR: Q1=1A (G9 RRF + G10 filtered-KNN table-stakes), Q2=2A (substrate bi-temporal-aware, implement minimal), Q3=documented-only / NO fusion_mode knob, Q4=edges-too, Q5=advisory. Slice 10 contract reconciled to drop the knob. Slice 15 renumbered to step_id 12 / SCHEMA_VERSION 11→12. AC-037 → CI at Slice 40 (gate n).

Corpus-work integration

Merged origin/main 83f5156 with local authoritative for all campaign docs/ADRs/STATUS/code; preserved origin's unique corpus/eval artifacts (tests/corpus/*, dev/corpus-creation/*, dev/notes/0.8.x-corpus-*, corpus QA prompts, test_corpus_eval_qa.py) + added DOC-INDEX rows.

Verification

Engine + schema tests green (incl. pr_g1_* + crash-recovery); clippy clean; mkdocs build --strict green; Py/TS parity confirmed. Slice 5 closed; pointer → Slice 10 (gate-clear).

🤖 Generated with Claude Code

coreyt and others added 21 commits June 2, 2026 06:37
HITL-approved 0.8.0 implementation plan and its supporting corpus, committed
to give the Slice 0 agent a clean baseline (cold-start §12.2 + worktree
baseline §1 both need these on main).

Corpus:
- dev/plans/0.8.0-implementation.md — the approved 9-slice plan (mod-5
  numbering + reserved gaps; slice-orchestrator model; per-slice
  design→TDD→codex→fix-N discipline; cross-cutting X1 SDK parity+functional
  harnesses / X2 mkdocs build / X3 per-slice docs + dev/DOC-INDEX.md).
- dev/design/0.8.0-agent-memory-fit.md — gap ladder G0–G12 + consumer fit (§9).
- dev/design/0.8.0-v05-feature-triage.md — v0.5.x add/defer/drop triage.
- dev/design/agent-memory-impl-strategy.md — per-gap leverage build plan.
- dev/adr/ADR-0.8.0-supersede-five-verb-surface-cap.md (new) +
  ADR-0.8.0-agent-memory-retrieval-and-identity.md (scope reclass).
- dev/roadmap/0.8.0.md — v0.5.x revival scope.
- dev/profiling/ — ingest+retrieval stack profiling component guide.
- dev/plans/prompts/0.8.0-PROF-*.md — ingest/retrieval profiler slice prompts.
- dev/plans/runs/*.wf.js — the planning workflow scripts (resumable).

Slice 0 (separate agent) executes next; this commit does not start execution.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…concile mkdocs nav, author substrate ADR, advance supersession ADR; advance pointer to Slice 5

Slice 0 [design-adr], no worktree, no code. Subagents 0.a ∥ 0.b (both spawned
as subagent_type: implementer, no fallback) returned PASS against their bars.

- 0.a: NEW dev/adr/ADR-0.8.0-canonical-identity-substrate.md — four substrate
  decisions settled (additive column shape; invalidate-not-delete; edges carry
  temporal cols = Q4 yes; op-store cascade); verbatim Slice-15 schema delta
  (logical_id+superseded_at on canonical_nodes AND canonical_edges; partial
  unique-active index; folded G4/G5 indexes; MIGRATION-ACCRETION-EXEMPTION;
  SCHEMA_VERSION 10->11); in-place additive migration policy; write_cursor-as-
  row-id deviation FLAGGED for HITL; shadow vec0/FTS5 reconciliation named as
  reserved Slice 16.
- 0.b: advanced ADR-0.8.0-supersede-five-verb-surface-cap.md -> decision-ready
  (Q1-Q5 = A1/B1/amend/confirm/SDK-only; conformance rewrite enumerated not
  executed; three guarantees carried forward); authored 0.8.0-plan.md (mod-5
  ladder) + STATUS-0.8.0.md (nine §12.5 sections + X1/X2/X3 column + witness +
  harness contract); created dev/DOC-INDEX.md (X3); reconciled mkdocs nav
  (added 0.6.1; 0.8.0 stub) + mkdocs build --strict green (X2).

Adversarial review PASS. codex (--sandbox read-only) unrunnable here (bubblewrap
net-namespace init failure; relaxation flags denied by harness classifier);
substituted an independent adversarial subagent on the identical four-check +
Slice-15 rubric. Verdict+provenance: dev/plans/runs/0.8.0-slice-0-review-20260602T115112Z.md
(raw codex failure log alongside, .log).

Witness: implementer subagent-type EXISTS + selectable (supersedes stale MEMORY
orchestration-execution-traps note). Terminal Slice-0 exit = HITL gate-package
sign-off (substrate gates Slice 15; supersession finalized at Slice 25).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….0 ADRs in index

Recovery denylist (HITL decision 2026-06-02, 'five everywhere'): correct prose to
{recover,restore,repair,fix,rebuild} across bindings.md, the supersede ADR (element-3
table, preserves-verdict, Q4, Slice-15 guarantee), interface-inventory (x2), and
0.8.0-v05-feature-triage (x2). doctor is SDK-absent via the positive verb allowlist,
NOT this recovery-name denylist. The five-name enforcement artifacts
(test_no_recovery_surface py/ts/rs + AC-035d) stay byte-unchanged; prose now matches.

ADR index: add a Phase 0.8.0 section (#33-36) registering all four 0.8.0 ADRs in
ADR-0.6.0-decision-index.md, which agents must consult before scanning the tree.

Both issues surfaced by codex review across multiple rounds; final codex pass clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…leteness audit

codex completeness audit of Slice 0 (HEAD a42f234) rated all six substantive
deliverables PRESENT/MEETS-BAR and flagged only a literal reading of the 'no
worktree outstanding' criterion: raw `git worktree list` shows two worktrees
while the board ledger said 'empty'. Neither is a 0.8.0 slice-managed worktree
(Slice 0 is design-adr and created none):
  - .claude/worktrees/agent-ad59c9d7bcc049a3d — locked prior-agent harness orphan
    (0.6.x commit 0debd6b, live pid);
  - .claude/worktrees/corpus-work — active owner-managed corpus-expansion branch.

Make the §6 ledger precise: it tracks 0.8.0 slice-managed worktrees (none
outstanding); the two pre-existing non-slice worktrees are out of scope and not
Slice 0's to clean up. No worktree touched; no code; docs-only. Slice 0 remains
CLOSED — deliverables complete; this only corrects board accuracy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rompt, record step-11 plan-adjustment + SearchHit/tokenizer compat-ledger ack

Orchestrator bookkeeping (no code). Slice 5 (G1 structured SearchHit + global
FTS5 tokenizer upgrade) advanced NEW→WORKTREE_CREATED:

- Worktree /tmp/fdb-slice-5-20260602T215841Z @ branch slice-5-20260602T215841Z,
  baseline 944cbb4 (main HEAD; re-verified before `git worktree add`).
- Self-contained 5.a implementer prompt: dev/plans/prompts/0.8.0-slice-5.md.
- Plan-adjustment (§12.4): Slice 5's NEW tokenizer migration is step_id 11 and
  bumps SCHEMA_VERSION 10→11 (migrate requires contiguous step_id + open guard
  user_version<=SCHEMA_VERSION; witnessed max step=10). This re-numbers Slice 15
  to step_id 12 / 11→12 (its "step 11 / 10→11" contract is now stale; reconciled
  at Slice 15 close).
- Compat-ledger ack: breaking SearchHit data-class change + tokenizer recall shift
  accepted as documented 0.8.0 events (AC-057a-clean; no HITL sign-off needed).

Board (STATUS-0.8.0.md): §1 current slice, §2 row ⏳, §6 worktree ledger,
§7 decisions, §8 next-action resume loop updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…orktree; slice agent owns its worktree and merges to main

HITL correction: the orchestrator must not create worktrees. The slice agent does
implementation work on a worktree IT owns and merges its green work onto local main
itself; the orchestrator works on main AFTER the merge (review → 5.b → close + advance
pointer). Supersedes the orchestration.md main-thread-owns-worktree / cherry-pick
mechanic for the 0.8.0 campaign.

- Removed the worktree + branch I had erroneously created
  (/tmp/fdb-slice-5-20260602T215841Z, slice-5-20260602T215841Z); git worktree list clean.
- Rewrote dev/plans/prompts/0.8.0-slice-5.md: §0 the slice agent creates its own worktree
  from live main HEAD; §5 it merges to main (no push) when green + output.json written;
  §7 output.json carries merged_to_main_sha.
- Board: §1 state PROMPTED, §2 row, §6 ledger note (slice-agent-owned), §7 correction +
  PROMPTED entries, §8 next-action resume loop reworked to operate on main post-merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…TS5 tokenizer upgrade

Design-first: SearchHit shape (id=write_cursor interim, per-branch score, branch
tag), dedup-on-body + vector-first ordering preserved, NEW step-11 drop+recreate
FTS5 tokenizer migration (SCHEMA_VERSION 10->11) with open-time re-tokenization,
Py+TS parity, X1/X2/X3 plan, recall-floor guardrail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…loor across migration

pr_g1_search_hits.rs: SearchResult.results is Vec<SearchHit> (id==write_cursor,
kind, body, finite score, branch); no Eq derive but PartialEq retained;
dedup-on-body + vector-first ordering. pr_g1_tokenizer_recall.rs: recall floor
>=0.90 on a DB migrated from SCHEMA_VERSION 10 (not just fresh), pinning the
no-op-on-existing-DB failure mode RED.

AC-G1-hit-shape, AC-G1-no-eq, AC-G1-dedup-order, AC-FTS-tokenizer-floor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… upgrade + Py/TS parity

Engine: add SearchHit{id,kind,body,score,branch}; SearchResult.results ->
Vec<SearchHit>; drop Eq (f64 score); widen ReaderResponse + read_search_in_tx;
vector branch carries write_cursor+kind+vec_distance_l2 score, FTS branch
carries body+kind+write_cursor+bm25(); dedup-on-body + vector-first preserved.

Schema: NEW migration step 11 (drop+recreate search_index with tokenizer
'porter unicode61 remove_diacritics 2', accretion-exemption marker);
SCHEMA_VERSION 10->11. Engine re-tokenizes search_index from canonical source
rows on open across the step-11 boundary (projection-only, no source-record
migration) — fixes the no-op-on-existing-DB failure mode.

Bindings: fathomdb-py PySearchHit + PySearchResult.results parity + .pyi +
types.py/engine.py/__init__.py; fathomdb-napi SearchHit + SearchResult.results
parity + binding.ts NativeSearchHit + index.ts SearchHit + mapper.

Consumers: all .results readers (recall harnesses eu7/eu8, perf_gates,
projection_runtime, cursors, excise_source, fts5_injection_safety,
cursor_read_after_write) read hit.body; migration-step assertions expect step 11.

Recall floor 0.90 holds before AND after the tokenizer upgrade across the
migration (pr_g1_tokenizer_recall: before=1.000 v10, after=1.000 v11, delta 0).
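
For readers unfamiliar with the guardrail, a recall-floor check of this kind can be sketched as below. The definition of recall used here (fraction of expected items that were retrieved) is an assumption; the actual metric lives in pr_g1_tokenizer_recall and is not shown in this PR. The function names are hypothetical.

```python
RECALL_FLOOR = 0.90  # floor value quoted in this commit


def recall(retrieved, relevant):
    """Fraction of relevant items that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    return len(relevant & set(retrieved)) / len(relevant)


def holds_floor(before, after, floor=RECALL_FLOOR):
    """True only if recall meets the floor both before and after migration."""
    return before >= floor and after >= floor
```

Checking both sides of the migration matters because a no-op on an existing database would still pass a test that only exercises a freshly created one.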

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…with cross-binding equivalence

src/python/tests/test_functional_search.py + src/ts/tests/functional-search.test.ts:
open a real engine, write a small corpus, search(), and assert the structured
SearchHit shape end-to-end across the FFI (id/kind/body/score/branch present and
typed) in both languages. Both read the SAME functional_search_fixture.json
(single source of truth) and assert identical body sets per query -> cross-binding
equivalence. Seed of the write->search->retrieve->admin harness later slices extend.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t-plan/DOC-INDEX rows

X3: docs/reference/python-api.md + typescript-api.md document
SearchResult.results: list[SearchHit]/SearchHit[] and the new SearchHit shape
(id/kind/body/score/branch); new docs/guides/structured-search-hits.md usage
example (Py + TS); dev/architecture.md records the structured-hit carrier +
step-11 tokenizer default; dev/test-plan.md adds the SDK functional-harness
tier (X1) Suite Map row; dev/DOC-INDEX.md rows updated for every touched doc.
X2: mkdocs build --strict green; guides promoted to a nav section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…h in the step-11/reproject window

Reconstructs the durable post-crash artifact (user_version=11 + empty
search_index) on a real on-disk DB and asserts recall recovers on reopen.
Fails on current code: the boundary-crossing guard (before<11 && after>=11)
sees before==11 and skips the reproject, leaving the FTS shadow empty
forever (recall 0.000).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The step-11 migration commits user_version=11 with an EMPTY search_index in
its own transaction; the reproject that repopulates the FTS shadow ran in a
SEPARATE later transaction gated on crossing the step-11 boundary
(before<11 && after>=11). A crash after step 11 commits but before the
reproject commits left a durable v11 + empty index on which the next open
saw before==11, skipped the reproject, and stranded the index empty forever
(recall collapses to ~0).

Gate repair on the ABSENCE of a durable completion marker
(_fathomdb_open_state['search_index_tokenizer_reproject_complete']) written
INSIDE the same transaction as the reindex DELETE+INSERT, instead of on the
boundary crossing. Atomic + idempotent: a crash before commit rolls both
back (no marker -> next open re-runs); a crash after finds the marker and
skips. Projection-only; SCHEMA_VERSION unchanged (still 11); no new
migration step.
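
The marker-gated repair described above can be sketched with Python's stdlib `sqlite3`. This is an illustration under stated assumptions, not the engine's implementation: the key/value layout of `_fathomdb_open_state`, the `canonical_nodes(body)` source table, and the plain `search_index` table standing in for the FTS5 table are all hypothetical. Only the marker name is taken from this commit.

```python
import sqlite3

MARKER = "search_index_tokenizer_reproject_complete"


def open_demo_db() -> sqlite3.Connection:
    """Build the post-crash artifact: source rows present, index empty, no marker."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE canonical_nodes(body TEXT);
        CREATE TABLE search_index(body TEXT);  -- stand-in for the FTS5 table
        CREATE TABLE _fathomdb_open_state(key TEXT PRIMARY KEY, value TEXT);
        INSERT INTO canonical_nodes VALUES ('alpha'), ('beta');
    """)
    return conn


def reproject_if_needed(conn: sqlite3.Connection) -> bool:
    """Rebuild the index unless the completion marker exists. True if work ran."""
    done = conn.execute(
        "SELECT 1 FROM _fathomdb_open_state WHERE key = ?", (MARKER,)
    ).fetchone()
    if done:
        return False
    # One transaction: the reindex and the marker commit together or not at all.
    with conn:
        conn.execute("DELETE FROM search_index")
        conn.execute(
            "INSERT INTO search_index(body) SELECT body FROM canonical_nodes"
        )
        conn.execute(
            "INSERT INTO _fathomdb_open_state(key, value) VALUES (?, '1')",
            (MARKER,),
        )
    return True
```

Because the marker is written inside the same transaction as the DELETE+INSERT, an interrupted run leaves no marker behind and the next open simply retries. A completed run is skipped on every later open.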

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The marker-absence gate runs on every v11 open. Synthetic/legacy DBs whose
user_version was stamped to 11 without running our migrations lack the
_fathomdb_open_state table (created in step 1), so the marker read raised
'no such table' and masked the downstream embedder-identity / dimension
mismatch errors (AC-048 / AC-048b). Treat a missing _fathomdb_open_state as
'reproject complete / nothing to do' so those DBs fall through to the
embedder-identity probe unchanged. A genuinely migrated (crash-affected) DB
always has the table, so crash-repair is unaffected.
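
A minimal sketch of that missing-table fallback, again using Python's stdlib `sqlite3`. The `reproject_pending` name and the key/value layout of `_fathomdb_open_state` are assumptions for illustration; only the marker name comes from the fix-1 commit.

```python
import sqlite3

MARKER = "search_index_tokenizer_reproject_complete"


def reproject_pending(conn: sqlite3.Connection) -> bool:
    """Return True only when the state table exists and the marker is absent."""
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'table' AND name = '_fathomdb_open_state'"
    ).fetchone()
    if has_table is None:
        # Synthetic/legacy DB stamped to v11 without our migrations:
        # treat as "nothing to do" so later probes report their own errors.
        return False
    marker = conn.execute(
        "SELECT 1 FROM _fathomdb_open_state WHERE key = ?", (MARKER,)
    ).fetchone()
    return marker is None
```

Probing `sqlite_master` first avoids the 'no such table' error, so the embedder-identity and dimension checks downstream still surface their own diagnostics.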

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…le (codex P1; state/marker-gated, not boundary-gated)
… (PASS after codex BLOCK→fix-1); advance pointer to Slice 10

Slice 5 merged to main by the slice agent (initial c4ab615; fix-1 e76d68b final).
codex §9 review (primary, runnable here) found one [P1] crash-safety bug — the
tokenizer reindex was gated on the one-time 10→11 boundary crossing, so a crash
after step 11 commits (user_version=11, empty search_index) but before reproject
commits left the FTS index empty forever. BLOCK→fix-1 made the reindex
crash-retryable + idempotent via an atomic completion marker; codex re-review of
the fix-1 diff: PASS, no findings.

Close (docs-only):
- Slice 5 CLOSED blocks in 0.8.0-implementation.md + 0.8.0-plan.md; pointer → Slice 10.
- STATUS board: §1 current slice → 10, §2 table (5 ✅ + X1/X2/X3), §3 scoreboard
  (G1 ✅; recall floor held 1.000/1.000 across the migration), §6 worktree ledger
  (slice-5 worktree REMOVED), §7 decision record, §8 next action.
- Renumber notes added: Slice 5 consumed step_id 11 / SCHEMA_VERSION 11, so Slice 15
  becomes step_id 12 / SCHEMA_VERSION 11→12 (impl-plan Slice 15 heading + canonical
  identity ADR AUTHORIZED-delta).
- Closure output.json + both codex review verdicts (slice-diff + fix-1) committed.

Carried to HITL (environment-only, not a code defect): agent-verify.sh STRICT=1
fails AC-037 netns-deny-egress (no rootless userns in this sandbox).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… fusion_mode knob from Slice 10; AC-037 → CI at Slice 40

HITL signed the retrieval ADR (2026-06-02), unblocking Slice 10:
- Q1 = Option 1A: G9 RRF + G10 filtered-KNN both table-stakes, ship in Slice 10
  (G10 uses a CLOSED SearchFilter struct; filter-grammar DSL stays Slice 35).
- Q2 = Option 2A: substrate designed bi-temporal-aware, implement single-supersession only.
- Q3 = documented-only, NO knob: RRF is the unconditional new ranking; the fusion_mode /
  legacy-union escape hatch is DROPPED ("do not carry the overhead"). The entire Slice 10
  contract is reconciled — every fusion_mode mention removed or marked NOT-fusion_mode;
  G12-recency now gated behind a dedicated recency flag; the compat event is documented-only.
- Q4 = edges too: canonical_edges carry logical_id+superseded_at (schema-only).
- Q5 = advisory: §8d capability ladder stays advisory input, not canonical.
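
For context on Q1 and Q3, reciprocal rank fusion (RRF) in its commonly described form can be sketched as below. This is not the Slice 10 implementation, which is not part of this PR. The constant `k = 60` is the conventional default from the RRF literature and is an assumption here, as are the function name and the tie-break rule.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


vector_ranking = ["a", "b", "c"]
fts_ranking = ["b", "c", "d"]
print(rrf_fuse([vector_ranking, fts_ranking]))  # → ['b', 'c', 'a', 'd']
```

RRF uses only ranks, not raw scores, so it can combine an L2 distance ranking with a BM25 ranking without normalizing either scale.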

ADR status → accepted; added a "## HITL decisions (2026-06-02)" block.
STATUS board §5: retrieval package SIGNED (Slice 10 gate cleared); substrate package
partially signed (Q2/Q4 done; op-store cascade + migration policy + write_cursor deviation
still open before Slice 15). §1/§8 mark the Slice 10 gate cleared.

AC-037 (no-egress) disposition: can't run on windchill3 (Ubuntu 24.04 AppArmor
apparmor_restrict_unprivileged_userns=1) and runs in NO CI workflow. Accept-by-reasoning for
Slice 5 (no networking added); wire scripts/agent-security.sh into CI on a userns-permissive
runner as a NEW Slice 40 gate (n).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t-Slice-5/fix-1); keep Slice 40 CI gate (n)

HITL ran the gate on windchill3 with the AppArmor userns lockdown temporarily relaxed
(kernel.apparmor_restrict_unprivileged_userns=0, restored to 1 after):
  AC-037 OK: all connect() syscalls were loopback / AF_UNIX / AF_NETLINK.
This machine-confirms the merged Slice 5 + fix-1 code makes no network egress — upgrading
AC-037 from accept-by-reasoning to confirmed. The one-time pass is point-in-time, so the
continuous Slice 40 CI gate (n) is KEPT (wire agent-security.sh on a userns-permissive runner).

Updated the Slice 5 close blocks (impl-plan + plan + board §1), board §5 disposition, and
gate (n) note to reflect the confirmation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…l campaign authoritative, preserve corpus artifacts

Local main (0.8.0 Slices 0+5 campaign) is authoritative for all overlapping
planning docs / ADRs / STATUS / code (resolved -X ours). Brings in origin's
unique corpus-expansion artifacts (tests/corpus/*, dev/corpus-creation/*,
dev/notes/0.8.x-corpus-*, corpus QA prompts, test_corpus_eval_qa.py).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-of-band, owner-managed)

Closes the X3/Slice-40-gate-m doc-map gap for the corpus/eval files brought in by the
origin/main 83f5156 integration. Marked owner-managed; owner curates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
coreyt merged commit c27028b into main on Jun 3, 2026
11 of 14 checks passed
coreyt deleted the 0.8.0-slice-5-campaign branch on June 3, 2026 00:37