Complete rewrite #1
Add a Docker-runnable offline benchmark for compaction behavior with pressure-style synthetic scenarios, scoped current/history/recall assertions, and assertion mode for selected compactors. This creates RED probes for exact state recovery, recall recovery, stale-current leakage, bulk offloading, and cache-churn signals before broader cache-aware compaction work. Validation: node --check on benchmark files; git diff --check; docker build -t pi-vcc-bench .; docker benchmark descriptive/jsonl/assertion runs.
Add deterministic evidence extraction so compacted current state keeps exact paths, error signatures, IDs, and commit-ish hashes needed for continuation. Large tool errors now retain salient failure lines while omitting low-value log bulk from the active prompt, and corrected preferences supersede stale positive guidance across summary merges. Validation: node --check on changed TypeScript files; git diff --check; Docker benchmark descriptive/jsonl runs; docker run pi-vcc-bench --compactors pi-vcc --assert; docker run pi-vcc-bench --compactors cache-aware-layered --assert; focused Bun tests for build-sections, compile, and format. Full clean Docker bun test still lacks peer/runtime modules (@mariozechner/pi-coding-agent, @sinclair/typebox).
Extend the offline compaction benchmark from compacted-summary churn to simulated provider-prompt churn. Each cycle now composes stable provider/tool/project layers, the compactor output layers, and a kept raw tail, then reports full-prompt LCP, first changed prompt layer, stable prefix tokens, and per-layer token deltas. Validation: node --check bench/compaction/offline-runner.ts scripts/bench-compaction.ts; git diff --check; docker build -t pi-vcc-bench .; docker run --rm --entrypoint bun pi-vcc-bench scripts/bench-compaction.ts; docker run --rm pi-vcc-bench --compactors pi-vcc --assert.
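The per-cycle metrics named above can be illustrated with a minimal sketch. It assumes each simulated prompt is an ordered list of named text layers and uses a crude whitespace token estimate. The names `PromptLayer`, `CacheMetrics`, `estimateTokens`, and `compareCycles` are hypothetical and are not taken from bench/compaction/offline-runner.ts.

```typescript
// Hypothetical layer shape; the benchmark's real types are not shown in this log.
interface PromptLayer {
  name: string;
  text: string;
}

interface CacheMetrics {
  firstChangedLayer: string | null; // null when every layer is unchanged
  stablePrefixTokens: number; // tokens in the layers before the first change
  lcpChars: number; // longest common prefix of the joined prompt text
}

// Crude whitespace token estimate standing in for a real tokenizer.
function estimateTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function compareCycles(prev: PromptLayer[], next: PromptLayer[]): CacheMetrics {
  let firstChangedLayer: string | null = null;
  let stablePrefixTokens = 0;
  const longest = Math.max(prev.length, next.length);

  for (let i = 0; i < longest; i++) {
    const before = prev[i];
    const after = next[i];
    if (
      before === undefined ||
      after === undefined ||
      before.name !== after.name ||
      before.text !== after.text
    ) {
      // An added, removed, renamed, or rewritten layer ends the stable prefix.
      firstChangedLayer = after?.name ?? before?.name ?? null;
      break;
    }
    stablePrefixTokens += estimateTokens(after.text);
  }

  // Character-level longest common prefix over the full composed prompt.
  const a = prev.map((layer) => layer.text).join("\n");
  const b = next.map((layer) => layer.text).join("\n");
  const limit = Math.min(a.length, b.length);
  let lcpChars = 0;
  while (lcpChars < limit && a.charCodeAt(lcpChars) === b.charCodeAt(lcpChars)) {
    lcpChars++;
  }

  return { firstChangedLayer, stablePrefixTokens, lcpChars };
}
```

Ordering layers from most stable to most volatile is what makes `stablePrefixTokens` meaningful: any change to an early layer invalidates everything after it in a prefix cache.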
Treat current blocker, blocker update, status update, and next step messages as volatile state rather than stable session goals. This keeps the goal section more cache-stable across repeated compactions while preserving the latest blocker in outstanding context and transcript layers. Validation: node --check src/extract/goals.ts tests/extract-goals.test.ts; docker run --rm -v "/home/fl/code/personal/pi-vcc":/work -w /work oven/bun:1.3.13 bun test tests/extract-goals.test.ts; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert.
Order stable user preferences before volatile outstanding context and split pi-vcc current summary sections into simulated prompt layers for benchmark cache metrics. The cache-bust scenario now identifies Outstanding Context as the first changed prompt layer, making stable-prefix effects visible before live provider probes. Validation: node --check src/core/format.ts src/core/summarize.ts bench/compaction/offline-runner.ts; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; focused Bun tests for format and compile.
Add an optional JSONL session loader so the compaction benchmark can replay mounted Pi sessions without depending on pi-core node_modules. Real-session cases generate compaction points and report size, latency, and cache-churn metrics without gold assertions, complementing the synthetic RED probes. Validation: node --check bench/compaction/real-sessions.ts scripts/bench-compaction.ts; git diff --check; docker build -t pi-vcc-bench .; docker run --rm -v ~/.pi/agent/sessions:/sessions:ro pi-vcc-bench --real-only --real-sessions-dir /sessions --real-limit 1 --compactors pi-vcc --jsonl; docker run --rm pi-vcc-bench --compactors pi-vcc --assert.
Add optional layer-diff diagnostics and a real-session-shaped regression case so cache churn can be inspected without manually parsing large JSON outputs. The diagnostics showed legitimate scope additions in Session Goal and highlighted noisy evidence extraction as the next churn source. Tighten evidence extraction to avoid broad documentation paths, environment-style constants, and unlabeled decimal/hex values as stable handles. Overflow suffixes now avoid exact count churn, and brief-only fresh updates survive summary merges so volatile status remains in transcript instead of disappearing. Validation: node --check on changed benchmark and summary files; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker real-session replay with --show-layer-diff; focused Bun tests for build-sections and compile.
Add --assert-cache as a separate benchmark gate for synthetic cache-stability probes. Correctness assertions remain focused on recovery/leak checks, while cache assertions verify volatile-only updates do not rewrite early stable prompt layers or collapse the stable prefix below the configured threshold. Validation: node --check bench/compaction/offline-runner.ts scripts/bench-compaction.ts; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache.
Render later scope changes in a Current Scope section instead of appending them to Session Goal. This keeps the original objective stable for cache reuse while preserving legitimate user scope extensions and keeping status-like updates volatile. Also keep brief-only fresh updates during summary merges so status/next-step turns are not dropped when they do not produce header sections. Validation: node --check on changed summary and test files; git diff --check; focused Bun tests for extract-goals, build-sections, format, and compile; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; real-session replay with --show-layer-diff.
When merging with an existing summary, demote fresh goal-like lines into Current Scope so Session Goal remains the stable original objective. Status-only windows keep the prior Current Scope, and direct preference/status-table lines are filtered from stable goals. This moves the sampled real-session first changed layer from Session Goal to Current Scope while preserving scope and continuation terms in the active prompt. Validation: node --check src/core/summarize.ts src/extract/goals.ts tests/compile.test.ts tests/extract-goals.test.ts; focused Bun tests for compile and extract-goals; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; real-session replay with --show-layer-diff.
Skip copied error/stack-trace lines during preference extraction so phrases like 'always include the lines below' do not become durable user preferences. Real-session diagnostics still show legitimate preference growth, but the bogus SYNTAX_ERROR stack-trace line is filtered out. Validation: node --check src/extract/preferences.ts tests/extract-preferences.test.ts; focused Bun preference tests; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; real-session replay with --show-layer-diff.
Exclude pasted Kubernetes/config fragments, shell prompts, and structured log lines from goal and current-scope extraction. This keeps copied diagnostic output from bloating Current Scope while preserving real user scope updates. Validation: node --check src/extract/goals.ts tests/extract-goals.test.ts; focused Bun extract-goals tests; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; real-session replay with --show-layer-diff.
Add a Docker-backed ref comparison runner that builds isolated git worktrees for a baseline and head ref, runs the same compaction benchmark in each image, and writes paired JSONL plus a Markdown delta report. Document the original-vs-implementation workflow and use 53dc551 as the practical runnable baseline for the current benchmark harness. Validation: node --check scripts/compare-compaction-refs.mjs; git diff --check; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --case-filter cache-bust --out /tmp/pi-vcc-ref-compare.gKFg5K; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --real-only --real-sessions-dir ~/.pi/agent/sessions --real-limit 1 --show-layer-diff --out /tmp/pi-vcc-ref-compare-real.lUVm68.
Add compileWithLayers so benchmarks can consume the production-rendered compaction layers directly instead of maintaining benchmark-side parsing of the final summary text. The existing compile API remains a text-only wrapper with unchanged output. Update the pi-vcc offline compactor to use the production layer metadata while preserving activePromptState and existing benchmark metrics. Validation: node --check src/core/summarize.ts bench/compaction/offline-runner.ts tests/compile.test.ts; focused Docker Bun compile tests; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; git diff --check; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --case-filter cache-bust --out /tmp/pi-vcc-layer-ref.nFbWTN; real-session Docker replay with --show-layer-diff.
Introduce a structured compaction state between extracted section data and rendered summaries. The renderer owns deterministic current-section ordering plus separate history and recall layers, while compile() preserves the existing text output. compileWithLayers now builds from the structured state before merging, which gives the cache benchmark a production representation to compare as later cache-aware rendering becomes more layered. Validation: node --check src/core/compaction-state.ts src/core/summarize.ts tests/compaction-state.test.ts tests/compile.test.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --case-filter cache-bust --out /tmp/pi-vcc-state-ref.MFhfNq.
Parse merged summary text back into CompactionState and render the final text/layers through the structured renderer. This removes the remaining ad hoc final layer construction from summarize while preserving compile() output. The structured path now covers fresh extraction, merged state reconstruction, deterministic rendering, and compileWithLayers metadata, preparing the implementation for section-level patching without changing the public summary format. Validation: node --check src/core/compaction-state.ts src/core/summarize.ts tests/compaction-state.test.ts tests/compile.test.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --case-filter cache-bust --out /tmp/pi-vcc-state-parse-ref.b3CuCT.
Move high-volatility Current Scope after the stable current sections in the structured compaction renderer. This preserves the public summary format while pushing ordinary scope churn later in the prompt prefix. Sampled real-session replay now first changes at Evidence Handles instead of Current Scope, with stablePrefixTokens 248 and 284 for cycles 2 and 3. Validation: node --check src/core/compaction-state.ts src/core/summarize.ts tests/compaction-state.test.ts tests/compile.test.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; real-session Docker replay with --show-layer-diff.
Add a synthetic case where stable work state remains fixed while new evidence handles appear across compactions. The probe captures the current bottleneck: Evidence Handles is the first changed prompt layer while correctness terms remain preserved. A split evidence-layer experiment was tested and reverted because it regressed cache metrics on both the new probe and sampled real-session replay. Validation: node --check bench/compaction/synthetic-cases.ts; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --case-filter cache-bust-evidence-growth --show-layer-diff --jsonl; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache.
Normalize path evidence before it enters the compacted state: strip punctuation variants and drop broad absolute directories while retaining specific files and tmp artifacts. This reduces noisy Evidence Handles churn without changing the current summary structure. The evidence layer split experiment was not kept because it regressed stable-prefix metrics. The focused evidence-growth probe remains as the RED signal for this bottleneck. Validation: node --check src/extract/evidence.ts tests/extract-evidence.test.ts; Docker Bun extract-evidence tests; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --case-filter cache-bust-evidence-growth --show-layer-diff --jsonl; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; sampled real-session replay; ref comparisons in /tmp/pi-vcc-evidence-noise-ref.G8zNvv and /tmp/pi-vcc-evidence-noise-real-ref.5GQES8.
Keep the existing Evidence Handles section stable when merging with a previous summary and render newly discovered handles in a later Recent Evidence Handles section. This preserves evidence recoverability while pushing evidence-only churn later in the prompt. Evidence-growth diagnostics now first change at Recent Evidence Handles instead of Evidence Handles, and sampled real-session replay first changes at User Preferences with stablePrefixTokens 328/338 for cycles 2/3. Validation: node --check src/core/compaction-state.ts src/core/summarize.ts tests/compaction-state.test.ts tests/compile.test.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --case-filter cache-bust-evidence-growth --show-layer-diff --jsonl; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; sampled real-session replay with --show-layer-diff.
Keep stable User Preferences byte-identical when a later compaction only discovers additive preferences, and place those new preferences in a later Recent User Preferences section. Corrections still update the stable preference section so stale preferences are removed. Sampled real-session replay now first changes at Current Scope instead of User Preferences, with stablePrefixTokens 339/339 for cycles 2/3. Validation: node --check src/core/compaction-state.ts src/core/summarize.ts tests/compaction-state.test.ts tests/compile.test.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; sampled real-session replay with --show-layer-diff.
Add a scope-growth cache probe and preserve established Current Scope when later compactions discover additive scope updates. New additive scope lines are rendered in Recent Scope Updates so durable scope remains recoverable without rewriting the earlier scope section. The new probe now first changes at Recent Scope Updates with no missing current terms. Sampled real-session replay first changed at Recent Scope Updates with stablePrefixTokens 369/379. Validation: node --check src/core/compaction-state.ts src/core/summarize.ts tests/compaction-state.test.ts tests/compile.test.ts bench/compaction/synthetic-cases.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --case-filter cache-bust-scope-growth --show-layer-diff --jsonl; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; sampled real-session replay with --show-layer-diff.
Extend cache assertions from a single early-layer heuristic to explicit per-case boundaries. Scope, evidence, and volatile-next-step probes now require their first changed prompt layer to land at the intended recent or volatile section with a minimum stable-prefix token floor. Update the ref comparison summary to use the same cache-boundary failure logic and document the expected boundaries in the benchmark README. Validation: node --check bench/compaction/offline-runner.ts scripts/compare-compaction-refs.mjs; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; docker run --rm pi-vcc-bench --compactors pi-vcc --case-filter cache-bust --show-layer-diff --jsonl; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --out /tmp/pi-vcc-cache-gates-ref.g1aMbt.
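A per-case cache boundary of this kind could be checked roughly as follows. This is only a sketch: `CacheBoundary`, `CycleMetrics`, and `checkCacheBoundary` are hypothetical names, and treating a cycle with no changed layer as passing is an assumption rather than documented harness behavior.

```typescript
// Hypothetical shapes; field names are assumptions, not the harness's API.
interface CacheBoundary {
  expectedFirstChangedLayer: string;
  minStablePrefixTokens: number;
}

interface CycleMetrics {
  cycle: number;
  firstChangedLayer: string | null; // null means nothing changed this cycle
  stablePrefixTokens: number;
}

// Returns one human-readable failure per violated gate; empty means the case passes.
function checkCacheBoundary(boundary: CacheBoundary, cycles: CycleMetrics[]): string[] {
  const failures: string[] = [];
  for (const metrics of cycles) {
    if (
      metrics.firstChangedLayer !== null &&
      metrics.firstChangedLayer !== boundary.expectedFirstChangedLayer
    ) {
      failures.push(
        `cycle ${metrics.cycle}: first changed layer was "${metrics.firstChangedLayer}", ` +
          `expected "${boundary.expectedFirstChangedLayer}"`,
      );
    }
    if (metrics.stablePrefixTokens < boundary.minStablePrefixTokens) {
      failures.push(
        `cycle ${metrics.cycle}: stable prefix ${metrics.stablePrefixTokens} tokens ` +
          `is below the floor of ${boundary.minStablePrefixTokens}`,
      );
    }
  }
  return failures;
}
```

Keeping the layer check and the token floor as separate failures makes it clear whether a regression moved the boundary or only shrank the prefix.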
Add a mutable-tail growth probe and cap rendered recent scope, preference, and evidence sections to the latest items. Cache assertions now enforce the mutable-tail boundary plus maximum recent layer sizes. This keeps stable sections byte-stable while preventing the recent mutable area from growing without bound; older recent details remain recoverable through transcript/recall. Validation: node --check src/core/compaction-state.ts bench/compaction/offline-runner.ts bench/compaction/synthetic-cases.ts scripts/compare-compaction-refs.mjs tests/compaction-state.test.ts; Docker Bun tests for compaction-state and compile; git diff --check; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache; docker run --rm pi-vcc-bench --compactors pi-vcc --case-filter cache-bust-mutable-tail-growth --show-layer-diff --jsonl; ref comparisons in /tmp/pi-vcc-tail-caps-ref.coNiHu and /tmp/pi-vcc-tail-caps-real.PPkbEn.
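The stable/recent split with a cap on the recent area can be sketched as below. The shape `SplitSection` and the helper `mergeAdditive` are hypothetical, and the sketch covers only additive updates; corrections, which the log says still rewrite the stable section, are out of scope here.

```typescript
// Hypothetical section shape; the real CompactionState fields are not shown in this log.
interface SplitSection {
  stable: string[]; // rendered early, kept byte-identical across additive merges
  recent: string[]; // rendered late, bounded by a cap
}

function mergeAdditive(
  previous: SplitSection,
  fresh: string[],
  recentCap: number,
): SplitSection {
  const known = new Set<string>([...previous.stable, ...previous.recent]);
  const recent = [...previous.recent];

  for (const item of fresh) {
    // Only items that are new to both areas are appended to the recent tail.
    if (!known.has(item)) {
      known.add(item);
      recent.push(item);
    }
  }

  // slice(-0) would return the whole array, so handle a zero cap explicitly.
  const capped = recentCap > 0 ? recent.slice(-recentCap) : [];

  // The stable array is returned untouched so its rendering cannot change.
  return { stable: previous.stable, recent: capped };
}
```

Items that fall off the capped tail are simply dropped from this structure; in the design described above they stay recoverable through transcript or recall rather than the active prompt.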
Add outlier sections to the ref comparison report for broader real-session runs: worst stable-prefix deltas, largest full-prompt growth, earliest changed head layers, and largest recent mutable layers. A real-limit 5 run shows aggregate improvement but also highlights the next bottlenecks: Commits is often the earliest changed stable layer, and Recent Evidence Handles can still be large in real sessions. Validation: node --check scripts/compare-compaction-refs.mjs; git diff --check; node scripts/compare-compaction-refs.mjs --head HEAD --compactors pi-vcc --real-only --real-sessions-dir ~/.pi/agent/sessions --real-limit 5 --show-layer-diff --out /tmp/pi-vcc-real-limit-5-report-1777388942.
Add project-level agent guidance that frames pi-vcc compaction around expected continuation value: recall fidelity, semantic coherence, working room, retrieval dependence, and cache preservation. The guidance records the current stable/recent layout and benchmark commands future agents should use before claiming cache or correctness improvements. Validation: git diff --check; reviewed AGENTS.md for durable project guidance; reviewer subagent found no must-fix issues and suggested making the baseline ref explicit, which is included.
Emit a separate pi-vcc custom message after extension-driven compaction so users can sanity-check what changed without patching Pi's built-in compaction card. The report stores section policy/status, stable-vs-recent churn, cap warnings, source/kept counts, and machine-readable details on both the compaction details and the UI message. The hook skips prior pi-vcc report cards while summarizing to avoid report self-churn, and the existing compile/compileWithLayers APIs are preserved via an internal compilation helper. Validation: docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -v /home/fl/.npm/_npx/86d717fff1af7182/node_modules:/app/node_modules:ro -w /app oven/bun:1.3.13 bun test tests/before-compact-hook.test.ts tests/compaction-report.test.ts tests/compaction-state.test.ts tests/compile.test.ts; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache.
Add /pi-vcc-report as the follow-up channel for pi-vcc's compact report card. The command can list reports, write Markdown/JSON artifacts for the latest report, show an inline expanded report, or print raw JSON when explicitly requested. Report discovery reads both compaction details and the rendered report-card custom messages while deduping duplicate records. Also expose the same report data in the offline benchmark via --include-report and add --explain for human-readable per-cycle rationale, so synthetic and real-session runs can be inspected outside the TUI. Validation: docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -v /home/fl/.npm/_npx/86d717fff1af7182/node_modules:/app/node_modules:ro -w /app oven/bun:1.3.13 bun test tests/compaction-report-command.test.ts tests/compaction-report-history.test.ts tests/compaction-report.test.ts tests/before-compact-hook.test.ts tests/compile.test.ts; docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -w /app oven/bun:1.3.13 bun scripts/bench-compaction.ts --compactors pi-vcc --case-filter cache-bust-scope-growth --include-report --jsonl; docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -w /app oven/bun:1.3.13 bun scripts/bench-compaction.ts --compactors pi-vcc --case-filter cache-bust-scope-growth --explain; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache.
Add RED cache-boundary probes for two real-session outliers: additive commits rewriting the stable Commits layer and a single long evidence line bloating Recent Evidence Handles. The probes failed before the implementation because commits changed Pi VCC Commits and long path lists exceeded the recent evidence cap. Route additive commits to bounded Recent Commits while keeping established Commits stable, and clip evidence values/lines with a stable (+more) suffix so recent evidence remains useful without growing unbounded. Update docs and report policy to include Recent Commits. Validation: docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -v /home/fl/.npm/_npx/86d717fff1af7182/node_modules:/app/node_modules:ro -w /app oven/bun:1.3.13 bun test tests/compaction-state.test.ts tests/compile.test.ts tests/extract-evidence.test.ts tests/compaction-report.test.ts; docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -w /app oven/bun:1.3.13 bun scripts/bench-compaction.ts --compactors pi-vcc --case-filter cache-bust-commit-growth --assert-cache --show-layer-diff --jsonl; docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -w /app oven/bun:1.3.13 bun scripts/bench-compaction.ts --compactors pi-vcc --case-filter cache-bust-long-evidence-line --assert-cache --show-layer-diff --jsonl; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache.
Add cache-boundary probes for verbose Recent Scope Updates and Recent User Preferences entries. The probes failed on layer-size gates before the fix, which showed that a few long lines could bloat late mutable sections even when the first changed layer was correct. Render recent scope and preference items with bounded middle clipping and a stable (+more) marker, and lower their recent item caps so older verbose details remain recoverable through history/recall instead of occupying active prompt space. The clipping keeps leading identifiers and a short tail to preserve useful continuation cues. Validation: docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -v /home/fl/.npm/_npx/86d717fff1af7182/node_modules:/app/node_modules:ro -w /app oven/bun:1.3.13 bun test tests/compaction-state.test.ts tests/compile.test.ts tests/extract-evidence.test.ts tests/compaction-report.test.ts; docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -w /app oven/bun:1.3.13 bun scripts/bench-compaction.ts --compactors pi-vcc --case-filter cache-bust-long-scope-line --assert-cache --show-layer-diff --jsonl; docker run --rm -v "/home/fl/code/personal/pi-vcc":/app -w /app oven/bun:1.3.13 bun scripts/bench-compaction.ts --compactors pi-vcc --case-filter cache-bust-long-preference-line --assert-cache --show-layer-diff --jsonl; docker build -t pi-vcc-bench .; docker run --rm pi-vcc-bench --compactors pi-vcc --assert; docker run --rm pi-vcc-bench --compactors pi-vcc --assert-cache.
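Bounded middle clipping with a count-free marker might look like the sketch below. The helper name `clipMiddle`, the marker spacing, and the default tail length are assumptions for illustration; the actual limits used by the renderer are not stated in this log.

```typescript
// Hypothetical helper: keeps the head and a short tail of an over-long line.
// The marker carries no count, so the clipped text does not churn when the
// hidden middle grows or shrinks.
function clipMiddle(line: string, maxChars: number, tailChars = 16): string {
  const marker = " (+more) ";
  if (line.length <= maxChars) return line;

  // Reserve room for the marker first, then the tail, then the head.
  const budget = Math.max(0, maxChars - marker.length);
  const tail = Math.min(tailChars, budget);
  const head = budget - tail;

  const tailText = tail > 0 ? line.slice(-tail) : "";
  return line.slice(0, head) + marker + tailText;
}
```

When `maxChars` is at least the marker length, the result is exactly `maxChars` characters long; for smaller limits this sketch returns the bare marker rather than truncating it.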
Move the model-reference stitcher into shared code and use it for live MRC output so KEEP chunks are rendered verbatim with stable ids and REF/bundle entries remain addressable. Add read-context chunks and previous KEEP carry-forward with collision aliases, while dropping stale prior preference chunks after corrections. The benchmark selector now shares the live stitcher path. Validation: model-reference-selector --assert, pi-vcc --assert, pi-vcc --assert-cache, and live MRC smoke all passed.
Move dynamic MRC reference handles out of the compaction summary and into hidden append-only reference journal messages created after normal agent turns. This keeps compaction summaries more cache-stable while preserving discoverable handles in the transcript suffix. Add vcc_lookup for exact handle/query/list retrieval from reference journal details, and skip MRC reference journal messages during compaction summarization. Validation: pi-vcc --assert, pi-vcc --assert-cache, model-reference-selector --assert, and live summary/reference journal smoke all passed.
Persist tiny per-turn MRC anchors for compaction continuity while keeping full reference bodies in non-context state. The model-call postfix now renders only the latest compaction stash, filters refs already visible in context, and gives explicit guidance to keep handles internal. Record the stashed ref index in compaction details so vcc_lookup can resolve handles after compaction. Source-derived read-context refs are reduced to path/symbol locators so agents reread repository source instead of relying on stale copied bodies. Validation: docker build -t pi-vcc-bench .; pi-vcc --assert; pi-vcc --assert-cache; model-reference-selector --assert; anchor/stash/no-precompaction/guidance/source-locator smokes.
Make the public extension surface MRC-only: package/docs move to pi-mrc, the command surface becomes /pi-mrc plus pi-mrc report/debug controls, config moves to pi-mrc-config.json, and mrc_lookup replaces vcc_lookup. Remove fuzzy recall from the registered public surface and remove strategy toggles. The extension now uses model-reference compaction by default, with exact handle lookup as the retrieval path; the old structured compactor remains only as an internal benchmark baseline. Validation: docker build -t pi-mrc-bench .; model-reference-selector --assert; legacy pi-vcc --assert and --assert-cache while retained in the harness; public surface smoke confirmed no recall/vcc registrations and mrc_lookup availability.
Update the README install paths and package repository metadata to use the renamed BadLiveware/pi-model-reference-compactor repository. Remove the obsolete npm-oriented install guidance while keeping clone/local install examples.
30 issues found across 55 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/extract/goals.ts">
<violation number="1" location="src/extract/goals.ts:14">
P2: `PREFERENCE_WITH_TASK_RE` is a near-duplicate of `TASK_RE`, differing only by the omission of "test". If unintentional, "don't use X, test with Y" will be wrongly filtered. Either reference `TASK_RE` directly (if semantics should match) or add a comment explaining the deliberate difference.</violation>
<violation number="2" location="src/extract/goals.ts:46">
P2: `isPreferenceOnly` unconditionally returns `true` when `DIRECT_PREFERENCE_RE` matches, even if the text also contains a task verb. This causes lines like "please use X to fix Y" or "I prefer to implement Z" to be incorrectly filtered out as preference-only, losing legitimate goals.</violation>
</file>
<file name="src/commands/pi-vcc-report.ts">
<violation number="1" location="src/commands/pi-vcc-report.ts:32">
P2: Empty `catch {}` silently swallows errors from `getEntries()`. Consider at minimum adding a comment explaining why errors are expected, or logging at debug level so failures are diagnosable.</violation>
</file>
<file name="src/commands/pi-vcc-strategy.ts">
<violation number="1" location="src/commands/pi-vcc-strategy.ts:20">
P2: The function name `registerPiVccMrCommand` is misleading — it registers three commands (`pi-vcc-mr`, `pi-vcc-pv`, `pi-vcc-off`), not just the model-reference one. Consider renaming to something like `registerPiVccStrategyCommands`.</violation>
</file>
<file name="scripts/bench-compaction.ts">
<violation number="1" location="scripts/bench-compaction.ts:21">
P2: Validate that `realLimitRaw` parses to a valid integer. `Number.parseInt` returns `NaN` for non-numeric input, and passing `NaN` as a limit could silently produce empty results or unexpected behavior downstream.</violation>
</file>
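One way to validate the limit is sketched here. `parseRealLimit` is a hypothetical helper, not code from scripts/bench-compaction.ts, and rejecting zero is an assumption about what the flag should accept.

```typescript
// Hypothetical helper; the script's real flag handling is not shown in this review.
function parseRealLimit(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const trimmed = raw.trim();

  // Number.parseInt("3x", 10) returns 3, so require an all-digit string first.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`--real-limit expects a positive integer, got "${raw}"`);
  }

  const value = Number.parseInt(trimmed, 10);
  if (value < 1) {
    throw new Error(`--real-limit must be at least 1, got "${raw}"`);
  }
  return value;
}
```

Failing fast at argument parsing keeps a `NaN` limit from reaching the session loader, where it would be harder to diagnose.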
<file name="bench/compaction/real-sessions.ts">
<violation number="1" location="bench/compaction/real-sessions.ts:15">
P3: Use `join` from `node:path` instead of manual string concatenation for constructing file paths. The module is already imported and `join` handles trailing-slash normalization automatically.</violation>
</file>
<file name="src/tools/lookup.ts">
<violation number="1" location="src/tools/lookup.ts:85">
P2: The `list` parameter is defined in the schema but never read in `execute`. Listing happens as a fallthrough when neither `ref` nor `query` is set, so `list` has no effect on behavior. If the intent is to let `list: true` force listing mode (even alongside other params), add an explicit check; otherwise remove the unused parameter to avoid misleading callers.</violation>
</file>
<file name="README.md">
<violation number="1" location="README.md:141">
P2: The features list still claims "5 semantic sections" but the updated sections table now documents 8+ distinct sections (adding Evidence Handles, Current Scope, and multiple Recent\* sections). Update the feature bullet to reflect the actual section count.</violation>
</file>
<file name="bench/compaction/offline-runner.ts">
<violation number="1" location="bench/compaction/offline-runner.ts:273">
P2: Duplicate function: `leakProbe` is identical to `termProbe`. Consider reusing `termProbe` directly where `leakProbe` is called, or differentiate the semantics if leak detection should behave differently (e.g., checking presence regardless of source applicability).</violation>
</file>
<file name="src/strategies/model-reference.ts">
<violation number="1" location="src/strategies/model-reference.ts:66">
P2: The empty `catch {}` silently swallows all errors when reading the auth file (missing file, permission denied, malformed JSON). Consider logging at debug level so users can diagnose why the real classifier isn't being used.</violation>
<violation number="2" location="src/strategies/model-reference.ts:97">
P2: The `classifierMs` stat actually measures the entire `compactWithModelReference` function execution time (normalization, filtering, chunking, classification, and rendering), not just the classifier latency. Rename to `totalMs` or measure only the classification step separately.</violation>
</file>
<file name="bench/compaction/synthetic-cases.ts">
<violation number="1" location="bench/compaction/synthetic-cases.ts:128">
P2: Duplicate `recallTerms` property in object literal — the second silently overwrites the first. If both are intentionally identical, remove the duplicate.</violation>
</file>
<file name="src/core/classifier.ts">
<violation number="1" location="src/core/classifier.ts:150">
P1: Parsing bug: the first line after the SUBGOALS section is always skipped. When a non-subgoal line (e.g., `KEEP:`) is encountered while `inSubgoals` is true, the code sets `inSubgoals = false` but still executes `continue`, so that line is never processed by subsequent matchers.</violation>
</file>
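A minimal sketch of the fallthrough fix described for `src/core/classifier.ts`, assuming a line-oriented parser. The function name `parseSections`, the `SUBGOALS:` header, the `- ` bullet form, and the `KEEP:` prefix handling are hypothetical stand-ins and not the actual classifier code. The point is only that leaving the section must not `continue` past the line that ended it.

```typescript
// Hypothetical parser shape; the real classifier output format may differ.
interface Parsed {
  subgoals: string[];
  keep: string[];
}

function parseSections(output: string): Parsed {
  const parsed: Parsed = { subgoals: [], keep: [] };
  let inSubgoals = false;

  for (const raw of output.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;

    if (line === "SUBGOALS:") {
      inSubgoals = true;
      continue;
    }

    if (inSubgoals) {
      if (line.startsWith("- ")) {
        parsed.subgoals.push(line.slice(2));
        continue;
      }
      // Leave the section but do NOT `continue` here: the line that ended
      // the section still has to reach the matchers below.
      inSubgoals = false;
    }

    if (line.startsWith("KEEP:")) {
      parsed.keep.push(line.slice("KEEP:".length).trim());
    }
  }
  return parsed;
}
```

With the buggy `continue` in place, the first `KEEP:` line after the subgoal list would be dropped; in this sketch it is processed like any other line.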
<file name="src/core/chunk-model.ts">
<violation number="1" location="src/core/chunk-model.ts:28">
P3: The doc comment example `"goal:0"` is misleading — the actual ID generated for goals would be `"sessionGoal:0"` (since the `section` parameter is `"sessionGoal"`). Update the example to match the real IDs.</violation>
<violation number="2" location="src/core/chunk-model.ts:128">
P2: `RefIndex.entries` uses an inline type that duplicates `RefIndexEntry`. Use the named interface to avoid drift between the two definitions.</violation>
</file>
<file name="src/core/mock-classifier.ts">
<violation number="1" location="src/core/mock-classifier.ts:73">
P2: Operator precedence bug: `&&` binds tighter than `||`, so the `text.length < 120` guard only applies to the `text.includes("/")` branch. The regex alone (which matches nearly any text with a dot, like "e.g", "v1.0") will unconditionally return `SCORE.FILE_PATH`. Add parentheses to clarify or gate both branches with the length check.</violation>
</file>
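The precedence issue can be illustrated with a small sketch. The regex `PATH_LIKE`, the constant `MAX_PATH_TEXT`, and both function names are assumptions made for illustration; the real pattern and score table in `src/core/mock-classifier.ts` are not shown in this review.

```typescript
// Hypothetical stand-ins for the real regex and length limit.
const PATH_LIKE = /\w+\.\w+/;
const MAX_PATH_TEXT = 120;

// Buggy shape: parsed as `PATH_LIKE.test(text) || (text.includes("/") && short)`,
// so the length guard never applies to the regex branch.
function looksLikePathUngated(text: string): boolean {
  return PATH_LIKE.test(text) || text.includes("/") && text.length < MAX_PATH_TEXT;
}

// Fixed shape: the length guard gates both branches.
function looksLikePath(text: string): boolean {
  return text.length < MAX_PATH_TEXT && (PATH_LIKE.test(text) || text.includes("/"));
}
```

A long sentence containing `v1.0` passes the ungated version because the regex alone matches, while the gated version rejects it on length before either branch is evaluated.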
<file name="src/commands/pi-vcc-dump-context.ts">
<violation number="1" location="src/commands/pi-vcc-dump-context.ts:39">
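P2: `includes("--raw")` also matches `--raw-context`. Consider splitting the args and comparing tokens exactly. A `\b`-based pattern such as `/\b--raw\b/` does not work here: `-` is a non-word character, so the leading `\b` fails before a standalone `--raw` and the trailing `\b` still matches before the `-` of `--raw-context`. Currently safe only because `isRawContext` is checked first, but this coupling is fragile.</violation>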
<violation number="2" location="src/commands/pi-vcc-dump-context.ts:107">
P2: The `--raw` check is placed after context extraction, so if extraction fails the function returns early with an error before reaching the `--raw` handler. Since `dumpRawSessionJsonl` reads the session file directly and doesn't need `extracted`, move the `--raw` check before context extraction (similar to how `--raw-context` is handled).</violation>
</file>
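One way to make the flag check exact is to tokenize the argument string and compare whole tokens. This is a sketch under the assumption that the command receives its arguments as a single whitespace-separated string; `hasFlag` is a hypothetical helper, not a function from `src/commands/pi-vcc-dump-context.ts`.

```typescript
// Hypothetical helper: true only when `flag` appears as a complete token.
function hasFlag(args: string, flag: string): boolean {
  return args
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .some((token) => token === flag);
}
```

Because each token is compared as a whole, `--raw` and `--raw-context` no longer depend on the order in which they are checked.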
<file name="src/core/dump-context.ts">
<violation number="1" location="src/core/dump-context.ts:311">
P2: Unused `key` variable — deduplication is case-sensitive here unlike the identical pattern used for decisions above. `key` should be used for case-insensitive deduplication, or a `Set` should be employed for O(1) lookups (matching the `seenDecisions` pattern).</violation>
</file>
<file name="src/core/compaction-report-history.ts">
<violation number="1" location="src/core/compaction-report-history.ts:27">
P2: Type guard `isPiVccCompactionReport` validates only 5 of ~15 required fields on `PiVccCompactionReport`. If a partially-formed object satisfies the minimal checks (has `compactor`, `version`, `sections`, `sourceMessageCount`, `tokensBefore`), downstream code in `recordKeyOf` will read `keptMessageCount` and `summaryChars` as `undefined`, causing silent deduplication inconsistencies. Consider validating at least the fields used in `recordKeyOf`.</violation>
</file>
<file name="src/hooks/before-compact.ts">
<violation number="1" location="src/hooks/before-compact.ts:262">
P2: Second `getSessionStrategy() === "off"` check contradicts the earlier "always handle explicit /pi-vcc marker" logic. When a user explicitly runs `/pi-vcc` with session strategy "off", the function passes the first guard, computes stats, then silently bails here—giving no feedback to the user about why their command didn't work. Either this check should also be guarded with `!isPiVcc`, or the first check's comment is misleading.</violation>
<violation number="2" location="src/hooks/before-compact.ts:284">
P1: Model-reference strategy path returns `details` without `compactor: "pi-vcc"` and never sets `pendingReport`, so the compaction report message is silently lost. The `session_compact` handler can't extract the report from either `event.compactionEntry.details` (no `compactor` field) or `pendingReport` (null). Also `lastCompactWasPiVcc` is not set, breaking toast suppression for explicit `/pi-vcc` invocations.</violation>
</file>
<file name="src/core/model-reference-stitch.ts">
<violation number="1" location="src/core/model-reference-stitch.ts:41">
P3: `KNOWN_KINDS` includes `"recall"` but `KIND_ORDER` does not define a sort position for it. If recall chunks ever appear in keep sections, they'll sort unpredictably at position 99. Consider adding an explicit entry or a comment explaining the omission.</violation>
<violation number="2" location="src/core/model-reference-stitch.ts:200">
P2: Convert `keepIds` to a `Set` before filtering to avoid O(n×m) lookups. `bundledIds` already uses a Set on the previous line — apply the same pattern here.</violation>
</file>
<file name="src/core/compaction-state.ts">
<violation number="1" location="src/core/compaction-state.ts:215">
P3: Redundant filter: when `includeRecallNote` is false, no recall layer exists in `layers`, so the `.filter(...)` branch is always a no-op. This makes the intent unclear to a reader. Consider simplifying to `const bodyLayers = layers;` (the recall layer is only ever added when `includeRecallNote` is true).</violation>
</file>
<file name="scripts/compare-compaction-refs.mjs">
<violation number="1" location="scripts/compare-compaction-refs.mjs:213">
P2: Use `!= null` (loose) instead of `!== null` (strict) to also exclude `undefined` values. When `stablePrefixTokens` is absent from a JSONL row, `undefined !== null` passes the filter, producing `NaN` deltas that corrupt the sort and report output.</violation>
<violation number="2" location="scripts/compare-compaction-refs.mjs:317">
P2: Docker images built during the run (`pi-vcc-bench-baseline-*`, `pi-vcc-bench-head-*`) are never removed. Consider adding `docker rmi` calls in the `finally` block (or gated behind `!keepWorktrees`) to avoid accumulating stale images on repeated runs.</violation>
</file>
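The difference between the strict and loose null checks can be sketched as follows. The script under review is plain JavaScript; this illustration uses TypeScript, and the `Row` shape and both function names are assumptions. Only the `stablePrefixTokens` field name comes from the review comment.

```typescript
// Hypothetical row shape; JSONL rows may omit the field entirely.
interface Row {
  stablePrefixTokens?: number | null;
}

// Strict check: rows where the field is absent (undefined) still pass.
function strictFilterCount(rows: Row[]): number {
  return rows.filter((row) => row.stablePrefixTokens !== null).length;
}

// Loose check: `!= null` rejects both null and undefined.
function presentTokens(rows: Row[]): number[] {
  const out: number[] = [];
  for (const row of rows) {
    if (row.stablePrefixTokens != null) out.push(row.stablePrefixTokens);
  }
  return out;
}
```

Any arithmetic on a row that slipped through the strict filter would involve `undefined` and yield `NaN`, which is the corruption the comment describes.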
<file name="src/core/build-sections.ts">
<violation number="1" location="src/core/build-sections.ts:101">
P2: Single `pendingReadPath` variable doesn't handle parallel tool calls correctly. When multiple read tool_calls appear consecutively (common with parallel tool use), each overwrites the previous path. This causes the first result to be attributed to the wrong file and subsequent results to be skipped entirely.
Consider using a queue/map keyed by tool call index, or pairing tool_calls with tool_results by `sourceIndex`.</violation>
</file>
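A queue-based pairing of read calls with their results might look like the sketch below. It assumes results arrive in the same order as their calls, and the `ReadEvent` and `Attributed` shapes and the `attributeReads` name are hypothetical; the real message blocks in `src/core/build-sections.ts` may look different.

```typescript
// Hypothetical event shapes for illustration only.
type ReadEvent =
  | { kind: "call"; path?: string }
  | { kind: "result"; ok: boolean; text: string };

interface Attributed {
  path: string;
  text: string;
}

function attributeReads(events: ReadEvent[]): Attributed[] {
  // FIFO queue with one slot per read call, in call order.
  const pending: (string | undefined)[] = [];
  const out: Attributed[] = [];

  for (const ev of events) {
    if (ev.kind === "call") {
      // Push a slot even when no path could be extracted, so later
      // results cannot inherit the wrong path.
      pending.push(ev.path);
      continue;
    }
    // Always consume one slot per result, including failed reads.
    const path = pending.shift();
    if (ev.ok && path !== undefined) out.push({ path, text: ev.text });
  }
  return out;
}
```

Keeping one slot per call, and consuming one slot per result regardless of outcome, is what keeps the queue aligned when reads run in parallel, fail, or lack a path.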
<file name="src/core/compaction-report.ts">
<violation number="1" location="src/core/compaction-report.ts:77">
P2: These hardcoded section-name sets can silently fall out of sync with `CURRENT_SECTION_ORDER`. Consider deriving them from `CURRENT_SECTION_ORDER` or adding a compile-time/test assertion that their union equals `CURRENT_SECTION_ORDER` to prevent silent misclassification of new sections.</violation>
<violation number="2" location="src/core/compaction-report.ts:199">
P3: Duplicated limit-lookup logic between `capOf` and the `limit` field in `buildCompactionReport`. Extract a shared `limitOf(title)` helper so the two cannot diverge.</violation>
</file>
Partial review: This PR has more than 50 files, so cubic reviewed the highest-priority files first.
Shadow auto-approve: would not auto-approve because issues were found.
5 issues found across 31 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="package.json">
<violation number="1" location="package.json:15">
P2: The npm package scope (`@sting8k`) doesn't match the GitHub repository owner (`BadLiveware`). This inconsistency makes it harder for consumers to verify package provenance and locate the canonical source.</violation>
</file>
<file name="src/core/model-reference-stitch.ts">
<violation number="1" location="src/core/model-reference-stitch.ts:5">
P2: The recall note no longer mentions `bundle:*` handles, though `renderRetrievableIndex` in this file still emits `bundle:*` entries. Consider adding a bullet or expanding the `ref:*` bullet to also cover `bundle:*` handle lookup, so the model knows both handle types are resolvable via `mrc_lookup`.</violation>
</file>
<file name="src/core/format.ts">
<violation number="1" location="src/core/format.ts:23">
P2: This `RECALL_NOTE` is appended to all compaction summaries (including the default/pv strategy path via `renderCompactionState(..., { includeRecallNote: true })` in `summarize.ts`). However, its new text references MRC-specific concepts ("MRC handles", "pi-mrc") that only exist when the model-reference strategy is active. Non-MRC sessions will receive a confusing instruction about handles that aren't present in their summary.</violation>
</file>
<file name="src/hooks/before-compact.ts">
<violation number="1" location="src/hooks/before-compact.ts:208">
P2: Anchor messages (`PI_MRC_ANCHOR_TYPE`) are collected by `messageFromEntry` but not filtered by `isInternalMessage`, so they leak into the summarizer input. Import and include `isMrcAnchorMessage` in the filter to match how report and reference messages are already excluded.</violation>
</file>
<file name="src/core/mrc-reference-journal.ts">
<violation number="1" location="src/core/mrc-reference-journal.ts:216">
P2: Missing per-ref validation in `refsFromMrcReferenceEntries` — unlike `refsFromCompactionDetails` which filters `ref?.id && ref?.text`, this function pushes all entries from the array without checking required fields. Add a similar filter to guard against malformed stored data.</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Fix true PR review findings across MRC classification, reports, lookup, and benchmarks. This includes the file-path precedence bug, SUBGOALS parser fallthrough, exact list-mode handling in mrc_lookup, malformed ref filtering, report limit/type-guard drift, benchmark real-limit validation, duplicate synthetic recall terms, and parallel read attribution for read-context locators. Dismissed stale or intentionally incompatible comments: removed strategy/recall surfaces no longer exist after the pi-mrc rename, and MRC anchor messages intentionally remain visible to compaction as handle breadcrumbs. Validation: docker build -t pi-mrc-bench .; model-reference-selector --assert; legacy pi-vcc --assert; legacy pi-vcc --assert-cache; focused smokes for invalid --real-limit, public surface, and parallel read source-locator refs.
5 issues found across 21 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="scripts/bench-compaction.ts">
<violation number="1" location="scripts/bench-compaction.ts:22">
P2: `--real-limit` validation is bypassed for malformed numeric strings because `parseInt` truncates input (for example `1.5` or `10abc`).</violation>
</file>
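Stricter validation for a numeric flag can be sketched as below. `parseRealLimit` is a hypothetical helper, and the choice to require a positive integer is an assumption; the review only establishes that `parseInt` accepts inputs such as `1.5` and `10abc` by truncating them.

```typescript
// Hypothetical helper; only the `--real-limit` flag name comes from the PR.
function parseRealLimit(raw: string): number {
  // Accept plain base-10 digits only, so "1.5" and "10abc" are rejected
  // instead of being truncated to 1 and 10 as parseInt would do.
  if (!/^\d+$/.test(raw)) {
    throw new Error(`invalid --real-limit: ${JSON.stringify(raw)}`);
  }
  const value = Number(raw);
  // Assumption: the limit must be a positive, safely representable integer.
  if (!Number.isSafeInteger(value) || value < 1) {
    throw new Error(`invalid --real-limit: ${JSON.stringify(raw)}`);
  }
  return value;
}
```

Validating the whole string before converting avoids relying on `parseInt`, which stops at the first non-digit character and reports success.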
<file name="src/core/classifier.ts">
<violation number="1" location="src/core/classifier.ts:175">
P2: Unconditionally ending `SUBGOALS` on the first non-matching line can silently discard later valid subgoal entries after one malformed status line.</violation>
</file>
<file name="src/core/build-sections.ts">
<violation number="1" location="src/core/build-sections.ts:111">
P2: Failed read results are not dequeued, which desynchronizes `pendingReadPaths` and can pair later read outputs with the wrong file path.</violation>
</file>
<file name="src/hooks/before-compact.ts">
<violation number="1" location="src/hooks/before-compact.ts:185">
P3: The report reason text mislabels `totalMs` as classifier/heuristic time, even though it is end-to-end compaction time.</violation>
</file>
<file name="src/tools/lookup.ts">
<violation number="1" location="src/tools/lookup.ts:83">
P2: `list` currently takes precedence over `ref`, so requests containing both fields skip exact handle lookup. Prefer handling `ref` when present (or explicitly gate list to ref-absent calls).</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Apply the true follow-up PR feedback: make dump-context flag parsing exact and allow --raw to bypass extraction, consume failed read results from the read-context queue, prefer exact lookup when ref is present, tighten --real-limit validation, preserve later subgoals after malformed lines, and clean up small docs/report/comment issues. Also simplify redundant compaction-state filtering and use path.join for real-session traversal. Validation: docker build -t pi-mrc-bench .; model-reference-selector --assert; legacy pi-vcc --assert; legacy pi-vcc --assert-cache; focused smokes for malformed --real-limit values, dump-context flags, lookup precedence, and failed-read queue handling.
Consume a read queue slot even when a read tool call has no extractable path so later read results cannot inherit the wrong path. Also add the missing cache-boundary cases to the comparison script's local cross-ref gate copy. Validation: docker build -t pi-mrc-bench .; model-reference-selector --assert; legacy pi-vcc --assert; legacy pi-vcc --assert-cache; focused smokes for pathless read queue handling and comparison cache-boundary coverage.
Move cache boundary thresholds into one shared JSON file used by both the offline runner and ref comparison report, and treat a null firstChangedPromptLayer as no disallowed layer in comparison metrics. Remove the unused classifyWithFallback export. Make /pi-mrc-dump-context --raw-context write only the latest captured context payload, preserving full AgentMessage structure without markdown, truncation, or wrapper metadata. Validation: docker build -t pi-mrc-bench .; model-reference-selector --assert; legacy pi-vcc --assert; legacy pi-vcc --assert-cache; local node --check for compare-compaction-refs.mjs; focused Docker smokes for raw-context payload shape, classifier exports, shared cache boundaries, and long cache-boundary gate.
Buffer the before_provider_request payload alongside the pre-provider AgentMessage context so users can audit what Pi is about to send to the model provider. Add /pi-mrc-dump-context --raw-provider (aliases --raw-request and --raw-model) for the provider payload, while keeping --raw-context as the raw AgentMessage[] context payload. Validation: docker build -t pi-mrc-bench .; focused Docker smoke verified context/provider dumps and before_provider_request hook registration.
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="scripts/compare-compaction-refs.mjs">
<violation number="1" location="scripts/compare-compaction-refs.mjs:91">
P2: Cache failure counting no longer flags missing `firstChangedPromptLayer`, diverging from the benchmark’s cache gate definition and underreporting regressions.</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Count a missing firstChangedPromptLayer as a cache failure in compare-compaction-refs so comparison reports match the offline benchmark cache gate definition. Validation: node --check scripts/compare-compaction-refs.mjs; parsed shared cache-boundaries.json; git diff --check.
2 issues found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="index.ts">
<violation number="1" location="index.ts:28">
P1: Provider request payloads are being buffered to disk unconditionally, bypassing the existing debug opt-in and increasing privacy risk for sensitive prompt/tool data.</violation>
</file>
<file name="src/core/context-buffer.ts">
<violation number="1" location="src/core/context-buffer.ts:97">
P1: Redact sensitive fields before persisting provider request payloads; this currently stores full request data in plaintext under `/tmp`.</violation>
</file>
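Key-based redaction before persisting a payload could be sketched as follows. The `SENSITIVE_KEY` pattern, the `[REDACTED]` marker, and the `redact` helper are all assumptions for illustration and are not the code in `src/core/context-buffer.ts`. Note that this only masks values stored under sensitive-looking keys; it does not scrub secrets embedded in free-form prompt text.

```typescript
// Hypothetical key list; a real deployment would tune this to its payloads.
const SENSITIVE_KEY = /^(authorization|api[-_]?key|token|secret|password|cookie)$/i;

// Returns a deep copy with values under sensitive keys replaced.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

The function builds a new structure rather than mutating its input, so the in-memory request sent to the provider is unaffected by what gets written to disk.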
Shadow auto-approve: would not auto-approve because issues were found.
Keep the durable Overarching section, but replace finite SUBGOALS statuses with priority-ordered CURRENT threads and anti-rework COMPLETED threads. The classifier no longer asks for UPCOMING/DEFERRED statuses and the renderer emits Current Threads / Completed Threads. The parser accepts legacy SUBGOALS for resilience but maps only CURRENT and COMPLETED into the new thread model, ignoring old project-board statuses. Validation: docker build -t pi-mrc-bench .; thread parser/render smokes; legacy SUBGOALS compatibility smoke; model-reference-selector --assert; pi-vcc --assert; pi-vcc --assert-cache.
Keep the Overarching plus CURRENT/COMPLETED prioritization model, but rename the type, classifier output, parsed field, and rendered sections back to subgoals for clearer product terminology. Render Current Subgoals and Completed Subgoals; retain THREADS parsing only as a compatibility fallback. Validation: docker build -t pi-mrc-bench .; subgoal parse/render smoke; legacy THREADS compatibility smoke; model-reference-selector --assert; pi-vcc --assert; pi-vcc --assert-cache.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 6fdb77b.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/core/model-reference-stitch.ts">
<violation number="1" location="src/core/model-reference-stitch.ts:131">
P2: Completed subgoal bullets reuse the `- <id> —` shape reserved for KEEP chunks, which can cause prior-summary KEEP parsers to misclassify subgoal labels as chunk IDs.</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Only buffer raw provider request payloads when pi-mrc debug mode is enabled, and redact sensitive keys before persisting them under /tmp. Make /pi-mrc-on and /pi-mrc-off track disabled state by session file instead of a process-global boolean, and propagate the per-session check to compaction and reference journal hooks. Render completed subgoals with a COMPLETED: prefix rather than KEEP-like bullet syntax to avoid prior-summary chunk parser ambiguity. Validation: docker build -t pi-mrc-bench .; focused smokes for provider opt-in/redaction, per-session controls, and completed subgoal rendering; model-reference-selector --assert; pi-vcc --assert; pi-vcc --assert-cache.

Note
High Risk
High risk because it replaces the extension’s compaction path with a new model-assisted MRC flow, adds hidden-reference state/lookup tooling, and introduces optional external LLM API calls that affect session data handling and runtime behavior.
Overview
Switches the extension from `pi-vcc` to `pi-mrc` (Model-Reference Compactor), updating naming, package metadata, and user docs to center on KEEP/REF/DROP chunk classification, exact handle lookup via `mrc_lookup`, and cache-stable prompt shaping.
Adds a new MRC runtime surface: compaction interception controls (`/pi-mrc`, `/pi-mrc-off`, `/pi-mrc-on`), compaction report capture + `/pi-mrc-report` history/artifacts, and `/pi-mrc-dump-context` for exporting structured context (including optional raw context/provider payload dumps). The before-compact hook now builds an MRC cut, filters internal ref/report messages, produces an MRC report, and records a latest-compaction ref index for lookup/continuity.
Introduces a benchmark + Docker workflow: new offline compaction runner, synthetic/real-session fixtures, cache-boundary gates, and scripts to run/compare compactors (including a `model-reference-selector` compactor that can use a real OpenAI-compatible classifier or a mock).
Reviewed by Cursor Bugbot for commit d6bf9db. Bugbot is set up for automated code reviews on this repo.