perf(summarize): skip mem::summarize when a fresh summary exists#572
perf(summarize): skip mem::summarize when a fresh summary exists#572efenex wants to merge 3 commits into
Conversation
Stop hooks fire summarize on every assistant turn. For chatty sessions that summarize successfully every few seconds, re-running the whole LLM-driven summarize on the next turn is mostly wasted work — the observation delta at sub-minute granularity rarely changes the narrative meaningfully, and on slow LLM providers (e.g. serverless OpenAI-compat) it stacks: by the time the previous summarize finishes, the next one is already in flight, both contending for the same provider rate budget. This change adds a dedup gate at the top of mem::summarize: if an existing SessionSummary for the requested sessionId is younger than SUMMARIZE_DEDUP_WINDOW_MS (default 90,000 ms, set to 0 to disable), return the cached summary with `skipped: "fresh"` and skip the LLM work entirely. Smoke-validated: - First call against a small session (6 obs): 6.4s, real LLM run, qualityScore 100 → summary stored. - Second call within 90s: 0.02s, `skipped: "fresh"`, ageMs: 77, no LLM calls — returned the cached summary directly. The MetricsStore is still pinged on the dedup path (as a successful call) so per-function latency stats reflect reality. The window is env- configurable but not exposed in any config schema yet; bump it to something like 600,000 ms (10 min) for very chatty sessions if the default ever feels too aggressive. Combined with the new Stop-hook force-exit, the typical summarize call chain (Stop hook → fetch dedup-hit → server returns ~50ms) is now effectively zero-cost on the Claude Code turn boundary, with the LLM work amortized at the dedup window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughmem::summarize now performs a configurable "fresh-summary" deduplication check: if a session summary in KV.summaries is newer than SUMMARIZE_DEDUP_WINDOW_MS (default 90s), it returns the cached summary and skips LLM summarization. ChangesFresh Summary Deduplication
Sequence DiagramsequenceDiagram
participant Handler as mem::summarize
participant Env as Environment
participant KV as KV.summaries
participant Metrics as metricsStore
Handler->>Env: read SUMMARIZE_DEDUP_WINDOW_MS
Handler->>KV: fetch session summary
KV-->>Handler: return summary (with createdAt)
Handler->>Handler: compute ageMs = now - createdAt
alt ageMs <= window && window > 0
Handler->>Metrics: record skip timing (if present)
Metrics-->>Handler: ack
Handler-->>Handler: return { success: true, skipped: "fresh", ageMs, summary }
else
Handler-->>Handler: proceed to LLM summarization flow
end
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/functions/summarize.ts (1)
231-237: ⚡ Quick winReplace the explanatory block comment with self-descriptive code/docstring-level brevity.
This comment explains behavior in detail (“what/why”) inline; keep this concise in code and move rationale to PR/docs if needed.
As per coding guidelines, "Avoid code comments explaining WHAT — use clear naming instead".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 231 - 237, Replace the long explanatory block comment above the dedup logic with a concise, self-descriptive comment or docstring and rely on clear naming (e.g., SUMMARIZE_DEDUP_WINDOW_MS, summarize, or fresh-summary dedup variables) to convey behavior; remove the "what/why" discussion and instead write a short one-line comment like "Skip LLM summarization if existing summary is younger than SUMMARIZE_DEDUP_WINDOW_MS" and move the rationale into the PR/docs per guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 231-237: Replace the long explanatory block comment above the
dedup logic with a concise, self-descriptive comment or docstring and rely on
clear naming (e.g., SUMMARIZE_DEDUP_WINDOW_MS, summarize, or fresh-summary dedup
variables) to convey behavior; remove the "what/why" discussion and instead
write a short one-line comment like "Skip LLM summarization if existing summary
is younger than SUMMARIZE_DEDUP_WINDOW_MS" and move the rationale into the
PR/docs per guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d20c4a99-e8bd-4e79-94f1-9d97ce36669a
📒 Files selected for processing (1)
src/functions/summarize.ts
CodeRabbit nitpick on rohitg00#572 — the original block explained the same thing twice. Keep the WHY (sub-minute delta noise + Stop-hook cadence) in two sentences; the env var name + default already document the knob. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (4)
src/functions/summarize.ts (4)
231-234: 💤 Low valueConsider focusing comments on WHY rather than WHAT.
The comment explains what the code does ("Skip the LLM call when..."), but the code itself is fairly self-documenting. The second part explaining the rationale (stop hooks firing frequently) provides valuable context and should be preserved.
As per coding guidelines: "Avoid code comments explaining WHAT — use clear naming instead."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 231 - 234, Replace the current two-line comment about skipping the LLM call with a short WHY-focused comment that explains the rationale for the deduplication window (SUMMARIZE_DEDUP_WINDOW_MS) — e.g., that summaries are rate-limited to avoid redundant LLM calls because stop hooks fire on every assistant turn and sub-minute deltas rarely change the observation — and remove the WHAT-level restatement that the code already expresses; update the comment near SUMMARIZE_DEDUP_WINDOW_MS and the stop-hook context around the summarization logic in summarize.ts accordingly.
235-239: ⚡ Quick winExtract deduplication window parsing into a helper function.
The inline parsing of
SUMMARIZE_DEDUP_WINDOW_MSis inconsistent with the existinggetChunkSize()andgetChunkConcurrency()helper functions. Additionally,Number(dedupRaw)is called twice, which is wasteful.♻️ Proposed refactor to match existing pattern
Add this helper function near the other env parsing helpers (after line 51):
function getDedupWindowMs(): number { const raw = process.env.SUMMARIZE_DEDUP_WINDOW_MS; if (raw === undefined) return 90_000; const n = Number(raw); return Number.isFinite(n) && n >= 0 ? n : 90_000; }Then replace lines 235-239 with:
- const dedupRaw = process.env["SUMMARIZE_DEDUP_WINDOW_MS"]; - const dedupWindowMs = - dedupRaw !== undefined && Number.isFinite(Number(dedupRaw)) - ? Math.max(0, Number(dedupRaw)) - : 90_000; + const dedupWindowMs = getDedupWindowMs();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 235 - 239, Extract the SUMMARIZE_DEDUP_WINDOW_MS parsing into a helper named getDedupWindowMs() similar to getChunkSize() and getChunkConcurrency(): add a function that reads process.env.SUMMARIZE_DEDUP_WINDOW_MS, returns 90_000 if undefined, converts once to Number, and returns it only if Number.isFinite(n) and n >= 0 otherwise return 90_000; then replace the inline block that computes dedupWindowMs with a call to getDedupWindowMs() so the code uses that helper and avoids calling Number(dedupRaw) twice.
241-243: ⚡ Quick winLog errors when fetching existing summary fails.
The
.catch(() => null)silently swallows all errors from the KV fetch. While falling back to full summarization is safe, logging the error would help diagnose KV store issues.♻️ Proposed logging addition
- const existing = await kv - .get<SessionSummary>(KV.summaries, sessionId) - .catch(() => null); + const existing = await kv + .get<SessionSummary>(KV.summaries, sessionId) + .catch((err) => { + logger.warn("Failed to fetch existing summary for dedup check", { + sessionId, + error: err instanceof Error ? err.message : String(err), + }); + return null; + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 241 - 243, The current .catch(() => null) on kv.get<SessionSummary>(KV.summaries, sessionId) swallows errors; change it to catch the error and log it (e.g., .catch(err => { processLogger.error('Failed to fetch summary', { err, sessionId, bucket: KV.summaries }); return null; })) so failures while reading existing remain non-fatal but record the error and context; use the project's logger (processLogger or existing logger) rather than silent swallowing.
245-252: 💤 Low valueConsolidate timestamp captures per coding guidelines.
Lines 245 and 252 both call
Date.now()within the same code block. As per coding guidelines, timestamps should be captured once and reused rather than calling repeatedly.♻️ Consolidate Date.now() calls
if (existing && existing.createdAt) { - const ageMs = Date.now() - Date.parse(existing.createdAt); + const nowMs = Date.now(); + const ageMs = nowMs - Date.parse(existing.createdAt); if (Number.isFinite(ageMs) && ageMs >= 0 && ageMs < dedupWindowMs) { logger.info("Summarize skipped — fresh summary present", { sessionId, ageMs, dedupWindowMs, }); - const latencyMs = Date.now() - startMs; + const latencyMs = nowMs - startMs; if (metricsStore) {As per coding guidelines: "Capture timestamps once with
new Date().toISOString()and reuse rather than calling repeatedly."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 245 - 252, The block calls Date.now() twice; capture the current time once and reuse it: compute a single nowMs (or nowIso if an ISO string is required) before calculating ageMs (use nowMs - Date.parse(existing.createdAt)) and compute latencyMs using nowMs - startMs; then pass the reused timestamp/derived values into logger.info. Update references in this block (existing.createdAt, ageMs, dedupWindowMs, startMs, latencyMs, logger.info) to use the single captured timestamp rather than calling Date.now() twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 231-234: Replace the current two-line comment about skipping the
LLM call with a short WHY-focused comment that explains the rationale for the
deduplication window (SUMMARIZE_DEDUP_WINDOW_MS) — e.g., that summaries are
rate-limited to avoid redundant LLM calls because stop hooks fire on every
assistant turn and sub-minute deltas rarely change the observation — and remove
the WHAT-level restatement that the code already expresses; update the comment
near SUMMARIZE_DEDUP_WINDOW_MS and the stop-hook context around the
summarization logic in summarize.ts accordingly.
- Around line 235-239: Extract the SUMMARIZE_DEDUP_WINDOW_MS parsing into a
helper named getDedupWindowMs() similar to getChunkSize() and
getChunkConcurrency(): add a function that reads
process.env.SUMMARIZE_DEDUP_WINDOW_MS, returns 90_000 if undefined, converts
once to Number, and returns it only if Number.isFinite(n) and n >= 0 otherwise
return 90_000; then replace the inline block that computes dedupWindowMs with a
call to getDedupWindowMs() so the code uses that helper and avoids calling
Number(dedupRaw) twice.
- Around line 241-243: The current .catch(() => null) on
kv.get<SessionSummary>(KV.summaries, sessionId) swallows errors; change it to
catch the error and log it (e.g., .catch(err => { processLogger.error('Failed to
fetch summary', { err, sessionId, bucket: KV.summaries }); return null; })) so
failures while reading existing remain non-fatal but record the error and
context; use the project's logger (processLogger or existing logger) rather than
silent swallowing.
- Around line 245-252: The block calls Date.now() twice; capture the current
time once and reuse it: compute a single nowMs (or nowIso if an ISO string is
required) before calculating ageMs (use nowMs - Date.parse(existing.createdAt))
and compute latencyMs using nowMs - startMs; then pass the reused
timestamp/derived values into logger.info. Update references in this block
(existing.createdAt, ageMs, dedupWindowMs, startMs, latencyMs, logger.info) to
use the single captured timestamp rather than calling Date.now() twice.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8d69d58b-679e-4b92-bd75-ac7d9b79c3f8
📒 Files selected for processing (1)
src/functions/summarize.ts
CodeRabbit had two nits on the dedup block: 1) The comment restated WHAT the code does. Keep only the WHY (Stop-hook cadence + sub-minute delta = wasted LLM work). 2) The inline raw-then-parse pattern was inconsistent with the existing getChunkSize() / getChunkConcurrency() helpers above and called Number(raw) twice. Extracted getDedupWindowMs() with a DEDUP_WINDOW_MS_DEFAULT constant to match the file's style. Behavior unchanged — invalid/negative values still fall back to 90s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/functions/summarize.ts (1)
53-60: 💤 Low valueParsing inconsistency with sibling helpers.
getChunkSize/getChunkConcurrencyuseparseIntand treat missing/invalid input as default, whilegetDedupWindowMsusesNumber(raw). As a side effect,SUMMARIZE_DEDUP_WINDOW_MS=""(empty string) and" "coerce to0viaNumber(), silently disabling dedup rather than falling back to the 90s default. Sibling helpers would fall back to the default in that case. Consider treating empty/whitespace-only as unset for consistency.♻️ Suggested change
function getDedupWindowMs(): number { const raw = process.env["SUMMARIZE_DEDUP_WINDOW_MS"]; - if (raw === undefined) return DEDUP_WINDOW_MS_DEFAULT; + if (raw === undefined || raw.trim() === "") return DEDUP_WINDOW_MS_DEFAULT; const n = Number(raw); return Number.isFinite(n) && n >= 0 ? n : DEDUP_WINDOW_MS_DEFAULT; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/summarize.ts` around lines 53 - 60, The getDedupWindowMs function currently uses Number(raw) and treats empty or whitespace-only SUMMARIZE_DEDUP_WINDOW_MS as 0, which differs from sibling helpers; update getDedupWindowMs to mirror getChunkSize/getChunkConcurrency behavior by treating empty/whitespace-only values as unset and falling back to DEDUP_WINDOW_MS_DEFAULT: trim the raw env string, if it's empty treat as undefined, then parse with parseInt (or Number but validate via Number.isFinite and n >= 0) and return the parsed non-negative finite value or DEDUP_WINDOW_MS_DEFAULT; refer to getDedupWindowMs and DEDUP_WINDOW_MS_DEFAULT when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 53-60: The getDedupWindowMs function currently uses Number(raw)
and treats empty or whitespace-only SUMMARIZE_DEDUP_WINDOW_MS as 0, which
differs from sibling helpers; update getDedupWindowMs to mirror
getChunkSize/getChunkConcurrency behavior by treating empty/whitespace-only
values as unset and falling back to DEDUP_WINDOW_MS_DEFAULT: trim the raw env
string, if it's empty treat as undefined, then parse with parseInt (or Number
but validate via Number.isFinite and n >= 0) and return the parsed non-negative
finite value or DEDUP_WINDOW_MS_DEFAULT; refer to getDedupWindowMs and
DEDUP_WINDOW_MS_DEFAULT when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5858cbbe-afc5-4918-8750-a78ed729cf89
📒 Files selected for processing (1)
src/functions/summarize.ts
Summary
Stop hooks fire `mem::summarize` on every assistant turn. For chatty sessions, summarize completes successfully every few seconds — and the next turn fires another full LLM-driven summarize pass. On serverless / rate-limited LLM providers, these stack: by the time one summarize finishes, the next is already in flight, both contending for the same provider rate budget.
This adds a freshness gate at the top of `mem::summarize`: if an existing `SessionSummary` for the requested sessionId is younger than `SUMMARIZE_DEDUP_WINDOW_MS` (default 90,000 ms / 90 s, set to `0` to disable), return the cached summary with `skipped: "fresh"` and skip the LLM work entirely.
Smoke-validated
`MetricsStore` is still pinged on the dedup path (as a successful call) so per-function latency stats stay accurate.
Why now
This complements:
Combined effect: typical Stop-hook → /agentmemory/summarize → dedup-hit chain is now ~50 ms server-side instead of a multi-LLM-call cycle.
Test plan
Notes
The window is env-configurable but not currently exposed in any config schema. Bump to `600000` (10 min) for very chatty long-running sessions if the default feels too aggressive.
🤖 Generated with Claude Code
Summary by CodeRabbit