Skip to content

perf(summarize): skip mem::summarize when a fresh summary exists#572

Open
efenex wants to merge 3 commits into
rohitg00:mainfrom
efenex:feat/v4-d-summarize-freshness-dedup
Open

perf(summarize): skip mem::summarize when a fresh summary exists#572
efenex wants to merge 3 commits into
rohitg00:mainfrom
efenex:feat/v4-d-summarize-freshness-dedup

Conversation

@efenex
Copy link
Copy Markdown
Contributor

@efenex efenex commented May 20, 2026

Summary

Stop hooks fire `mem::summarize` on every assistant turn. For chatty sessions, summarize completes successfully every few seconds — and the next turn fires another full LLM-driven summarize pass. On serverless / rate-limited LLM providers, these stack: by the time one summarize finishes, the next is already in flight, both contending for the same provider rate budget.

This adds a freshness gate at the top of `mem::summarize`: if an existing `SessionSummary` for the requested sessionId is younger than `SUMMARIZE_DEDUP_WINDOW_MS` (default 90,000 ms / 90 s, set to `0` to disable), return the cached summary with `skipped: "fresh"` and skip the LLM work entirely.

Smoke-validated

Call Wall-clock Result
First call (small session, 6 obs) 6.4 s real LLM run, qualityScore 100 → summary stored
Second call within 90 s 0.02 s `skipped: "fresh"`, `ageMs: 77`, zero LLM calls

`MetricsStore` is still pinged on the dedup path (as a successful call) so per-function latency stats stay accurate.

Why now

This complements:

Combined effect: typical Stop-hook → /agentmemory/summarize → dedup-hit chain is now ~50 ms server-side instead of a multi-LLM-call cycle.

Test plan

  • `npm test` passes
  • Manual smoke against a small session: first call ran, second call within 90 s returned `skipped: "fresh"` with `ageMs` populated
  • `SUMMARIZE_DEDUP_WINDOW_MS=0` disables the gate
  • Default 90 s tunable per environment

Notes

The window is env-configurable but not currently exposed in any config schema. Bump to `600000` (10 min) for very chatty long-running sessions if the default feels too aggressive.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Performance
    • Summary generation now reuses a recent summary when one exists within a configurable freshness window, avoiding unnecessary reprocessing.
    • Results in faster summary retrieval, reduced resource use, and a more responsive experience when viewing or requesting summaries.

Review Change Stack

Stop hooks fire summarize on every assistant turn. For chatty sessions
that summarize successfully every few seconds, re-running the whole
LLM-driven summarize on the next turn is mostly wasted work — the
observation delta at sub-minute granularity rarely changes the
narrative meaningfully, and on slow LLM providers (e.g. serverless
OpenAI-compat) it stacks: by the time the previous summarize finishes,
the next one is already in flight, both contending for the same
provider rate budget.

This change adds a dedup gate at the top of mem::summarize: if an
existing SessionSummary for the requested sessionId is younger than
SUMMARIZE_DEDUP_WINDOW_MS (default 90,000 ms, set to 0 to disable),
return the cached summary with `skipped: "fresh"` and skip the LLM
work entirely. Smoke-validated:

- First call against a small session (6 obs): 6.4s, real LLM run,
  qualityScore 100 → summary stored.
- Second call within 90s: 0.02s, `skipped: "fresh"`, ageMs: 77, no
  LLM calls — returned the cached summary directly.

The MetricsStore is still pinged on the dedup path (as a successful
call) so per-function latency stats reflect reality. The window is env-
configurable but not exposed in any config schema yet; bump it to
something like 600,000 ms (10 min) for very chatty sessions if the
default ever feels too aggressive.

Combined with the new Stop-hook force-exit, the typical summarize call
chain (Stop hook → fetch dedup-hit → server returns ~50ms) is now
effectively zero-cost on the Claude Code turn boundary, with the LLM
work amortized at the dedup window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 20, 2026

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

mem::summarize now performs a configurable "fresh-summary" deduplication check: if a session summary in KV.summaries is newer than SUMMARIZE_DEDUP_WINDOW_MS (default 90s), it returns the cached summary and skips LLM summarization.

Changes

Fresh Summary Deduplication

Layer / File(s) Summary
Parse dedup window
src/functions/summarize.ts
Adds getDedupWindowMs() and DEDUP_WINDOW_MS_DEFAULT to parse SUMMARIZE_DEDUP_WINDOW_MS as a non-negative number, defaulting to 90,000 ms.
Fresh Summary Deduplication Gate
src/functions/summarize.ts
Adds early KV deduplication: load KV.summaries for the session, compute ageMs from createdAt, and if within the window return the cached summary with skipped: "fresh", recording timing to metricsStore when provided.

Sequence Diagram

sequenceDiagram
  participant Handler as mem::summarize
  participant Env as Environment
  participant KV as KV.summaries
  participant Metrics as metricsStore
  Handler->>Env: read SUMMARIZE_DEDUP_WINDOW_MS
  Handler->>KV: fetch session summary
  KV-->>Handler: return summary (with createdAt)
  Handler->>Handler: compute ageMs = now - createdAt
  alt ageMs <= window && window > 0
    Handler->>Metrics: record skip timing (if present)
    Metrics-->>Handler: ack
    Handler-->>Handler: return { success: true, skipped: "fresh", ageMs, summary }
  else
    Handler-->>Handler: proceed to LLM summarization flow
  end
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I checked the cache before the run,

Ninety seconds said "you're done".
No extra LLM chase today,
Fresh summary keeps work at bay.
Hop along — the cache won the fun.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf(summarize): skip mem::summarize when a fresh summary exists' directly and specifically describes the main change: a performance optimization that adds freshness deduplication to skip unnecessary LLM calls when a recent summary exists.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/functions/summarize.ts (1)

231-237: ⚡ Quick win

Replace the explanatory block comment with self-descriptive code/docstring-level brevity.

This comment explains behavior in detail (“what/why”) inline; keep this concise in code and move rationale to PR/docs if needed.

As per coding guidelines, "Avoid code comments explaining WHAT — use clear naming instead".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/summarize.ts` around lines 231 - 237, Replace the long
explanatory block comment above the dedup logic with a concise, self-descriptive
comment or docstring and rely on clear naming (e.g., SUMMARIZE_DEDUP_WINDOW_MS,
summarize, or fresh-summary dedup variables) to convey behavior; remove the
"what/why" discussion and instead write a short one-line comment like "Skip LLM
summarization if existing summary is younger than SUMMARIZE_DEDUP_WINDOW_MS" and
move the rationale into the PR/docs per guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 231-237: Replace the long explanatory block comment above the
dedup logic with a concise, self-descriptive comment or docstring and rely on
clear naming (e.g., SUMMARIZE_DEDUP_WINDOW_MS, summarize, or fresh-summary dedup
variables) to convey behavior; remove the "what/why" discussion and instead
write a short one-line comment like "Skip LLM summarization if existing summary
is younger than SUMMARIZE_DEDUP_WINDOW_MS" and move the rationale into the
PR/docs per guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d20c4a99-e8bd-4e79-94f1-9d97ce36669a

📥 Commits

Reviewing files that changed from the base of the PR and between 93d1bdd and 67305f4.

📒 Files selected for processing (1)
  • src/functions/summarize.ts

CodeRabbit nitpick on rohitg00#572 — the original block explained the same
thing twice. Keep the WHY (sub-minute delta noise + Stop-hook cadence)
in two sentences; the env var name + default already document the knob.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
src/functions/summarize.ts (4)

231-234: 💤 Low value

Consider focusing comments on WHY rather than WHAT.

The comment explains what the code does ("Skip the LLM call when..."), but the code itself is fairly self-documenting. The second part explaining the rationale (stop hooks firing frequently) provides valuable context and should be preserved.

As per coding guidelines: "Avoid code comments explaining WHAT — use clear naming instead."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/summarize.ts` around lines 231 - 234, Replace the current
two-line comment about skipping the LLM call with a short WHY-focused comment
that explains the rationale for the deduplication window
(SUMMARIZE_DEDUP_WINDOW_MS) — e.g., that summaries are rate-limited to avoid
redundant LLM calls because stop hooks fire on every assistant turn and
sub-minute deltas rarely change the observation — and remove the WHAT-level
restatement that the code already expresses; update the comment near
SUMMARIZE_DEDUP_WINDOW_MS and the stop-hook context around the summarization
logic in summarize.ts accordingly.

235-239: ⚡ Quick win

Extract deduplication window parsing into a helper function.

The inline parsing of SUMMARIZE_DEDUP_WINDOW_MS is inconsistent with the existing getChunkSize() and getChunkConcurrency() helper functions. Additionally, Number(dedupRaw) is called twice, which is wasteful.

♻️ Proposed refactor to match existing pattern

Add this helper function near the other env parsing helpers (after line 51):

function getDedupWindowMs(): number {
  const raw = process.env.SUMMARIZE_DEDUP_WINDOW_MS;
  if (raw === undefined) return 90_000;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : 90_000;
}

Then replace lines 235-239 with:

-      const dedupRaw = process.env["SUMMARIZE_DEDUP_WINDOW_MS"];
-      const dedupWindowMs =
-        dedupRaw !== undefined && Number.isFinite(Number(dedupRaw))
-          ? Math.max(0, Number(dedupRaw))
-          : 90_000;
+      const dedupWindowMs = getDedupWindowMs();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/summarize.ts` around lines 235 - 239, Extract the
SUMMARIZE_DEDUP_WINDOW_MS parsing into a helper named getDedupWindowMs() similar
to getChunkSize() and getChunkConcurrency(): add a function that reads
process.env.SUMMARIZE_DEDUP_WINDOW_MS, returns 90_000 if undefined, converts
once to Number, and returns it only if Number.isFinite(n) and n >= 0 otherwise
return 90_000; then replace the inline block that computes dedupWindowMs with a
call to getDedupWindowMs() so the code uses that helper and avoids calling
Number(dedupRaw) twice.

241-243: ⚡ Quick win

Log errors when fetching existing summary fails.

The .catch(() => null) silently swallows all errors from the KV fetch. While falling back to full summarization is safe, logging the error would help diagnose KV store issues.

♻️ Proposed logging addition
-        const existing = await kv
-          .get<SessionSummary>(KV.summaries, sessionId)
-          .catch(() => null);
+        const existing = await kv
+          .get<SessionSummary>(KV.summaries, sessionId)
+          .catch((err) => {
+            logger.warn("Failed to fetch existing summary for dedup check", {
+              sessionId,
+              error: err instanceof Error ? err.message : String(err),
+            });
+            return null;
+          });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/summarize.ts` around lines 241 - 243, The current .catch(() =>
null) on kv.get<SessionSummary>(KV.summaries, sessionId) swallows errors; change
it to catch the error and log it (e.g., .catch(err => {
processLogger.error('Failed to fetch summary', { err, sessionId, bucket:
KV.summaries }); return null; })) so failures while reading existing remain
non-fatal but record the error and context; use the project's logger
(processLogger or existing logger) rather than silent swallowing.

245-252: 💤 Low value

Consolidate timestamp captures per coding guidelines.

Lines 245 and 252 both call Date.now() within the same code block. As per coding guidelines, timestamps should be captured once and reused rather than calling repeatedly.

♻️ Consolidate Date.now() calls
         if (existing && existing.createdAt) {
-          const ageMs = Date.now() - Date.parse(existing.createdAt);
+          const nowMs = Date.now();
+          const ageMs = nowMs - Date.parse(existing.createdAt);
           if (Number.isFinite(ageMs) && ageMs >= 0 && ageMs < dedupWindowMs) {
             logger.info("Summarize skipped — fresh summary present", {
               sessionId,
               ageMs,
               dedupWindowMs,
             });
-            const latencyMs = Date.now() - startMs;
+            const latencyMs = nowMs - startMs;
             if (metricsStore) {

As per coding guidelines: "Capture timestamps once with new Date().toISOString() and reuse rather than calling repeatedly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/summarize.ts` around lines 245 - 252, The block calls
Date.now() twice; capture the current time once and reuse it: compute a single
nowMs (or nowIso if an ISO string is required) before calculating ageMs (use
nowMs - Date.parse(existing.createdAt)) and compute latencyMs using nowMs -
startMs; then pass the reused timestamp/derived values into logger.info. Update
references in this block (existing.createdAt, ageMs, dedupWindowMs, startMs,
latencyMs, logger.info) to use the single captured timestamp rather than calling
Date.now() twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 231-234: Replace the current two-line comment about skipping the
LLM call with a short WHY-focused comment that explains the rationale for the
deduplication window (SUMMARIZE_DEDUP_WINDOW_MS) — e.g., that summaries are
rate-limited to avoid redundant LLM calls because stop hooks fire on every
assistant turn and sub-minute deltas rarely change the observation — and remove
the WHAT-level restatement that the code already expresses; update the comment
near SUMMARIZE_DEDUP_WINDOW_MS and the stop-hook context around the
summarization logic in summarize.ts accordingly.
- Around line 235-239: Extract the SUMMARIZE_DEDUP_WINDOW_MS parsing into a
helper named getDedupWindowMs() similar to getChunkSize() and
getChunkConcurrency(): add a function that reads
process.env.SUMMARIZE_DEDUP_WINDOW_MS, returns 90_000 if undefined, converts
once to Number, and returns it only if Number.isFinite(n) and n >= 0 otherwise
return 90_000; then replace the inline block that computes dedupWindowMs with a
call to getDedupWindowMs() so the code uses that helper and avoids calling
Number(dedupRaw) twice.
- Around line 241-243: The current .catch(() => null) on
kv.get<SessionSummary>(KV.summaries, sessionId) swallows errors; change it to
catch the error and log it (e.g., .catch(err => { processLogger.error('Failed to
fetch summary', { err, sessionId, bucket: KV.summaries }); return null; })) so
failures while reading existing remain non-fatal but record the error and
context; use the project's logger (processLogger or existing logger) rather than
silent swallowing.
- Around line 245-252: The block calls Date.now() twice; capture the current
time once and reuse it: compute a single nowMs (or nowIso if an ISO string is
required) before calculating ageMs (use nowMs - Date.parse(existing.createdAt))
and compute latencyMs using nowMs - startMs; then pass the reused
timestamp/derived values into logger.info. Update references in this block
(existing.createdAt, ageMs, dedupWindowMs, startMs, latencyMs, logger.info) to
use the single captured timestamp rather than calling Date.now() twice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d69d58b-679e-4b92-bd75-ac7d9b79c3f8

📥 Commits

Reviewing files that changed from the base of the PR and between 67305f4 and afa7855.

📒 Files selected for processing (1)
  • src/functions/summarize.ts

CodeRabbit had two nits on the dedup block:

1) The comment restated WHAT the code does. Keep only the WHY
   (Stop-hook cadence + sub-minute delta = wasted LLM work).
2) The inline raw-then-parse pattern was inconsistent with the
   existing getChunkSize() / getChunkConcurrency() helpers above
   and called Number(raw) twice. Extracted getDedupWindowMs() with
   a DEDUP_WINDOW_MS_DEFAULT constant to match the file's style.

Behavior unchanged — invalid/negative values still fall back to 90s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/functions/summarize.ts (1)

53-60: 💤 Low value

Parsing inconsistency with sibling helpers.

getChunkSize/getChunkConcurrency use parseInt and treat missing/invalid input as default, while getDedupWindowMs uses Number(raw). As a side effect, SUMMARIZE_DEDUP_WINDOW_MS="" (empty string) and " " coerce to 0 via Number(), silently disabling dedup rather than falling back to the 90s default. Sibling helpers would fall back to the default in that case. Consider treating empty/whitespace-only as unset for consistency.

♻️ Suggested change
 function getDedupWindowMs(): number {
   const raw = process.env["SUMMARIZE_DEDUP_WINDOW_MS"];
-  if (raw === undefined) return DEDUP_WINDOW_MS_DEFAULT;
+  if (raw === undefined || raw.trim() === "") return DEDUP_WINDOW_MS_DEFAULT;
   const n = Number(raw);
   return Number.isFinite(n) && n >= 0 ? n : DEDUP_WINDOW_MS_DEFAULT;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/summarize.ts` around lines 53 - 60, The getDedupWindowMs
function currently uses Number(raw) and treats empty or whitespace-only
SUMMARIZE_DEDUP_WINDOW_MS as 0, which differs from sibling helpers; update
getDedupWindowMs to mirror getChunkSize/getChunkConcurrency behavior by treating
empty/whitespace-only values as unset and falling back to
DEDUP_WINDOW_MS_DEFAULT: trim the raw env string, if it's empty treat as
undefined, then parse with parseInt (or Number but validate via Number.isFinite
and n >= 0) and return the parsed non-negative finite value or
DEDUP_WINDOW_MS_DEFAULT; refer to getDedupWindowMs and DEDUP_WINDOW_MS_DEFAULT
when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/functions/summarize.ts`:
- Around line 53-60: The getDedupWindowMs function currently uses Number(raw)
and treats empty or whitespace-only SUMMARIZE_DEDUP_WINDOW_MS as 0, which
differs from sibling helpers; update getDedupWindowMs to mirror
getChunkSize/getChunkConcurrency behavior by treating empty/whitespace-only
values as unset and falling back to DEDUP_WINDOW_MS_DEFAULT: trim the raw env
string, if it's empty treat as undefined, then parse with parseInt (or Number
but validate via Number.isFinite and n >= 0) and return the parsed non-negative
finite value or DEDUP_WINDOW_MS_DEFAULT; refer to getDedupWindowMs and
DEDUP_WINDOW_MS_DEFAULT when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5858cbbe-afc5-4918-8750-a78ed729cf89

📥 Commits

Reviewing files that changed from the base of the PR and between afa7855 and f410990.

📒 Files selected for processing (1)
  • src/functions/summarize.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant