Context
The only throttle today is the 1/day per-IP demo quota on /api/demo/extractions (consumeDemoQuota, apps/api/src/worker.ts:4296). Token-less public-namespace MCP reads (/mcp, worker.ts:800) and the LLM-backed endpoints (/api/memwal/chat at worker.ts:634/3498, aiQueryRun at worker.ts:1349, /api/memwal/recall at worker.ts:631) are completely uncapped. That is an open cost/abuse vector, and it blocks flipping the repo public (docs/launch-checklist.md).
Goal / user story
As the platform owner, I want per-IP and per-token rate limits on public reads and LLM endpoints, so that a single client can't run up unbounded Workers AI / OpenRouter costs or scrape the directory, and so that throttled clients get clear 429 responses.
Acceptance criteria
A rateLimit(env, key, { limit, windowSec }) helper that returns { allowed, remaining, resetAt }, backed by Cloudflare's Rate Limiting binding (preferred) or by a D1 fixed-window counter mirroring consumeDemoQuota.
Distinct buckets: anonymous (per-IP via clientIp, worker.ts:4313) vs authed (per read-token/account) with higher authed limits.
Enforced on: public MCP reads (/mcp), /api/memwal/chat, aiQueryRun, /api/memwal/recall, and the directory listing (/api/directory, worker.ts:687).
When a limit is exceeded: HTTP 429 with Retry-After and a typed code (reuse the statusError(..., 429, "RATE_LIMITED") shape, see worker.ts:4302). On success: X-RateLimit-Remaining and X-RateLimit-Reset headers.
Limits are env-configurable via wrangler.jsonc vars (e.g. RATE_LIMIT_CHAT_PER_MIN) with safe defaults; staging can override.
Tests cover: under-limit requests pass, over-limit requests get 429s, the authed bucket is larger than the anon bucket, and limits are per-key, not global.
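A minimal sketch of the helper, the anon/authed buckets, and the response headers from the criteria above. It assumes a fixed-window counter. CounterStore, MemoryCounterStore, RATE_LIMIT_STORE, bucketFor, rateLimitHeaders and the default limits (10 anon / 60 authed) are hypothetical names and numbers for illustration, not existing code in apps/api. The in-memory store only stands in for the binding or D1 backend so the logic can be exercised in tests.

```typescript
// Hypothetical sketch of the rateLimit helper as a fixed-window counter.
// CounterStore / MemoryCounterStore stand in for the real backend
// (Rate Limiting binding or D1); they are not existing project code.

interface RateLimitResult {
  allowed: boolean;
  remaining: number; // requests left in the current window
  resetAt: number; // epoch seconds at which the current window ends
}

interface CounterStore {
  // Increments the counter stored under `key` and returns the new value.
  increment(key: string): number;
}

// Test-only in-memory store; it never evicts old windows.
class MemoryCounterStore implements CounterStore {
  private readonly counts = new Map<string, number>();

  increment(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

interface RateLimitEnv {
  RATE_LIMIT_STORE: CounterStore; // hypothetical binding name
}

function rateLimit(
  env: RateLimitEnv,
  key: string,
  opts: { limit: number; windowSec: number },
  nowMs: number = Date.now(),
): RateLimitResult {
  const nowSec = Math.floor(nowMs / 1000);
  // All requests in the same window share one counter: `${windowStart}:${key}`.
  const windowStart = nowSec - (nowSec % opts.windowSec);
  const count = env.RATE_LIMIT_STORE.increment(`${windowStart}:${key}`);
  return {
    allowed: count <= opts.limit,
    remaining: Math.max(0, opts.limit - count),
    resetAt: windowStart + opts.windowSec,
  };
}

// Anonymous callers get a per-IP bucket, authed callers a per-token bucket
// with a higher limit. The default numbers are placeholders, not agreed values.
function bucketFor(
  endpoint: string,
  ipHash: string,
  tokenId: string | null,
  anonLimit = 10,
  authedLimit = 60,
): { key: string; limit: number } {
  return tokenId !== null
    ? { key: `tok:${tokenId}:${endpoint}`, limit: authedLimit }
    : { key: `ip:${ipHash}:${endpoint}`, limit: anonLimit };
}

// Retry-After on a rejected request; X-RateLimit-* headers on an allowed one.
function rateLimitHeaders(
  r: RateLimitResult,
  nowMs: number = Date.now(),
): Record<string, string> {
  if (!r.allowed) {
    const retryAfter = Math.max(1, r.resetAt - Math.floor(nowMs / 1000));
    return { "Retry-After": String(retryAfter) };
  }
  return {
    "X-RateLimit-Remaining": String(r.remaining),
    "X-RateLimit-Reset": String(r.resetAt),
  };
}
```

A fixed window is the simplest scheme to keep correct in D1. Its known trade-off is that a client can send up to roughly twice the limit across a window boundary, which seems acceptable for cost capping. On rejection, the headers would need to travel alongside the existing statusError(..., 429, "RATE_LIMITED") shape. How statusError carries headers is not shown in this issue.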
Implementation notes
The Cloudflare Rate Limiting binding needs a ratelimit/[[unsafe.bindings]] entry in wrangler.jsonc, both at the top level and under env.staging; document the binding in .env.example and the README. If the binding is awkward under tests, fall back to a D1 fixed-window counter keyed ${windowStart}:${ipHashOrToken}:${endpoint}, reusing the upsert at worker.ts:4304. Keep this separate from the usage ledger (rate limiting = ephemeral per-window counters; ledger = durable audit), though you may emit a rate_limited ledger event for abuse visibility. Do not regress the existing SSRF guard (isPrivateIpv4, worker.ts:4288).
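A sketch of the D1 fallback described above. It assumes a fixed-window counter keyed exactly as ${windowStart}:${ipHashOrToken}:${endpoint}, and that D1 accepts SQLite's UPSERT ... RETURNING syntax. D1Like, the rate_limits table and its columns, rateLimitD1 and limitFromEnv are hypothetical. The real upsert at worker.ts:4304 is not shown in this issue and may look different.

```typescript
// Hypothetical sketch of the D1 fallback. D1Like mirrors only the subset of
// the D1 API used here (prepare -> bind -> first), declared locally so the
// sketch is self-contained. Table and column names are placeholders.

interface D1StatementLike {
  bind(...values: unknown[]): D1StatementLike;
  first<T = Record<string, unknown>>(): Promise<T | null>;
}

interface D1Like {
  prepare(query: string): D1StatementLike;
}

// Assumes a table such as:
//   CREATE TABLE rate_limits (bucket TEXT PRIMARY KEY,
//     hits INTEGER NOT NULL, expires_at INTEGER NOT NULL);
// One round trip: insert the bucket at 1 or bump it, then read the new count.
const UPSERT_SQL = `INSERT INTO rate_limits (bucket, hits, expires_at) VALUES (?, 1, ?)
ON CONFLICT(bucket) DO UPDATE SET hits = hits + 1
RETURNING hits`;

async function rateLimitD1(
  db: D1Like,
  ipHashOrToken: string,
  endpoint: string,
  opts: { limit: number; windowSec: number },
  nowMs: number = Date.now(),
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const nowSec = Math.floor(nowMs / 1000);
  const windowStart = nowSec - (nowSec % opts.windowSec);
  const resetAt = windowStart + opts.windowSec;
  // Same key shape as in the notes: `${windowStart}:${ipHashOrToken}:${endpoint}`.
  const bucket = `${windowStart}:${ipHashOrToken}:${endpoint}`;
  const row = await db
    .prepare(UPSERT_SQL)
    .bind(bucket, resetAt) // expires_at = end of this window
    .first<{ hits: number }>();
  if (row === null) throw new Error("rate-limit upsert returned no row");
  return {
    allowed: row.hits <= opts.limit,
    remaining: Math.max(0, opts.limit - row.hits),
    resetAt,
  };
}

// Vars may be strings or JSON values depending on how they are declared in
// wrangler.jsonc, so parse defensively and fall back to the safe default.
function limitFromEnv(
  vars: Record<string, unknown>,
  name: string,
  fallback: number,
): number {
  const n = Number(vars[name]);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}
```

Expired rows accumulate. The expires_at column would let a periodic DELETE (for example from a scheduled handler) prune old buckets; that cleanup is an assumption here. One point is worth checking before choosing the binding path. The Workers Rate Limiting binding's limit({ key }) call resolves to { success } only, and its period is restricted to 10 or 60 seconds. With the binding, X-RateLimit-Remaining would therefore have to be approximated or omitted, while the D1 path can report it exactly.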
Sui Overflow angle
A public hackathon demo with an open, unauthenticated MCP + LLM surface is a guaranteed cost/abuse incident the moment the link is shared. Capping it is what makes it safe to flip the repo public and submit to the Smithery/Claude/Cursor marketplaces (docs/launch-checklist.md D5) — it unblocks the growth/demo loop.
Dependencies
Per-token/per-account buckets get stronger once the accounts/owner-auth issue lands, but per-IP limiting ships independently. No blocking dependencies.
Part of the ContextMEM roadmap (#4) • Sui Overflow build.