Rate-limit and abuse-protect public MCP reads and LLM endpoints #11

Description

@harrymove-ctrl

Context

The only throttle today is the 1/day demo-IP quota on /api/demo/extractions (consumeDemoQuota, apps/api/src/worker.ts:4296). Token-less public namespace MCP reads (/mcp, worker.ts:800) and the LLM-backed endpoints (/api/memwal/chat at worker.ts:634/3498, aiQueryRun at worker.ts:1349, and /api/memwal/recall at worker.ts:631) are completely uncapped. That is an open cost and abuse vector, and it blocks flipping the repo public (docs/launch-checklist.md).

Goal / user story

As the platform owner, I want per-IP and per-token rate limits on public reads and LLM endpoints, so that a single client can't run up unbounded Workers AI / OpenRouter cost or scrape the directory, and throttled clients get a clear 429.

Acceptance criteria

  • A rateLimit(env, key, { limit, windowSec }) helper returns { allowed, remaining, resetAt }, implemented via Cloudflare's Rate Limiting binding (preferred) or a D1 sliding window mirroring consumeDemoQuota.
  • Distinct buckets: anonymous (per-IP via clientIp, worker.ts:4313) vs authed (per read-token/account) with higher authed limits.
  • Enforced on: public MCP reads (/mcp), /api/memwal/chat, aiQueryRun, /api/memwal/recall, and the directory listing (/api/directory, worker.ts:687).
  • On limit: HTTP 429 with Retry-After and a typed code (reuse the statusError(..., 429, "RATE_LIMITED") shape, see worker.ts:4302); X-RateLimit-Remaining/-Reset headers on success.
  • Limits are env-configurable via wrangler.jsonc vars (e.g. RATE_LIMIT_CHAT_PER_MIN) with safe defaults; staging can override.
  • Tests cover: under-limit passes, over-limit 429s, authed bucket > anon bucket, and limits are per-key not global.
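A minimal TypeScript sketch of the helper and the bucketing rules above, under stated assumptions: it counts in a fixed window keyed by window start (not a true sliding window), it takes a counter store and an explicit nowSec instead of env so the four test cases can run without Workers, and CounterStore, MemoryCounterStore, pickBucket, readLimit, and limitHeaders are hypothetical names, not existing code in worker.ts. The 429 body itself would come from the existing statusError(..., 429, "RATE_LIMITED") shape and is not reproduced here.

```typescript
// Hypothetical sketch of the limiter in the acceptance criteria; none of these names
// exist in worker.ts. Fixed-window counting, not a true sliding window.

export interface RateLimitOptions {
  limit: number; // max requests allowed per window
  windowSec: number; // window length in seconds
}

export interface RateLimitResult {
  allowed: boolean;
  remaining: number; // requests left in the current window
  resetAt: number; // epoch seconds at which the current window ends
}

// Assumed storage seam: D1 or the Cloudflare binding in production, a Map in tests.
export interface CounterStore {
  increment(key: string): number; // returns the count after incrementing
}

// Test-only store: it never evicts old windows, so memory grows without bound.
export class MemoryCounterStore implements CounterStore {
  private readonly counts = new Map<string, number>();

  increment(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Every request in [windowStart, windowStart + windowSec) shares one counter per key.
export function rateLimit(
  store: CounterStore,
  key: string,
  { limit, windowSec }: RateLimitOptions,
  nowSec: number = Math.floor(Date.now() / 1000),
): RateLimitResult {
  const windowStart = Math.floor(nowSec / windowSec) * windowSec;
  const count = store.increment(`${windowStart}:${key}`);
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: windowStart + windowSec,
  };
}

// Anonymous callers are bucketed per IP, authed callers per token with a larger limit.
// Production code should hash the IP/token before using it in a key; omitted here.
export function pickBucket(
  ip: string,
  token: string | undefined,
  endpoint: string,
  anon: RateLimitOptions,
  authed: RateLimitOptions,
): { key: string; opts: RateLimitOptions } {
  return token
    ? { key: `tok:${token}:${endpoint}`, opts: authed }
    : { key: `ip:${ip}:${endpoint}`, opts: anon };
}

// Reads a positive integer from a var such as RATE_LIMIT_CHAT_PER_MIN, else the default.
export function readLimit(env: Record<string, unknown>, name: string, fallback: number): number {
  const parsed = Number(env[name]);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// X-RateLimit-* headers on every response; Retry-After (seconds) only when blocked.
export function limitHeaders(r: RateLimitResult, nowSec: number): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Remaining": String(r.remaining),
    "X-RateLimit-Reset": String(r.resetAt),
  };
  if (!r.allowed) headers["Retry-After"] = String(Math.max(1, r.resetAt - nowSec));
  return headers;
}
```

One trade-off of the fixed window: a client can land up to 2 × limit requests across a window boundary (limit just before it, limit just after). A sliding-window log or a weighted two-window counter closes that gap at the cost of more state per key.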

Implementation notes

The Cloudflare Rate Limiting binding needs a "ratelimit"-type entry under unsafe.bindings in wrangler.jsonc (written [[unsafe.bindings]] in TOML), both at the top level and under env.staging; document the binding in .env.example and the README. If the binding is awkward under tests, fall back to a D1 counter keyed ${windowStart}:${ipHashOrToken}:${endpoint}, reusing the upsert at worker.ts:4304. Because the key embeds the window start, this is a fixed-window counter (one row per window) rather than a token bucket. Keep this separate from the usage ledger (rate limit = ephemeral windowed counters; ledger = durable audit), though you may emit a rate_limited ledger event for abuse visibility. Do not regress the existing SSRF guard (isPrivateIpv4, worker.ts:4288).
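A hedged TypeScript sketch of the two backends named in the notes above. It rests on assumptions: the binding name RATE_LIMIT_MCP, the rate_limits schema, and the functions viaBinding and viaD1 are hypothetical, and RateLimitBinding and D1Like are minimal local stand-ins for the binding's limit({ key }) call and D1's prepare/bind/first API rather than the official workers-types. Two properties of the binding matter for the acceptance criteria: its limit and period are fixed in wrangler config (period must be 10 or 60 seconds), and limit() resolves to only { success }, so remaining and resetAt cannot be reported exactly.

```typescript
// Hypothetical backends for the rate limiter. RATE_LIMIT_MCP, the rate_limits table,
// and both function names are assumptions, not existing code.

// Shape returned by both backends; remaining is undefined when the backend cannot report it.
export interface BackendResult {
  allowed: boolean;
  remaining?: number;
  resetAt: number; // epoch seconds
}

// 1) Cloudflare Rate Limiting binding. Assumed wrangler.jsonc fragment (placeholder values),
//    repeated under env.staging:
//    "unsafe": { "bindings": [
//      { "name": "RATE_LIMIT_MCP", "type": "ratelimit", "namespace_id": "1001",
//        "simple": { "limit": 60, "period": 60 } }
//    ] }
export interface RateLimitBinding {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

export async function viaBinding(
  binding: RateLimitBinding,
  key: string,
  periodSec: 10 | 60,
  nowSec: number,
): Promise<BackendResult> {
  const { success } = await binding.limit({ key });
  // The binding reports only success; resetAt is an estimate of one full period from now.
  return { allowed: success, remaining: success ? undefined : 0, resetAt: nowSec + periodSec };
}

// 2) D1 fallback: one row per window/caller/endpoint, incremented atomically.
//    Assumed schema: CREATE TABLE rate_limits (
//      bucket TEXT PRIMARY KEY, hits INTEGER NOT NULL, expires_at INTEGER NOT NULL);
//    Expired rows (expires_at < now) should be pruned periodically, e.g. from a cron trigger.
export interface D1Like {
  prepare(sql: string): { bind(...values: unknown[]): { first<T>(): Promise<T | null> } };
}

const UPSERT_SQL = `INSERT INTO rate_limits (bucket, hits, expires_at) VALUES (?, 1, ?)
  ON CONFLICT(bucket) DO UPDATE SET hits = hits + 1
  RETURNING hits`;

export async function viaD1(
  db: D1Like,
  key: string,
  limit: number,
  windowSec: number,
  nowSec: number,
): Promise<BackendResult> {
  const windowStart = Math.floor(nowSec / windowSec) * windowSec;
  const resetAt = windowStart + windowSec;
  const row = await db
    .prepare(UPSERT_SQL)
    .bind(`${windowStart}:${key}`, resetAt)
    .first<{ hits: number }>();
  // Defensive: a missing row is treated as over the limit (fail closed).
  const hits = row?.hits ?? limit + 1;
  return { allowed: hits <= limit, remaining: Math.max(0, limit - hits), resetAt };
}
```

Because the binding's limit lives in wrangler config rather than in a var read at request time, RATE_LIMIT_*_PER_MIN vars would only take effect on the D1 path unless each limit tier gets its own binding. The binding path also cannot fill X-RateLimit-Remaining on success.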

Sui Overflow angle

A public hackathon demo with an open, unauthenticated MCP + LLM surface is a guaranteed cost/abuse incident the moment the link is shared. Capping it is what makes it safe to flip the repo public and submit to the Smithery/Claude/Cursor marketplaces (docs/launch-checklist.md D5), which unblocks the growth/demo loop.

Dependencies

Per-token/per-account buckets get stronger once the accounts/owner-auth issue lands, but per-IP limiting ships independently. None blocking.

Part of the ContextMEM roadmap (#4) • Sui Overflow build.

Metadata

Assignees: No one assigned

Labels: P0 (Demo-blocking: required for a working Sui Overflow demo), feature (User- or agent-facing capability), platform (Backend platform plumbing: Worker, D1, queues, secrets, metering)

Projects: No projects

Relationships: None yet

Development: No branches or pull requests
