Rate-limit and abuse-protect public MCP reads and LLM endpoints #11

## Context
The only throttle today is the one-per-day, per-IP demo quota on `/api/demo/extractions` (`consumeDemoQuota`, `apps/api/src/worker.ts:4296`). Token-less public namespace MCP reads (`/mcp`, `worker.ts:800`) and the LLM-backed endpoints (`/api/memwal/chat` at `worker.ts:634/3498`, `aiQueryRun` at `worker.ts:1349`, `/api/memwal/recall` at `worker.ts:631`) have no rate limit at all. That is an open cost/abuse vector, and it blocks flipping the repo public (`docs/launch-checklist.md`).

## Goal / user story
As the platform owner, I want per-IP and per-token rate limits on public reads and LLM endpoints, so that a single client can't run up unbounded Workers AI / OpenRouter cost or scrape the directory, and so that throttled requests get a clear 429.

## Acceptance criteria
- [ ] A `rateLimit(env, key, { limit, windowSec })` helper returns `{ allowed, remaining, resetAt }`, implemented via Cloudflare's Rate Limiting binding (preferred) or a D1 sliding window mirroring `consumeDemoQuota`.
- [ ] Distinct buckets: anonymous (per-IP via `clientIp`, `worker.ts:4313`) vs authed (per read-token/account) with higher authed limits.
- [ ] Enforced on: public MCP reads (`/mcp`), `/api/memwal/chat`, `aiQueryRun`, `/api/memwal/recall`, and the directory listing (`/api/directory`, `worker.ts:687`).
- [ ] On limit: HTTP 429 with `Retry-After` and a typed code (reuse the `statusError(..., 429, "RATE_LIMITED")` shape, see `worker.ts:4302`). On success: `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers.
- [ ] Limits are env-configurable via `wrangler.jsonc` vars (e.g. `RATE_LIMIT_CHAT_PER_MIN`) with safe defaults; staging can override.
- [ ] Tests cover: under-limit passes, over-limit 429s, authed bucket > anon bucket, and limits are per-key not global.
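The helper contract and the 429 headers above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it uses a fixed-window counter (the simplest variant, not a true sliding window), and `CounterStore`, `MemoryCounterStore`, `rateLimitHeaders`, and the optional `nowMs` parameter are hypothetical names added for the example. The in-memory store stands in for whichever backend is chosen (the Rate Limiting binding or D1).

```typescript
// Hypothetical types: the issue only fixes the helper's signature and return shape.
interface RateLimitOptions { limit: number; windowSec: number }
interface RateLimitResult { allowed: boolean; remaining: number; resetAt: number }

// Stand-in for the real backing store (Rate Limiting binding or D1).
// Maps a bucket key to the number of hits recorded in that window.
interface CounterStore {
  increment(bucketKey: string): Promise<number>;
}

class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();
  async increment(bucketKey: string): Promise<number> {
    const next = (this.counts.get(bucketKey) ?? 0) + 1;
    this.counts.set(bucketKey, next);
    return next;
  }
}

// Fixed-window counter: each key gets `limit` hits per `windowSec` window.
// `nowMs` is an assumption added so the sketch can be tested deterministically.
async function rateLimit(
  env: { store: CounterStore },
  key: string,
  { limit, windowSec }: RateLimitOptions,
  nowMs: number = Date.now(),
): Promise<RateLimitResult> {
  const windowMs = windowSec * 1000;
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  const count = await env.store.increment(`${windowStart}:${key}`);
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: windowStart + windowMs, // epoch ms at which the window rolls over
  };
}

// Headers for both outcomes; Retry-After is only added when the request is rejected.
function rateLimitHeaders(
  result: RateLimitResult,
  nowMs: number = Date.now(),
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(Math.ceil(result.resetAt / 1000)),
  };
  if (!result.allowed) {
    const waitSec = Math.ceil((result.resetAt - nowMs) / 1000);
    headers["Retry-After"] = String(Math.max(1, waitSec));
  }
  return headers;
}
```

Because the window start is part of the stored key, counters for old windows are never read again and can be expired lazily. Each caller key has its own counter, which is what the "per-key not global" test would exercise.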

## Implementation notes
The Cloudflare Rate Limiting binding needs a `ratelimit`/`[[unsafe.bindings]]` entry in `wrangler.jsonc` (both top-level and `env.staging`); document the binding in `.env.example`/README. If the binding is awkward under tests, fall back to a D1 windowed counter keyed `${windowStart}:${ipHashOrToken}:${endpoint}`, reusing the upsert at `worker.ts:4304`. Keep this separate from the usage ledger (rate-limit = ephemeral sliding window; ledger = durable audit), though you may emit a `rate_limited` ledger event for abuse visibility. Do not regress the existing SSRF guard (`isPrivateIpv4`, `worker.ts:4288`).
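As an illustration of the key shape and the env-configurable limits, the sketch below builds the `${windowStart}:${ipHashOrToken}:${endpoint}` key and resolves a per-minute limit from string vars. Everything except that key shape and the `RATE_LIMIT_CHAT_PER_MIN` name is an assumption: the `Caller` type, the `ip:`/`tok:` prefixes, the `RATE_LIMIT_CHAT_AUTHED_PER_MIN` var, and the default values are hypothetical.

```typescript
// Hypothetical caller model: anonymous requests are keyed by a hashed IP,
// authed requests by their read-token (or account) identifier.
type Caller =
  | { kind: "anon"; ipHash: string }
  | { kind: "authed"; tokenId: string };

// Builds `${windowStart}:${ipHashOrToken}:${endpoint}`.
// The `ip:` / `tok:` prefixes (an assumption) keep the two buckets from colliding.
function bucketKey(caller: Caller, endpoint: string, windowSec: number, nowMs: number): string {
  const windowMs = windowSec * 1000;
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  const identity = caller.kind === "anon" ? `ip:${caller.ipHash}` : `tok:${caller.tokenId}`;
  return `${windowStart}:${identity}:${endpoint}`;
}

// wrangler vars arrive as strings; fall back to a safe default when the
// value is unset, non-numeric, or not positive.
function readLimit(
  vars: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const parsed = Number.parseInt(vars[name] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Authed callers get a higher ceiling than anonymous ones.
// The defaults (10 per minute, 5x for authed) are illustrative only.
function chatLimitFor(caller: Caller, vars: Record<string, string | undefined>): number {
  const anonLimit = readLimit(vars, "RATE_LIMIT_CHAT_PER_MIN", 10);
  if (caller.kind === "anon") return anonLimit;
  return readLimit(vars, "RATE_LIMIT_CHAT_AUTHED_PER_MIN", anonLimit * 5);
}
```

Deriving the authed default from the anonymous one means a staging override of a single var still keeps the authed bucket larger than the anonymous bucket.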

## Sui Overflow angle
A public hackathon demo with an open, unauthenticated MCP + LLM surface is a guaranteed cost/abuse incident the moment the link is shared. Capping it is what makes it safe to flip the repo public and submit to the Smithery/Claude/Cursor marketplaces (`docs/launch-checklist.md` D5), which unblocks the growth/demo loop.

## Dependencies
Per-token/per-account buckets get stronger once the accounts/owner-auth issue lands, but per-IP limiting ships independently. None blocking.

_Part of the ContextMEM roadmap (#4) • Sui Overflow build._
