Rate limiter: P0–P4 follow-ups identified during multi-tenant rollout #49

@gandhipratik203

Context

Surfaced while landing the multi-tenant rate-limiter changes in cpex-rate-limiter 0.0.4. This issue tracks the follow-up work that emerged during implementation, code review, and end-to-end validation. Items are prioritized by impact on production confidence vs. operator UX vs. nice-to-have.

P0 — Required for production confidence

  • Prometheus / OTEL metrics (G10) — Zero observability today; operators grep logs to audit enforcement. Need rate_limiter_decisions_total{result, dim}, backend-latency histograms, span attributes for active traces. Scope: cpex-plugins.
  • Propagation verification with backoff — Replaces the fixed 7s wait with a poll-until-confirmed loop across replicas. Kills the rapid-toggle test flake observed during integration tests. Scope: main repo (test helper + optional plugin-manager ACK channel).
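
For the propagation item, a poll-until-confirmed helper could look roughly like the sketch below. It is an illustration only: `wait_for_propagation`, the `is_confirmed` callback, and all default timings are hypothetical names and values, not part of the existing test helper. The `sleep` and `clock` parameters are injectable so the helper can be tested without real waiting.

```python
import time
from typing import Callable, Iterable


def wait_for_propagation(
    replicas: Iterable[str],
    is_confirmed: Callable[[str], bool],
    timeout_s: float = 10.0,
    initial_delay_s: float = 0.05,
    max_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every replica confirms the change, or the timeout elapses.

    `is_confirmed(replica)` is a hypothetical callback that asks one replica
    whether it has applied the new setting.
    """
    pending = set(replicas)
    delay = initial_delay_s
    deadline = clock() + timeout_s
    while True:
        # Re-check only the replicas that have not confirmed yet.
        pending = {r for r in pending if not is_confirmed(r)}
        if not pending:
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Exponential backoff, capped by max_delay_s and the time left.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

A test would call this right after toggling a setting and assert on the return value, so a slow replica costs extra polls instead of failing a fixed wait.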

P1 — Operator UX improvements

  • Rate-limit introspection API — Admin endpoint returning current counter state, tripped dimension, TTL. Operators currently can't answer "why was this user blocked?" without Redis CLI access. Scope: cpex-plugins + gateway admin.
  • Reset-counters admin action — On-call scenarios: clear abuse trip after investigation, reset during outage recovery. Currently requires direct Redis access. Scope: cpex-plugins.
  • Runtime config update via admin API — Operators currently edit plugins/config.yaml and rely on a restart or reload to apply changes. Expose rate configs through the runtime-management API (same pattern as the existing mode-toggle endpoint). Scope: main repo + cpex-plugins.
  • Structured schema validation — Today we warn on unknown keys, but invalid rate strings ("60/hour" vs "60/h") and mistyped algorithm names are silently accepted. Add JSON Schema validation at config-load time. Scope: cpex-plugins.
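
To make the validation item concrete, here is a minimal stdlib sketch of a load-time check. The rate grammar (`<count>/<s|m|h>`), the `algorithm` key, and the rule that every `by_*` key holds a rate string are assumptions made for illustration; the plugin's real config shape may differ, and a real implementation would more likely express these rules as a JSON Schema.

```python
import re

# Assumed grammar for illustration only: "<positive int>/<s|m|h>".
RATE_RE = re.compile(r"^(?P<count>[1-9]\d*)/(?P<unit>[smh])$")

# Algorithm names mentioned elsewhere in this issue.
KNOWN_ALGORITHMS = {"fixed_window", "sliding_window", "token_bucket"}


def validate_config(config: dict) -> list:
    """Return a list of human-readable errors; an empty list means it passed."""
    errors = []
    algorithm = config.get("algorithm")
    if algorithm is not None and algorithm not in KNOWN_ALGORITHMS:
        errors.append(f"unknown algorithm {algorithm!r}")
    for key, value in config.items():
        # Assumption: every "by_*" key holds a single rate string.
        if key.startswith("by_") and not (
            isinstance(value, str) and RATE_RE.match(value)
        ):
            errors.append(f"{key}: invalid rate {value!r}")
    return errors
```

Returning all errors at once, instead of raising on the first, lets an operator fix a config in a single pass.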

P2 — Correctness hardening

  • Redis health check / circuit breaker — Active probe + shed-load path when Redis degrades, instead of per-request timeout hits. Scope: cpex-plugins.
  • E2E coverage for sliding_window / token_bucket Redis paths — Integration tests currently exercise only fixed_window; other algorithms are unit-tested only. Scope: main repo integration suite.
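
The circuit-breaker idea could be sketched as follows. This is a simplified illustration, not the plugin's design: `CircuitBreaker`, `check_with_breaker`, the thresholds, and the `fallback` callable (which stands in for the shed-load path) are all hypothetical. The half-open state here lets every call through once the cooldown elapses; a production breaker would usually admit only a single probe.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; allows trial calls after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let calls through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def check_with_breaker(breaker, backend_check, fallback):
    """Call backend_check unless the breaker is open; otherwise use fallback."""
    if not breaker.allow():
        return fallback()
    try:
        result = backend_check()
    except Exception:
        # Sketch only: real code would catch just the client's
        # connection and timeout errors, not every exception.
        breaker.record_failure()
        return fallback()
    breaker.record_success()
    return result
```

While the breaker is open, requests skip the backend entirely, so a degraded Redis no longer costs one timeout per request.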

P3 — Performance / features

  • Multi-dimension batch evaluation — Replace per-dimension Redis roundtrips with MGET / Lua script for 2-3× throughput on multi-dim configs. Scope: cpex-plugins.
  • Cascade rate limits — Compose per-tool inside per-user inside per-tenant so tightening the tenant cap automatically caps all inner dimensions. Scope: cpex-plugins.
  • Token-aware rate dimension for LLM-context-bound traffic — Extend RateLimiterPlugin to support a by_tokens dimension keyed off (a) tool-result payload size, (b) prompts/get content, (c) A2A payload. Composes additively with the existing request-based dimensions (by_user / by_tenant / by_tool). Useful for deployments where MCP traffic feeds LLM contexts and the operator wants to cap aggregate token throughput per tenant. This is not a replacement for request-based limits; it adds a complementary cost dimension that matches LLM-shaped traffic. Scope: cpex-plugins.
  • Quota accounting (not just blocking) — Emit counter deltas for billing/reporting pipelines. Operators want "how much quota did Team X consume this month" even when not near the limit. Scope: cpex-plugins + pipeline integration.
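
The cascade item can be illustrated with an in-memory sketch. Everything here is hypothetical (`Level`, `CascadeLimiter`, the key format), window expiry is omitted for brevity, and a real version would run against Redis. The point it shows is that a request is admitted only if every enclosing level has headroom, so lowering the tenant limit caps all inner dimensions without touching their configs.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    name: str   # e.g. "tenant", "user", "tool"
    key: str    # counter key for this request at this level
    limit: int  # max requests per window


@dataclass
class CascadeLimiter:
    # Window expiry is left out of this sketch; counts only ever grow.
    counts: dict = field(default_factory=dict)

    def allow(self, levels: list) -> tuple:
        """Return (allowed, tripped_level_name), checking outermost level first."""
        # Check every level before incrementing any, so a rejected
        # request does not consume quota at the levels that had room.
        for level in levels:
            if self.counts.get(level.key, 0) >= level.limit:
                return False, level.name
        for level in levels:
            self.counts[level.key] = self.counts.get(level.key, 0) + 1
        return True, None
```

Returning the tripped level's name also gives the introspection item in P1 something to report.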

P4 — Exploratory

  • Soft-block mode — Beyond fail-open/closed: fall back to in-memory with a header/metric flagging "degraded enforcement." Partial enforcement instead of binary choice when Redis degrades. Scope: cpex-plugins.
  • Admin UI for rate-limit visualization — Top consumers, predicted breaches, per-tenant counter charts. Scope: main repo admin UI.
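
A rough sketch of what soft-block mode might look like, under stated assumptions: `SoftBlockLimiter`, the `shared_check` callable (standing in for the Redis-backed check), and the `X-RateLimit-Enforcement` header name are all made up for illustration. Local counters are per process, so with several replicas the effective limit during degradation can be up to the replica count times `local_limit`.

```python
from typing import Callable, Dict, Tuple

# Hypothetical header name, used here only for illustration.
DEGRADED_HEADER = "X-RateLimit-Enforcement"


class SoftBlockLimiter:
    def __init__(self, shared_check: Callable[[str], bool], local_limit: int):
        self.shared_check = shared_check  # may raise when the backend is down
        self.local_limit = local_limit
        self.local_counts: Dict[str, int] = {}

    def check(self, key: str) -> Tuple[bool, Dict[str, str]]:
        """Return (allowed, extra_headers) for one request."""
        try:
            return self.shared_check(key), {}
        except Exception:
            # Shared backend unavailable: enforce per process and flag it.
            # Sketch only: real code would catch narrower errors and
            # expire local counts per window.
            count = self.local_counts.get(key, 0) + 1
            self.local_counts[key] = count
            return count <= self.local_limit, {DEGRADED_HEADER: "degraded"}
```

The same branch that sets the header is the natural place to increment a "degraded enforcement" metric.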
