From 4c3eb39fc0fd214c58f8803d285f72f3ff13840d Mon Sep 17 00:00:00 2001
From: Gagan Trivedi
Date: Mon, 4 May 2026 13:47:17 +0530
Subject: [PATCH 1/3] docs: add Edge Proxy operational guide

Adds identity overrides, troubleshooting, production deployment, and
architecture/scaling sections to the Edge Proxy doc, addressing the
recurring support patterns catalogued in #7207.
---
 .../deployment-self-hosting/edge-proxy.md | 225 +++++++++++++++++-
 1 file changed, 213 insertions(+), 12 deletions(-)

diff --git a/docs/docs/deployment-self-hosting/edge-proxy.md b/docs/docs/deployment-self-hosting/edge-proxy.md
index 58a4b216d3f3..da68d3b1b73b 100644
--- a/docs/docs/deployment-self-hosting/edge-proxy.md
+++ b/docs/docs/deployment-self-hosting/edge-proxy.md
@@ -5,8 +5,8 @@ sidebar_position: 3
 ---
 
 The [Edge Proxy](/performance/edge-proxy) runs as a
-[Docker container](https://hub.docker.com/repository/docker/flagsmith/edge-proxy) with no external dependencies.
-It connects to the Flagsmith API to download environment documents, and your Flagsmith client applications connect to it
+[Docker container](https://hub.docker.com/repository/docker/flagsmith/edge-proxy) with no external dependencies. It
+connects to the Flagsmith API to download environment documents, and your Flagsmith client applications connect to it
 using [remote flag evaluation](/integrating-with-flagsmith/sdks#remote-evaluation).
 
 The examples below assume you have a configuration file located at `./config.json`. Your Flagsmith client applications
@@ -159,8 +159,8 @@ When set to `true`, the Edge Proxy will use the `X-Forwarded-For` and `X-Forward
 client IP addresses. This is useful if the Edge Proxy is running behind a reverse proxy, and you want the
 [access logs](#loggingoverride) to show the real IP addresses of your clients.
 
-By default, only the loopback address is trusted. This can be changed with the [`FORWARDED_ALLOW_IPS` environment
-variable](#environment-variables).
+By default, only the loopback address is trusted. This can be changed with the
+[`FORWARDED_ALLOW_IPS` environment variable](#environment-variables).
 
 ```json
 "server": {
@@ -270,9 +270,9 @@ specified by the [`"logging.log_format"`](#logginglog_format) setting.
 
 The Edge Proxy exposes two health check endpoints:
 
-* `/proxy/health/liveness`: Always responds with a 200 status code. Use this health check to determine if the Edge
-  Proxy is alive and able to respond to requests.
-* `/proxy/health/readiness`: Responds with a 200 status if the Edge Proxy was able to fetch all its configured
+- `/proxy/health/liveness`: Always responds with a 200 status code. Use this health check to determine if the Edge Proxy
+  is alive and able to respond to requests.
+- `/proxy/health/readiness`: Responds with a 200 status if the Edge Proxy was able to fetch all its configured
   environment documents within a configurable grace period. This allows the Edge Proxy to continue reporting as healthy
   even if the Flagsmith API is temporarily unavailable. This health check is also available at `/proxy/health`.
 
@@ -304,11 +304,212 @@ return 200
 
 Some Edge Proxy settings can only be set using environment variables:
 
-- `WEB_CONCURRENCY` The number of [Uvicorn](https://www.uvicorn.org/) workers. Defaults to `1`, which is 
+- `WEB_CONCURRENCY` The number of [Uvicorn](https://www.uvicorn.org/) workers. Defaults to `1`, which is
   [recommended when running multiple Edge Proxy containers behind a load balancer](https://fastapi.tiangolo.com/deployment/docker/#one-load-balancer-multiple-worker-containers).
-  If running on a single node, set this [based on your number of CPU cores and threads](https://docs.gunicorn.org/en/latest/design.html#how-many-workers).
-- `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`: These variables let you configure an HTTP proxy that the
-  Edge Proxy should use for all its outgoing HTTP requests.
-  [Learn more](https://www.python-httpx.org/environment_variables)
+  If running on a single node, set this
+  [based on your number of CPU cores and threads](https://docs.gunicorn.org/en/latest/design.html#how-many-workers).
+- `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`: These variables let you configure an HTTP proxy that the Edge
+  Proxy should use for all its outgoing HTTP requests. [Learn more](https://www.python-httpx.org/environment_variables)
 - `FORWARDED_ALLOW_IPS`: Which IPs to trust for determining client IP addresses when using the `proxy_headers` option.
   For more details, see the [Uvicorn documentation](https://www.uvicorn.org/settings/#http).
+
+## Identity overrides
+
+Identity overrides defined in the dashboard are evaluated by the Edge Proxy. They are embedded in the environment
+document the proxy fetches from the Flagsmith API, and applied during local evaluation by the Flagsmith engine.
+
+For overrides to flow through to the Edge Proxy:
+
+- The environment must have **Use identity overrides in local evaluation** enabled. This is the default for new
+  environments.
+- The Edge Proxy must be able to fetch a fresh environment document. Polling frequency is controlled by
+  [`api_poll_frequency_seconds`](#api_poll_frequency_seconds).
+
+:::warning Edge Proxy version
+
+Deleting an identity override in the dashboard only propagates to the Edge Proxy on
+[v2.21.1](https://github.com/Flagsmith/edge-proxy/releases/tag/v2.21.1) and newer. Earlier versions kept the deleted
+override in their cached environment document, so the proxy returned the old overridden value. Pin to `v2.21.1` or
+later, or use `:latest`, to pick up override deletions.
+
+:::
+
+When an identity has both an override and a matching segment override, the identity override takes precedence — this
+matches the behaviour of [Local Evaluation Mode](/integrating-with-flagsmith/integration-overview).
+
+## Troubleshooting
+
+### 401 Unauthorized from the Edge Proxy
+
+The Edge Proxy returns `401 {"status": "unauthorized", "message": "unknown key ..."}` when the `X-Environment-Key`
+header sent by your client does not match any key configured in [`environment_key_pairs`](#environment_key_pairs).
+
+Check that:
+
+- Your client is using the **client-side** environment key, not the server-side key.
+- The client-side key in your SDK exactly matches the `client_side_key` in the proxy's configuration.
+- If you rotated keys in the dashboard, the proxy configuration was updated and the proxy was restarted.
+
+### 403 Forbidden in Edge Proxy logs
+
+A 403 in the proxy's `error_fetching_document` log line comes from the **upstream Flagsmith API** rejecting the
+configured server-side key when the proxy polls for an environment document. The proxy itself does not return 403; it
+surfaces the upstream error and keeps serving the last cached document if one exists.
+
+Diagnose in this order:
+
+1. **Key prefix and presence.** `server_side_key` values must be non-empty and start with `ser.`. The proxy validates
+   this at startup and refuses to launch otherwise — a blank or whitespace-only server key fails the same check.
+2. **Key type.** Confirm the key was created as **Server-side Environment Key** in **Environment settings → SDK Keys**.
+   Client-side keys cannot fetch environment documents.
+3. **Key freshness.** If the key was rotated or deleted in the dashboard, the configured key is no longer valid.
+4. **`api_url`.** When self-hosting, [`api_url`](#api_url) must point at your Flagsmith API (e.g.
+   `https://flagsmith.example.com/api/v1`). Pointing a self-hosted proxy at `edge.api.flagsmith.com` will 403 because
+   the key does not exist on Flagsmith's hosted Edge.
+5. **Edge enablement.** On self-hosted deployments where Edge is enabled per-project, ensure the project the environment
+   belongs to is permitted to serve environment documents.
+
+### Restart loops in ECS, Kubernetes, or other orchestrators
+
+The most common cause is the orchestrator's readiness probe firing before the proxy has fetched its first environment
+document, or flapping to unhealthy whenever the upstream API is briefly slow.
+
+- Point readiness probes at [`/proxy/health/readiness`](#health-checks) and liveness probes at `/proxy/health/liveness`.
+  **Do not** point liveness at readiness — a transient upstream outage will then kill the container instead of letting
+  it serve cached documents.
+- Increase the readiness probe's `initialDelaySeconds` (Kubernetes) or `startPeriod` (ECS) to comfortably exceed the
+  time it takes to fetch all configured environment documents on a cold start.
+- If you serve many environments from a single proxy, raise
+  [`health_check.environment_update_grace_period_seconds`](#health_checkenvironment_update_grace_period_seconds) or set
+  it to `null` to keep the proxy healthy when the upstream API is intermittently unavailable.
+
+### Stale flags after a dashboard change
+
+The proxy serves cached environment documents and only re-fetches every
+[`api_poll_frequency_seconds`](#api_poll_frequency_seconds) (default 10s). It also uses `If-Modified-Since` and will log
+a 304 when the upstream document hasn't changed.
+
+To diagnose:
+
+- Set [`logging.log_level`](#logginglog_level) to `DEBUG` and watch for `environment_updated` log events after you
+  publish a change.
+- Verify the proxy can reach the upstream API. A 5xx, timeout, or 403 from the upstream API will leave the proxy serving
+  the last successfully fetched document.
+- For very fast propagation requirements, lower `api_poll_frequency_seconds`, but be aware this increases load on the
+  upstream API proportionally.
+
+### Identity-based evaluation returns the wrong value
+
+If your client is hitting the proxy and the result differs from a direct API call:
+
+- Confirm you are sending the full set of traits on every request. The proxy is stateless and does not persist traits
+  between calls — see [Managing Traits](/performance/edge-proxy#managing-traits).
+- If the result was correct before and is now stale after deleting an identity override, upgrade to Edge Proxy v2.21.1
+  or newer (see [Identity overrides](#identity-overrides)).
+- Disable [endpoint caches](#endpoint_caches) temporarily to rule out a cached response.
+
+## Production deployment
+
+### Behind a reverse proxy or load balancer
+
+- Set [`server.proxy_headers`](#serverproxy_headers) to `true` so access logs record the real client IP.
+- Use the [`FORWARDED_ALLOW_IPS`](#environment-variables) environment variable to list the load balancer's IPs.
+- Run multiple Edge Proxy containers behind the load balancer with `WEB_CONCURRENCY=1` per container, as recommended by
+  FastAPI. The proxy is stateless, so any instance can serve any request.
+- Point the load balancer's health check at `/proxy/health/readiness`.
+
+### ECS / Fargate
+
+- Map container port `8000` and front the service with an ALB or NLB.
+- Set the ECS health check `command` or the target group health check path to `/proxy/health/readiness`.
+- Use `startPeriod` on the ECS health check (typically 30–60s) so the task is not killed during initial document
+  fetches.
+- The task needs outbound internet (or VPC routing) to reach the Flagsmith API. If you use a forward proxy, set
+  `HTTPS_PROXY` and `NO_PROXY` on the task definition.
+- Mount your `config.json` as a file (for example, via a sidecar that pulls from S3 or AWS Secrets Manager) rather than
+  baking server-side keys into the image.
+
+### Kubernetes
+
+The Edge Proxy is a stateless Deployment. There is no official Helm chart at the time of writing; a minimal manifest
+looks like this:
+
+```yaml title="edge-proxy.yaml"
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: edge-proxy
+spec:
+  replicas: 2
+  selector:
+    matchLabels: { app: edge-proxy }
+  template:
+    metadata:
+      labels: { app: edge-proxy }
+    spec:
+      containers:
+        - name: edge-proxy
+          image: flagsmith/edge-proxy:latest
+          ports:
+            - containerPort: 8000
+          readinessProbe:
+            httpGet: { path: /proxy/health/readiness, port: 8000 }
+            initialDelaySeconds: 10
+          livenessProbe:
+            httpGet: { path: /proxy/health/liveness, port: 8000 }
+          volumeMounts:
+            - name: config
+              mountPath: /app/config.json
+              subPath: config.json
+      volumes:
+        - name: config
+          secret:
+            secretName: edge-proxy-config
+```
+
+Store `config.json` in a `Secret` (it contains server-side keys). Scale with `replicas` or an HPA on CPU.
+
+### Managing configuration in CI/CD
+
+`config.json` contains server-side environment keys and should be treated as a secret:
+
+- Keep the file out of version control. Render it at deploy time from your secrets store (Vault, AWS Secrets Manager,
+  GCP Secret Manager, Kubernetes `Secret`, etc.).
+- If a static-analysis tool flags committed keys, rotate them in the dashboard immediately and move the new keys into
+  your secrets store.
+- An empty `client_side_key` is a configuration error — both keys are required for the pair to be usable.
+
+## Architecture and scaling
+
+The Edge Proxy is stateless: each instance independently polls the Flagsmith API and serves cached environment
+documents, so it scales linearly behind a load balancer.
+
+When sizing a fleet:
+
+- Each proxy instance polls the upstream API once per environment per
+  [`api_poll_frequency_seconds`](#api_poll_frequency_seconds), so adding instances multiplies the polling load on the
+  upstream API. With `If-Modified-Since` (Edge Proxy v2.19.0+, Flagsmith API v2.176.0+) most polls return 304 and cost
+  very little.
+- Enable [`endpoint_caches`](#endpoint_caches) for `flags` and `identities` if you have many repeated requests. Caches
+  are scoped per-process and cleared whenever the environment document changes, so they never serve results computed
+  from an older document.
+- The proxy is CPU-bound on `flags` and `identities` (engine evaluation) and bandwidth-bound on `environment-document`
+  (large response body). Scale on CPU for the first two, and on outbound network for the third.
+
+### Reference throughput per instance
+
+The numbers below come from internal benchmarks of `flagsmith/edge-proxy:2.21.2` running as a single-worker container on
+a 1 vCPU / 2 GB AWS Fargate task, with endpoint caches **disabled** (worst case — every request runs a full evaluation).
+Use them as starting-point sizing; real throughput depends on project shape, segment complexity, and trait counts.
+
+Project profile: 50 features, 15 segments, every feature overridden by every segment (750 segment overrides total), each
+segment matching on 15 trait conditions.
+
+| Endpoint                           | Peak RPS | Sweet spot (concurrency) |
+| ---------------------------------- | -------: | -----------------------: |
+| `POST /api/v1/identities/`         |      ~72 |                       25 |
+| `GET /api/v1/flags/`               |      ~63 |                       10 |
+| `GET /api/v1/environment-document` |     ~570 |                       25 |
+
+To raise total throughput, run more containers behind the load balancer with `WEB_CONCURRENCY=1` per container, or,
+when running a single container per node, increase `WEB_CONCURRENCY` together with the container's CPU allocation.
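The sizing guidance in this patch reduces to two divisions. The number of containers is target traffic divided by usable per-instance throughput. Upstream polling load is instances times environments divided by the poll interval.

Below is a minimal Python sketch of that arithmetic. It assumes you size against the peak RPS from the reference table (or, better, from your own benchmarks) and keep each container below a target utilisation. The `0.7` default and both function names are illustrative assumptions, not Edge Proxy settings.

```python
import math


def instances_needed(
    target_rps: float,
    peak_rps_per_instance: float,
    target_utilisation: float = 0.7,
) -> int:
    """Containers needed so each runs at or below `target_utilisation` of its measured peak RPS.

    `target_utilisation` is an assumed safety margin, not an Edge Proxy setting.
    """
    if peak_rps_per_instance <= 0:
        raise ValueError("peak_rps_per_instance must be positive")
    if not 0 < target_utilisation <= 1:
        raise ValueError("target_utilisation must be in (0, 1]")
    usable_rps = peak_rps_per_instance * target_utilisation
    return math.ceil(target_rps / usable_rps)


def upstream_polls_per_second(
    instances: int,
    environments: int,
    api_poll_frequency_seconds: float,
) -> float:
    """Polling load on the Flagsmith API: each instance polls once per environment per interval."""
    if api_poll_frequency_seconds <= 0:
        raise ValueError("api_poll_frequency_seconds must be positive")
    return instances * environments / api_poll_frequency_seconds


if __name__ == "__main__":
    # 500 RPS of identity requests, sized against the ~72 RPS reference figure.
    fleet = instances_needed(500, 72)
    print(fleet)  # 10
    # Ten instances serving three environments with a 10 s poll interval.
    print(upstream_polls_per_second(fleet, 3, 10))  # 3.0
```

Because real throughput depends on project shape, segment complexity and trait counts, treat the result as a starting point. Re-measure with your own project before committing to a fleet size.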
From 7bed04cc1c1bc1f2023fad1fa999eaac405a48c6 Mon Sep 17 00:00:00 2001
From: Gagan Trivedi
Date: Mon, 4 May 2026 14:00:34 +0530
Subject: [PATCH 2/3] docs: drop tautological polling note from identity overrides
---
 docs/docs/deployment-self-hosting/edge-proxy.md | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/docs/docs/deployment-self-hosting/edge-proxy.md b/docs/docs/deployment-self-hosting/edge-proxy.md
index da68d3b1b73b..0cd3965701f1 100644
--- a/docs/docs/deployment-self-hosting/edge-proxy.md
+++ b/docs/docs/deployment-self-hosting/edge-proxy.md
@@ -318,12 +318,8 @@ Some Edge Proxy settings can only be set using environment variables:
 Identity overrides defined in the dashboard are evaluated by the Edge Proxy. They are embedded in the environment
 document the proxy fetches from the Flagsmith API, and applied during local evaluation by the Flagsmith engine.
 
-For overrides to flow through to the Edge Proxy:
-
-- The environment must have **Use identity overrides in local evaluation** enabled. This is the default for new
-  environments.
-- The Edge Proxy must be able to fetch a fresh environment document. Polling frequency is controlled by
-  [`api_poll_frequency_seconds`](#api_poll_frequency_seconds).
+For overrides to flow through to the Edge Proxy, the environment must have **Use identity overrides in local
+evaluation** enabled. This is the default for new environments.
 
 :::warning Edge Proxy version
 
From 619c8d0235f9a9a428d05bc8495d53238bf0cb34 Mon Sep 17 00:00:00 2001
From: Gagan Trivedi
Date: Mon, 4 May 2026 14:04:10 +0530
Subject: [PATCH 3/3] docs: link to Local Evaluation Mode anchor, not the whole page
---
 docs/docs/deployment-self-hosting/edge-proxy.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/deployment-self-hosting/edge-proxy.md b/docs/docs/deployment-self-hosting/edge-proxy.md
index 0cd3965701f1..b8d70a2a142d 100644
--- a/docs/docs/deployment-self-hosting/edge-proxy.md
+++ b/docs/docs/deployment-self-hosting/edge-proxy.md
@@ -331,7 +331,7 @@ later, or use `:latest`, to pick up override deletions.
 
 :::
 
 When an identity has both an override and a matching segment override, the identity override takes precedence — this
-matches the behaviour of [Local Evaluation Mode](/integrating-with-flagsmith/integration-overview).
+matches the behaviour of [Local Evaluation Mode](/integrating-with-flagsmith/integration-overview#local-evaluation-mode).
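The precedence rule restated in this last hunk (an identity override beats a matching segment override) can be modelled as an ordered lookup. The Python sketch below is a hypothetical model for reasoning about results, not the Flagsmith engine's implementation. `FeatureState`, `resolve` and the predicate-based segments are illustrative names. The sketch also assumes segment overrides are already sorted highest priority first.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence, Tuple

Traits = Mapping[str, object]


@dataclass(frozen=True)
class FeatureState:
    """Hypothetical stand-in for a feature's evaluated state."""

    enabled: bool
    value: object = None


# Illustrative model: a segment override pairs a trait predicate with the state it applies.
SegmentOverride = Tuple[Callable[[Traits], bool], FeatureState]


def resolve(
    default: FeatureState,
    segment_overrides: Sequence[SegmentOverride],
    identity_override: Optional[FeatureState],
    traits: Traits,
) -> FeatureState:
    """Return the state for one identity: identity override, then segment override, then default."""
    # 1. An identity override always wins.
    if identity_override is not None:
        return identity_override
    # 2. Otherwise, the first segment (assumed priority order) whose rule matches the traits sent.
    for matches, state in segment_overrides:
        if matches(traits):
            return state
    # 3. Otherwise, the environment default.
    return default


if __name__ == "__main__":
    default = FeatureState(enabled=False)
    beta = (lambda traits: traits.get("plan") == "beta", FeatureState(enabled=True, value="segment"))
    override = FeatureState(enabled=True, value="identity")

    print(resolve(default, [beta], override, {"plan": "beta"}).value)  # identity
    print(resolve(default, [beta], None, {"plan": "beta"}).value)  # segment
    # No traits sent: the segment cannot match, so the default is returned.
    print(resolve(default, [beta], None, {}).enabled)  # False
```

The last call shows why the proxy needs the full set of traits on every request. Because it is stateless, a segment rule on a trait that was not sent with this request cannot match.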