Gatekeeper is a tiny per-namespace reverse proxy for Kubernetes that scales your workloads to zero when they are idle and wakes them on the next request - holding that request until the backend is ready, so the caller sees a slightly slow first response instead of an error. It can optionally authenticate every request with a shared token.
It ships as a single ~25 MB static binary on distroless, uses tens of MB of RAM, starts instantly, and talks to the Kubernetes API with its own in-cluster ServiceAccount. Everything is configured through environment variables.
Idle environments (per-branch/preview/staging/demo deployments, internal tools, rarely-used services) burn CPU and memory around the clock. Gatekeeper sits in front of them and:
- Scales to zero every selected Deployment and StatefulSet after an idle period, remembering each one's replica count (set `IDLE_TIMEOUT=0` to disable and run as a pure wake-on-request proxy).
- Wakes on demand: the next request restores those replicas - in dependency order when workloads declare a `DEPENDS_ON_ANNOTATION` (a workload's dependencies are scaled up and ready before it is, so an app never starts against a database that isn't up yet) - then waits for every managed workload in the namespace to become ready before proxying through (Knative activator style), so an app is never sent traffic before its dependencies are up. It keeps holding the request for as long as the pods are legitimately starting, and only gives up early if a pod is wedged in a state it won't recover from (bad/missing image, crash loop) - rather than failing on a fixed timer. WebSocket upgrades and streaming responses are supported.
- Optionally authenticates requests with a shared token via a header or cookie, with an optional redirect to an external login.
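
As an illustration of the dependency ordering described above, the sketch below shows how two managed workloads might be labelled and annotated. The workload names `api` and `postgres` are hypothetical; the label and annotation keys are the documented defaults. Only the metadata is shown, and the rest of each manifest is omitted.

```yaml
# Hypothetical workloads: "api" depends on "postgres". Specs omitted for brevity.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  labels:
    gatekeeper.dev/scale-to-zero: "true"   # matches the default selector
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    gatekeeper.dev/scale-to-zero: "true"
  annotations:
    # Comma-separated workload names that must be up and ready before "api" is scaled up.
    gatekeeper.dev/depends-on: "postgres"
```

With this in place, a wake would be expected to bring `postgres` to ready first and only then scale `api`.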

```
        +------------------------- namespace -------------------------+
request |  Gatekeeper --proxy--> Service --> Pod(s)                   |
------> |     |                                                       |
        |     +- auth (optional): token in header/cookie, else 401    |
        |     +- idle > IDLE_TIMEOUT -> scale targets to 0            |
        |     +- request while asleep -> restore replicas, wait,      |
        |                                then proxy                   |
        +-------------------------------------------------------------+
```

Gatekeeper routes by `Host` header using a table you provide (`ROUTES_JSON`). The
awake/asleep state is held in memory (run a single replica) and seeded from the
cluster at startup; each workload's pre-sleep replica count is saved on an
annotation, so a restart recovers cleanly.

- Label the workloads you want managed (the default selector is opt-in): `kubectl label deploy/my-app gatekeeper.dev/scale-to-zero=true`
- Edit `deploy/` (namespace, the `gatekeeper-routes` ConfigMap, and any env), then apply: `kubectl apply -f deploy/`
- Point your Ingress / Gateway at the `gatekeeper` Service (port 80) for the hostnames in your routes table.
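
For the last step, a minimal Ingress might look like the sketch below. The hostname and the Ingress name are hypothetical, and your cluster may need an `ingressClassName` or annotations that are not shown; the backend is the `gatekeeper` Service on port 80 as described above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gatekeeper            # hypothetical name
spec:
  rules:
    - host: app.example.com   # hypothetical; should be a host in your routes table
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gatekeeper
                port:
                  number: 80
```
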
A complete, runnable example (a sample app + Gatekeeper + assertions for the full
auth/sleep/wake cycle) lives in `e2e/`. Run it against any local cluster with `./e2e/run.sh` (uses kube context `orbstack`; override with `KUBE_CONTEXT`).

All configuration is via environment variables.

| Env | Default | Purpose |
|---|---|---|
| `NAMESPACE` | (required) | Namespace Gatekeeper manages. Inject via the downward API. |
| `ROUTES_JSON` | (required) | `{"host":{"service":"svc","port":80}, ...}` host -> upstream map. |
| `PORT` | `8080` | Listen port. |
| `HEALTH_PATH` | `/healthz` | Unauthenticated health/probe path. |
| `LOG_LEVEL` | `info` | `debug` / `info` / `warn` / `error`. JSON logs to stdout. |
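
A sketch of how the two required variables could be set on the Gatekeeper container follows. The host `app.example.com` and Service name `my-app` are hypothetical. The manifests in `deploy/` keep the routes in the `gatekeeper-routes` ConfigMap; since its key layout is not shown here, this fragment inlines the value instead.

```yaml
# Fragment of the Gatekeeper container spec (illustrative values).
env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace   # downward API: the pod's own namespace
  - name: ROUTES_JSON
    # Hypothetical host and Service; one entry per hostname you route.
    value: '{"app.example.com":{"service":"my-app","port":80}}'
  - name: PORT
    value: "8080"
```
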

| Env | Default | Purpose |
|---|---|---|
| `TARGET_SELECTOR` | `gatekeeper.dev/scale-to-zero=true` | Label selector for managed Deployments/StatefulSets. Empty selects every workload in the namespace. |
| `SELF_NAME` | `gatekeeper` | Workload name Gatekeeper never scales (itself). |
| `WAKE_REPLICAS_ANNOTATION` | `gatekeeper.dev/wake-replicas` | Annotation storing the pre-sleep replica count. |
| `DEPENDS_ON_ANNOTATION` | `gatekeeper.dev/depends-on` | Annotation (comma-separated workload names) declaring a workload's dependencies. Wake happens in dependency order: a workload's dependencies are scaled up and ready before it is. Deps naming an unmanaged workload are ignored; a cycle falls back to waking all at once. |
| `IDLE_TIMEOUT` | `30m` | Idle duration before scaling to zero (Go duration). Set to `0` to disable scale-to-zero: the namespace is never auto-slept, but requests still wake one that is already asleep. |
| `IDLE_CHECK_INTERVAL` | `30s` | How often idleness is checked. |
| `WAKE_TIMEOUT` | `5m` | Backstop for how long a request is held while the namespace wakes (all managed workloads become ready) before giving up (503 + `Retry-After`). Generous so slow-but-healthy starts (large image pulls, cold nodes) aren't cut off; a wake that hits a wedged pod fails fast well before this. |

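
As a small example of these settings, the fragment below sketches the pure wake-on-request mode, with a longer wake backstop. The `10m` value is an arbitrary illustration, not a recommendation.

```yaml
# Fragment of the Gatekeeper container spec (illustrative values).
env:
  - name: IDLE_TIMEOUT
    value: "0"     # never auto-sleep; requests still wake a namespace that is already asleep
  - name: WAKE_TIMEOUT
    value: "10m"   # Go duration; hypothetical value for very slow image pulls
```
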
Most "it deployed but nothing works" cases come from one of these drifting out of sync:

- `HEALTH_PATH` must equal your readiness/liveness probe path. The probe hits this path on the pod IP; if Gatekeeper doesn't recognize it as the health path, the request falls through to host-routing, 404s (`no route for host: <pod-ip>:8080` in the logs), and the pod never goes Ready - so the Service has no endpoints.
- `TARGET_SELECTOR` must match the labels on the workloads you want scaled (and `SELF_NAME` must be Gatekeeper's own Deployment name so it never scales itself). If the selector matches nothing, idle scaling silently does nothing.

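
To make the first point concrete, here is a sketch of probes that stay in sync with the health path. It assumes the documented defaults (`/healthz` on port `8080`); if you change `HEALTH_PATH` or `PORT`, the probe fields would need the same change.

```yaml
# Fragment of the Gatekeeper container spec; values mirror the documented defaults.
env:
  - name: HEALTH_PATH
    value: /healthz
readinessProbe:
  httpGet:
    path: /healthz   # must equal HEALTH_PATH
    port: 8080       # must equal PORT
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
```
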
Authentication is off unless `AUTH_TOKEN` is set; without it, Gatekeeper is a plain
scale-to-zero proxy. When it is set, every request except those to the health and
callback paths must carry the token.

| Env | Default | Purpose |
|---|---|---|
| `AUTH_TOKEN` | (empty = auth off) | Shared secret required on every request. |
| `AUTH_HEADER` | `X-Gatekeeper-Token` | Header carrying the token. |
| `AUTH_COOKIE` | `gatekeeper_session` | Cookie carrying the token. |
| `LOGIN_URL` | (empty) | If set, unauthenticated browsers are redirected here as `?redirect=<original-url>`; if empty, they get 401. |
| `AUTH_CALLBACK_PATH` | `/_gatekeeper/auth` | Page that reads `?token=&next=`, sets the cookie, and redirects to `next`. |
| `COOKIE_DOMAIN` | (empty) | Scope the cookie to `.<domain>` (shared across subdomains); empty = host-only. |

Auth modes:

- No auth - leave `AUTH_TOKEN` unset.
- Static token - set `AUTH_TOKEN` (and optionally `AUTH_HEADER`). Callers send the header; missing/invalid gets 401. Good for service-to-service traffic or an upstream gateway that injects the header.
- Browser login - also set `LOGIN_URL` (and usually `COOKIE_DOMAIN`). Unauthenticated browsers are sent to your login, which authenticates the user and then redirects to `https://<host>{AUTH_CALLBACK_PATH}?token=<token>&next=<original>` to drop the cookie. Subsequent requests carry the cookie.

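
A browser-login setup could be wired roughly as below. The Secret name and key, the login URL, and the cookie domain are all hypothetical placeholders; reading the token from a Secret is one reasonable option, not something the shipped manifests are known to do.

```yaml
# Fragment of the Gatekeeper container spec (illustrative values).
env:
  - name: AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: gatekeeper-auth   # hypothetical Secret
        key: token              # hypothetical key
  - name: LOGIN_URL
    value: https://login.example.com/   # hypothetical external login
  - name: COOKIE_DOMAIN
    value: example.com                  # cookie shared across *.example.com
```

For the static-token mode, the same fragment without `LOGIN_URL` and `COOKIE_DOMAIN` should be enough.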
Gatekeeper runs as its own ServiceAccount and needs a namespaced Role:

```yaml
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list"]
```

`patch` on the workloads sets `spec.replicas` and the wake annotation in one merge
patch; their `status.readyReplicas` is polled to know when the namespace is ready.
`pods` are listed on wake to fail fast when a managed pod is wedged (bad image,
crash loop) instead of waiting out `WAKE_TIMEOUT`. `deploy/` contains the full set
(ServiceAccount, Role, RoleBinding).

API-server egress (CNIs that enforce NetworkPolicy: AWS VPC CNI `aws-node`, Calico, Cilium, ...). Under a default-deny egress policy, Gatekeeper's scale calls to the Kubernetes API server are dropped - you'll see `dial tcp <apiserver-ip>:443: i/o timeout` - until you allow it. Apply `deploy/networkpolicy-apiserver-egress.yaml`, which permits egress to `0.0.0.0/0:443,6443` for the Gatekeeper pod only. Two subtleties bite here:

- A broad egress policy that `ipBlock`s `0.0.0.0/0` with an `except` for RFC1918 ranges still blocks the API server, since its ClusterIP/ENI lives in those ranges.
- Worse: if that `except`-bearing policy also selects the Gatekeeper pod, the AWS VPC CNI agent enforces each `except` as a longest-prefix-match deny that shadows this policy's `0.0.0.0/0` allow (the `/12` deny beats the `/0` allow). Adding the allow is then not enough - keep the `except`-bearing policy off the Gatekeeper pod (e.g. `podSelector: { matchExpressions: [{ key: app, operator: NotIn, values: [gatekeeper] }] }`) or make the API-server allow more specific than the `except` (e.g. the service CIDR `/16`).

On Cilium, a plain `ipBlock` may not match the API-server identity - use a `CiliumNetworkPolicy` with `toEntities: [kube-apiserver]` instead.
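
A minimal sketch of such a Cilium policy is shown below. It assumes the Gatekeeper pod carries the label `app: gatekeeper` (as the `podSelector` example above implies); the policy name is hypothetical, and it is not part of `deploy/` as far as this document states.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: gatekeeper-apiserver-egress   # hypothetical name
spec:
  endpointSelector:
    matchLabels:
      app: gatekeeper                 # assumed pod label
  egress:
    - toEntities:
        - kube-apiserver              # matches the API server by identity, not by IP
```
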

```sh
make all       # gofmt check + vet + test + build
make docker    # build the container image
./e2e/run.sh   # end-to-end test on a local cluster (OrbStack / kind / docker-desktop)
```

Go 1.26+. The module is `github.com/autonoma-ai/gatekeeper`.

MIT - see LICENSE.