security: gateway serves plaintext Mongo wire protocol with tls.gateway.mode=Disabled, contradicting docs #356

@xgerman

Summary

spec.tls.gateway.mode: Disabled causes the DocumentDB gateway sidecar to serve the Mongo wire protocol in plaintext, contradicting the public docs which promise that TLS is always on. This is a silent footgun — clients who trust status.connectionString (which always contains tls=true) are safe, but any attacker with on-cluster pod-network access can bypass TLS entirely by connecting with tls=false.

Recommend removing Disabled from the GatewayTLS.Mode enum so unencrypted traffic is impossible by construction.

Evidence of the contradiction

Docs promise encryption on all modes: docs/operator-public-documentation/preview/configuration/tls.md:43:

Disabled mode means the operator does not manage TLS certificates. However, the gateway still encrypts all connections using an internally generated self-signed certificate. Clients must connect with tls=true&tlsAllowInvalidCertificates=true.

E2E test proves plaintext is actually served: test/e2e/tests/tls/tls_disabled_test.go:17-19, 41-54:

// The gateway still listens but accepts plain-text mongo wire
// protocol. This spec verifies the happy-path: a freshly-created
// DocumentDB with TLS disabled accepts an unencrypted connection
// from the mongo driver.
...
client, err := mongohelper.NewClient(connectCtx, mongohelper.ClientOptions{
    Host:     host,
    Port:     port,
    User:     tlsCredentialUser,
    Password: tlsCredentialPassword,
    TLS:      false,
})
...
Eventually(func() error {
    return mongohelper.Ping(connectCtx, client)
}, ...).Should(Succeed(), "plaintext ping should succeed when TLS is disabled")

Ping succeeds, so the gateway is genuinely listening without TLS on port 10260.

Root cause

operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go:188-232:

  • When the gatewayTLSSecret parameter is absent, the sidecar is started without the --cert-path/--key-file CLI args and without the CERT_PATH/KEY_FILE/TLS_CERT_DIR env vars. Mode=Disabled produces exactly this state: operator/src/internal/controller/certificate_controller.go:64-74 short-circuits the cert reconciler and leaves status.TLS.Ready=false.
  • The upstream gateway binary then falls back to plaintext, not to a self-generated self-signed certificate. The docs' promise is wrong.
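
The shape of that branch can be sketched roughly as follows. This is not the code in lifecycle.go: gatewayArgs, gatewayArgsFailClosed, the certDir mount path, and the secret name are all hypothetical. Only the --cert-path, --key-file, and --pg-port flags come from this issue, and the tls.crt/tls.key file names are assumed from the usual kubernetes.io/tls Secret layout.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// gatewayArgs mimics the behavior described above: TLS flags are only
// added when a secret name is present. (Hypothetical helper.)
func gatewayArgs(gatewayTLSSecret, certDir string) []string {
	args := []string{"--pg-port", "5432"}
	if gatewayTLSSecret == "" {
		// No secret, so no TLS flags. Per this issue, the gateway then
		// serves plaintext.
		return args
	}
	return append(args,
		"--cert-path", filepath.Join(certDir, "tls.crt"),
		"--key-file", filepath.Join(certDir, "tls.key"),
	)
}

// gatewayArgsFailClosed is one possible alternative: refuse to build a
// sidecar command line that would result in plaintext.
func gatewayArgsFailClosed(gatewayTLSSecret, certDir string) ([]string, error) {
	if gatewayTLSSecret == "" {
		return nil, errors.New("refusing to start gateway without a TLS secret")
	}
	return gatewayArgs(gatewayTLSSecret, certDir), nil
}

func main() {
	fmt.Println(gatewayArgs("", "/tls"))
	fmt.Println(gatewayArgs("documentdb-gateway-tls", "/tls"))
	_, err := gatewayArgsFailClosed("", "/tls")
	fmt.Println(err)
}
```

The fail-closed variant is only meant to show what "impossible by construction" could look like at the injector level; the actual fix may well live elsewhere.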

Why this matters

  1. Defense in depth violated. Mongo credentials travel on the pod network in the clear. Any compromised workload in the same cluster / same VNet can trivially harvest credentials.
  2. Docs encourage the belief that TLS is always on, so users selecting Mode=Disabled for "dev" likely think they're getting a self-signed cert they can skip-verify. They're actually getting plaintext.
  3. status.connectionString lies. It contains tls=true unconditionally (operator/src/internal/utils/util.go:423) regardless of Mode=Disabled. Clients that paste the published string use TLS; clients that take their settings from the CR spec do not. The two contracts diverge silently.
  4. SCRAM-SHA-256 does not make plaintext safe. A passive observer of the exchange captures the salt, iteration count, nonces, and client proof, which is everything needed to test password guesses offline against a modern wordlist.
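
To make point 4 concrete, here is a small sketch of the SCRAM-SHA-256 proof computation from RFC 5802 and how an observer could check guesses against a captured proof. All inputs in main are made up, the helper names are mine, and SASLprep normalization of the password is skipped for brevity.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

func hmacSHA256(key, msg []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(msg)
	return m.Sum(nil)
}

// hi is Hi() from RFC 5802: PBKDF2-HMAC-SHA256 with one output block.
func hi(password, salt []byte, iterations int) []byte {
	u := hmacSHA256(password, append(append([]byte{}, salt...), 0, 0, 0, 1))
	out := append([]byte{}, u...)
	for i := 1; i < iterations; i++ {
		u = hmacSHA256(password, u)
		for j := range out {
			out[j] ^= u[j]
		}
	}
	return out
}

// clientProof computes ClientKey XOR HMAC(H(ClientKey), AuthMessage).
func clientProof(password string, salt []byte, iterations int, authMessage []byte) []byte {
	salted := hi([]byte(password), salt, iterations)
	clientKey := hmacSHA256(salted, []byte("Client Key"))
	storedKey := sha256.Sum256(clientKey)
	sig := hmacSHA256(storedKey[:], authMessage)
	proof := make([]byte, len(clientKey))
	for i := range proof {
		proof[i] = clientKey[i] ^ sig[i]
	}
	return proof
}

// guessPassword returns the first candidate whose proof matches the one
// seen on the wire. No interaction with the server is needed.
func guessPassword(candidates []string, salt []byte, iterations int, authMessage, observed []byte) (string, bool) {
	for _, c := range candidates {
		if subtle.ConstantTimeCompare(clientProof(c, salt, iterations, authMessage), observed) == 1 {
			return c, true
		}
	}
	return "", false
}

func main() {
	// Illustrative values only; on a real wire these come from the
	// plaintext SCRAM messages.
	salt := []byte("example-salt")
	authMessage := []byte("client-first-bare,server-first,client-final-without-proof")
	observed := clientProof("hunter2", salt, 4096, authMessage)

	pw, ok := guessPassword([]string{"password", "letmein", "hunter2"}, salt, 4096, authMessage, observed)
	fmt.Println(pw, ok)
}
```

The iteration count slows each guess down but does not prevent the attack, which is why the exchange should not be observable in the first place.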

Proposed fix

Option A (recommended): remove unencrypted traffic as a possibility.

  1. Drop Disabled from the GatewayTLS.Mode enum in operator/src/api/preview/documentdb_types.go:237 — change validation to Enum=SelfSigned;CertManager;Provided.
  2. Default an unset Mode to SelfSigned so existing CRs that omit the field keep working (and keep encryption on).
  3. Update the cert controller's "empty or Disabled" branch in operator/src/internal/controller/certificate_controller.go:64 to treat empty as SelfSigned.
  4. Remove test/e2e/tests/tls/tls_disabled_test.go and test/e2e/manifests/mixins/tls_disabled.yaml.template.
  5. Remove the "Disabled" tab from docs/operator-public-documentation/preview/configuration/tls.md.
  6. Note in CHANGELOG as a breaking change for pre-GA; migration path is "remove mode: Disabled (or the whole tls: block) to get SelfSigned behavior."
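
Steps 1-3 could look roughly like the sketch below. The struct is trimmed to the one field under discussion, and effectiveMode and the constant names are hypothetical, not the identifiers in documentdb_types.go or certificate_controller.go. The marker syntax is assumed to follow the usual kubebuilder conventions.

```go
package main

import "fmt"

// Mode values that would remain after dropping Disabled.
const (
	ModeSelfSigned  = "SelfSigned"
	ModeCertManager = "CertManager"
	ModeProvided    = "Provided"
)

// GatewayTLS is a trimmed sketch of the API type; only Mode is shown.
type GatewayTLS struct {
	// Disabled is no longer an accepted value, and an omitted field is
	// defaulted by the API server (assumed kubebuilder markers).
	// +kubebuilder:validation:Enum=SelfSigned;CertManager;Provided
	// +kubebuilder:default=SelfSigned
	Mode string `json:"mode,omitempty"`
}

// effectiveMode is what the cert controller's former "empty or Disabled"
// branch might call: an absent block or empty Mode means SelfSigned.
func effectiveMode(tls *GatewayTLS) string {
	if tls == nil || tls.Mode == "" {
		return ModeSelfSigned
	}
	return tls.Mode
}

func main() {
	fmt.Println(effectiveMode(nil))
	fmt.Println(effectiveMode(&GatewayTLS{}))
	fmt.Println(effectiveMode(&GatewayTLS{Mode: ModeProvided}))
}
```

Keeping the empty-string handling in the controller as well as in the CRD default covers objects that were stored before the default marker existed.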

Option B (fallback if Disabled must remain for some out-of-tree user): make it mean what the docs claim.

  1. Have the sidecar injector generate a self-signed cert in-cluster when Mode=Disabled (or omitted) and mount it. This is effectively what Mode=SelfSigned already does, so at that point Disabled and SelfSigned are synonyms and Option A is cleaner.

Option A is strictly better because it removes the attack surface instead of relying on correct configuration.

Migration impact

  • API change pre-GA (preview apiVersion), so the usual "no breaking changes" bar doesn't apply.
  • Users with mode: Disabled get a behavior change (plaintext → self-signed TLS). Since the documented contract was already "TLS is always on", this aligns behavior with the documented contract.

Out of scope

  • The same audit should confirm whether the Postgres wire protocol between the gateway sidecar and the Postgres container inside the pod is encrypted. The gateway is launched with --pg-port 5432 against 127.0.0.1, so this is same-pod IPC over loopback: lower risk, but worth a follow-up issue.

Companion doc bug

Independent of this fix, docs/operator-public-documentation/preview/configuration/tls.md:43 must be corrected to match reality. If Option A is taken, the section is removed entirely.

Metadata

Labels: bug
Status: In progress