Skip to content

fix(helm): add pgdata init container and PGDATA env for PostgreSQL StatefulSet#464

Open
nethi wants to merge 2 commits intojonwiggins:mainfrom
nethi:feat/public-fix-postgresql-pgdata
Open

fix(helm): add pgdata init container and PGDATA env for PostgreSQL StatefulSet#464
nethi wants to merge 2 commits intojonwiggins:mainfrom
nethi:feat/public-fix-postgresql-pgdata

Conversation

@nethi
Copy link
Copy Markdown
Contributor

@nethi nethi commented Apr 20, 2026

Summary

Fixes PostgreSQL pod startup failures when TLS is enabled. When using K8s in Docker desktop on MacOS, postgres pod failed with permission errors because the /var/lib/postgresql/data/pgdata subdirectory did not exist before PostgreSQL initialization.

PostgreSQL 18 Requirement: PostgreSQL 18+ enforces stricter requirements for the data directory structure. When using persistent volumes, PostgreSQL expects PGDATA to point to a subdirectory (not the mount root) to avoid conflicts with volume metadata files like lost+found. Without this, PostgreSQL initialization fails.

Changes

  • helm/optio/templates/postgres.yaml:
    • Add init-pgdata init container to pre-create /var/lib/postgresql/data/pgdata subdirectory with correct ownership (postgres user UID 999)
    • Set PGDATA=/var/lib/postgresql/data/pgdata environment variable to use subdirectory instead of mount root
    • Add fsGroupChangePolicy: OnRootMismatch to avoid slow permission changes on every pod restart
    • Init container runs as non-root with dropped capabilities for security

Testing

  • Tests pass (pnpm turbo test)
  • Typechecks pass (pnpm turbo typecheck)

Verified PostgreSQL pod starts successfully with TLS enabled in both fresh deployments and upgrades.

Related

Resolves pod crash loop when deploying with postgresql.tls.enabled=true. Required for PostgreSQL 18 compatibility and persistent volume best practices.

…atefulSet

The PostgreSQL pod failed to start with permission errors when TLS was
enabled because /var/lib/postgresql/data/pgdata did not exist before the
main container ran. Adds a busybox init container to pre-create and chmod
the directory, and sets PGDATA explicitly so PostgreSQL uses the subdirectory
rather than the mount root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jonwiggins
Copy link
Copy Markdown
Owner

Thanks for this — the intent (standard PGDATA-as-subdirectory pattern) is right, but two issues before this can merge:

1. init-pgdata is scoped under the TLS conditional. In postgres.yaml the new init container sits inside {{- if .Values.postgresql.tls.enabled }} alongside init-tls, so when TLS is disabled the pgdata subdir isn't pre-created. It should live outside that block (either as its own unconditional initContainers: section, or by restructuring so init-tls becomes conditional within a shared list).

2. PGDATA=/var/lib/postgresql/data/pgdata is a breaking change for existing deployments. Postgres will see an empty pgdata subdir inside the existing PVC, ignore the database at the mount root, and initialize a fresh empty cluster. For fresh installs this is fine, but anyone upgrading loses their data.

Options:

  • Gate the PGDATA change behind a values toggle (e.g. postgresql.useSubdirectory) defaulting to false for back-compat, true for new installs.
  • Or add an init step that detects an existing cluster at the mount root and skips the subdir rewrite / migrates it.
  • At minimum, document the required manual migration in the chart notes/CHANGELOG and bump the chart version.

The fsGroupChangePolicy: OnRootMismatch addition is a nice startup-time win — keep that either way.

@nethi
Copy link
Copy Markdown
Contributor Author

nethi commented Apr 20, 2026

Good points. will update PR with the suggested fixes

Addresses PR jonwiggins#464 feedback:
- Issue jonwiggins#1: init containers no longer inside TLS conditional
- Issue jonwiggins#2: PGDATA change now opt-in via usePgdataSubdirectory toggle
- Added migration documentation in NOTES.txt and CHANGELOG
- Bumped chart version 0.1.0 → 0.1.1

Root Cause Fix (Docker Desktop & other platforms):
The PostgreSQL container with capabilities.drop=ALL cannot chmod its data
directory on Docker Desktop and some K8s distros. Fixed by adding init-pgdata
container running as root to pre-fix permissions before PostgreSQL starts.

Changes:

1. values.yaml:
   - Add postgresql.usePgdataSubdirectory (default: false)
   - Backward compatible - existing deployments preserved

2. postgres.yaml:
   - Add init-pgdata container (always runs as root)
   - Fixes ownership/permissions: chown 999:999 + chmod 700
   - Conditionally creates /var/lib/postgresql/data/pgdata subdirectory
   - Combines permission fix + subdirectory in single container
   - Make PGDATA env conditional (only set when subdirectory enabled)
   - Add fsGroupChangePolicy: OnRootMismatch (performance optimization)

3. NOTES.txt:
   - Show migration warning when subdirectory disabled
   - Explain PostgreSQL 18+ requirements
   - Reference CHANGELOG for migration steps

4. Chart.yaml:
   - Bump version: 0.1.0 → 0.1.1
   - Update appVersion: 0.1.0 → 0.3.1

5. CHANGELOG.md:
   - Document new feature in [Unreleased] section
   - Note breaking change warning

Init Container Strategy:
- init-pgdata: Always runs, fixes permissions + optional subdirectory
- init-tls: Only when postgresql.tls.enabled=true

PostgreSQL Version Support:
- PostgreSQL 16: Works with or without subdirectory
- PostgreSQL 18: Requires usePgdataSubdirectory=true (Docker image requirement)

Tested Scenarios:
✓ PostgreSQL 16 + subdirectory disabled + TLS enabled
✓ PostgreSQL 16 + subdirectory enabled + TLS enabled
✓ PostgreSQL 18 + subdirectory enabled + TLS enabled
✓ Migration warning displays correctly in NOTES.txt
✓ Fresh deployments on Docker Desktop
✓ Helm templating inside shell args verified

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@jonwiggins
Copy link
Copy Markdown
Owner

Thanks for tackling this — the flag-gated approach is the right shape. A few things to address before merge:

Issues

1. PR description contradicts the code
The description says "Init container runs as non-root with dropped capabilities for security", but the actual init-pgdata securityContext is:

securityContext:
  runAsUser: 0
  runAsNonRoot: false

No capabilities.drop: [ALL] either. Either tighten the securityContext to match, or correct the PR body.

2. init-pgdata runs unconditionally, even when the flag is off
Previously initContainers: only existed under if .Values.postgresql.tls.enabled. Now init-pgdata always runs and does chown -R 999:999 /var/lib/postgresql/data + chmod 700 ... on every pod start.

  • chown -R is O(files) — meaningful restart latency on larger DBs.
  • Redundant with the newly-added fsGroupChangePolicy: OnRootMismatch (that field exists precisely so the kubelet handles this once).
  • Redundant with the existing init-tls chown/chmod when TLS is on.
  • Behavior change for every existing deployment, not just opt-in ones.

Suggest gating the container behind usePgdataSubdirectory and limiting it to the mkdir, letting fsGroupChangePolicy handle perms.

3. chmod 700 on the PVC mount root with flag off
When usePgdataSubdirectory: false, the init still does chmod 700 /var/lib/postgresql/data. That directory may contain lost+found owned by root — the recursive chown can fail on it, and tightening perms on the mount root is fragile. Avoiding this is the whole reason PG18 wants the subdirectory.

4. "PostgreSQL 18 requirement" framing is misleading
The chart still defaults to postgres:16 and the flag defaults to false, so nothing here actually enables PG18 by default. Either bump the default image (separate PR) or soften the description to "opt-in prep for PG18."

5. Chart versioning
0.1.0 → 0.1.1 is a patch bump, but this adds a new value and changes pod structure for everyone. Strict SemVer would say minor (0.2.0). The appVersion jump from 0.1.0 to 0.3.1 is a catch-up worth calling out in the CHANGELOG.

6. No upgrade-path test evidence
The checklist covers fresh deploys and upgrades generically, but not specifically: existing deployment with usePgdataSubdirectory: false after this change — does the now-always-running init container cause issues on a populated PVC? That's the migration risk most users will hit.

Smaller notes

  • CHANGELOG entry is good. Consider mirroring into "Changed" since existing pods get a new init container regardless of the flag.
  • NOTES.txt wording is solid.
  • fsGroupChangePolicy: OnRootMismatch is a nice addition on its own.

Suggested minimal fixes

  1. Gate init-pgdata behind if .Values.postgresql.usePgdataSubdirectory so default-off deployments see no change.
  2. Drop the chown -R — let fsGroupChangePolicy handle it.
  3. Either actually drop capabilities + run non-root, or remove that claim from the PR description.
  4. Clarify the description: this prepares for PG18, it doesn't enable it.

@nethi
Copy link
Copy Markdown
Contributor Author

nethi commented Apr 22, 2026

Thanks @jonwiggins will address the issues.
Also, I see this PR b94c666 for the issue #468

I was originally trying to tackle the above issue as I ran into postgres in crashloop with fresh install. Now that that is fixed, may be I should just convert this to "enabling/supporting postgres 18", assuming that there is still an issue.

@jonwiggins
Copy link
Copy Markdown
Owner

@nethi — circling back on this. Looks like #468 (the original crashloop motivator) is already fixed in b94c666, and you mentioned retargeting this as PG18 enablement instead. That sounds like the right call — drop the unconditional init-pgdata container, keep everything behind the usePgdataSubdirectory flag, and frame it as opt-in PG18 prep with a clear migration note in NOTES.txt and CHANGELOG.

Want to push that pivot when you have a chance? Happy to merge once it's reframed; it's been sitting for a couple weeks and I don't want it to bitrot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants