fix: reap HEALTHCHECK zombies by running tini as PID 1 #4251
Closed
JamBalaya56562 wants to merge 1 commit into Dokploy:canary from
The Dockerfile's CMD uses `exec pnpm start`, which makes Node.js PID 1 inside the container. Node.js only reaps children it spawned via `child_process`; it does not install a generic waitpid reaper, so processes spawned into the container by other means (notably the `HEALTHCHECK` curl firing every 10s) accumulate as `<defunct>` zombies under PID 1. Over days, hundreds of `[curl] <defunct>` build up. Install tini and set it as ENTRYPOINT so it reaps any orphaned child regardless of origin while still forwarding signals to Node.
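The reaping gap described above can be illustrated outside Node.js. The following is a minimal Python sketch, assuming a Linux host with `/proc` mounted; it is not code from this PR, and `proc_state`, `reap_all`, and `demo` are hypothetical names. It forks a child that exits at once, observes it in the zombie (`Z`) state while the parent has not waited on it, then clears it with a non-blocking `waitpid(-1, WNOHANG)` loop, which is roughly the job an init such as tini takes on for every child of PID 1.

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def reap_all() -> list[int]:
    """Reap every already-exited child without blocking; return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def demo() -> tuple[int, str, list[int]]:
    """Create one zombie, record its state, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without cleanup
    # Parent: poll until the exited child shows up as a zombie (or give up).
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state = proc_state(pid)
    return pid, state, reap_all()


if __name__ == "__main__":
    child_pid, state_before, reaped = demo()
    print(f"child {child_pid} state before reaping: {state_before}")
    print(f"reaped PIDs: {reaped}")
```

A parent that never runs a loop like `reap_all` for children it did not start itself leaves them in the `Z` state indefinitely, which matches the `[curl] <defunct>` buildup reported here.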
Bug report
On a running Dokploy host (observed on v0.29.0), `[curl] <defunct>` zombie processes accumulate on the host and share the same parent: the `pnpm start` Node.js process running as PID 1 inside the `dokploy/dokploy` container. Over several days the zombie count grows into the hundreds, eventually threatening the per-PID-namespace process limit.
Root cause
1. `Dockerfile` declares a `HEALTHCHECK` that runs `curl -fs http://localhost:3000/api/trpc/settings.health` every 10 seconds.
2. `CMD` is `["sh", "-c", "pnpm run wait-for-postgres && exec pnpm start"]`. The `exec` replaces `sh` with Node.js, so Node.js becomes PID 1 inside the container.
3. When a process parented to PID 1 by the container runtime exits, the kernel sends `SIGCHLD` to PID 1.
4. Node.js only reaps children it spawned via `child_process.*`. It does not install a generic `waitpid(-1, …, WNOHANG)` reaper, so any process parented to it by the container runtime stays in the zombie state until PID 1 exits.
This is the well-known "Node.js as PID 1" containerization pitfall, not a bug in application code. (Confirmed by grepping the repository: no `child_process` call in any Node file invokes `curl`; the only runtime `curl` under PID 1 comes from the `HEALTHCHECK` directive.)
Fix
Run the container under an init that correctly reaps orphaned children. This PR installs `tini` (already packaged in Debian, ~28 KB) and sets it as the container `ENTRYPOINT`:
`tini` handles `SIGCHLD` and reaps any child process regardless of which component inside the container spawned it, while still forwarding signals (`SIGTERM`/`SIGINT`) to the Node.js process. This is the same mechanism Docker uses internally for `docker run --init`.
The `HEALTHCHECK` directive itself is preserved unchanged. It is still useful as a Swarm rolling-update readiness gate (it waits for the DB-backed `settings.health` tRPC endpoint, not just process start) and for `docker ps` operator visibility. With `tini` in place, the `curl` processes it spawns are now reaped normally.
Only the main `Dockerfile` is modified. `Dockerfile.cloud`, `Dockerfile.schedule`, and `Dockerfile.server` do not declare a `HEALTHCHECK` and do not reproduce the issue today, so they are out of scope for this PR.
Testing
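Zombie counts of the kind reported in this section can be gathered by scanning `/proc`. The following is a hypothetical Python helper, assuming a Linux host; it is not the tooling used for this PR, and `parse_stat` and `zombies_by_parent` are illustrative names. It groups processes in the `Z` state by parent PID and command name, so a buildup such as many `curl` entries under a single parent stands out.

```python
import os
from collections import Counter


def parse_stat(text: str) -> tuple[str, str, int]:
    """Split the contents of /proc/<pid>/stat into (comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the last ")" rather than on whitespace.
    head, tail = text.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    fields = tail.split()
    return comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root: str = "/proc") -> Counter:
    """Count zombie processes, keyed by (parent PID, command name)."""
    counts: Counter = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process went away while we were scanning
        if state == "Z":
            counts[(ppid, comm)] += 1
    return counts


if __name__ == "__main__":
    for (ppid, comm), n in zombies_by_parent().most_common():
        print(f"{n:5d} zombie(s) named {comm!r} under parent PID {ppid}")
```

Run on the host, the parent PID shown is the host-side PID of the container's PID 1; run inside the container's PID namespace, it would be 1.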
Built the image locally from this branch (`docker build -t dokploy/dokploy:tini-test -f Dockerfile .`) and ran three verifications:
1. `tini` is installed and is PID 1
2. Children that would otherwise zombify are reaped
20 short-lived processes were spawned as children of PID 1 via `docker exec -d`. After 3 seconds:
3. HEALTHCHECK firing for 65 seconds leaves no zombies
With the image's baked-in `HEALTHCHECK` firing (no Postgres, so the check intentionally fails; we only care about the `curl` process lifecycle, not the health result):
Six `HEALTHCHECK` cycles completed, every `curl` process was reaped immediately, and `docker stop` terminated the container promptly (signal forwarding works).
Greptile Summary
This PR fixes zombie process accumulation caused by Node.js running as PID 1 inside the `dokploy/dokploy` container. It installs `tini` via `apt-get` and sets it as the container `ENTRYPOINT`, which correctly reaps all orphaned child processes (including those spawned by `HEALTHCHECK`) while forwarding signals to Node.js. The change is minimal, well-reasoned, and follows standard container best practices.
Confidence Score: 5/5
Safe to merge — the change is a single-file, minimal fix using a well-established init solution.
No P0 or P1 issues found. The tini installation path (/usr/bin/tini) is correct for Debian-based images, the -- sentinel correctly separates tini options from the CMD, and signal forwarding to Node.js works as expected.
No files require special attention.
Reviews (1): Last reviewed commit: "fix: reap HEALTHCHECK zombies by running..."