fix: reap HEALTHCHECK zombies by running tini as PID 1 #4251

Closed
JamBalaya56562 wants to merge 1 commit into Dokploy:canary from JamBalaya56562:fix/tini-pid1-zombie-reap

JamBalaya56562 (Contributor) commented Apr 19, 2026

Bug report

On a running Dokploy host (observed on v0.29.0), [curl] <defunct> zombie processes accumulate on the host and share the same parent: the pnpm start Node.js process running as PID 1 inside the dokploy/dokploy container. Over several days the zombie count grows into the hundreds, eventually threatening the per-PID-namespace process limit.

$ ps -ef | grep defunct | head
root   12345  1234  0 ...  [curl] <defunct>
root   12346  1234  0 ...  [curl] <defunct>
...
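
To confirm that the zombies really share one parent, the process table can be grouped by parent PID. The following is a minimal sketch rather than part of this PR: the helper name `zombies_by_parent` is hypothetical, and it assumes its input is a list of "PPID STAT" pairs such as procps `ps -eo ppid=,stat=` produces.

```shell
# Hypothetical helper: read "PPID STAT" pairs on stdin and print
# "<ppid> <zombie count>" for every parent that has defunct children.
zombies_by_parent() {
  awk '$2 ~ /^Z/ { count[$1]++ }
       END { for (ppid in count) print ppid, count[ppid] }'
}
```

On a live host it could be fed with `ps -eo ppid=,stat= | zombies_by_parent`; a single parent PID with a steadily growing count would match the symptom described above.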

Root cause

  1. The Dockerfile declares a HEALTHCHECK that runs curl -fs http://localhost:3000/api/trpc/settings.health every 10 seconds.
  2. The final CMD is ["sh", "-c", "pnpm run wait-for-postgres && exec pnpm start"]. The exec replaces sh with Node.js, so Node.js becomes PID 1 inside the container.
  3. The HEALTHCHECK command runs inside the container's PID namespace. Any of its processes that become orphaned are reparented to PID 1, which then receives SIGCHLD when they exit and is responsible for reaping them.
  4. Node.js only reaps children it spawned itself via child_process.*. It does not install a generic waitpid(-1, …, WNOHANG) reaper, so any process parented to it by the container runtime stays in the zombie state until PID 1 exits.

This is the well-known "Node.js as PID 1" containerization pitfall — not a bug in application code. (Confirmed by grepping the repository: no child_process call in any Node file invokes curl; the only runtime curl under PID 1 comes from the HEALTHCHECK directive.)
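
The underlying mechanism can be reproduced without Docker or Node.js. The sketch below is an illustration under stated assumptions, not the reproduction used for this report: it is Linux-only (it reads `/proc`), the function name `demo_unreaped_child` is hypothetical, and `sleep` stands in for any parent process that never calls `wait()`.

```shell
# Hypothetical demo (Linux only, reads /proc): create a parent that never
# calls wait() and print the state of each of its children.
demo_unreaped_child() {
  # The inner shell forks a child that exits at once, then exec replaces
  # the shell with `sleep`, which inherits the child but never reaps it.
  sh -c ': & exec sleep 5' &
  parent=$!
  sleep 1  # give the child time to exit

  # Print the State field of every process whose PPid is our parent.
  for status in /proc/[0-9]*/status; do
    awk -v parent="$parent" '
      /^State:/ { state = $2 }
      /^PPid:/  { ppid = $2 }
      END       { if (ppid == parent) print state }
    ' "$status" 2>/dev/null
  done

  # Clean up the non-reaping parent.
  kill "$parent" 2>/dev/null
  wait "$parent" 2>/dev/null
}
```

Calling `demo_unreaped_child` on a typical Linux system is expected to print `Z`: the child has exited but stays in the process table because its parent never collects its exit status, which is the same situation as the curl processes under Node.js.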

Fix

Run the container under an init that correctly reaps orphaned children. This PR installs tini (already packaged in Debian, ~28 KB) and sets it as the container ENTRYPOINT:

RUN apt-get update && apt-get install -y tini curl unzip zip \
    apache2-utils iproute2 rsync git-lfs && \
    git lfs install && rm -rf /var/lib/apt/lists/*
...
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["sh", "-c", "pnpm run wait-for-postgres && exec pnpm start"]

tini handles SIGCHLD and reaps any child process regardless of which component inside the container spawned it, while still forwarding signals (SIGTERM/SIGINT) to the Node.js process. This is the same mechanism Docker uses internally for docker run --init.

The HEALTHCHECK directive itself is preserved unchanged — it is still useful as a Swarm rolling-update readiness gate (it waits for the DB-backed settings.health tRPC endpoint, not just process start) and for docker ps operator visibility. With tini in place the curl processes it spawns are now reaped normally.

Only the main Dockerfile is modified. Dockerfile.cloud, Dockerfile.schedule, and Dockerfile.server do not declare a HEALTHCHECK and do not reproduce the issue today, so they are out of scope for this PR.

Testing

Built the image locally from this branch (docker build -t dokploy/dokploy:tini-test -f Dockerfile .) and ran three verifications:

1. tini is installed and is PID 1

$ docker run --rm dokploy/dokploy:tini-test /usr/bin/tini --version
tini version 0.19.0

$ CID=$(docker run -d --rm dokploy/dokploy:tini-test sleep infinity)
$ docker exec "$CID" ps -o pid,ppid,comm -p 1
  PID  PPID COMMAND
    1     0 tini

2. Children that would otherwise zombify are reaped

20 short-lived processes were spawned as children of PID 1 via docker exec -d. After 3 seconds:

$ docker exec "$CID" sh -c 'ps -eo stat,cmd | awk "\$1 ~ /^Z/" | wc -l'
0

$ docker exec "$CID" ps -eo pid,ppid,stat,cmd
  PID  PPID STAT CMD
    1     0 Ss   /usr/bin/tini -- sleep infinity
    7     1 S    sleep infinity
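
The exact command used to spawn the short-lived processes is not shown above. One way such a step could be scripted is sketched here; the helper name `spawn_short_lived` and the `DOCKER` override variable are assumptions made for illustration, and only the `docker exec -d` call comes from the description.

```shell
# Hypothetical helper, not the exact command used in this PR: start N
# short-lived detached processes in a container. DOCKER can be overridden
# (for example DOCKER=echo) to do a dry run without a Docker daemon.
spawn_short_lived() {
  cid=$1
  count=${2:-20}
  i=0
  while [ "$i" -lt "$count" ]; do
    "${DOCKER:-docker}" exec -d "$cid" true
    i=$((i + 1))
  done
}
```

After `spawn_short_lived "$CID" 20`, rerunning the zombie count shown above should report whether anything was left unreaped.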

3. HEALTHCHECK firing for 65 seconds leaves no zombies

With the image's baked-in HEALTHCHECK firing (no Postgres, so the check intentionally fails — we only care about the curl process lifecycle, not the health result):

$ docker inspect --format '{{.State.Health.Status}} streak={{.State.Health.FailingStreak}}' "$CID"
starting streak=6

$ docker exec "$CID" sh -c 'ps -eo stat,cmd | awk "\$1 ~ /^Z/" | wc -l'
0

$ docker exec "$CID" ps -eo pid,ppid,stat,cmd
  PID  PPID STAT CMD
    1     0 Ss   /usr/bin/tini -- sleep infinity
    7     1 S    sleep infinity

Six HEALTHCHECK cycles completed, every curl process was reaped immediately, and docker stop terminated the container promptly (signal forwarding works).

Greptile Summary

This PR fixes zombie process accumulation caused by Node.js running as PID 1 inside the dokploy/dokploy container. It installs tini via apt-get and sets it as the container ENTRYPOINT, which correctly reaps all orphaned child processes (including those spawned by HEALTHCHECK) while forwarding signals to Node.js. The change is minimal, well-reasoned, and follows standard container best practices.

Confidence Score: 5/5

Safe to merge — the change is a single-file, minimal fix using a well-established init solution.

No P0 or P1 issues found. The tini installation path (/usr/bin/tini) is correct for Debian-based images, the -- sentinel correctly separates tini options from the CMD, and signal forwarding to Node.js works as expected.

No files require special attention.


The Dockerfile's CMD uses `exec pnpm start`, which makes Node.js PID 1
inside the container. Node.js only reaps children it spawned via
`child_process`; it does not install a generic waitpid reaper, so
processes spawned into the container by other means (notably the
`HEALTHCHECK` curl firing every 10s) accumulate as `<defunct>` zombies
under PID 1. Over days, hundreds of `[curl] <defunct>` build up.

Install tini and set it as ENTRYPOINT so it reaps any orphaned child
regardless of origin while still forwarding signals to Node.

dosubot bot added the size:XS label (this PR changes 0-9 lines, ignoring generated files) Apr 19, 2026
github-actions bot closed this Apr 19, 2026
dosubot bot added the bug label (something isn't working) Apr 19, 2026
JamBalaya56562 deleted the fix/tini-pid1-zombie-reap branch April 19, 2026 00:53