Improved production image pull time by deduplicating chown layers #29017

Merged: acburdine merged 1 commit into main from ghost-image-dedupe-chown-layers on Jul 1, 2026

@acburdine

Summary

Investigating E2E CI timing showed the biggest per-shard time-sink is loading the Ghost image (ghost-e2e, which is FROM the production full image). Digging into the image's layers with docker history revealed why: the production Dockerfile builds app code in one layer, then re-owns it with chown -R, which on overlayfs copies every touched file into a new layer — storing the tiny-file-heavy node_modules and admin build twice.

docker history on the shipped ghost-e2e:latest (amd64):

| Uncompressed | Instruction | Note |
|---|---|---|
| 635 MB | RUN … pnpm install --prod … (node_modules) | |
| 602 MB | RUN mkdir … chown -R nobody:nogroup /home/ghost/* | duplicate of node_modules |
| 81.1 MB | COPY core/built/admin | |
| 81.1 MB | RUN chown -R nobody:nogroup core/built/admin | duplicate of admin |

That's ~680 MB of duplicated, small-file content — the slowest kind to extract — pulled on every one of the 12 E2E shards and every production image pull.

Fix

Set ownership at copy time instead of re-chowning:

  • admin (full stage): COPY --chown=nobody:nogroup … replaces COPY + RUN chown -R.
  • node_modules: moved into a discarded deps stage and pulled in once via COPY --chown=nobody:nogroup --from=deps, so it's stored once with final ownership (and the shipped layer no longer carries build-essential/python3 apt install/purge churn).
  • source: COPY --chown + cp -a (preserves ownership) so base_content/default need no re-chown; only the writable content/log flip to ghost.

The chown -R nobody:nogroup /home/ghost/* and chown -R core/built/admin layers are gone.
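
The pattern above can be sketched roughly as follows. This is an illustrative outline, not the actual Dockerfile.production: the base image tag, the pnpm bootstrap, and the exact paths are assumptions, and it presumes a .dockerignore that keeps any local node_modules out of the build context. Only the `deps` stage name, the `nobody:nogroup` ownership, and the `COPY --chown` / `--from=deps` mechanics come from the description.

```dockerfile
# Hypothetical base image; the real one is not shown in this PR description.
FROM node:22-bookworm-slim AS deps
WORKDIR /home/ghost
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
# Build tooling lives only in this stage, which never ships.
# Native modules such as better-sqlite3 would be compiled here.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential python3 && \
    npm install -g pnpm && \
    pnpm install --prod

FROM node:22-bookworm-slim AS core
WORKDIR /home/ghost
# Ownership is applied as the files are written, so node_modules is stored
# once. A later `RUN chown -R nobody:nogroup ...` would instead copy every
# touched file up into a second layer.
COPY --chown=nobody:nogroup --from=deps /home/ghost/node_modules ./node_modules
COPY --chown=nobody:nogroup . .
```

Because the `deps` stage is only referenced through `--from=deps`, its apt install layers stay out of the shipped image.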

Ownership model is unchanged

This touches the production image, so the app-code-owned-by-nobody / content+log-owned-by-ghost hardening is preserved exactly. Verified against a build of both the current and refactored Dockerfiles:

  • Ownership is byte-identical across all paths (home 1000:1000, app code 65534:65534, content/log 1000:1000, base_content/default 65534:65534).
  • Runtime user (ghost) can read app code but cannot modify it (touch on admin → permission denied), and can write content.
  • better-sqlite3's native binary still loads after the multi-stage copy (ran a real in-memory query round-trip).
  • Layer duplication eliminated; synthetic build shrank 478 MB → 374 MB (real savings larger, since real node_modules is ~635 MB).
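
One way checks like these could be expressed is as a throwaway build stage, sketched below. This is an assumption about how such a check might look, not the procedure actually used for this PR: the `verify` stage name, the `.write-test` file, and the absolute paths under /home/ghost are hypothetical, while the `full` stage, the `ghost` user, and the numeric IDs come from the description above.

```dockerfile
# Hypothetical verification stage built on top of the `full` stage.
FROM full AS verify
USER root
# Compare numeric owner:group against the values listed above.
RUN test "$(stat -c '%u:%g' /home/ghost)" = "1000:1000" && \
    test "$(stat -c '%u:%g' /home/ghost/node_modules)" = "65534:65534" && \
    test "$(stat -c '%u:%g' /home/ghost/content)" = "1000:1000"
USER ghost
# The runtime user must not be able to write into app code,
# but must be able to write into content.
RUN if touch /home/ghost/core/built/admin/.write-test 2>/dev/null; then exit 1; fi && \
    touch /home/ghost/content/.write-test
```

Building with `--target verify` would fail the build if any of these expectations were violated, without adding anything to the shipped image.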

Validation notes

  • Local validation used a synthetic-but-faithful context (real Dockerfile logic, real better-sqlite3 native build). CI builds the real image via job_build_artifacts, and job_ghost-cli + the E2E lane exercise it end-to-end.
  • Worth a maintainer's eye on the ownership model given this is the shipped image.

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f88bd2e-7c94-454a-967a-24adb2f82fe5

📥 Commits

Reviewing files that changed from the base of the PR and between dc2b679 and da51832.

📒 Files selected for processing (1)
  • Dockerfile.production
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile.production

Walkthrough

Changes

The production Dockerfile is restructured into a multi-stage build. A new deps stage installs production node_modules, including compiling better-sqlite3 with temporary build tooling, and is not included in the final image. The core stage now copies node_modules from deps, copies application sources excluding the admin build, preserves ownership during content and theme placement, and limits ownership normalization to default, content, and log. The full stage replaces its previous admin ownership sequence with a single ownership-setting COPY for the admin build.

Sequence Diagram(s)

Not applicable.

Suggested reviewers: rob-ghost

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: reducing production image pull time by removing duplicate chown layers. |
| Description check | ✅ Passed | The description is directly related to the Dockerfile refactor and explains the ownership and layer-deduplication changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@nx-cloud

nx-cloud Bot commented Jul 1, 2026

🤖 Nx Cloud AI Fix

Ensure the fix-ci command is configured to always run in your CI pipeline to get automatic fixes in future runs. For more information, please see https://nx.dev/ci/features/self-healing-ci


View your CI Pipeline Execution ↗ for commit dc2b679

| Command | Status | Duration | Result |
|---|---|---|---|
| nx run @tryghost/admin:build | ✅ Succeeded | 7s | View ↗ |
| nx run ghost:build:assets | ✅ Succeeded | 2s | View ↗ |
| nx run ghost:build:tsc | ✅ Succeeded | 6s | View ↗ |
| nx run-many -t lint -p ghost-monorepo | ✅ Succeeded | <1s | View ↗ |
| nx run-many --target=build --projects=tag:publi... | ✅ Succeeded | <1s | View ↗ |

💡 Verify your cache is correct by running tasks in a sandbox. Read docs ↗


☁️ Nx Cloud last updated this comment at 2026-07-01 17:02:45 UTC

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
Dockerfile.production (3)

37-37: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Make the lockfile contract explicit.

Add --frozen-lockfile so this production image cannot silently resolve a different dependency graph when the lockfile and manifests drift.

Proposed change
-    pnpm install --ignore-scripts --prod --prefer-offline && \
+    pnpm install --frozen-lockfile --ignore-scripts --prod --prefer-offline && \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.production` at line 37, The production install step in the
Dockerfile currently allows pnpm to resolve dependencies when manifests drift
from the lockfile. Update the pnpm install command to include --frozen-lockfile
so the build fails instead of silently changing the dependency graph, keeping
the lockfile contract explicit in the production image.

59-61: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Drop the duplicate manifest/component copies in core.

Lines 59-60 are copied again by the broad COPY on Line 61, with no intervening cacheable build step. Removing them avoids extra layer content while preserving the final filesystem.

Proposed change
-COPY --chown=nobody:nogroup package.json pnpm-lock.yaml pnpm-workspace.yaml ./
-COPY --chown=nobody:nogroup components ./components
 COPY --chown=nobody:nogroup --exclude=core/built/admin . .
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.production` around lines 59 - 61, The Dockerfile.production build
stage is copying the same manifest and components twice: the explicit COPY steps
for package.json, pnpm-lock.yaml, pnpm-workspace.yaml, and components are
duplicated by the later broad COPY. Remove the redundant earlier COPY
instructions and keep the existing broad COPY with its exclude so the final
filesystem stays the same; use the COPY statements around the core build stage
to locate the change.

61-71: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Avoid the remaining recursive ownership rewrite for content.

content is copied as nobody:nogroup on Line 61, then rewritten to ghost:ghost on Line 71. That still duplicates the touched content tree in an overlayfs layer; consider excluding content from the broad copy and adding it separately with its final ownership.
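
A rough sketch of what that suggestion could look like is shown below. It is illustrative only: the base image is hypothetical, numeric IDs are used so the fragment does not depend on the `nobody` and `ghost` accounts existing in the base image, and it ignores the `base_content` / `default` handling that the real Dockerfile performs between these steps.

```dockerfile
# Hypothetical base image for the sketch.
FROM debian:bookworm-slim AS core
WORKDIR /home/ghost
# Broad copy keeps nobody:nogroup (65534:65534) but now skips content too.
# --exclude is already used by the Dockerfile under review; the frontend
# version it requires is not stated in this thread.
COPY --chown=65534:65534 --exclude=core/built/admin --exclude=content . .
# content lands once, already owned by the runtime user (ghost, 1000:1000),
# so no recursive chown is needed afterwards.
COPY --chown=1000:1000 content ./content
```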

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.production` around lines 61 - 71, The Dockerfile.production build
still rewrites the entire content tree ownership after copying it, which causes
unnecessary overlayfs duplication. Update the COPY and RUN steps around the
existing `COPY --chown=nobody:nogroup --exclude=core/built/admin` and the `RUN
... chown -R ghost:ghost content log` block so `content` is added separately
with its final ownership instead of being copied broadly and then recursively
chowned. Keep the existing handling for `default`, `base_content`, and `log`,
but avoid any later recursive ownership change on `content`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2f939191-61e6-4721-b973-a0927235888d

📥 Commits

Reviewing files that changed from the base of the PR and between 39f5b2d and dc2b679.

📒 Files selected for processing (1)
  • Dockerfile.production

no ref

- on overlayfs a `chown -R` copies every touched file into a new layer, so the
  core stage's `chown -R nobody:nogroup /home/ghost/*` duplicated node_modules
  (~600MB) and the full stage's `chown -R core/built/admin` duplicated the admin
  build (~80MB) — inflating every pull of the ghost / ghost-e2e images, which the
  12 E2E shards showed spending most of their setup time loading
- set ownership at COPY time (COPY --chown) instead, and move the production
  node_modules into a discarded `deps` stage copied in once via COPY --chown
  --from, so app code is stored once with its final ownership
- ownership model is unchanged (verified byte-for-byte: app code nobody:nogroup,
  content/log ghost, home ghost; runtime user still cannot modify app code, and
  better-sqlite3's native binary still loads after the multi-stage copy)
- build-essential/python3 now live only in the deps stage, so the shipped
  node_modules layer no longer carries apt install/purge churn either
@acburdine force-pushed the ghost-image-dedupe-chown-layers branch from dc2b679 to da51832 on July 1, 2026 17:00
@acburdine enabled auto-merge (squash) on July 1, 2026 17:09
@acburdine merged commit 4ada2b8 into main on Jul 1, 2026
41 checks passed
@acburdine deleted the ghost-image-dedupe-chown-layers branch on July 1, 2026 17:13