Improved production image pull time by deduplicating chown layers #29017
Walkthrough
The production Dockerfile is restructured into a multi-stage build. A new …
Pre-merge checks: ✅ 4 passed
| Command | Status | Duration | Result |
|---|---|---|---|
| `nx run @tryghost/admin:build` | ✅ Succeeded | 7s | View ↗ |
| `nx run ghost:build:assets` | ✅ Succeeded | 2s | View ↗ |
| `nx run ghost:build:tsc` | ✅ Succeeded | 6s | View ↗ |
| `nx run-many -t lint -p ghost-monorepo` | ✅ Succeeded | <1s | View ↗ |
| `nx run-many --target=build --projects=tag:publi...` | ✅ Succeeded | <1s | View ↗ |
☁️ Nx Cloud last updated this comment at 2026-07-01 17:02:45 UTC
🧹 Nitpick comments (3)
Dockerfile.production (3)
37-37: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win
Make the lockfile contract explicit.
Add `--frozen-lockfile` so this production image cannot silently resolve a different dependency graph when the lockfile and manifests drift.
Proposed change

    - pnpm install --ignore-scripts --prod --prefer-offline && \
    + pnpm install --frozen-lockfile --ignore-scripts --prod --prefer-offline && \

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile.production` at line 37, the production install step in the Dockerfile currently allows pnpm to resolve dependencies when manifests drift from the lockfile. Update the pnpm install command to include `--frozen-lockfile` so the build fails instead of silently changing the dependency graph, keeping the lockfile contract explicit in the production image.
59-61: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win
Drop the duplicate manifest/component copies in `core`.
Lines 59-60 are copied again by the broad `COPY` on Line 61, with no intervening cacheable build step. Removing them avoids extra layer content while preserving the final filesystem.
Proposed change

    -COPY --chown=nobody:nogroup package.json pnpm-lock.yaml pnpm-workspace.yaml ./
    -COPY --chown=nobody:nogroup components ./components
     COPY --chown=nobody:nogroup --exclude=core/built/admin . .

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile.production` around lines 59 - 61, the Dockerfile.production build stage is copying the same manifest and components twice: the explicit COPY steps for package.json, pnpm-lock.yaml, pnpm-workspace.yaml, and components are duplicated by the later broad COPY. Remove the redundant earlier COPY instructions and keep the existing broad COPY with its exclude so the final filesystem stays the same; use the COPY statements around the core build stage to locate the change.
61-71: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift
Avoid the remaining recursive ownership rewrite for `content`.
`content` is copied as `nobody:nogroup` on Line 61, then rewritten to `ghost:ghost` on Line 71. That still duplicates the touched `content` tree in an overlayfs layer; consider excluding `content` from the broad copy and adding it separately with its final ownership.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile.production` around lines 61 - 71, the Dockerfile.production build still rewrites the entire content tree ownership after copying it, which causes unnecessary overlayfs duplication. Update the COPY and RUN steps around the existing `COPY --chown=nobody:nogroup --exclude=core/built/admin` and the `RUN ... chown -R ghost:ghost content log` block so `content` is added separately with its final ownership instead of being copied broadly and then recursively chowned. Keep the existing handling for `default`, `base_content`, and `log`, but avoid any later recursive ownership change on `content`.
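One way the suggestion above could look, as a rough sketch only. The base image, working directory, and the numeric `1000:1000` owner (taken from the ownership figures quoted in the PR description) are assumptions, and the real handling of `default`, `base_content`, and `log` is not shown. It also assumes a Dockerfile frontend recent enough to support `COPY --exclude`.

```dockerfile
# Hypothetical sketch, not the project's Dockerfile.
# Assumes a Dockerfile frontend that supports COPY --exclude.
FROM node:22-bookworm-slim
WORKDIR /home/ghost

# Broad copy of app code with its final ownership, skipping the two
# trees that are added separately.
COPY --chown=nobody:nogroup --exclude=core/built/admin --exclude=content . .

# content is written once, already owned by the runtime user, so no later
# recursive chown has to copy it into another layer.
COPY --chown=1000:1000 content ./content
```

Using numeric IDs avoids depending on a `ghost` entry in the image's `/etc/passwd` at `COPY` time; if the user is created earlier in the stage, the name form works as well.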
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2f939191-61e6-4721-b973-a0927235888d
📒 Files selected for processing (1)
Dockerfile.production
no ref

- on overlayfs a `chown -R` copies every touched file into a new layer, so the core stage's `chown -R nobody:nogroup /home/ghost/*` duplicated node_modules (~600MB) and the full stage's `chown -R core/built/admin` duplicated the admin build (~80MB), inflating every pull of the ghost / ghost-e2e images, which the 12 E2E shards showed spending most of their setup time loading
- set ownership at COPY time (COPY --chown) instead, and move the production node_modules into a discarded `deps` stage copied in once via COPY --chown --from, so app code is stored once with its final ownership
- ownership model is unchanged (verified byte-for-byte: app code nobody:nogroup, content/log ghost, home ghost; runtime user still cannot modify app code, and better-sqlite3's native binary still loads after the multi-stage copy)
- build-essential/python3 now live only in the deps stage, so the shipped node_modules layer no longer carries apt install/purge churn either
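The shape of the change described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the actual `Dockerfile.production`: the base image, the way pnpm is installed, the install flags, and the paths are all assumptions, and the real native-build steps for `better-sqlite3` are omitted.

```dockerfile
# Hypothetical sketch of a discarded deps stage plus COPY --chown --from.

# Throwaway stage: compilers and the dependency install live only here.
FROM node:22-bookworm-slim AS deps
WORKDIR /home/ghost
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential python3 && \
    rm -rf /var/lib/apt/lists/*
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
COPY components ./components
RUN npm install -g pnpm && \
    pnpm install --prod --frozen-lockfile

# Shipped stage: nothing from the deps stage is inherited except what is
# copied explicitly below.
FROM node:22-bookworm-slim AS core
WORKDIR /home/ghost

# node_modules arrives once, already owned by nobody:nogroup, so no
# follow-up chown -R layer is needed.
COPY --chown=nobody:nogroup --from=deps /home/ghost/node_modules ./node_modules

# App code is likewise written with its final ownership at copy time.
# Assumes .dockerignore keeps any host node_modules out of the context.
COPY --chown=nobody:nogroup . .
```

Because the `deps` stage is never the final image, its apt install layers are not part of what gets pulled; only the single `node_modules` layer produced by `COPY --from=deps` is shipped.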
dc2b679 to da51832

Summary
Investigating E2E CI timing showed the biggest per-shard time sink is loading the Ghost image (`ghost-e2e`, which is `FROM` the production `full` image). Digging into the image's layers with `docker history` revealed why: the production Dockerfile builds app code in one layer, then re-owns it with `chown -R`, which on overlayfs copies every touched file into a new layer, storing the tiny-file-heavy `node_modules` and admin build twice.

`docker history` on the shipped `ghost-e2e:latest` (amd64):
- `RUN … pnpm install --prod …` (node_modules)
- `RUN mkdir … chown -R nobody:nogroup /home/ghost/*` ← duplicate of node_modules
- `COPY core/built/admin`
- `RUN chown -R nobody:nogroup core/built/admin` ← duplicate of admin

That's ~680 MB of duplicated, small-file content (the slowest kind to extract), pulled on every one of the 12 E2E shards and every production image pull.
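Reduced to its essentials, the duplicating pattern described above looks something like the sketch below. The base image and paths are placeholders chosen for illustration, not the project's actual Dockerfile.

```dockerfile
# Hypothetical sketch of the layer-duplicating pattern.
FROM node:22-bookworm-slim
WORKDIR /home/ghost

# Layer 1: the files are written owned by root.
COPY . .

# Layer 2: changing ownership makes overlayfs copy each touched file up
# into this layer, so the layer holds a second full copy of the tree even
# though no file contents changed.
RUN chown -R nobody:nogroup /home/ghost/*
```

Running `docker history` on an image built this way should show the `RUN chown -R` layer at roughly the same size as the `COPY` layer before it, which is the signature the summary describes.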
Fix
Set ownership at copy time instead of re-chowning:
- `COPY --chown=nobody:nogroup …` replaces `COPY` + `RUN chown -R`.
- Production `node_modules` is built in a discarded `deps` stage and pulled in once via `COPY --chown=nobody:nogroup --from=deps`, so it's stored once with final ownership (and the shipped layer no longer carries `build-essential`/`python3` apt install/purge churn).
- `COPY --chown` + `cp -a` (preserves ownership) so `base_content`/`default` need no re-chown; only the writable `content`/`log` flip to `ghost`.

The `chown -R nobody:nogroup /home/ghost/*` and `chown -R core/built/admin` layers are gone.

Ownership model is unchanged
This touches the production image, so the app-code-owned-by-`nobody` / content+log-owned-by-`ghost` hardening is preserved exactly. Verified against a build of both the current and refactored Dockerfiles:
- Ownership is unchanged (`home` 1000:1000, app code 65534:65534, `content`/`log` 1000:1000, `base_content`/`default` 65534:65534).
- The runtime user (`ghost`) can read app code but cannot modify it (`touch` on admin → permission denied), and can write `content`.
- `better-sqlite3`'s native binary still loads after the multi-stage copy (ran a real in-memory query round-trip).
- … `node_modules` is ~635 MB).

Validation notes
… `better-sqlite3` native build). CI builds the real image via `job_build_artifacts`, and `job_ghost-cli` + the E2E lane exercise it end-to-end.

🤖 Generated with Claude Code