refactor: unify deploy.yml and deploy-staging.yml into one workflow (#46) #50
JohnRDOrazio merged 6 commits into main
Conversation
Closes #46. Replaces the two near-identical deploy workflows with a single environment-parameterized one. The shared 80% (Node setup, build, SSH key handling, retry loops, timeouts) was drifting between the two files — recent examples: SCP/SSH ConnectTimeout flags landed on staging first; the vars.VPS_APP_DIR migration needed two PRs. A single source of truth eliminates that.

Triggers
- release: published → always production
- workflow_dispatch → user picks environment (default: staging)

The ENVIRONMENT job-level env is computed as `github.event_name == 'release' && 'production' || inputs.environment`, so release events automatically map to production while workflow_dispatch honors the dropdown choice.

Production-only steps gated behind `if: env.ENVIRONMENT == 'production'`:
- Prepare WP theme and plugin bundles
- Upload WP theme/plugin bundles
- Extract WP theme and plugin bundles
- Activate plugins

Staging shares the production WordPress backend, so we never push theme/plugin from staging — that would overwrite production code.

Per-environment selection of build env and target dir via the `${{ env.X == 'production' && A || B }}` ternary pattern:
- NEXT_PUBLIC_SITE_URL: vars.NEXT_PUBLIC_SITE_URL_PROD vs vars.NEXT_PUBLIC_SITE_URL_STAGING
- APP_DIR: vars.VPS_APP_DIR vs vars.VPS_STAGING_APP_DIR

Drive-by hardening picked up by unification:
- Production now also has the SCP/SSH ConnectTimeout/ServerAlive flags that staging had (previously only in the staging file)
- All `${{ secrets.* }}` and `${{ vars.* }}` references routed through `env:` blocks per workflow-injection hardening guidance
- concurrency.group prevents racing deploys to the same environment

Test plan
- workflow_dispatch with environment=staging deploys ONLY the Next.js bundle to vars.VPS_STAGING_APP_DIR
- workflow_dispatch with environment=production (or a release event) deploys Next.js + theme + plugin and runs plugin activation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
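The trigger and environment wiring described above can be sketched as a condensed YAML fragment (structure inferred from the description; the real deploy.yml has more steps and may differ in detail):

```yaml
on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        default: staging
        options: [staging, production]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # release events always map to production; dispatch honors the dropdown
      ENVIRONMENT: ${{ github.event_name == 'release' && 'production' || inputs.environment }}
```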
📝 Walkthrough

The staging workflow file was removed and its behavior merged into a single parameterized deploy workflow.

Changes
Sequence Diagram(s)

sequenceDiagram
participant GH as GitHub Actions
participant Runner as CI Runner
participant VPS as Remote VPS
participant WP as WordPress REST API
GH->>Runner: trigger (workflow_dispatch or release) -> set ENVIRONMENT
Runner->>Runner: checkout, setup Node, restore cache (env-specific)
Runner->>Runner: npm install, validate RESOLVED_SITE_URL/APP_DIR
Runner->>Runner: npm run build (NEXT_PUBLIC_SITE_URL per ENV)
Runner->>Runner: package Next.js -> /tmp/nextjs-bundle.tar.gz
Runner->>VPS: scp nextjs bundle (retries, timeouts, pinned host key)
Runner->>VPS: ssh extract bundle, create tmp/restart.txt (retries, timeouts)
alt ENVIRONMENT == production
Runner->>Runner: tar WP theme & plugin bundles
Runner->>VPS: scp WP tarballs
Runner->>VPS: ssh extract WP bundles
Runner->>WP: HTTP activate plugins (fail on non-2xx)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed checks
Up to standards ✅
|
|
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/deploy.yml (1)
196-205: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Plugin activation does not gate deployment success.
The step logs HTTP codes but never fails on activation errors, so the workflow can pass with inactive plugins. Add response validation and request timeouts/retries.
Suggested fix
```diff
- for PLUGIN in redis-queue/redis-queue cdcf-redis-translations/cdcf-redis-translations; do
+ for PLUGIN in redis-queue/redis-queue cdcf-redis-translations/cdcf-redis-translations; do
    SLUG=$(echo "$PLUGIN" | tr '/' '%2F')
    STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
+     --connect-timeout 10 --max-time 30 --retry 3 --retry-delay 2 \
      -X POST "$WP_URL/$SLUG" \
      -H "Authorization: Basic $AUTH" \
      -H "Content-Type: application/json" \
      -d '{"status":"active"}')
    echo "Activate $PLUGIN: HTTP $STATUS"
+   if [[ ! "$STATUS" =~ ^2 ]]; then
+     echo "Activation failed for $PLUGIN"
+     exit 1
+   fi
  done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy.yml around lines 196 - 205, The plugin-activation loop logs HTTP codes but doesn’t fail the job on errors; modify the loop that iterates PLUGIN/SLUG and performs the curl POST (the command that sets STATUS) to add curl retries/timeouts (e.g., --max-time, --retry, --retry-delay, --retry-connrefused) and then validate STATUS after the call (expect 200/201/204); if STATUS is not an accepted success code, print the response and exit 1 so the workflow fails. Keep the same variables (WP_URL, AUTH, SLUG, STATUS, PLUGIN) and implement a small retry wrapper (e.g., up to 3 attempts) around the curl call before final failure.
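The guard the finding asks for reduces to a small check after each request; a runnable bash sketch (plugin names from the PR, the HTTP statuses here are hard-coded stand-ins for real curl responses):

```shell
# Sketch of the non-2xx guard from the review. WP_URL/AUTH and the curl
# call are omitted; statuses stand in for curl's %{http_code} output.
activate_check() {
  local plugin="$1" status="$2"
  echo "Activate $plugin: HTTP $status"
  case "$status" in
    2*) return 0 ;;
    *)  echo "Activation failed for $plugin (HTTP $status)" >&2; return 1 ;;
  esac
}

activate_check redis-queue/redis-queue 200
activate_check cdcf-redis-translations/cdcf-redis-translations 401 \
  || echo "non-2xx detected; the real step would exit 1 here"
```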
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy.yml:
- Around line 147-151: The remote deploy step deletes the uploaded tarball too
early (rm /tmp/nextjs-bundle.tar.gz) which prevents retries if a later command
fails; move the tarball removal to the very end of the command chain (after
touch "$APP_DIR/tmp/restart.txt") so extraction and restart steps complete
successfully before cleanup, and apply the same change to the second SSH
extraction sequence referenced in the file (the occurrence around lines
174-179).
- Line 92: The workflow currently uses ssh-keyscan -H "$VPS_HOST" >>
~/.ssh/known_hosts which performs trust-on-first-use; replace this runtime
keyscan with a pinned known_hosts entry supplied via a secret or repository
variable (e.g., add a secret like KNOWN_HOSTS containing the host key and write
that into ~/.ssh/known_hosts before SSH), update references to $VPS_HOST
accordingly, and remove the ssh-keyscan invocation so the job reads the
known_hosts content from the secret instead of scanning at runtime.
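A minimal sketch of the pinned-known_hosts approach the comment describes (the key value and `VPS_HOST_KEY` contents below are placeholders, and a temp dir stands in for `~/.ssh`):

```shell
# Write the pinned host key from a variable instead of running
# ssh-keyscan at deploy time, and fail fast if the pin is missing
# rather than silently falling back to trust-on-first-use.
SSH_DIR="$(mktemp -d)"   # stand-in for ~/.ssh
VPS_HOST_KEY='vps.example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIplaceholder'
if [ -z "$VPS_HOST_KEY" ]; then
  echo "VPS_HOST_KEY is empty; refusing to fall back to trust-on-first-use" >&2
  exit 1
fi
printf '%s\n' "$VPS_HOST_KEY" > "$SSH_DIR/known_hosts"
chmod 600 "$SSH_DIR/known_hosts"
echo "pinned $(wc -l < "$SSH_DIR/known_hosts") known_hosts line(s)"
```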
---
Outside diff comments:
In @.github/workflows/deploy.yml:
- Around line 196-205: The plugin-activation loop logs HTTP codes but doesn’t
fail the job on errors; modify the loop that iterates PLUGIN/SLUG and performs
the curl POST (the command that sets STATUS) to add curl retries/timeouts (e.g.,
--max-time, --retry, --retry-delay, --retry-connrefused) and then validate
STATUS after the call (expect 200/201/204); if STATUS is not an accepted success
code, print the response and exit 1 so the workflow fails. Keep the same
variables (WP_URL, AUTH, SLUG, STATUS, PLUGIN) and implement a small retry
wrapper (e.g., up to 3 attempts) around the curl call before final failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df35d8cb-135a-4a4f-a5fb-a1e8f36e75bf
📒 Files selected for processing (2)
.github/workflows/deploy-staging.yml
.github/workflows/deploy.yml
💤 Files with no reviewable changes (1)
- .github/workflows/deploy-staging.yml
CodeRabbit caught a real ordering bug: the rm /tmp/<bundle>.tar.gz
calls were positioned mid-chain, BEFORE mkdir/touch (and BEFORE the
plugin extraction in the WP step). If a later command in the chain
fails (disk full, permissions on $APP_DIR/tmp, etc.), the next
retry attempt has no tarball to extract from and fails irrecoverably
retry attempt has no tarball to extract from and fails irrecoverably
at `tar -xzf`.
Moved both rm operations to the end of their SSH chain so:
1. They only run after every other operation succeeds
2. If anything else fails, the tarball is still on disk for the
next retry attempt to use
Also collapsed the two rm calls in the WP step into a single
`rm /tmp/wp-theme.tar.gz /tmp/wp-plugin-redis-translations.tar.gz`
since they're both at the end now.
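The reordering can be illustrated end-to-end with local temp paths standing in for the remote $APP_DIR:

```shell
# Cleanup-last ordering: extract and touch restart.txt first, remove the
# tarball only once everything else in the chain succeeded, so a failure
# anywhere above leaves the tarball on disk for the next retry attempt.
WORK="$(mktemp -d)"; APP_DIR="$WORK/app"; mkdir -p "$APP_DIR"
echo hello > "$WORK/file.txt"
tar -czf "$WORK/bundle.tar.gz" -C "$WORK" file.txt

mkdir -p "$APP_DIR/tmp" \
  && tar -xzf "$WORK/bundle.tar.gz" -C "$APP_DIR" \
  && touch "$APP_DIR/tmp/restart.txt" \
  && rm "$WORK/bundle.tar.gz"

ls "$APP_DIR"
```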
Note: CodeRabbit also flagged ssh-keyscan host-key TOFU on this same
PR. That was deliberately left as-is on PR #42 to match the existing
production pattern (the user-level decision was: drop VPS_HOST_KEY
pinning so staging mirrors prod). Not changing here. See discussion
on #42 for context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…TOFU)

Closes the deferred half of CodeRabbit's review. Previously we used `ssh-keyscan -H "$VPS_HOST"`, which is trust-on-first-use: the runner accepts whatever key the responding server presents the first time. A network-level MITM at that moment could intercept the session.

Now reads the pinned host key from a new repo variable `VPS_HOST_KEY` (multi-line, containing all three host-key types captured via `ssh-keyscan 92.222.13.25` against the production VPS). The workflow refuses to deploy if the variable is empty — explicit failure beats silently falling back to TOFU.

The variable has already been populated in the cdcf-website repo settings. If the server's SSH host key is ever rotated, regenerate with `ssh-keyscan <host>` and update the variable; deploys will fail fast on key mismatch until that happens, which is the intended behavior.

Was deferred on PR #42 to match production's then-current pattern. With #50 unifying the workflows we get to do this once across both environments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/deploy.yml (1)
208-214: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail deployment when plugin activation returns non-2xx.
This loop only logs HTTP codes. A 4xx/5xx still leaves the step green, so deployment can report success with plugins not activated.
Suggested fix
```diff
- STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
+ STATUS=$(curl -sS -o /dev/null -w '%{http_code}' \
    -X POST "$WP_URL/$SLUG" \
    -H "Authorization: Basic $AUTH" \
    -H "Content-Type: application/json" \
    -d '{"status":"active"}')
  echo "Activate $PLUGIN: HTTP $STATUS"
+ if [[ ! "$STATUS" =~ ^2 ]]; then
+   echo "ERROR: activation failed for $PLUGIN (HTTP $STATUS)"
+   exit 1
+ fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy.yml around lines 208 - 214, The current loop only logs the HTTP code stored in STATUS after the curl POST and doesn't fail the job on non-2xx responses; modify the loop around the curl (the block that sets STATUS and echoes "Activate $PLUGIN: HTTP $STATUS") to check STATUS and exit non-zero when it's not a 2xx (e.g. test if STATUS is <200 or >=300 or if it doesn't match ^2), printing a clear error message including $PLUGIN and $STATUS before calling exit 1 so the workflow fails when activation fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy.yml:
- Line 69: Replace the pseudo-ternary environment selection used for
NEXT_PUBLIC_SITE_URL (and the similar occurrence) which relies on "&& ... ||
..." and treats empty strings as falsy; instead use an explicit non-empty check
for the production var (ensure vars.NEXT_PUBLIC_SITE_URL_PROD is tested for
being non-empty before selecting it) so staging is not silently chosen when prod
is unset, and add a short fail-fast guard in relevant run steps (e.g., check
that the resolved NEXT_PUBLIC_SITE_URL/APP_DIR is non-empty and exit with an
error message if it is) to surface misconfiguration immediately.
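The explicit non-empty selection reads roughly like this in shell (function name and URLs are illustrative; the real step writes the resolved value to GITHUB_ENV):

```shell
# Fail-fast selection instead of the `&& ... ||` pseudo-ternary, which
# treats an empty production value as falsy and silently picks staging.
resolve_site_url() {
  local environment="$1" prod="$2" staging="$3"
  if [ "$environment" = "production" ]; then
    [ -n "$prod" ] || { echo "ERROR: NEXT_PUBLIC_SITE_URL_PROD is empty" >&2; return 1; }
    echo "$prod"
  else
    [ -n "$staging" ] || { echo "ERROR: NEXT_PUBLIC_SITE_URL_STAGING is empty" >&2; return 1; }
    echo "$staging"
  fi
}

resolve_site_url production "https://example.org" "https://staging.example.org"
```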
---
Outside diff comments:
In @.github/workflows/deploy.yml:
- Around line 208-214: The current loop only logs the HTTP code stored in STATUS
after the curl POST and doesn't fail the job on non-2xx responses; modify the
loop around the curl (the block that sets STATUS and echoes "Activate $PLUGIN:
HTTP $STATUS") to check STATUS and exit non-zero when it's not a 2xx (e.g. test
if STATUS is <200 or >=300 or if it doesn't match ^2), printing a clear error
message including $PLUGIN and $STATUS before calling exit 1 so the workflow
fails when activation fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3054aead-01c9-495c-9222-d41a0a6e9e0c
📒 Files selected for processing (1)
.github/workflows/deploy.yml
Non-fatal sanity check: after Setup SSH key, compare the SHA-256 hex fingerprints derived from the pinned VPS_HOST_KEY against the SHA-256 SSHFP records published in DNS for catholicdigitalcommons.org. Emits GitHub-Actions warnings on any divergence so we notice if either side gets out of sync:

- "Pinned key not in DNS" → pin is stale, or DNS publication lags
- "DNS advertises key not in pin" → server rotated, regen the pin

Either case is a warning, not a failure: the pin remains the trust anchor and the deploy proceeds. The check is purely informational defense in depth.

Skips gracefully if dig is missing, if no SHA-256 SSHFP records are found, or if VPS_HOST_KEY contains no parseable keys — all result in a warning rather than a block.

Verified locally: ssh-keyscan 92.222.13.25 → SHA-256 hex fingerprints match the SSHFP type-2 records exactly, both directions empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
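The two-directional comparison at the heart of this check can be sketched with `comm` on precomputed fingerprint lists (the dig/ssh-keygen plumbing is omitted, and the fingerprints below are made-up sample values):

```shell
# Compare pinned-key fingerprints against DNS-published ones in both
# directions; real values would come from `ssh-keygen -r` on the pin
# and a `dig SSHFP` query on the zone.
pinned_fp="$(mktemp)"; dns_fp="$(mktemp)"
printf '%s\n' aaa111 bbb222 | sort > "$pinned_fp"
printf '%s\n' bbb222 ccc333 | sort > "$dns_fp"
only_pinned="$(comm -23 "$pinned_fp" "$dns_fp")"   # in the pin, absent from DNS
only_dns="$(comm -13 "$pinned_fp" "$dns_fp")"      # in DNS, absent from the pin
[ -z "$only_pinned" ] || echo "::warning::Pinned key not in DNS: $only_pinned"
[ -z "$only_dns" ]    || echo "::warning::DNS advertises key not in pin: $only_dns"
```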
…ivation errors

Two CodeRabbit findings on PR #50, both real:

1. The `${{ X == 'production' && A || B }}` ternary used for NEXT_PUBLIC_SITE_URL and APP_DIR treats empty strings as falsy. If the production var was unset, prod deploys would silently fall back to the staging value — production bundle baked with staging URL, prod files extracted into the staging app dir.

   New "Resolve environment-specific values (fail-fast)" step resolves the env-specific values in shell with explicit non-empty checks, then writes them to GITHUB_ENV as RESOLVED_SITE_URL / RESOLVED_APP_DIR. Build and Extract steps read those instead of the ternary. Misconfiguration now exits the workflow with a clear error before anything ships.

2. The Activate plugins loop logged HTTP status but didn't fail the workflow on non-2xx. WP rejecting activation (auth misconfig, plugin missing, server error) would report green.

   Now collects failures across all plugins (so the log shows the status of every plugin, not just the first failure), then exits 1 at the end if any returned non-2xx. Uses ::error:: annotation so failures show up in the GH UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/deploy.yml (1)
277-302: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Production-only variables `WP_THEME_DIR` and `WP_PLUGINS_DIR` are not validated.

These variables are used directly without the fail-fast validation applied to `SITE_URL` and `APP_DIR`. If either is empty, `tar -xzf ... -C ""` would extract to the current directory instead of failing, potentially corrupting the deployment.

Proposed fix: extend the fail-fast step

Add validation for production-only variables in the "Resolve environment-specific values" step:

```diff
       env:
         SITE_URL_PROD: ${{ vars.NEXT_PUBLIC_SITE_URL_PROD }}
         SITE_URL_STAGING: ${{ vars.NEXT_PUBLIC_SITE_URL_STAGING }}
         APP_DIR_PROD: ${{ vars.VPS_APP_DIR }}
         APP_DIR_STAGING: ${{ vars.VPS_STAGING_APP_DIR }}
+        WP_THEME_DIR: ${{ vars.WP_THEME_DIR }}
+        WP_PLUGINS_DIR: ${{ vars.WP_PLUGINS_DIR }}
       run: |
         # ... existing case statement ...
+        # Production-only variables
+        if [ "$ENVIRONMENT" = "production" ]; then
+          if [ -z "$WP_THEME_DIR" ]; then
+            echo "ERROR: WP_THEME_DIR is empty (expected vars.WP_THEME_DIR)"
+            exit 1
+          fi
+          if [ -z "$WP_PLUGINS_DIR" ]; then
+            echo "ERROR: WP_PLUGINS_DIR is empty (expected vars.WP_PLUGINS_DIR)"
+            exit 1
+          fi
+        fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy.yml around lines 277 - 302, Add fail-fast validation for the production-only env vars WP_THEME_DIR and WP_PLUGINS_DIR in the existing "Resolve environment-specific values" step (the same place where SITE_URL and APP_DIR are validated): check that both WP_THEME_DIR and WP_PLUGINS_DIR are non-empty and exit with an error if either is missing so the later SSH/tar command that uses WP_THEME_DIR and WP_PLUGINS_DIR cannot run with empty -C targets; update the workflow step that sets those envs to fail fast on empty values and include clear error messages referencing WP_THEME_DIR and WP_PLUGINS_DIR.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In @.github/workflows/deploy.yml:
- Around line 277-302: Add fail-fast validation for the production-only env vars
WP_THEME_DIR and WP_PLUGINS_DIR in the existing "Resolve environment-specific
values" step (the same place where SITE_URL and APP_DIR are validated): check
that both WP_THEME_DIR and WP_PLUGINS_DIR are non-empty and exit with an error
if either is missing so the later SSH/tar command that uses WP_THEME_DIR and
WP_PLUGINS_DIR cannot run with empty -C targets; update the workflow step that
sets those envs to fail fast on empty values and include clear error messages
referencing WP_THEME_DIR and WP_PLUGINS_DIR.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 01bf5554-c329-40d9-8f4a-77dba97d2ff5
📒 Files selected for processing (1)
.github/workflows/deploy.yml
CodeRabbit caught the same fail-open class on the production-only
vars I missed in the previous fix: the WP-extract steps use
WP_THEME_DIR and WP_PLUGINS_DIR directly via `${{ vars.* }}`. If
either were empty, `tar -xzf ... -C ""` extracts to the
current directory instead of failing, which on the VPS would
corrupt the deploy by writing files to the wrong place.
Extends the "Resolve environment-specific values" step to also
validate WP_THEME_DIR and WP_PLUGINS_DIR — but only on production,
since staging deploys don't touch the WP theme/plugin path at all.
Both empty → exit 1 with clear error before any deploy work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
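A sketch of the production-only guard described above (variable and function names mirror the commit message; staging intentionally skips the check):

```shell
# Validate WP paths only on production; staging deploys never touch
# the WP theme/plugin directories, so empty values there are fine.
validate_prod_vars() {
  local environment="$1" theme_dir="$2" plugins_dir="$3"
  [ "$environment" = "production" ] || return 0
  [ -n "$theme_dir" ]   || { echo "ERROR: WP_THEME_DIR is empty" >&2; return 1; }
  [ -n "$plugins_dir" ] || { echo "ERROR: WP_PLUGINS_DIR is empty" >&2; return 1; }
}

validate_prod_vars staging "" ""                      # ok: staging skips the check
validate_prod_vars production /srv/theme /srv/plugins # ok: both paths set
```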
Summary
Closes #46. Replaces the two near-identical deploy workflows with a
single environment-parameterized one.
The shared 80% (Node setup, build, SSH key handling, retry loops,
timeouts) was drifting between the two files — recent examples:
- `ConnectTimeout` flags landed in staging first; production was hardened only after a separate review pass
- The `vars.VPS_APP_DIR` migration required two PRs instead of one (#42 "feat(seo): noindex non-production hosts; add staging deploy workflow" for staging, #43 "fix(deploy): read VPS_APP_DIR from vars instead of secrets" for prod)
Single source of truth eliminates that drift.
How
Production-only steps gated behind `if: env.ENVIRONMENT == 'production'`.

Staging shares the production WordPress backend, so theme/plugin
never get pushed from a staging deploy.

Per-environment values via the `${{ X == 'production' && A || B }}` ternary pattern:

| Value | Production | Staging |
| --- | --- | --- |
| `NEXT_PUBLIC_SITE_URL` | `vars.NEXT_PUBLIC_SITE_URL_PROD` | `vars.NEXT_PUBLIC_SITE_URL_STAGING` |
| `APP_DIR` | `vars.VPS_APP_DIR` | `vars.VPS_STAGING_APP_DIR` |

Drive-by hardening

- Production gains the `ConnectTimeout`, `ServerAliveInterval`, `ServerAliveCountMax` flags that only the staging file had until now (drift fixed by unification itself)
- `${{ secrets.* }}`/`${{ vars.* }}` references routed through `env:` blocks per workflow-injection hardening guidance
- `concurrency.group: deploy-${env}` prevents racing deploys to the same environment
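A minimal sketch of the concurrency block mentioned above (the exact group expression and cancel behavior are assumptions, not the literal file contents):

```yaml
concurrency:
  # one deploy per environment at a time; a second trigger for the same
  # environment queues instead of racing the first
  group: deploy-${{ github.event_name == 'release' && 'production' || inputs.environment }}
  cancel-in-progress: false
```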
Test plan
- environment=staging → confirm:
  - Only `nextjs-bundle.tar.gz` is uploaded
  - `vars.VPS_STAGING_APP_DIR` is the extraction target
  - WP theme/plugin steps are skipped (visible in run logs)
  - Plugin activation step is skipped
- environment=production → confirm:
  - All three tarballs uploaded
  - `vars.VPS_APP_DIR` is the Next.js extraction target
  - Theme/plugin steps execute
  - Plugin activation step executes
  - `curl -I https://catholicdigitalcommons.org/` → no `X-Robots-Tag`
  - `curl -I https://staging.catholicdigitalcommons.org/` → `X-Robots-Tag: noindex, nofollow`

Related

- #42: added `deploy-staging.yml`
- #43: `deploy.yml` (vars instead of secrets)

🤖 Generated with Claude Code