Summary
In production, nginx intermittently fails to route /wp-json/*
requests to PHP-FPM and instead tries to serve them as static files
from disk, returning 404. Same endpoint, same client, same minute —
some requests reach PHP and succeed, others 404 at the filesystem
layer.
Evidence
From logs/cms.catholicdigitalcommons.org/proxy_error_log on 2026-04-29
during a workflow_dispatch deploy:
Successful PHP-routed request (16:25:23):
[error] FastCGI sent in stderr: "PHP message: cdcf_process_translation:
Translation complete for post 858 (es)" while reading response header
from upstream, ... request: "POST /wp-json/cdcf/v1/process-queue
HTTP/2.0", upstream:
"fastcgi://unix:/var/www/vhosts/system/cms.catholicdigitalcommons.org/php-fpm.sock"
404 from same client minutes later (16:28:32):
[error] openat() "/var/www/vhosts/catholicdigitalcommons.org/cms.catholicdigitalcommons.org/wp-json/cdcf/v1/process-queue"
failed (2: No such file or directory),
client: 2001:41d0:52:bff::14c, server: cms.catholicdigitalcommons.org,
request: "POST /wp-json/cdcf/v1/process-queue HTTP/2.0"
The 404 case shows nginx never reached the FastCGI upstream: it called
openat() on the URL path as a literal file. That only happens when the
request is handled without the WordPress rewrite fallback to index.php,
which on a correctly-configured Plesk + nginx + WP setup should never
happen for /wp-json/*.
Affected endpoints I've seen 404 in the logs include:
- POST /wp-json/cdcf/v1/process-queue (the redis-queue cron-like
worker trigger)
- GET /wp-json/wp/v2/pages/<id> (the verify step's polling reads)
PHP-routed (successful) and filesystem-404 responses are interleaved
seconds apart. The 404 case appears to be ~30-50% of requests during
heavy traffic windows.
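To put a firmer number on the ~30-50% estimate, the two kinds of log entry can be tallied mechanically. This is a minimal sketch, not a finished tool. It assumes each entry is a single line in the real proxy_error_log (the excerpts quoted in this report are wrapped), and the function names classify and tally are hypothetical. Note that the error log only records a PHP-routed request when PHP writes to stderr, so the "php" count is a lower bound; the access log would give a truer denominator.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the request field nginx appends to error-log entries,
# e.g. request: "POST /wp-json/cdcf/v1/process-queue HTTP/2.0"
REQUEST_RE = re.compile(r'request: "(?P<method>[A-Z]+) (?P<path>\S+)')


def classify(line: str) -> Optional[str]:
    """Return 'php', 'fs404', or None for entries that are not /wp-json/ requests."""
    match = REQUEST_RE.search(line)
    if not match or not match.group("path").startswith("/wp-json/"):
        return None
    # Static-file lookup failed: nginx treated the URL as a path on disk.
    if "openat()" in line and "No such file or directory" in line:
        return "fs404"
    # The entry names a FastCGI upstream, so the request reached PHP-FPM.
    if "fastcgi://" in line:
        return "php"
    return None


def tally(lines: Iterable[str]) -> Counter:
    """Count PHP-routed and filesystem-404 entries for /wp-json/ requests."""
    counts: Counter = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Feeding it the open log file, for example tally(open(path, encoding="utf-8", errors="replace")), yields the two counts for whatever window the file covers.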
Hypotheses
- Plesk's nginx config has a try_files order that races. A common
Plesk pattern is something like
try_files $uri $uri/ /index.php?$args, which checks for a real
file first. If the request is somehow routed to a stale or
uninitialized worker, try_files may fail without falling through
to PHP.
- Caching or worker-lifecycle issue. During the burst of cron
POSTs from the GitHub Actions runner, nginx may be hitting a
per-worker code path that doesn't have the WP rewrite rules loaded.
- Plesk's "Apache + nginx" hybrid mode interaction. The site uses
Plesk's nginx-in-front-of-Apache setup; each layer has its own URL
rewrite handling and they can fall out of sync after a config
reload.
Workarounds in place
The clients we control already retry — the GitHub Actions verify step
(foundation-docs/.github/workflows/deploy-docs.yml) now uses
curl --retry 3 --retry-all-errors --retry-delay 5 plus an outer
30-minute polling window, which is enough to ride out the 404 wave.
But this masks the underlying issue.
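For clients that don't go through curl, the same fixed-delay retry can be sketched in a few lines. This is an illustration under assumptions, not the code the workflow runs: fetch_with_retry and its parameters are hypothetical names, and fetch stands in for whatever callable performs the request and returns an HTTP status code. The defaults mirror the curl flags quoted above (up to 3 retries, 5 seconds apart).

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], int],
    retries: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns a status below 400 or the retries run out.

    Returns the last status seen, so the caller can still tell a persistent
    404 apart from a success.
    """
    status = fetch()
    for _ in range(retries):
        if status < 400:
            break
        sleep(delay)  # fixed delay between attempts, like curl --retry-delay
        status = fetch()
    return status
```

Passing sleep as a parameter keeps the function testable without real waits. Like the curl workaround, this hides the routing problem rather than fixing it.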
What we don't know
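- How exactly the cron driver is set up. It hits
/wp-json/cdcf/v1/process-queue from 2001:41d0:52:bff::14c, which I
took to be a GitHub Actions IPv6, but no scheduled workflow in our
repos calls this endpoint, so it must be configured outside the repo,
possibly via the Plesk panel or a third party. The attribution itself
needs checking: the 2001:41d0::/32 prefix is allocated to OVH, not
GitHub, so a whois lookup on the address is a cheap first step.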
- Whether the 404 rate correlates with any Plesk maintenance event
or nginx config reload.
- Plesk's actual nginx + Apache vhost config for
cms.catholicdigitalcommons.org.
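The correlation question can be approached by bucketing the filesystem-404 entries per minute and comparing the spikes against reload or maintenance timestamps. This is a minimal sketch assuming the standard nginx error-log prefix (YYYY/MM/DD HH:MM:SS [level] ...), which the excerpts in this report omit, and one entry per line; fs404_per_minute is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the timestamp truncated to the minute from the start of an entry,
# e.g. "2026/04/29 16:28" from "2026/04/29 16:28:32 [error] ...".
TS_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[error\]")


def fs404_per_minute(lines: Iterable[str]) -> Counter:
    """Count failed static-file lookups under /wp-json/ per minute."""
    buckets: Counter = Counter()
    for line in lines:
        match = TS_RE.match(line)
        if (
            match
            and "openat()" in line
            and "No such file or directory" in line
            and "/wp-json/" in line
        ):
            buckets[match.group(1)] += 1
    return buckets
```

Sorting the result by key gives a timeline that can be laid next to the times of any nginx reloads or Plesk maintenance tasks.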
Suggested next steps
- Pull the actual nginx config off the production VPS:
/etc/nginx/plesk.conf.d/vhosts/cms.catholicdigitalcommons.org*.conf
and /var/www/vhosts/system/cms.catholicdigitalcommons.org/conf/
- Check that the WordPress permalink rewrite rules are present in the
additional nginx directives slot in the Plesk panel
- If using Plesk's "static files served by nginx" optimization,
verify it doesn't intercept /wp-json/*
- Identify the cron driver hitting /wp-json/cdcf/v1/process-queue
(Plesk scheduler? Cloudflare Workers? UptimeRobot?) and confirm where
it's configured
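Once the config files are pulled, a quick pass that lists only the routing-related directives makes it easier to see which location block could be catching /wp-json/*. This is a rough sketch: routing_directives is a hypothetical helper, and the directive list is my guess at what matters here (location, try_files, rewrite, fastcgi_pass, proxy_pass). It works line by line and does not parse nginx syntax, so directives sharing a line with other text may be missed.

```python
import re
from typing import Iterator, Tuple

# Directives that decide whether a request goes to disk, PHP-FPM, or Apache.
DIRECTIVE_RE = re.compile(
    r"^\s*(location|try_files|rewrite|fastcgi_pass|proxy_pass)\b"
)


def routing_directives(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, stripped_line) for routing-related directives.

    Commented-out lines are skipped because they start with '#', which the
    pattern does not match.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        if DIRECTIVE_RE.match(line):
            yield number, line.strip()
```

Running it over each file matched by the vhost glob above and printing the results gives a compact view of the order in which locations and fallbacks are declared.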
Severity
Low for now — clients retry, recoveries complete, no end-user impact.
But it's noise in the logs and an availability landmine if/when the
404 rate spikes further.
Related
pipeline (see feat(deploy): verify translations completed before declaring success foundation-docs#22, chore(deps): Bump react from 19.2.4 to 19.2.5 #23).