nginx intermittently 404s wp-json/* routes (tries to serve as static files) #41

@JohnRDOrazio

Summary

In production, nginx intermittently fails to route /wp-json/*
requests to PHP-FPM and instead tries to serve them as static files
from disk, returning 404. Same endpoint, same client, same minute —
some requests reach PHP and succeed, others 404 at the filesystem
layer.

Evidence

From logs/cms.catholicdigitalcommons.org/proxy_error_log on 2026-04-29
during a workflow_dispatch deploy:

Successful PHP-routed request (16:25:23):

[error] FastCGI sent in stderr: "PHP message: cdcf_process_translation:
  Translation complete for post 858 (es)" while reading response header
  from upstream, ... request: "POST /wp-json/cdcf/v1/process-queue
  HTTP/2.0", upstream:
  "fastcgi://unix:/var/www/vhosts/system/cms.catholicdigitalcommons.org/php-fpm.sock"

404 from the same client about three minutes later (16:28:32):

[error] openat() "/var/www/vhosts/catholicdigitalcommons.org/cms.catholicdigitalcommons.org/wp-json/cdcf/v1/process-queue"
  failed (2: No such file or directory),
  client: 2001:41d0:52:bff::14c, server: cms.catholicdigitalcommons.org,
  request: "POST /wp-json/cdcf/v1/process-queue HTTP/2.0"

The 404 case shows nginx never reached the FastCGI upstream: it tried
openat() on the URL path as a literal file. That only happens when the
request is handled by a location that lacks the try_files fallback to
/index.php, i.e. the WordPress URL-rewrite rules were not applied to
the request. On a correctly configured Plesk + nginx + WP setup this
should never happen for /wp-json/*.
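
To make that reasoning concrete, here is a deliberately simplified model of how a `try_files $uri $uri/ /index.php?$args` directive resolves a request. It is only a sketch: the function name and the in-memory `files` and `dirs` sets are hypothetical stand-ins for the document root, and real nginx behaviour also depends on which location block matched the request.

```python
def resolve_try_files(uri, files, dirs):
    """Toy model of `try_files $uri $uri/ /index.php?$args`.

    `files` and `dirs` are sets of paths that exist under the document
    root (hypothetical stand-ins for the real filesystem).
    """
    path = uri.rstrip("/") or "/"
    # 1. $uri: serve the request path if it is an existing regular file.
    if not uri.endswith("/") and path in files:
        return ("static_file", path)
    # 2. $uri/: serve the request path if it is an existing directory.
    if path in dirs:
        return ("directory", path)
    # 3. Last argument: internal redirect to the PHP front controller.
    return ("internal_redirect", "/index.php")
```

Under this model a path such as /wp-json/cdcf/v1/process-queue, which does not exist on disk, always ends at the internal redirect to /index.php. An openat() failure on that path therefore suggests the request was served by a location where no such fallback applied.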

Affected endpoints I've seen 404 in the logs include:

  • POST /wp-json/cdcf/v1/process-queue (the redis-queue cron-like
    worker trigger)
  • GET /wp-json/wp/v2/pages/<id> (verify step's polling reads)

PHP-routed (successful) and filesystem-404 responses are interleaved
seconds apart. The 404 case appears to be ~30-50% of requests during
heavy traffic windows.
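
A rough way to check the interleaving is to bucket the two kinds of log entries by minute. The sketch below assumes each entry in proxy_error_log is a single line that starts with nginx's standard `YYYY/MM/DD HH:MM:SS` timestamp (the excerpts above are abbreviated and wrapped); the function names are illustrative. Note that the error log only records PHP-routed requests that happened to write to stderr, so a true 404 rate would need the access log as well.

```python
import re
from collections import Counter

# Assumption: entries are single lines starting with "YYYY/MM/DD HH:MM:SS".
TIMESTAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} ")
# Only consider requests under /wp-json/.
REQUEST = re.compile(
    r'request: "(?:GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) (/wp-json/\S*)'
)


def classify(line):
    """Return 'php', 'static_404', or None for one error-log line."""
    if not REQUEST.search(line):
        return None
    if "openat()" in line and "No such file or directory" in line:
        return "static_404"  # nginx looked for the URL path on disk
    if "fastcgi://" in line:
        return "php"  # the request reached the PHP-FPM upstream
    return None


def tally_by_minute(lines):
    """Map each minute to a Counter of 'php' and 'static_404' entries."""
    buckets = {}
    for line in lines:
        kind = classify(line)
        stamp = TIMESTAMP.match(line)
        if kind is None or stamp is None:
            continue
        buckets.setdefault(stamp.group(1), Counter())[kind] += 1
    return buckets
```

Passing an open file handle for the log to `tally_by_minute` and printing the sorted result gives a per-minute view of how the two outcomes alternate.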

Hypotheses

  1. Plesk's nginx config has a try_files order that races. A
    common Plesk pattern is something like
    try_files $uri $uri/ /index.php?$args, which checks for a real
    file first. If the request is somehow routed to a stale or
    uninitialized worker, try_files may fail without falling through
    to PHP.
  2. Caching or worker-lifecycle issue. During the burst of cron
    POSTs from the GitHub Actions runner, nginx may be hitting a
    per-worker code path that doesn't have the WP rewrite rules loaded.
  3. Plesk's "Apache + nginx" hybrid mode interaction. The site uses
    Plesk's nginx-in-front-of-Apache setup; each layer has its own URL
    rewrite handling and they can fall out of sync after a config
    reload.

Workarounds in place

The clients we control already retry — the GitHub Actions verify step
(foundation-docs/.github/workflows/deploy-docs.yml) now uses
curl --retry 3 --retry-all-errors --retry-delay 5 plus an outer
30-minute polling window, which is enough to ride out the 404 wave.
But this masks the underlying issue.
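
Because retries hide the failures, it can help to record every attempt rather than only the final result. The following is a minimal sketch that loosely mirrors the retry count and delay of the curl flags above; it is not the workflow's actual implementation, and `fetch_status` and `get_with_retry` are hypothetical names.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def get_with_retry(url, retries=3, delay=5, fetch=fetch_status, sleep=time.sleep):
    """Try once, then retry up to `retries` times on any failure.

    Returns the status of every attempt (None for connection errors),
    so the failures that the retries mask remain visible.
    """
    attempts = []
    for attempt in range(retries + 1):
        try:
            status = fetch(url)
        except OSError:  # URLError and timeouts are OSError subclasses
            status = None
        attempts.append(status)
        if status is not None and 200 <= status < 300:
            break
        if attempt < retries:
            sleep(delay)
    return attempts
```

A result such as `[404, 404, 200]` shows a request that eventually succeeded but hit the filesystem-404 path twice, which is the signal the plain retry loop discards.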

What we don't know

  • How exactly the cron driver is set up. It hits
    /wp-json/cdcf/v1/process-queue from 2001:41d0:52:bff::14c, which we
    took to be a GitHub Actions IPv6 address, although the
    2001:41d0::/32 prefix is allocated to OVH, so the source needs
    confirming. No scheduled workflow in our repos calls this endpoint,
    so it must be configured outside the repo, possibly via the Plesk
    panel or a third party.
  • Whether the 404 rate correlates with any Plesk maintenance event
    or nginx config reload.
  • Plesk's actual nginx + Apache vhost config for cms.catholicdigitalcommons.org.

Suggested next steps

  1. Pull the actual nginx config off the production VPS:
    /etc/nginx/plesk.conf.d/vhosts/cms.catholicdigitalcommons.org*.conf
    and /var/www/vhosts/system/cms.catholicdigitalcommons.org/conf/.
  2. Check that the WordPress permalink rewrite rules are present in
    the additional nginx directives slot in the Plesk panel.
  3. If using Plesk's "static files served by nginx" optimization,
    verify that it doesn't intercept /wp-json/*.
  4. Identify the cron driver hitting /wp-json/cdcf/v1/process-queue
    (Plesk scheduler? Cloudflare Workers? UptimeRobot?) and confirm
    where it's configured.
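
Once the config files from step 1 are in hand, a quick first pass for step 2 could be to list the fallback of every try_files directive. The sketch below works on the config text only; the function names are illustrative, and it does not determine which location block actually handles /wp-json/*, so it is a screening aid rather than a verdict.

```python
import re

# Matches an uncommented try_files directive and captures its arguments.
TRY_FILES = re.compile(r"^\s*try_files\s+([^;]+);", re.MULTILINE)


def try_files_fallbacks(config_text):
    """Return the final (fallback) argument of every try_files directive."""
    return [m.group(1).split()[-1] for m in TRY_FILES.finditer(config_text)]


def has_php_fallback(config_text):
    """True if at least one try_files directive falls back to index.php."""
    return any(
        fallback.startswith("/index.php")
        for fallback in try_files_fallbacks(config_text)
    )
```

If `has_php_fallback` returns False for the vhost config, or the only fallback found is `=404`, the permalink rules are probably missing from the config that nginx is actually running.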

Severity

Low for now — clients retry, recoveries complete, no end-user impact.
But it's noise in the logs and an availability landmine if/when the
404 rate spikes further.
