🐛 Fixed stored XSS and mangled structured data in JSON-LD output #29013
Walkthrough
This change moves JSON-LD escaping from field-level construction in schema.js to a single escaping point in ghost_head.js. A new escapeJsonLd function escapes angle brackets and line/paragraph separators in the serialized schema string before it is embedded in the inline application/ld+json script tag, preventing script/comment breakout. schema.js no longer applies escapeExpression to individual fields (names, descriptions, sameAs URLs, headline); it passes raw values, which are escaped once at render time. Corresponding unit tests in schema.test.js are updated to expect raw unescaped values, and a new test file validates escapeJsonLd's breakout-prevention and round-trip JSON parsing behavior.
Sequence Diagram(s)
sequenceDiagram
participant SchemaJS as schema.js getSchema
participant GhostHead as ghost_head helper
participant EscapeJsonLd as escapeJsonLd
participant Script as inline script tag
SchemaJS->>GhostHead: raw schema object (unescaped fields)
GhostHead->>GhostHead: JSON.stringify(schema, null, 4 spaces)
GhostHead->>EscapeJsonLd: serialized JSON string
EscapeJsonLd->>EscapeJsonLd: replace <, >, \u2028, \u2029
EscapeJsonLd-->>GhostHead: escaped JSON string
GhostHead->>Script: inject escaped JSON string
| Command | Status | Duration |
|---|---|---|
| `nx run ghost:test:ci:integration` | ✅ Succeeded | 2m 34s |
| `nx run ghost:test:integration` | ✅ Succeeded | 2m 49s |
| `nx run ghost:test:legacy` | ✅ Succeeded | 2m 58s |
| `nx run ghost:test:e2e` | ✅ Succeeded | 2m 29s |
| `nx run-many --target=build --projects=tag:publi...` | ✅ Succeeded | <1s |
| `nx run-many -t lint -p ghost` | ✅ Succeeded | 35s |
| `nx run-many -t test:unit -p ghost` | ✅ Succeeded | 29s |
| `nx run @tryghost/admin:build` | ✅ Succeeded | 9s |
| Additional runs (2) | ✅ Succeeded | ... |
☁️ Nx Cloud last updated this comment at 2026-07-01 17:50:34 UTC
This is a regression - see snapshots. Needs reworking.
ref #28957

- tag names, post keywords, and the site title were serialized raw into the inline `<script type="application/ld+json">` block, letting an Editor-controlled value like `foo</script><script>...` break out of the script element and run arbitrary JS for anonymous visitors on tag and post pages
- escapes the breakout-relevant characters (`<`, `>`, U+2028, U+2029) as JSON `\u` escapes at the single serialization boundary in ghost_head, so every field is covered at once instead of relying on per-field escaping that is easy to forget
- removed the per-field escapeExpression calls from schema.js: HTML-entity escaping is the wrong layer here, because JSON-LD consumers (Google et al.) parse the block as JSON and never HTML-decode, so it silently corrupted structured data (e.g. `Tom & Jerry` was indexed as `Tom &amp; Jerry`). JSON `\u` escapes are both safe and lossless, so legitimate `& ' "` now round-trip correctly
- added a regression test proving breakout is neutralised while data round-trips, and updated the snapshot/assertions that were capturing the old corruption
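To make the "wrong layer" point concrete, here is a minimal sketch of what a JSON consumer sees when a field is entity-encoded before serialisation. `htmlEntityEscape` is a hypothetical, simplified stand-in written for this illustration; it is not Ghost's or Handlebars' `escapeExpression`, and the object shape is assumed.

```javascript
// Hypothetical stand-in for an HTML-entity escaper (covers only four characters).
function htmlEntityEscape(value) {
    return value
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

const tagName = 'Tom & Jerry';

// Entity-encoding the field first bakes the entity into the data itself.
const perField = JSON.stringify({keywords: htmlEntityEscape(tagName)});

// Serialising the raw value leaves the data intact for JSON parsers.
const raw = JSON.stringify({keywords: tagName});

// A JSON-LD consumer parses JSON and does not HTML-decode afterwards.
console.log(JSON.parse(perField).keywords); // Tom &amp; Jerry
console.log(JSON.parse(raw).keywords); // Tom & Jerry
```

The raw variant is only safe to embed once breakout characters are handled on the serialised string, which is the subject of this change.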
de1fcb7 to c1d2b62

Summary
Ghost builds a schema.org object in `core/frontend/meta/schema.js` and injects it, JSON-serialised, into an inline `<script type="application/ld+json">` block via `ghost_head`. Escaping was done per-field with `escapeExpression` (HTML-entity encoding), which was wrong for this context in two ways:

- Stored XSS: three fields (tag names, post keywords and the site title) were serialised raw, so an Editor-controlled value such as `foo</script><script>alert(document.domain)</script>` broke out of the inline script and executed for any anonymous visitor on tag/post/home pages.
- Mangled structured data: `escapeExpression` HTML-entity-encodes the JSON value, so legitimate content was mangled for the search engines that consume JSON-LD: a tag `Tom & Jerry` was serialised as `Tom &amp; Jerry`, and `"quoted"` titles/URLs/`sameAs` links came out as `&quot;`/`&#x3D;` etc. This has been the behaviour on `main` for every already-"escaped" field (headline, description, author name, social URLs), not just the three missed ones.

Fix
Escape once, at the render sink, on the serialised JSON string, mirroring Ghost's existing `json` helper:

- `<`, `>`, U+2028 and U+2029 are emitted as JSON `\u` escapes, which closes the `</script>` breakout for every field, current and future. No more field-by-field allow-listing that the next new field can silently miss.
- JSON `\u` escapes are lossless, so legitimate content survives (`Tom & Jerry` round-trips to `Tom & Jerry`).
- The per-field `escapeExpression` calls are removed from `schema.js`, so names, URLs and keywords are no longer double/entity-encoded.

This supersedes #28957, which fixed the XSS at the field level but left (and slightly extended) the data-corruption problem; see the snapshot change there flipping a benign `Tom & Jerry` to `Tom &amp; Jerry`.

Testing
- New `ghost_head` regression test proving a `</script>` payload in a tag name produces no literal `</script>` in the JSON-LD block, yet still parses back to the original tag name.
- New test that a `Tom & Jerry` keywords value round-trips through the rendered JSON-LD, guarding against re-introducing entity encoding.
- Updated `schema.test.js` (now asserts raw values; escaping is the sink's job) and the `ghost-head` snapshot (`&quot;` → `\"`, `Tom &amp; Jerry` → `Tom & Jerry`).
- `pnpm test:single` for both files: 117 passing; lint clean.
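For readers who want the shape of the sink-level escaping without opening the diff, the following is a rough sketch under stated assumptions. The function name `escapeJsonLd` and the four escaped characters come from this PR; the body, the sample `schema` object and the checks are illustrative guesses, not the code that was merged.

```javascript
// Illustrative sketch only: the real escapeJsonLd in ghost_head may differ.
// Operates on the already-serialised JSON string, not on individual fields.
function escapeJsonLd(json) {
    return json
        .replace(/</g, '\\u003c')
        .replace(/>/g, '\\u003e')
        .replace(/\u2028/g, '\\u2028')
        .replace(/\u2029/g, '\\u2029');
}

// Hypothetical schema object carrying a breakout payload and a benign ampersand.
const schema = {
    '@context': 'https://schema.org',
    name: 'foo</script><script>alert(document.domain)</script>',
    keywords: 'Tom & Jerry'
};

// Serialise first, then escape once at the boundary.
const rendered = escapeJsonLd(JSON.stringify(schema, null, 4));
const parsed = JSON.parse(rendered);

console.log(rendered.includes('</script>')); // false
console.log(parsed.name === schema.name); // true
console.log(parsed.keywords); // Tom & Jerry
```

Because `<` and `>` can only occur inside string values in JSON output, replacing them with `\u` escapes keeps the document valid JSON, and any conforming parser decodes them back to the original characters.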