Skip to content

🐛 Fixed stored XSS and mangled structured data in JSON-LD output#29013

Merged
9larsons merged 1 commit into
mainfrom
slars/jsonld-sink-escaping
Jul 1, 2026
Merged

🐛 Fixed stored XSS and mangled structured data in JSON-LD output#29013
9larsons merged 1 commit into
mainfrom
slars/jsonld-sink-escaping

Conversation

@9larsons

@9larsons 9larsons commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Ghost builds a schema.org object in core/frontend/meta/schema.js and injects it, JSON-serialised, into an inline <script type="application/ld+json"> block via ghost_head. Escaping was done per-field with escapeExpression (HTML-entity encoding), which was wrong for this context in two ways:

  1. Stored XSS (the bug 🐛 Fixed stored XSS in structured data via unescaped tag names #28957 targets). Three Editor-controlled fields were missed entirely — tag name, post keywords, and site title. A tag named foo</script><script>alert(document.domain)</script> broke out of the inline script and executed for any anonymous visitor on tag/post/home pages.
  2. Corrupted structured data. escapeExpression HTML-entity-encodes the JSON value, so legitimate content was mangled for the search engines that consume JSON-LD: a tag Tom & Jerry was serialised as Tom &amp; Jerry, and "quoted" titles/URLs/sameAs links came out as &quot; / &#x3D; etc. This has been the behaviour on main for every already-"escaped" field (headline, description, author name, social URLs), not just the three missed ones.

Fix

Escape once, at the render sink, on the serialised JSON string — mirroring Ghost's existing json helper:

JSON.stringify(meta.schema, null, '    ')
    .replace(/</g, '\\u003C')
    .replace(/>/g, '\\u003E')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  • Neutralises the </script> breakout for every field, current and future — no more field-by-field allow-listing that the next new field can silently miss.
  • Uses JS-string unicode escapes, which JSON parsers decode back to the original characters, so structured data is correct again (Tom & Jerry round-trips to Tom & Jerry).
  • The now-redundant per-field escapeExpression calls are removed from schema.js, so names, URLs and keywords are no longer double/entity-encoded.

This supersedes #28957, which fixed the XSS at the field level but left (and slightly extended) the data-corruption problem — see the snapshot change there flipping a benign Tom & Jerry to Tom &amp; Jerry.

Testing

  • Added a ghost_head regression test proving a </script> payload in a tag name produces no literal </script> in the JSON-LD block, yet still parses back to the original tag name.
  • Added an assertion that the benign Tom & Jerry keywords value round-trips through the rendered JSON-LD, guarding against re-introducing entity encoding.
  • Updated schema.test.js (now asserts raw values — escaping is the sink's job) and the ghost-head snapshot (&quot;\", Tom & JerryTom & Jerry).
  • pnpm test:single for both files: 117 passing; lint clean.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 371a2d00-930a-42e3-9416-8eb92fed44e3

📥 Commits

Reviewing files that changed from the base of the PR and between de1fcb7 and c1d2b62.

⛔ Files ignored due to path filters (1)
  • ghost/core/test/unit/frontend/helpers/__snapshots__/ghost-head.test.js.snap is excluded by !**/*.snap
📒 Files selected for processing (4)
  • ghost/core/core/frontend/helpers/ghost_head.js
  • ghost/core/core/frontend/meta/schema.js
  • ghost/core/test/unit/frontend/helpers/ghost-head-jsonld-escaping.test.js
  • ghost/core/test/unit/frontend/meta/schema.test.js

Walkthrough

This change moves JSON-LD escaping from field-level construction in schema.js to a single escaping point in ghost_head.js. A new escapeJsonLd function escapes angle brackets and line/paragraph separators in the serialized schema string before it is embedded in the inline application/ld+json script tag, preventing script/comment breakout. schema.js no longer applies escapeExpression to individual fields (names, descriptions, sameAs URLs, headline), relying on raw values that are HTML-escaped once at render time. Corresponding unit tests in schema.test.js are updated to expect raw unescaped values, and a new test file validates escapeJsonLd's breakout-prevention and round-trip JSON parsing behavior.

Changes

Cohort / File(s) Summary
ghost_head.js Added escapeJsonLd helper, applied to serialized schema before script injection, exported for testing
schema.js Removed escapeExpression usage; fields now pass through raw values with explanatory comment
schema.test.js Updated sameAs and related test expectations to raw unescaped values; renamed tests
ghost-head-jsonld-escaping.test.js (new) Added unit tests for escapeJsonLd covering breakout prevention and round-trip parsing

Sequence Diagram(s)

sequenceDiagram
  participant SchemaJS as schema.js getSchema
  participant GhostHead as ghost_head helper
  participant EscapeJsonLd as escapeJsonLd
  participant Script as inline script tag

  SchemaJS->>GhostHead: raw schema object (unescaped fields)
  GhostHead->>GhostHead: JSON.stringify(schema, null, 4 spaces)
  GhostHead->>EscapeJsonLd: serialized JSON string
  EscapeJsonLd->>EscapeJsonLd: replace <, >, \u2028, \u2029
  EscapeJsonLd-->>GhostHead: escaped JSON string
  GhostHead->>Script: inject escaped JSON string
Loading

Related issues: None specified.

Related PRs: None specified.

Suggested labels: security, frontend, needs review

Suggested reviewers: None specified.

Poem:
A rabbit hopped through JSON's maze,
Found angle brackets in a script-tag haze,
"Escape me once, at the very end,"
It said, "not field by field, my friend!"
Now schema.js runs raw and free,
While ghost_head guards the boundary. 🐰✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing stored XSS and JSON-LD data corruption.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description accurately matches the code and test changes around sink-level JSON-LD escaping and removal of per-field escaping.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch slars/jsonld-sink-escaping

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@nx-cloud

nx-cloud Bot commented Jul 1, 2026

Copy link
Copy Markdown

🤖 Nx Cloud AI Fix

Ensure the fix-ci command is configured to always run in your CI pipeline to get automatic fixes in future runs. For more information, please see https://nx.dev/ci/features/self-healing-ci


View your CI Pipeline Execution ↗ for commit c1d2b62

Command Status Duration Result
nx run ghost:test:ci:integration ✅ Succeeded 2m 34s View ↗
nx run ghost:test:integration ✅ Succeeded 2m 49s View ↗
nx run ghost:test:legacy ✅ Succeeded 2m 58s View ↗
nx run ghost:test:e2e ✅ Succeeded 2m 29s View ↗
nx run-many --target=build --projects=tag:publi... ✅ Succeeded <1s View ↗
nx run-many -t lint -p ghost ✅ Succeeded 35s View ↗
nx run-many -t test:unit -p ghost ✅ Succeeded 29s View ↗
nx run @tryghost/admin:build ✅ Succeeded 9s View ↗
Additional runs (2) ✅ Succeeded ... View ↗

💡 Verify your cache is correct by running tasks in a sandbox. Read docs ↗


☁️ Nx Cloud last updated this comment at 2026-07-01 17:50:34 UTC

@9larsons

9larsons commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

This is a regression - see snapshots. Needs reworking.

ref #28957

- tag names, post keywords, and the site title were serialized raw into the
  inline <script type="application/ld+json"> block, letting an Editor-controlled
  value like `foo</script><script>...` break out of the script element and run
  arbitrary JS for anonymous visitors on tag and post pages
- escapes the breakout-relevant characters (< > U+2028 U+2029) as JSON \u escapes
  at the single serialization boundary in ghost_head, so every field is covered
  at once instead of relying on per-field escaping that is easy to forget
- removed the per-field escapeExpression calls from schema.js: HTML-entity
  escaping is the wrong layer here — JSON-LD consumers (Google et al.) parse the
  block as JSON and never HTML-decode, so it silently corrupted structured data
  (e.g. `Tom & Jerry` was indexed as `Tom &amp; Jerry`). JSON \u escapes are both
  safe and lossless, so legitimate `& ' "` now round-trip correctly
- added a regression test proving breakout is neutralised while data round-trips,
  and updated the snapshot/assertions that were capturing the old corruption
@9larsons 9larsons force-pushed the slars/jsonld-sink-escaping branch from de1fcb7 to c1d2b62 Compare July 1, 2026 17:39
@9larsons 9larsons marked this pull request as ready for review July 1, 2026 18:09
@9larsons 9larsons merged commit 895d95c into main Jul 1, 2026
42 of 43 checks passed
@9larsons 9larsons deleted the slars/jsonld-sink-escaping branch July 1, 2026 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant