From 40e85ac77acda3d96722c8ed96abe3ee602ec9a3 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin
Date: Wed, 20 May 2026 11:39:00 +0000
Subject: [PATCH] script: also prune stale `data.json` language entries when
 removing orphans

The orphan-translation cleanup added in 916beb32 (docs: remove orphaned
translated manual pages (#2165)) deletes output HTML files whose
upstream source has gone away, but leaves the matching entries in
`external/docs/data/docs.json` untouched. That data file is consulted by
`layouts/partials/ref/languages.html`, which iterates
`data.pages..languages` on every English manual page to build the
per-page language picker. A stale entry there therefore generates a dead
`/docs//` link from every variant of every English page, including each
versioned slice such as `/docs//.html`.

That is exactly how the recent batch of orphaned translated pages turned
into 2,611 broken-link reports from `lychee`: the manual file deletion
in the previous commit removed the HTML, but data.json still listed the
languages so the picker kept emitting links into thin air.

A symmetric prune from `data.json` was applied by hand in that PR; this
commit teaches `script/update-docs.rb` to do the same automatically
going forward, so future upstream removals do not re-introduce the same
class of dead links.

The new loop reuses the existing `seen_translations` set populated by
the L10N indexing pass and the same `unless seen_translations.empty?`
safety gate, so it is skipped on "everything is up to date"
short-circuit runs that never touched any source. Within `data.pages.`,
only the `languages.` sub-entries are removed; the page entry itself and
any empty `languages: {}` hash are left in place because the same
docname may also be populated by the English-only indexing path in
`index_doc`.
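
For illustration only, not part of the patch: a minimal standalone
sketch of the prune described above. The page names, the placeholder
language values and the `removed` list are hypothetical; only
`seen_translations`, the `pages`/`languages` layout and the safety gate
are taken from the change itself.

```ruby
require "set"

# Hypothetical stand-in for the parsed data file; docnames, languages and
# values are made up for this sketch.
data = {
  "pages" => {
    "git-clone" => { "languages" => { "fr" => true, "xx_FAKE" => true } },
    "git-gone"  => { "languages" => { "de" => true } },
    "git-plain" => {}, # English-only page without a "languages" hash
  },
}

# [docname, lang] pairs whose upstream source was seen while indexing.
seen_translations = Set[["git-clone", "fr"]]

removed = [] # hypothetical bookkeeping, only used to inspect the result

# Same safety gate as the file-deletion loop: an empty set means no source
# was iterated, so nothing may be pruned.
unless seen_translations.empty?
  data["pages"].each do |docname, page_data|
    next unless page_data.is_a?(Hash)
    langs = page_data["languages"]
    next unless langs.is_a?(Hash)
    # `keys` returns a copy, so deleting from `langs` inside the loop is safe.
    langs.keys.reject { |lang| seen_translations.include?([docname, lang]) }.each do |lang|
      removed << [docname, lang]
      langs.delete(lang)
    end
  end
end
```

In this sketch `git-gone` ends up with an empty `languages` hash rather
than being dropped, which mirrors the decision to leave page entries in
place.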

Verified locally by seeding both a fake orphan output file under
`external/docs/content/docs/git-thisdocdoesnotexist/de.html` and two
stale `data.json` entries (one on a previously non-existent docname, one
as an extra `xx_FAKE` language on the real `git-clone` page), then
running `RERUN=true bundle exec ruby script/update-docs.rb l10n`. The
run logged

    Removing orphan translation .../git-thisdocdoesnotexist/de.html
    Removing orphan language entry data.pages.git-clone.languages.xx_FAKE
    Removing orphan language entry data.pages.git-thisdocdoesnotexist.languages.de

and the resulting `data.json` had the fake `xx_FAKE` and `de` keys gone,
while all 768 pre-existing redirect-stub files and the real translations
of `git-clone` (es, fr, ru, sv, uk, zh_HANS-CN) were left untouched.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin
---
 script/update-docs.rb | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/script/update-docs.rb b/script/update-docs.rb
index 8d401b7703..5fa1b0f9b9 100644
--- a/script/update-docs.rb
+++ b/script/update-docs.rb
@@ -385,10 +385,11 @@ def index_l10n_doc(filter_tags, doc_list, get_content)
     end
   end
 
-  # Clean up orphaned translated outputs whose upstream source has gone away.
-  # Skipped entirely when no source was iterated (e.g. when the early `next if
-  # !rerun && l10n["committed"] >= ts` short-circuits the whole tag loop) so
-  # that an "everything is up to date" run never deletes anything.
+  # Clean up orphan translated outputs and the matching `data.json` entries
+  # whose upstream source has gone away. Skipped entirely when no source was
+  # iterated (e.g. when the early `next if !rerun && l10n["committed"] >= ts`
+  # short-circuits the whole tag loop) so that an "everything is up to date"
+  # run never deletes anything.
   unless seen_translations.empty?
     Dir.glob("#{SITE_ROOT}external/docs/content/docs/*/*.html").each do |output_path|
       m = output_path.match(%r{/external/docs/content/docs/([^/]+)/([^/]+)\.html\z})
@@ -407,6 +408,22 @@ def index_l10n_doc(filter_tags, doc_list, get_content)
       puts "Removing orphan translation #{output_path}"
       File.delete(output_path)
     end
+
+    # `layouts/partials/ref/languages.html` iterates
+    # `data.pages..languages` on every English manual page to build
+    # the per-page language picker, so a stale entry here generates a dead
+    # link from every variant of every English page (including each
+    # versioned `/docs//.html` slice). Prune symmetrically
+    # with the file-deletion loop above.
+    data["pages"].each do |docname, page_data|
+      next unless page_data.is_a?(Hash)
+      langs = page_data["languages"]
+      next unless langs.is_a?(Hash)
+      langs.keys.reject { |lang| seen_translations.include?([docname, lang]) }.each do |lang|
+        puts "Removing orphan language entry data.pages.#{docname}.languages.#{lang}"
+        langs.delete(lang)
+      end
+    end
   end
 
   File.open(DATA_FILE, "w") do |out|