Dom\HTMLDocument misparses </noscript> in <head> and nests following nodes into <noscript> #21379

@NoNoNo

Description

Summary

When parsing HTML with Dom\HTMLDocument::createFromString(), a </noscript> end tag inside <head> is not handled correctly in the HTML5 parser path.

As a result, subsequent head elements (for example <link>) are incorrectly inserted as children of <noscript>.

This is a behavior bug in the Lexbor HTML tree-construction path used by Dom\HTMLDocument, not in legacy DOMDocument::loadHTML().

Affected component

  • PHP ext/dom modern HTML5 parser (Dom\HTMLDocument)
  • Vendored Lexbor tree insertion mode implementation:
    • ext/lexbor/lexbor/html/tree/insertion_mode/in_head_noscript.c

Environment

  • PHP: 8.5.1 (also confirmed against the 8.5.3 source tree and on 3v4l with 8.5.3)
  • libxml runtime: 2.9.13
  • API used: Dom\HTMLDocument::createFromString()

Reproducer

<?php

$html = '<!DOCTYPE html><html><head>
<noscript>
    <style>body { margin: 0; }</style>
</noscript>
<link href="/style.css" rel="stylesheet">
</head><body></body></html>';

$doc = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
echo $doc->saveHTML(), PHP_EOL;

$link = $doc->getElementsByTagName('link')->item(0);
echo "Link parent: ", $link->parentNode->nodeName, PHP_EOL;

https://3v4l.org/TmBjH#v8.5.3

Actual result

  • The serialized tree places </noscript> after <link>, so <link> has become a child of <noscript>.
  • $link->parentNode->nodeName is NOSCRIPT.

Example output:

<!DOCTYPE html><html><head>
<noscript>
    <style>body { margin: 0; }</style>

<link href="/style.css" rel="stylesheet">
</noscript></head><body></body></html>
Link parent: NOSCRIPT

Expected result

  • </noscript> should close the <noscript> element.
  • <link> should be a direct child of <head>.
  • $link->parentNode->nodeName should be HEAD.

Control comparison

Using the legacy parser path (with $html taken from the reproducer):

$d = new DOMDocument();
@$d->loadHTML($html, LIBXML_NOERROR);
echo $d->getElementsByTagName('link')->item(0)->parentNode->nodeName;

The result is head, as expected (the legacy API reports lowercase node names), confirming that the issue is specific to the modern HTML5 parser path.

Root cause analysis

The closing-tag handler for in-head-noscript insertion mode does not implement handling for </noscript>.

Current code:

  • ext/lexbor/lexbor/html/tree/insertion_mode/in_head_noscript.c:95
  • lxb_html_tree_insertion_mode_in_head_noscript_closed(...)

Behavior:

  1. If the closing tag is </br>, it routes to anything_else.
  2. Otherwise it emits a parse error (LXB_HTML_RULES_ERROR_UNTO) and returns true, so the token is dropped.
  3. It never handles LXB_TAG_NOSCRIPT, never pops <noscript>, and never restores tree->mode = in_head.

Because the open-elements stack still has <noscript> as the current node, the next <link> token (delegated to in_head) is inserted under <noscript>.

Suggested fix direction

In lxb_html_tree_insertion_mode_in_head_noscript_closed(...), add explicit handling for LXB_TAG_NOSCRIPT:

  1. Verify current node is noscript (or report parse error if not in expected state).
  2. Pop current node from open-elements stack.
  3. Set tree->mode = lxb_html_tree_insertion_mode_in_head.
  4. Return true.

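This matches the HTML Standard's tree-construction rules for the "in head noscript" insertion mode: an end tag named noscript pops the current node (which will be a noscript element) off the stack of open elements and switches the insertion mode to "in head".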
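
To make the effect of the missing branch concrete, here is a toy model in plain PHP. It is not Lexbor code: the token format, the build_head() function, and the $fixed flag are all assumptions made for illustration, and the model only tracks the stack of open elements and the insertion mode for a handful of head-level tokens.

```php
<?php
// Toy model of the "in head" / "in head noscript" insertion modes.
// Everything here (names, token shape, the $fixed switch) is hypothetical
// and exists only to illustrate the bug and the suggested fix.

/**
 * @param array $tokens list of [type, tag] pairs; type is 'start' or 'end'
 * @param bool  $fixed  whether </noscript> is handled in in_head_noscript
 * @return array map of inserted tag name => parent tag name
 */
function build_head(array $tokens, bool $fixed): array
{
    $stack = ['html', 'head'];   // stack of open elements
    $mode = 'in_head';           // current insertion mode
    $parents = [];

    foreach ($tokens as [$type, $tag]) {
        $current = $stack[count($stack) - 1];

        if ($type === 'start') {
            // New elements are inserted under the current node.
            $parents[$tag] = $current;
            if ($tag === 'noscript' && $mode === 'in_head') {
                $stack[] = 'noscript';
                $mode = 'in_head_noscript';
            }
            // Other start tags are treated as void here (inserted, not left open).
            continue;
        }

        if ($mode === 'in_head_noscript' && $tag === 'noscript') {
            if ($fixed && $current === 'noscript') {
                array_pop($stack);   // step 2: pop <noscript>
                $mode = 'in_head';   // step 3: back to "in head"
            }
            // Without the fix the token is dropped: stack and mode stay as they are.
        }
    }

    return $parents;
}

$tokens = [['start', 'noscript'], ['end', 'noscript'], ['start', 'link']];

echo build_head($tokens, false)['link'], PHP_EOL; // prints: noscript
echo build_head($tokens, true)['link'], PHP_EOL;  // prints: head
```

With the end tag ignored, noscript stays on top of the stack and becomes the parent of link, which is the nesting seen in the reproducer output.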

Suggested regression test

Add a DOM test that parses:

<!doctype html><html><head><noscript></noscript><link rel="stylesheet" href="/x.css"></head><body></body></html>

And asserts:

  • getElementsByTagName("link")[0]->parentNode->nodeName === "HEAD"
  • serialization does not place <link> inside <noscript>.
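
A minimal sketch of such a check as a standalone script, assuming PHP 8.4+ with ext/dom. The helper name noscript_head_failures() is hypothetical and not part of php-src; an actual regression test would presumably be written in the project's own test format. The second check inspects the tree rather than the serialized string, which covers the same condition.

```php
<?php
// Hypothetical helper: returns a list of failure messages, empty if the
// parsed tree has <link> directly under <head>.
function noscript_head_failures(string $html): array
{
    $doc = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);

    $link = $doc->getElementsByTagName('link')->item(0);
    if ($link === null) {
        return ['no <link> element found'];
    }

    $failures = [];

    // Assertion 1: the parent of <link> is <head>.
    $parentName = $link->parentNode->nodeName;
    if ($parentName !== 'HEAD') {
        $failures[] = "link parent is $parentName, expected HEAD";
    }

    // Assertion 2: <noscript> does not contain <link>.
    $noscript = $doc->getElementsByTagName('noscript')->item(0);
    if ($noscript !== null && $noscript->getElementsByTagName('link')->length > 0) {
        $failures[] = '<link> is nested inside <noscript>';
    }

    return $failures;
}

$html = '<!doctype html><html><head><noscript></noscript>'
      . '<link rel="stylesheet" href="/x.css"></head><body></body></html>';

$failures = noscript_head_failures($html);
echo ($failures === []) ? 'PASS' : 'FAIL: ' . implode('; ', $failures), PHP_EOL;
```

On an affected build this should report both failures; once </noscript> is handled it should print PASS.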

Notes

  • This issue is independent from libxml2 legacy HTML parser behavior.
  • It appears in the Lexbor-based parser path used by Dom\HTMLDocument.

PHP Version

PHP 8.5.1 (cli) (built: Dec 16 2025 15:59:07) (NTS)
Copyright (c) The PHP Group
Built by Homebrew
Zend Engine v4.5.1, Copyright (c) Zend Technologies
    with Zend OPcache v8.5.1, Copyright (c), by Zend Technologies

Also reproducible in 8.5.3; see https://3v4l.org/TmBjH#v8.5.3

Operating System

No response
