Optimization of REXML::XPathParser#step by naitoh · Pull Request #310 · ruby/rexml

naitoh · 2026-05-05T06:39:53Z

Benchmark

$ benchmark-driver benchmark/xpath.yaml
                                                           before       after  before(YJIT)  after(YJIT)
  REXML::XPath.match(REXML::Document.new(xml), 'a//a')     7.182k      7.026k        4.507k       4.436k i/s -     100.000 times in 0.013924s 0.014233s 0.022190s 0.022544s
REXML::XPath.match(REXML::Document.new(xml), '//a//a')    567.537      2.663k       947.724       2.884k i/s -     100.000 times in 0.176200s 0.037556s 0.105516s 0.034670s

Comparison:
               REXML::XPath.match(REXML::Document.new(xml), 'a//a')
                                                before:      7181.8 i/s
                                                 after:      7025.9 i/s - 1.02x  slower
                                          before(YJIT):      4506.5 i/s - 1.59x  slower
                                           after(YJIT):      4435.8 i/s - 1.62x  slower

             REXML::XPath.match(REXML::Document.new(xml), '//a//a')
                                           after(YJIT):      2884.3 i/s
                                                 after:      2662.7 i/s - 1.08x  slower
                                          before(YJIT):       947.7 i/s - 3.04x  slower
                                                before:       567.5 i/s - 5.08x  slower

YJIT=ON : 0.98x - 3.04x faster
YJIT=OFF : 0.97x - 4.69x faster

tompng · 2026-05-05T12:14:35Z

+      new_nodes = {}
      nodeset.each do |node|
-        new_nodeset = []
-        new_nodes = {}


With this change, the result of this xpath has changed.

REXML::XPath.match(REXML::Document.new('<a id="a1"><a id="a2"></a></a>'), '//a/descendant::b[1]') # => [ ... </>, ... </>] Expected (in master branch) # => [ ... </>] Actual (this pull request)

Reference:

document.body.innerHTML='<a id="a1"><a id="a2"></a></a>' snapshot = document.evaluate('//a/descendant::b[1]', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null); [...new Array(snapshot.snapshotLength)].map((_, i) => snapshot.snapshotItem(i)) // => [b#b1, b#b2]

I'm sorry.
Since this change has unintended consequences, I will cancel the change to #descendant.
Thanks!

tompng · 2026-05-05T12:31:59Z

+              next if seen.key?(raw_node.object_id)
+              seen[raw_node.object_id] = true


object_id will lazily assign a new id when it is called. We can avoid this by using seen = {}.compare_by_identity instead.

Thanks for pointing that out.
I’ll fix it.

Using compare_by_identity, Hash will compare by its identity, (same as comparing by object_id, but without assigning actual object_id), so seen.key?(raw_node) and seen[raw_node] = true is enough.

These two code gives the same result

h = {} h[key.object_id] = value h[key.object_id] #=> value h[key.dup.object_id] #=> nil

h = {}.compare_by_identity h[key] = value h[key] #=> value h[key.dup] #=> nil

## Benchmark ``` $ benchmark-driver benchmark/xpath.yaml before after before(YJIT) after(YJIT) REXML::XPath.match(REXML::Document.new(xml), 'a//a') 7.182k 7.026k 4.507k 4.436k i/s - 100.000 times in 0.013924s 0.014233s 0.022190s 0.022544s REXML::XPath.match(REXML::Document.new(xml), '//a//a') 567.537 2.663k 947.724 2.884k i/s - 100.000 times in 0.176200s 0.037556s 0.105516s 0.034670s Comparison: REXML::XPath.match(REXML::Document.new(xml), 'a//a') before: 7181.8 i/s after: 7025.9 i/s - 1.02x slower before(YJIT): 4506.5 i/s - 1.59x slower after(YJIT): 4435.8 i/s - 1.62x slower REXML::XPath.match(REXML::Document.new(xml), '//a//a') after(YJIT): 2884.3 i/s after: 2662.7 i/s - 1.08x slower before(YJIT): 947.7 i/s - 3.04x slower before: 567.5 i/s - 5.08x slower ``` - YJIT=ON : 0.98x - 3.04x faster - YJIT=OFF : 0.97x - 4.69x faster

naitoh requested a review from kou May 5, 2026 06:47

tompng reviewed May 5, 2026

View reviewed changes

naitoh added 2 commits May 6, 2026 01:19

Add test

2d3346f

naitoh force-pushed the fix_XPathParser_step_descendant_performance_xpath branch from 5b3e01a to 2d3346f Compare May 5, 2026 16:20

naitoh changed the title ~~Optimization of REXML::XPathParser#step,#descendant~~ Optimization of REXML::XPathParser#step May 5, 2026

naitoh requested a review from tompng May 5, 2026 16:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimization of REXML::XPathParser#step#310

Optimization of REXML::XPathParser#step#310
naitoh wants to merge 2 commits intoruby:masterfrom
naitoh:fix_XPathParser_step_descendant_performance_xpath

naitoh commented May 5, 2026 •

edited

Loading

Uh oh!

tompng May 5, 2026

Uh oh!

naitoh May 5, 2026

Uh oh!

tompng May 5, 2026

Uh oh!

naitoh May 5, 2026

Uh oh!

tompng May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		next if seen.key?(raw_node.object_id)
		seen[raw_node.object_id] = true

Conversation

naitoh commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark

Uh oh!

tompng May 5, 2026

Choose a reason for hiding this comment

Uh oh!

naitoh May 5, 2026

Choose a reason for hiding this comment

Uh oh!

tompng May 5, 2026

Choose a reason for hiding this comment

Uh oh!

naitoh May 5, 2026

Choose a reason for hiding this comment

Uh oh!

tompng May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

naitoh commented May 5, 2026 •

edited

Loading