Skip to content

Optimization of REXML::XPathParser#step#310

Open
naitoh wants to merge 2 commits intoruby:masterfrom
naitoh:fix_XPathParser_step_descendant_performance_xpath
Open

Optimization of REXML::XPathParser#step#310
naitoh wants to merge 2 commits intoruby:masterfrom
naitoh:fix_XPathParser_step_descendant_performance_xpath

Conversation

@naitoh
Copy link
Copy Markdown
Contributor

@naitoh naitoh commented May 5, 2026

Benchmark

$ benchmark-driver benchmark/xpath.yaml
                                                           before       after  before(YJIT)  after(YJIT)
  REXML::XPath.match(REXML::Document.new(xml), 'a//a')     7.182k      7.026k        4.507k       4.436k i/s -     100.000 times in 0.013924s 0.014233s 0.022190s 0.022544s
REXML::XPath.match(REXML::Document.new(xml), '//a//a')    567.537      2.663k       947.724       2.884k i/s -     100.000 times in 0.176200s 0.037556s 0.105516s 0.034670s

Comparison:
               REXML::XPath.match(REXML::Document.new(xml), 'a//a')
                                                before:      7181.8 i/s
                                                 after:      7025.9 i/s - 1.02x  slower
                                          before(YJIT):      4506.5 i/s - 1.59x  slower
                                           after(YJIT):      4435.8 i/s - 1.62x  slower

             REXML::XPath.match(REXML::Document.new(xml), '//a//a')
                                           after(YJIT):      2884.3 i/s
                                                 after:      2662.7 i/s - 1.08x  slower
                                          before(YJIT):       947.7 i/s - 3.04x  slower
                                                before:       567.5 i/s - 5.08x  slower
  • YJIT=ON : 0.98x - 3.04x faster
  • YJIT=OFF : 0.97x - 4.69x faster

@naitoh naitoh requested a review from kou May 5, 2026 06:47
Comment thread lib/rexml/xpath_parser.rb
new_nodes = {}
nodeset.each do |node|
new_nodeset = []
new_nodes = {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, the result of this xpath has changed.

REXML::XPath.match(REXML::Document.new('<a id="a1"><b id="b1"><a id="a2"><b id="b2"><b id="b3"></b></b></a></b></a>'), '//a/descendant::b[1]')
# => [<b id='b1'> ... </>, <b id='b2'> ... </>] Expected (in master branch)
# => [<b id='b1'> ... </>] Actual (this pull request)

Reference:

document.body.innerHTML='<a id="a1"><b id="b1"><a id="a2"><b id="b2"><b id="b3"></b></b></a></b></a>'
snapshot = document.evaluate('//a/descendant::b[1]', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
[...new Array(snapshot.snapshotLength)].map((_, i) => snapshot.snapshotItem(i))
// => [b#b1, b#b2]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry.
Since this change has unintended consequences, I will cancel the change to #descendant.
Thanks!

Comment thread lib/rexml/xpath_parser.rb
Comment on lines +470 to +471
next if seen.key?(raw_node.object_id)
seen[raw_node.object_id] = true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_id will lazily assign a new id when it is called. We can avoid this by using seen = {}.compare_by_identity instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out.
I’ll fix it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using compare_by_identity, Hash will compare by its identity, (same as comparing by object_id, but without assigning actual object_id), so seen.key?(raw_node) and seen[raw_node] = true is enough.

These two code gives the same result

h = {}
h[key.object_id] = value
h[key.object_id] #=> value
h[key.dup.object_id] #=> nil
h = {}.compare_by_identity
h[key] = value
h[key] #=> value
h[key.dup] #=> nil

naitoh added 2 commits May 6, 2026 01:19
## Benchmark
```
$ benchmark-driver benchmark/xpath.yaml
                                                           before       after  before(YJIT)  after(YJIT)
  REXML::XPath.match(REXML::Document.new(xml), 'a//a')     7.182k      7.026k        4.507k       4.436k i/s -     100.000 times in 0.013924s 0.014233s 0.022190s 0.022544s
REXML::XPath.match(REXML::Document.new(xml), '//a//a')    567.537      2.663k       947.724       2.884k i/s -     100.000 times in 0.176200s 0.037556s 0.105516s 0.034670s

Comparison:
               REXML::XPath.match(REXML::Document.new(xml), 'a//a')
                                                before:      7181.8 i/s
                                                 after:      7025.9 i/s - 1.02x  slower
                                          before(YJIT):      4506.5 i/s - 1.59x  slower
                                           after(YJIT):      4435.8 i/s - 1.62x  slower

             REXML::XPath.match(REXML::Document.new(xml), '//a//a')
                                           after(YJIT):      2884.3 i/s
                                                 after:      2662.7 i/s - 1.08x  slower
                                          before(YJIT):       947.7 i/s - 3.04x  slower
                                                before:       567.5 i/s - 5.08x  slower
```
- YJIT=ON : 0.98x - 3.04x faster
- YJIT=OFF : 0.97x - 4.69x faster
@naitoh naitoh force-pushed the fix_XPathParser_step_descendant_performance_xpath branch from 5b3e01a to 2d3346f Compare May 5, 2026 16:20
@naitoh naitoh changed the title Optimization of REXML::XPathParser#step,#descendant Optimization of REXML::XPathParser#step May 5, 2026
@naitoh naitoh requested a review from tompng May 5, 2026 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants