Add monomorphic inline cache for object property reads #1521

Closed: andreasrosdal wants to merge 2 commits into quickjs-ng:master from nordstjernen-web:claude/perf-inline-cache

@andreasrosdal
Contributor

@andreasrosdal andreasrosdal commented May 31, 2026

Speed up obj.prop reads by caching, per bytecode site, the resolution of own data-property accesses so repeated reads skip the find_own_property hash probe and prototype-chain walk.

Yet another Claude attempt

Mechanism

  • New opcodes OP_get_field_ic / OP_get_field2_ic (format atom_u16, size 7): the 4-byte atom stays at offset +1 (so existing bytecode atom relocation in (de)serialization works unchanged) plus an appended u16 IC index.
  • resolve_labels() rewrites every eligible OP_get_field / OP_get_field2 into its _ic variant and assigns a unique IC slot index per site (capped at u16; falls back to the plain opcode if exhausted). obj.length keeps its existing OP_get_length specialization.
  • JSFunctionBytecode gains an ic[] array (separate allocation) of { JSShape *shape; uint64_t serial; uint32_t prop_offset } slots, sized by the per-function IC count. Allocated in js_create_function from the compiler and rebuilt in the bytecode reader by scanning for IC opcodes (bc_function_alloc_ic). Freed in free_function_bytecode. BC_VERSION bumped 26 -> 27 because the opcode space is renumbered; stale serialized bytecode is now rejected rather than misparsed. The committed gen/*.c bootstrap bytecode and the test_bjson fuzz corpus version bytes were regenerated.
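
The 7-byte instruction layout and the per-site slot assignment described above can be sketched roughly as follows. This is a toy model, not the PR's code: the `toy_` names, the opcode value, and the exact exhaustion rule (whether index 65535 is usable) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode value and size; only the 1 + 4 + 2 = 7 byte
 * layout is taken from the description. */
enum { TOY_OP_GET_FIELD_IC = 0x01, TOY_IC_INSN_SIZE = 7 };

/* Layout: [0] opcode, [1..4] atom (u32), [5..6] IC slot index (u16).
 * The atom stays at offset +1, where the non-IC opcode keeps it. */
static void toy_emit_get_field_ic(uint8_t *pc, uint32_t atom, uint16_t ic_index)
{
    assert(pc != NULL);
    pc[0] = TOY_OP_GET_FIELD_IC;
    memcpy(pc + 1, &atom, sizeof atom);         /* unaligned-safe store */
    memcpy(pc + 5, &ic_index, sizeof ic_index); /* appended IC index */
}

static uint32_t toy_read_atom(const uint8_t *pc)
{
    uint32_t atom;
    memcpy(&atom, pc + 1, sizeof atom);
    return atom;
}

static uint16_t toy_read_ic_index(const uint8_t *pc)
{
    uint16_t idx;
    memcpy(&idx, pc + 5, sizeof idx);
    return idx;
}

/* Hypothetical per-function counter used while rewriting opcodes. */
typedef struct ToyFunctionDef {
    uint32_t ic_count;
} ToyFunctionDef;

/* Returns 1 and stores a fresh slot index, or 0 once the u16 index
 * space is used up, in which case the caller keeps the plain opcode. */
static int toy_alloc_ic_slot(ToyFunctionDef *fd, uint16_t *out_index)
{
    if (fd->ic_count > UINT16_MAX)
        return 0;
    *out_index = (uint16_t)fd->ic_count++;
    return 1;
}
```

Using `memcpy` for the operands avoids unaligned accesses, since the atom and the index sit at odd offsets inside the bytecode stream.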

Scope: READS ONLY. Writes (put_field) are intentionally not cached. The IC fast path triggers only for own data properties found directly on a non-exotic receiver ((prs->flags & JS_PROP_TMASK) == 0); inherited, accessor, auto-init, varref and exotic (array/typed-array/proxy) accesses fall through to the unchanged slow path with identical semantics.
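
A minimal sketch of that eligibility test, assuming toy flag constants that only imitate the idea of a property-type bit field (the `TOY_` values and function names are illustrative, not taken from quickjs.c). Note that in C `==` binds tighter than `&`, so the mask test needs explicit parentheses to mean what the description intends.

```c
#include <stdint.h>

/* Assumed toy encoding: two type bits select the property kind. */
enum {
    TOY_PROP_WRITABLE = 1 << 1,
    TOY_PROP_TMASK    = 3 << 4, /* mask over the property-type bits */
    TOY_PROP_NORMAL   = 0 << 4, /* plain data property */
    TOY_PROP_GETSET   = 1 << 4, /* accessor */
    TOY_PROP_VARREF   = 2 << 4,
    TOY_PROP_AUTOINIT = 3 << 4
};

/* Plain data property: all type bits clear. Without the parentheses
 * the expression would parse as flags & (TOY_PROP_TMASK == 0), which
 * is always 0. */
static int toy_is_plain_data_prop(uint32_t flags)
{
    return (flags & TOY_PROP_TMASK) == 0;
}

/* The fast path applies only to an own data property found directly
 * on a non-exotic receiver; everything else takes the slow path. */
static int toy_ic_eligible(int receiver_is_exotic, int found_on_receiver,
                           uint32_t flags)
{
    return !receiver_is_exotic && found_on_receiver &&
           toy_is_plain_data_prop(flags);
}
```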

Safety / invalidation

  • A hit requires obj->shape == ic->shape AND obj->shape->ic_serial == ic->serial. The cached pointer is only ever compared, never dereferenced.
  • Self-invalidation: structural changes either move the object to a new shape pointer or only append properties in place (existing offsets are never moved or reused without a fresh shape allocation), so a pointer match guarantees the cached offset still names the same own data property.
  • ABA: a freed shape's memory can be reused at the same address. We do NOT hold a reference on the cached shape -- that would inflate ref_count and break the engine's "ref_count == 1 means exclusively owned, mutate in place" invariant (it caused add_property's ref_count==1 assertion to fail). Instead every JSShape carries a runtime-unique ic_serial, assigned afresh whenever a shape's memory is (re)allocated (new/clone/resize/compact) and whenever js_shape_prepare_update mutates an exclusively-owned shape in place (e.g. redefining a data property as an accessor, or deleting a property). Reused memory therefore always carries a different serial, so ABA cannot produce a false hit. Because no reference is held, there is nothing to mark in the cycle GC and nothing to release beyond freeing the ic[] array.
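
The pointer-plus-serial check can be illustrated with a small toy model. Everything here is a sketch under assumptions: the `Toy` structs are stand-ins for the much larger engine types, and `toy_shape_stamp` is a hypothetical helper that plays the role of assigning a fresh ic_serial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a shape: only the serial matters here. */
typedef struct ToyShape {
    uint64_t ic_serial; /* fresh on every (re)allocation or in-place mutation */
} ToyShape;

typedef struct ToyObject {
    ToyShape *shape;
    int props[4];
} ToyObject;

typedef struct ToyICSlot {
    const ToyShape *shape; /* only compared, never dereferenced */
    uint64_t serial;
    uint32_t prop_offset;
} ToyICSlot;

static uint64_t toy_next_serial = 1;

/* Called when shape memory is (re)used or mutated in place. */
static void toy_shape_stamp(ToyShape *sh)
{
    assert(sh != NULL);
    sh->ic_serial = toy_next_serial++;
}

/* Slow path found the property: remember where it was. */
static void toy_ic_fill(ToyICSlot *ic, const ToyObject *obj, uint32_t prop_offset)
{
    ic->shape = obj->shape;
    ic->serial = obj->shape->ic_serial;
    ic->prop_offset = prop_offset;
}

/* Fast path: a hit needs the same pointer AND the same serial. The
 * serial is read through the live obj->shape, not the cached pointer. */
static int toy_ic_try_read(const ToyICSlot *ic, const ToyObject *obj, int *out)
{
    if (obj->shape == ic->shape && obj->shape->ic_serial == ic->serial) {
        *out = obj->props[ic->prop_offset];
        return 1;
    }
    return 0; /* miss: caller falls back to the slow path */
}
```

Re-stamping the same `ToyShape` storage models address reuse: the pointer comparison still succeeds, but the serial comparison fails, so the stale slot misses instead of reading through an outdated offset.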

Testing: make test reports 0/62 errors; the same suite under ASAN+UBSAN (QJS_ENABLE_ASAN/UBSAN) reports 0/62 errors with no leaks/UAF/UB. Targeted stress (shape-change invalidation, polymorphic and megamorphic sites, prototype shadowing/mutation, getters, Proxy, frozen objects, data->accessor redefinition, delete/re-add) and a 30k-read randomized differential test all produce output identical to a stock build. A property-read-heavy loop runs ~1.45x faster.

claude added 2 commits May 31, 2026 10:07
The inline cache bumped BC_VERSION 26->27 but only regenerated
gen/repl.c, gen/standalone.c and gen/function_source.c. The remaining
qjsc-generated bytecode (gen/hello.c, gen/hello_module.c, gen/test_fib.c
and the builtin-*.h headers) still encoded version 26, so CI's
"make codegen" clean-tree check failed with a dirty git tree.

Regenerate them via `make codegen` so all serialized bytecode carries the
bumped version.

https://claude.ai/code/session_01MhkkobYvut7A4oP4w8eV1b
@bnoordhuis
Contributor

Can you run https://github.com/quickjs-ng/web-tooling-benchmark before and after?

We've had something like what you're trying here but I removed it again in 7de6d46 because it was sometimes a lot slower than no inline caches (the typescript compiler benchmark in particular was hit hard, if memory serves.)

@andreasrosdalw

andreasrosdalw commented May 31, 2026

"Ran web-tooling-benchmark v0.5.3 before (700eba4, no IC) and after (47b73ae,
this PR), Release builds, 2 runs each. Overall geometric mean is unchanged:
~1.085 runs/s before vs ~1.08 after, i.e. flat within run-to-run noise.
Importantly there's no TypeScript blow-up this time (2.02 -> 2.00, ~-1.5%),
so the regression from 7de6d46 doesn't reproduce. Per-test it's mixed though:
prettier +34% and prepack +8%, but source-map -14%, terser -7% and
babel-minify -5%, so the IC roughly breaks even on real-world tooling."

benchmark by Claude

@andreasrosdal
Contributor Author

Closing, since the benchmark doesn't show an improvement.
