Add monomorphic inline cache for object property reads by andreasrosdal · Pull Request #1521 · quickjs-ng/quickjs

andreasrosdal · 2026-05-31T10:12:07Z

Speed up obj.prop reads by caching, per bytecode site, the resolution of own data-property accesses so repeated reads skip the find_own_property hash probe and prototype-chain walk.

Yet another Claude attempt

Mechanism

- New opcodes OP_get_field_ic / OP_get_field2_ic (format atom_u16, size 7): the 4-byte atom stays at offset +1 (so existing bytecode atom relocation in (de)serialization works unchanged) plus an appended u16 IC index.
- resolve_labels() rewrites every eligible OP_get_field / OP_get_field2 into its _ic variant and assigns a unique IC slot index per site (capped at u16; falls back to the plain opcode if exhausted). obj.length keeps its existing OP_get_length specialization.
- JSFunctionBytecode gains an ic[] array (separate allocation) of { JSShape *shape; uint64_t serial; uint32_t prop_offset } slots, sized by the per-function IC count. Allocated in js_create_function from the compiler and rebuilt in the bytecode reader by scanning for IC opcodes (bc_function_alloc_ic). Freed in free_function_bytecode. BC_VERSION bumped 26 -> 27 because the opcode space is renumbered; stale serialized bytecode is now rejected rather than misparsed. The committed gen/*.c bootstrap bytecode and the test_bjson fuzz corpus version bytes were regenerated.
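The instruction layout and slot allocation described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: every `demo_`/`Demo` name and the opcode number are hypothetical, and the cap is assumed to mean that indices 0 through UINT16_MAX are usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode number; the real one comes from the engine's opcode table. */
enum { DEMO_OP_GET_FIELD_IC = 0x42 };

/* 1 opcode byte + 4-byte atom + 2-byte IC index = 7 bytes. */
enum { DEMO_GET_FIELD_IC_SIZE = 1 + 4 + 2 };

/* One cache slot per read site, with the fields named in the description.
   The shape is an opaque pointer here because it is only ever compared. */
typedef struct DemoICSlot {
    const void *shape;
    uint64_t serial;
    uint32_t prop_offset;
} DemoICSlot;

/* Encode one IC read: the atom keeps its usual offset +1, the index follows at +5. */
static void demo_emit_get_field_ic(uint8_t *pc, uint32_t atom, uint16_t ic_index)
{
    assert(pc != NULL);
    pc[0] = DEMO_OP_GET_FIELD_IC;
    memcpy(pc + 1, &atom, sizeof(atom));
    memcpy(pc + 5, &ic_index, sizeof(ic_index));
}

static uint32_t demo_read_atom(const uint8_t *pc)
{
    uint32_t atom;
    memcpy(&atom, pc + 1, sizeof(atom));
    return atom;
}

static uint16_t demo_read_ic_index(const uint8_t *pc)
{
    uint16_t ic_index;
    memcpy(&ic_index, pc + 5, sizeof(ic_index));
    return ic_index;
}

/* Hand out the next slot index. Returns false once the u16 range is used up,
   in which case the caller would keep the plain (uncached) opcode. */
static bool demo_alloc_ic_index(uint32_t *ic_count, uint16_t *out_index)
{
    if (*ic_count > UINT16_MAX)
        return false;
    *out_index = (uint16_t)*ic_count;
    *ic_count += 1;
    return true;
}
```

Keeping the atom at its existing offset is what lets code that only knows "atom at +1" continue to work on the longer instruction; the extra two bytes are simply skipped by the instruction size.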

Scope: READS ONLY. Writes (put_field) are intentionally not cached. The IC fast path triggers only for own data properties found directly on a non-exotic receiver ((prs->flags & JS_PROP_TMASK) == 0); inherited, accessor, auto-init, varref and exotic (array/typed-array/proxy) accesses fall through to the unchanged slow path with identical semantics.
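As a rough sketch of the property-kind check in the scope paragraph: the `DEMO_PROP_*` constants below are assumptions modelled on the property-kind flags in quickjs.h, not values taken from this PR, and the helper names are hypothetical. The second helper shows why the mask needs its own parentheses in C, since `==` binds tighter than `&`.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag layout: two bits select the property kind. */
#define DEMO_PROP_TMASK    (3 << 4)
#define DEMO_PROP_NORMAL   (0 << 4) /* plain data property */
#define DEMO_PROP_GETSET   (1 << 4) /* accessor */
#define DEMO_PROP_VARREF   (2 << 4)
#define DEMO_PROP_AUTOINIT (3 << 4)

/* True only for plain data properties, the one kind the fast path accepts. */
static bool demo_is_plain_data_property(int flags)
{
    return (flags & DEMO_PROP_TMASK) == DEMO_PROP_NORMAL;
}

/* How C parses the check when the mask is NOT parenthesized:
   flags & (TMASK == 0), i.e. flags & 0, which is false for every input. */
static bool demo_as_parsed_without_parens(int flags)
{
    return flags & (DEMO_PROP_TMASK == 0);
}
```

The low bits (configurable, writable, enumerable) do not affect the result, so a frozen object's data properties would still qualify under this check.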

Safety / invalidation

- A hit requires obj->shape == ic->shape AND obj->shape->ic_serial == ic->serial. The cached pointer is only ever compared, never dereferenced.
- Self-invalidation: structural changes either move the object to a new shape pointer or only append properties in place (existing offsets are never moved or reused without a fresh shape allocation), so a pointer match guarantees the cached offset still names the same own data property.
- ABA: a freed shape's memory can be reused at the same address. We do NOT hold a reference on the cached shape -- that would inflate ref_count and break the engine's "ref_count == 1 means exclusively owned, mutate in place" invariant (it caused add_property's ref_count==1 assertion to fail). Instead every JSShape carries a runtime-unique ic_serial, assigned afresh whenever a shape's memory is (re)allocated (new/clone/resize/compact) and whenever js_shape_prepare_update mutates an exclusively-owned shape in place (e.g. redefining a data property as an accessor, or deleting a property). Reused memory therefore always carries a different serial, so ABA cannot produce a false hit. Because no reference is held, there is nothing to mark in the cycle GC and nothing to release beyond freeing the ic[] array.
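The hit test and the ABA defence described above can be modelled with a small standalone sketch. All types and functions here are hypothetical stand-ins (the real JSShape and JSObject carry far more state), and the global counter is just one assumed way to produce runtime-unique serials.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoShape {
    uint64_t ic_serial; /* refreshed on every (re)allocation or in-place mutation */
} DemoShape;

typedef struct DemoObject {
    DemoShape *shape;
    int props[4];       /* property values, indexed by offset */
} DemoObject;

typedef struct DemoICSlot {
    const DemoShape *shape; /* compared only, never dereferenced */
    uint64_t serial;
    uint32_t prop_offset;
} DemoICSlot;

/* Starts at 1 so a zeroed slot can never match a stamped shape. */
static uint64_t demo_next_serial = 1;

/* Give a shape a fresh serial, as on allocation or in-place update. */
static void demo_shape_stamp(DemoShape *sh)
{
    sh->ic_serial = demo_next_serial++;
}

/* Slow path result: remember where the property was found. */
static void demo_ic_fill(DemoICSlot *ic, const DemoObject *obj, uint32_t offset)
{
    assert(offset < 4);
    ic->shape = obj->shape;
    ic->serial = obj->shape->ic_serial;
    ic->prop_offset = offset;
}

/* Fast path: only the object's live shape is dereferenced. */
static bool demo_get_field_ic(const DemoICSlot *ic, const DemoObject *obj, int *out)
{
    if (obj->shape == ic->shape && obj->shape->ic_serial == ic->serial) {
        *out = obj->props[ic->prop_offset];
        return true;
    }
    return false; /* miss: the caller falls back to the regular lookup */
}
```

Re-stamping the same `DemoShape` storage models memory being freed and reused at the same address: the pointer comparison still succeeds, but the serial comparison fails, so the stale slot misses.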

Testing: make test reports 0/62 errors; the same suite under ASAN+UBSAN (QJS_ENABLE_ASAN/UBSAN) reports 0/62 errors with no leaks/UAF/UB. Targeted stress (shape-change invalidation, polymorphic and megamorphic sites, prototype shadowing/mutation, getters, Proxy, frozen objects, data->accessor redefinition, delete/re-add) and a 30k-read randomized differential test all produce output identical to a stock build. A property-read-heavy loop runs ~1.45x faster.


The inline cache bumped BC_VERSION 26->27 but only regenerated gen/repl.c, gen/standalone.c and gen/function_source.c. The remaining qjsc-generated bytecode (gen/hello.c, gen/hello_module.c, gen/test_fib.c and the builtin-*.h headers) still encoded version 26, so CI's "make codegen" clean-tree check failed with a dirty git tree. Regenerate them via `make codegen` so all serialized bytecode carries the bumped version. https://claude.ai/code/session_01MhkkobYvut7A4oP4w8eV1b

bnoordhuis · 2026-05-31T18:46:49Z

Can you run https://github.com/quickjs-ng/web-tooling-benchmark before and after?

We've had something like what you're trying here but I removed it again in 7de6d46 because it was sometimes a lot slower than no inline caches (the typescript compiler benchmark in particular was hit hard, if memory serves.)

andreasrosdalw · 2026-05-31T18:58:44Z

"Ran web-tooling-benchmark v0.5.3 before (700eba4, no IC) and after (47b73ae,
this PR), Release builds, 2 runs each. Overall geometric mean is unchanged:
~1.085 runs/s before vs ~1.08 after, i.e. flat within run-to-run noise.
Importantly there's no TypeScript blow-up this time (2.02 -> 2.00, ~-1.5%),
so the regression from 7de6d46 doesn't reproduce. Per-test it's mixed though:
prettier +34% and prepack +8%, but source-map -14%, terser -7% and
babel-minify -5%, so the IC roughly breaks even on real-world tooling."

benchmark by Claude

andreasrosdal · 2026-05-31T19:04:44Z

Closing since the benchmark doesn't show improvement

claude added 2 commits May 31, 2026 10:07

andreasrosdal closed this May 31, 2026

andreasrosdal wants to merge 2 commits into quickjs-ng:master from nordstjernen-web:claude/perf-inline-cache