Add monomorphic inline cache for object property reads #1521
andreasrosdal wants to merge 2 commits into
Speed up obj.prop reads by caching, per bytecode site, the resolution of
own data-property accesses so repeated reads skip the find_own_property
hash probe and prototype-chain walk.
Mechanism
- New opcodes OP_get_field_ic / OP_get_field2_ic (format atom_u16, size 7):
the 4-byte atom stays at offset +1 (so existing bytecode atom relocation
in (de)serialization works unchanged) plus an appended u16 IC index.
- resolve_labels() rewrites every eligible OP_get_field / OP_get_field2 into
its _ic variant and assigns a unique IC slot index per site (capped at
u16; falls back to the plain opcode if exhausted). obj.length keeps its
existing OP_get_length specialization.
- JSFunctionBytecode gains an ic[] array (separate allocation) of
{ JSShape *shape; uint64_t serial; uint32_t prop_offset } slots, sized by
the per-function IC count. Allocated in js_create_function from the
compiler and rebuilt in the bytecode reader by scanning for IC opcodes
(bc_function_alloc_ic). Freed in free_function_bytecode. BC_VERSION bumped
26 -> 27 because the opcode space is renumbered; stale serialized bytecode
is now rejected rather than misparsed. The committed gen/*.c bootstrap
bytecode and the test_bjson fuzz corpus version bytes were regenerated.
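To make the 7-byte instruction layout above concrete, here is a minimal sketch of encoding and decoding one IC instruction. The helper names (`ic_emit`, `ic_read_atom`, `ic_read_index`) and the use of host byte order are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout described above: 1 opcode byte, 4-byte atom at offset +1,
   appended u16 IC index at offset +5. All names here are hypothetical. */
enum { IC_INSN_SIZE = 7 };

/* Encode one instruction into buf; returns the number of bytes written. */
static size_t ic_emit(uint8_t *buf, uint8_t opcode, uint32_t atom, uint16_t idx)
{
    buf[0] = opcode;
    memcpy(buf + 1, &atom, sizeof(atom));   /* atom stays at offset +1 */
    memcpy(buf + 5, &idx, sizeof(idx));     /* IC index is appended */
    return IC_INSN_SIZE;
}

/* Read the atom from the same offset the plain opcode uses. */
static uint32_t ic_read_atom(const uint8_t *pc)
{
    uint32_t atom;
    memcpy(&atom, pc + 1, sizeof(atom));
    return atom;
}

/* Read the per-site IC slot index that follows the atom. */
static uint16_t ic_read_index(const uint8_t *pc)
{
    uint16_t idx;
    memcpy(&idx, pc + 5, sizeof(idx));
    return idx;
}
```

Because the atom keeps its offset, any code that only knows the plain opcode's layout (such as the atom relocation pass mentioned above) can read it without knowing the IC index exists.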
Scope: READS ONLY. Writes (put_field) are intentionally not cached. The IC
fast path triggers only for own data properties found directly on a
non-exotic receiver ((prs->flags & JS_PROP_TMASK) == 0); inherited, accessor,
auto-init, varref and exotic (array/typed-array/proxy) accesses fall through
to the unchanged slow path with identical semantics.
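The eligibility test is a masked comparison, and in C the mask needs its own parentheses because `==` binds tighter than `&`. The sketch below shows the difference. The `DEMO_*` flag values are assumptions chosen for illustration and are not copied from the engine headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag layout: a two-bit "property type" field.
   These values are assumptions, not taken from quickjs.h. */
#define DEMO_PROP_TMASK  (3u << 4)
#define DEMO_PROP_GETSET (1u << 4)

/* Intended check: true only when no type bits are set,
   i.e. the property is a plain data property. */
static bool is_plain_data_prop(uint32_t flags)
{
    return (flags & DEMO_PROP_TMASK) == 0;
}

/* How C parses `flags & DEMO_PROP_TMASK == 0` without parentheses:
   the comparison runs first and yields 0, so the result is always false. */
static bool unparenthesized_check(uint32_t flags)
{
    return flags & (DEMO_PROP_TMASK == 0);
}
```

With these definitions, `is_plain_data_prop` accepts flags with no type bits set and rejects an accessor, while `unparenthesized_check` rejects every input, which would silently disable the fast path.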
Safety / invalidation
- A hit requires obj->shape == ic->shape AND obj->shape->ic_serial ==
ic->serial. The cached pointer is only ever compared, never dereferenced.
- Self-invalidation: structural changes either move the object to a new
shape pointer or only append properties in place (existing offsets are
never moved or reused without a fresh shape allocation), so a pointer
match guarantees the cached offset still names the same own data property.
- ABA: a freed shape's memory can be reused at the same address. We do NOT
hold a reference on the cached shape -- that would inflate ref_count and
break the engine's "ref_count == 1 means exclusively owned, mutate in
place" invariant (it caused add_property's ref_count==1 assertion to
fail). Instead every JSShape carries a runtime-unique ic_serial, assigned
afresh whenever a shape's memory is (re)allocated (new/clone/resize/
compact) and whenever js_shape_prepare_update mutates an exclusively-owned
shape in place (e.g. redefining a data property as an accessor, or
deleting a property). Reused memory therefore always carries a different
serial, so ABA cannot produce a false hit. Because no reference is held,
there is nothing to mark in the cycle GC and nothing to release beyond
freeing the ic[] array.
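The hit condition and the ABA guard described above can be sketched with toy types. The `Demo*` structs and functions are hypothetical stand-ins for the engine's shape, object and IC slot; only the field roles (shape pointer, serial, property offset) come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the engine types; all names are hypothetical. */
typedef struct DemoShape {
    uint64_t ic_serial;      /* runtime-unique, refreshed on (re)allocation */
} DemoShape;

typedef struct DemoObject {
    DemoShape *shape;
    int *values;             /* property values, indexed by offset */
} DemoObject;

typedef struct DemoICSlot {
    const DemoShape *shape;  /* compared only, never dereferenced */
    uint64_t serial;         /* the shape's serial when the slot was filled */
    uint32_t prop_offset;
} DemoICSlot;

static uint64_t demo_next_serial = 1;

/* Give a shape a fresh serial, as on new/clone/resize/compact or an
   in-place update of an exclusively owned shape. */
static void demo_shape_stamp(DemoShape *sh)
{
    sh->ic_serial = demo_next_serial++;
}

/* Record the result of a successful slow-path lookup. */
static void demo_ic_fill(DemoICSlot *ic, const DemoObject *obj, uint32_t offset)
{
    ic->shape = obj->shape;
    ic->serial = obj->shape->ic_serial;
    ic->prop_offset = offset;
}

/* Fast path: both the pointer and the serial must match. */
static bool demo_ic_try_get(const DemoICSlot *ic, const DemoObject *obj, int *out)
{
    if (obj->shape != ic->shape || obj->shape->ic_serial != ic->serial)
        return false;        /* miss: the caller takes the slow path */
    *out = obj->values[ic->prop_offset];
    return true;
}
```

Re-stamping a shape that lives at the same address models memory reuse: the pointer comparison still succeeds, but the serial comparison fails, so the stale slot misses instead of returning a value from the wrong layout.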
Testing: make test reports 0/62 errors; the same suite under ASAN+UBSAN
(QJS_ENABLE_ASAN/UBSAN) reports 0/62 errors with no leaks/UAF/UB. Targeted
stress (shape-change invalidation, polymorphic and megamorphic sites,
prototype shadowing/mutation, getters, Proxy, frozen objects,
data->accessor redefinition, delete/re-add) and a 30k-read randomized
differential test all produce output identical to a stock build. A
property-read-heavy loop runs ~1.45x faster.
https://claude.ai/code/session_01MhkkobYvut7A4oP4w8eV1b
The inline cache bumped BC_VERSION 26->27 but only regenerated gen/repl.c, gen/standalone.c and gen/function_source.c. The remaining qjsc-generated bytecode (gen/hello.c, gen/hello_module.c, gen/test_fib.c and the builtin-*.h headers) still encoded version 26, so CI's "make codegen" clean-tree check failed with a dirty git tree. Regenerate them via `make codegen` so all serialized bytecode carries the bumped version. https://claude.ai/code/session_01MhkkobYvut7A4oP4w8eV1b
Can you run https://github.com/quickjs-ng/web-tooling-benchmark before and after? We've had something like what you're trying here, but I removed it again in 7de6d46 because it was sometimes a lot slower than no inline caches (the TypeScript compiler benchmark in particular was hit hard, if memory serves).
"Ran web-tooling-benchmark v0.5.3 before (700eba4, no IC) and after (47b73ae, benchmark by Claude
Closing since the benchmark doesn't show improvement |
Yet another Claude attempt