The Rust bytecode dumper now formats exception handler labels, raw
operands, builtins, labels, and registers. Remove the C++ dump-only
formatters and flatten Operand to expose only the runtime value-array
layout that C++ still observes.
Generate Rust bytecode dump helpers from Bytecode.def and route
Executable::dump() through them for instruction stream formatting.
Add a small Rust runtime::value helper for decoding encoded LibJS
Values so immediate Value operands are formatted on the Rust side. C++
callbacks remain only for local names and GC-backed Value payloads that
still need LibJS object access.
Remove the generated C++ to_byte_string_impl() methods and the old
Instruction::to_byte_string() dispatch. The bytecode dump tests cover
output compatibility.
The bytecode dump path only writes directly to stderr now.
Remove the unused string-returning dump API.
Also remove the private helper mode that only existed for that API.
The Rust bytecode generator now owns basic block construction.
The old C++ BasicBlock class no longer has any users.
Label no longer needs to translate from BasicBlock.
Remove the now-empty Label.cpp from the build as well.
Enable -Wexit-time-destructors for all in-tree library targets and
update process-lifetime library statics so they no longer register
exit-time destructors. Long-lived caches, lookup tables, singleton
registries, and generated constants now use NeverDestroyed or leaked
references where the data is intended to live until process exit.
Update LibWeb, LibLine, and the binding generators so regenerated
sources follow the same rule instead of reintroducing destructed
statics.
Allow a freshly materialized executable to inherit compatible runtime
caches from the executable it replaces. Store template object caches as
GC cells so they can be shared safely between executable instances.
Dynamic environment binding opcodes lost the old coordinate warmup.
They were split away from the static coordinate opcodes. Hot closures
and eval-sensitive functions then resolved the same binding by name on
every execution, which regressed JS benchmark throughput badly.
Give each dynamic environment opcode a per-executable coordinate cache
slot. The cache keeps the bytecode stream immutable while letting both
interpreters take a direct declarative environment fast path after the
first lookup. Keep the existing eval invalidation behavior and only warm
caches for declarative-only chains so with environments continue to
observe object shadowing.
Reject cached bytecode that uses the no-cache sentinel for dynamic
environment coordinate cache operands, since execution indexes those
cache arrays unconditionally.
Rebaseline bytecode expectations for the instruction size changes. Add
coverage for with-object shadowing across repeated dynamic lookups and
for rejecting corrupt dynamic environment cache indices.
Store bytecode property lookup caches as tiered handles instead of
eagerly allocating four entries for every cache slot. Each slot starts
empty, grows to one monomorphic entry after the first cacheable lookup,
and promotes to the existing four-entry table when another cache key
appears.
Global variable caches keep one inline property entry because they
usually warm up as global object property accesses. This keeps common
global access allocation-free while still avoiding the inherited
polymorphic cache cost.
Generated asm offsets now point at the lazy property-cache storage and
the inline global entry, and asm fast paths bail out for empty property
cache slots.
Teach Bytecode::Executable to store its instruction stream as either
an owned Vector or a retained Core::ImmutableBytes range. Cached
bytecode materialization now clones the immutable blob owner and lets
the executable point directly into the file-backed cache blob instead
of copying instruction bytes back onto the heap.
Keep a cached instruction data pointer inside the stream wrapper so the
asm interpreter still has a direct hot-path load. Align executable
bytecode payloads in the cache format so mmap-backed instruction
streams satisfy validator and interpreter alignment requirements.
Store compact cache indexes in bytecode instructions instead of raw
pointers to the executable cache vectors. This keeps the instruction
stream independent from heap addresses and removes pointer fixups when
materializing cached bytecode.
Resolve the mutable cache pointers at execution time from the current
Executable. Bytecode test expectations are updated for the smaller cache
operands and resulting instruction offsets.
Avoid decoding warm-cache script responses into full UTF-16 SourceCode
buffers when a bytecode cache sidecar is available. SourceCode now keeps
the original immutable source bytes and source encoding, then decodes
only when full source text or a Function.toString() range is requested.
Compute the bytecode cache source hash while streaming decoded code
points from the response bytes, so cache validation does not force an
intermediate UTF-8 string. Function and class source text metadata now
stores SourceCode ranges instead of views into a materialized buffer.
Keep basic block offsets as construction-only metadata rather than
storing them on every Executable. The validator now receives the offsets
through a transient Rust FFI span, and the bytecode dump rebuilds block
starts by scanning labels, terminators, and exception handler metadata.
Drop the table from the bytecode cache format and bump the format
version so old caches are rebuilt. This removes a field that was only
used by validation and bytecode dump paths.
Store source map locations as bytecode offset, line, and column.
Runtime consumers only emit the start line and column, so source end
positions and source text offsets do not need to be carried through
Executable source maps, bytecode cache serialization, or the Rust FFI.
Keep SourceCode's internal position cache able to track source text
offsets so callers can still translate source offsets to line and
column pairs when needed. Hash dump-bytecode IDs from the name, first
source position, and bytecode size instead of source slices that need
end offsets.
Bump the bytecode cache format version for the slimmer serialized
source map entry shape.
Avoid emitting consecutive source map entries when they carry the
same source range. The bytecode offset for the previous entry remains
valid for later PCs because source lookup now uses the largest source
map entry whose offset is not greater than the program counter.
This keeps stack traces stable while allowing statement-sized runs of
bytecode to share one source map entry.
Executable caches can retain weak pointers to shapes and prototypes
across collections. With incremental sweeping, a previous sweep may have
already freed the block behind one of those pointers by the time pruning
runs for weak containers.
Use the live HeapBlock registry before reading cached cells. This
matches the other weak containers updated for incremental sweeping.
Move weak container cleanup (remove_dead_cells) out of both
sweep_dead_cells() and start_incremental_sweep() to the place
where it is actually safe to inspect cell state: collect_garbage().
Previously, remove_dead_cells could access cells that had already
been swept and poisoned by ASAN, causing use-after-poison crashes
when a new GC triggered while an incremental sweep was in progress.
Report outline storage retained by bytecode executables, table
objects, object property iterator cache data, and shared function
instance data. This includes bytecode vectors, cache arrays, source
maps, class blueprint elements, and binding metadata.
Carry full source positions through the Rust bytecode source map so
stack traces and other bytecode-backed source lookups can use them
directly.
This keeps exception-heavy paths from reconstructing line and column
information through SourceCode::range_from_offsets(), which can spend a
lot of time building SourceCode's position cache on first use.
We're trading some space for time here, but I believe it's worth it at
this tag, as this saves ~250ms of main thread time while loading
https://x.com/ on my Linux machine. :^)
Reading the stored Position out of the source map directly also exposed
two things masked by the old range_from_offsets() path: a latent
off-by-one in Lexer::new_at_offset() (its consume() bumped line_column
past the character at offset; only synthesize_binding_pattern() hit it),
and a (1,1) fallback in range_from_offsets() that fired whenever the
queried range reached EOF. Fix the lexer, then rebaseline both the
bytecode dump tests (no more spurious "1:1") and the destructuring AST
tests (binding-pattern identifiers now report their real columns).
Mirror Executable's constants size and data pointer in adjacent fields
so the asm Call fast path can pair-load them together. The underlying
Vector layout keeps size and data apart, so a small cached raw span
lets the hot constant-copy loop fetch both pieces of metadata at once.
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.
Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.
Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.
Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
Add a metadata header showing register count, block count, local
variable names, and the constants table. Resolve jump targets to
block labels (e.g. "block1") instead of raw hex addresses, and add
visual separation between basic blocks.
Make identifier and property key formatting more concise by using
backtick quoting and showing base_identifier as a trailing
parenthetical hint that joins the base and property names.
Generate a stable name for each executable by hashing the source
text it covers (stable across codegen changes). Named functions
show as "foo$9beb91ec", anonymous ones as "$43362f3f". Also show
the source filename, line, and column.
CachedSourceRange was a GC-allocated cell stored on the
ExecutionContext, only needed because ExecutionContext must be
trivially destructible.
Move the source range cache to a HashMap<u32, SourceRange> on the
Executable (keyed by program counter), where it belongs. This
eliminates the GC::Cell subclass entirely and removes the
cached_source_range field from ExecutionContext.
StackTraceElement and TracebackFrame now store Optional<SourceRange>
directly instead of GC::Ptr<CachedSourceRange>.
Shrinks ExecutionContext from 144 to 136 bytes.
Instead of storing a u32 index into a cache vector and looking up the
cache at runtime through a chain of dependent loads (load Executable*,
load vector data pointer, multiply index, add), store the actual cache
pointer as a u64 directly in the instruction stream.
A fixup pass (Executable::fixup_cache_pointers()) runs after Executable
construction in both the Rust and C++ pipelines, walking the bytecode
and replacing each index with the corresponding pointer.
The cache pointer type is encoded in Bytecode.def (e.g.
PropertyLookupCache*, GlobalVariableCache*) so the fixup switch is
auto-generated by the Python Op code generator, making it impossible
to forget updating the fixup when adding new cached instructions.
This eliminates 3-4 dependent loads on every inline cache access in
both the C++ interpreter and the assembly interpreter.
Property lookup cache entries previously used GC::Weak<T> for shape,
prototype, and prototype_chain_validity pointers. Each GC::Weak
requires a ref-counted WeakImpl allocation and an extra indirection
on every access.
Replace these with GC::RawPtr<T> and make Executable a WeakContainer
so the GC can clear stale pointers during sweep via remove_dead_cells.
For static PropertyLookupCache instances (used throughout the runtime
for well-known property lookups), introduce StaticPropertyLookupCache
which registers itself in a global list that also gets swept.
Now that inline cache entries use GC::RawPtr instead of GC::Weak,
we can compare shape/prototype pointers directly without going
through the WeakImpl indirection. This removes one dependent load
from each IC check in GetById, PutById, GetLength, GetGlobal, and
SetGlobal handlers.
Add the ability to dump AST and bytecode to a String instead of only
to stdout/stderr. This is done by adding an optional StringBuilder
output sink to ASTDumpState, and a new dump_to_string() method on
both ASTNode and Bytecode::Executable.
These will be used for comparing output between compilation pipelines.
Replace the ClassExpression const& reference in the NewClass
instruction with a u32 class_blueprint_index. The interpreter now
reads from the ClassBlueprint stored on the Executable and calls
construct_class() instead of the AST-based create_class_constructor().
Literal field initializers (numbers, booleans, null, strings, negated
numbers) are used directly in construct_class() without creating an
ECMAScriptFunctionObject, avoiding function creation overhead for
common field patterns like `x = 0` or `name = "hello"`.
Set class_field_initializer_name on SharedFunctionInstanceData at
codegen time for statically-known field keys (identifiers, private
identifiers, string literals, and numeric literals). For computed
keys, the name is set at runtime in construct_class().
ClassExpression AST nodes are no longer referenced from bytecode.
Replace the FunctionNode const& stored on the NewFunction bytecode
instruction with an index into a table of pre-created
SharedFunctionInstanceData objects on the Executable.
During bytecode compilation, we now eagerly create
SharedFunctionInstanceData for each function that will be
instantiated by NewFunction, and store it on both the FunctionNode
(for caching) and the Executable (for GC tracing).
At runtime, NewFunction simply looks up the SharedFunctionInstanceData
by index and calls create_from_function_data() directly, bypassing
the AST entirely. This removes one of the main reasons the AST had
to stay alive after compilation.
The instantiate_ordinary_function_expression() helper in
Interpreter.cpp is removed as its non-trivial code path (creating a
scope for named function expressions) was dead code -- it was only
called when !has_name(), so the has_own_name branch never executed.
The exception handler table is sorted by start_offset, so use
binary_search instead of a linear scan. This matches the pattern
already used by source_range_at() in the same file.
After replacing the runtime unwind context stack with explicit
completion records for try/finally dispatch, the distinction between
"handler" (catch) and "finalizer" (finally) in the exception handler
table is no longer meaningful at runtime.
handle_exception() checked handler first, then finalizer, but they
did the exact same thing (set the PC). When both were present, the
finalizer was dead code.
Collapse both fields into a single handler_offset (now non-optional,
since an entry always has a target), remove the finalizer concept
from BasicBlock, UnwindContext, and ExceptionHandlers, and simplify
handle_exception() to a direct assignment.
Bytecode source map entries are always added in order of increasing
bytecode offset, and lookups only happen during error handling (a cold
path). This makes a sorted vector with binary search a better fit than
a hash map.
This change reduces memory overhead and speeds up bytecode generation
by avoiding hash table operations during compilation. Lookups remain
fast via binary search, and since source_range_at() is only called
when generating stack traces, the O(log n) lookup is acceptable.
Every function call allocates an ExecutionContext with a trailing array
of Values for registers, locals, constants, and arguments. Previously,
the constructor would initialize all slots to js_special_empty_value(),
but constant slots were then immediately overwritten by the interpreter
copying in values from the Executable before execution began.
To eliminate this redundant initialization, we rearrange the layout from
[registers | constants | locals] to [registers | locals | constants].
This groups registers and locals together at the front, allowing us to
initialize only those slots while leaving constant slots uninitialized
until they're populated with their actual values.
This reduces the per-call initialization cost from O(registers + locals
+ constants) to O(registers + locals).
Also tightens up the types involved (size_t -> u32) and adds VERIFYs to
guard against overflow when computing the combined slot counts, and to
ensure the total fits within the 29-bit operand index field.
When a function creates object literals with simple property names,
we now cache the resulting shape after the first instantiation. On
subsequent calls, we create the object with the cached shape directly
and write property values at their known offsets.
This avoids repeated shape transitions and property offset lookups
for a common JavaScript pattern.
The optimization uses two new bytecode instructions:
- CacheObjectShape: Captures the final shape after object construction
- InitObjectLiteralProperty: Writes properties using cached offsets
Only "simple" object literals are optimized (string literal keys with
simple value expressions). Complex cases like computed properties,
getters/setters, and spread elements use the existing slow path.
3.4x speedup on a microbenchmark that repeatedly instantiates an object
literal with 26 properties. Small progressions on various benchmarks.
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.
These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
This resolves a FIXME in its code generation, particularly for:
- Caching the template object
- Setting the correct property attributes
- Freezing the resulting objects
This allows archive.org to load, which uses the Lit library.
The Lit library caches these template objects to determine if a
template has changed, allowing it to determine to do a full template
rerender or only partially update the rendering. Before, we would
always cause a full rerender on update because we didn't return the
same template object.
This caused issues with archive.org's code, I believe particularly with
its router library, where we would constantly detach and reattach nodes
unexpectedly, ending up with the page content not being attached to the
router's custom element.
Instead of creating PropertyKeys on the fly during interpreter
execution, we now store fully-formed ones in the Executable.
This avoids a whole bunch of busywork in property access instructions
and substantially reduces code size bloat.
While we're in the bytecode compiler, we want to know which type of
Operand we're dealing with, but once we've generated the bytecode
stream, we only ever need its index.
This patch simplifies Operand by removing the aarch64 bitfield hacks
and makes it 32-bit on all platforms. We keep 3 type bits in the high
bits of the index while compiling, and then zero them out when
flattening the final bytecode stream.
This makes bytecode more compact on x86_64, and avoids bit twiddling
on aarch64. Everyone wins something!
When stringifying bytecode for debugging output, we now have an API in
Executable that can look at a raw operand index and tell you what type
of operand it was, based on known quantities of each type in the stack
frame.
This commits puts the strict mode flag in the header of every bytecode
instruction. This allows us to check for strict mode without looking at
the currently running execution context.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root