The asm interpreter already inlines ECMAScript calls, but builtin calls
still went through the generic C++ Call slow path even when the callee
was a plain native function pointer. That added an avoidable boundary
around hot builtin calls and kept asm from taking full advantage of the
new RawNativeFunction representation.
Teach the asm Call handler to recognize RawNativeFunction, allocate the
callee frame on the interpreter stack, copy the call-site arguments,
and jump straight to the stored C++ entry point.
NativeJavaScriptBackedFunction and other non-raw callees keep falling
through to the existing C++ slow path unchanged.
Store whether a function can participate in JS-to-JS inline calls on
SharedFunctionInstanceData instead of recomputing the function kind,
class-constructor bit, and bytecode availability at each fast-path
call site.
Keep JS-to-JS inline calls out of m_execution_context_stack and walk
the active stack from the running execution context instead. Base
pushes now record the previous running context so duplicate
TemporaryExecutionContext pushes and host re-entry still restore
correctly.
This keeps the fast JS-to-JS path off the vector without losing GC
root collection, stack traces, or helpers that need to inspect the
active execution context chain.
The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
Specialize only the fixed unary case in the bytecode generator and let
all other argument counts keep using the generic Call instruction. This
keeps the builtin bytecode simple while still covering the common fast
path.
The asm interpreter handles int32 inputs directly, applies the ToUint16
mask in-place, and reuses the VM's cached ASCII single-character
strings when the result is 7-bit representable. Non-ASCII single code
unit results stay on the dedicated builtin path via a small helper, and
the dedicated slow path still handles the generic cases.
Tag String.prototype.charAt as a builtin and emit a dedicated
bytecode instruction for non-computed calls.
The asm interpreter can then stay on the fast path when the
receiver is a primitive string with resident UTF-16 data and the
selected code unit is ASCII. In that case we can return the VM's
cached empty or single-character ASCII string directly.
Teach builtin call specialization to recognize non-computed
member calls to charCodeAt() and emit a dedicated builtin opcode.
Mark String.prototype.charCodeAt with that builtin tag, then add
an asm interpreter fast path for primitive-string receivers whose
UTF-16 data is already resident.
The asm path handles both ASCII-backed and UTF-16-backed resident
strings, returns NaN for out-of-bounds Int32 indices, and falls
back to the generic builtin call path for everything else. This
keeps the optimistic case in asm while preserving the ordinary
method call semantics when charCodeAt has been replaced or when
string resolution would be required.
Replace the generic CallBuiltin instruction with one opcode per
supported builtin call and make those instructions fixed-size by
arity. This removes the builtin dispatch sled in the asm
interpreter, gives each builtin a dedicated slow-path entry point,
and lets bytecode generation encode the callee shape directly.
Keep the existing handwritten asm fast paths for the Math builtins
that already benefit from them, while routing the other builtin
opcodes through their own C++ execute implementations. Build the
new opcode directly in Rust codegen, and keep the generic call
fallback when the original builtin function has been replaced.
Cache the flattened enumerable key snapshot for each `for..in` site and
reuse a `PropertyNameIterator` when the receiver shape, dictionary
generation, indexed storage kind and length, prototype chain
validity, and magical-length state still match.
Handle packed indexed receivers as well as plain named-property
objects. Teach `ObjectPropertyIteratorNext` in `asmint.asm` to return
cached property values directly and to fall back to the slow iterator
logic when any guard fails.
Treat arrays' hidden non-enumerable `length` property as a visited
name for for-in shadowing, and include the receiver's magical-length
state in the cache key so arrays and plain objects do not share
snapshots.
Add `test-js` and `test-js-bytecode` coverage for mixed numeric and
named keys, packed receiver transitions, re-entry, iterator reuse, GC
retention, array length shadowing, and same-site cache reuse.
This better describes what the method returns and avoids the possible
confusion caused by the mismatch in behavior between
`Value::is_array()` and `Value::as_array()`.
In b2d9fd3352, the root cause of the crash
was somewhat misdiagnosed, particularly around what place in code an
allocation could occur while constants data was uninitialized.
But more importantly, we can do better than the solution in that commit.
Instead of initializing constants with default values and then
overwriting them afterwards, simply initialize them with their actual
values directly when constructing the execution context.
This effectivly reverts commit b2d9fd3352.
The additional data being passed will be used in an upcoming commit.
Allows splitting the churn of modified function signatures from the
logically meaningful code change.
No behavior change.
Switch LibJS `RegExp` over to the Rust-backed `ECMAScriptRegex` APIs.
Route `new RegExp()`, regex literals, and the RegExp builtins through
the new compile and exec APIs, and stop re-validating patterns with the
deleted C++ parser on the way in. Preserve the observable error
behavior by carrying structured compile errors and backtracking-limit
failures across the FFI boundary. Cache compiled regex state and named
capture metadata on `RegExpObject` in the new representation.
Use the new API surface to simplify and speed up the builtin paths too:
share `exec_internal`, cache compiled regex pointers, keep the legacy
RegExp statics lazy, run global replace through batch `find_all`, and
optimize replace, test, split, and String helper paths. Add regression
tests for those JavaScript-visible paths.
Dictionary shapes are mutable (properties added/removed in-place via
add_property_without_transition), so sharing them between objects via
the NewObject premade shape cache is unsafe.
When a large object literal (>64 properties) is created repeatedly in
a loop, the first execution transitions to a dictionary shape, which
CacheObjectShape then caches. Subsequent iterations create new objects
all pointing to the same dictionary shape. If any of these objects adds
a new property, it mutates the shared shape in-place, increasing its
property_count, but only grows its own named property storage. Other
objects sharing the shape are left with undersized storage, leading to
a heap-buffer-overflow when the GC visits their edges.
Fix this by not caching dictionary shapes. This means object literals
with >64 properties won't get the premade-shape fast path, but such
literals are uncommon.
Store yield_continuation and yield_is_await directly in
ExecutionContext instead of allocating a GeneratorResult GC cell.
This removes a heap allocation per yield/await and fixes a latent
bug where continuation addresses stored as doubles could lose
precision.
Delete AST.cpp, AST.h, ASTDump.cpp, ScopeRecord.h, and the dead
get_builtin(MemberExpression const&) from Builtins.cpp.
Extract ImportEntry and ExportEntry into a new ModuleEntry.h,
since they are data types used by the module system, not AST
node types.
Inline ModuleRequest's sorting constructor and
SourceRange::filename().
Remove the dead annex_b_function_declarations field from
EvalDeclarationData, which was only populated by the C++ parser.
Remove Bytecode::compile() and the old create() overloads on
ECMAScriptFunctionObject that accepted C++ AST nodes. These
have no remaining callers now that all compilation goes through
the Rust pipeline.
Also remove the if-constexpr Parse Node branch from
async_block_start, since the Statement template instantiation
was already removed.
Fix transitive include dependencies on Generator.h by adding
explicit includes for headers that were previously pulled in
transitively.
Now that the Rust pipeline is the sole compilation path, remove all
C++ parser/codegen fallback paths from the callers:
- Script::parse() no longer falls back to C++ Parser
- SourceTextModule::parse() no longer falls back to C++ Parser
- perform_eval() no longer falls back to C++ Parser + Generator
- create_dynamic_function() no longer falls back to C++ Parser
- ShadowRealm eval no longer falls back to C++ Parser + Generator
- Interpreter::run(Script&) no longer falls back to Generator
Also remove the now-dead old constructors that took C++ AST nodes,
the module_requests() helper, and AST dump code from js.cpp.
Replace the OwnPtr<IndexedPropertyStorage> indirection with inline
indexed element storage directly on Object. This eliminates virtual
dispatch and reduces indirection for indexed property access.
The new system uses three storage kinds tracked by IndexedStorageKind:
- Packed: Dense array, no holes. Elements stored in a malloced Value*
array with capacity header (same layout as named properties).
- Holey: Dense array with possible holes marked by empty sentinel.
Same physical layout as Packed.
- Dictionary: Sparse storage using GenericIndexedPropertyStorage,
type-punned into the m_indexed_elements pointer.
Transitions: None->Packed->Holey->Dictionary (mostly monotonic).
Dictionary mode triggers on non-default attributes or sparse arrays.
Object keeps the same 48-byte size since m_indexed_elements (8 bytes)
replaces IndexedProperties (8 bytes), and the storage kind + array
size fit in existing padding alongside m_flags.
The asm interpreter benefits from one fewer indirection: it now reads
the element pointer and array size directly from Object fields instead
of chasing through OwnPtr -> IndexedPropertyStorage -> Vector.
Removes: IndexedProperties, SimpleIndexedPropertyStorage,
IndexedPropertyStorage, IndexedPropertyIterator.
Keeps: GenericIndexedPropertyStorage (for Dictionary mode).
Add Mov2 and Mov3 bytecode instructions that perform 2 or 3 register
moves in a single dispatch. A peephole optimization pass during
bytecode assembly merges consecutive Mov instructions within each
basic block into these combined instructions.
When merging, identical Movs are deduplicated (e.g. two identical Movs
become a single Mov, not a Mov2). This optimization is implemented in
both the C++ and Rust codegen pipelines.
The goal is to reduce the per-instruction dispatch overhead, which is
significant compared to the actual cost of moving a value.
This isn't fancy or elegant, but provides a real speed-up on many
workloads. As an example, Kraken/imaging-desaturate.js improves by
~1.07x on my laptop.
Replace the 16-byte Variant<Empty, GC::Ref<Script>, GC::Ref<Module>>
with a simple 8-byte GC::Ptr<Cell> that points to either a Script or
Module (or is null for Empty).
A helper function script_or_module_from_cell() converts back to the
full ScriptOrModule variant when needed (e.g. in
VM::get_active_script_or_module).
This field was written by push_inline_frame but never read anywhere.
The caller's executable is accessible via caller_frame->executable
if ever needed.
Shrinks ExecutionContext from 120 to 112 bytes.
The arguments Span (pointer + size = 16 bytes) was always derivable
from the tail array layout: data = values + (total_count - arg_count).
Replace it with a u32 argument_count and derive the span on demand
via arguments_span() / arguments_data() accessors.
Shrinks ExecutionContext from 136 to 120 bytes.
Remove four fields that are trivially derivable from other fields
already present in the ExecutionContext:
- global_object (from realm)
- global_declarative_environment (from realm)
- identifier_table (from executable)
- property_key_table (from executable)
This shrinks ExecutionContext from 192 to 160 bytes (-17%).
The asmint's GetGlobal/SetGlobal handlers now load through the realm
pointer, taking advantage of the cached declarative environment
pointer added in the previous commit.
Instead of storing a u32 index into a cache vector and looking up the
cache at runtime through a chain of dependent loads (load Executable*,
load vector data pointer, multiply index, add), store the actual cache
pointer as a u64 directly in the instruction stream.
A fixup pass (Executable::fixup_cache_pointers()) runs after Executable
construction in both the Rust and C++ pipelines, walking the bytecode
and replacing each index with the corresponding pointer.
The cache pointer type is encoded in Bytecode.def (e.g.
PropertyLookupCache*, GlobalVariableCache*) so the fixup switch is
auto-generated by the Python Op code generator, making it impossible
to forget updating the fixup when adding new cached instructions.
This eliminates 3-4 dependent loads on every inline cache access in
both the C++ interpreter and the assembly interpreter.
Add a new interpreter that executes bytecode via generated assembly,
written in a custom DSL (asmint.asm) that AsmIntGen compiles to
native x86_64 or aarch64 code.
The interpreter keeps the bytecode program counter and register file
pointer in machine registers for fast access, dispatching opcodes
through a jump table. Hot paths (arithmetic, comparisons, property
access on simple objects) are handled entirely in assembly, with
cold/complex operations calling into C++ helper functions defined
in AsmInterpreter.cpp.
A small build-time tool (gen_asm_offsets) uses offsetof() to emit
struct field offsets as constants consumed by the DSL, ensuring the
assembly stays in sync with C++ struct layouts.
The interpreter is enabled by default on platforms that support it.
The C++ interpreter can be selected via LIBJS_USE_CPP_INTERPRETER=1.
Currently supported platforms:
- Linux/x86_64
- Linux/aarch64
- macOS/x86_64
- macOS/aarch64
Move Interpreter::get() and set() from the .cpp file into the header
as inline methods. Make handle_exception(), perform_call(),
perform_call_impl(), and the HandleExceptionResponse enum public so
they can be called by the upcoming assembly interpreter's C++ glue
code. Also add set_running_execution_context() for the same reason.
This path will replace manual execution-context stack resizing with
vm().pop_execution_context() in the inline unwind paths. Apply this
in both exception unwinding and inline return handling so frame
teardown consistently goes through the VM’s canonical pop logic,
reducing the risk of execution-context stack desynchronization.
Instead of recursing through 5 native stack frames per JS function
call (execute_call -> internal_call -> ordinary_call_evaluate_body ->
run_executable -> run_bytecode), handle Call and CallConstruct for
normal ECMAScript functions directly in the dispatch loop.
The fast path allocates the callee's execution context on the
InterpreterStack, copies arguments, sets up the environment, and
jumps to the callee's bytecode entry point. Return and End unwind
inline frames by restoring the caller's state. Exception unwinding
walks through inline frames to find handlers.
The fast path code is kept in NEVER_INLINE helper functions
(try_inline_call, try_inline_call_construct, pop_inline_frame) to
minimize register pressure in the dispatch loop. handle_exception
takes program_counter by value to avoid forcing it onto the stack.
Reloading of bytecode/program_counter after frame switches is done
inline at each call site via RELOAD_AND_GOTO_START to preserve a
single dispatch entry point for optimal indirect branch prediction.
Replace alloca-based execution context allocation with InterpreterStack
bump allocation across all call sites: bytecode call instructions,
AbstractOperations call/construct, script evaluation, module evaluation,
and LibWeb module script evaluation.
Also replace the native stack space check with an InterpreterStack
exhaustion check, and remove the now-unused alloca macros from
ExecutionContext.h.
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.
This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Per step 13 of ScriptEvaluation in the ECMA-262 spec, the script body
should only be evaluated if GlobalDeclarationInstantiation returned a
normal completion.
This can't currently be triggered since we always create fresh Script
objects, but if we ever start reusing cached executables across
evaluations, this would prevent a subtle bug where the script body
runs despite GDI failing.
AsyncIteratorClose is now fully inlined as bytecode in ASTCodegen.cpp,
using the Await bytecode op to yield naturally. The C++ implementation
used synchronous await() which spins the event loop, violating
assertions when execution contexts are on the stack.
Change the completion_value field from Optional<Value> to Operand
in both IteratorClose and AsyncIteratorClose bytecode instructions.
This allows passing a dynamic value from a register, which is needed
for iterator close on abrupt completion where the exception value
is not known at codegen time.
Remove CodeGenerationError and make all bytecode generation functions
return their results directly instead of wrapping them in
CodeGenerationErrorOr.
For the few remaining sites where codegen encounters an unimplemented
or unexpected AST node, we now use a new emit_todo() helper that emits
a NewTypeError + Throw sequence at compile time (preserving the runtime
behavior) and then switches to a dead basic block so subsequent codegen
for the same function can continue without issue.
This allows us to remove error handling from all callers of the
bytecode compiler, simplifying the code significantly.
After compiling the bytecode executable on first run, null out the
AST (m_parse_node) and clear AnnexB candidates since they are no
longer needed. This frees the memory held by the entire AST for the
script's lifetime.
The parse_node() accessor now returns a nullable pointer. Callers
(js.cpp for AST dumping, Interpreter for first compilation) access
the AST before it is dropped.
Add Script::global_declaration_instantiation() that performs the GDI
algorithm using pre-computed name lists and shared function data
instead of walking the AST.
Runtime checks (has_lexical_declaration, can_declare_global_function,
etc.) remain since they depend on global environment state. AnnexB
iterates pre-collected candidates and calls
set_should_do_additional_annexB_steps() on stored refs.
The Interpreter::run(Script&) now calls the Script method instead of
the Program method.
Replace the runtime uses of formal_parameters() with pre-computed data:
- m_formal_parameter_count stores the parameter count
- m_parameter_names_for_mapped_arguments stores ordered parameter names
for simple parameter lists (used by create_mapped_arguments_object)
Change create_mapped_arguments_object to take Span<Utf16FlyString>
instead of NonnullRefPtr<FunctionParameters const>.
Remove virtual formal_parameters() from FunctionObject as it is no
longer needed.
Replace the ClassExpression const& reference in the NewClass
instruction with a u32 class_blueprint_index. The interpreter now
reads from the ClassBlueprint stored on the Executable and calls
construct_class() instead of the AST-based create_class_constructor().
Literal field initializers (numbers, booleans, null, strings, negated
numbers) are used directly in construct_class() without creating an
ECMAScriptFunctionObject, avoiding function creation overhead for
common field patterns like `x = 0` or `name = "hello"`.
Set class_field_initializer_name on SharedFunctionInstanceData at
codegen time for statically-known field keys (identifiers, private
identifiers, string literals, and numeric literals). For computed
keys, the name is set at runtime in construct_class().
ClassExpression AST nodes are no longer referenced from bytecode.
Replace the FunctionNode const& stored on the NewFunction bytecode
instruction with an index into a table of pre-created
SharedFunctionInstanceData objects on the Executable.
During bytecode compilation, we now eagerly create
SharedFunctionInstanceData for each function that will be
instantiated by NewFunction, and store it on both the FunctionNode
(for caching) and the Executable (for GC tracing).
At runtime, NewFunction simply looks up the SharedFunctionInstanceData
by index and calls create_from_function_data() directly, bypassing
the AST entirely. This removes one of the main reasons the AST had
to stay alive after compilation.
The instantiate_ordinary_function_expression() helper in
Interpreter.cpp is removed as its non-trivial code path (creating a
scope for named function expressions) was dead code -- it was only
called when !has_name(), so the has_own_name branch never executed.
delete super.x and delete super[expr] always throw a ReferenceError
per spec. Instead of deferring this to runtime via DeleteByIdWithThis
and DeleteByValueWithThis instructions, emit the throw directly during
bytecode generation.
Remove the now-unused DeleteByIdWithThis and DeleteByValueWithThis
instructions, and add a NewReferenceError instruction.
This comment referenced the old runtime unwind context stack behavior
where a flag had to be set to prevent yield from going through a
finally statement. That mechanism was removed and finally is now
handled purely through explicit completion records in bytecode.
After replacing the runtime unwind context stack with explicit
completion records for try/finally dispatch, the distinction between
"handler" (catch) and "finalizer" (finally) in the exception handler
table is no longer meaningful at runtime.
handle_exception() checked handler first, then finalizer, but they
did the exact same thing (set the PC). When both were present, the
finalizer was dead code.
Collapse both fields into a single handler_offset (now non-optional,
since an entry always has a target), remove the finalizer concept
from BasicBlock, UnwindContext, and ExceptionHandlers, and simplify
handle_exception() to a direct assignment.
The runtime unwind context stack was pushed by EnterUnwindContext
and popped by LeaveUnwindContext. With both opcodes removed, it is
no longer read or written by anything.
Remove UnwindInfo, the unwind_contexts vector, its GC visit loop,
its copy in ExecutionContext::copy(), and the VERIFY assertions that
referenced it in handle_exception() and catch_exception().
LeaveUnwindContext popped the runtime unwind context stack. With the
stack being removed, all emission sites become dead code. Remove the
opcode and all its emissions.