Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Add the ability to dump AST and bytecode to a String instead of only
to stdout/stderr. This is done by adding an optional StringBuilder
output sink to ASTDumpState, and a new dump_to_string() method on
both ASTNode and Bytecode::Executable.
These will be used for comparing output between compilation pipelines.
The scope collector uses HashMaps for identifier groups and variables,
which means their iteration order is non-deterministic. This causes
local variable indices and function declaration instantiation (FDI)
bytecode to vary between runs.
Fix this by sorting identifier group keys alphabetically before
assigning local variable indices, and sorting vars_to_initialize by
name before emitting FDI bytecode.
Also make register allocation deterministic by always picking the
lowest-numbered free register instead of whichever one happens to be
at the end of the free list.
This is preparation for bringing in a new source->bytecode pipeline
written in Rust. Checking for regressions is significantly easier
if we can expect identical output from both pipelines.
For `export default (class Name { })`, two things were wrong:
The parser extracted the class expression's name as the export's
local binding name instead of `*default*`. Per the spec, this is
`export default AssignmentExpression ;` whose BoundNames is
`*default*`, not the class name.
The bytecode generator had a special case for ClassExpression that
skipped emitting InitializeLexicalBinding for named classes.
These two bugs compensated for each other (no crash, but wrong
behavior). Fix both: always use `*default*` as the local binding
name for expression exports, and always emit InitializeLexicalBinding
for the `*default*` binding.
When a statement in a switch case body doesn't produce a result (e.g.
a variable declaration), we were incorrectly resetting the completion
value to undefined. This caused the completion value of preceding
expression statements to be lost.
Per step 13 of ScriptEvaluation in the ECMA-262 spec, the script body
should only be evaluated if GlobalDeclarationInstantiation returned a
normal completion.
This can't currently be triggered since we always create fresh Script
objects, but if we ever start reusing cached executables across
evaluations, this would prevent a subtle bug where the script body
runs despite GDI failing.
When emitting block declaration instantiation, we were not calling
set_local_initialized() after writing block-scoped function
declarations to local variables via Mov. This caused unnecessary
ThrowIfTDZ checks to be emitted when those locals were later read.
Block-scoped function declarations are always initialized at block
entry (via NewFunction + Mov), so TDZ checks for them are redundant.
Move the duplicated ThrowIfTDZ emission logic from three places in
ASTCodegen.cpp into a single Generator::emit_tdz_check_if_needed()
helper. This handles both argument TDZ (which requires a Mov to
empty first) and lexically-declared variable TDZ uniformly.
This avoids emitting some unnecessary ThrowIfTDZ instructions.
The find_source_record lambda was doing a reverse linear scan through
the entire source map for every instruction emitted, resulting in
quadratic behavior. This was catastrophic for large scripts like
Octane/mandreel.js, where compile() dominated the profile at ~30s.
Since both source map entries and instruction iteration are ordered by
offset, replace the per-instruction scan with a forward cursor that
advances in lockstep with instruction emission.
The compile() function was adding source map entries for all
instructions in a block upfront, before processing assembly-time
optimizations (Jump-to-next-block elision, Jump-to-Return/End inlining,
JumpIf-to-JumpTrue/JumpFalse conversion). When a Jump was skipped,
its phantom source map entry remained at the offset where the next
block's first instruction would be placed, causing binary_search to
find the wrong source location for error messages.
Fix by building source map entries inline with instruction emission,
ensuring only actually-emitted instructions get entries. For blocks
with duplicate source map entries at the same offset (from rewind in
fuse_compare_and_jump), the last entry is used.
Add ThisExpression handling to the expression_identifier() helper used
for base_identifier in bytecode instructions. This makes PutById and
GetById emit base_identifier:this when the base is a this expression.
When MemberExpression::generate_bytecode calls emit_load_from_reference,
it only uses the loaded_value and discards the reference operands. For
computed member expressions (e.g. a[0]), this was generating an
unnecessary Mov to save the property register for potential store-back.
Add a ReferenceMode parameter to emit_load_from_reference. When LoadOnly
is passed, the computed property path skips the register save and Mov.
Per AssignmentRestElement and AssignmentElement in the specification,
the DestructuringAssignmentTarget reference must be evaluated before
iterating or stepping the iterator. We were doing it in the wrong
order, which caused observable differences when the target evaluation
has side effects, and could lead to infinite loops when the iterator
never completes.
Add Generator::emit_evaluate_reference() to evaluate a member
expression's base and property into ReferenceOperands without performing
a load or store, then use the pre-evaluated reference for the store
after iteration completes.
When a function has parameter expressions (e.g. destructured params with
defaults), CreateVariableEnvironment creates a separate variable
environment for function declarations and sets it as the current lexical
environment at runtime. However, the bytecode generator's
m_lexical_environment_register_stack was not updated to reflect this, so
subsequent CreateLexicalEnvironment ops would parent themselves to the
old (pre-variable-environment) lexical environment, skipping the
variable environment entirely.
This meant function declarations hoisted into the variable environment
were invisible to closures created in the function body.
Fix this by capturing the new lexical environment into a register after
CreateVariableEnvironment and pushing it onto the environment register
stack.
This fixes a problem where https://tumblr.com/ wouldn't load the feed.
AsyncIteratorClose is now fully inlined as bytecode in ASTCodegen.cpp,
using the Await bytecode op to yield naturally. The C++ implementation
used synchronous await() which spins the event loop, violating
assertions when execution contexts are on the stack.
The AsyncIteratorClose bytecode op calls async_iterator_close() which
uses synchronous await() internally. This spins the event loop while
execution contexts are on the stack, violating the microtask checkpoint
assertion in LibWeb.
Replace AsyncIteratorClose op emissions in for-await-of close handlers
with inline bytecode that uses the proper Await op, allowing the async
function to yield and resume naturally through the event loop.
For the non-throw path (break/return/continue-to-outer): emit
GetMethod, Call, Await, and ThrowIfNotObject inline.
For the throw path: wrap the close steps in an exception handler so
that any error from GetMethod/Call/Await is discarded and the original
exception is rethrown, per spec step 5.
The else branch already throws ReferenceError and switches to a dead
basic block, so the emit_todo() in the PutValue section is unreachable.
Return early after the throw and replace emit_todo() with
VERIFY_NOT_REACHED().
CallExpression is accepted as an assignment target for web compatibility
(Annex B), but must throw ReferenceError at runtime. We were incorrectly
throwing TypeError with a TODO message.
Replace emit_todo() calls in three codegen paths (simple assignment,
compound assignment/update, and for-in/of) with proper ReferenceError
using the "Invalid left-hand side in assignment" message, matching the
behavior of V8 and JSC.
When a for-of or for-await-of loop exits via break, return, throw,
or continue-to-outer-loop, we now correctly call IteratorClose
(or AsyncIteratorClose) to give the iterator a chance to clean
up resources.
This uses a synthetic FinallyContext that wraps the LHS assignment
and loop body, reusing the existing try/finally completion record
machinery. The ReturnToFinally boundary is placed between Break
and Continue so that continue-to-same-loop bypasses the close
(zero overhead on normal iteration) while all other abrupt exits
route through the iterator close dispatch chain.
for-in (enumerate) does not require iterator close per spec.
Change the completion_value field from Optional<Value> to Operand
in both IteratorClose and AsyncIteratorClose bytecode instructions.
This allows passing a dynamic value from a register, which is needed
for iterator close on abrupt completion where the exception value
is not known at codegen time.
Remove CodeGenerationError and make all bytecode generation functions
return their results directly instead of wrapping them in
CodeGenerationErrorOr.
For the few remaining sites where codegen encounters an unimplemented
or unexpected AST node, we now use a new emit_todo() helper that emits
a NewTypeError + Throw sequence at compile time (preserving the runtime
behavior) and then switches to a dead basic block so subsequent codegen
for the same function can continue without issue.
This allows us to remove error handling from all callers of the
bytecode compiler, simplifying the code significantly.
These checks validate engine-internal usage of builtin abstract
operations (arity, argument types, known operation names), not user JS
code. Replace CodeGenerationError returns with VERIFY() assertions:
- Spread argument check becomes VERIFY(!argument.is_spread)
- Arity checks become VERIFY(arguments.size() == N)
- StringLiteral type checks become VERIFY(message)
- Unknown operation/constant fallthroughs become VERIFY_NOT_REACHED()
Replace CodeGenerationError returns with VERIFY_NOT_REACHED() or
VERIFY() at sites that are provably unreachable:
- Non-computed member expression fallbacks in emit_load_from_reference,
emit_store_to_reference, and emit_delete_reference (member expression
properties are always computed, identifier, or private identifier)
- Two non-computed member expression fallbacks in AssignmentExpression
- Default case in compound assignment switch (all 15 AssignmentOp values
are handled)
- BindingPattern Empty/Expression name+alias pair (computed property
names always require an alias)
- Two assignment+destructuring combinations in for-in/of body evaluation
(is_destructuring is only set for VariableDeclaration lhs, which
always has VarBinding or LexicalBinding kind, never Assignment)
When a class field has a BigInt literal key like `128n = class {}`,
the anonymous class should get the name "128". The codegen path
handles Identifier, StringLiteral, and NumericLiteral keys but was
missing BigInt keys, causing the name to be empty.
Parse the BigInt literal value at codegen time and convert it to a
decimal string for both the field_name (anonymous function naming)
and class_field_initializer_name (eval("arguments") checking) paths.
Add static factory methods create_for_function_node() on
SharedFunctionInstanceData and update all callers to use them instead
of FunctionNode::ensure_shared_data().
This removes the GC::Root<SharedFunctionInstanceData> cache from
FunctionNode, eliminating the coupling between the RefCounted AST
and GC-managed runtime objects. The cache was effectively dead code:
hoisted declarations use m_functions_to_initialize directly, and
function expressions always create fresh instances during codegen.
After compiling the bytecode executable on first run, null out the
AST (m_parse_node) and clear AnnexB candidates since they are no
longer needed. This frees the memory held by the entire AST for the
script's lifetime.
The parse_node() accessor now returns a nullable pointer. Callers
(js.cpp for AST dumping, Interpreter for first compilation) access
the AST before it is dropped.
Add Script::global_declaration_instantiation() that performs the GDI
algorithm using pre-computed name lists and shared function data
instead of walking the AST.
Runtime checks (has_lexical_declaration, can_declare_global_function,
etc.) remain since they depend on global environment state. AnnexB
iterates pre-collected candidates and calls
set_should_do_additional_annexB_steps() on stored refs.
The Interpreter::run(Script&) now calls the Script method instead of
the Program method.
Extract FunctionParsingInsights into its own header and introduce
FunctionLocal as a standalone mirror of Identifier::Local. This
allows SharedFunctionInstanceData.h to avoid pulling in the full
AST type hierarchy, reducing transitive include bloat.
The AST.h include is kept in SharedFunctionInstanceData.cpp where
it's needed for the constructor that accesses AST node types.
Replace the runtime uses of formal_parameters() with pre-computed data:
- m_formal_parameter_count stores the parameter count
- m_parameter_names_for_mapped_arguments stores ordered parameter names
for simple parameter lists (used by create_mapped_arguments_object)
Change create_mapped_arguments_object to take Span<Utf16FlyString>
instead of NonnullRefPtr<FunctionParameters const>.
Remove virtual formal_parameters() from FunctionObject as it is no
longer needed.
Pre-compute the data that emit_function_declaration_instantiation
previously obtained by querying ScopeNode methods at codegen time:
- m_has_scope_body: whether ecmascript_code is a ScopeNode
- m_has_non_local_lexical_declarations: from ScopeNode query
- m_lexical_bindings: non-local lexically-scoped identifier names and
their constant-declaration status
After this change, emit_function_declaration_instantiation no longer
casts m_ecmascript_code to ScopeNode or calls any ScopeNode methods.
Replace Vector<FunctionDeclaration const&> with a FunctionToInitialize
struct that stores a pre-created SharedFunctionInstanceData, function
name, and local index. The SharedFunctionInstanceData for each hoisted
function is created eagerly during the parent's construction, removing
the need to reference FunctionDeclaration AST nodes after construction.
Replace VariableNameToInitialize (which holds Identifier const&) with a
VarBinding struct that stores pre-extracted values: name, local index,
parameter_binding, and function_name. This removes a reference to AST
Identifier nodes from SharedFunctionInstanceData, allowing the AST to
be freed after compilation.
Replace the ClassExpression const& reference in the NewClass
instruction with a u32 class_blueprint_index. The interpreter now
reads from the ClassBlueprint stored on the Executable and calls
construct_class() instead of the AST-based create_class_constructor().
Literal field initializers (numbers, booleans, null, strings, negated
numbers) are used directly in construct_class() without creating an
ECMAScriptFunctionObject, avoiding function creation overhead for
common field patterns like `x = 0` or `name = "hello"`.
Set class_field_initializer_name on SharedFunctionInstanceData at
codegen time for statically-known field keys (identifiers, private
identifiers, string literals, and numeric literals). For computed
keys, the name is set at runtime in construct_class().
ClassExpression AST nodes are no longer referenced from bytecode.
Build a ClassBlueprint from ClassExpression elements at codegen time:
- Methods/getters/setters: register SharedFunctionInstanceData from
the method's FunctionExpression
- Field initializers with literal values (numbers, booleans, null,
strings, negated numbers): store the value directly, avoiding
function creation entirely
- Field initializers with non-literal values: wrap in
ClassFieldInitializerStatement and create SharedFunctionInstanceData
- Static initializers: create SharedFunctionInstanceData from the
function body
- Constructor: register SharedFunctionInstanceData from the
constructor's FunctionExpression
Add public accessors to ClassMethod::function() and
StaticInitializer::function_body() for codegen access.
The blueprint is registered but not yet used by NewClass (dual path).
No behavioral change.
Introduce ClassBlueprint and ClassElementDescriptor structs that will
replace the AST-backed class construction path. ClassBlueprint stores
pre-compiled function data indices and element metadata, following the
same pattern as SharedFunctionInstanceData for NewFunction.
Add Vector<ClassBlueprint> to Executable for storage.
No behavioral change.
Replace the FunctionNode const& stored on the NewFunction bytecode
instruction with an index into a table of pre-created
SharedFunctionInstanceData objects on the Executable.
During bytecode compilation, we now eagerly create
SharedFunctionInstanceData for each function that will be
instantiated by NewFunction, and store it on both the FunctionNode
(for caching) and the Executable (for GC tracing).
At runtime, NewFunction simply looks up the SharedFunctionInstanceData
by index and calls create_from_function_data() directly, bypassing
the AST entirely. This removes one of the main reasons the AST had
to stay alive after compilation.
The instantiate_ordinary_function_expression() helper in
Interpreter.cpp is removed as its non-trivial code path (creating a
scope for named function expressions) was dead code -- it was only
called when !has_name(), so the has_own_name branch never executed.
After successful bytecode compilation, the m_functions_to_initialize
and m_var_names_to_initialize_binding vectors are no longer needed
as they are only consumed by emit_function_declaration_instantiation()
during code generation.
Add clear_compile_inputs() to release these vectors post-compile,
and call it from both ECMAScriptFunctionObject::get_stack_frame_size()
and NativeJavaScriptBackedFunction::bytecode_executable() after their
respective lazy compilation succeeds.
Also add a pre-compile assertion in Generator::generate_from_function()
to verify we never try to compile the same function data twice, and a
VERIFY in ECMAScriptFunctionObject::ecmascript_code() to guard against
null dereference.
delete super.x and delete super[expr] always throw a ReferenceError
per spec. Instead of deferring this to runtime via DeleteByIdWithThis
and DeleteByValueWithThis instructions, emit the throw directly during
bytecode generation.
Remove the now-unused DeleteByIdWithThis and DeleteByValueWithThis
instructions, and add a NewReferenceError instruction.
Each of the three blocks in a TryStatement (try body, catch body,
finally body) needs its own CompletionRegisterScope so that
break/continue inside any of them carries the block's own
completion value rather than leaking a value from a surrounding
statement or a different block.
Previously, statements inside these blocks would update the
enclosing scope's completion register (e.g. a for-loop's
register), and if break/continue fired with no prior expression
value, the enclosing register's stale value would leak through
as the completion value instead of undefined.
Each block now allocates a fresh register initialized to
undefined and uses it as the current completion register during
body generation. This matches the pattern already used by loops
and switch statements.
When a loop or switch body produces an abrupt completion (break or
continue) with an empty value, the ES spec requires UpdateEmpty to
replace the empty value with the last non-empty completion value V.
The bytecode compiler was failing to do this because it only updated
the completion register after body codegen, guarded by
!is_current_block_terminated(). When break/continue terminated the
block, the update was skipped.
Fix this with three changes:
1. Introduce a CompletionRegisterScope that tells
ScopeNode::generate_bytecode to eagerly emit Mov instructions
into the completion register after each value-producing
statement. This ensures the register is up to date before any
break or continue fires.
2. Give IfStatement its own CompletionRegisterScope (initialized
to undefined) during branch evaluation. This models the spec's
UpdateEmpty(stmtCompletion, undefined) for if-statements: when
break/continue fires inside an if-branch, the scoped jump
propagation sees that the if's completion register differs from
the loop's and emits a Mov, correctly replacing the eagerly
written value with undefined. Without this, code like
{ 3; if (true) { break; } else { } } would incorrectly carry
the value 3 instead of undefined through the break.
3. Capture loop body results and emit a fallback Mov for
non-ScopeNode bodies (e.g. bare expression statements like
do x=1; while(false)) that don't participate in the eager
CompletionRegisterScope update mechanism.
For labelled break/continue that cross loop boundaries, the jump
codegen now propagates the inner completion register to the target
scope's completion register before emitting the jump.
Also fix ForStatement to use a proper completion register
(previously it returned the body result directly, which was wrong
for empty bodies and break-with-no-value cases).
The exception handler table is sorted by start_offset, so use
binary_search instead of a linear scan. This matches the pattern
already used by source_range_at() in the same file.
This comment referenced the old runtime unwind context stack behavior
where a flag had to be set to prevent yield from going through a
finally statement. That mechanism was removed and finally is now
handled purely through explicit completion records in bytecode.
After replacing the runtime unwind context stack with explicit
completion records for try/finally dispatch, the distinction between
"handler" (catch) and "finalizer" (finally) in the exception handler
table is no longer meaningful at runtime.
handle_exception() checked handler first, then finalizer, but they
did the exact same thing (set the PC). When both were present, the
finalizer was dead code.
Collapse both fields into a single handler_offset (now non-optional,
since an entry always has a target), remove the finalizer concept
from BasicBlock, UnwindContext, and ExceptionHandlers, and simplify
handle_exception() to a direct assignment.
The runtime unwind context stack was pushed by EnterUnwindContext
and popped by LeaveUnwindContext. With both opcodes removed, it is
no longer read or written by anything.
Remove UnwindInfo, the unwind_contexts vector, its GC visit loop,
its copy in ExecutionContext::copy(), and the VERIFY assertions that
referenced it in handle_exception() and catch_exception().
LeaveUnwindContext popped the runtime unwind context stack. With the
stack being removed, all emission sites become dead code. Remove the
opcode and all its emissions.