Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Remove CodeGenerationError and make all bytecode generation functions
return their results directly instead of wrapping them in
CodeGenerationErrorOr.
For the few remaining sites where codegen encounters an unimplemented
or unexpected AST node, we now use a new emit_todo() helper that emits
a NewTypeError + Throw sequence at compile time (preserving the runtime
behavior) and then switches to a dead basic block so subsequent codegen
for the same function can continue without issue.
This allows us to remove error handling from all callers of the
bytecode compiler, simplifying the code significantly.
Change eval_declaration_instantiation to take EvalDeclarationData&
instead of Program const&. The function body now iterates
pre-computed name lists instead of walking the AST.
Both callers (perform_eval and perform_shadow_realm_eval) now build
EvalDeclarationData before calling eval_declaration_instantiation.
This decouples the runtime declaration-instantiation API from AST
types, matching the pattern already used by Script for global
declaration instantiation.
Every function call allocates an ExecutionContext with a trailing array
of Values for registers, locals, constants, and arguments. Previously,
the constructor would initialize all slots to js_special_empty_value(),
but constant slots were then immediately overwritten by the interpreter
copying in values from the Executable before execution began.
To eliminate this redundant initialization, we rearrange the layout from
[registers | constants | locals] to [registers | locals | constants].
This groups registers and locals together at the front, allowing us to
initialize only those slots while leaving constant slots uninitialized
until they're populated with their actual values.
This reduces the per-call initialization cost from O(registers + locals
+ constants) to O(registers + locals).
Also tightens up the types involved (size_t -> u32) and adds VERIFYs to
guard against overflow when computing the combined slot counts, and to
ensure the total fits within the 29-bit operand index field.
This moves the responsibility of setting up a SourceCode object to the
users of JS::Lexer.
This means Lexer and Parser are free to use string views into the
SourceCode internally while working.
It also means Lexer no longer has to think about anything other than
UTF-16 (or ASCII) inputs. So the unit test for parsing various invalid
UTF-8 sequences is deleted here.
We don't need to return two values; running an executable only ever
produces a throw completion, or a normal completion, i.e a Value.
This necessitated a few minor changes, such as adding a way to check
if a JS::Cell is a GeneratorResult.
This has quite a lot of fall out. But the majority of it is just type or
UDL substitution, where the changes just fall through to other function
calls.
By changing property key storage to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
This is an active proposal at stage 3 of the TC39 proposal process.
See: https://tc39.es/proposal-dynamic-code-brand-checks/
See: https://github.com/tc39/proposal-dynamic-code-brand-checks
This proposal essentially adds support for the TrustedScript type from
the Trusted Types specification to eval and Function. This in turn
pipes support for the type into the CSP hook to check if the CSP allows
dynamic code compilation.
However, it currently doesn't support ShadowRealms, so the
implementation here is a close approximation, using PerformEval as the
basis.
See: https://github.com/tc39/proposal-dynamic-code-brand-checks/issues/19
This is required to support the new function signature for the CSP
hook, and will allow us to slot in Trusted Types support in the future.
This is better because:
- Better data locality
- Allocate vector for registers+constants+locals+arguments in one go
instead of allocating two vectors separately
The special empty value (that we use for array holes, Optional<Value>
when empty and a few other other placeholder/sentinel tasks) still
exists, but you now create one via JS::js_special_empty_value() and
check for it with Value::is_special_empty_value().
The main idea here is to make it very unlikely to accidentally create an
unexpected special empty value.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root