Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Remove CodeGenerationError and make all bytecode generation functions
return their results directly instead of wrapping them in
CodeGenerationErrorOr.
For the few remaining sites where codegen encounters an unimplemented
or unexpected AST node, we now use a new emit_todo() helper that emits
a NewTypeError + Throw sequence at compile time (preserving the runtime
behavior) and then switches to a dead basic block so subsequent codegen
for the same function can continue without issue.
This allows us to remove error handling from all callers of the
bytecode compiler, simplifying the code significantly.
Add static factory methods create_for_function_node() on
SharedFunctionInstanceData and update all callers to use them instead
of FunctionNode::ensure_shared_data().
This removes the GC::Root<SharedFunctionInstanceData> cache from
FunctionNode, eliminating the coupling between the RefCounted AST
and GC-managed runtime objects. The cache was effectively dead code:
hoisted declarations use m_functions_to_initialize directly, and
function expressions always create fresh instances during codegen.
Now that initialize_environment() uses pre-computed data and
execute_module() caches its executable / uses TLA shared data,
we can drop the AST reference after it's no longer needed.
For TLA modules, the AST is dropped immediately after constructing
the SharedFunctionInstanceData (which takes its own ref). For non-TLA
modules, the AST is dropped after the first bytecode compilation.
Also remove the m_default_export field (replaced by the pre-computed
m_default_export_binding_name) and extract default export info in
parse() instead of the constructor.
For non-TLA modules, cache the compiled Bytecode::Executable on the
SourceTextModule so we only compile once.
For TLA modules, pre-create the SharedFunctionInstanceData for the
async wrapper function at construction time. The wrapper function's
first invocation will compile it to bytecode and then automatically
drop the AST via clear_compile_inputs().
Extract the data needed by initialize_environment() from the AST at
construction time, following the pattern already used by Script for
global declaration instantiation.
This pre-computes var declared names, lexical bindings (with function
indices), functions to initialize (with SharedFunctionInstanceData),
and the default export binding name.
Every function call allocates an ExecutionContext with a trailing array
of Values for registers, locals, constants, and arguments. Previously,
the constructor would initialize all slots to js_special_empty_value(),
but constant slots were then immediately overwritten by the interpreter
copying in values from the Executable before execution began.
To eliminate this redundant initialization, we rearrange the layout from
[registers | constants | locals] to [registers | locals | constants].
This groups registers and locals together at the front, allowing us to
initialize only those slots while leaving constant slots uninitialized
until they're populated with their actual values.
This reduces the per-call initialization cost from O(registers + locals
+ constants) to O(registers + locals).
Also tightens up the types involved (size_t -> u32) and adds VERIFYs to
guard against overflow when computing the combined slot counts, and to
ensure the total fits within the 29-bit operand index field.
This moves the responsibility of setting up a SourceCode object to the
users of JS::Lexer.
This means Lexer and Parser are free to use string views into the
SourceCode internally while working.
It also means Lexer no longer has to think about anything other than
UTF-16 (or ASCII) inputs. So the unit test for parsing various invalid
UTF-8 sequences is deleted here.
We don't need to return two values; running an executable only ever
produces a throw completion, or a normal completion, i.e a Value.
This necessitated a few minor changes, such as adding a way to check
if a JS::Cell is a GeneratorResult.
This has quite a lot of fall out. But the majority of it is just type or
UDL substitution, where the changes just fall through to other function
calls.
By changing property key storage to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
This is better because:
- Better data locality
- Allocate vector for registers+constants+locals+arguments in one go
instead of allocating two vectors separately
The special empty value (that we use for array holes, Optional<Value>
when empty and a few other other placeholder/sentinel tasks) still
exists, but you now create one via JS::js_special_empty_value() and
check for it with Value::is_special_empty_value().
The main idea here is to make it very unlikely to accidentally create an
unexpected special empty value.
Instead of making a copy of the Vector<FunctionParameter> from the AST
every time we instantiate an ECMAScriptFunctionObject, we now keep the
parameters in a ref-counted FunctionParameters object.
This reduces memory usage, and also allows us to cache the bytecode
executables for default parameter expressions without recompiling them
for every instantiation. :^)
While we don't yet have a working `using` implementation with our byte
code, we can still keep our DisposableStack implementation up to date.
The changes brought in here are all editorial, and set us up to start
an AsyncDisposableStack implementation.
From what I understand, the suspension steps are not required now,
or in the future for our implementation, or any other. The intent
is already implemented in the spec pushing on another execution
context to the stack and leaving the running execution context as-is.
The resume steps are a slightly different story as there is some subtle
behavior which the spec is trying to convey where some custom logic may
need to be done when one execution context changes from one to another.
It may be worth implementing those steps at a later point in time so
that this behavior is a bit easier to follow in those cases.
To make the situation more confusing - from what I can gather from the
spec, not all cases that the spec mentions resume actually means
anything normative. Resume is only _actually_ needed in a limited set
of locations.
For now, let's just remove the unneeded FIXMEs that indicate that there
is something to be done for the suspension steps, as there is not, and
leave the resume steps as is.
Previously it was only pushing the module context for the call to
capture the module execution context. This is incorrect, as the capture
occurs upon function construction. This resulted in it capturing the
execution context that execute_module was called from, instead of the
newly created module_context.
f87041bf3a/Libraries/LibJS/Runtime/ECMAScriptFunctionObject.cpp (L92)
This can be demonstrated with the following setup:
index.html:
```html
<script>
var foo = 1;
</script>
<script type="module">
import {test} from "./scriptA.mjs";
</script>
```
scriptA.mjs:
```js
function foo() {
return {a: "b"};
}
export let test = await foo();
```
Before this fix, this would throw:
```
[TypeError] 1 is not a function (evaluated from 'foo')
at module code with top-level await
at module code with top-level await
at <unknown>
at <unknown>
```
Fixes#2245.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
Now that the heap has no knowledge about a JavaScript realm and is
purely for managing the memory of the heap, it does not make sense
to name this function to say that it is a non-realm variant.