Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
  aborting on any difference in the generated AST or bytecode.
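A minimal sketch of how the two environment variables could select a
pipeline. ToyPipeline, choose_pipeline() and
choose_pipeline_from_environment() are hypothetical names, not LibJS
API. The sketch assumes that LIBJS_CPP counts as enabled whenever it is
set and that LIBJS_COMPARE_PIPELINES=1 takes precedence over it:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical enum for illustration; not a LibJS type.
enum class ToyPipeline { Rust, Cpp, CompareBoth };

// Pure selection logic. Each argument is the value of the corresponding
// environment variable, or nullptr if the variable is unset.
inline ToyPipeline choose_pipeline(char const* libjs_cpp, char const* libjs_compare_pipelines)
{
    // Assumption: comparison mode wins if both variables are set.
    if (libjs_compare_pipelines && std::strcmp(libjs_compare_pipelines, "1") == 0)
        return ToyPipeline::CompareBoth;
    // Assumption: any value (even an empty one) enables the C++ pipeline.
    if (libjs_cpp)
        return ToyPipeline::Cpp;
    return ToyPipeline::Rust;
}

// Thin wrapper that reads the real process environment.
inline ToyPipeline choose_pipeline_from_environment()
{
    return choose_pipeline(std::getenv("LIBJS_CPP"), std::getenv("LIBJS_COMPARE_PIPELINES"));
}
```

Keeping the decision in a pure function makes it testable without
touching the process environment.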
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Add static factory methods create_for_function_node() on
SharedFunctionInstanceData and update all callers to use them instead
of FunctionNode::ensure_shared_data().
This removes the GC::Root<SharedFunctionInstanceData> cache from
FunctionNode, eliminating the coupling between the RefCounted AST
and GC-managed runtime objects. The cache was effectively dead code:
hoisted declarations use m_functions_to_initialize directly, and
function expressions always create fresh instances during codegen.
parse_builtin_file() previously returned FunctionDeclaration AST nodes
stored in static vectors, keeping the full AST alive for the entire
process lifetime. Change it to return SharedFunctionInstanceData
objects directly, allowing the parsed Program and its AST nodes to be
freed when the function returns.
Each SharedFunctionInstanceData holds its own ref to the function body
AST via m_ecmascript_code, which is automatically dropped when
clear_compile_inputs() runs after first bytecode compilation.
This field is rarely accessed but we were creating it for every single
script function instantiated.
It's a little awkward but the same optimization can be found in other
engines, so it's nothing crazy.
This avoids creating roughly 80,000 objects on my x.com home feed.
This allows us to use the bytecode implementation of await, which
correctly suspends execution contexts and handles completion
injections.
This gains us 4 test262 tests around mutating Array.fromAsync's
iterable whilst it's suspended as well.
This is also one step towards removing spin_until, which the
non-bytecode implementation of await uses.
```
Duration:
-5.98s
Summary:
Diff Tests:
+4 ✅ -4 ❌
Diff Tests:
[...]/Array/fromAsync/asyncitems-array-add-to-singleton.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-add.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-mutate.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-remove.js ❌ -> ✅
```
This has quite a lot of fallout, but the majority of it is just type or
UDL substitution, where the changes simply fall through to other
function calls.
By changing property key storage to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
...when Array.prototype and Object.prototype are intact.
If `internal_set()` is called on an array exotic object with a numeric
PropertyKey, and:
- the prototype chain has not been modified (i.e., there are no getters
or setters for indexed properties), and
- the array is not the target of a Proxy object,
then we can directly store the value in the receiver's indexed
properties, without checking whether it already exists somewhere in the
prototype chain.
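The two conditions above can be sketched roughly as follows. ToyArray
and its fields are hypothetical stand-ins, not the LibJS Array class,
and the slow path is only counted here rather than implemented:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, heavily simplified model of an array exotic object.
struct ToyArray {
    std::map<uint32_t, int> indexed_properties;
    // True while no indexed getters/setters exist on the prototype chain.
    bool prototype_chain_unmodified { true };
    // True if some Proxy object wraps this array.
    bool is_proxy_target { false };
    // Counts how often the slow path would have walked the prototype chain.
    int prototype_chain_walks { 0 };
};

inline bool can_store_directly(ToyArray const& array)
{
    return array.prototype_chain_unmodified && !array.is_proxy_target;
}

inline void toy_internal_set(ToyArray& array, uint32_t index, int value)
{
    if (!can_store_directly(array)) {
        // Slow path: a real engine would search the prototype chain for an
        // existing property or setter before storing. We only count it.
        ++array.prototype_chain_walks;
    }
    // In this toy model both paths end with a store on the receiver.
    array.indexed_properties[index] = value;
}
```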
1.7x improvement on the following program:
```js
function f() {
    let a = [];
    let i = 0;
    while (i < 10_000_000) {
        a.push(i);
        i++;
    }
}
f();
```
- Avoids unnecessary conversions between StringOrSymbol and PropertyKey
on the hot path of property access.
- Simplifies the code by removing StringOrSymbol and using PropertyKey
directly. There was no reason to have a separate StringOrSymbol type
representing the same data as PropertyKey, just with the index key
stored as a string.
PropertyKey has been updated to use a tagged pointer instead of a
Variant, so it still occupies 8 bytes, same as StringOrSymbol.
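For illustration, a tagged pointer can pack either an index or a pointer
into a single 8-byte word roughly like this. ToyKey is a hypothetical
type; the tag layout (low bit set means "index") is an assumption made
for this sketch and is not necessarily what PropertyKey does:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical key holding either a 32-bit index or a string pointer.
class ToyKey {
public:
    static ToyKey from_index(uint32_t index)
    {
        // Shift the index up by one bit and set the low bit as the tag.
        return ToyKey((static_cast<uint64_t>(index) << 1) | 1u);
    }

    static ToyKey from_string(std::string const* string)
    {
        auto bits = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(string));
        // Alignment leaves the low bit of the pointer free for the tag.
        assert((bits & 1u) == 0);
        return ToyKey(bits);
    }

    bool is_index() const { return (m_bits & 1u) != 0; }
    uint32_t as_index() const { return static_cast<uint32_t>(m_bits >> 1); }
    std::string const& as_string() const
    {
        return *reinterpret_cast<std::string const*>(static_cast<uintptr_t>(m_bits));
    }

private:
    explicit ToyKey(uint64_t bits)
        : m_bits(bits)
    {
    }

    uint64_t m_bits;
};

static_assert(alignof(std::string) >= 2, "low pointer bit must be free");
static_assert(sizeof(ToyKey) == 8, "key must fit in one 8-byte word");
```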
12% improvement on JetStream/gcc-loops.cpp.js
12% improvement on MicroBench/object-assign.js
7% improvement on MicroBench/object-keys.js
~2% of the Speedometer 2.1 profile was just repeatedly performing the
shape transitions to add these two properties. We can avoid all that
work by caching a premade shape.
This avoids going through all the shape transitions when setting up the
most common form of ESFO.
This is extremely hot on Uber Eats, and this provides some relief.
In the very common case that no special constructor options are provided
for the Intl.Collator when calling localeCompare() on a string, we can
cache and reuse a default-constructed Intl.Collator, saving lots of time
and space.
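The caching idea can be sketched as below. ToyCollator,
ToyCollatorOptions and toy_locale_compare() are hypothetical names, and
the byte-wise comparison merely stands in for real locale-aware
collation:

```cpp
#include <cassert>
#include <string>

// Hypothetical options type; a null pointer below means "no options given".
struct ToyCollatorOptions {
    bool numeric { false };
};

// Hypothetical stand-in for an expensive-to-construct Intl.Collator.
struct ToyCollator {
    explicit ToyCollator(ToyCollatorOptions options = {})
        : options(options)
    {
        ++construction_count;
    }

    // Placeholder comparison: byte-wise, not locale-aware.
    int compare(std::string const& a, std::string const& b) const
    {
        int result = a.compare(b);
        return (result > 0) - (result < 0);
    }

    ToyCollatorOptions options;
    static inline int construction_count = 0;
};

// Constructed once on first use, then reused for every later call.
inline ToyCollator const& default_toy_collator()
{
    static ToyCollator collator;
    return collator;
}

inline int toy_locale_compare(std::string const& a, std::string const& b,
    ToyCollatorOptions const* options = nullptr)
{
    // Only calls with explicit options pay for a fresh collator.
    if (options)
        return ToyCollator(*options).compare(a, b);
    return default_toy_collator().compare(a, b);
}
```

A sort comparator that calls toy_locale_compare() without options
constructs the collator once, no matter how many comparisons it makes.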
This shaves a fair bit of load time off of https://wpt.fyi/ where they
use Array.prototype.sort() and localeCompare() to sort a big JSON thing.
Time spent in sort():
- Before: 1656 ms
- After: 135 ms
This also includes a stubbed Temporal.Duration.prototype.
Until we have re-implemented Temporal.PlainDate/ZonedDateTime, parts of
Temporal.Duration.compare (and the AOs it invokes) are left unimplemented.
Our Temporal implementation is woefully out of date. The spec has been
so vastly rewritten that it is unfortunately not practical to update our
implementation in-place. Even just removing Temporal objects that were
removed from the spec, or updating any of the simpler remaining objects,
has proven to be a mess in previous attempts.
So, this removes our Temporal implementation. AOs used by other specs
are left intact.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
Now that the heap has no knowledge of a JavaScript realm and exists
purely to manage heap memory, it does not make sense for this
function's name to say that it is a non-realm variant.
The main motivation behind this is to remove JS specifics of the Realm
from the implementation of the Heap.
As a side effect of this change, this is a bit nicer to read than the
previous approach and, in my opinion, also makes it a little clearer
that this method is specific to a JavaScript Realm.