The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
When an async function is resumed from a microtask and hits another
await with a non-thenable value (primitive or already-settled native
promise), and the microtask queue is empty, we can resolve the await
synchronously without suspending. No other microtask can observe the
difference in execution order, making this optimization safe.
This avoids the overhead of creating a GC::Function for the microtask
job, enqueuing/dequeuing from the microtask queue, and the execution
context push/pop that comes with it.
A new VM host hook, host_promise_job_queue_is_empty, is added so both
the standalone js binary and LibWeb can provide the appropriate check
for their respective job queue implementations.
Per spec, every `await` goes through PromiseResolve (which wraps the
value in a new Promise via NewPromiseCapability) and then
PerformPromiseThen (which creates PromiseReaction and JobCallback
objects). This results in 13-16 GC cell allocations per await.
Add a fast path that detects two common cases:
1. Primitive values: These can never have a "then" property, so we
can skip all promise wrapping and directly schedule the async
function's continuation as a microtask.
2. Already-settled native Promises: If the promise has no own
properties and its prototype is the intrinsic %Promise.prototype%,
we can extract the result directly and schedule continuation.
For these cases, we bypass promise_resolve(), new_promise_capability(),
create_resolving_functions(), perform_then(), PromiseReaction creation,
and JobCallback creation -- replacing ~13 GC allocations with 1
(the GC::Function for the microtask job).
The fulfilled and rejected closures in AsyncFunctionDriverWrapper::await
were identical in structure, differing only in whether they resumed the
async context with a NormalCompletion or ThrowCompletion.
Since the callback has access to the current promise, we can check its
state at reaction time to determine which completion to use. This allows
us to allocate a single GC-tracked NativeFunction (m_on_settled) and
register it for both the onFulfilled and onRejected slots in
PerformPromiseThen, halving the number of GC allocations on this path.
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
Saves us two NativeFunction allocations per each `await`.
With this change, following function goes 80% faster on my computer:
```js
(async () => {
const resolved = Promise.resolve();
for (let i = 0; i < 5_000_000; i++) {
await resolved;
}
})();
```
Instead of adding a flag for the two callers that need a pop of the
execution context stack when invoking continue_async_execution inline
the pop of the execution context.
This makes the management of these stacks and surrounding VERIFY calls
much more obvious.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
Async functions whose promise is never resolved were leaking, since they
had a strong root JS::Handle on themselves.
This doesn't appear to actually be necessary, since the wrapper will be
kept alive as long as it's reachable (and if it's not reachable, nobody
is going to resolve/reject the promise either).
This fixes the vast majority of leaks on Speedometer, bringing memory
usage at the end of a full run from ~12 GiB to ~3 GiB.