Store yield_continuation and yield_is_await directly in
ExecutionContext instead of allocating a GeneratorResult GC cell.
This removes a heap allocation per yield/await and fixes a latent
bug where continuation addresses stored as doubles could lose
precision.
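To illustrate the precision issue, here is a minimal sketch (not the actual LibJS code) of what happens when a 64-bit address takes a detour through a double. The helper round_trip_through_double() and the ContextSketch struct are hypothetical names used only for this example. A double has a 53-bit significand, so integers above 2^53 are not all representable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store a 64-bit address in a double and read it back.
inline std::uint64_t round_trip_through_double(std::uint64_t address)
{
    // Keep the conversion back to an integer well-defined.
    assert(address < (std::uint64_t { 1 } << 63));
    double as_double = static_cast<double>(address);
    return static_cast<std::uint64_t>(as_double);
}

// Sketch of the alternative: keep the continuation in an integer field,
// which is lossless by construction. Field names follow the commit message;
// the struct itself is an assumption.
struct ContextSketch {
    std::uint64_t yield_continuation { 0 };
    bool yield_is_await { false };
};
```

Values up to 2^53 survive the round trip, while 2^53 + 1 comes back as 2^53.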
Remove Bytecode::compile() and the old create() overloads on
ECMAScriptFunctionObject that accepted C++ AST nodes. These
have no remaining callers now that all compilation goes through
the Rust pipeline.
Also remove the if-constexpr Parse Node branch from
async_block_start, since the Statement template instantiation
was already removed.
Fix transitive include dependencies on Generator.h by adding
explicit includes for headers that were previously pulled in
transitively.
Remove four fields that are trivially derivable from other fields
already present in the ExecutionContext:
- global_object (from realm)
- global_declarative_environment (from realm)
- identifier_table (from executable)
- property_key_table (from executable)
This shrinks ExecutionContext from 192 to 160 bytes (-17%).
The asmint's GetGlobal/SetGlobal handlers now load through the realm
pointer, taking advantage of the cached declarative environment
pointer added in the previous commit.
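A rough sketch of the field removal described above, using simplified stand-in types (none of these are the real LibJS definitions). The cached pointers are replaced by accessors that load through realm and executable, which saves four pointers per context.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, hypothetical stand-ins for the real types.
struct GlobalObject { };
struct DeclarativeEnvironment { };
struct IdentifierTable { };
struct PropertyKeyTable { };

struct Realm {
    GlobalObject* global_object { nullptr };
    DeclarativeEnvironment* global_declarative_environment { nullptr };
};

struct Executable {
    IdentifierTable* identifier_table { nullptr };
    PropertyKeyTable* property_key_table { nullptr };
};

// Before: four pointers duplicated from realm and executable.
struct ContextWithCachedFields {
    Realm* realm { nullptr };
    Executable* executable { nullptr };
    GlobalObject* global_object { nullptr };
    DeclarativeEnvironment* global_declarative_environment { nullptr };
    IdentifierTable* identifier_table { nullptr };
    PropertyKeyTable* property_key_table { nullptr };
};

// After: the same information is derived on demand.
struct ContextWithDerivedFields {
    Realm* realm { nullptr };
    Executable* executable { nullptr };

    GlobalObject* global_object() const
    {
        assert(realm);
        return realm->global_object;
    }

    IdentifierTable* identifier_table() const
    {
        assert(executable);
        return executable->identifier_table;
    }
};
```

In this sketch the size difference is exactly four pointers, which matches the 32 bytes saved on a 64-bit target.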
Move Interpreter::get() and set() from the .cpp file into the header
as inline methods. Make handle_exception(), perform_call(),
perform_call_impl(), and the HandleExceptionResponse enum public so
they can be called by the upcoming assembly interpreter's C++ glue
code. Also add set_running_execution_context() for the same reason.
Instead of recursing through 5 native stack frames per JS function
call (execute_call -> internal_call -> ordinary_call_evaluate_body ->
run_executable -> run_bytecode), handle Call and CallConstruct for
normal ECMAScript functions directly in the dispatch loop.
The fast path allocates the callee's execution context on the
InterpreterStack, copies arguments, sets up the environment, and
jumps to the callee's bytecode entry point. Return and End unwind
inline frames by restoring the caller's state. Exception unwinding
walks through inline frames to find handlers.
The fast path code is kept in NEVER_INLINE helper functions
(try_inline_call, try_inline_call_construct, pop_inline_frame) to
minimize register pressure in the dispatch loop. handle_exception
takes program_counter by value to avoid forcing it onto the stack.
Reloading of bytecode/program_counter after frame switches is done
inline at each call site via RELOAD_AND_GOTO_START to preserve a
single dispatch entry point for optimal indirect branch prediction.
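The idea of handling calls inside the dispatch loop can be sketched with a toy stack machine. This is only an illustration under heavy simplification: the opcodes, Frame, and run_inline() are hypothetical, there is no exception unwinding, and arguments and return values simply share one value stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Push, Add, Call, Return };

struct Instruction {
    Op op;
    int operand; // Push: constant, Call: callee index, otherwise unused
};

using Function = std::vector<Instruction>;

// Saved caller state: which function to resume and where.
struct Frame {
    std::size_t function_index;
    std::size_t program_counter;
};

inline int run_inline(std::vector<Function> const& functions, std::size_t entry)
{
    std::vector<int> values;   // shared value stack
    std::vector<Frame> frames; // inline frames instead of native recursion
    Frame current { entry, 0 };

    for (;;) {
        Function const& code = functions[current.function_index];
        assert(current.program_counter < code.size());
        Instruction const& instruction = code[current.program_counter++];

        switch (instruction.op) {
        case Op::Push:
            values.push_back(instruction.operand);
            break;
        case Op::Add: {
            assert(values.size() >= 2);
            int rhs = values.back();
            values.pop_back();
            values.back() += rhs;
            break;
        }
        case Op::Call:
            // Remember where the caller resumes, then jump to the callee's
            // entry point without leaving the loop.
            assert(instruction.operand >= 0);
            assert(static_cast<std::size_t>(instruction.operand) < functions.size());
            frames.push_back(current);
            current = Frame { static_cast<std::size_t>(instruction.operand), 0 };
            break;
        case Op::Return:
            if (frames.empty()) {
                assert(!values.empty());
                return values.back();
            }
            // Unwind one inline frame by restoring the caller's state.
            current = frames.back();
            frames.pop_back();
            break;
        }
    }
}
```

The native stack depth stays constant no matter how deeply the toy functions call each other, which is the property the fast path is after.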
Remove CodeGenerationError and make all bytecode generation functions
return their results directly instead of wrapping them in
CodeGenerationErrorOr.
For the few remaining sites where codegen encounters an unimplemented
or unexpected AST node, we now use a new emit_todo() helper that emits
a NewTypeError + Throw sequence at compile time (preserving the runtime
behavior) and then switches to a dead basic block so subsequent codegen
for the same function can continue without issue.
This allows us to remove error handling from all callers of the
bytecode compiler, simplifying the code significantly.
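A minimal sketch of how an emit_todo()-style helper might work, assuming a generator that appends instructions to a current basic block. GeneratorSketch, BasicBlock, and the string-based instructions are hypothetical; only the NewTypeError + Throw sequence and the switch to a dead block come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct BasicBlock {
    std::vector<std::string> instructions;
    bool terminated { false };
};

class GeneratorSketch {
public:
    GeneratorSketch() { m_blocks.emplace_back(); }

    void emit(std::string instruction, bool is_terminator = false)
    {
        BasicBlock& block = m_blocks[m_current];
        assert(!block.terminated); // never append after a terminator
        block.instructions.push_back(std::move(instruction));
        block.terminated = is_terminator;
    }

    // Emit a runtime throw for the unsupported construct, then continue
    // codegen in a fresh block that nothing jumps to (a dead block).
    void emit_todo(std::string const& message)
    {
        emit("NewTypeError \"TODO: " + message + "\"");
        emit("Throw", true);
        m_blocks.emplace_back();
        m_current = m_blocks.size() - 1;
    }

    std::size_t block_count() const { return m_blocks.size(); }
    BasicBlock const& block(std::size_t index) const { return m_blocks.at(index); }

private:
    std::vector<BasicBlock> m_blocks;
    std::size_t m_current { 0 };
};
```

Because later emit() calls land in the dead block, callers do not need to check for or propagate an error.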
LeaveUnwindContext popped the runtime unwind context stack. With the
stack being removed, all emission sites become dead code. Remove the
opcode and all its emissions.
EnterUnwindContext pushed an UnwindInfo and jumped to entry_point.
Without the stack push, it's just a Jump. Replace the single emission
site with a Jump and remove the opcode entirely.
Replace the saved_lexical_environments stack in ExecutionContextRareData
with explicit register-based environment tracking. Environments are now
stored in registers and restored via SetLexicalEnvironment, making the
environment flow visible in bytecode.
Key changes:
- Add GetLexicalEnvironment and SetLexicalEnvironment opcodes
- CreateLexicalEnvironment takes explicit parent and dst operands
- EnterObjectEnvironment stores new environment in a dst register
- NewClass takes an explicit class_environment operand
- Remove LeaveLexicalEnvironment opcode (instead: SetLexicalEnvironment)
- Remove saved_lexical_environments from ExecutionContextRareData
- Use a reserved register for the saved lexical environment to avoid
  dominance issues with lazily-emitted GetLexicalEnvironment
Each finally scope gets two registers (completion_type and
completion_value) that form an explicit completion record. Every path
into the finally body sets these before jumping, and a dispatch chain
after the finally body routes to the correct continuation.
This replaces the old implicit protocol that relied on the exception
register, a saved_return_value register, and a scheduled_jump field
on ExecutionContext, allowing us to remove:
- 5 opcodes (ContinuePendingUnwind, ScheduleJump, LeaveFinally,
  RestoreScheduledJump, PrepareYield)
- 1 reserved register (saved_return_value)
- 2 ExecutionContext fields (scheduled_jump, previously_scheduled_jumps)
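The explicit completion record can be sketched roughly as follows. The set of completion kinds and the Continuation enum are assumptions made for illustration (the commit message does not enumerate them); only the two registers and the dispatch after the finally body come from the text above.

```cpp
#include <cassert>

// Hypothetical completion kinds; the real set may differ.
enum class CompletionType { Normal, Throw, Return };

// Stand-ins for the two per-scope registers.
struct FinallyScope {
    CompletionType completion_type { CompletionType::Normal };
    int completion_value { 0 };
};

// Where control goes once the finally body has run.
enum class Continuation { FallThrough, Rethrow, ReturnToCaller };

// Every path into the finally body records why it is entering.
inline void enter_finally(FinallyScope& scope, CompletionType type, int value)
{
    scope.completion_type = type;
    scope.completion_value = value;
}

// The dispatch chain after the finally body picks the continuation
// purely from the recorded completion type.
inline Continuation dispatch_after_finally(FinallyScope const& scope)
{
    if (scope.completion_type == CompletionType::Throw)
        return Continuation::Rethrow;
    if (scope.completion_type == CompletionType::Return)
        return Continuation::ReturnToCaller;
    assert(scope.completion_type == CompletionType::Normal);
    return Continuation::FallThrough;
}
```

Since all state lives in the two registers, no side channel such as a scheduled jump on the execution context is needed in this model.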
Instead of creating PropertyKeys on the fly during interpreter
execution, we now store fully-formed ones in the Executable.
This avoids a whole bunch of busywork in property access instructions
and substantially reduces code size bloat.
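As a rough illustration of the idea, the sketch below builds each key once at compile time and lets instructions refer to it by index. ExecutableSketch, the PropertyKey layout, and the precomputed hash are hypothetical simplifications, not the real LibJS types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified key: the name plus a hash computed once up front.
struct PropertyKey {
    std::string name;
    std::size_t hash;
};

class ExecutableSketch {
public:
    // Called during compilation; the returned index would become an
    // instruction operand.
    std::size_t add_property_key(std::string name)
    {
        std::size_t hash = std::hash<std::string> {}(name);
        m_property_keys.push_back(PropertyKey { std::move(name), hash });
        return m_property_keys.size() - 1;
    }

    // Called during execution; a plain table lookup with no key construction.
    PropertyKey const& property_key(std::size_t index) const
    {
        assert(index < m_property_keys.size());
        return m_property_keys[index];
    }

private:
    std::vector<PropertyKey> m_property_keys;
};
```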
In our process architecture, there's only ever one JS::VM per process.
This allows us to have a VM::the() singleton getter that optimizes
down to a single global access everywhere.
Seeing 1-2% speed-up on all JS benchmarks from this.
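A minimal sketch of a the()-style singleton getter, assuming exactly one instance is alive at a time. VMSketch is a hypothetical stand-in; the real VM class has far more state and a different construction path.

```cpp
#include <cassert>

class VMSketch {
public:
    VMSketch()
    {
        assert(!s_the); // only one instance per process in this model
        s_the = this;
    }

    ~VMSketch() { s_the = nullptr; }

    VMSketch(VMSketch const&) = delete;
    VMSketch& operator=(VMSketch const&) = delete;

    // Compiles down to a load from a single global.
    static VMSketch& the()
    {
        assert(s_the);
        return *s_the;
    }

private:
    static VMSketch* s_the;
};

VMSketch* VMSketch::s_the = nullptr;
```

Call sites can then use VMSketch::the() instead of threading a reference through every function.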
This hosts the ability to compile and run JavaScript to implement
native functions. This is particularly useful for any native function
that is not a normal function, for example async functions such as
Array.fromAsync, which require yielding.
These functions are not allowed to observe anything from outside their
environment. Any global identifiers will instead be assumed to be a
reference to an abstract operation or a constant. The generator will
inject the appropriate bytecode if the name of the global identifier
matches a known name. Anything else will cause a code generation error.
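The name-based resolution could look roughly like the sketch below. The table contents and all identifiers here are hypothetical examples chosen for illustration; the real list of known names is not given in this message. An empty optional stands in for the code generation error.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

enum class KnownGlobalKind { AbstractOperation, Constant };

struct KnownGlobal {
    std::string_view name;
    KnownGlobalKind kind;
};

// Hypothetical table of names the generator recognizes.
constexpr std::array<KnownGlobal, 3> known_globals { {
    { "IsCallable", KnownGlobalKind::AbstractOperation },
    { "ToObject", KnownGlobalKind::AbstractOperation },
    { "MAX_ARRAY_LIKE_INDEX", KnownGlobalKind::Constant },
} };

// Returns how a global identifier should be compiled, or nothing if the
// name is unknown (which the generator would report as an error).
inline std::optional<KnownGlobalKind> resolve_global_identifier(std::string_view name)
{
    for (auto const& entry : known_globals) {
        if (entry.name == name)
            return entry.kind;
    }
    return std::nullopt;
}
```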
All the data we need for compilation is in SharedFunctionInstanceData,
so we shouldn't depend on ECMAScriptFunctionObject.
Allows NativeJavaScriptBackedFunction to compile bytecode.
Instead of always checking if we're about to return an empty completion
value in Interpreter::run_executable(), we now coerce empty completions
to the undefined value earlier.
This simplifies the most common path through run_executable(), giving us
a small speedup.
Instead of using this span, we can just use the getter that calculates
the base of the register/constant/local/argument array based on the
ExecutionContext's own address.
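A sketch of the address-based getter, assuming the values are allocated directly after the context object in one block of memory. ExecutionContextSketch, Value, create(), and destroy() are hypothetical stand-ins for this example, and the layout is simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Value {
    double payload { 0.0 };
};

// The alignment ensures that the address right after the object is
// suitably aligned for Value.
struct alignas(Value) alignas(std::size_t) ExecutionContextSketch {
    std::size_t value_count { 0 };

    // No stored span or pointer: the base of the array is derived from
    // the context's own address.
    Value* values() { return reinterpret_cast<Value*>(this + 1); }

    static ExecutionContextSketch* create(std::size_t count)
    {
        void* memory = std::malloc(sizeof(ExecutionContextSketch) + count * sizeof(Value));
        assert(memory);
        auto* context = new (memory) ExecutionContextSketch;
        context->value_count = count;
        for (std::size_t i = 0; i < count; ++i)
            new (context->values() + i) Value;
        return context;
    }

    // Both types are trivially destructible, so freeing is enough here.
    static void destroy(ExecutionContextSketch* context) { std::free(context); }
};
```

The getter costs one addition and removes a field that would otherwise have to be kept in sync.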
We don't need to return two values; running an executable only ever
produces a throw completion, or a normal completion, i.e. a Value.
This necessitated a few minor changes, such as adding a way to check
if a JS::Cell is a GeneratorResult.
This simplifies function entry/exit and lets us just walk away from the
used ExecutionContext instead of resetting a bunch of its state when
returning control to the caller.
This gets rid of a lot of pointer chasing from interpreter to executable
to identifier table to the actual identifier.
1.05x speed-up on Kraken/ai-astar.js
This has quite a lot of fallout. But the majority of it is just type or
UDL substitution, where the changes just fall through to other function
calls.
With property key storage changed to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that.
This commit adds the minimal export macros needed to run js.exe on
Windows. A follow-up commit is planned to move to explicit export
entirely.
A static_assert for the size of a struct is also ifdef'ed out, as the
semantics around object layout and inheritance are different on the MSVC
ABI and the struct IteratorRecord ends up being 40 bytes, not 32.
This reverts commit 36bb2824a6.
Although this was faster on my M3 MacBook Pro, other Apple machines
disagree, including our benchmark runner. So let's revert it.
This is a simple trick to generate better native code for access to
registers, locals, and constants. Before this change, each access had
to first dereference the member pointer in Interpreter, and then get to
the values. Now we always have a pointer directly to the values on hand.
Here's how it looks:
    class StackFrame {
    public:
        Value get(Operand) const;
        void set(Operand, Value);

    private:
        Value m_values[];
    };
And we just place one of these as a window on top of the execution
context's array of values (registers, locals, and constants).
This way it's always automatically correct, and we don't have to
manually flush it in push_execution_context().
~7% speedup on the MicroBench/call* tests :^)
The special empty value (that we use for array holes, Optional<Value>
when empty and a few other placeholder/sentinel tasks) still
exists, but you now create one via JS::js_special_empty_value() and
check for it with Value::is_special_empty_value().
The main idea here is to make it very unlikely to accidentally create an
unexpected special empty value.
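The intent can be sketched with a simplified Value type. The two function names come from the text above; the Tag enum, the private constructor, and the choice that a default-constructed Value is undefined are assumptions of this sketch, not a description of the real class.

```cpp
#include <cassert>

namespace JS {

class Value {
public:
    // Assumption for this sketch: default construction yields undefined,
    // never the special empty value.
    Value() = default;

    bool is_undefined() const { return m_tag == Tag::Undefined; }
    bool is_special_empty_value() const { return m_tag == Tag::SpecialEmpty; }

private:
    enum class Tag { Undefined, SpecialEmpty };

    explicit Value(Tag tag)
        : m_tag(tag)
    {
    }

    // The only way to obtain a special empty value is this named factory.
    friend Value js_special_empty_value();

    Tag m_tag { Tag::Undefined };
};

Value js_special_empty_value()
{
    return Value(Value::Tag::SpecialEmpty);
}

}
```

Because the tagged constructor is private, creating the sentinel always shows up as an explicit js_special_empty_value() call at the call site.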
This results in a massive rename almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root