Previously, when direct eval() was called, we would mark the entire
environment chain as "permanently screwed by eval", disabling variable
access caching all the way up to the global scope.
This was overly conservative. According to the ECMAScript specification,
a sloppy direct eval() can only inject var declarations into its
containing function's variable environment - it cannot inject variables
into parent function scopes.
This patch makes two changes:
1. Stop propagating the "screwed by eval" flag at function boundaries.
When set_permanently_screwed_by_eval() hits a FunctionEnvironment or
GlobalEnvironment, it no longer continues to outer environments.
2. Check each environment during cache lookup traversal. If any
environment in the path is marked as screwed, we bail to the slow
path. This catches the case where we're inside a function with eval
and have a cached coordinate pointing to an outer scope.
The second change is necessary because eval can create local variables
that shadow outer bindings. When looking up a variable from inside a
function that called eval, we can't trust cached coordinates that point
to outer scopes, since eval may have created a closer binding.
This improves performance for code with nested functions where an inner
function uses eval but parent functions perform many variable accesses.
The parent functions can now use cached environment coordinates.
All 29 new tests verify behavior matches V8.
Every function call allocates an ExecutionContext with a trailing array
of Values for registers, locals, constants, and arguments. Previously,
the constructor would initialize all slots to js_special_empty_value(),
but constant slots were then immediately overwritten by the interpreter
copying in values from the Executable before execution began.
To eliminate this redundant initialization, we rearrange the layout from
[registers | constants | locals] to [registers | locals | constants].
This groups registers and locals together at the front, allowing us to
initialize only those slots while leaving constant slots uninitialized
until they're populated with their actual values.
This reduces the per-call initialization cost from O(registers + locals
+ constants) to O(registers + locals).
Also tightens up the types involved (size_t -> u32) and adds VERIFYs to
guard against overflow when computing the combined slot counts, and to
ensure the total fits within the 29-bit operand index field.
When a function creates object literals with simple property names,
we now cache the resulting shape after the first instantiation. On
subsequent calls, we create the object with the cached shape directly
and write property values at their known offsets.
This avoids repeated shape transitions and property offset lookups
for a common JavaScript pattern.
The optimization uses two new bytecode instructions:
- CacheObjectShape: Captures the final shape after object construction
- InitObjectLiteralProperty: Writes properties using cached offsets
Only "simple" object literals are optimized (string literal keys with
simple value expressions). Complex cases like computed properties,
getters/setters, and spread elements use the existing slow path.
3.4x speedup on a microbenchmark that repeatedly instantiates an object
literal with 26 properties. Small progressions on various benchmarks.
We know the length they're gonna end up with up front since we're
instantiating array literals. Pre-sizing them allows us to skip
incremental resizing of the property storage.
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.
These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
This resolves a FIXME in its code generation, particularly for:
- Caching the template object
- Setting the correct property attributes
- Freezing the resulting objects
This allows archive.org to load, which uses the Lit library.
The Lit library caches these template objects to determine if a
template has changed, allowing it to determine to do a full template
rerender or only partially update the rendering. Before, we would
always cause a full rerender on update because we didn't return the
same template object.
This caused issues with archive.org's code, I believe particularly with
its router library, where we would constantly detach and reattach nodes
unexpectedly, ending up with the page content not being attached to the
router's custom element.
Instead of storing a list of builtin function objects with the realm,
just move the builtin field from NativeFunction up to FunctionObject.
Now you can ask any FunctionObject for its builtin(), and we no longer
need the get_builtin_value() API.
Fixes 10 test262 tests that were querying the realm builtins at a
bad time.
Regressed in 54b755126c.
Reorder members and use u32 instead of Optional<u32> for things that
didn't actually need the "empty" state other than for assertions.
Reduces memory usage on my x.com home feed by 9.9 MiB.
For StringPrototype functions that defer to RegExpPrototype builtins,
we can skip the generic call stuff (eliding the execution context etc)
and just call the builtin directly.
1.03x speedup on Octane/regexp.js
These were helpful when PropertyKey instantiation happened in the
interpreter, but now that we've moved it to bytecode generation time,
we can use the basic Put*ById* instructions instead.
Instead of creating PropertyKeys on the fly during interpreter
execution, we now store fully-formed ones in the Executable.
This avoids a whole bunch of busywork in property access instructions
and substantially reduces code size bloat.
These instructions are not necessarily rarely used, but they are very
large in terms of code size. By putting them out of line we keep the hot
path of the interpreter smaller and tighter.
No need to check this at runtime, we have all the necessary info already
when generating bytecode.
Also mark the "yes, we are indeed calling the builtin" path [[likely]]
since it's exceedingly rare for anyone to replace the global functions.
This doesn't affect interpreter size directly, but let's inform the
compiler that we're not terribly worried about code using the `with`
statement in JS.
While we're in the bytecode compiler, we want to know which type of
Operand we're dealing with, but once we've generated the bytecode
stream, we only ever need its index.
This patch simplifies Operand by removing the aarch64 bitfield hacks
and makes it 32-bit on all platforms. We keep 3 type bits in the high
bits of the index while compiling, and then zero them out when
flattening the final bytecode stream.
This makes bytecode more compact on x86_64, and avoids bit twiddling
on aarch64. Everyone wins something!
When stringifying bytecode for debugging output, we now have an API in
Executable that can look at a raw operand index and tell you what type
of operand it was, based on known quantities of each type in the stack
frame.
In our process architecture, there's only ever one JS::VM per process.
This allows us to have a VM::the() singleton getter that optimizes
down to a single global access everywhere.
Seeing 1-2% speed-up on all JS benchmarks from this.
This allows us to use the bytecode implementation of await, which
correctly suspends execution contexts and handles completion
injections.
This gains us 4 test262 tests around mutating Array.fromAsync's
iterable whilst it's suspended as well.
This is also one step towards removing spin_until, which the
non-bytecode implementation of await uses.
```
Duration:
-5.98s
Summary:
Diff Tests:
+4 ✅ -4 ❌
Diff Tests:
[...]/Array/fromAsync/asyncitems-array-add-to-singleton.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-add.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-mutate.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-remove.js ❌ -> ✅
```