Replace the 24-byte Vector<Value> m_storage with an 8-byte raw
Value* m_named_properties pointer, backed by a malloc'd allocation
with an inline capacity header.
Memory layout of the allocation:
[u32 capacity] [u32 padding] [Value 0] [Value 1] ...
m_named_properties points to Value 0.
This shrinks JS::Object from 64 to 48 bytes (on non-Windows
platforms) and removes one level of indirection for property access
in the asm interpreter, since the data pointer is now stored directly
on the object rather than inside a Vector's internal metadata.
Growth policy: max(4, max(needed, old_capacity * 2)).
Property lookup cache entries previously used GC::Weak<T> for shape,
prototype, and prototype_chain_validity pointers. Each GC::Weak
requires a ref-counted WeakImpl allocation and an extra indirection
on every access.
Replace these with GC::RawPtr<T> and make Executable a WeakContainer
so the GC can clear stale pointers during sweep via remove_dead_cells.
For static PropertyLookupCache instances (used throughout the runtime
for well-known property lookups), introduce StaticPropertyLookupCache
which registers itself in a global list that also gets swept.
Now that inline cache entries use GC::RawPtr instead of GC::Weak,
we can compare shape/prototype pointers directly without going
through the WeakImpl indirection. This removes one dependent load
from each IC check in GetById, PutById, GetLength, GetGlobal, and
SetGlobal handlers.
Replace individual bool bitfields in Object (m_is_extensible,
m_has_parameter_map, m_has_magical_length_property, etc.) with a
single u8 m_flags field and Flag:: constants.
This consolidates 8 scattered bitfields into one byte with explicit
bit positions, making them easy to access from generated assembly
code at a known offset. It also converts the virtual is_function()
and is_ecmascript_function_object() methods to flag-based checks,
avoiding virtual dispatch for these hot queries.
ProxyObject now explicitly clears the IsFunction flag in its
constructor when wrapping a non-callable target, instead of relying
on a virtual is_function() override.
Replace 20 separate Put instructions (5 PutKinds x 4 forms) with
4 unified instructions (PutById, PutByIdWithThis, PutByValue,
PutByValueWithThis), each carrying a PutKind field at runtime instead
of being a separate opcode.
This reduces the number of handler entry points in the dispatch loop
and eliminates template instantiations of put_by_property_key and
put_by_value that were being duplicated 5x each when inlined by LTO.
Intermediate classes in the initialize() chain set up prototypes and
define properties. Forgetting to call Base::initialize() in any
override would silently skip that setup.
As JS::Value is marked "trivial" without actually being trivial, make
the one user that would lead to garbage JS::Value entries provide a
default value instead.
Previously, deleting a property from a cacheable dictionary would
transition it to an uncacheable dictionary, defeating inline caching
for all subsequent property accesses on that object.
This was particularly impactful for the global object, which becomes a
dictionary due to its large number of properties. Test frameworks that
delete global properties (like test-js) would cause the global object
to become uncacheable, preventing global variable caching from working.
The fix is simple: instead of transitioning to uncacheable, we continue
using the existing remove_property_without_transition() which already
remaps all property offsets greater than the deleted offset. Combined
with the dictionary_generation counter (which is already incremented),
this ensures caches are properly invalidated while keeping the shape
cacheable for future accesses.
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.
These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
Instead of storing a list of builtin function objects with the realm,
just move the builtin field from NativeFunction up to FunctionObject.
Now you can ask any FunctionObject for its builtin(), and we no longer
need the get_builtin_value() API.
Fixes 10 test262 tests that were querying the realm builtins at a
bad time.
Regressed in 54b755126c.
This allows us to use the bytecode implementation of await, which
correctly suspends execution contexts and handles completion
injections.
This gains us 4 test262 tests around mutating Array.fromAsync's
iterable whilst it's suspended as well.
This is also one step towards removing spin_until, which the
non-bytecode implementation of await uses.
```
Duration:
-5.98s
Summary:
Diff Tests:
+4 ✅ -4 ❌
Diff Tests:
[...]/Array/fromAsync/asyncitems-array-add-to-singleton.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-add.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-mutate.js ❌ -> ✅
[...]/Array/fromAsync/asyncitems-array-remove.js ❌ -> ✅
```
We can use caching in a million more places. This is just me running JS
benchmarks and looking at which get() call sites were hot and putting
caches there.
Lots of nice speedups all over the place, some examples:
1.19x speedup on Octane/raytrace.js
1.13x speedup on Octane/earley-boyer.js
1.12x speedup on Kraken/ai-astar.js
1.10x speedup on Octane/box2d.js
1.08x speedup on Octane/gbemu.js
1.05x speedup on Octane/regexp.js
To speed up property access, callers of get() can now provide a lookup
cache like so:
static Bytecode::PropertyLookupCache cache;
auto value = TRY(object.get(property, cache));
Note that the cache has to be `static` or it won't make sense!
This basically brings the inline caches from our bytecode VM straight
into C++ land, allowing us to gain serious performance improvements.
The implementation shares code with the GetById bytecode instruction.
The [[Set]] operation on objects will normally end up doing
[[GetOwnProperty]] twice, but if the object and the receiver are one on
the same, we can avoid the double work.
AFAIK this is not observable via proxies or other mechanisms.
1.10x speed-up on MicroBench/object-set-with-rope-strings.js
We are often forced to convert numbers to strings inside LibJS, e.g when
iterating over the property names of an array, but it's also just a very
common operation in general.
This patch adds a 1000-entry string cache for the numbers 0-999 since
those appear to be by far the most common ones we convert.
Before this change, PropertyNameIterator (used by for..in) and
`Object::enumerable_own_property_names()` (used by `Object.keys()`,
`Object.values()`, and `Object.entries()`) enumerated an object's own
enumerable properties exactly as the spec prescribes:
- Call `internal_own_property_keys()`, allocating a list of JS::Value
keys.
- For each key, call internal_get_own_property() to obtain a
descriptor and check `[[Enumerable]]`.
While that is required in the general case (e.g. for Proxy objects or
platform/exotic objects that override `[[OwnPropertyKeys]]`), it's
overkill for ordinary JS objects that store their own properties in the
shape table and indexed-properties storage.
This change introduces `for_each_own_property_with_enumerability()`,
which, for objects where
`eligible_for_own_property_enumeration_fast_path()` is `true`, lets us
read the enumerability directly from shape metadata (and from
indexed-properties storage) without a per-property descriptor lookup.
When we cannot avoid `internal_get_own_property()`, we still
benefit by skipping the temporary `Vector<Value>` of keys and avoiding
the unnecessary round-trip between PropertyKey and Value.
We already had IC support in PutById for the following cases:
- Changing an existing own property
- Calling a setter located in the prototype chain
This was enough to speed up code where structurally identical objects
(same shape) are processed in a loop:
```js
const arr = [{ a: 1 }, { a: 2 }, { a: 3 }];
for (let obj of arr) {
obj.a += 1;
}
```
However, creating structurally identical objects in a loop was still
slow:
```js
for (let i = 0; i < 10_000_000; i++) {
const o = {};
o.a = 1;
o.b = 2;
o.c = 3;
}
```
This change addresses that by adding a new IC type that caches both the
source and target shapes, allowing property additions to be fast-pathed
by directly jumping to the shape that already includes the new property.
This has quite a lot of fall out. But the majority of it is just type or
UDL substitution, where the changes just fall through to other function
calls.
By changing property key storage to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
Before this change each built-in iterator object has a boolean
`m_next_method_was_redefined`. If user code later changed the iterator’s
prototype (e.g. `Object.setPrototypeOf()`), we still believed the
built-in fast-path was safe and skipped the user supplied override,
producing wrong results.
With this change
`BuiltinIterator::as_builtin_iterator_if_next_is_not_redefined()` looks
up the current `next` property and verifies that it is still the
built-in native function.
- Avoids unnecessary conversions between StringOrSymbol and PropertyKey
on the hot path of property access.
- Simplifies the code by removing StringOrSymbol and using PropertyKey
directly. There was no reason to have a separate StringOrSymbol type
representing the same data as PropertyKey, just with the index key
stored as a string.
PropertyKey has been updated to use a tagged pointer instead of a
Variant, so it still occupies 8 bytes, same as StringOrSymbol.
12% improvement on JetStream/gcc-loops.cpp.js
12% improvement on MicroBench/object-assign.js
7% improvement on MicroBench/object-keys.js
By doing that we avoid lots of `PropertyKey` -> `Value` -> `PropertyKey`
transforms, which are quite expensive because of underlying
`FlyString` -> `PrimitiveString` -> `FlyString` conversions.
10% improvement on MicroBench/object-keys.js
81b6a11 regressed correctness by always bypassing the `next()` method
resolution for built-in iterators, causing incorrect behavior when
`next()` was redefined on built-in prototypes. This change fixes the
issue by storing a flag on built-in prototypes indicating whether
`next()` has ever been redefined.
This is *extremely* common on the web, but barely shows up at all in
JavaScript benchmarks.
A typical example is setting Element.innerHTML on a HTMLDivElement.
HTMLDivElement doesn't have innerHTML, so it has to travel up the
prototype chain until it finds it.
Before this change, we didn't cache this at all, so we had to travel
the prototype chain every time a setter like this was used.
We now use the same mechanism we already had for GetBydId and cache
PutById setter accesses in the prototype chain as well.
1.74x speedup on MicroBench/setter-in-prototype-chain.js
We don't need the [[Fields]] and [[PrivateMethods]] slots for most ESFO
instances, so let's reduce their impact on class size!
This shrinks ESFO from 200 bytes to 160 bytes, allowing more allocations
before we have to collect garbage.
The special empty value (that we use for array holes, Optional<Value>
when empty and a few other other placeholder/sentinel tasks) still
exists, but you now create one via JS::js_special_empty_value() and
check for it with Value::is_special_empty_value().
The main idea here is to make it very unlikely to accidentally create an
unexpected special empty value.
If a class field initializer is just a simple literal, we can skip
creating (and calling) a wrapper function for it entirely.
1.44x speedup on JetStream3/raytrace-private-class-fields.js
1.53x speedup on JetStream3/raytrace-public-class-fields.js
This avoids going through all the shape transitions when setting up the
most common form of ESFO.
This is extremely hot on Uber Eats, and this provides some relief.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
Now that the heap has no knowledge about a JavaScript realm and is
purely for managing the memory of the heap, it does not make sense
to name this function to say that it is a non-realm variant.
The main motivation behind this is to remove JS specifics of the Realm
from the implementation of the Heap.
As a side effect of this change, this is a bit nicer to read than the
previous approach, and in my opinion, also makes it a little more clear
that this method is specific to a JavaScript Realm.
When constructing a GlobalObject, it has to pass itself as the global
object to its own Shape. Since this is done in the Object constructor,
and Object is a base class of GlobalObject, it's not yet valid to cast
"this" to a GlobalObject*.
Fix this by having Shape store the global object as an Object& and move
Shape::global_object() to GlobalObject.h where we can at least perform a
valid static_cast in the getter.
Found by oss-fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29267
We were building up a vector with all the values in an object's indexed
property storage, and then iterating over the vector to mark values.
Instead of this, simply iterate over the property storage directly. :^)
This changes the remaining uses of the following functions across LibJS:
- String::format() => String::formatted()
- dbg() => dbgln()
- printf() => out(), outln()
- fprintf() => warnln()
I also removed the relevant 'LogStream& operator<<' overloads as they're
not needed anymore.
Passing in a plain Value and expecting it to be a native property is
error prone, let's use a more narrow type and pass a NativeProperty
reference directly.
This prevents stack overflows when calling infinite/deep recursive
functions, e.g.:
const f = () => f(); f();
JSON.stringify({}, () => ({ foo: "bar" }));
new Proxy({}, { get: (_, __, p) => p.foo }).foo;
The VM caches a StackInfo object to not slow down function calls
considerably. VM::push_call_frame() will throw an exception if
necessary (plain Error with "RuntimeError" as its .name).