We specialize `Optional<T>` for value types that inherently support some
kind of "empty" value or whose value range allow for a unlikely to be
useful sentinel value that can mean "empty", instead of the boolean flag
a regular Optional<T> needs to store. Because of padding, this often
means saving 4 to 8 bytes per instance.
By extending the new `SentinelOptional<T, Traits>`, these
specializations are significantly simplified to just having to define
what the sentinel value is, and how to identify a sentinel value.
While we're in the bytecode compiler, we want to know which type of
Operand we're dealing with, but once we've generated the bytecode
stream, we only ever need its index.
This patch simplifies Operand by removing the aarch64 bitfield hacks
and makes it 32-bit on all platforms. We keep 3 type bits in the high
bits of the index while compiling, and then zero them out when
flattening the final bytecode stream.
This makes bytecode more compact on x86_64, and avoids bit twiddling
on aarch64. Everyone wins something!
When stringifying bytecode for debugging output, we now have an API in
Executable that can look at a raw operand index and tell you what type
of operand it was, based on known quantities of each type in the stack
frame.
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
This is a slightly mystified attempt to recover the performance
regression seen on our JS benchmark runner after 3c2a2bb39f
and c7bba505ea.
With this change, c7bba505ea is effectively reverted from the
perspective of x86_64.
It seems both aarch64 and x86_64 are extremely sensitive to the use of
bitfields here. Unfortunately, aarch64 gains a huge speedup from them
while x86_64 sees a very noticeable slowdown.
Since we're talking about 5%+ swings in both directions here, let's go
for the best of both worlds and use ifdefs in the Operand memory layout.
This packs the bytecode much better and gives us a decent performance
boost on throughput-focused benchmarks.
Measured on my M3 MacBook Pro:
- 4.7% speedup on Kraken
- 2.3% speedup on Octane
- 2.7% speedup on JetStream1