Instead of trying to indirectly load 2x64 bits from *cc, load addresses
directly from their own contiguous allocation.
This allows a future optimisation where we defer loading addresses to
reduce memory port pressure.
This, along with moving the sources and destination out of the config
object, makes it so we don't have to double-deref to get to them on each
instruction, leading to a ~15% perf improvement on dispatch.
This commit adds a register allocator, with 8 available "register"
slots.
In testing with various random blobs, this moves anywhere from 30% to
74% of value accesses into predefined slots, and is about a ~20% perf
increase end-to-end.
To actually make this usable, a few structural changes were also made:
- we no longer do one instruction per interpret call
- trapping is an (unlikely) exit condition
- the label and frame stacks are replaced with linked lists with a huge
node cache size, as we only need to touch the last element and
push/pop is very frequent.
This is largely unused (only in wasm.cpp)
A future reimplementation can bring it back as a separate interpreter
class that embeds the current bytecode interpreter.
...instead of specially handling JS::Completion.
This makes it possible for LibWeb/LibJS to have full control over how
these things are made, stored, and visited (whenever).
Fixes an issue where we couldn't roundtrip a JS exception through Wasm.