SIMD loads/stores checked bit 0x20 of the align immediate to detect a
following memory index, unlike scalar mem ops, which use 0x40 per the
multi-memory encoding. This caused the memidx byte to be misparsed as
the next immediate (e.g. the offset).

Update both SIMD sites (v128 load/store and the lane variants) to check
and clear 0x40, then read the memidx as a LEB128<u32>.

Repro:

    (module (memory $m0 1) (memory $m1 1)
      (func (export "go")
        i32.const 0
        v128.load (memory 1)
        drop))

Before: printed memidx 0 with offset 1.
After: prints memidx 1 with offset 0.
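
To make the encoding concrete, here is a minimal C++ sketch of memarg
decoding under the multi-memory rule, where bit 6 (0x40) of the align
immediate flags a following memidx. It is not the project's parser:
`read_leb128_u32`, `MemArg` and `parse_memarg` are hypothetical, the
offset is read as a u32 (as for 32-bit memories), and the bytes
`0x44 0x01 0x00` are assumed to be the memarg of the repro's
`v128.load (memory 1)` with the default 16-byte alignment (log2 = 4).
Passing 0x20 as the flag reproduces the old misparse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the project's LEB128<u32>): decodes an unsigned
// LEB128 value of at most 32 bits starting at `pos`, advancing `pos`.
std::uint32_t read_leb128_u32(std::vector<std::uint8_t> const& bytes, std::size_t& pos)
{
    std::uint32_t result = 0;
    unsigned shift = 0;
    while (true) {
        if (pos >= bytes.size())
            throw std::runtime_error("unexpected end of input");
        std::uint8_t const byte = bytes[pos++];
        // A u32 needs at most 5 bytes, and the 5th may only use its low 4 bits.
        if (shift >= 32 || (shift == 28 && (byte & 0x70) != 0))
            throw std::runtime_error("LEB128 value too large for u32");
        result |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0)
            return result;
        shift += 7;
    }
}

struct MemArg {
    std::uint32_t align;        // log2 of the alignment, memidx flag cleared
    std::uint32_t memory_index; // 0 unless the flag was set
    std::uint32_t offset;
};

// Hypothetical memarg parser. With memidx_flag = 0x40 it follows the
// multi-memory encoding; with 0x20 it behaves like the old SIMD code.
MemArg parse_memarg(std::vector<std::uint8_t> const& bytes, std::size_t& pos,
    std::uint32_t memidx_flag = 0x40)
{
    MemArg arg {};
    arg.align = read_leb128_u32(bytes, pos);
    if (arg.align & memidx_flag) {
        arg.align &= ~memidx_flag; // keep only the log2 alignment
        arg.memory_index = read_leb128_u32(bytes, pos);
    }
    arg.offset = read_leb128_u32(bytes, pos);
    return arg;
}

// Usage with the assumed memarg bytes of the repro's v128.load.
void check_repro_memarg()
{
    std::vector<std::uint8_t> const bytes { 0x44, 0x01, 0x00 };

    std::size_t pos = 0;
    MemArg const fixed = parse_memarg(bytes, pos, 0x40);
    assert(fixed.align == 4 && fixed.memory_index == 1 && fixed.offset == 0);
    assert(pos == 3);

    // Checking 0x20 misses the flag, so the memidx byte becomes the offset.
    pos = 0;
    MemArg const buggy = parse_memarg(bytes, pos, 0x20);
    assert(buggy.memory_index == 0 && buggy.offset == 1);
    assert(pos == 2);
}
```

With the 0x20 check the trailing offset byte is also left unconsumed,
so whatever is decoded next starts one byte too early.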

This, along with moving the sources and destination out of the config
object, means we no longer have to double-dereference to reach them on
every instruction, giving a ~15% improvement in dispatch performance.
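
The double dereference can be pictured with a small C++ sketch. The
types below (`Config`, `StateViaConfig`, `StateDirect`) are hypothetical
stand-ins rather than the code this change touches, and the copy stands
in for an arbitrary instruction; the only point is how many dependent
loads happen before the data is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical config object holding the operand pointers.
struct Config {
    std::uint64_t const* sources;
    std::uint64_t* destination;
};

// Before: the per-dispatch state only points at the config, so each
// instruction loads state.config, then config->sources, before the data.
struct StateViaConfig {
    Config const* config;
};

void copy_via_config(StateViaConfig const& state, std::size_t i)
{
    state.config->destination[i] = state.config->sources[i];
}

// After: the pointers are copied out of the config once, so each
// instruction reads them straight from the state (one less indirection).
struct StateDirect {
    std::uint64_t const* sources;
    std::uint64_t* destination;
};

void copy_direct(StateDirect const& state, std::size_t i)
{
    state.destination[i] = state.sources[i];
}

// Both layouts compute the same result; only the access path differs.
void check_same_result()
{
    std::uint64_t const src[3] { 1, 2, 3 };
    std::uint64_t via_dst[3] {};
    std::uint64_t direct_dst[3] {};
    Config const config { src, via_dst };
    StateViaConfig const via { &config };
    StateDirect const direct { config.sources, direct_dst };
    for (std::size_t i = 0; i < 3; ++i) {
        copy_via_config(via, i);
        copy_direct(direct, i);
    }
    for (std::size_t i = 0; i < 3; ++i)
        assert(via_dst[i] == src[i] && direct_dst[i] == src[i]);
}
```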

The web allows lonely surrogates by default, so let's have our string
APIs reflect this and stop passing an allow option all over the place.
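
As an illustration of an option that defaults to allowing lonely
surrogates, here is a C++ sketch of a UTF-16 decoder.
`AllowLonelySurrogates` and `decode_utf16` are hypothetical names for
this sketch, not the actual string API. Passing a lonely surrogate
through as its own code point is one possible lenient policy, similar
to how JavaScript's `codePointAt` returns a lone surrogate's own code
unit value.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical option, named for illustration only.
enum class AllowLonelySurrogates { No, Yes };

// Decodes UTF-16 code units into code points. A high surrogate followed by
// a low surrogate forms one code point; any other surrogate is "lonely".
// By default it is passed through unchanged; with No, decoding fails.
std::optional<std::vector<char32_t>> decode_utf16(std::vector<char16_t> const& units,
    AllowLonelySurrogates allow = AllowLonelySurrogates::Yes)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        char32_t const unit = units[i];
        bool const is_high = unit >= 0xD800 && unit <= 0xDBFF;
        bool const is_low = unit >= 0xDC00 && unit <= 0xDFFF;
        if (is_high && i + 1 < units.size() && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            char32_t const low = units[++i];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            continue;
        }
        if ((is_high || is_low) && allow == AllowLonelySurrogates::No)
            return std::nullopt;
        out.push_back(unit);
    }
    return out;
}

void check_lonely_surrogates()
{
    // "a", a lonely high surrogate, then a valid pair for U+1F600.
    std::vector<char16_t> const units { u'a', 0xD800, 0xD83D, 0xDE00 };

    auto const lenient = decode_utf16(units); // default: allowed
    assert(lenient.has_value());
    assert((*lenient == std::vector<char32_t> { U'a', 0xD800, 0x1F600 }));

    assert(!decode_utf16(units, AllowLonelySurrogates::No).has_value());
}
```

Making the lenient behavior the default means only the few callers that
need strict Unicode scalar values have to opt in.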