Route the obvious substring-producing string operations through the
new PrimitiveString substring factory. Direct indexing, at(), charAt(),
slice(), substring(), substr(), and the plain-string split path can now
return lazy JS::Substring values backed by the original string.
Add runtime coverage for rope-backed string operations so these lazy
string slices stay exercised across both ASCII and UTF-16 inputs.
Introduce JS::Substring as a lazily materialized PrimitiveString
variant that stores an originating string plus a UTF-16 offset and
length. This makes substring creation cheap while still reifying to
a normal string when character data is requested.
Track which short strings actually live in the VM caches so lazily
resolved ropes and substrings do not evict unrelated cached strings
when they are finalized. Add focused unit tests for nested ranges,
rope-backed substrings, surrogate boundaries, and cache behavior.
Add a clang plugin check that flags GC::Cell subclasses (and their
base classes within the Cell hierarchy) that have destructors with
non-trivial bodies. Such logic should use Cell::finalize() instead.
Add GC_ALLOW_CELL_DESTRUCTOR annotation macro for opting out in
exceptional cases (currently only JS::Object).
This prevents us from accidentally adding code in destructors that
runs after something we're pointing to may have been destroyed.
(This could become a problem when the garbage collector sweeps
objects in an unfortunate order.)
This new check uncovered a handful of bugs which are then also fixed
in this commit. :^)
We are often forced to convert numbers to strings inside LibJS, e.g when
iterating over the property names of an array, but it's also just a very
common operation in general.
This patch adds a 1000-entry string cache for the numbers 0-999 since
those appear to be by far the most common ones we convert.
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
I was investigating an optimization in this area, and while it
didn't seem to have a noticable improvement, it still seems
useful to apply this change.
If we have two PrimitiveString objects that are both backed by UTF-16
data, we don't have to convert them to UTF-8 for equality checking.
Just compare the underlying UTF-16 data. :^)
By moving the LHS and RHS pointers used by rope strings into a
RopeString subclass, we shrink PrimitiveString by 16 bytes. Most strings
are not rope strings, so this ends up saving quite a bit of memory.
PrimitiveString is now internally either UTF-8, UTF-16, or both.
We no longer convert them to/from ByteString anywhere, nor does VM have
a ByteString cache.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
More work on decoupling the general runtime from Interpreter. The goal
is becoming clearer. Interpreter should be one possible way to execute
code inside a VM. In the future we might have other ways :^)