Keep the legacy regexp static properties backed by PrimitiveString
values instead of eagerly copied Utf16Strings. lastMatch, leftContext,
rightContext, and $1-$9 now materialize lazy Substrings from the
original match input when accessed.
Keep RegExp.input as a separate slot from the match source so manual
writes do not rewrite the last match state. Add coverage for that
behavior and for rope-backed UTF-16 inputs.
Switch LibJS `RegExp` over to the Rust-backed `ECMAScriptRegex` APIs.
Route `new RegExp()`, regex literals, and the RegExp builtins through
the new compile and exec APIs, and stop re-validating patterns with the
deleted C++ parser on the way in. Preserve the observable error
behavior by carrying structured compile errors and backtracking-limit
failures across the FFI boundary. Cache compiled regex state and named
capture metadata on `RegExpObject` in the new representation.
Use the new API surface to simplify and speed up the builtin paths too:
share `exec_internal`, cache compiled regex pointers, keep the legacy
RegExp statics lazy, run global replace through batch `find_all`, and
optimize replace, test, split, and String helper paths. Add regression
tests for those JavaScript-visible paths.
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.
Rewrite SyntaxHighlighter.cpp and js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.
Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
- Default values should depend on arguments being undefined, not being
missing
- "(?:)" for empty pattern happens in RegExp.prototype.source, not the
constructor
Roughly 7% of test-js runtime was spent creating FlyStrings from string
literals. This patch frontloads that work and caches all the commonly
used names in LibJS on a CommonPropertyNames struct that hangs off VM.
More work on decoupling the general runtime from Interpreter. The goal
is becoming clearer. Interpreter should be one possible way to execute
code inside a VM. In the future we might have other ways :^)
To make sure that everything is set up correctly in objects before we
start adding properties to them, we split cell allocation into 3 steps:
1. Allocate a cell of appropriate size from the Heap
2. Call the C++ constructor on the cell
3. Call initialize() on the constructed object
The job of initialize() is to define all the initial properties.
Doing it in a second pass guarantees that the Object has a valid Shape
and can find its own GlobalObject.
Objects should get the GlobalObject from themselves instead. However,
it's not yet available during construction so this only switches code
that happens after construction.
To support multiple global objects, Interpreter needs to stop holding
on to "the" global object and let each object graph own their global.
This adds regex parsing/lexing, as well as a relatively empty
RegExpObject. The purpose of this patch is to allow the engine to not
get hung up on parsing regexes. This will aid in finding new syntax
errors (say, from google or twitter) without having to replace all of
their regexes first!