We had typo'd our use of ClassSetReservedDoublePunctuator, which was
resulting in a parse error for the regex:
([^\\:]+?)
with the 'v' flag set.
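A minimal reproduction, assuming LibRegex's Regex<ECMA262> API and that
the 'v' flag is spelled ECMAScriptFlags::UnicodeSets:

    #include <LibRegex/Regex.h>

    // This pattern used to trip the double-punctuator check and fail to parse.
    Regex<ECMA262> re(R"(([^\\:]+?))"sv, ECMAScriptFlags::UnicodeSets);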
Co-Authored-By: Ali Mohammad Pur <mpfard@serenityos.org>
Our floating point number parser was based on the fast_float library:
https://github.com/fastfloat/fast_float
However, our implementation only supports 8-bit characters. To support
UTF-16, we will need to be able to convert char16_t-based strings to
numbers as well. This works out-of-the-box with fast_float.
We can also use fast_float for integer parsing.
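A self-contained sketch of what this buys us: fast_float's from_chars
accepts char16_t iterators directly, and newer releases parse integers
through the same interface:

    #include <fast_float/fast_float.h>
    #include <string_view>

    std::u16string_view input = u"3.25";
    double value = 0;
    auto result = fast_float::from_chars(input.data(), input.data() + input.size(), value);
    // On success, result.ec == std::errc() and value == 3.25.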
By definition, the web allows lonely surrogates by default. Let's have
our string APIs reflect this, so we don't have to pass an allow option
all over the place.
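A sketch of the intended behavior (Utf16View::validate() exists; the
default-allow behavior is what this change introduces):

    char16_t lonely[] = { 0xD800 }; // a lone high surrogate
    Utf16View view { lonely, 1 };
    VERIFY(view.validate()); // passes by default now, matching the web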
To prepare for an upcoming Utf16String, this migrates Utf16View to store
its data as char16_t. Most function definitions are moved inline and
made constexpr.
This also adds a UDL to construct a Utf16View from a string literal:
auto string = u"hello"sv;
This lets us remove the NTTP Utf16View constructor, as we have found
that such constructors bloat binary size quite a bit.
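A minimal sketch of what such a UDL looks like (exact namespace and
placement assumed):

    constexpr Utf16View operator""sv(char16_t const* string, size_t length)
    {
        return Utf16View { string, length };
    }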
The opcode may have last been accessed by
block_satisfies_atomic_rewrite_precondition, which would set it to a
state that no longer exists.
Set the state to the correct one unconditionally to ensure we're looking
at the right value.
Fixes #5145.
This also moves the next_serial class static into a file-scope static.
The public class static was causing visibility issues with certain Linux
builds when hidden visibility was enabled. However, the current design
makes more sense anyway :^).
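Roughly the shape of the change (names assumed): the counter now lives at
file scope in the .cpp file, so no class symbol needs to be exported:

    // In the implementation file only; was previously a public class static.
    static size_t s_next_serial { 0 };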
Clang's `x86_64-pc-windows-msvc` target requires
`[[msvc::no_unique_address]]`, which is properly set in the
`NO_UNIQUE_ADDRESS` macro in `AK/Platform.h`. Without this, building
on Windows fails due to `-Wunknown-attributes`.
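A simplified sketch of the pattern (the real checks live in AK/Platform.h):

    #if defined(_MSC_VER)
    #    define NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
    #else
    #    define NO_UNIQUE_ADDRESS [[no_unique_address]]
    #endif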
Fixes a bunch of websites breaking now that we verify jump offsets by
trying to remove 0-offset jumps.
This has been broken for a good while; it was just rare to see Repeat
inside alternatives that lent themselves well to tree alts.
Previously we were counting the total number of *nodes* in the tree for
the chain cost, which greatly underestimated its cost when large
bytecode entries were present.
This commit switches to estimating it using the total bytecode *size*,
which is a closer value to the true cost than the tree node count.
This corresponds to a ~4x perf improvement on /<script|<style|<link/ in
Speedometer.
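A sketch of the new estimate (member names assumed):

    size_t cost = 0;
    for (auto const& node : chain)
        cost += node.bytecode.size(); // previously: one unit per node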
For the slight cost of counting code points when converting between
encodings, plus a teeny bit of memory, this commit adds a fast path for
all-happy (surrogate-free) UTF-16 substring and code point operations.
This seems to be a significant chunk of time spent in many regex
benchmarks.
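A sketch of the fast path, using Utf16View's existing length accessors
(the surrounding logic is assumed):

    // With no surrogate pairs, code point index == code unit index,
    // so substrings need no decoding at all.
    if (view.length_in_code_units() == view.length_in_code_points())
        return view.substring_view(offset, length);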
We already had a really nice hash that had a single issue; this commit
fixes that and makes it *the* hash for the hash table, so we avoid
double-hashing and making a long chain.
This is an easy 10% perf gain.
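A sketch of the idea (type and member names assumed): cache the one good
hash on the entry and have the table's traits return it, so the table
never hashes the key a second time:

    struct CacheKeyTraits : public DefaultTraits<CacheKey> {
        static unsigned hash(CacheKey const& key) { return key.cached_hash; }
    };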
By the time we're executing bytecode, we know the bytecode will be
flattened. This means we can use ReadonlySpan to look into it instead of
DisjointChunks::spans(), which allocates.
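A sketch of the access pattern (the accessor name is assumed):

    // A flattened ByteCode is a single contiguous chunk, so one span can
    // view all of it without any allocation.
    ReadonlySpan<ByteCodeValueType> code = bytecode.flat_span();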
This removes another Match member that required destruction. The "API"
for accessing the strings is definitely a bit awkward. We'll think of
something nicer eventually.
Before, if the cache was empty, we would try to evict non-existent
entries and crash. The fix is to make sure that we don't saturate
the cache with a single parse result.
This mode made a lot of incorrect assumptions about string lifetimes,
and instead of fixing it, let's just remove it and tweak the few unit
tests that used it.