Instead of repeatedly removing elements off the vector, this allows for
specifying all the removed indices at once, and does not perform any
extra reallocations or unnecessary moves.
Copy parse() method from LibCore::DateTime::parse(). Augment the method
to handle parsing from GMT time. Fix incorrect handling of year in '%D'
format specifier. Remove all format specifiers related to time zones.
Copy relevant tests and add additional ones.
This release comes with a fix for a bug where certain unicode emoji
characters encoded in UTF-16 were mistakenly parsed as integers. This
manifested in keys of an JS object being coerced into integers, i.e.
`{ "⤵️": 42 }` would become `{ "5": 42 }`.
Relevant upstream PR: https://github.com/fastfloat/fast_float/pull/325
We added this for String some time ago, so let's give Utf16String the
same optimization. Note that Utf16String was already handling its data
member potentially being null as of 5af99f4dd0.
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.
This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
We now define GenericLexer as a template to allow using it with UTF-16
strings. To keep existing users happy, the template is defined in the
Detail namespace. Then AK::GenericLexer is an alias for a char-based
view, and AK::Utf16GenericLexer is an alias for a char16-based view.
* Remove completely unused methods.
* Deduplicate methods that were overloaded with both StringView and
char const* parameters.
A future commit will templatize GenericLexer by char type. This patch
serves to make that a tiny bit easier.
Before now, you could compare a Utf16View to a StringView, but it would
only be valid if the StringView were ASCII. When porting code to UTF-16,
it will be handy to have a code point-aware implementation for non-ASCII
StringViews.
Originally I added this to use it in Utf16View::ends_with(), but the
final implementation ended up a lot simpler. I chose to keep this anyway
since it mirrors Span::starts_with().
There were a couple of issues here, including the following computation
could actually overflow to NumericLimits<size_t>::max():
code_unit_offset -= it.length_in_code_units();
And do the same for Utf8View::code_point_offset_of(). Some of these
`VERIFY`s of the view's length were introduced recently, but they caused
the parsing of named capture groups in RegexParser to crash in some
situations.
Instead, allow indexing at the view's length: the byte offset of code
point `length()` is known, even though that code point does not exist in
the view. Similarly, we know the code point offset at byte offset
`byte_length()`. Beyond those offsets, we still crash.
Fixes 13 failures in test262's `language/literals/regexp/named-groups`.
Utf16FlyString more or less works exactly the same as FlyString. It will
store the raw encoded data of the string instance. If the string is a
short ASCII string, Utf16FlyString holds the ShortString bytes; else,
Utf16FlyString holds a pointer to the Utf16StringData.
The underlying storage used during string formatting is StringBuilder.
To support UTF-16 strings, this patch allows callers to specify a mode
during StringBuilder construction. The default mode is UTF-8, for which
StringBuilder remains unchanged.
In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a
series of u16 code units. Appending a single character will append 2
bytes for that character (cast to a char16_t). Appending a StringView
will transcode the string to UTF-16.
Utf16String also gains the same memory optimization that we added for
String, where we hand-off the underlying buffer to Utf16String to avoid
having to re-allocate.
In the future, we may want to further optimize for ASCII strings. For
example, we could defer committing to the u16-esque storage until we
see a non-ASCII code point.