Commit graph

597 commits

Author SHA1 Message Date
Ali Mohammad Pur
80e5356853 AK: Add Vector::remove_all(container)/remove_all(it, end)
Instead of repeatedly removing elements off the vector, this allows for
specifying all the removed indices at once, and does not perform any
extra reallocations or unnecessary moves.
2025-10-01 23:47:29 +02:00
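
The single-pass idea described in 80e5356853 can be sketched with the standard library. This is only an illustration, not AK's implementation: `remove_all_at` is a hypothetical name, `std::vector` stands in for `AK::Vector`, and the indices are assumed to be ascending, unique, and in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not AK's API): removes the elements at the given
// indices in one pass. Assumes `sorted_indices` is ascending, unique, and
// within bounds. Each surviving element is moved at most once and the
// vector never reallocates.
template<typename T>
void remove_all_at(std::vector<T>& values, std::vector<std::size_t> const& sorted_indices)
{
    std::size_t write = 0; // next slot to fill with a surviving element
    std::size_t next = 0;  // position of the next index to remove

    for (std::size_t read = 0; read < values.size(); ++read) {
        if (next < sorted_indices.size() && sorted_indices[next] == read) {
            ++next; // this element is removed, so skip it
            continue;
        }
        if (write != read)
            values[write] = std::move(values[read]);
        ++write;
    }

    // Drop the moved-from tail in a single erase.
    values.erase(values.begin() + static_cast<std::ptrdiff_t>(write), values.end());
}
```

Calling `remove_all_at(v, { 1, 3 })` on a vector holding `10, 20, 30, 40, 50` leaves `10, 30, 50`, whereas removing the two indices one at a time would shift the tail twice.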
Tomasz Strejczek
5cde267979 AK: Implement Formatter<UnixDateTime>
Implement StringBuilder's Formatter<UnixDateTime>. Add necessary tests.
2025-09-30 12:39:01 +02:00
Tomasz Strejczek
ea32e39d68 AK: Add UnixDateTime::parse() method
Copy parse() method from LibCore::DateTime::parse(). Augment the method
to handle parsing from GMT time. Fix incorrect handling of year in '%D'
format specifier. Remove all format specifiers related to time zones.
Copy relevant tests and add additional ones.
2025-09-30 12:39:01 +02:00
InvalidUsernameException
2dd1918b10 Meta+Tests: Update fast-float to version 8.1.0
This release comes with a fix for a bug where certain Unicode emoji
characters encoded in UTF-16 were mistakenly parsed as integers. This
manifested in keys of a JS object being coerced into integers, i.e.
`{ "⤵️": 42 }` would become `{ "5": 42 }`.

Relevant upstream PR: https://github.com/fastfloat/fast_float/pull/325
2025-09-25 21:14:29 -04:00
me-it-is
b76f1fb011 AK: Fix is_within_range when converting from float
Within range now uses the max capacity of a type rather than its size.
This fixes some subtests in
https://wpt.fyi/results/wasm/core/conversions.wast.js.html?product=ladybird
2025-09-24 10:40:24 +01:00
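
The kind of check b76f1fb011 concerns can be sketched as follows. The actual AK change is not shown in this log, so this is an assumption-laden illustration: `fits_in` is a hypothetical name, and the bounds are derived from the target type's value range (`std::numeric_limits`) rather than its byte size.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not AK's is_within_range): reports whether `value`
// lies in [min, max + 1) for the integer type Int. The upper bound is
// computed as 2^digits, which a double represents exactly, because
// converting e.g. INT64_MAX to double would round up to 2^63.
template<typename Int>
bool fits_in(double value)
{
    static_assert(std::numeric_limits<Int>::is_integer, "Int must be an integer type");

    if (std::isnan(value))
        return false;

    double const upper = std::ldexp(1.0, std::numeric_limits<Int>::digits);
    double const lower = std::numeric_limits<Int>::is_signed ? -upper : 0.0;
    return value >= lower && value < upper;
}
```

The check is deliberately conservative at the lower edge: it rejects values slightly below the minimum even though truncation would bring them back into range.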
Zaggy1024
84241ea761 Tests: Add a test for HashTable<NonTrivial>::clear_with_capacity() 2025-09-22 17:28:00 -05:00
Viktor Szépe
1c01e183b7 Everywhere: Fix even more typos 2025-08-27 08:48:01 +02:00
Timothy Flynn
1869399fd1 AK: Specialize Optional for Utf16String and Utf16FlyString
We added this for String some time ago, so let's give Utf16String the
same optimization. Note that Utf16String was already handling its data
member potentially being null as of 5af99f4dd0.
2025-08-19 06:24:09 -04:00
Glenn Skrzypczak
d25d62e74c AK/Time+LibWeb/HTML: Fix ISO8601 week conversions
This reimplements conversions between unix date times and ISO8601
weeks. The new algorithms also do not use loops, so they should be
faster.
2025-08-14 11:05:28 -04:00
Timothy Flynn
8472e469f4 AK+LibJS+LibWeb: Recognize that our UTF-16 string is actually WTF-16
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.

This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
2025-08-13 09:56:13 -04:00
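
To make the distinction in 8472e469f4 concrete, here is a minimal sketch of a strict UTF-16 check. It is not AK's code: `is_strict_utf16` is a hypothetical name and `std::u16string_view` stands in for `Utf16View`. Any sequence of 16-bit code units is valid WTF-16, so only the strict check has work to do.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not AK's validate()): returns true only if every
// surrogate code unit is part of a well-formed high/low pair. A lonely
// surrogate makes the string wobbly (WTF-16) rather than strict UTF-16.
bool is_strict_utf16(std::u16string_view units)
{
    for (std::size_t i = 0; i < units.size(); ++i) {
        char16_t const unit = units[i];

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= units.size())
                return false;
            char16_t const next = units[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // consume the low surrogate as well
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // A low surrogate without a preceding high surrogate.
            return false;
        }
    }
    return true;
}
```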
Timothy Flynn
99d7e08dff AK: Templatize GenericLexer for UTF-16 strings
We now define GenericLexer as a template to allow using it with UTF-16
strings. To keep existing users happy, the template is defined in the
Detail namespace. Then AK::GenericLexer is an alias for a char-based
view, and AK::Utf16GenericLexer is an alias for a char16-based view.
2025-08-13 09:56:13 -04:00
Timothy Flynn
28d9d3a2c7 AK+Libraries: Reduce API surface of GenericLexer a bit
* Remove completely unused methods.
* Deduplicate methods that were overloaded with both StringView and
  char const* parameters.

A future commit will templatize GenericLexer by char type. This patch
serves to make that a tiny bit easier.
2025-08-13 09:56:13 -04:00
Callum Law
861bcbd9ad AK: Format floats with precision in scientific notation where applicable 2025-08-11 17:10:04 +01:00
Idan Horowitz
93692242b9 AK: Implement take_all_matching(predicate) API in HashMap 2025-08-08 13:09:58 -04:00
Idan Horowitz
5097e72174 AK: Implement take_all_matching(predicate) API in HashTable 2025-08-08 13:09:58 -04:00
Ali Mohammad Pur
cadc3f85a6 Tests: Run all Vector tests for FastLastAccess::Yes too 2025-08-08 12:54:06 +02:00
Ali Mohammad Pur
bf4c436ef3 AK: Add some higher-level operations to DoublyLinkedList<T>
This also adds a node cache as allocation/deallocation was showing up in
my profiles; disabled by default to keep the old behaviour.
2025-08-08 12:54:06 +02:00
Timothy Flynn
298ec6a12a AK: Ensure StringBuilder encodes U+10000 as 2 UTF-16 code units 2025-08-07 02:05:50 +02:00
Timothy Flynn
1b611fba67 AK: Ensure Utf16FlyString is hash-compatible with Utf16View/Utf16String 2025-08-07 02:05:50 +02:00
Timothy Flynn
bbda6d13f7 AK: Add a Utf16View method to retrieve an iterator at a code unit offset 2025-08-07 02:05:50 +02:00
Timothy Flynn
6d1f90c739 AK: Remove now-unused UTF-16 length from UTF-8 string helper 2025-08-05 15:13:36 +02:00
Timothy Flynn
2dc0a3b3ce AK: Add trim methods to Utf16String that skip allocation when not needed
If the string does not begin with any of the provided code units, we do
not need to create a new string.
2025-08-05 15:13:36 +02:00
Timothy Flynn
782f8c381c AK: Implement the spaceship operator for UTF-16 strings 2025-08-05 07:07:15 -04:00
Timothy Flynn
0bf565b97f AK: Allow comparing UTF-16 strings to UTF-8 strings
Before now, you could compare a Utf16View to a StringView, but it would
only be valid if the StringView were ASCII. When porting code to UTF-16,
it will be handy to have a code point-aware implementation for non-ASCII
StringViews.
2025-08-05 07:07:15 -04:00
Timothy Flynn
319e7aa03b AK: Do not replace lonely surrogates with U+FFFD while iterating
Utf8View doesn't do this either. The wobbly format is expected by JS.
2025-08-05 07:07:15 -04:00
Timothy Flynn
13ed6aba71 AK+LibIPC: Implement an encoder/decoder for UTF-16 strings 2025-08-02 10:10:14 -07:00
Aliaksandr Kalenik
d47a22150d AK: Define operator== for HashMap 2025-07-30 11:06:05 +02:00
Grant Knowlton
9e1e4f3b15 AK: Validate compressed tags in IPv4-mapped IPv6 address
This disallows parsing IPv4-mapped IPv6 address strings with multiple
compression prefixes. Tests are provided for the updated
functionality.
2025-07-30 00:53:10 +02:00
Timothy Flynn
d9502505c2 AK: Fix bounds assertions in Utf16View::iterator_offset 2025-07-28 18:30:50 +02:00
Timothy Flynn
67723ef83c AK: Add a method to peek ahead of a UTF-16 iterator 2025-07-28 18:30:50 +02:00
Timothy Flynn
21d7d236e6 AK: Add a method to check if a UTF-16 string contains any code point 2025-07-28 18:30:50 +02:00
Timothy Flynn
ed63a60247 AK: Return an empty optional when UTF-16 code unit lookup fails
Accidentally returned the wrong type here.
2025-07-28 12:25:11 +02:00
Timothy Flynn
baddac5155 AK: Implement a method to split a UTF-16 string 2025-07-28 12:25:11 +02:00
Timothy Flynn
48a3b2c28e AK: Implement a method to count instances of a needle in a UTF-16 string 2025-07-28 12:25:11 +02:00
Andrew Kaster
7d669b8b0c AK: Update Swift test for Utf16String changes 2025-07-26 23:33:58 +02:00
Timothy Flynn
a740bfd8ff AK+LibUnicode: Implement Unicode-aware UTF-16 case transformations 2025-07-25 18:16:22 +02:00
Timothy Flynn
df77ae1920 AK: Implement creating a UTF-16 string from a repeated code point 2025-07-25 18:16:22 +02:00
Jelle Raaijmakers
0b96690f0c AK: Add HashMap::update()
This updates a HashMap by copying another HashMap's keys and values.
2025-07-25 16:22:06 +02:00
Timothy Flynn
6c73dff120 AK: Implement a UTF-16 method to check if a string is ASCII whitespace 2025-07-24 19:00:20 +02:00
Timothy Flynn
f53389bab1 AK: Add a couple of Utf16String factories
* Utf16String::from_utf8_with_replacement_character
* Utf16String::from_code_point
2025-07-24 19:00:20 +02:00
Jelle Raaijmakers
15178d5230 AK: Add ::ends_with() to Utf16View and Utf16StringBase
I noticed that we can significantly simplify ::starts_with(), and based
the new ::ends_with() on that.
2025-07-24 07:18:25 -04:00
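
A suffix check of the kind 15178d5230 adds can be very short. The following is a sketch under assumptions, not the AK implementation: `ends_with` here is a free function over `std::u16string_view`, which stands in for `Utf16View`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for Utf16View::ends_with(): compares the last
// needle.size() code units of the haystack against the needle.
bool ends_with(std::u16string_view haystack, std::u16string_view needle)
{
    if (needle.size() > haystack.size())
        return false;
    return haystack.substr(haystack.size() - needle.size()) == needle;
}
```

An empty needle matches any haystack, which mirrors how prefix checks conventionally behave.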
Jelle Raaijmakers
54dd45d3f6 AK: Add Span::ends_with()
Originally I added this to use it in Utf16View::ends_with(), but the
final implementation ended up a lot simpler. I chose to keep this anyway
since it mirrors Span::starts_with().
2025-07-24 07:18:25 -04:00
Timothy Flynn
ad7ac679fd AK: Compute Utf16View::code_point_offset_of correctly
There were a couple of issues here, including that the following
computation could actually wrap around to NumericLimits<size_t>::max():

    code_unit_offset -= it.length_in_code_units();
2025-07-22 17:17:33 +02:00
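
The wraparound mentioned in ad7ac679fd is ordinary unsigned arithmetic. The sketch below shows the effect and one possible guard; `wrapping_sub` and `checked_sub` are hypothetical names for illustration and are not what the AK fix uses.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>

// Plain unsigned subtraction: the result wraps modulo 2^N instead of
// becoming negative.
constexpr std::size_t wrapping_sub(std::size_t a, std::size_t b)
{
    return a - b;
}

// Hypothetical guard: refuses to subtract when the result would wrap.
constexpr std::optional<std::size_t> checked_sub(std::size_t a, std::size_t b)
{
    if (b > a)
        return std::nullopt;
    return a - b;
}
```

With these definitions, `wrapping_sub(0, 1)` equals `std::numeric_limits<std::size_t>::max()`, while `checked_sub(0, 1)` yields an empty optional.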
Timothy Flynn
f595e47c1f AK: Add unit tests for Utf16View::code_unit_offset_of 2025-07-22 17:17:33 +02:00
Jelle Raaijmakers
265e278275 AK: Allow indexing at length in Utf8View::byte_offset_of()
And do the same for Utf8View::code_point_offset_of(). Some of these
`VERIFY`s of the view's length were introduced recently, but they caused
the parsing of named capture groups in RegexParser to crash in some
situations.

Instead, allow indexing at the view's length: the byte offset of code
point `length()` is known, even though that code point does not exist in
the view. Similarly, we know the code point offset at byte offset
`byte_length()`. Beyond those offsets, we still crash.

Fixes 13 failures in test262's `language/literals/regexp/named-groups`.
2025-07-22 09:10:32 -04:00
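
The boundary behaviour described in 265e278275 can be sketched like this. It is an illustration only: `byte_offset_of` is written as a free function over `std::string_view`, the input is assumed to be valid UTF-8, and out-of-range indices return an empty optional here, whereas the commit says AK still crashes beyond the end.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for Utf8View::byte_offset_of(). Assumes `utf8` is
// valid UTF-8. Indexing at exactly the code point length is allowed and
// yields the byte length of the view.
std::optional<std::size_t> byte_offset_of(std::string_view utf8, std::size_t code_point_index)
{
    std::size_t byte = 0;
    std::size_t code_point = 0;

    while (byte < utf8.size()) {
        if (code_point == code_point_index)
            return byte;

        // The lead byte determines the length of the encoded sequence.
        auto const lead = static_cast<unsigned char>(utf8[byte]);
        std::size_t const length = lead < 0x80 ? 1
            : (lead >> 5) == 0x6               ? 2
            : (lead >> 4) == 0xE               ? 3
                                               : 4;
        byte += length;
        ++code_point;
    }

    // One past the last code point: the offset is known even though no
    // code point exists there.
    if (code_point == code_point_index)
        return utf8.size();

    return std::nullopt;
}
```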
Timothy Flynn
9582895759 AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String 2025-07-18 12:45:38 -04:00
Timothy Flynn
d40e3af697 AK: Implement UTF-16 string-to-number conversions 2025-07-18 12:45:38 -04:00
Timothy Flynn
6e0290ecaa AK: Define some UTF-16 helper methods
* contains
* escape_html_entities
* replace
* to_ascii_lowercase
* to_ascii_uppercase
* to_ascii_titlecase
* trim
* trim_whitespace
2025-07-18 12:45:38 -04:00
Timothy Flynn
7f069efbc4 AK: Implement a flyweight string for Utf16String
Utf16FlyString more or less works exactly the same as FlyString. It will
store the raw encoded data of the string instance. If the string is a
short ASCII string, Utf16FlyString holds the ShortString bytes; else,
Utf16FlyString holds a pointer to the Utf16StringData.
2025-07-18 12:45:38 -04:00
Timothy Flynn
2803d66d87 AK: Support UTF-16 string formatting
The underlying storage used during string formatting is StringBuilder.
To support UTF-16 strings, this patch allows callers to specify a mode
during StringBuilder construction. The default mode is UTF-8, for which
StringBuilder remains unchanged.

In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a
series of u16 code units. Appending a single character will append 2
bytes for that character (cast to a char16_t). Appending a StringView
will transcode the string to UTF-16.

Utf16String also gains the same memory optimization that we added for
String, where we hand-off the underlying buffer to Utf16String to avoid
having to re-allocate.

In the future, we may want to further optimize for ASCII strings. For
example, we could defer committing to the u16-esque storage until we
see a non-ASCII code point.
2025-07-18 12:45:38 -04:00
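
Appending a code point as UTF-16 code units, as the UTF-16 mode described in 2803d66d87 needs to do, can be sketched as below. This is not StringBuilder's code: `append_code_point` is a hypothetical name, `std::u16string` stands in for the internal buffer, and the code point is assumed to be at most U+10FFFF.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not StringBuilder's API): appends one code point as
// UTF-16. Code points below U+10000 take one code unit; everything from
// U+10000 upwards takes a surrogate pair. Assumes code_point <= 0x10FFFF.
void append_code_point(std::u16string& out, char32_t code_point)
{
    if (code_point < 0x10000) {
        out.push_back(static_cast<char16_t>(code_point));
        return;
    }

    code_point -= 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (code_point >> 10)));   // high surrogate
    out.push_back(static_cast<char16_t>(0xDC00 + (code_point & 0x3FF))); // low surrogate
}
```

The boundary case is U+10000 itself, which must be written as the pair 0xD800 0xDC00; this is the case 298ec6a12a addresses for StringBuilder.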