ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-19 15:43:20 +00:00

Author	SHA1	Message	Date
Andreas Kling	b50ff02da4	AK: Skip ASCII validation in {Utf16String,String}::number()	2025-10-05 11:24:46 +02:00
Timothy Flynn	8472e469f4	AK+LibJS+LibWeb: Recognize that our UTF-16 string is actually WTF-16 For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates are permitted). Only in a few exceptional cases do we strictly require valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls will always succeed. It's a wasted effort to ever make such a check. This patch eliminates such invocations. The validation methods will now only check for strict UTF-16, and are only invoked when needed.	2025-08-13 09:56:13 -04:00
Timothy Flynn	36c7302178	AK: Optimize the UTF-16 StringBuilder for ASCII storage When we build a UTF-16 string, we currently always switch to the UTF-16 storage mode inside StringBuilder. Then when it comes time to create the string, we switch the storage to ASCII if possible (by shifting the underlying bytes up). Instead, let's start out with ASCII storage and then switch to UTF-16 storage once we see a non-ASCII code point. For most strings, this will avoid allocating 2x the memory, and avoids many ASCII validation calls.	2025-08-13 09:56:13 -04:00
Timothy Flynn	13ed6aba71	AK+LibIPC: Implement an encoder/decoder for UTF-16 strings	2025-08-02 10:10:14 -07:00
Timothy Flynn	1375e6bf39	AK+LibJS+LibWeb: Use simdutf to create well-formed strings	2025-07-26 00:40:06 +02:00
Timothy Flynn	2803d66d87	AK: Support UTF-16 string formatting The underlying storage used during string formatting is StringBuilder. To support UTF-16 strings, this patch allows callers to specify a mode during StringBuilder construction. The default mode is UTF-8, for which StringBuilder remains unchanged. In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a series of u16 code units. Appending a single character will append 2 bytes for that character (cast to a char16_t). Appending a StringView will transcode the string to UTF-16. Utf16String also gains the same memory optimization that we added for String, where we hand-off the underlying buffer to Utf16String to avoid having to re-allocate. In the future, we may want to further optimize for ASCII strings. For example, we could defer committing to the u16-esque storage until we see a non-ASCII code point.	2025-07-18 12:45:38 -04:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00

7 commits