ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-12-08 06:09:58 +00:00

Author	SHA1	Message	Date
Timothy Flynn	ad7ac679fd	AK: Compute Utf16View::code_point_offset_of correctly There were a couple of issues here, including that the following computation could actually overflow to NumericLimits<size_t>::max(): code_unit_offset -= it.length_in_code_units();	2025-07-22 17:17:33 +02:00
Timothy Flynn	f595e47c1f	AK: Add unit tests for Utf16View::code_unit_offset_of	2025-07-22 17:17:33 +02:00
Jelle Raaijmakers	265e278275	AK: Allow indexing at length in Utf8View::byte_offset_of() And do the same for Utf8View::code_point_offset_of(). Some of these `VERIFY`s of the view's length were introduced recently, but they caused the parsing of named capture groups in RegexParser to crash in some situations. Instead, allow indexing at the view's length: the byte offset of code point `length()` is known, even though that code point does not exist in the view. Similarly, we know the code point offset at byte offset `byte_length()`. Beyond those offsets, we still crash. Fixes 13 failures in test262's `language/literals/regexp/named-groups`.	2025-07-22 09:10:32 -04:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Timothy Flynn	d40e3af697	AK: Implement UTF-16 string-to-number conversions	2025-07-18 12:45:38 -04:00
Timothy Flynn	6e0290ecaa	AK: Define some UTF-16 helper methods * contains * escape_html_entities * replace * to_ascii_lowercase * to_ascii_uppercase * to_ascii_titlecase * trim * trim_whitespace	2025-07-18 12:45:38 -04:00
Timothy Flynn	7f069efbc4	AK: Implement a flyweight string for Utf16String Utf16FlyString more or less works exactly the same as FlyString. It will store the raw encoded data of the string instance. If the string is a short ASCII string, Utf16FlyString holds the ShortString bytes; else, Utf16FlyString holds a pointer to the Utf16StringData.	2025-07-18 12:45:38 -04:00
Timothy Flynn	2803d66d87	AK: Support UTF-16 string formatting The underlying storage used during string formatting is StringBuilder. To support UTF-16 strings, this patch allows callers to specify a mode during StringBuilder construction. The default mode is UTF-8, for which StringBuilder remains unchanged. In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a series of u16 code units. Appending a single character will append 2 bytes for that character (cast to a char16_t). Appending a StringView will transcode the string to UTF-16. Utf16String also gains the same memory optimization that we added for String, where we hand-off the underlying buffer to Utf16String to avoid having to re-allocate. In the future, we may want to further optimize for ASCII strings. For example, we could defer committing to the u16-esque storage until we see a non-ASCII code point.	2025-07-18 12:45:38 -04:00
Timothy Flynn	fe676585f5	AK: Add a UTF-16 string with optimized short- and ASCII-string storage This is a strictly UTF-16 string with some optimizations for ASCII. * If created from a short UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an inlined byte buffer. * If created with a long UTF-8 or UTF-16 string that is also ASCII, then the string is stored in an outlined char buffer. * If created with a short or long UTF-8 or UTF-16 string that is not ASCII, then the string is stored in an outlined char16 buffer. We do not store short non-ASCII text in the inlined buffer to avoid confusion with operations such as `length_in_code_units` and `code_unit_at`. For example, "😀" would be stored as 4 UTF-8 bytes in short string form. But we still want `length_in_code_units` to be 2, and `code_unit_at(0)` to be 0xD83D.	2025-07-18 12:45:38 -04:00
Timothy Flynn	8fbb80fffc	AK: Do not fall back to simdutf for UTF-16 ASCII validation This was a mistake. Consider U+201C (LEFT DOUBLE QUOTATION MARK). This code point is encoded as the bytes 0x1c 0x20 in UTF-16LE. Both of these bytes are ASCII if interpreted as UTF-8. But the string itself is most certainly not ASCII.	2025-07-18 12:45:38 -04:00
Timothy Flynn	01ebf1eb07	AK: Replace surrogates in String::from_utf8_with_replacement_character We are expected to replace lonely surrogates with U+FFFD when decoding UTF-8 text.	2025-07-06 04:30:17 +12:00
ayeteadoe	25f5936dee	CMake: Rename serenity_* helper functions/macros to ladybird_*	2025-07-03 23:19:41 +02:00
Timothy Flynn	62d9a84b8d	AK+Everywhere: Replace custom number parsers with fast_float Our floating point number parser was based on the fast_float library: https://github.com/fastfloat/fast_float However, our implementation only supports 8-bit characters. To support UTF-16, we will need to be able to convert char16_t-based strings to numbers as well. This works out-of-the-box with fast_float. We can also use fast_float for integer parsing.	2025-07-03 09:51:56 -04:00
Timothy Flynn	9fc3e72db2	AK+Everywhere: Allow lonely UTF-16 surrogates by default By definition, the web allows lonely surrogates by default. Let's have our string APIs reflect this, so we don't have to pass an allow option all over the place.	2025-07-03 09:51:56 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	2abc955ca9	AK: Allow treating UTF-16 views with lonely surrogates as valid Much of the web requires us to allow lonely surrogates in UTF-16 data. The default behavior to disallow such code units has not been changed here - that will be changed in an upcoming commit.	2025-07-03 09:51:56 -04:00
Timothy Flynn	d978a582a0	AK: Add a Utf16View ASCII validator	2025-07-03 09:51:56 -04:00
Timothy Flynn	35a1832d08	Tests/AK: Rename TestUtf16 / TestUtf8 to TestUtf16View / TestUtf8View These are the files they actually test, so let's rename them to avoid confusion with an upcoming Utf16String test.	2025-07-03 09:51:56 -04:00
Luke Wilde	31a8004ddb	AK: Add the ability to consume specifically by a predicate This will be used by Content Security Policy to consume the next character, if it matches a whole range of characters, such as is_ascii_alpha.	2025-07-01 10:24:24 +12:00
Tomasz Strejczek	8f8e51b1fc	AK: Implement AK::UnixDateTime::to_string() Copy implementation of LibCore::DateTime::to_string() to AK. Rename TestDuration.cpp to TestTime.cpp and add there tests for to_string().	2025-06-19 18:42:45 -06:00
Tomasz Strejczek	e03c558a0a	AK: Implement demangle() for MSVC ABI This implements demangle() using Windows API. Also some rudimentary test is provided.	2025-06-17 18:39:18 -06:00
Sam Atkins	26105b8b11	AK: Add a Formatter for Checked This goes in Format.h instead of Checked.h, to avoid an include cycle.	2025-06-17 20:44:01 +02:00
Jelle Raaijmakers	6f926e6977	AK: Add `Utf8View::code_point_offset_of()`	2025-06-13 15:08:26 +02:00
Jelle Raaijmakers	cc0a28ee7d	AK: Add `Utf16View::find_code_unit_offset(_ignoring_case)`	2025-06-13 15:08:26 +02:00
Jelle Raaijmakers	7d7f6fa494	AK: Remove superfluous check from `Utf16View::equals_ignoring_case()` Returning true if both lengths are 0 is already handled by the default case.	2025-06-13 15:08:26 +02:00
Jelle Raaijmakers	b558b4dba6	AK: Add `Span<T>::index_of(ReadonlySpan)` This will be used for case-sensitive substring index matches in a later commit.	2025-06-13 15:08:26 +02:00
ayeteadoe	8cf01a25c2	AK: Add initial support for AK testsuite on Windows We now explicitly enable support for the minimum libraries needed to build and run the AK testsuite. 81/82 tests are running and passing. The exception is LexicalPath, as some path behaviour on Windows is different than Unix, so the current tests will have lots of platform specific failures. The implementer of LexicalPathWindows recommended windows-specific tests here, so I will do that in a follow up.	2025-05-20 10:58:43 -06:00
Ashton	5f5ae6bf8b	AK: Replace wchar_t formatting with char32_t This makes TestFormat fully cross-platform as we no longer have to work around the 16 vs 32-bit wide strings	2025-05-18 19:18:13 -06:00
Ashton	4b3a3b0856	AK: Remove redundant TestPrint test This test was only useful when AK/PrintfImplementation.h existed. But that was removed 11 months ago, so since then this has just been testing std library functions not implemented by us.	2025-05-18 19:18:13 -06:00
Andreas Kling	734bc2a0ea	AK: Strip trailing zero decimals in default formatting of float numbers This gives us a more human-looking serialization of numbers by default, and in case a fixed number of decimal digits is actually wanted, we still have the 'f' specifier.	2025-05-18 17:23:34 +02:00
ayeteadoe	744fd91d0b	LibTest: Support death tests without child process cloning A challenge for getting LibTest working on Windows has always been CrashTest. It implements death tests similar to Google Test where a child process is cloned to invoke the expression that should abort/terminate the program. Then the exit code of the child is used by the parent test process to verify if the application correctly aborted/terminated due to invoking the expression. The problem was that finding an equivalent way to port Crash::run() to Windows was not looking very likely as publicly exposed Win32/Native APIs have no equivalent to fork(); however, Windows actually does have native support for process cloning via undocumented NT APIs that clever people reverse engineered and published, see `NtCreateUserProcess()`. All that being said, this `EXPECT_DEATH()` implementation avoids needing to use a child process in general, allowing us to remove CrashTest in favour of a single cross-platform solution for death tests.	2025-05-16 13:23:32 -06:00
Andreas Kling	cf6e2531d9	AK: Make String::number() much faster for integer types Instead of going through String::formatted(), we now have a specialized code path for base-10 serialization directly to UTF-8. This is roughly 5-10x faster than the previous implementation, depending on how many digits we end up outputting. 1.07x speedup on MicroBench/for-in-indexed-properties.js	2025-05-02 19:13:03 +02:00
Tim Ledbetter	31dea89fe0	AK: Add lowest common multiple and greatest common divisor functions	2025-04-23 09:13:45 +01:00
Jonne Ransijn	bb20a0d8f8	AK: Allow the `Optional<T>` move assignment operator to be trivial This will change behaviour for moved-from `Optional<T>`s, since they will now no longer clear their value if `T` is trivial. However, a moved-from value should be considered to be in an unspecified state. Use `Optional<T>::clear` or `Optional<T>::release_value` instead.	2025-04-22 21:19:31 -06:00
Jonne Ransijn	a059ab4677	AK: Allow `Optional<T>` to be used in constant expressions	2025-04-22 21:19:31 -06:00
Andreas Kling	7628ddfaf7	AK: Remove endianness override from Utf16View Utf16View is now always in "host" endian mode. This makes it smaller and less branchy for everyone!	2025-04-16 10:04:50 +02:00
Timothy Flynn	917537b449	AK: Enable compile-time check of a format test string Our implementation seems to parse this string just fine now.	2025-04-08 20:00:18 -04:00
Timothy Flynn	1d9e226206	AK: Remove unused UTF-8 / other factory methods from ByteString	2025-04-07 17:44:38 +02:00
Timothy Flynn	ee6b2db009	AK+LibURL+LibWeb: Use simdutf to validate ASCII strings simdutf provides a vectorized ASCII validator, so let's use that instead of looping over strings manually.	2025-04-06 11:05:58 -04:00
Timothy Flynn	7f37a8f60f	AK: Add an `AK::find` helper to return a reference to the found value This is often more convenient than dealing with iterators. This commit includes a couple conversions to find_value as examples.	2025-04-06 13:45:10 +02:00
rmg-x	37998895d8	AK+Meta+LibCore+Tests: Remove unused `SipHash` implementation This is a homegrown implementation that wasn't actually used in dependent classes. If this is needed in the future, using OpenSSL would probably be a better option.	2025-04-06 01:47:50 +02:00
rmg-x	6480e1a3fe	AK+Tests: Add support for URI syntax in `IPv6Address::from_string` This supports IPv6 strings that start with `[` and end with `]` in accordance with RFC3986 which states: A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.	2025-04-05 14:26:09 -04:00
Kenneth Myhra	82a2ae99c8	Everywhere: Remove DeprecatedFlyString + any remaining references to it This reverts commit `7c32d1e8a5`.	2025-04-02 11:43:13 +02:00
Andreas Kling	7c32d1e8a5	Revert "Everywhere: Remove DeprecatedFlyString + any remaining references to it" This reverts commit `3131e6369f`. Greatly regressed JavaScript benchmark performance.	2025-04-01 15:40:27 +02:00
Kenneth Myhra	3131e6369f	Everywhere: Remove DeprecatedFlyString + any remaining references to it	2025-04-01 12:50:00 +02:00
Andrew Kaster	01ac48b36f	AK: Support storing blocks in AK::Function This has two slightly different implementations for ARC and non-ARC compiler modes. The main idea is to store a block pointer as our closure and use either ARC magic or BlockRuntime methods to manage the memory for the block. Things are complicated by the fact that we don't yet force-enable swift, so we can't count on the swift.org llvm fork being our compiler toolchain. The patch adds some CMake checks and ifdefs to still support environments without support for blocks or ARC.	2025-03-18 17:15:08 -06:00
Tim Ledbetter	040dca0223	AK: Add `first_is_equal_to_all_of()` This method returns true if all arguments are equal.	2025-03-18 21:55:06 +01:00
Jess	88c4f71114	AK/Checked: Don't verify overflow bit in lvalue operations Before, adding an overflown `Checked<T>` to another `Checked<T>` would cause a verification failure when instead it should propagate m_overflow and allow the user to handle the overflow.	2025-02-25 11:20:13 +00:00
Timothy Flynn	1e841cd453	Tests: Add a test for moving an object out of a JSON value I recently questioned whether this would work as expected: JsonValue value { JsonObject {} }; auto object = move(value.as_object()); So this just adds a unit test to ensure that it does.	2025-02-24 12:05:29 -05:00
Timothy Flynn	fe2dff4944	AK+Everywhere: Convert JSON value serialization to String This removes the use of StringBuilder::OutputType (which was ByteString, and only used by the JSON classes). And it removes the StringBuilder template parameter from the serialization methods; this was only ever used with StringBuilder, so a template is pretty overkill here.	2025-02-20 19:27:51 -05:00
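
The reasoning in commit 8fbb80fffc (a byte-wise ASCII check gives the wrong answer for UTF-16 data) can be illustrated with a minimal standalone sketch. This is not the AK or simdutf code; `bytes_look_ascii` and `code_units_are_ascii` are hypothetical helpers written only for this illustration, using the standard library rather than AK types.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of AK): the mistaken approach, which scans
// the UTF-16 buffer one byte at a time as if it were 8-bit text.
inline bool bytes_look_ascii(std::u16string_view text)
{
    auto const* bytes = reinterpret_cast<unsigned char const*>(text.data());
    for (std::size_t i = 0; i < text.size() * sizeof(char16_t); ++i) {
        if (bytes[i] > 0x7f)
            return false;
    }
    return true;
}

// Hypothetical helper (not part of AK): checks each 16-bit code unit as a
// whole, which is what an ASCII check on UTF-16 text has to do.
constexpr bool code_units_are_ascii(std::u16string_view text)
{
    for (char16_t code_unit : text) {
        if (code_unit > 0x7f)
            return false;
    }
    return true;
}

// U+201C is the single code unit 0x201C. Its two bytes are 0x1c and 0x20 in
// either byte order, so the byte-wise scan is fooled on any host.
inline void check_examples()
{
    assert(bytes_look_ascii(u"\u201C"));
    assert(!code_units_are_ascii(u"\u201C"));
    assert(code_units_are_ascii(u"hello"));
}
```

Calling `check_examples()` exercises both helpers on the U+201C case described in the commit message.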

605 commits