ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-12-08 06:09:58 +00:00

Author	SHA1	Message	Date
Callum Law	8ada4b7fdc	LibRegex: Account for opcode size when calculating incoming jump edges Not accounting for opcode size when calculating incoming jump edges meant that we were merging nodes where we otherwise shouldn't have been, for example /.a|.b/.	2025-07-28 17:06:58 +02:00
Jelle Raaijmakers	73967ee90c	Everywhere: Use HashMap::update() where applicable	2025-07-25 16:22:06 +02:00
Ali Mohammad Pur	c7ad6cd508	LibRegex: Use code unit length in more places that apply Finishes what `7f6b70fafb` started. Having one part use length and another code unit length led to crashes; the added test ensures we don't mess that up again.	2025-07-24 23:09:01 +02:00
aplefull	e2f8f5a350	LibRegex: Fix capture groups in quantified alternations This prevents empty matches from overwriting non-empty captures in quantified alternations. Fixes patterns like (a|a?)+ where the optional branch would incorrectly overwrite meaningful captures with empty strings.	2025-07-24 10:40:16 +02:00
Jelle Raaijmakers	3db7d802db	LibRegex: Early return in `Parser::try_skip()` No functional changes.	2025-07-22 09:10:32 -04:00
Jelle Raaijmakers	7f6b70fafb	LibRegex: Use code unit length in Matcher<Parser>::match() We were calling into `view.length()`, which potentially returned the code _point_ length for Utf16Views. Make sure we use the code unit length instead, since we're only indexing into code units.	2025-07-22 01:23:52 +02:00
Jelle Raaijmakers	930ac9898f	LibRegex: Simplify ternary condition in RegexByteCode No functional changes.	2025-07-22 01:23:52 +02:00
Timothy Flynn	81fc8ab8cc	LibRegex: Rename a couple of RegexStringView methods for clarity `operator[]` -> `code_point_at` `code_unit_at` -> `unicode_aware_code_point_at` `unicode_aware_code_point_at` returns either a code point or a code unit depending on the Unicode flag.	2025-07-21 23:44:18 +02:00
Timothy Flynn	2dfcc4c307	LibRegex: Compare code units (not code points) in non-Unicode char range	2025-07-21 23:44:18 +02:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Ali Mohammad Pur	5b45223d5f	LibRegex: Account for uppercase characters in insensitive patterns	2025-07-12 11:26:23 +02:00
Shannon Booth	bd6581fe22	LibRegex: Correctly use ClassSetReservedPunctuator in ClassSetCharacter We had typo'd using ClassSetReservedDoublePunctuator which was resulting in a parse error for the regex: ([^\\:]+?) With the 'v' flag set. Co-Authored-By: Ali Mohammad Pur <mpfard@serenityos.org>	2025-07-10 11:41:02 +02:00
ayeteadoe	25f5936dee	CMake: Rename serenity_* helper functions/macros to ladybird_*	2025-07-03 23:19:41 +02:00
Timothy Flynn	62d9a84b8d	AK+Everywhere: Replace custom number parsers with fast_float Our floating point number parser was based on the fast_float library: https://github.com/fastfloat/fast_float However, our implementation only supports 8-bit characters. To support UTF-16, we will need to be able to convert char16_t-based strings to numbers as well. This works out-of-the-box with fast_float. We can also use fast_float for integer parsing.	2025-07-03 09:51:56 -04:00
Timothy Flynn	9fc3e72db2	AK+Everywhere: Allow lonely UTF-16 surrogates by default By definition, the web allows lonely surrogates by default. Let's have our string APIs reflect this, so we don't have to pass an allow option all over the place.	2025-07-03 09:51:56 -04:00
Timothy Flynn	86b1c78c1a	AK+Everywhere: Prepare Utf16View for integration with a UTF-16 string To prepare for an upcoming Utf16String, this migrates Utf16View to store its data as a char16_t. Most function definitions are moved inline and made constexpr. This also adds a UDL to construct a Utf16View from a string literal: auto string = u"hello"sv; This lets us remove the NTTP Utf16View constructor, as we have found that such constructors bloat binary size quite a bit.	2025-07-03 09:51:56 -04:00
Ali Mohammad Pur	b0e471228d	LibRegex: Avoid use-after-return of MatchState in 'is_an_eligible_jump' The opcode may have last been accessed by block_satisfies_atomic_rewrite_precondition, which would set it to a state that no longer exists. Set the state to the correct one unconditionally to ensure we're looking at the right value. Fixes #5145.	2025-06-24 18:43:01 +02:00
Ali Mohammad Pur	2947ae7d6e	LibRegex: Move required bytecode.flatten() outside optimization function Not running the optimization passes should not leave the bytecode in a broken state. Fixes #5146.	2025-06-24 18:43:01 +02:00
aplefull	486602e796	LibRegex: Fix handling of + quantifier with zero-width matches Small change that allows quantifiers using Fork* forms (e.g., +) to succeed after one match, even if that match has zero width.	2025-06-02 15:52:26 +02:00
Ali Mohammad Pur	cfc241f61d	LibRegex: Make the trie rewrite optimisation maintain the alt order This is required by the spec.	2025-05-21 14:28:45 +02:00
Ali Mohammad Pur	2eccd68ba5	LibRegex: Document the append_alternative optimisation a bit	2025-05-21 14:28:45 +02:00
Timothy Flynn	7280ed6312	Meta: Enforce newlines around namespaces This has come up several times during code review, so let's just enforce it using a new clang-format 20 option.	2025-05-14 02:01:59 -06:00
ayeteadoe	a3754a7bf1	LibRegex: Annotate classes with export macro for hidden visibility This fix demos the gradual opt-in migration process libraries can take to switch to explicit symbol exports via the FOO_API macros.	2025-05-12 03:22:23 -06:00
Andrew Kaster	3dd2fbd041	LibRegex: Move StringTable ctor/dtor out of line This also moves the next_serial class static into a file scope static. The public class static was causing visibility issues with certain Linux builds when hidden visibility was enabled. However, the current design makes more sense anyway :^).	2025-05-12 03:22:23 -06:00
ayeteadoe	0253342c1a	LibRegex: Use `NO_UNIQUE_ADDRESS` in RegexMatch for Windows support Clang's `x86_64-pc-windows-msvc` target requires `[[msvc::no_unique_address]]`, which is properly set in the `NO_UNIQUE_ADDRESS` macro in `AK/Platform.h`. Without this, building on Windows fails due to `-Wunknown-attributes`.	2025-05-12 03:22:23 -06:00
Ali Mohammad Pur	022cd1adca	LibRegex: Use the right offset when patching jumps through fork-trees Fixes #4474.	2025-04-27 12:16:15 +02:00
Ali Mohammad Pur	fca1d33fec	LibRegex: Correctly calculate the target for Repeat in table alts Fixes a bunch of websites breaking because we now verify jump offsets by trying to remove 0-offset jumps. This has been broken for a good while; it was just rare to see Repeat inside alternatives that lent themselves well to tree alts.	2025-04-24 01:17:27 -06:00
Ali Mohammad Pur	4b9abdb963	LibRegex: Remove useless jumps (Jump* +0) before running opts This leads to some more significant performance increases on the simple /<script|<style|<link/ regex in speedometer (~2x)	2025-04-23 22:57:49 +02:00
Ali Mohammad Pur	ec0836c9ea	LibRegex: Don't blindly treat multi-target tree jumps as a single jump The tree generation was broken, we just didn't notice it because it was very rarely being picked for more complex bytecodes.	2025-04-23 22:57:49 +02:00
Ali Mohammad Pur	09eb28ee1d	LibRegex: Better estimate the cost of laying out alts as a chain Previously we were counting the total number of nodes in the tree for the chain cost, which greatly underestimated its cost when large bytecode entries were present. This commit switches to estimating it using the total bytecode size, which is a closer value to the true cost than the tree node count. This corresponds to a ~4x perf improvement on /<script|<style|<link/ in speedometer.	2025-04-23 22:57:49 +02:00
Ali Mohammad Pur	eea81738cd	AK+Everywhere: Recognise that surrogates in utf16 aren't all that common For the slight cost of counting code points when converting between encodings and a teeny bit of memory, this commit adds a fast path for all-happy utf-16 substrings and code point operations. This seems to be a significant chunk of time spent in many regex benchmarks.	2025-04-23 07:56:02 -06:00
Ali Mohammad Pur	3b4a184f1a	LibRegex: Avoid hashing the state hashes again We already had a really nice hash that had a single issue, this commit fixes that and makes it the hash for the hash table, so we avoid double-hashing and making a long chain. This is an easy 10% perf gain.	2025-04-18 17:09:27 +02:00
Ali Mohammad Pur	446a453719	LibRegex: Pull out the first compare to avoid unnecessary execution This adds a fast-path to drop view indices we know will not match immediately without going through the regex VM.	2025-04-18 17:09:27 +02:00
Ali Mohammad Pur	76f5dce3db	LibRegex: Flatten capture group list in MatchState This makes copying the capture group COWVector significantly cheaper, as we no longer have to run any constructors for it - just memcpy.	2025-04-18 17:09:27 +02:00
Andreas Kling	ca2f0141f6	LibRegex: Remove unused "simple substring search" optimization This is not relevant for LibJS since it only works when the input is UTF-8, and LibJS always provides UTF-16.	2025-04-16 10:04:50 +02:00
Andreas Kling	96f1f15ad6	LibRegex: Remove unused Utf8View/Utf32View support in RegexStringView	2025-04-16 10:04:50 +02:00
Andreas Kling	87ec5b32b0	LibRegex: Use ReadonlySpan to peek into OpCode_Compare LUTs By the time we're executing bytecode, we know the bytecode will be flattened. This means we can use ReadonlySpan to look into it instead of DisjointChunks::spans(), which allocates.	2025-04-14 17:40:13 +02:00
Andreas Kling	c1c3b01a6c	LibRegex: Allow Vector<Match> to use trivial memcpy Now that Match has no more members that need destruction, we can allow Vector to memcpy them around.	2025-04-14 17:40:13 +02:00
Andreas Kling	5308d77600	LibRegex: Don't use Optional<T> inside regex::Match This prevented Match from being trivially copyable, which we want it to be for fast Vector copying.	2025-04-14 17:40:13 +02:00
Andreas Kling	54edf29f1b	LibRegex: Make Match::capture_group_name an index into the string table This removes another Match member that required destruction. The "API" for accessing the strings is definitely a bit awkward. We'll think of something nicer eventually.	2025-04-14 17:40:13 +02:00
Andreas Kling	9d47cc54f8	LibRegex: Remove unused regex::Match::string and unused constructor This shrinks regex::Match by 8 bytes and removes a member that needs destruction.	2025-04-14 17:40:13 +02:00
Ali Mohammad Pur	69050da929	LibRegex: Merge inverse string table mappings separately	2025-04-06 20:21:16 +02:00
Ali Mohammad Pur	299b9ca572	LibRegex: Check backreference index before looking it up If a backref happens after it's cleared, the slot may be cleared already.	2025-04-06 20:21:16 +02:00
Jess	83e46b3728	LibRegex: Fix crash when parse result exceeds max cache size Before, if the cache was empty, we would try to evict non-existent entries and crash. So the fix is to make sure that we don't saturate the cache with a single parse result.	2025-04-04 16:10:25 +02:00
Ali Mohammad Pur	4136d8d13e	LibRegex: Use an interned string table for capture group names This avoids messing around with unsafe string pointers and removes the only non-FlyString-able user of DeprecatedFlyString.	2025-04-02 11:43:13 +02:00
Andreas Kling	e5db913b0d	Revert "LibRegex: Port remaining DeprecatedFlyString to ByteString" This reverts commit `aab3fbe254`. Greatly regressed JavaScript benchmark performance.	2025-04-01 15:40:38 +02:00
Andreas Kling	7c32d1e8a5	Revert "Everywhere: Remove DeprecatedFlyString + any remaining references to it" This reverts commit `3131e6369f`. Greatly regressed JavaScript benchmark performance.	2025-04-01 15:40:27 +02:00
Kenneth Myhra	3131e6369f	Everywhere: Remove DeprecatedFlyString + any remaining references to it	2025-04-01 12:50:00 +02:00
Kenneth Myhra	aab3fbe254	LibRegex: Port remaining DeprecatedFlyString to ByteString	2025-04-01 12:50:00 +02:00
Andreas Kling	6b6d3b32a4	LibRegex: Remove the StringCopyMatches mode This mode made a lot of incorrect assumptions about string lifetimes, and instead of fixing it, let's just remove it and tweak the few unit tests that used it.	2025-03-24 22:27:17 +00:00


111 commits