ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2026-04-20 02:40:27 +00:00

Author	SHA1	Message	Date
aplefull	aeec2c804c	LibRegex: Implement Unicode case-insensitive matching Previously, case-insensitive regex matching used ASCII-only case conversion (to_ascii_lowercase) even for Unicode characters. Now we implement Canonicalize abstract operation, so we can case-fold Unicode characters properly during case-insensitive matching.	2026-02-16 07:51:00 -05:00
Ali Mohammad Pur	01be1ed583	LibRegex: Mark OpCode_classes with REGEX_API	2026-02-07 14:09:56 +01:00
Ali Mohammad Pur	6aba31ba13	LibRegex: Add some FileCheck-like tests to ensure opts don't break	2026-02-07 14:09:56 +01:00
Ali Mohammad Pur	fedf0f78ca	LibRegex: Reject RSeekTo crossing the current-to-EOL boundary	2026-02-07 14:09:56 +01:00
aplefull	e4572aa9d7	LibRegex: Add support for regex modifiers This commit implements the regexp-modifiers proposal. It allows us to use modification of i,m,s flags within groups using `(?flags:subpattern)` and `(?flags-flags:subpattern)` syntax.	2026-01-16 15:00:00 +01:00
aplefull	6ce312e22f	LibRegex: Prevent empty matches in optional quantifiers Step 2.b of the RepeatMatcher states that once minimum repetitions are satisfied, empty matches should not be considered for further repetitions. This was not being enforced for optional quantifiers like `?`, so we had extra capture group matches.	2026-01-16 01:11:24 +01:00
mikiubo	535d2476a7	LibRegex: Implement proper lookbehind via new StepBack opcodes This introduces a new mechanism for evaluating lookbehind assertions by adding four new bytecode opcodes: SetStepBack, IncStepBack, CheckStepBack, and CheckSavedPosition. These opcodes replace the previous GoBack-based approach and enable correct handling of variable-length lookbehind patterns, where the match length cannot be known statically. Track lookbehind greediness in the parser and propagate it to bytecode generation. Allow controlled backtracking in lookbehind bodies while avoiding incorrect captures during step-back execution. Partially fix issue: #3459	2026-01-11 23:24:49 +01:00
Ali Mohammad Pur	c1535ef65b	LibRegex: Skip multi-op compare overhead when not necessary	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	637d47ba30	LibRegex: Add an optimisation for replacing /.*x/ with a seek op This will avoid some catastrophic backtracking by just skipping to 'x'.	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	e2c6918cdb	LibRegex: Fuse consecutive single-char Compares into a String Compare This avoids huge instruction decoding and dispatch overhead, ~40x performance improvement for /(^|x)ppp/.	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	9d49fafdbf	LibRegex: Add an optimisation to skip forks that cannot produce a match ...and implement it for 'start of line' checks. This makes patterns like /(^|x)ppp/ fork-free at runtime, ~30% perf improvement for that pattern.	2026-01-05 18:22:11 +01:00
Ali Mohammad Pur	3f35d84785	LibRegex+LibJS: Flatten the bytecode buffer before regex execution This makes it so we don't have to unnecessarily check for having a flattened buffer; significant performance increase.	2026-01-05 18:22:11 +01:00
aplefull	1b570fcd61	LibRegex: Correct negated character class escapes behavior Patterns like /[^\S]/ should match whitespace characters, but previously would fail to match. The position would advance twice: once during the character class comparison, and again at the end when temporary_inverse was reset. This caused matches to be skipped incorrectly. Now we advance at the end only if position hasn't already changed during the loop.	2025-12-23 11:04:16 +01:00
Andreas Kling	7d7886afea	LibJS: Don't assume flattened bytecode when dumping OpCode_Compare Fixes #7129	2025-12-13 16:40:19 -06:00
Andreas Kling	67b20017dc	LibRegex: Cache pointer to flattened bytecode data in OpCode_Compare This avoids repeatedly checking if the bytecode has been flattened (which is always the case by the time we're executing). 1.05x speedup on Octane/regexp.js	2025-12-13 13:51:12 -06:00
aplefull	934817d45e	LibRegex: Add missing StringSet cases	2025-11-27 14:02:04 +01:00
Tim Ledbetter	061b457bac	LibRegex: Use `unchecked_empend()` where possible	2025-11-26 14:33:59 +00:00
aplefull	eed4dd3745	LibRegex: Add support for string literals in character classes	2025-11-26 11:34:38 +01:00
aplefull	a49c39de32	LibRegex: Support matching unicode multi-character sequences	2025-11-26 11:34:38 +01:00
aplefull	c4eef822de	LibRegex: Fix backreferences to undefined capture groups Fixes handling of backreferences when the referenced capture group is undefined or hasn't participated in the match. CharacterCompareType::NamedReference is added to distinguish numbered (\1) from named (\k<name>) backreferences. Numbered backreferences use exact group lookup. Named backreferences search for participating groups among duplicates.	2025-10-16 16:37:54 +02:00
Ali Mohammad Pur	c7ad6cd508	LibRegex: Use code unit length in more places that apply Finishes what `7f6b70fafb` started. Having one part use length and another code unit length led to crashes, the added test ensures we don't mess that up again.	2025-07-24 23:09:01 +02:00
aplefull	e2f8f5a350	LibRegex: Fix capture groups in quantified alternations This prevents empty matches from overwriting non-empty captures in quantified alternations. Fixes patterns like (a|a?)+ where the optional branch would incorrectly overwrite meaningful captures with empty strings.	2025-07-24 10:40:16 +02:00
Jelle Raaijmakers	930ac9898f	LibRegex: Simplify ternary condition in RegexByteCode No functional changes.	2025-07-22 01:23:52 +02:00
Timothy Flynn	81fc8ab8cc	LibRegex: Rename a couple of RegexStringView methods for clarity `operator[]` -> `code_point_at` `code_unit_at` -> `unicode_aware_code_point_at` `unicode_aware_code_point_at` returns either a code point or a code unit depending on the Unicode flag.	2025-07-21 23:44:18 +02:00
Timothy Flynn	2dfcc4c307	LibRegex: Compare code units (not code points) in non-Unicode char range	2025-07-21 23:44:18 +02:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Ali Mohammad Pur	5b45223d5f	LibRegex: Account for uppercase characters in insensitive patterns	2025-07-12 11:26:23 +02:00
aplefull	486602e796	LibRegex: Fix handling of + quantifier with zero-width matches Small change that allows quantifiers using Fork* forms (e.g., +) to succeed after one match, even if that match has zero width.	2025-06-02 15:52:26 +02:00
Andrew Kaster	3dd2fbd041	LibRegex: Move StringTable ctor/dtor out of line This also moves the next_serial class static into a file scope static. The public class static was causing visibility issues with certain Linux builds when hidden visibility was enabled. However, the current design makes more sense anyway :^).	2025-05-12 03:22:23 -06:00
Ali Mohammad Pur	76f5dce3db	LibRegex: Flatten capture group list in MatchState This makes copying the capture group COWVector significantly cheaper, as we no longer have to run any constructors for it - just memcpy.	2025-04-18 17:09:27 +02:00
Andreas Kling	87ec5b32b0	LibRegex: Use ReadonlySpan to peek into OpCode_Compare LUTs By the time we're executing bytecode, we know the bytecode will be flattened. This means we can use ReadonlySpan to look into it instead of DisjointChunks::spans(), which allocates.	2025-04-14 17:40:13 +02:00
Andreas Kling	54edf29f1b	LibRegex: Make Match::capture_group_name an index into the string table This removes another Match member that required destruction. The "API" for accessing the strings is definitely a bit awkward. We'll think of something nicer eventually.	2025-04-14 17:40:13 +02:00
Ali Mohammad Pur	299b9ca572	LibRegex: Check backreference index before looking it up If a backref happens after it's cleared, the slot may be cleared already.	2025-04-06 20:21:16 +02:00
Ali Mohammad Pur	4136d8d13e	LibRegex: Use an interned string table for capture group names This avoids messing around with unsafe string pointers and removes the only non-FlyString-able user of DeprecatedFlyString.	2025-04-02 11:43:13 +02:00
Andreas Kling	6b6d3b32a4	LibRegex: Remove the StringCopyMatches mode This mode made a lot of incorrect assumptions about string lifetimes, and instead of fixing it, let's just remove it and tweak the few unit tests that used it.	2025-03-24 22:27:17 +00:00
mikiubo	c85df78c4c	LibRegex: Remove orphaned save points in nested LookAhead	2025-03-17 16:11:02 +01:00
Tim Ledbetter	b9ac99d2eb	Revert "LibRegex: Remove orphaned save points in nested LookAhead" This reverts commit `f2678bfcb8`.	2025-03-14 19:57:33 +00:00
mikiubo	f2678bfcb8	LibRegex: Remove orphaned save points in nested LookAhead	2025-03-14 09:41:41 +01:00
aplefull	61744322ad	LibRegex: Ensure nullable quantifiers backtrack when input remains Makes patterns like `/(a?b??)*/` correctly match the string	2025-03-02 15:19:04 +01:00
Ali Mohammad Pur	eee90f4aa2	LibRegex: Treat checks against nonexistent checkpoints as empty Due to optimiser shenanigans in the tree alternative form, some JumpNonEmpty ops might be moved before their Checkpoint instruction. It is safe to assume the distance between the nonexistent checkpoint and the current op is zero, so just do that.	2024-12-13 10:00:16 +01:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00
Sahan Fernando	fe2b8906d4	Everywhere: Fix incorrect uses of String::format and StringBuilder::appendf These changes are arbitrarily divided into multiple commits to make it easier to find potentially introduced bugs with git bisect.	2021-01-11 21:06:32 +01:00
AnotherTest	19bf7734a4	LibRegex: Store 'String' matches inside the bytecode Also removes an unnecessary 'length' argument (StringView has a length!)	2020-12-06 15:38:40 +01:00
AnotherTest	ab2c646d5d	LibRegex: Fix OOB access in Regex debug print	2020-11-30 11:37:30 +01:00
AnotherTest	75081b2bdd	LibRegex: Fix reported compare-against value in debug	2020-11-27 21:32:41 +01:00
AnotherTest	dbef2b1ee9	LibRegex: Implement an ECMA262-compatible parser This also adds support for lookarounds and individually-negated comparisons. The only unimplemented part of the parser spec is the unicode stuff.	2020-11-27 21:32:41 +01:00
AnotherTest	3db8ced4c7	LibRegex: Change bytecode value type to a 64-bit value To allow storing unicode ranges compactly; this is not utilised at the moment, but changing this later would've been significantly more difficult. Also fixes a few debug logs.	2020-11-27 21:32:41 +01:00
AnotherTest	92ea9ed4a5	LibRegex: Fix greedy/reluctant modifiers in PosixExtendedParser Also fixes the issue with assertions causing early termination when they fail.	2020-11-27 21:32:41 +01:00
Emanuel Sprung	6add8b9c05	LibRegex: Remove backup file, remove BOM in RegexParser.cpp, run clang-format	2020-11-27 21:32:41 +01:00

52 commits