Ali Mohammad Pur
f4d4bd9ed1
LibRegex: Ignore 'FailIfEmpty' in dot-star loop detection
2026-02-07 14:09:56 +01:00
Ali Mohammad Pur
2677338f43
LibRegex: Process RSeekTo candidates in the correct order
2026-01-07 00:14:02 +01:00
Ali Mohammad Pur
9668927dfc
LibRegex: Don't generate duplicate results for /.*/ patterns
...
Since the code pattern may span multiple blocks, this can generate
duplicate results; keep the last one to avoid corrupting the bytecode.
2026-01-06 19:09:27 +01:00
Ali Mohammad Pur
363f1f6568
LibRegex: Correctly calculate ForkIf target offset in tree alternatives
2026-01-06 19:09:27 +01:00
Ali Mohammad Pur
fbd898fb54
LibRegex: Use nicer rewrite APIs where possible
...
Co-Authored-By: Hendiadyoin1 <leon.a@serenityos.org>
2026-01-05 18:22:11 +01:00
Ali Mohammad Pur
c1535ef65b
LibRegex: Skip multi-op compare overhead when not necessary
2026-01-05 18:22:11 +01:00
Ali Mohammad Pur
637d47ba30
LibRegex: Add an optimisation for replacing /.*x/ with a seek op
...
This will avoid some catastrophic backtracking by just skipping to 'x'.
2026-01-05 18:22:11 +01:00
Ali Mohammad Pur
77d982d6fe
LibRegex: Restore the pure substring search optimisation for u16view
...
ca2f0141f6 removed only the execution side
of this, which made it skip some optimisations for pure string searches.
This commit implements it properly for utf16 strings instead.
2026-01-05 18:22:11 +01:00
Ali Mohammad Pur
e2c6918cdb
LibRegex: Fuse consecutive single-char Compares into a String Compare
...
This avoids a large instruction decoding and dispatch overhead, giving a
~40x performance improvement for /(^|x)ppp/.
2026-01-05 18:22:11 +01:00
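
A minimal sketch of the fusion idea described above, not the actual LibRegex pass. The `Op` and `Kind` types and `fuse_char_compares` are hypothetical stand-ins for the real bytecode; the sketch only shows how a run of adjacent single-character compares can collapse into one string compare.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: a single-character compare, a string
// compare, or anything else (which acts as a fusion barrier).
enum class Kind { CompareChar, CompareString, Other };

struct Op {
    Kind kind;
    std::string text; // one character for CompareChar, the whole run for CompareString
};

// Fuse each run of two or more adjacent CompareChar ops into one CompareString.
std::vector<Op> fuse_char_compares(std::vector<Op> const& ops)
{
    std::vector<Op> out;
    for (std::size_t i = 0; i < ops.size();) {
        if (ops[i].kind != Kind::CompareChar) {
            out.push_back(ops[i]);
            ++i;
            continue;
        }
        std::string run;
        std::size_t j = i;
        while (j < ops.size() && ops[j].kind == Kind::CompareChar) {
            assert(ops[j].text.size() == 1);
            run += ops[j].text;
            ++j;
        }
        // A run of length one stays a single-character compare.
        Kind fused = run.size() == 1 ? Kind::CompareChar : Kind::CompareString;
        out.push_back(Op { fused, run });
        i = j;
    }
    return out;
}
```

A real pass would presumably also have to stop a run at any jump target, so that no jump lands in the middle of a fused compare.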
Ali Mohammad Pur
9d49fafdbf
LibRegex: Add an optimisation to skip forks that cannot produce a match
...
...and implement it for 'start of line' checks.
This makes patterns like /(^|x)ppp/ fork-free at runtime, a ~30%
performance improvement for that pattern.
2026-01-05 18:22:11 +01:00
Ali Mohammad Pur
0acac7f02b
LibRegex: Split basic blocks at jump targets too
2026-01-05 18:22:11 +01:00
Ali Mohammad Pur
3f35d84785
LibRegex+LibJS: Flatten the bytecode buffer before regex execution
...
This makes it so we don't have to unnecessarily check for having a
flattened buffer, which gives a significant performance increase.
2026-01-05 18:22:11 +01:00
aplefull
a49c39de32
LibRegex: Support matching unicode multi-character sequences
2025-11-26 11:34:38 +01:00
Ali Mohammad Pur
d5d37abfa5
AK+LibRegex: Only set node metadata on Trie::ensure_child if missing
...
a290034a81 passed an empty vector to this,
which caused nodes that appeared multiple times to reset the trie
metadata...which broke the optimisation.
This patchset makes the function take a 'provide missing metadata'
function instead, and only invokes it when the node is missing rather
than unconditionally setting the metadata on all nodes.
2025-11-21 02:46:33 +01:00
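
The change described above can be pictured with a small sketch. This is not the AK::Trie API; `Node` and its `ensure_child` are hypothetical, and only illustrate calling a metadata provider when a child is created rather than overwriting metadata on every visit.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical trie node, for illustration only.
struct Node {
    std::vector<int> metadata;
    std::map<char, std::unique_ptr<Node>> children;

    // Returns the child for `key`, creating it if needed. `provide_metadata`
    // is invoked only when the child is created, so metadata already stored
    // on an existing child is left untouched.
    template<typename Provider>
    Node& ensure_child(char key, Provider provide_metadata)
    {
        auto it = children.find(key);
        if (it == children.end()) {
            auto node = std::make_unique<Node>();
            node->metadata = provide_metadata();
            it = children.emplace(key, std::move(node)).first;
        }
        return *it->second;
    }
};
```

With this shape, visiting the same node a second time cannot reset whatever an earlier visit recorded on it.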
Ali Mohammad Pur
a290034a81
LibRegex: Start alternation opt nodes with an empty vector
...
...instead of checking every time whether there's a vector there.
Fixes #6755.
2025-11-08 11:51:27 +01:00
Ali Mohammad Pur
57ef949b61
LibRegex: Account for nested 'or' compare ops
...
Closes #6647.
2025-11-01 17:49:57 +01:00
aplefull
c4eef822de
LibRegex: Fix backreferences to undefined capture groups
...
Fixes handling of backreferences when the referenced capture group is
undefined or hasn't participated in the match.
CharacterCompareType::NamedReference is added to distinguish numbered
(\1) from named (\k<name>) backreferences. Numbered backreferences use
exact group lookup. Named backreferences search for participating
groups among duplicates.
2025-10-16 16:37:54 +02:00
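
A rough sketch of the two lookup strategies described above, under stated assumptions: `Group`, `resolve_numbered` and `resolve_named` are hypothetical names, groups are indexed from zero here for simplicity, and an empty optional stands for a group that is undefined or did not participate in the match.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical capture group record: a name (possibly shared by duplicates)
// and the captured text, if the group participated in the match.
struct Group {
    std::string name;
    std::optional<std::string> value;
};

// Numbered reference: exact lookup by index, no searching.
std::optional<std::string> resolve_numbered(std::vector<Group> const& groups, std::size_t index)
{
    if (index >= groups.size())
        return std::nullopt;
    return groups[index].value;
}

// Named reference: among groups sharing the name, pick one that participated.
std::optional<std::string> resolve_named(std::vector<Group> const& groups, std::string_view name)
{
    for (auto const& group : groups) {
        if (group.name == name && group.value.has_value())
            return group.value;
    }
    return std::nullopt;
}
```

In ECMAScript, a backreference whose group is undefined matches the empty string, so a caller would treat an empty optional as a successful empty match rather than a failure.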
Callum Law
8ada4b7fdc
LibRegex: Account for opcode size when calculating incoming jump edges
...
Not accounting for opcode size when calculating incoming jump edges
meant that we were merging nodes where we otherwise shouldn't have been,
for example /.*a|.*b/.
2025-07-28 17:06:58 +02:00
Ali Mohammad Pur
5b45223d5f
LibRegex: Account for uppercase characters in insensitive patterns
2025-07-12 11:26:23 +02:00
Ali Mohammad Pur
b0e471228d
LibRegex: Avoid use-after-return of MatchState in 'is_an_eligible_jump'
...
The opcode may have last been accessed by
block_satisfies_atomic_rewrite_precondition, which would set it to a
state that no longer exists.
Set the state to the correct one unconditionally to ensure we're looking
at the right value.
Fixes #5145.
2025-06-24 18:43:01 +02:00
Ali Mohammad Pur
2947ae7d6e
LibRegex: Move required bytecode.flatten() outside optimization function
...
Not running the optimization passes should not leave the bytecode in a
broken state. Fixes #5146.
2025-06-24 18:43:01 +02:00
Ali Mohammad Pur
cfc241f61d
LibRegex: Make the trie rewrite optimisation maintain the alt order
...
This is required by the spec.
2025-05-21 14:28:45 +02:00
Ali Mohammad Pur
2eccd68ba5
LibRegex: Document the append_alternative optimisation a bit
2025-05-21 14:28:45 +02:00
Timothy Flynn
7280ed6312
Meta: Enforce newlines around namespaces
...
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
Ali Mohammad Pur
022cd1adca
LibRegex: Use the right offset when patching jumps through fork-trees
...
Fixes #4474.
2025-04-27 12:16:15 +02:00
Ali Mohammad Pur
fca1d33fec
LibRegex: Correctly calculate the target for Repeat in table alts
...
Fixes a bunch of websites breaking because we now verify jump offsets by
trying to remove 0-offset jumps.
This has been broken for a good while; it was just rare to see Repeat
inside alternatives that lent themselves well to tree alts.
2025-04-24 01:17:27 -06:00
Ali Mohammad Pur
4b9abdb963
LibRegex: Remove useless jumps (Jump* +0) before running opts
...
This leads to a further significant performance increase (~2x) on the
simple /<script|<style|<link/ regex in speedometer.
2025-04-23 22:57:49 +02:00
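
A simplified sketch of removing no-op jumps, not the LibRegex implementation. It assumes a hypothetical `Insn` model where every instruction has size one and a jump's target is `index + 1 + offset`; the real bytecode has variable-sized instructions. The point it illustrates is that every surviving jump must be re-patched after a removal, and that a removal can turn another jump into a new no-op.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instruction: for jumps, target index = own index + 1 + offset.
struct Insn {
    bool is_jump;
    std::ptrdiff_t offset;
};

// Remove every jump with offset 0, re-patching the remaining jumps, and
// repeat until no such jump is left.
std::vector<Insn> remove_noop_jumps(std::vector<Insn> code)
{
    for (;;) {
        // new_index[k] = number of kept instructions before old index k.
        std::vector<std::size_t> new_index(code.size() + 1, 0);
        bool removed_any = false;
        for (std::size_t i = 0; i < code.size(); ++i) {
            bool drop = code[i].is_jump && code[i].offset == 0;
            removed_any = removed_any || drop;
            new_index[i + 1] = new_index[i] + (drop ? 0 : 1);
        }
        if (!removed_any)
            return code;

        std::vector<Insn> out;
        for (std::size_t i = 0; i < code.size(); ++i) {
            Insn insn = code[i];
            if (insn.is_jump && insn.offset == 0)
                continue;
            if (insn.is_jump) {
                auto target = static_cast<std::ptrdiff_t>(i) + 1 + insn.offset;
                assert(target >= 0 && static_cast<std::size_t>(target) <= code.size());
                auto new_target = static_cast<std::ptrdiff_t>(new_index[static_cast<std::size_t>(target)]);
                insn.offset = new_target - (static_cast<std::ptrdiff_t>(new_index[i]) + 1);
            }
            out.push_back(insn);
        }
        code = std::move(out);
    }
}
```

A jump that targeted a removed instruction ends up pointing at the next surviving one, which is where the removed jump would have fallen through to anyway.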
Ali Mohammad Pur
ec0836c9ea
LibRegex: Don't blindly treat multi-target tree jumps as a single jump
...
The tree generation was broken; we just didn't notice because it was
very rarely being picked for more complex bytecodes.
2025-04-23 22:57:49 +02:00
Ali Mohammad Pur
09eb28ee1d
LibRegex: Better estimate the cost of laying out alts as a chain
...
Previously we were counting the total number of *nodes* in the tree for
the chain cost, which greatly underestimated its cost when large
bytecode entries were present.
This commit switches to estimating it using the total bytecode *size*,
which is a closer value to the true cost than the tree node count.
This corresponds to a ~4x perf improvement on /<script|<style|<link/ in
speedometer.
2025-04-23 22:57:49 +02:00
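
To make the difference between the two estimates concrete, here is a small sketch. `AltNode`, `node_count` and `total_bytecode_size` are hypothetical; the real cost model may weigh things differently. It assumes C++17, where `std::vector` may hold an incomplete type as a member.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alternation tree node carrying the size of its own bytecode.
struct AltNode {
    std::size_t bytecode_size;
    std::vector<AltNode> children;
};

// Old-style estimate: every node costs one unit, however large it is.
std::size_t node_count(AltNode const& node)
{
    std::size_t count = 1;
    for (auto const& child : node.children)
        count += node_count(child);
    return count;
}

// Size-based estimate: sum the bytecode size over the whole tree.
std::size_t total_bytecode_size(AltNode const& node)
{
    std::size_t size = node.bytecode_size;
    for (auto const& child : node.children)
        size += total_bytecode_size(child);
    return size;
}
```

For a root with three children of sizes 7, 6 and 5, the node count is 4 while the summed size is 18, which shows how far the first estimate can undershoot when individual entries are large.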
Ali Mohammad Pur
446a453719
LibRegex: Pull out the first compare to avoid unnecessary execution
...
This adds a fast-path to drop view indices we know will not match
immediately without going through the regex VM.
2025-04-18 17:09:27 +02:00
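
A minimal sketch of the fast-path idea, assuming the pattern must start with one known character. `candidate_starts` is a hypothetical helper, not a LibRegex function; it only shows filtering start positions before any VM work happens.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Returns the start indices worth handing to the (hypothetical) regex VM:
// positions whose character equals the pattern's first compared character.
// All other positions are dropped without running the VM at all.
std::vector<std::size_t> candidate_starts(std::string_view input, char first)
{
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == first)
            starts.push_back(i);
    }
    return starts;
}
```

For a pattern beginning with `<`, only the positions holding `<` remain as candidates.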
Ali Mohammad Pur
76f5dce3db
LibRegex: Flatten capture group list in MatchState
...
This makes copying the capture group COWVector significantly cheaper,
as we no longer have to run any constructors for it - just memcpy.
2025-04-18 17:09:27 +02:00
Ali Mohammad Pur
69050da929
LibRegex: Merge inverse string table mappings separately
2025-04-06 20:21:16 +02:00
Ali Mohammad Pur
4136d8d13e
LibRegex: Use an interned string table for capture group names
...
This avoids messing around with unsafe string pointers and removes the
only non-FlyString-able user of DeprecatedFlyString.
2025-04-02 11:43:13 +02:00
Ali Mohammad Pur
5355710481
LibRegex: Don't treat single-jump blocks as noop in the optimizer
2025-03-09 14:37:57 +01:00
Ali Mohammad Pur
ea3b7efd91
LibRegex: Treat the UnicodeSets flag as Unicode
...
Fixes /.../v not being interpreted as a unicode pattern.
2025-02-28 14:31:45 -05:00
mikiubo
8a6f7b787e
LibRegex: Use depth-first search in regex optimizer
...
Use depth-first search in the optimizer code, because using breadth-first
search caused a bug. Add a test example to the test lib.
2025-02-25 00:09:20 +01:00
Ali Mohammad Pur
08ebfaff17
LibRegex: Take trailing inversion state into account in block comparison
...
Fixes #3421.
2025-02-01 11:30:02 +01:00
Ali Mohammad Pur
50733c564c
LibRegex: Use the *actually* correct repeat start offset for Repeat
...
Fixes #2931 and various frequent crashes.
2024-12-23 13:13:52 +01:00
Ali Mohammad Pur
358378c1c0
LibRegex: Pick the right target for OpCode_Repeat
...
Repeat's 'offset' field is a bit odd in that it is treated as a negative
offset, causing a backwards jump when positive; the optimizer didn't
correctly model this behaviour, which caused crashes and misopts when
dealing with Repeats.
This commit fixes that behaviour.
2024-12-13 10:00:16 +01:00
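
The sign convention described above can be sketched as follows. Both functions are hypothetical, and `base` stands for whatever position the engine measures offsets from, which is an assumption here. The only point is that an ordinary jump adds its offset while Repeat subtracts it.

```cpp
#include <cassert>
#include <cstddef>

// Ordinary jump: a positive offset moves forwards from `base`.
std::size_t jump_target(std::size_t base, std::ptrdiff_t offset)
{
    auto target = static_cast<std::ptrdiff_t>(base) + offset;
    assert(target >= 0);
    return static_cast<std::size_t>(target);
}

// Repeat: the stored offset is a distance to jump *backwards*, so a
// positive value moves towards the start of the bytecode.
std::size_t repeat_target(std::size_t base, std::size_t offset)
{
    assert(offset <= base);
    return base - offset;
}
```

An optimizer that modelled Repeat with `jump_target` would compute a target on the wrong side of the instruction, which is the kind of mismatch this commit describes.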
Ali Mohammad Pur
4a8d3e35a3
LibRegex: Add some more debugging info to bytecode block ranges
...
These were getting difficult to differentiate; now they each get a
comment on where they came from to aid with future debugging.
2024-12-13 10:00:16 +01:00
Ali Mohammad Pur
5a4d657a4e
LibRegex: Avoid generating ForkJumps when jumping to the next alt block
...
Fixes #2398.
2024-11-17 20:12:39 +01:00
Ali Mohammad Pur
00bc22c332
LibRegex: Don't immediately ignore TempInverse in optimizer
...
fe46b2c141 added the reset-temp-inverse flag, but set it up so all
tempinverse ops were negated at the start of the next op; this commit
makes it so these flags actually persist for one op and not zero.
Fixes #2296.
2024-11-17 09:03:29 -05:00
Ali Mohammad Pur
dabd60180f
LibRegex: Don't ignore references that weren't bound in checked blocks
...
Fixes #2281.
2024-11-12 10:37:57 +01:00
Timothy Flynn
93712b24bf
Everywhere: Hoist the Libraries folder to the top-level
2024-11-10 12:50:45 +01:00