Commit graph

22 commits

Author SHA1 Message Date
aplefull
5632a52531 LibRegex: Properly track code units in u-v modes
Previously, both string_position and view_index used code unit offsets
regardless of mode. Now in unicode mode, these variables track code
point positions while string_position_in_code_units is properly
updated to reflect code unit offsets.
2025-10-24 21:23:06 +02:00
aplefull
cd4ac4f30f LibJS: Escape line terminators in regex source 2025-10-24 13:24:55 -04:00
aplefull
7ce4abe330 LibRegex+LibUnicode: Add unicode string properties 2025-10-24 13:24:55 -04:00
aplefull
4b989b8efd LibRegex: Add support for forward references to named capture groups
This commit implements support for forward references to named capture
groups. We now allow patterns like \k<name>(?<name>x) and
self-references like (?<name>\k<name>x).
2025-10-16 16:37:54 +02:00
aplefull
25a47ceb1b LibRegex+LibJS: Include all named capture groups in source order
Previously, named capture groups in RegExp results did not always follow
their source order, and unmatched groups were omitted. According to the
spec, all named capture groups must appear in the result object in the
order they are defined, even if they did not participate in the match.
This commit makes sure we follow this requirement.
2025-10-16 16:37:54 +02:00
Timothy Flynn
04fe0c6aec LibJS: Create match indices based on code unit length
In Unicode mode, we were mixing code units (start_index) with code point
length (end_index).
2025-07-22 23:11:19 +02:00
Jelle Raaijmakers
5d19aacce7 LibJS: Do not directly append RegExp pattern code points during parse
There apparently is a bit of a disconnect between the spec asking us to
construct the pattern using code points and LibRegex not being able to
swallow those. Whenever we had multi-byte code points in the pattern and
tried to match that in unicode mode, we would fail.

Change the parser to encode all non-ASCII code units. Fixes 2 test262
cases in `language/literals/regexp`.
2025-07-22 01:23:52 +02:00
Timothy Flynn
efa9737cf7 AK+LibJS: Do not set UTF-16 code point length to its code unit length 2025-06-25 22:20:47 +02:00
Jess
83e46b3728 LibRegex: Fix crash when parse result exceeds max cache size
Before, If the cache was empty we would try and evict non-existant
entries and crash. So the fix is to make sure that we don't saturate
the cache with a single parse result.
2025-04-04 16:10:25 +02:00
Ali Mohammad Pur
ea3b7efd91 LibRegex: Treat the UnicodeSets flag as Unicode
Fixes /.../v not being interpreted as a unicode pattern.
2025-02-28 14:31:45 -05:00
Luke Wilde
105096e75a LibJS: Stop executing successful regex if it's past the end of the input
If the regex always matches the input, even if it's past the end, then
we need to stop execution of the regex when it's past the end. This
corresponds to step 13.a and prevents it from infinitely looping.

Reduced from: d98672060f/packages/react-i18n/src/utilities/money.ts (L10-L14)
2025-02-16 09:22:37 +01:00
Timothy Flynn
db87f173fb LibJS: Implement the RegExp.escape proposal
https://tc39.es/proposal-regex-escaping/
2024-12-05 13:56:21 +01:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Linus Groh
5122f98198 Base+LibJS+LibWeb: Make prettier clean
Also use "// prettier-ignore" comments where necessary rather than
excluding whole files (via .prettierignore).
2020-12-27 21:25:27 +01:00
Linus Groh
8a9a7f1677 LibJS: Make RegExp.prototype.source spec-compliant
Basically:
- And edge case for this object being RegExp.prototype.source
- Return "(?:)" for empty pattern
- Escape some things properly
2020-11-28 01:20:11 +01:00
Linus Groh
b6e5442d55 LibJS: Make RegExp.prototype.toString() spec-compliant
It should use the 'source' and 'flags' properties of the object, and
therefore work with non-RegExp objects as well.
2020-11-28 01:20:11 +01:00
Linus Groh
ee66eaa1b0 LibJS: Make RegExp.prototype.flags spec-compliant
This should be using the individual flag boolean properties rather than
the [[OriginalFlags]] internal slot.
Use an enumerator macro here for brevity, this will be useful for other
things as well. :^)
2020-11-28 01:20:11 +01:00
Linus Groh
5cb45e4feb LibJS: Make RegExp() constructor spec-compliant
- Default values should depend on arguments being undefined, not being
  missing
- "(?:)" for empty pattern happens in RegExp.prototype.source, not the
  constructor
2020-11-28 01:20:11 +01:00
AnotherTest
210a3db44d LibJS: Implement `RegExpPrototype::exec()'
This implements *only* the builtin exec() function.
2020-11-27 21:32:41 +01:00
AnotherTest
8ba273a2f3 LibJS: Hook up Regex<ECMA262> to RegExpObject and implement `test()'
This makes RegExpObject compile and store a Regex<ECMA262>, adds
all flag-related properties, and implements `RegExpPrototype.test()`
(complete with 'lastIndex' support) :^)
It should be noted that this only implements `test()' using the builtin
`exec()'.
2020-11-27 21:32:41 +01:00
Linus Groh
e163db248d LibJS: Implement RegExp.prototype.toString() as standalone function
This should not just inherit Object.prototype.toString() (and override
Object::to_string()) but be its own function, i.e.
'RegExp.prototype.toString !== Object.prototype.toString'.
2020-11-04 19:33:49 +01:00