ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2026-04-18 18:00:31 +00:00

Author	SHA1	Message	Date
Andreas Kling	50b137f527	LibJS: Reject mixed surrogate forms in RegExp names Reject surrogate pairs in named group names unless both halves come from the same raw form. A literal surrogate half was being normalized into \uXXXX before LibRegex parsed the pattern, which let mixed literal and escaped forms sneak through. Validate surrogate handling on the UTF-16 pattern before normalization, but only treat \k<...> as a named backreference when the parser would do that too. Legacy regexes without named groups still use \k as an identity escape, so their literal text must not be rejected by the pre-scan. Add runtime and syntax tests for the mixed forms, the valid literal, fixed-width, and braced escape cases, and the legacy \k literals.	2026-03-31 15:59:04 +02:00
Andreas Kling	e243e146de	LibJS+LibRegex: Switch RegExp over to the Rust engine Switch LibJS `RegExp` over to the Rust-backed `ECMAScriptRegex` APIs. Route `new RegExp()`, regex literals, and the RegExp builtins through the new compile and exec APIs, and stop re-validating patterns with the deleted C++ parser on the way in. Preserve the observable error behavior by carrying structured compile errors and backtracking-limit failures across the FFI boundary. Cache compiled regex state and named capture metadata on `RegExpObject` in the new representation. Use the new API surface to simplify and speed up the builtin paths too: share `exec_internal`, cache compiled regex pointers, keep the legacy RegExp statics lazy, run global replace through batch `find_all`, and optimize replace, test, split, and String helper paths. Add regression tests for those JavaScript-visible paths.	2026-03-27 17:32:19 +01:00
aplefull	d95baf67f6	LibJS: Prevent escaped surrogates from combining in Unicode regexes Escaped surrogate sequences should not combine with adjacent literal surrogates in Unicode mode. We now use `\u{XXXX}` braces instead of `\uXXXX` when escaping code units in Unicode mode, so LibRegex treats each as a standalone code point. Also prevent GenericLexer from combining `\uXXXX` and `\u{XXXX}`.	2026-02-26 13:50:11 +01:00
aplefull	cd4ac4f30f	LibJS: Escape line terminators in regex source	2025-10-24 13:24:55 -04:00
Aliaksandr Kalenik	a54215c07d	LibJS: Make `internal_define_own_property()` save added property offset ...in `PropertyDescriptor`. This is required for the upcoming change that needs to know offset of newly added properties to set up inline caching.	2025-09-17 12:44:44 +02:00
Timothy Flynn	70db474cf0	LibJS+LibWeb: Port interned bytecode strings to UTF-16 This was almost a no-op, except we intern JS exception messages. So the bulk of this patch is porting exception messages to UTF-16.	2025-08-14 10:27:08 +02:00
Timothy Flynn	62d85dd90a	LibJS: Port RegExp flags and patterns to UTF-16	2025-08-13 09:56:13 -04:00
Jelle Raaijmakers	5d19aacce7	LibJS: Do not directly append RegExp pattern code points during parse There apparently is a bit of a disconnect between the spec asking us to construct the pattern using code points and LibRegex not being able to swallow those. Whenever we had multi-byte code points in the pattern and tried to match that in unicode mode, we would fail. Change the parser to encode all non-ASCII code units. Fixes 2 test262 cases in `language/literals/regexp`.	2025-07-22 01:23:52 +02:00
Timothy Flynn	9582895759	AK+LibJS+LibWeb+LibRegex: Replace AK::Utf16Data with AK::Utf16String	2025-07-18 12:45:38 -04:00
Ali Mohammad Pur	eea81738cd	AK+Everywhere: Recognise that surrogates in utf16 aren't all that common For the slight cost of counting code points when converting between encodings and a teeny bit of memory, this commit adds a fast path for all-happy utf-16 substrings and code point operations. This seems to be a significant chunk of time spent in many regex benchmarks.	2025-04-23 07:56:02 -06:00
Andreas Kling	46a5710238	LibJS: Use FlyString in PropertyKey instead of DeprecatedFlyString This required dealing with substantial fallout.	2025-03-24 22:27:17 +00:00
Shannon Booth	f87041bf3a	LibGC+Everywhere: Factor out a LibGC from LibJS Resulting in a massive rename across almost everywhere! Alongside the namespace change, we now have the following names: * JS::NonnullGCPtr -> GC::Ref * JS::GCPtr -> GC::Ptr * JS::HeapFunction -> GC::Function * JS::CellImpl -> GC::Cell * JS::Handle -> GC::Root	2024-11-15 14:49:20 +01:00
Shannon Booth	9b79a686eb	LibJS+LibWeb: Use realm.create<T> instead of heap.allocate<T> The main motivation behind this is to remove JS specifics of the Realm from the implementation of the Heap. As a side effect of this change, this is a bit nicer to read than the previous approach, and in my opinion, also makes it a little more clear that this method is specific to a JavaScript Realm.	2024-11-13 16:51:44 -05:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00
Andreas Kling	f48751a739	LibJS: Remove hand-rolled Object is_foo() helpers in favor of RTTI	2021-01-01 17:46:39 +01:00
AnotherTest	8ba273a2f3	LibJS: Hook up Regex<ECMA262> to RegExpObject and implement `test()' This makes RegExpObject compile and store a Regex<ECMA262>, adds all flag-related properties, and implements `RegExpPrototype.test()` (complete with 'lastIndex' support) :^) It should be noted that this only implements `test()' using the builtin `exec()'.	2020-11-27 21:32:41 +01:00
AnotherTest	3200ff5f4f	LibJS+js: Rename RegExp.{content => pattern} The spec talks about it as 'pattern', so let's use that instead.	2020-11-27 21:32:41 +01:00
Linus Groh	e163db248d	LibJS: Implement RegExp.prototype.toString() as standalone function This should not just inherit Object.prototype.toString() (and override Object::to_string()) but be its own function, i.e. 'RegExp.prototype.toString !== Object.prototype.toString'.	2020-11-04 19:33:49 +01:00
Linus Groh	2e2571743b	LibJS: Use String::formatted() in to_string() functions	2020-10-04 19:22:02 +02:00
Andreas Kling	2bc5bc64fb	LibJS: Remove a whole bunch of includes of <LibJS/Interpreter.h>	2020-09-27 20:26:58 +02:00
Andreas Kling	591b7b7031	LibJS: Remove js_string(Interpreter&, ...)	2020-09-27 20:26:58 +02:00
Ben Wiederhake	08f9bc26a6	Meta+LibHTTP through LibWeb: Make clang-format-10 clean	2020-09-25 21:18:17 +02:00
Andreas Kling	ba641e97d9	LibJS: Clarify Object (base class) construction somewhat Divide the Object constructor into three variants: - The regular one (takes an Object& prototype) - One for use by GlobalObject - One for use by objects without a prototype (e.g. ObjectPrototype)	2020-06-23 17:21:53 +02:00
Andreas Kling	64513f3c23	LibJS: Move native objects towards two-pass construction To make sure that everything is set up correctly in objects before we start adding properties to them, we split cell allocation into 3 steps: 1. Allocate a cell of appropriate size from the Heap 2. Call the C++ constructor on the cell 3. Call initialize() on the constructed object The job of initialize() is to define all the initial properties. Doing it in a second pass guarantees that the Object has a valid Shape and can find its own GlobalObject.	2020-06-20 15:46:30 +02:00
Matthew Olsson	61ac1d3ffa	LibJS: Lex and parse regex literals, add RegExp objects This adds regex parsing/lexing, as well as a relatively empty RegExpObject. The purpose of this patch is to allow the engine to not get hung up on parsing regexes. This will aid in finding new syntax errors (say, from google or twitter) without having to replace all of their regexes first!	2020-06-07 19:06:55 +02:00

26 commits