ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2026-04-19 02:10:26 +00:00

Author	SHA1	Message	Date
Shannon Booth	0b946f39b2	Meta: Remove ENABLE_RUST build configuration option This is now required to be set for the browser to function.	2026-04-02 22:59:42 +02:00
Andreas Kling	f627b7dcbb	LibRegex: Respect V8 astral literal lastIndex behavior Preserve V8's behavior for bare single-astral literals when a unicode global search starts in the middle of a surrogate pair. We were snapping that lastIndex back to the pair start unconditionally, which let /😀/gu and /\u{1F600}/gu match where V8 returns null. Expose that literal shape from LibRegex to LibJS and add runtime coverage for the bare literal case alongside a grouped control.	2026-03-27 17:32:19 +01:00
Andreas Kling	dc2e9bbe91	LibRegex: Avoid widening ASCII regex input Teach the Rust matcher to execute directly on ASCII-backed input. Make the VM and literal fast paths generic over an input trait so we can monomorphize separate ASCII and WTF-16 execution paths without duplicating the regex semantics. Add ASCII-specific FFI entry points and have the C++ bridge dispatch to them whenever Utf16View carries ASCII storage. This removes the per-match widening step from the hot path for exec(), test(), and find_all(), which is exactly where LibJS often hands us pure ASCII strings in 8-bit form. Keep the compiled representation and reported capture offsets in UTF-16 code units so the observable JavaScript behavior stays unchanged.	2026-03-27 17:32:19 +01:00
Andreas Kling	66fb0a8394	LibRegex/Rust: Add the ECMA-262 regex engine Add LibRegex's new Rust ECMAScript regular expression engine. Replace the old parser's direct pattern-to-bytecode pipeline with a split architecture: parse patterns into a lossless AST first, then lower that AST into bytecode for a dedicated backtracking VM. Keep the syntax tree as the place for validation, analysis, and optimization instead of teaching every transformation to rewrite partially built bytecode. Specialize this backend for the job LibJS actually needs. The old C++ engine shared one generic parser and matcher stack across ECMA-262 and POSIX modes and supported both byte-string and UTF-16 inputs. The new engine focuses on ECMA-262 semantics on WTF-16 data, which lets it model lone surrogates and other JavaScript-specific behavior directly instead of carrying POSIX and multi-encoding constraints through the whole implementation. Fill in the ECMAScript features needed to replace the old engine for real web workloads: Unicode properties and sets, lookahead and lookbehind, named groups and backreferences, modifier groups, string properties, large quantifiers, lone surrogates, and the parser and VM corner cases those features exercise. Reshape the runtime around compile-time pattern hints and a hotter VM loop. Pre-resolve Unicode properties, derive first-character, character-class, and simple-scan filters, extract safe trailing literals for anchored patterns, add literal and literal-alternation fast paths, and keep reusable scratch storage for registers, backtracking state, and modifier stacks. Teach `find_all` to stay inside one VM so global searches stop paying setup costs on every match. Make those shortcuts semantics-aware instead of merely fast. In Unicode mode, do not use literal fast paths for lone surrogates, since ECMA-262 must not let `/\ud83d/u` match inside a surrogate pair. Likewise, only derive end-anchor suffix hints when the suffix lies on every path to `Match`, so lookarounds and disjunctions cannot skip into a shared tail and produce false negatives. This commit lands the Rust crate, the C++ wrapper, the build integration, and the initial LibJS-side plumbing needed to exercise the new engine under real RegExp callers before removing the legacy backend.	2026-03-27 17:32:19 +01:00

4 commits
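
Commit dc2e9bbe91 describes making the matcher generic over an input trait so that ASCII and WTF-16 paths are monomorphized separately while offsets stay in UTF-16 code units. The following is a minimal sketch of that idea only, not the code from the commit: the names `Input`, `AsciiInput`, `Wtf16Input`, and `find_literal` are hypothetical, and the naive literal scan stands in for the real VM and fast paths.

```rust
/// Hypothetical input abstraction: the matcher only sees UTF-16 code units,
/// regardless of how the underlying string is stored.
trait Input {
    /// Length in code units.
    fn len(&self) -> usize;
    /// Code unit at `index`, widened to `u16` on access.
    fn code_unit(&self, index: usize) -> u16;
}

/// 8-bit storage holding only ASCII bytes (assumed, not checked here).
struct AsciiInput<'a>(&'a [u8]);

/// 16-bit storage that may contain lone surrogates.
struct Wtf16Input<'a>(&'a [u16]);

impl Input for AsciiInput<'_> {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn code_unit(&self, index: usize) -> u16 {
        // One ASCII byte is exactly one UTF-16 code unit, so offsets agree.
        u16::from(self.0[index])
    }
}

impl Input for Wtf16Input<'_> {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn code_unit(&self, index: usize) -> u16 {
        self.0[index]
    }
}

/// Naive literal search, generic over the input type. The compiler emits one
/// specialized copy per `Input` implementation, so the ASCII path never
/// builds a widened copy of the whole string.
/// Returns the match offset in UTF-16 code units.
fn find_literal<I: Input>(input: &I, needle: &[u16], start: usize) -> Option<usize> {
    if needle.is_empty() {
        return (start <= input.len()).then_some(start);
    }
    if needle.len() > input.len() {
        return None;
    }
    let last_start = input.len() - needle.len();
    (start..=last_start).find(|&pos| {
        needle
            .iter()
            .enumerate()
            .all(|(i, &unit)| input.code_unit(pos + i) == unit)
    })
}

fn main() {
    let needle: Vec<u16> = "bar".encode_utf16().collect();

    // Same search over 8-bit and 16-bit storage.
    let ascii = AsciiInput(b"foobar");
    let wide: Vec<u16> = "foobar".encode_utf16().collect();
    let wtf16 = Wtf16Input(&wide);

    println!("{:?}", find_literal(&ascii, &needle, 0));
    println!("{:?}", find_literal(&wtf16, &needle, 0));
}
```

Because both implementations report positions in code units, a caller in this sketch gets the same offset whichever storage the string happens to use, which is the property the commit message says it preserves for JavaScript-visible capture offsets.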