ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2026-06-28 04:00:33 +00:00

Author	SHA1	Message	Date
Andreas Kling	f47b6f3270	LibWeb: Reduce CSS parser token memory usage. Store CSS token payloads in a variant so each token only carries the state needed by its type. Keep delimiter, number, hash, string, and dimension data separate instead of storing every possible payload on every token. Use a smaller component-value token for function and block boundary metadata. These component values only need token type, original source text, and source positions, so avoid embedding full token payload storage inside every Function and SimpleBlock. Shrink CSS source positions to explicit 32-bit counters. Guard the C++ and Rust tokenizer paths against overflow. Add size assertions for the hot Token and ComponentValue types so future growth is intentional.	2026-06-13 14:57:52 +02:00
Sam Atkins	df41e7a1cf	LibWeb/CSS: Move Token::Position into SourcePosition. There's nothing about this that's specific to Tokens, and moving it makes it easier to use for other types. We'll need this for the following commits.	2026-06-04 20:54:33 +01:00
R-Goc	520a7c8ebd	AK+LibWeb: Centralize FFI helper functions. This commit creates a central FFIHelpers.h header which implements common conversions from FFI.	2026-05-28 14:15:43 -05:00
Andreas Kling	f4960d9d7d	LibWeb: Honor requested CSS tokenizer encoding. Keep decoded CSS text separate from tokenizer byte input. CSSOM and already-decoded stylesheet text preserve code point preprocessing, so a lone surrogate maps to one replacement character instead of being re-decoded as malformed UTF-8 bytes. Decode tokenizer byte input with the requested encoding unless that encoding is UTF-8 and the byte stream is strictly valid UTF-8. Keep the fast path by constructing the decoded string without validating twice after strict validation succeeds. Preserve UTF-8 decoder behavior on the byte fast path by stripping an initial UTF-8 BOM and rejecting encoded surrogate bytes. Invalid UTF-8 still goes through the decoder. Add tokenizer coverage for both the C++ and Rust backends across decoded text, UTF-8 aliases, BOM-prefixed input, invalid UTF-8, and non-UTF requested encodings.	2026-05-18 14:08:22 +02:00
Andreas Kling	355fb6b825	LibWeb: Stream Rust CSS tokenizer tokens over FFI. Avoid building a temporary Rust token vector before calling back into C++. The tokenizer now invokes the callback as each token is produced, while borrowing the already-filtered input for source slices. Reserve an initial C++ token capacity from the input size so the common path avoids repeated growth while appending the converted tokens. With this change, the Rust CSS tokenizer is now ~1.3x faster than the C++ CSS tokenizer at churning through all the https://vercel.com/ CSS.	2026-05-03 17:22:17 +02:00
Sam Atkins	4278194d96	LibWeb/CSS: Port the CSS Tokenizer to Rust. test-css-tokenizer is updated to run both the C++ and Rust tokenizers and compare their output, to ensure they behave identically. The Parser still uses the C++ Tokenizer. The LibWeb crate, FFI layer etc are all based on the existing ones for other libraries. This is a direct AI translation to get us started, and not idiomatic Rust. Future work can be done to make it more sensible.	2026-05-03 09:49:00 +02:00
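
The layout idea described in f47b6f3270 (per-type payloads, 32-bit source positions with an overflow guard, a slimmer record for function and block boundaries, and size assertions) might look roughly like the sketch below. This is not the Ladybird code: every type, field, and function name here is hypothetical, the payload set is reduced to the five kinds the commit message names, and the size limits are illustrative values chosen for this sketch only.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Source position stored as explicit 32-bit counters (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SourcePosition {
    pub line: u32,
    pub column: u32,
}

impl SourcePosition {
    /// Narrows wide counters to 32 bits; returns None instead of wrapping on overflow.
    pub fn checked_new(line: u64, column: u64) -> Option<Self> {
        Some(Self {
            line: u32::try_from(line).ok()?,
            column: u32::try_from(column).ok()?,
        })
    }
}

/// Each variant carries only the state its token type needs.
#[derive(Clone, Debug, PartialEq)]
pub enum TokenPayload {
    None,
    Delim(char),
    Number(f64),
    Hash { is_id: bool, name: Box<str> },
    String(Box<str>),
    Dimension { value: f64, unit: Box<str> },
}

/// Full token: payload, original source text, and positions.
#[derive(Clone, Debug, PartialEq)]
pub struct Token {
    pub payload: TokenPayload,
    pub source_text: Box<str>,
    pub start: SourcePosition,
    pub end: SourcePosition,
}

/// Slimmer record for function/block boundaries: a kind tag, the original
/// source text, and positions, with no payload storage at all.
#[derive(Clone, Debug, PartialEq)]
pub struct BoundaryToken {
    pub kind: u8,
    pub source_text: Box<str>,
    pub start: SourcePosition,
    pub end: SourcePosition,
}

// Compile-time size guards: growing a type past its limit fails the build,
// so any growth has to be an explicit decision. Limits are illustrative.
const _: () = assert!(size_of::<SourcePosition>() == 8);
const _: () = assert!(size_of::<Token>() <= 64);
const _: () = assert!(size_of::<BoundaryToken>() <= 40);
```

Using a fallible conversion for positions means an oversized input surfaces as an explicit `None` rather than a silently wrapped counter; how the real tokenizers react to that condition is not stated in the commit message.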

6 commits
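
The streaming approach described in 355fb6b825 (invoke a callback per token instead of building a temporary vector, borrow source slices from the input, and reserve consumer capacity from the input size) can be sketched as follows. This is a simplified illustration and not the actual tokenizer: it uses a plain Rust closure rather than a C callback over FFI, the three-kind toy grammar is invented, and `StreamedToken`, `Kind`, `tokenize_streaming`, `collect_tokens`, and the `len / 4` capacity heuristic are all hypothetical.

```rust
/// Toy token kinds for this sketch (not the CSS token set).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Ident,
    Whitespace,
    Delim,
}

/// A token whose text is a slice borrowed from the input, not a copy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StreamedToken<'a> {
    pub kind: Kind,
    pub text: &'a str,
}

/// Calls `on_token` as each token is produced; no intermediate Vec is built.
pub fn tokenize_streaming<'a>(input: &'a str, mut on_token: impl FnMut(StreamedToken<'a>)) {
    let mut rest = input;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_ascii_alphabetic() {
            // Identifier: run of ASCII alphanumerics.
            let len = rest
                .find(|c: char| !c.is_ascii_alphanumeric())
                .unwrap_or(rest.len());
            (Kind::Ident, len)
        } else if first.is_ascii_whitespace() {
            // Whitespace: run of ASCII whitespace.
            let len = rest
                .find(|c: char| !c.is_ascii_whitespace())
                .unwrap_or(rest.len());
            (Kind::Whitespace, len)
        } else {
            // Anything else is a single-character delimiter.
            (Kind::Delim, first.len_utf8())
        };
        let (text, tail) = rest.split_at(len);
        on_token(StreamedToken { kind, text });
        rest = tail;
    }
}

/// Consumer side: reserve capacity from the input size up front, then
/// append tokens as the callback fires.
pub fn collect_tokens(input: &str) -> Vec<StreamedToken<'_>> {
    let mut tokens = Vec::with_capacity(input.len() / 4 + 1);
    tokenize_streaming(input, |token| tokens.push(token));
    tokens
}
```

For example, `collect_tokens("a1 {b")` yields four tokens: an identifier `a1`, a whitespace run, a `{` delimiter, and an identifier `b`. The producer never allocates; only the consumer's vector does, and the up-front reservation reduces how often it has to grow.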