ladybird/Libraries/LibWeb/Rust
Andreas Kling 171e3adf01 LibWeb: Replace the HTML tokenizer with Rust
Replace the C++ HTML tokenizer with a Rust implementation behind the
existing HTMLTokenizer API.

Keep the parser-facing integration points for streaming input,
insertion points, document.write(), EOF insertion, parser aborts,
speculative parser input, and last start tag tracking. The generated
FFI handle stays an implementation detail of HTMLTokenizer, so callers
keep a single tokenizer class.

Preserve duplicate attributes through FFI so C++ token normalization can
record the duplicate-attribute signal used by CSP nonce checks. Keep
bulk tag-name and attribute scans capped at the active insertion point
so streamed parser input is spliced at the right offset.
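
The two rules in the paragraph above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Attribute`, `normalize_attributes`, and `scan_tag_name` are hypothetical names, and the normalization step is written in Rust here only for consistency, although the commit message says it happens on the C++ side.

```rust
/// Hypothetical attribute record; the real FFI layout is not shown in the commit.
#[derive(Debug, Clone, PartialEq)]
pub struct Attribute {
    pub name: String,
    pub value: String,
}

/// Keeps the first occurrence of each attribute name (later duplicates are
/// dropped, as in the HTML tokenizer rules) and reports whether any duplicate
/// was seen, so a caller can record that signal.
pub fn normalize_attributes(raw: Vec<Attribute>) -> (Vec<Attribute>, bool) {
    let mut kept: Vec<Attribute> = Vec::with_capacity(raw.len());
    let mut had_duplicate = false;
    for attr in raw {
        if kept.iter().any(|k| k.name == attr.name) {
            had_duplicate = true;
        } else {
            kept.push(attr);
        }
    }
    (kept, had_duplicate)
}

/// Bulk-scans tag-name bytes starting at `start` and returns the index where
/// the scan stopped. The scan never reads at or past `insertion_point`, so
/// input spliced in at that offset is seen before anything that follows it.
pub fn scan_tag_name(input: &[u8], start: usize, insertion_point: Option<usize>) -> usize {
    // Assumption: the insertion point is a byte offset into `input`.
    let limit = insertion_point.map_or(input.len(), |p| p.min(input.len()));
    let mut end = start;
    while end < limit
        && !matches!(input[end], b'\t' | b'\n' | b'\x0C' | b' ' | b'/' | b'>' | 0)
    {
        end += 1;
    }
    end
}
```

With no insertion point, scanning `script src>` from offset 0 stops at the space after `script`; with an insertion point at offset 3 the scan stops there instead, and the slower per-character path would take over.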

Use generated DAFSA tables for named character references and intern
common tag and attribute names to reduce FFI marshalling overhead. This
also fixes attribute name source positions, nested old insertion points,
and aborted fast-path handling.
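
As a rough illustration of the interning idea only, the sketch below maps a common tag name to a small integer so that just the id would need to cross the FFI boundary, and falls back to the raw string otherwise. Everything here is an assumption: `COMMON_TAG_NAMES`, `NameRef`, `intern_tag_name`, and `resolve_tag_name` are hypothetical, the name list is invented for the example, and a sorted-slice binary search stands in for whatever lookup the commit actually uses. It does not model the DAFSA tables for named character references.

```rust
/// Hypothetical, hand-written table of common names. It must stay sorted
/// because the lookup below uses binary search.
const COMMON_TAG_NAMES: &[&str] = &["a", "body", "div", "head", "html", "p", "script", "span"];

/// Either a compact id for a known name or the original string for a rare one.
#[derive(Debug, PartialEq)]
pub enum NameRef<'a> {
    Interned(u16),
    Raw(&'a str),
}

/// Returns an id when `name` is in the table, so the receiving side can look
/// up a pre-built string instead of copying bytes for every token.
pub fn intern_tag_name(name: &str) -> NameRef<'_> {
    match COMMON_TAG_NAMES.binary_search(&name) {
        Ok(index) => NameRef::Interned(index as u16),
        Err(_) => NameRef::Raw(name),
    }
}

/// Reverses `intern_tag_name`; the receiving side would do the equivalent
/// against its own copy of the table.
pub fn resolve_tag_name<'a>(name: &NameRef<'a>) -> &'a str {
    match name {
        NameRef::Interned(id) => COMMON_TAG_NAMES[*id as usize],
        NameRef::Raw(s) => *s,
    }
}
```

The trade-off is that both sides must agree on the table contents and order, which is one reason such tables are typically generated rather than maintained by hand.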

TestHTMLTokenizer covers duplicate attributes and insertion points in
fast tag-name, attribute-name, and quoted-value scans. A CSP text test
covers duplicate nonce attributes on parser-created script elements.
The tokenizer dump fixtures still match, TestHTMLTokenizer passes, and
the full release test-web run passes with 6981 tests and 226 skipped.
2026-05-15 21:01:40 +02:00
src LibWeb: Replace the HTML tokenizer with Rust 2026-05-15 21:01:40 +02:00
build.rs
Cargo.toml LibWeb: Replace the HTML tokenizer with Rust 2026-05-15 21:01:40 +02:00
cbindgen.toml