mirror of
https://github.com/LadybirdBrowser/ladybird.git
synced 2026-06-18 15:52:21 +00:00
Replace the C++ HTML tokenizer with a Rust implementation behind the existing HTMLTokenizer API. Keep the parser-facing integration points for streaming input, insertion points, document.write(), EOF insertion, parser aborts, speculative parser input, and last start tag tracking.

The generated FFI handle stays an implementation detail of HTMLTokenizer, so callers keep a single tokenizer class. Preserve duplicate attributes through FFI so C++ token normalization can record the duplicate-attribute signal used by CSP nonce checks. Keep bulk tag-name and attribute scans capped at the active insertion point so streamed parser input is spliced at the right offset. Use generated DAFSA tables for named character references and intern common tag and attribute names to reduce FFI marshalling overhead.

This also fixes attribute name source positions, nested old insertion points, and aborted fast-path handling.

TestHTMLTokenizer covers duplicate attributes and insertion points in fast tag-name, attribute-name, and quoted-value scans. A CSP text test covers duplicate nonce attributes on parser-created script elements. The tokenizer dump fixtures still match, TestHTMLTokenizer passes, and the full release test-web run passes with 6981 tests and 226 skipped.

| | | |
|---|---|---|
| .. | ||
| src | ||
| build.rs | ||
| Cargo.toml | ||
| cbindgen.toml | ||
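
The commit message above says bulk tag-name scans are capped at the active insertion point. The following is a minimal sketch of that idea only, not the code in this repository: the function name `scan_tag_name`, its signature, and the use of a plain byte slice are assumptions made for illustration. It also simplifies the tokenizer rules by stopping only on the delimiters that end a tag name (and on NUL), leaving case folding to the caller.

```rust
/// Hypothetical helper (not taken from the repository): returns the index one
/// past the last tag-name byte, never scanning beyond the insertion point.
fn scan_tag_name(input: &[u8], start: usize, insertion_point: Option<usize>) -> usize {
    // Clamp the scan limit to the insertion point (if any) so that bytes
    // located after a pending splice are not consumed early.
    let limit = insertion_point.map_or(input.len(), |p| p.min(input.len()));
    let mut pos = start;
    while pos < limit {
        match input[pos] {
            // Whitespace, '/', and '>' end a tag name; NUL is left to a slow path.
            b'\t' | b'\n' | b'\x0C' | b' ' | b'/' | b'>' | 0 => break,
            _ => pos += 1,
        }
    }
    pos
}
```

For the input `<section class=a>` scanned from offset 1, the sketch stops at the space (offset 8) when no insertion point is active, and at offset 4 when the insertion point is 4, so input written there by document.write() would be tokenized before the rest of the name.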