Mirror of https://github.com/LadybirdBrowser/ladybird.git (synced 2026-06-28 04:00:33 +00:00)
Preserve leading BOMs when parsing already-decoded HTML strings, since those strings do not go through the encoded byte decoder path. Decoded markup from JS strings can also contain WTF-8 for lone surrogate code units. Keep the common scalar UTF-8 path to a single validation and copy, but replace surrogates before handing bytes to the Rust tokenizer. Add text coverage for DOMParser and innerHTML string parsing, including leading BOMs, text and attributes, lone high and low surrogates, and a valid surrogate pair.

| Name | | |
|---|---|---|
| Rust | ||
| Entities.cpp | ||
| Entities.h | ||
| Entities.json | ||
| HTMLEncodingDetection.cpp | ||
| HTMLEncodingDetection.h | ||
| HTMLParser.cpp | ||
| HTMLParser.h | ||
| HTMLToken.cpp | ||
| HTMLToken.h | ||
| HTMLTokenizer.cpp | ||
| HTMLTokenizer.h | ||
| IncrementalDocumentParser.cpp | ||
| IncrementalDocumentParser.h | ||
| ParserScriptingMode.h | ||
| SpeculativeHTMLParser.cpp | ||
| SpeculativeHTMLParser.h | ||
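
The surrogate handling described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from this directory: the function name `sanitize_wtf8` is hypothetical, and it assumes that lone surrogates are replaced with U+FFFD, which the commit message does not state explicitly. It relies on the fact that a WTF-8 encoded surrogate is always the three bytes `ED A0..BF 80..BF`, the same length as the UTF-8 encoding of U+FFFD, so the replacement can be done in place.

```rust
/// UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER (three bytes).
const REPLACEMENT: [u8; 3] = [0xEF, 0xBF, 0xBD];

/// Hypothetical helper: turn possibly-WTF-8 input into valid UTF-8
/// before it is handed to a tokenizer. A leading BOM is left untouched.
fn sanitize_wtf8(input: &[u8]) -> String {
    // Common path: the input is already scalar-only UTF-8,
    // so one validation pass and one copy are enough.
    if let Ok(valid) = std::str::from_utf8(input) {
        return valid.to_owned();
    }

    // Slow path: copy once, then overwrite each encoded surrogate
    // (ED A0..BF 80..BF) with U+FFFD. 0xED is never a continuation
    // byte, so scanning byte by byte cannot match mid-sequence.
    let mut bytes = input.to_vec();
    let mut i = 0;
    while i + 2 < bytes.len() {
        let is_surrogate = bytes[i] == 0xED
            && (0xA0..=0xBF).contains(&bytes[i + 1])
            && (0x80..=0xBF).contains(&bytes[i + 2]);
        if is_surrogate {
            bytes[i..i + 3].copy_from_slice(&REPLACEMENT);
            i += 3;
        } else {
            i += 1;
        }
    }

    // Assumption: any other malformed bytes are also replaced lossily.
    String::from_utf8_lossy(&bytes).into_owned()
}
```

Under these assumptions, a valid surrogate pair never reaches the slow path as two surrogates, because WTF-8 encodes a paired high and low surrogate as one ordinary four-byte sequence. Only lone surrogates are rewritten, and a leading `EF BB BF` passes through as U+FEFF.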