ladybird/Libraries/LibWeb/HTML/Parser
Andreas Kling ca97f68cb7 LibWeb: Normalize decoded HTML string parsing
Preserve leading BOMs when parsing already-decoded HTML strings, since
those strings do not go through the encoded byte decoder path.

Decoded markup from JS strings can also contain WTF-8 for lone surrogate
code units. Keep the common scalar UTF-8 path to a single validation and
copy, but replace surrogates before handing bytes to the Rust tokenizer.
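
The surrogate handling described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the names `wtf8_to_utf8_lossy` and `is_encoded_surrogate` are hypothetical, and it assumes lone surrogates arrive in the usual WTF-8 form (the three-byte sequence ED A0..BF 80..BF) and are replaced with U+FFFD. It also replaces any other malformed byte with U+FFFD, which is an assumption of the sketch rather than something the commit message states.

```rust
/// Hypothetical helper: turn WTF-8 bytes into a valid UTF-8 String,
/// replacing each encoded lone surrogate with U+FFFD.
fn wtf8_to_utf8_lossy(bytes: &[u8]) -> String {
    // Common path: scalar-only input needs one validation pass and one copy.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return s.to_owned();
    }

    // Slow path: copy valid runs and substitute U+FFFD at each bad sequence.
    let mut out = String::with_capacity(bytes.len());
    let mut rest = bytes;
    loop {
        match std::str::from_utf8(rest) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                let (valid, invalid) = rest.split_at(err.valid_up_to());
                // The prefix was just validated, so this cannot fail.
                out.push_str(std::str::from_utf8(valid).unwrap());
                out.push('\u{FFFD}');
                // An encoded surrogate is three bytes long; any other
                // malformed input is skipped one byte at a time (assumption).
                let skip = if is_encoded_surrogate(invalid) { 3 } else { 1 };
                rest = &invalid[skip..];
            }
        }
    }
}

/// True if `bytes` starts with a WTF-8 encoded surrogate (U+D800..=U+DFFF).
fn is_encoded_surrogate(bytes: &[u8]) -> bool {
    matches!(bytes, [0xED, 0xA0..=0xBF, 0x80..=0xBF, ..])
}
```

A valid surrogate pair is stored in WTF-8 as an ordinary four-byte code point, so it passes the fast path unchanged. Only unpaired high or low surrogates reach the replacement branch.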

Add test coverage for DOMParser and innerHTML string parsing, including
leading BOMs, text and attributes, lone high and low surrogates, and a
valid surrogate pair.
2026-05-24 10:14:17 +02:00
Rust LibWeb: Add Rust preload scanner 2026-05-18 00:23:52 +02:00
Entities.cpp LibWeb/HTML: Improve data structure of named character reference data 2025-07-14 09:43:08 +02:00
Entities.h LibWeb/HTML: Improve data structure of named character reference data 2025-07-14 09:43:08 +02:00
Entities.json LibWeb: Make named character references more spec-compliant & efficient 2025-03-22 16:03:44 +01:00
HTMLEncodingDetection.cpp LibWeb: Implement chardetng-based encoding detection for HTML parsing 2026-05-23 11:57:33 +02:00
HTMLEncodingDetection.h LibWeb: Implement chardetng-based encoding detection for HTML parsing 2026-05-23 11:57:33 +02:00
HTMLParser.cpp LibWeb: Normalize decoded HTML string parsing 2026-05-24 10:14:17 +02:00
HTMLParser.h LibWeb: Normalize decoded HTML string parsing 2026-05-24 10:14:17 +02:00
HTMLToken.cpp LibWeb: Track if element was created from token with dupe attributes 2025-07-09 15:52:54 -06:00
HTMLToken.h LibWeb: Remove the C++ HTML tree builder 2026-05-17 15:35:56 +02:00
HTMLTokenizer.cpp LibWeb: Normalize decoded HTML string parsing 2026-05-24 10:14:17 +02:00
HTMLTokenizer.h LibWeb: Normalize decoded HTML string parsing 2026-05-24 10:14:17 +02:00
IncrementalDocumentParser.cpp LibWeb: Add incremental HTML parsing 2026-04-29 04:12:44 +02:00
IncrementalDocumentParser.h LibWeb: Add incremental HTML parsing 2026-04-29 04:12:44 +02:00
ParserScriptingMode.h LibWeb: Set fragment scripting mode from the context document 2026-04-14 23:01:36 +02:00
SpeculativeHTMLParser.cpp LibWeb: Use Rust preload scanner 2026-05-18 00:23:52 +02:00
SpeculativeHTMLParser.h LibWeb: Use Rust preload scanner 2026-05-18 00:23:52 +02:00