ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2026-04-19 10:20:22 +00:00

Author	SHA1	Message	Date
sideshowbarker	1b41659efd	LibXML+LibWeb: Use existing HTML entities table for XML parsing too For XHTML documents, resolve named character entities (e.g., &nbsp;) using the HTML entity table via a getEntity SAX callback. This avoids parsing a large embedded DTD on every document and matches the approach used by Blink and WebKit. This also removes the now-unused DTD infrastructure: - Remove resolve_external_resource callback from Parser::Options - Remove resolve_xml_resource() function and its ~60KB embedded DTD - Remove all call sites passing the unused callback	2026-01-09 19:13:41 +00:00
sideshowbarker	cfe5ef32e1	LibXML: Add element-nesting depth limit for XML-parsed documents This change adds a limit of 5000 on the count for how deeply elements can be nested in documents parsed with our XML parser. Blink and WebKit both have such a limit, and both set it at 5000. This prevents bad actors from performing attacks by giving us XML docs with pathological levels of nesting, and causing stack exhaustion.	2026-01-08 14:49:12 +01:00
Tim Ledbetter	a48aa62b7a	LibXML: Prevent auto-detection of UTF-32 encoding by `libxml2`	2026-01-08 10:06:40 +01:00
sideshowbarker	fac81e84ba	LibXML: Replace the existing XML parser with libxml2 parsing This change replaces our LibXML parser with a new implementation that wraps libxml2's SAX2 API. The new Parser class uses libxml2's SAX2 callbacks to drive the existing XML::Listener interface. That preserves backward compatibility with all existing consumers (XMLDocumentBuilder, DOMParser, etc.).	2026-01-07 14:38:52 +01:00
rmg-x	b9554038ff	LibWeb+LibXML: Make `Listener::set_source(ByteString)` fallible `set_source` takes a ByteString but the implementation might require a specific encoding. Make it fallible so that we don't need to crash in the case of invalid UTF-8 or similar. The test includes a sequence of invalid UTF-8 bytes that crash the browser without this change.	2025-10-02 02:25:28 +02:00
Andreas Kling	b7595013c1	LibWeb+LibXML: Preserve element attribute order in XML documents We now use OrderedHashMap instead of HashMap to ensure that attributes on XML elements retain their original order.	2025-08-22 11:35:59 +02:00
Timothy Flynn	28d9d3a2c7	AK+Libraries: Reduce API surface of GenericLexer a bit * Remove completely unused methods. * Deduplicate methods that were overloaded with both StringView and char const* parameters. A future commit will templatize GenericLexer by char type. This patch serves to make that a tiny bit easier.	2025-08-13 09:56:13 -04:00
Andrew Kaster	d9976b98b9	LibXML: Add parser hooks for CDATASection and ProcessingInstructions This allows listeners to be notified when a CDATASection or ProcessingInstruction is encountered during parsing. The non-listener path still has the incorrect behavior of silently treating CDATASection as Text nodes, but this allows listeners to handle them correctly.	2025-07-19 14:56:20 +02:00
Timothy Flynn	62d9a84b8d	AK+Everywhere: Replace custom number parsers with fast_float Our floating point number parser was based on the fast_float library: https://github.com/fastfloat/fast_float However, our implementation only supports 8-bit characters. To support UTF-16, we will need to be able to convert char16_t-based strings to numbers as well. This works out-of-the-box with fast_float. We can also use fast_float for integer parsing.	2025-07-03 09:51:56 -04:00
mikiubo	cd576e594d	LibXml: Notify listener when doctype is parsed	2025-01-20 14:48:19 +01:00
Timothy Flynn	488034477a	Revert "LibWeb: Set doctype node immediately while parsing XML document" This reverts commit `cd446e5e9c`. This broke about 20k WPT subtests, all related to XML parsing. See: https://wpt.fyi/results/html/the-xhtml-syntax/parsing-xhtml-documents?diff=&filter=ADC&run_id=5154815472828416&run_id=5090731742199808	2024-11-20 19:11:56 -05:00
Andreas Kling	cd446e5e9c	LibWeb: Set doctype node immediately while parsing XML document Instead of deferring it to the end of parsing, where scripts that were expecting to look at the doctype may have already run.	2024-11-20 16:10:57 +01:00
Timothy Flynn	93712b24bf	Everywhere: Hoist the Libraries folder to the top-level	2024-11-10 12:50:45 +01:00

13 commits
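
Commit cfe5ef32e1 describes a cap of 5000 on element nesting depth to avoid stack exhaustion. The following is a minimal sketch of how such a guard could sit in SAX-style start/end element callbacks. It is not Ladybird's actual code: the class name DepthLimitedHandler, its methods, and the choice to allow exactly 5000 open elements and reject the next one are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch only; names and exact boundary behavior are assumptions,
// not taken from the LibXML sources.
class DepthLimitedHandler {
public:
    // The commit message gives 5000 as the limit (matching Blink and WebKit).
    static constexpr std::size_t max_element_depth = 5000;

    // Called when an element opens. Returns false if opening it would exceed
    // the limit, in which case the caller is expected to abort parsing.
    bool on_start_element(std::string_view /*name*/)
    {
        if (m_depth >= max_element_depth) {
            m_failed = true;
            return false;
        }
        ++m_depth;
        return true;
    }

    // Called when an element closes; the depth never underflows.
    void on_end_element(std::string_view /*name*/)
    {
        if (m_depth > 0)
            --m_depth;
    }

    std::size_t depth() const { return m_depth; }
    bool failed() const { return m_failed; }

private:
    std::size_t m_depth { 0 };
    bool m_failed { false };
};
```

Because the guard is a plain counter rather than a recursive walk, checking the depth costs no stack space of its own, however deep the input nests.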