When parsing an SVG image loaded via <img>, the XMLDocumentBuilder's
document_end() would spin the event loop waiting for scripts, load
events, and other page lifecycle milestones that are irrelevant for
a decoded SVG image. Skip this entirely, as SVG-as-image operates
in secure static mode where external resources are also disabled.
For XHTML documents, resolve named character entities (e.g., `&nbsp;`)
using the HTML entity table via a getEntity SAX callback. This avoids
parsing a large embedded DTD on every document and matches the approach
used by Blink and WebKit.
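
As a rough illustration of the lookup a getEntity-style callback could perform, here is a minimal sketch. The names `NamedEntity`, `entity_excerpt`, `encode_utf8`, and `resolve_named_entity` are hypothetical, and the four-entry table only stands in for the full HTML entity table; the actual callback wiring into libxml2 is not shown.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, tiny excerpt standing in for the HTML named character reference table.
struct NamedEntity {
    std::string_view name;
    char32_t code_point;
};

constexpr std::array<NamedEntity, 4> entity_excerpt { {
    { "nbsp", 0x00A0 },
    { "copy", 0x00A9 },
    { "eacute", 0x00E9 },
    { "hellip", 0x2026 },
} };

// Encode one Unicode scalar value as UTF-8.
std::string encode_utf8(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Name in, replacement text out. An unknown name yields std::nullopt so the
// caller can report an undefined entity instead of consulting a DTD.
std::optional<std::string> resolve_named_entity(std::string_view name)
{
    for (auto const& entity : entity_excerpt) {
        if (entity.name == name)
            return encode_utf8(entity.code_point);
    }
    return std::nullopt;
}
```

A table lookup like this costs the same on every document, whereas a DTD has to be parsed before any entity in it can be resolved.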
This also removes the now-unused DTD infrastructure:
- Remove resolve_external_resource callback from Parser::Options
- Remove resolve_xml_resource() function and its ~60KB embedded DTD
- Remove all call sites passing the unused callback
This change replaces our LibXML parser with a new implementation that
wraps libxml2's SAX2 API.
The new Parser class uses libxml2's SAX2 callbacks to drive the existing
XML::Listener interface. That preserves backward compatibility with all
existing consumers (XMLDocumentBuilder, DOMParser, etc.).
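
The adapter idea can be sketched without libxml2 at all. In the sketch below, `Listener`, `SaxCallbacks`, and `RecordingListener` are hypothetical stand-ins: the real listener interface has more hooks, and libxml2's SAX2 handler has different, richer signatures. The point is only the pattern of C-style callbacks forwarding to a C++ interface through an opaque context pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the listener interface that consumers implement.
struct Listener {
    virtual ~Listener() = default;
    virtual void element_start(std::string const& name) = 0;
    virtual void element_end(std::string const& name) = 0;
    virtual void text(std::string const& data) = 0;
};

// C-style callback table, shaped like (but far simpler than) a SAX handler struct.
struct SaxCallbacks {
    void (*start_element)(void* ctx, char const* name);
    void (*end_element)(void* ctx, char const* name);
    void (*characters)(void* ctx, char const* data, int length);
};

// Recover the listener from the opaque context pointer the parser hands back.
inline Listener& listener_from(void* ctx)
{
    assert(ctx);
    return *static_cast<Listener*>(ctx);
}

// Each callback is a captureless lambda, so it converts to a plain function pointer.
inline SaxCallbacks make_listener_callbacks()
{
    return SaxCallbacks {
        [](void* ctx, char const* name) { listener_from(ctx).element_start(name); },
        [](void* ctx, char const* name) { listener_from(ctx).element_end(name); },
        [](void* ctx, char const* data, int length) {
            // SAX-style character data is not NUL-terminated, so honor the length.
            listener_from(ctx).text(std::string(data, static_cast<std::size_t>(length)));
        },
    };
}

// Example consumer that records the events it receives.
struct RecordingListener final : Listener {
    std::vector<std::string> events;
    void element_start(std::string const& name) override { events.push_back("start:" + name); }
    void element_end(std::string const& name) override { events.push_back("end:" + name); }
    void text(std::string const& data) override { events.push_back("text:" + data); }
};
```

Because consumers only ever see `Listener`, the parser behind the callbacks can be swapped without changing them.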
This prevents observable calls into Trusted Types, which can run
arbitrary JS, cause crashes due to our use of MUST, and allow arbitrary
JS to modify internal elements.
`set_source` takes a ByteString but the implementation might require a
specific encoding. Make it fallible so that we don't need to crash in
the case of invalid UTF-8 or similar.
The test includes a sequence of invalid UTF-8 bytes that crash the
browser without this change.
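
A minimal sketch of the "fallible setter" idea, assuming the required encoding is UTF-8. `SourceHolder` is a hypothetical class, and the `std::optional<std::string>` error return merely stands in for whatever error type the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Returns true if `bytes` is well-formed UTF-8: no truncated sequences,
// overlong encodings, surrogates, or values above U+10FFFF.
bool is_valid_utf8(std::string_view bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        auto b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) {
            ++i;
            continue;
        }
        std::size_t length = 0;
        std::uint32_t cp = 0;
        std::uint32_t min = 0; // Smallest code point allowed for this length.
        if ((b0 & 0xE0) == 0xC0) {
            length = 2; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            length = 3; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            length = 4; cp = b0 & 0x07; min = 0x10000;
        } else {
            return false; // Stray continuation byte or invalid lead byte.
        }
        if (i + length > bytes.size())
            return false;
        for (std::size_t k = 1; k < length; ++k) {
            auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += length;
    }
    return true;
}

// Hypothetical holder whose set_source reports failure instead of asserting.
class SourceHolder {
public:
    // Returns an error message on failure, std::nullopt on success.
    std::optional<std::string> set_source(std::string source)
    {
        if (!is_valid_utf8(source))
            return "source is not valid UTF-8";
        m_source = std::move(source);
        return std::nullopt;
    }

    std::string const& source() const { return m_source; }

private:
    std::string m_source;
};
```

On failure the previous source is left untouched, so the caller can surface the error and carry on.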
Using the new hooks in the XML Parser's listener interface, we now
append DOM nodes for CDATASections and ProcessingInstructions
to the document as they are encountered. This commit also fixes where
comment nodes are appended, ensuring they are added to the current node
instead of the document root.
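
The difference between "append to the document" and "append to the current node" is easiest to see in a toy tree builder. Everything below (`Node`, `TreeBuilder`, and the string-typed `kind`) is a hypothetical simplification, not the real DOM or builder classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string kind; // "document", "element", "comment", "cdata", "pi"
    std::string data;
    Node* parent { nullptr };
    std::vector<std::unique_ptr<Node>> children;
};

class TreeBuilder {
public:
    TreeBuilder()
        : m_document(std::make_unique<Node>())
        , m_current(m_document.get())
    {
        m_document->kind = "document";
    }

    void element_start(std::string name) { m_current = append("element", std::move(name)); }
    void element_end()
    {
        if (m_current->parent)
            m_current = m_current->parent;
    }

    // Non-element nodes go to whatever node is currently open, not the root.
    void comment(std::string data) { append("comment", std::move(data)); }
    void cdata_section(std::string data) { append("cdata", std::move(data)); }
    void processing_instruction(std::string target) { append("pi", std::move(target)); }

    Node& document() { return *m_document; }

private:
    Node* append(std::string kind, std::string data)
    {
        assert(m_current);
        auto node = std::make_unique<Node>();
        node->kind = std::move(kind);
        node->data = std::move(data);
        node->parent = m_current;
        m_current->children.push_back(std::move(node));
        return m_current->children.back().get();
    }

    std::unique_ptr<Node> m_document;
    Node* m_current;
};
```

A comment seen before the root element still lands on the document, because the document is the current node at that point.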
When the XML parser appends child nodes to a template element, it must
actually append the template element's contents. This special behavior
caused us to return to the wrong parent element after adding child
nodes to a template element, leading to a crash.
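
One way to avoid this class of bug is to track open elements explicitly instead of walking parent pointers, since a child of a template has the content fragment as its parent rather than the template element. The sketch below shows that approach with hypothetical `Node` and `Builder` types; it is not necessarily how the actual fix is implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string name;
    Node* parent { nullptr };
    std::vector<std::unique_ptr<Node>> children;
    std::unique_ptr<Node> template_content; // Only set for <template> elements.
};

class Builder {
public:
    Builder()
    {
        m_root.name = "#document";
        m_open_elements.push_back(&m_root);
    }

    void element_start(std::string name)
    {
        assert(!m_open_elements.empty());
        Node* current = m_open_elements.back();
        // Children of a template go into its content fragment, not the element itself.
        Node* target = current->template_content ? current->template_content.get() : current;

        auto node = std::make_unique<Node>();
        node->name = std::move(name);
        node->parent = target;
        if (node->name == "template") {
            node->template_content = std::make_unique<Node>();
            node->template_content->name = "#document-fragment";
        }
        target->children.push_back(std::move(node));
        m_open_elements.push_back(target->children.back().get());
    }

    // Popping the stack returns to the template element itself, whereas following
    // parent pointers from a template child would end up at the content fragment.
    void element_end()
    {
        if (m_open_elements.size() > 1)
            m_open_elements.pop_back();
    }

    Node& current() { return *m_open_elements.back(); }
    Node& root() { return m_root; }

private:
    Node m_root;
    std::vector<Node*> m_open_elements;
};
```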
Documents created via DOMParser.parseFromString() are parsed
synchronously and do not participate in the browsing context's loading
pipeline.
This patch ensures that if the document has no browsing context (i.e.
was parsed via DOMParser), its readiness is set to "complete"
synchronously.
Fixes WPT:
domparsing/xmldomparser.html
Previously if we encountered any attributes with a namespace other than
`xml` or `xmlns`, we treated it as a parse error. Now, allow it as long
as it's been declared in the current context.
We also handle errors more gracefully - instead of exploding if setting
the namespace fails, treat it as an error and carry on.
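
"Declared in the current context" amounts to a lookup through a stack of prefix scopes, innermost first. The `NamespaceStack` class below is a hypothetical sketch of that lookup; an unresolved prefix comes back as `std::nullopt` so the caller can record a parse error and continue rather than abort.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class NamespaceStack {
public:
    NamespaceStack()
    {
        // The xml prefix is bound by definition and never needs declaring.
        m_scopes.emplace_back();
        m_scopes.back()["xml"] = "http://www.w3.org/XML/1998/namespace";
    }

    // Call on element start and end respectively.
    void push_scope() { m_scopes.emplace_back(); }
    void pop_scope()
    {
        if (m_scopes.size() > 1)
            m_scopes.pop_back();
    }

    // Records an xmlns:prefix="uri" declaration on the current element.
    void declare(std::string prefix, std::string uri)
    {
        assert(!m_scopes.empty());
        m_scopes.back()[std::move(prefix)] = std::move(uri);
    }

    // Searches from the innermost scope outwards.
    std::optional<std::string> resolve(std::string const& prefix) const
    {
        for (auto scope = m_scopes.rbegin(); scope != m_scopes.rend(); ++scope) {
            auto it = scope->find(prefix);
            if (it != scope->end())
                return it->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::map<std::string, std::string>> m_scopes;
};
```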
Names like ":hi", "wow:", or "a🅱️c" are invalid, so early-out instead
of searching our namespace stack for matching prefixes.
And also rename the function because it's relevant to attributes too.
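
The structural part of that early-out can be sketched as a small split function. `QualifiedName` and `split_qualified_name` are hypothetical names, and the sketch only checks the colon structure (covering shapes like ":hi" and "wow:"); validating which characters may appear in a name is assumed to happen elsewhere.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct QualifiedName {
    std::string_view prefix;     // Empty when the name has no prefix.
    std::string_view local_name;
};

// Splits "prefix:local". Returns std::nullopt for an empty name, an empty
// prefix, an empty local part, or more than one colon.
// Assumption: character-class checks on the name are done by the caller.
std::optional<QualifiedName> split_qualified_name(std::string_view name)
{
    if (name.empty())
        return std::nullopt;

    auto colon = name.find(':');
    if (colon == std::string_view::npos)
        return QualifiedName { {}, name };

    if (colon == 0 || colon == name.size() - 1)
        return std::nullopt;
    if (name.find(':', colon + 1) != std::string_view::npos)
        return std::nullopt;

    return QualifiedName { name.substr(0, colon), name.substr(colon + 1) };
}
```

Only names that survive this check need a prefix lookup in the namespace stack.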
While the code that did this referred to the HTML spec, other browsers
appear not to have this behavior when parsing XML, and it breaks a WPT
subtest.
This change does not appear to break any tests, and fixes 1 WPT subtest.
This results in a massive rename almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root