Commit graph

20 commits

Author SHA1 Message Date
Aliaksandr Kalenik
f3ea882d6e LibWeb: Remove "signal to continue SHTQ" from document loading
This promise was previously used to signal the session history traversal
queue that it could continue processing, but is no longer needed.
2026-04-01 06:47:59 +02:00
Aliaksandr Kalenik
76d9cc4baf LibWeb: Replace spin_until in execute_script with deferred parser start
HTMLScriptElement::execute_script() and SVGScriptElement had spin_until
calls waiting for ready_to_run_scripts to become true. The race exists
because load_html_document() resolves the session history signal and
starts the parser in the same deferred_invoke — so the parser can hit a
<script> before update_for_history_step_application() sets the flag.

Instead of spinning, defer parser->run() until the document is ready.
Document gains a m_deferred_parser_start callback that is invoked when
set_ready_to_run_scripts() is called. The callback is cleared before
invocation to avoid reentrancy issues (parser->run() can synchronously
execute scripts). All three document loading paths (HTML, XML, text)
now check ready_to_run_scripts before starting the parser and defer if
needed.

create_document_for_inline_content() (used for error pages) now calls
set_ready_to_run_scripts() before mutating the document, ensuring the
invariant holds for all parser paths.

The spin_until calls are replaced with VERIFY assertions.
2026-03-29 01:05:35 +01:00
Zaggy1024
7994d27c8d LibWeb: Add missing Promise.h include in DocumentLoading.cpp 2026-03-02 17:06:39 -06:00
Andreas Kling
37bdcc3488 LibWeb: Support MIME type sniffing for streaming HTTP responses
Previously, when loading a document, we would try to sniff the MIME
type by reading from the response body's source. However, for streaming
HTTP responses, the body source is Empty (the data comes through the
stream instead), so we had no bytes to sniff.

This caused pages like hypr.land (which sends no Content-Type header)
to be misidentified as plain text instead of HTML, since the MIME
sniffing algorithm would receive zero bytes and fall back to the
default type.

The fix captures the first bytes of the response body during fetch,
storing them on the Body object. These bytes are the "resource header"
defined by the MIME Sniffing spec - up to 1445 bytes, which is enough
to identify any MIME type the spec can detect.

Since bytes may arrive asynchronously during streaming, we use a
callback mechanism: if bytes aren't ready yet when load_document()
needs them, it registers a callback that fires once enough bytes have
been captured (or the stream ends).

The flow is:
1. FetchedDataReceiver receives network bytes, buffers them
2. When Body is created, buffered bytes are flushed to Body's sniff
   buffer, and subsequent bytes are appended as they arrive
3. Before calling load_document(), Navigable waits for sniff bytes
4. load_document() passes the bytes to MimeSniff::Resource::sniff()
2026-01-24 15:21:26 +01:00
sideshowbarker
1b41659efd LibXML+LibWeb: Use existing HTML entities table for XML parsing too
For XHTML documents, resolve named character entities (e.g., &nbsp;)
using the HTML entity table via a getEntity SAX callback. This avoids
parsing a large embedded DTD on every document and matches the approach
used by Blink and WebKit.

This also removes the now-unused DTD infrastructure:

- Remove resolve_external_resource callback from Parser::Options
- Remove resolve_xml_resource() function and its ~60KB embedded DTD
- Remove all call sites passing the unused callback
2026-01-09 19:13:41 +00:00
sideshowbarker
fac81e84ba LibXML: Replace the existing XML parser with libxml2 parsing
This change replaces our LibXML parser with a new implementation that
wraps libxml2's SAX2 API.

The new Parser class uses libxml2's SAX2 callbacks to drive the existing
XML::Listener interface. That preserves backward compatibility with all
existing consumers (XMLDocumentBuilder, DOMParser, etc.).
2026-01-07 14:38:52 +01:00
Timothy Flynn
3dce6766a3 LibWeb: Extract some CORS and MIME Fetch helpers to their own files
An upcoming commit will migrate the contents of Headers.h/cpp to LibHTTP
for use outside of LibWeb. These CORS and MIME helpers depend on other
LibWeb facilities, however, so they cannot be moved.
2025-11-27 14:57:29 +01:00
Prajjwal
50a79c6af8 LibWeb: Change SessionHistoryTraversalQueue to use Promises
If multiple cross-document navigations are queued on
SessionHistoryTraversalQueue, running the next entry before the current
document load is finished may result in a deadlock. If the new document
has a navigable element of its own, it will append steps to SHTQ and
hang in nested spin_until.
This change uses promises to ensure that the current document loads
before the next entry is executed.

Fixes timeouts in the imported tests.

Co-authored-by: Sam Atkins <sam@ladybird.org>
2025-11-26 12:27:12 +01:00
Luke Wilde
82bd3d3891 LibWeb: Avoid invoking Trusted Types where avoidable
Prevents observably calling Trusted Types, which can run arbitrary JS,
cause crashes due to use of MUST and allow arbitrary JS to modify
internal elements.
2025-11-06 11:43:06 -05:00
Tete17
2fa84f1683 LibWeb: Properly propagate errors for Node set_text_content
This function was supposed to throw errors even before the TrustedTypes
spec thanks to the CharacterData replaceData call but had a MUST.

This changes this to ensure this function can throw an error
2025-10-27 16:14:20 +00:00
euro20179
e442aa6e10 LibWeb: Ensure parser cannot change the mode is handled
This fixes at least 1 wpt bug where text/plain documents are rendered in
quirks mode. The test in question: https://wpt.live/html/browsers/browsing-the-web/read-text/load-text-plain.html
2025-09-07 11:11:43 +01:00
Timothy Flynn
5c561c1a53 LibWeb: Port node text content to UTF-16 2025-07-28 18:30:50 +02:00
Timothy Flynn
8b6e3cb735 LibWeb+LibUnicode+WebContent: Port DOM:CharacterData to UTF-16
This replaces the underlying storage of CharacterData with Utf16String
and deals with the fallout.
2025-07-24 19:00:20 +02:00
Andrew Kaster
f9f854b493 LibWeb: Preserve comments in XML documents 2025-07-19 14:56:20 +02:00
Sam Atkins
9dbeecb73d LibWeb: Correct some spec typos
Corresponds to 285a58bf30
2025-04-10 04:01:37 +02:00
Sam Atkins
192cae17ee LibWeb/DOM: Update step 4 of load_document()
Corresponds to 2ab779b8e8

We don't implement most of what that touches, so the only actual change
is a couple of words added here.
2025-03-28 11:18:57 +00:00
Sam Atkins
c6a18f795d LibWeb/HTML: Pass user_involvement through navigables code
This corresponds to part of https://github.com/whatwg/html/pull/10818
2025-01-11 11:10:43 +01:00
Shannon Booth
f87041bf3a LibGC+Everywhere: Factor out a LibGC from LibJS
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:

 * JS::NonnullGCPtr -> GC::Ref
 * JS::GCPtr -> GC::Ptr
 * JS::HeapFunction -> GC::Function
 * JS::CellImpl -> GC::Cell
 * JS::Handle -> GC::Root
2024-11-15 14:49:20 +01:00
Shannon Booth
9b79a686eb LibJS+LibWeb: Use realm.create<T> instead of heap.allocate<T>
The main motivation behind this is to remove JS specifics of the Realm
from the implementation of the Heap.

As a side effect of this change, this is a bit nicer to read than the
previous approach, and in my opinion, also makes it a little more clear
that this method is specific to a JavaScript Realm.
2024-11-13 16:51:44 -05:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00
Renamed from Userland/Libraries/LibWeb/DOM/DocumentLoading.cpp (Browse further)