Commit graph

12 commits

Author SHA1 Message Date
Andreas Kling
7b2bb4d49c LibWeb: Fix race in MIME sniff bytes when response completes early
When a streaming HTTP response completes before set_body() is called
on the FetchedDataReceiver, the sniff bytes would never be marked as
complete, causing navigation to hang intermittently.

The sequence that triggers this:

1. handle_network_bytes(data, Ongoing) runs, but m_body is null,
   so bytes only go into the FetchedDataReceiver's own m_buffer.
2. handle_network_bytes({}, Complete) runs, but m_body is still
   null, so the m_body->set_sniff_bytes_complete() call is skipped.
3. set_body(body) is called, which flushes m_buffer into the body
   via append_sniff_bytes(), but never marks them as complete.
4. populate_session_history_entry_document() calls
   sniff_bytes_if_available() on the body. Since the source is
   Empty (streaming) and m_sniff_bytes_complete is false, it
   returns no value.
5. The async path registers a callback via wait_for_sniff_bytes(),
   but since the stream already completed, no more data arrives,
   and the callback never fires. Navigation hangs.

Fix this by checking the lifecycle state in set_body(). If we have
already moved past the Receiving state, the Complete was already
processed and we need to mark sniff bytes as complete now.
2026-03-08 11:39:41 +01:00
Andreas Kling
37bdcc3488 LibWeb: Support MIME type sniffing for streaming HTTP responses
Previously, when loading a document, we would try to sniff the MIME
type by reading from the response body's source. However, for streaming
HTTP responses, the body source is Empty (the data comes through the
stream instead), so we had no bytes to sniff.

This caused pages like hypr.land (which sends no Content-Type header)
to be misidentified as plain text instead of HTML, since the MIME
sniffing algorithm would receive zero bytes and fall back to the
default type.

The fix captures the first bytes of the response body during fetch,
storing them on the Body object. These bytes are the "resource header"
defined by the MIME Sniffing spec - up to 1445 bytes, which is enough
to identify any MIME type the spec can detect.

Since bytes may arrive asynchronously during streaming, we use a
callback mechanism: if bytes aren't ready yet when load_document()
needs them, it registers a callback that fires once enough bytes have
been captured (or the stream ends).

The flow is:
1. FetchedDataReceiver receives network bytes, buffers them
2. When Body is created, buffered bytes are flushed to Body's sniff
   buffer, and subsequent bytes are appended as they arrive
3. Before calling load_document(), Navigable waits for sniff bytes
4. load_document() passes the bytes to MimeSniff::Resource::sniff()
2026-01-24 15:21:26 +01:00
Timothy Flynn
d3041dc054 LibHTTP+LibWeb: Support the HTTP Vary response header
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.

Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.

There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
2026-01-22 08:54:49 -05:00
Timothy Flynn
bf7b812d0b LibHTTP+LibWeb: Store the in-memory HTTP cache without JS realms
The in-memory HTTP Fetch cache currently keeps the realm which created
each cache entry alive indefinitely. This patch migrates this cache to
LibHTTP, to ensure it is completely unaware of any JS objects.

Now that we are not interacting with Fetch response objects, we can no
longer use Streams infrastructure to pipe the response body into the
Fetch response. Fetch also ultimately creates the cache response once
the HTTP response headers have arrived. So the LibHTTP cache will hold
entries in a pending list until we have received the entire response
body. Then it is moved to a completed list and may be used thereafter.
2025-12-21 08:59:31 -06:00
Timothy Flynn
d08bd14928 LibWeb: Accumulate all network bytes in FetchedDataReceiver
This will allow us to hand off the bytes to the HTTP memory cache.
2025-12-21 08:59:31 -06:00
Aliaksandr Kalenik
3058274386 LibWeb: Use unbuffered network requests for all Fetch requests
Previously, unbuffered requests were only available as a special mode
for EventSource. With this change, they are enabled by default, which
means chunks can be read from the stream as soon as they arrive.

This unlocks some interesting possibilities, such as starting to parse
HTML documents before the entire response has been received (that, in
turn, allows us to initiate subresource fetches earlier or begin
executing scripts sooner), or start rendering videos before they are
fully downloaded.

Co-authored-by: Timothy Flynn <trflynn89@pm.me>
2025-11-20 06:29:13 -05:00
Andreas Kling
03256a2543 LibWeb: Add "parallel queue" and allow it as fetch task destination
Note that it's not actually executing tasks in parallel, it's still
throwing them on the HTML event loop task queue, each with its own
unique task source.

This makes our fetch implementation a lot more robust when HTTP caching
is enabled, and you can now click links on https://terminal.shop/
without hitting TODO assertions in fetch.
2025-07-17 00:13:39 +02:00
Timothy Flynn
a9ddd427cb LibWeb: Move ReadableStream AOs into their own file
The main streams AO file has gotten very large, and is a bit difficult
to navigate. In an effort to improve DX, this migrates ReadableStream
AOs to their own file. And the helper classes used for the tee and pipe-
to operations are also in their own files.
2025-04-18 06:55:40 -04:00
Shannon Booth
3f572d9ab7 LibWeb/Streams: Move ReadableStream functions out of AbstractOperations
These are not defined in the abstract operations section of the spec and
are the publically exported Stream APIs exposed on ReadableStream.
2024-12-11 15:11:21 +01:00
Jelle Raaijmakers
1514197e36 LibWeb: Remove dom_ from dom_exception_to_throw_completion
We're not converting `WebIDL::DOMException`, but `WebIDL::Exception`
instead.
2024-12-09 20:02:51 -07:00
Shannon Booth
f87041bf3a LibGC+Everywhere: Factor out a LibGC from LibJS
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:

 * JS::NonnullGCPtr -> GC::Ref
 * JS::GCPtr -> GC::Ptr
 * JS::HeapFunction -> GC::Function
 * JS::CellImpl -> GC::Cell
 * JS::Handle -> GC::Root
2024-11-15 14:49:20 +01:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00
Renamed from Userland/Libraries/LibWeb/Fetch/Fetching/FetchedDataReceiver.cpp (Browse further)