When a streaming HTTP response completes before set_body() is called
on the FetchedDataReceiver, the sniff bytes are never marked as
complete, causing navigation to hang intermittently.
The sequence that triggers this:
1. handle_network_bytes(data, Ongoing) runs, but m_body is null,
so bytes only go into the FetchedDataReceiver's own m_buffer.
2. handle_network_bytes({}, Complete) runs, but m_body is still
null, so the m_body->set_sniff_bytes_complete() call is skipped.
3. set_body(body) is called, which flushes m_buffer into the body
via append_sniff_bytes(), but never marks them as complete.
4. populate_session_history_entry_document() calls
sniff_bytes_if_available() on the body. Since the source is
Empty (streaming) and m_sniff_bytes_complete is false, it
returns no value.
5. The async path registers a callback via wait_for_sniff_bytes(),
but since the stream already completed, no more data arrives,
and the callback never fires. Navigation hangs.
Fix this by checking the lifecycle state in set_body(): if we have
already moved past the Receiving state, the Complete notification was
already processed while m_body was null, so we must mark the sniff
bytes as complete now.
Previously, when loading a document, we would try to sniff the MIME
type by reading from the response body's source. However, for streaming
HTTP responses, the body source is Empty (the data comes through the
stream instead), so we had no bytes to sniff.
This caused pages like hypr.land (which sends no Content-Type header)
to be misidentified as plain text instead of HTML, since the MIME
sniffing algorithm would receive zero bytes and fall back to the
default type.
The fix captures the first bytes of the response body during fetch,
storing them on the Body object. These bytes are the "resource header"
defined by the MIME Sniffing spec - up to 1445 bytes, which is enough
to identify any MIME type the spec can detect.
Since bytes may arrive asynchronously during streaming, we use a
callback mechanism: if bytes aren't ready yet when load_document()
needs them, it registers a callback that fires once enough bytes have
been captured (or the stream ends).
The flow is:
1. FetchedDataReceiver receives network bytes, buffers them
2. When Body is created, buffered bytes are flushed to Body's sniff
buffer, and subsequent bytes are appended as they arrive
3. Before calling load_document(), Navigable waits for sniff bytes
4. load_document() passes the bytes to MimeSniff::Resource::sniff()
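Sketched out, the Body-side bookkeeping looks roughly like this (the
1445-byte cap is the resource header size from the MIME Sniffing spec;
member names and signatures here are illustrative):

    static constexpr size_t resource_header_size = 1445;

    void Body::append_sniff_bytes(ReadonlyBytes bytes)
    {
        if (m_sniff_bytes_complete)
            return;

        // Only the resource header is needed for sniffing; ignore the rest.
        auto remaining = resource_header_size - m_sniff_bytes.size();
        m_sniff_bytes.append(bytes.data(), min(bytes.size(), remaining));

        if (m_sniff_bytes.size() == resource_header_size)
            set_sniff_bytes_complete();
    }

    void Body::set_sniff_bytes_complete()
    {
        // Called from append_sniff_bytes() above, or by the receiver when
        // the stream ends before 1445 bytes have been seen.
        m_sniff_bytes_complete = true;
        if (m_wait_for_sniff_bytes_callback) {
            m_wait_for_sniff_bytes_callback->function()(m_sniff_bytes.span());
            m_wait_for_sniff_bytes_callback = nullptr;
        }
    }

    void Body::wait_for_sniff_bytes(GC::Ref<GC::Function<void(ReadonlyBytes)>> callback)
    {
        // Used by Navigable: if the bytes are already complete, notify
        // immediately; otherwise remember the callback for later.
        if (m_sniff_bytes_complete) {
            callback->function()(m_sniff_bytes.span());
            return;
        }
        m_wait_for_sniff_bytes_callback = callback;
    }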
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look up each of the
listed header names in the outgoing HTTP request. For the cache entry
to be used, the outgoing request's value for every listed header must
match the value sent in the original request; otherwise, a new request
is issued and a separate cache entry is created.
Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.
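The matching step amounts to something like the following (the helper
name and the HeaderMap type here are illustrative):

    // Returns true if a cache entry created for cached_request_headers may
    // be reused for outgoing_request_headers, given the header names listed
    // in the cached response's Vary header.
    static bool vary_headers_match(HTTP::HeaderMap const& cached_request_headers,
                                   HTTP::HeaderMap const& outgoing_request_headers,
                                   Vector<String> const& vary_header_names)
    {
        for (auto const& name : vary_header_names) {
            // A response with "Vary: *" can never be reused.
            if (name == "*"sv)
                return false;
            if (cached_request_headers.get(name) != outgoing_request_headers.get(name))
                return false;
        }
        return true;
    }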
There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
The in-memory HTTP Fetch cache currently keeps the realm which created
each cache entry alive indefinitely. This patch migrates this cache to
LibHTTP, to ensure it is completely unaware of any JS objects.
Now that we are not interacting with Fetch response objects, we can no
longer use Streams infrastructure to pipe the response body into the
Fetch response. Fetch also ultimately creates the cache response once
the HTTP response headers have arrived, before the body is complete.
So the LibHTTP cache holds entries in a pending list until the entire
response body has been received; only then is an entry moved to a
completed list, where it may be used thereafter.
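Conceptually, the LibHTTP-side cache is now shaped roughly like this
(names and types are illustrative, not the actual ones):

    class Cache {
    public:
        // Called once the response headers have arrived; the body is still
        // streaming, so the entry cannot satisfy lookups yet.
        void add_pending_entry(CacheKey key, Entry entry)
        {
            m_pending_entries.set(move(key), move(entry));
        }

        // Called once the entire response body has been received; only now
        // does the entry become eligible for reuse.
        void mark_entry_complete(CacheKey const& key)
        {
            if (auto entry = m_pending_entries.take(key); entry.has_value())
                m_completed_entries.set(key, entry.release_value());
        }

    private:
        HashMap<CacheKey, Entry> m_pending_entries;
        HashMap<CacheKey, Entry> m_completed_entries;
    };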
Previously, unbuffered requests were only available as a special mode
for EventSource. With this change, they are enabled by default, which
means chunks can be read from the stream as soon as they arrive.
This unlocks some interesting possibilities, such as starting to parse
HTML documents before the entire response has been received (which, in
turn, allows us to initiate subresource fetches earlier or begin
executing scripts sooner), or starting to render videos before they are
fully downloaded.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
Note that this doesn't actually execute tasks in parallel; it still
puts them on the HTML event loop task queue, each with its own unique
task source.
This makes our fetch implementation a lot more robust when HTTP caching
is enabled, and you can now click links on https://terminal.shop/
without hitting TODO assertions in fetch.
The main streams AO file has gotten very large and is a bit difficult
to navigate. In an effort to improve DX, this migrates the ReadableStream
AOs to their own file. The helper classes used for the tee and pipe-to
operations are also moved to their own files.
The result is a massive rename across almost the entire codebase!
Alongside the namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
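For illustration, a typical set of declarations changes like this (the
surrounding types are just examples):

    // Before:
    JS::GCPtr<DOM::Element> m_selected_element;
    JS::Handle<JS::HeapFunction<void()>> m_on_complete;
    void set_document(JS::NonnullGCPtr<DOM::Document>);

    // After:
    GC::Ptr<DOM::Element> m_selected_element;
    GC::Root<GC::Function<void()>> m_on_complete;
    void set_document(GC::Ref<DOM::Document>);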