This patch introduces a cookie cache in the WebContent process to reduce
blocking IPC calls when JS accesses document.cookie. The UI process now
maintains a cookie version counter per-domain in shared memory. When JS
reads document.cookie, we check whether we have a valid cached cookie by
comparing the current shared version to the last used version. If they
match, the cached cookie is returned without IPC.
This optimization is based on Chromium's shared versioning, in which it
was observed that 87% of document.cookie accesses were redundant. See:
https://blog.chromium.org/2024/06/introducing-shared-memory-versioning-to.html
Note that this cache only supports document.cookie, not HTTP Cookie
headers. HTTP cookies are attached to requests with varying URLs and
paths. The cookies that match the document URL might not match the
request URL, which we wouldn't know from WebContent. So attaching the
cached document cookie would be incorrect.
On https://twinings.co.uk, we see approximately 600 document.cookie
requests while the page loads. This patch reduces the time spent in
the document.cookie getter from ~45ms to 2-3ms.
By creating a new FetchController here, we prevent the original
FetchController from affecting the actual ongoing request. This is
necessary to allow FetchController::stop_fetch() to stop an ongoing
fetch in the subsequent commit.
The purpose of the copy appears to be only to change the
httpFetchParams' request over to the newly-cloned httpRequest, so all
existing references should refer back to the originals rather than new
instances.
Previously, when loading a document, we would try to sniff the MIME
type by reading from the response body's source. However, for streaming
HTTP responses, the body source is Empty (the data comes through the
stream instead), so we had no bytes to sniff.
This caused pages like hypr.land (which sends no Content-Type header)
to be misidentified as plain text instead of HTML, since the MIME
sniffing algorithm would receive zero bytes and fall back to the
default type.
The fix captures the first bytes of the response body during fetch,
storing them on the Body object. These bytes are the "resource header"
defined by the MIME Sniffing spec - up to 1445 bytes, which is enough
to identify any MIME type the spec can detect.
Since bytes may arrive asynchronously during streaming, we use a
callback mechanism: if bytes aren't ready yet when load_document()
needs them, it registers a callback that fires once enough bytes have
been captured (or the stream ends).
The flow is:
1. FetchedDataReceiver receives network bytes, buffers them
2. When Body is created, buffered bytes are flushed to Body's sniff
buffer, and subsequent bytes are appended as they arrive
3. Before calling load_document(), Navigable waits for sniff bytes
4. load_document() passes the bytes to MimeSniff::Resource::sniff()
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.
Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.
There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
If the cache mode is no-store, we must not interact with the cache at
all.
If the cache mode is reload, we must not use any cached response.
If the cache-mode is only-if-cached or force-cache, we are permitted
to respond with stale cache responses.
Note that we currently cannot test only-if-cached in test-web. Setting
this mode also requires setting the cors mode to same-origin, but our
http-test-server infra requires setting the cors mode to cors.
Returning null here results in the fetch cache mode becoming hard-set to
no-store. This means the HTTP cache cannot be consulted nor updated. A
future commit will make our disk cache respect this flag, as this is a
valid client-provided value. So instead of setting this flag when the
memory cache is disabled, let's move the check to where the cache later
becomes consulted/updated.
We currently will perform some revalidation from both WebContent and
RequestServer. For simplicity's sake, now that the memory cache only
holds fresh responses, let's remove revalidation handling from the
WebContent process. If a memory-cached response is stale, it's fine
to just forward that request to RequestServer. It will then either
be served by disk cache, or revalidated at that point.
Once a cache entry is not fresh, we now remove it from the memory cache.
We will avoid handling revalidation from within WebContent. Instead, we
will just forward the request to RequestServer, where the disk cache
will handle revalidation for itself if needed.
When fully reading a response body fails, the transform stream's flush
algorithm (which calls processResponseEndOfBody) never runs since flush
only executes on successful stream completion.
This left report_timing_steps null, causing a crash when
process_response_consume_body tried to call report_timing.
The fetch spec doesn't explicitly handle this case. We work around it
by extracting the report timing setup (steps 1-3 of
processResponseEndOfBody) into a helper that we call from both the
success path (via flush) and the error path (via processBodyError).
This is an ad-hoc fix that deviates slightly from the spec, but it
ensures failed fetches still produce useful timing data and don't crash.
Propagate the request initiator type (e.g., "xmlhttprequest", "fetch",
"script", "stylesheet") from LibWeb through the IPC layer to DevTools.
This enables Firefox DevTools to correctly identify XHR/fetch requests
and display appropriate cause types in the Network panel's "Initiator"
column.
We currently do not handle responses for range requests at all in our
HTTP caches. This means if we issue a request for a range of bytes=1-10,
that response will be served to a subsequent request for a range of
bytes=10-20. This is obviously invalid - so until we handle these
requests, just don't cache them for now.
The in-memory HTTP Fetch cache currently keeps the realm which created
each cache entry alive indefinitely. This patch migrates this cache to
LibHTTP, to ensure it is completely unaware of any JS objects.
Now that we are not interacting with Fetch response objects, we can no
longer use Streams infrastructure to pipe the response body into the
Fetch response. Fetch also ultimately creates the cache response once
the HTTP response headers have arrived. So the LibHTTP cache will hold
entries in a pending list until we have received the entire response
body. Then it is moved to a completed list and may be used thereafter.
There are a couple of remaining RFC 9111 methods in LibWeb's Fetch, but
these are currently directly tied to the way we store GC-allocated HTTP
response objects. So de-coupling that is left as a future exercise.
The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using are home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
An upcoming commit will migrate the contents of Headers.h/cpp to LibHTTP
for use outside of LibWeb. These CORS and MIME helpers depend on other
LibWeb facilities, however, so they cannot be moved.
The spec declares these as a byte sequence, which we then implemented as
a ByteBuffer. This has become pretty awkward to deal with, as evidenced
by the plethora of `MUST(ByteBuffer::copy(...))` and `.bytes()` calls
everywhere inside Fetch. We would then treat the bytes as a string
anyways by wrapping them in StringView everywhere.
We now store these as a ByteString. This is more comfortable to deal
with, and we no longer need to continually copy underlying storage (as
ByteString is ref-counted).
This work is largely preparatory for an upcoming HTTP header refactor.
Generally just define things in the order they are declared (will make a
change to use ByteString in this file a bit easier to follow). Also make
a couple of free functions be class methods on Header / HeaderList.
Disallow calling `StringBase::bytes()` on temporaries to avoid returning
`ReadonlyBytes` that outlive the underlying string.
With this change, we catch a real UAF:
`load_result.data = maybe_response.release_value().bytes();`
All other updated call sites were already safe, they just needed to use
an intermediate named variable to satisfy the new lvalue-only
requirement.
Previously, unbuffered requests were only available as a special mode
for EventSource. With this change, they are enabled by default, which
means chunks can be read from the stream as soon as they arrive.
This unlocks some interesting possibilities, such as starting to parse
HTML documents before the entire response has been received (that, in
turn, allows us to initiate subresource fetches earlier or begin
executing scripts sooner), or start rendering videos before they are
fully downloaded.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
Our HTTP disk cache is currently manually tested against various sites.
This patch adds some tests to cover various scenarios, including non-
cacheable responses, expired responses, and revalidation.
In order to ensure we hit the disk cache in RequestServer, we must
disable the in-memory cache in WebContent.
Global Privacy Control aims to be a replacement for Do Not Track. DNT
ended up not being a great solution, as it wasn't enforced by law. This
actually resulted in the DNT header serving as an extra fingerprinting
data point.
GPC is becoming enforced by law in USA states such as California and
Colorado. CA is further working on a bill which requires that browsers
implement such an opt-out preference signal (OOPS):
https://cppa.ca.gov/announcements/2025/20250911.html
This patch replaces DNT with GPC and hooks up the associated settings.
Previously we depended on an associated document on the ESO to get to
the page, but Workers do not have documents. However, we can simply get
to the page with `principal_host_defined_page`, removing the issue.
This required some changes in LibURL & LibIPC since it has its own
definition of an BlobURLEntry. For now, we don't have a concrete usage
of MediaSource in LibURL so it is defined as an empty struct.
This removes one FIXME in an idl file.
This is an editoral change from the fetch spec. Since we already defined
the stream before it being used this only re-numbers the spec steps.
It also corrects a minor typo ('followings' to 'following') which was
corrected in the same editoral spec change.