The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using our home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
An upcoming commit will migrate the contents of Headers.h/cpp to LibHTTP
for use outside of LibWeb. These CORS and MIME helpers depend on other
LibWeb facilities, however, so they cannot be moved.
The spec declares these as a byte sequence, which we then implemented as
a ByteBuffer. This has become pretty awkward to deal with, as evidenced
by the plethora of `MUST(ByteBuffer::copy(...))` and `.bytes()` calls
everywhere inside Fetch. We would then treat the bytes as a string
anyways by wrapping them in StringView everywhere.
We now store these as a ByteString. This is more comfortable to deal
with, and we no longer need to continually copy underlying storage (as
ByteString is ref-counted).
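As a rough illustration of why ref-counted storage removes the copies,
here is a minimal sketch in standard C++. `SharedBytes` and `Header` are
hypothetical stand-ins built on `std::shared_ptr`; they are not AK's
ByteString or the actual Fetch header types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical ref-counted, immutable byte string. It only stands in for
// the idea behind ByteString; it is not AK's implementation.
class SharedBytes {
public:
    explicit SharedBytes(std::string value)
        : m_storage(std::make_shared<std::string const>(std::move(value)))
    {
    }

    // Viewing the bytes as a string needs no wrapping or copying.
    std::string_view view() const { return *m_storage; }

    // Exposed only so the example can show that copies share storage.
    char const* data() const { return m_storage->data(); }

private:
    std::shared_ptr<std::string const> m_storage;
};

// Hypothetical header pair built on the shared storage.
struct Header {
    SharedBytes name;
    SharedBytes value;
};

inline void copy_shares_storage()
{
    Header original { SharedBytes { "Content-Type" }, SharedBytes { "text/html" } };
    Header copy = original; // Bumps two reference counts; no bytes are copied.

    assert(copy.value.view() == "text/html");
    assert(copy.value.data() == original.value.data());
}
```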
This work is largely preparatory for an upcoming HTTP header refactor.
Generally just define things in the order they are declared (will make a
change to use ByteString in this file a bit easier to follow). Also make
a couple of free functions be class methods on Header / HeaderList.
Disallow calling `StringBase::bytes()` on temporaries to avoid returning
a `ReadonlyBytes` that outlives the underlying string.
With this change, we catch a real UAF:
`load_result.data = maybe_response.release_value().bytes();`
All other updated call sites were already safe; they just needed to use
an intermediate named variable to satisfy the new lvalue-only
requirement.
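One common way to express an lvalue-only accessor in C++ is to
ref-qualify it and delete the rvalue overload. The sketch below shows
that technique on a hypothetical `OwnedString` type using
`std::string_view`; it is not the actual `StringBase` code, and
`make_response_body()` is an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a string type; not the real StringBase.
class OwnedString {
public:
    explicit OwnedString(std::string value)
        : m_value(std::move(value))
    {
    }

    // Callable only on lvalues, so the owner is a named object at the
    // call site.
    std::string_view bytes() const& { return m_value; }

    // Deleted for rvalues: `make_response_body().bytes()` fails to compile
    // instead of returning a view into a destroyed temporary.
    std::string_view bytes() const&& = delete;

private:
    std::string m_value;
};

// Hypothetical producer that returns a temporary.
inline OwnedString make_response_body()
{
    return OwnedString { "hello" };
}

// The pattern the updated call sites follow: keep the owner alive in an
// intermediate named variable, then take the view.
inline std::size_t body_size()
{
    auto body = make_response_body();
    auto view = body.bytes();
    assert(view == "hello");
    return view.size();
}
```

The view can still dangle if it escapes the scope of the named variable,
so this catches the temporary case at compile time rather than every
possible misuse.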
Previously, unbuffered requests were only available as a special mode
for EventSource. With this change, they are enabled by default, which
means chunks can be read from the stream as soon as they arrive.
This unlocks some interesting possibilities, such as starting to parse
HTML documents before the entire response has been received (that, in
turn, allows us to initiate subresource fetches earlier or begin
executing scripts sooner), or start rendering videos before they are
fully downloaded.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
Our HTTP disk cache is currently manually tested against various sites.
This patch adds some tests to cover various scenarios, including non-
cacheable responses, expired responses, and revalidation.
In order to ensure we hit the disk cache in RequestServer, we must
disable the in-memory cache in WebContent.
Global Privacy Control aims to be a replacement for Do Not Track. DNT
ended up not being a great solution, as it wasn't enforced by law. This
actually resulted in the DNT header serving as an extra fingerprinting
data point.
GPC is becoming enforced by law in US states such as California and
Colorado. CA is further working on a bill which requires that browsers
implement such an opt-out preference signal (OOPS):
https://cppa.ca.gov/announcements/2025/20250911.html
This patch replaces DNT with GPC and hooks up the associated settings.
Previously we depended on an associated document on the ESO to get to
the page, but Workers do not have documents. However, we can simply get
to the page with `principal_host_defined_page`, removing the issue.
This required some changes in LibURL & LibIPC since it has its own
definition of a BlobURLEntry. For now, we don't have a concrete usage
of MediaSource in LibURL so it is defined as an empty struct.
This removes one FIXME in an IDL file.
This is an editorial change from the Fetch spec. Since we already defined
the stream before it was used, this only renumbers the spec steps.
It also corrects a minor typo ('followings' to 'following') which was
corrected in the same editorial spec change.
Note that it's not actually executing tasks in parallel, it's still
throwing them on the HTML event loop task queue, each with its own
unique task source.
This makes our fetch implementation a lot more robust when HTTP caching
is enabled, and you can now click links on https://terminal.shop/
without hitting TODO assertions in fetch.
These are not associated with a JavaScript realm, so to avoid
confusion about which realm these need to be created in, make
all of these objects a GC::Cell, and deal with the fallout.
JavaScript module requests (in a non-worker context) always have CORS
enabled. However, CORS requests are only allowed for same-origin or
HTTP/S requests. This patch extends this to allow resource:// requests
from opaque origins (e.g. about: URLs).
We must also set the Access-Control-Allow-Origin header to "null" to
ensure that the response is accepted by the CORS checks. This does not
affect requesting resource:// URLs from resource:// URLs as those are
same-origin and skip CORS checks.
This ultimately enables requesting resource:// JS modules from the
about:settings page.
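To show why the header value has to be "null": an opaque origin
serializes to the string "null", and the CORS check compares the
Access-Control-Allow-Origin value against the serialized request origin.
The following is a simplified sketch of that comparison only. It assumes
a request without credentials and ignores
Access-Control-Allow-Credentials; `cors_origin_check` is a hypothetical
name, not a LibWeb function.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified sketch of the origin comparison in a CORS check, assuming a
// request that does not include credentials.
inline bool cors_origin_check(std::optional<std::string> const& allow_origin_header,
    std::string const& serialized_request_origin)
{
    // No Access-Control-Allow-Origin header at all: the check fails.
    if (!allow_origin_header.has_value())
        return false;

    // The wildcard is accepted for requests without credentials.
    if (*allow_origin_header == "*")
        return true;

    // Otherwise the header must match the serialized origin exactly.
    return *allow_origin_header == serialized_request_origin;
}

inline void opaque_origin_example()
{
    // An about: page has an opaque origin, which serializes to "null".
    std::string const opaque_origin = "null";

    assert(cors_origin_check(std::string("null"), opaque_origin));
    assert(!cors_origin_check(std::nullopt, opaque_origin));
}
```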
This will cause an exception to be thrown if the user attempts to read from
the response stream of a failed request.
This is unfortunately not testable in CI. It requires a network response
(i.e. not a file:// URL). We also cannot import relevant WPT tests; they
exercise this condition with a python-generated response.
The main streams AO file has gotten very large, and is a bit difficult
to navigate. In an effort to improve DX, this migrates TransformStream
AOs to their own file.
The main streams AO file has gotten very large, and is a bit difficult
to navigate. In an effort to improve DX, this migrates ReadableStream
AOs to their own file. The helper classes used for the tee and pipe-to
operations now live in their own files as well.
This is very clearly a very dangerous API to have, and was causing
a crash on Linux as a result of a stack use-after-free when visiting
https://www.index.hr/.
Fixes #3901
This is required to store Content Security Policies, as their
Directives are implemented as subclasses with overridden virtual
functions. Thus, they cannot be stored as generic Directive classes, as
they would lose the ability to call overridden functions when they are
copied.
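The problem described here is object slicing. As a minimal sketch in
standard C++, the example below contrasts storing a subclass by value
with storing it through an owning pointer. The class names are
illustrative only, and `std::unique_ptr` is an assumption made for the
example; it is not necessarily how the directives are actually stored.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class; the names here are illustrative only.
class Directive {
public:
    virtual ~Directive() = default;
    virtual std::string name() const { return "generic"; }
};

class ScriptSrcDirective final : public Directive {
public:
    std::string name() const override { return "script-src"; }
};

// Storing by value copies only the Directive part of the object
// (object slicing), so the override is lost.
inline std::string name_when_stored_by_value()
{
    std::vector<Directive> directives;
    directives.push_back(ScriptSrcDirective {});
    return directives.front().name();
}

// Storing through an owning pointer keeps the dynamic type intact.
inline std::string name_when_stored_by_pointer()
{
    std::vector<std::unique_ptr<Directive>> directives;
    directives.push_back(std::make_unique<ScriptSrcDirective>());
    return directives.front()->name();
}

inline void slicing_example()
{
    assert(name_when_stored_by_value() == "generic");
    assert(name_when_stored_by_pointer() == "script-src");
}
```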
Instead of just putting in members directly, wrap them up in structs
which represent what a URL blob entry is meant to hold per the spec.
This makes it more obvious what this is meant to represent, such as the
ByteBuffer being used to represent the bytes behind a Blob.
This also allows us to use a stronger type for a function that needs
to return a Blob URL entry's object.
This mistakenly implemented the 'piped to' operation on ReadableStream.
There is no functional difference, as the caller was already doing the
extra work that 'piped through' performs on top of 'piped to'.