The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using are home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
The only thing in HTTPResponse being used is reason_phrase_for_code,
which is just a static helper method. Move it to its own file and remove
HTTPResponse.
This is just one less thing to have to port to an upcoming HTTP header
refactor.
We could probably do with removing LoadRequest altogether. But this just
removes unused methods for now to make an upcoming HTTP header change a
bit simpler.
Disallow calling `StringBase::bytes()` on temporaries to avoid returning
`ReadonlyBytes` that outlive the underlying string.
With this change, we catch a real UAF:
`load_result.data = maybe_response.release_value().bytes();`
All other updated call sites were already safe, they just needed to use
an intermediate named variable to satisfy the new lvalue-only
requirement.
Previously, unbuffered requests were only available as a special mode
for EventSource. With this change, they are enabled by default, which
means chunks can be read from the stream as soon as they arrive.
This unlocks some interesting possibilities, such as starting to parse
HTML documents before the entire response has been received (that, in
turn, allows us to initiate subresource fetches earlier or begin
executing scripts sooner), or start rendering videos before they are
fully downloaded.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
Global Privacy Control aims to be a replacement for Do Not Track. DNT
ended up not being a great solution, as it wasn't enforced by law. This
actually resulted in the DNT header serving as an extra fingerprinting
data point.
GPC is becoming enforced by law in USA states such as California and
Colorado. CA is further working on a bill which requires that browsers
implement such an opt-out preference signal (OOPS):
https://cppa.ca.gov/announcements/2025/20250911.html
This patch replaces DNT with GPC and hooks up the associated settings.
JavaScript module requests (in a non-worker context) always have CORS
enabled. However, CORS requests are only allowed for same-origin or
HTTP/S requests. This patch extends this to allow resource:// requests
from opaque origins (e.g. about: URLs).
We must also set the Access-Control-Allow-Origin header to "null" to
ensure that the response is accepted by the CORS checks. This does not
affect requesting resource:// URLs from resource:// URLs as those are
same-origin and skip CORS checks.
This ultimately enables requesting resource:// JS modules from the
about:settings page.
Start work on a speculative HTML Parser in Swift. This component will
walk ahead of the normal HTML parser looking for fetch() requests to
make while the normal parser is blocked. This work exposed many holes in
the Swift C++ interop component, which have been reported upstream.
This commit:
- Prevents path traversal via the about: scheme
- Prevents loading about:inspector
- Requires about: URIs to be opaque paths
- Prevents crashes with invalid percent encoded paths
We only need a Page for file:// urls. At some point we probably
needed it for other kinds of requests, but the current functionality
doesn't need to store the Page pointer on the ResourceLoader.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.Everything:
The modifications in this commit were automatically made using the
following command:
find . -name '*.cpp' -exec sed -i -E 's/dbg\(\) << ("[^"{]*");/dbgln\(\1\);/' {} \;
This patch adds a global (per-process) filter list to LibWeb that is
used to filter all outgoing resource load requests.
Basically we check the URL against a list of filter patterns and if
it's a match for any one of them, we immediately fail the load.
The filter list is a simple text file:
~/.config/BrowserContentFilters.txt
It's one filter per line and they are simple glob filters for now,
with implicit asterisks (*) at the start and end of the line.
This patchset makes ProtocolServer stream the downloads to its client
(LibProtocol), and as such changes the download API; a possible
download lifecycle could be as such:
notation = client->server:'>', server->client:'<', pipe activity:'*'
```
> StartDownload(GET, url, headers, {})
< Response(0, fd 8)
* {data, 1024b}
< HeadersBecameAvailable(0, response_headers, 200)
< DownloadProgress(0, 4K, 1024)
* {data, 1024b}
* {data, 1024b}
< DownloadProgress(0, 4K, 2048)
* {data, 1024b}
< DownloadProgress(0, 4K, 1024)
< DownloadFinished(0, true, 4K)
```
Since managing the received file descriptor is a pain, LibProtocol
implements `Download::stream_into(OutputStream)`, which can be used to
stream the download into any given output stream (be it a file, or
memory, or writing stuff with a delay, etc.).
Also, as some of the users of this API require all the downloaded data
upfront, LibProtocol also implements `set_should_buffer_all_input()`,
which causes the download instance to buffer all the data until the
download is complete, and to call the `on_buffered_download_finish`
hook.