The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.
The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using are home-grown header infra from LibHTTP.
So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.
This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
An upcoming commit will migrate the contents of Headers.h/cpp to LibHTTP
for use outside of LibWeb. These CORS and MIME helpers depend on other
LibWeb facilities, however, so they cannot be moved.
Fixes a regression from commit:
f675cfe90f
It is not sufficient to only check if the builder is empty, as we will
then drop empty header values (when the first found value is empty).
This is tested in WPT by /cors/origin.htm, but that requires an HTTP
server.
The spec declares these as a byte sequence, which we then implemented as
a ByteBuffer. This has become pretty awkward to deal with, as evidenced
by the plethora of `MUST(ByteBuffer::copy(...))` and `.bytes()` calls
everywhere inside Fetch. We would then treat the bytes as a string
anyways by wrapping them in StringView everywhere.
We now store these as a ByteString. This is more comfortable to deal
with, and we no longer need to continually copy underlying storage (as
ByteString is ref-counted).
This work is largely preparatory for an upcoming HTTP header refactor.
Generally just define things in the order they are declared (will make a
change to use ByteString in this file a bit easier to follow). Also make
a couple of free functions be class methods on Header / HeaderList.
Disallow calling `StringBase::bytes()` on temporaries to avoid returning
`ReadonlyBytes` that outlive the underlying string.
With this change, we catch a real UAF:
`load_result.data = maybe_response.release_value().bytes();`
All other updated call sites were already safe, they just needed to use
an intermediate named variable to satisfy the new lvalue-only
requirement.
HTMLLinkElement is the final user of Resource/ResourceClient (used for
preloads and icons). This ports these link types to use fetch according
to the spec.
Preloads were particularly goofy because they would be stored in the
ResourceLoader's ad-hoc cache. But this cache was never consulted for
organic loads, thus were never used. There is more work to be done to
use these preloads within fetch, but for now they at least are stored
in fetch's HTTP cache for re-use.
This first pass only applies to the following two cases:
- Public functions returning a view type into an object they own
- Public ctors storing a view type
This catches a grand total of one (1) issue, which is fixed in
the previous commit.
Note that it's not actually executing tasks in parallel, it's still
throwing them on the HTML event loop task queue, each with its own
unique task source.
This makes our fetch implementation a lot more robust when HTTP caching
is enabled, and you can now click links on https://terminal.shop/
without hitting TODO assertions in fetch.
Which has an optmization if both size of the string being passed
through are FlyStrings, which actually ends up being the case
in some places during selector matching comparing attribute names.
Instead of maintaining more overloads of
Infra::is_ascii_case_insensitive_match, switch
everything over to equals_ignoring_ascii_case instead.
We currently store Web::Fetch::Infrastructure::Response objects in the
HTTP cache. They are associated with their original realm, but when we
use a cached response, we clone it into the target realm. For example,
two <iframe> objects loading the same HTML will be in different realms.
When we clone the response, we must use the target realm throughout the
entire cloning process. We neglected to do this for the cloned response
body stream, which is cloned via teeing. The result was the the stream
for the "cloned" response was created in the original realm, causing
issues down the line when reading from that stream tried to handle read
promises on behalf of the original realm. There are protections in place
to prevent this from happening, and the cached response read would never
complete.
The main streams AO file has gotten very large, and is a bit difficult
to navigate. In an effort to improve DX, this migrates ReadableStream
AOs to their own file. And the helper classes used for the tee and pipe-
to operations are also in their own files.
This is very clearly a very dangerous API to have, and was causing
a crash on Linux as a result of a stack use-after-free when visiting
https://www.index.hr/.
Fixes#3901
This is required to store Content Security Policies, as their
Directives are implemented as subclasses with overridden virtual
functions. Thus, they cannot be stored as generic Directive classes, as
it'll lose the ability to call overridden functions when they are
copied.
This currently uses a non spec-compliant property on the Response
object, which represents the time that the Response was created.
Setting this value allows `Performance.timeOrigin` to return a
reasonable value.
isomorphic encoding a value that has already been encoded will
result in garbage data. `response_headers` is already encoded in
ISO-8859-1/latin1, we cannot use `from_string_pair`, as it triggers
ISO-8859-1/latin1 encoding.
Follow-up of https://github.com/LadybirdBrowser/ladybird/pull/1893
By actually using streams, they get marked as disturbed and the
`.bodyUsed` API starts to work. Fixes at least 94 subtests in the WPT
`fetch/api/request` test suite.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
The spec for filtered responses states:
Unless stated otherwise a filtered response’s associated concepts
(such as its body) refer to the associated concepts of its internal
response.
This includes setting its associated concepts. In particular, when the
filtered response's body is set upon fetching a request with integrity
metadata, we must set the internal response's body instead.
Further restrictions that apply to filtered response subclasses (such as
opaque filtered responses having a status code of 0) are already
implemented.