Commit graph

18 commits

Author SHA1 Message Date
Timothy Flynn
21bbbacd07 LibHTTP+RequestServer: Move the HTTP cache implementation to LibHTTP
We currently have two ongoing implementations of RFC 9111, HTTP caching.
In order to consolidate these, this patch moves the implementation from
RequestServer to LibHTTP for re-use within LibWeb.
2025-11-29 08:35:02 -05:00
Timothy Flynn
9375660b64 LibHTTP+LibWeb+RequestServer: Move Fetch's HTTP header infra to LibHTTP
The end goal here is for LibHTTP to be the home of our RFC 9111 (HTTP
caching) implementation. We currently have one implementation in LibWeb
for our in-memory cache and another in RequestServer for our disk cache.

The implementations both largely revolve around interacting with HTTP
headers. But in LibWeb, we are using Fetch's header infra, and in RS we
are using our home-grown header infra from LibHTTP.

So to give these a common denominator, this patch replaces the LibHTTP
implementation with Fetch's infra. Our existing LibHTTP implementation
was not particularly compliant with any spec, so this at least gives us
a standards-based common implementation.

This migration also required moving a handful of other Fetch AOs over
to LibHTTP. (It turns out these AOs were all from the Fetch/Infra/HTTP
folder, so perhaps it makes sense for LibHTTP to be the implementation
of that entire set of facilities.)
2025-11-27 14:57:29 +01:00
Timothy Flynn
4470f94129 RequestServer: Store a couple of data hashes in disk cache entries
We currently have a FIXME to validate cached data with a crc32. But this
is sort of a non-starter, because we never actually have the cached data
in memory: we transfer it to the WebContent process via system calls,
and it never reaches userspace in RequestServer.

Chrome makes a bit of an educated gamble here. They assume cosmic bit
blips are extremely unlikely, thus the cached data does not get verified
with a hash. Instead, they store non-cryptographic hashes of some select
fields, and they validate just those hashes.

Here, we store a hash of the cache key in the cache header, and a hash
of the cache header in the cache footer. With these validations, along
with other validations already in place, we can be reasonably sure we
are not sending corrupt data to the WebContent process.
2025-11-21 08:48:42 +01:00
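A minimal standalone sketch of the header/footer hash scheme described above. The struct layouts and field names are hypothetical, and FNV-1a is an assumed choice of non-cryptographic hash; the commit does not say which hash RequestServer uses or what else the header and footer contain.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical layouts, for illustration only.
struct CacheHeader {
    std::uint32_t key_hash { 0 };
    std::uint32_t status_code { 0 };
};

struct CacheFooter {
    std::uint32_t header_hash { 0 };
    std::uint64_t data_size { 0 };
};

constexpr std::uint32_t fnv1a_offset_basis = 2166136261u;
constexpr std::uint32_t fnv1a_prime = 16777619u;

// FNV-1a over a byte string (assumed hash; any cheap non-cryptographic hash would do).
std::uint32_t fnv1a(std::string_view bytes)
{
    std::uint32_t hash = fnv1a_offset_basis;
    for (unsigned char byte : bytes) {
        hash ^= byte;
        hash *= fnv1a_prime;
    }
    return hash;
}

// Feeds the 4 bytes of `value` into an in-progress FNV-1a hash, least significant byte first.
std::uint32_t fnv1a_mix(std::uint32_t hash, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i) {
        hash ^= (value >> (i * 8)) & 0xffu;
        hash *= fnv1a_prime;
    }
    return hash;
}

// Hashes the header field by field, so struct padding never affects the result.
std::uint32_t hash_header(CacheHeader const& header)
{
    std::uint32_t hash = fnv1a_offset_basis;
    hash = fnv1a_mix(hash, header.key_hash);
    hash = fnv1a_mix(hash, header.status_code);
    return hash;
}

// An entry is trusted only if the header matches the key and the footer matches the header.
bool entry_is_valid(std::string_view cache_key, CacheHeader const& header, CacheFooter const& footer)
{
    return header.key_hash == fnv1a(cache_key) && footer.header_hash == hash_header(header);
}
```

The body bytes are never hashed, matching the trade-off described above: only the small, userspace-visible metadata is verified.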
Timothy Flynn
4de3f77d37 RequestServer: Add a hook to advance a request's clock time for testing
For example, we will want to be able to test that a cached object was
expired after N seconds. Rather than waiting that time during testing,
this adds a testing-only request header to internally advance the clock
for a single HTTP request.
2025-11-20 09:33:49 +01:00
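A sketch of how such a hook might compute the effective "now" for one request. The header name `X-Test-Advance-Clock`, the whole-seconds value format, and the plain `std::map` header storage are all hypothetical; the commit only says that a testing-only request header advances the clock for a single request.

```cpp
#include <charconv>
#include <chrono>
#include <map>
#include <string>
#include <system_error>

using Headers = std::map<std::string, std::string>;
using Clock = std::chrono::system_clock;

// Hypothetical header name. Real HTTP header names are case-insensitive;
// this sketch does an exact-match lookup for brevity.
const std::string advance_clock_header = "X-Test-Advance-Clock";

// Returns the time this request should treat as "now": the real time, shifted
// forward by the header's value in seconds when the testing header is present.
Clock::time_point effective_request_time(Headers const& request_headers, Clock::time_point now)
{
    auto it = request_headers.find(advance_clock_header);
    if (it == request_headers.end())
        return now;

    std::string const& value = it->second;
    long long offset_seconds = 0;
    auto [end, error] = std::from_chars(value.data(), value.data() + value.size(), offset_seconds);
    if (error != std::errc {} || end != value.data() + value.size())
        return now; // Ignore malformed values instead of failing the request.

    return now + std::chrono::seconds(offset_seconds);
}
```

Because the offset applies only to the request carrying the header, other requests in the same test keep seeing the real clock.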
Timothy Flynn
0020af37cd RequestServer: Do not pack the disk cache header and footer structures
This is causing misaligned reads with Address Sanitizer enabled. We
could maintain the packed attribute and deal with alignment, but we
ended up not actually needing these to be packed anyway. The only thing
we need to know is the serialized size of the header, which we can
determine another way.
2025-11-20 09:33:49 +01:00
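One way to get the serialized size without packing is to serialize field by field and sum the field sizes, copying with `std::memcpy` so no read ever goes through a misaligned pointer. The field list below is hypothetical (the commit does not list the header's members), and the sketch writes host byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header fields. Without a packed attribute the compiler may insert
// padding (e.g. before the 8-byte field), so sizeof() can exceed the bytes we write.
struct CacheHeader {
    std::uint32_t version { 0 };
    std::uint64_t key_hash { 0 };
    std::uint32_t status_code { 0 };
};

// The on-disk size is the sum of the field sizes, independent of in-memory padding.
constexpr std::size_t serialized_header_size
    = sizeof(CacheHeader::version) + sizeof(CacheHeader::key_hash) + sizeof(CacheHeader::status_code);

using SerializedHeader = std::array<unsigned char, serialized_header_size>;

SerializedHeader serialize(CacheHeader const& header)
{
    SerializedHeader bytes {};
    std::size_t offset = 0;
    auto write = [&](auto const& field) {
        std::memcpy(bytes.data() + offset, &field, sizeof(field));
        offset += sizeof(field);
    };
    write(header.version);
    write(header.key_hash);
    write(header.status_code);
    return bytes;
}

CacheHeader deserialize(SerializedHeader const& bytes)
{
    CacheHeader header;
    std::size_t offset = 0;
    // Copying into naturally aligned fields avoids the misaligned reads a packed struct can cause.
    auto read = [&](auto& field) {
        std::memcpy(&field, bytes.data() + offset, sizeof(field));
        offset += sizeof(field);
    };
    read(header.version);
    read(header.key_hash);
    read(header.status_code);
    return header;
}
```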
Andreas Kling
837d5fb7ea RequestServer: Add heuristic cacheability and freshness per RFC 9111
This commit extends is_cacheable() to allow storage of responses that
rely on heuristic cacheability, including status codes defined as
heuristically cacheable.

We implement heuristic freshness lifetime calculation based on the
Last-Modified header as guidance, and apply it when no explicit
expiration information is present.
2025-11-15 10:07:49 -05:00
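A sketch of both pieces. The status list is the set RFC 9110 (Section 15.1) defines as heuristically cacheable by default, and the 10% fraction is the "typical setting" RFC 9111 (Section 4.2.2) mentions; the commit does not say which fraction RequestServer actually applies, so treat that constant as an assumption.

```cpp
#include <chrono>

using Seconds = std::chrono::seconds;
using TimePoint = std::chrono::time_point<std::chrono::system_clock, Seconds>;

// Status codes that RFC 9110 defines as heuristically cacheable by default.
bool is_heuristically_cacheable_status(int status)
{
    switch (status) {
    case 200: case 203: case 204: case 206:
    case 300: case 301: case 308:
    case 404: case 405: case 410: case 414:
    case 501:
        return true;
    default:
        return false;
    }
}

// Heuristic freshness lifetime: an assumed 10% of the interval between the
// Last-Modified value and the response's Date. Zero if Last-Modified is not
// earlier than Date.
Seconds heuristic_freshness_lifetime(TimePoint date, TimePoint last_modified)
{
    if (last_modified >= date)
        return Seconds { 0 };
    return (date - last_modified) / 10;
}
```

A resource last modified 1000 seconds before its Date would thus be considered fresh for 100 seconds, provided no explicit expiration information is present.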
Timothy Flynn
3f61f0f189 RequestServer: Add a time parameter to the clear cache endpoint
This allows removing cache entries last accessed since a provided
timestamp.
2025-11-12 09:06:21 -05:00
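A sketch of the filter this implies, reading "last accessed since" as "last accessed at or after the timestamp". The map-based index is a hypothetical in-memory stand-in for the SQL cache index; the real endpoint also has to delete the corresponding cache files.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

using TimePoint = std::chrono::system_clock::time_point;

// Hypothetical stand-in for the cache index: cache key -> last access time.
using CacheIndex = std::map<std::string, TimePoint>;

// Removes every entry last accessed at or after `since` and returns how many were removed.
std::size_t clear_cache_accessed_since(CacheIndex& index, TimePoint since)
{
    std::size_t removed = 0;
    for (auto it = index.begin(); it != index.end();) {
        if (it->second >= since) {
            it = index.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Passing the clock's epoch as `since` would clear every entry.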
Timothy Flynn
a4e3890c05 RequestServer: Implement stale cache revalidation
When a cached response becomes stale, we will now issue a revalidation request
(if the response indicates it may be revalidated). We do this by issuing
a normal fetch request, with If-None-Match and/or If-Modified-Since
request headers.

If the server replies with an HTTP 304 status, we update the stored
response headers to match the 304's headers, and serve the response to
the client from the cache.

If the server replies with any other code, we remove the cache entry.
We will open a new cache entry to cache the new response, if possible.
2025-11-02 13:03:29 -05:00
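A simplified sketch of the two halves of revalidation. It assumes headers live in a case-sensitive `std::map` (real HTTP header names are case-insensitive), and it copies every 304 header over the stored ones; RFC 9111 (Section 3.2) lists exceptions to that update, such as Content-Length, which this sketch ignores. The function and enum names are hypothetical.

```cpp
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// Builds the conditional request headers from the stored response's validators.
Headers revalidation_headers(Headers const& stored_response_headers)
{
    Headers request_headers;
    if (auto it = stored_response_headers.find("ETag"); it != stored_response_headers.end())
        request_headers["If-None-Match"] = it->second;
    if (auto it = stored_response_headers.find("Last-Modified"); it != stored_response_headers.end())
        request_headers["If-Modified-Since"] = it->second;
    return request_headers;
}

enum class RevalidationOutcome { ServeFromCache, RemoveEntry };

// On 304, refresh the stored headers and serve from cache; otherwise drop the entry.
RevalidationOutcome handle_revalidation_response(int status, Headers const& response_headers, Headers& stored_response_headers)
{
    if (status != 304)
        return RevalidationOutcome::RemoveEntry;
    for (auto const& [name, value] : response_headers)
        stored_response_headers[name] = value;
    return RevalidationOutcome::ServeFromCache;
}
```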
Timothy Flynn
3d45a209b6 RequestServer: Rename CacheEntryReader::m_headers to m_response_headers
Let's be extra clear what we're talking about here.
2025-11-02 13:03:29 -05:00
Timothy Flynn
20cd19be4d RequestServer: Store HTTP response headers in the cache index
We currently store response headers in the cache entry file, before the
response body. When we implement cache revalidation, we will need to
update the stored response headers with whatever headers are received
in a 304 response. It's not unlikely that those headers will have a size
that differs from the stored headers. We would then have to rewrite the
entire response body after the new headers.

Instead of dealing with those inefficiencies, let's instead store the
response headers in the cache index. This will allow us to update the
headers with a simple SQL query.
2025-11-02 13:03:29 -05:00
Timothy Flynn
bf7c5cdf07 RequestServer: Create a cache metadata table
This currently just holds a cache version. If we make a breaking change
to the cache format, we can increment the version here to wipe the cache
database. This is certainly a blunt hammer approach; we will want to
handle such changes more gracefully when we can.
2025-11-02 13:03:29 -05:00
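The startup check this enables could look like the sketch below. The version constant and the handling of a missing metadata row are assumptions; the commit only says that incrementing the version wipes the cache database.

```cpp
#include <optional>

// Hypothetical value; bump it whenever the cache format changes incompatibly.
constexpr unsigned current_cache_version = 1;

enum class CacheStartupAction { KeepExistingCache, WipeCache };

// `stored_version` is the value read from the metadata table, or nullopt if no row exists.
CacheStartupAction startup_action(std::optional<unsigned> stored_version)
{
    // Treat a missing row like a mismatch: wiping an empty or pre-versioning cache is harmless.
    if (!stored_version.has_value() || *stored_version != current_cache_version)
        return CacheStartupAction::WipeCache;
    return CacheStartupAction::KeepExistingCache;
}
```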
Timothy Flynn
5384f84550 RequestServer: Create disk cache writers for new requests immediately
We previously waited until we received all response headers before we
would create the cache entry. We now create one immediately, and handle
writing the headers in its own function. This will allow us to know if
a cache entry writer already exists for a given cache key, and thus
prevent creating a second writer at the same time.
2025-10-28 11:52:51 +01:00
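A sketch of why creating the writer up front helps: if a writer claims its cache key as soon as it exists, a second request for the same key can see that a write is already in flight. `DiskCache`, `Writer`, and `create_writer` are hypothetical names, not RequestServer's API.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

class DiskCache {
public:
    // Holds the "writer in flight" claim on a cache key for as long as it is alive.
    class Writer {
    public:
        Writer(DiskCache& cache, std::string key)
            : m_cache(cache)
            , m_key(std::move(key))
        {
        }
        ~Writer() { m_cache.m_active_writers.erase(m_key); }

        Writer(Writer const&) = delete;
        Writer& operator=(Writer const&) = delete;

    private:
        DiskCache& m_cache;
        std::string m_key;
    };

    // Returns nullptr if another writer already exists for `cache_key`.
    std::unique_ptr<Writer> create_writer(std::string const& cache_key)
    {
        if (!m_active_writers.insert(cache_key).second)
            return nullptr;
        return std::make_unique<Writer>(*this, cache_key);
    }

private:
    std::unordered_set<std::string> m_active_writers;
};
```

Destroying the writer releases the key, so a later request can cache the response again.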
Timothy Flynn
d67dc23960 RequestServer: Fix typo in CacheEntry::close_and_destroy_cache_entry
2025-10-28 11:52:51 +01:00
Timothy Flynn
6cf22c424e RequestServer: Remove extra verbose disk cache log entry
This isn't particularly useful anymore, and is especially verbose for
large responses.
2025-10-28 11:52:51 +01:00
Timothy Flynn
dc10c28b57 RequestServer: Remove erroneous placeholders from dbgln statements
Apparently Services aren't compiled with ENABLE_COMPILETIME_FORMAT_CHECK.
2025-10-28 11:52:51 +01:00
Timothy Flynn
fc9233f198 RequestServer: Delete unreadable cache files (for now)
If we are unable to pipe the response body from a cache file to the
client, let's take the extra safe approach of deleting the cache file
for now. We already remove the file if we weren't able to read its
metadata during initialization.
2025-10-16 09:06:48 -04:00
Timothy Flynn
163e8e5b44 LibWebView+RequestServer: Support clearing the HTTP disk cache
This is a bit of a blunt hammer, but this hooks an action to clear the
HTTP disk cache into the existing Clear Cache action. Upon invocation,
it stops all existing cache entries from making further progress, and
then deletes the entire cache index and all cache files.

In the future, we will of course want more fine-grained control over
cache deletion, e.g. via an about:history page.
2025-10-14 13:40:33 +02:00
Timothy Flynn
3516a2344f LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).

The cache is broken into 2 categories of files:

1. An index file. This is a SQL database containing metadata about each
   cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
   amalgamation of all info needed to reconstruct an HTTP response. This
   includes the status code, headers, body, etc.

A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps fail along the way, the cache entry is removed and the index is
left untouched.

Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.

Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-14 13:40:33 +02:00
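RFC 9111 (Section 4.2) treats a stored response as fresh while its freshness lifetime exceeds its current age, with the age computed as in Section 4.2.3. A sketch of that computation, with inputs named after the RFC's terms; how RequestServer stores these timestamps, and how it derives the freshness lifetime, is not described in the commit.

```cpp
#include <algorithm>
#include <chrono>

using Seconds = std::chrono::seconds;
using TimePoint = std::chrono::time_point<std::chrono::system_clock, Seconds>;

// Inputs named after RFC 9111 Section 4.2.3.
struct StoredResponseTimes {
    TimePoint date_value;    // Date header of the stored response
    Seconds age_value;       // Age header, or 0 if absent
    TimePoint request_time;  // when the request was sent
    TimePoint response_time; // when the response was received
};

Seconds current_age(StoredResponseTimes const& t, TimePoint now)
{
    Seconds apparent_age = std::max(Seconds { 0 }, t.response_time - t.date_value);
    Seconds response_delay = t.response_time - t.request_time;
    Seconds corrected_age_value = t.age_value + response_delay;
    Seconds corrected_initial_age = std::max(apparent_age, corrected_age_value);
    Seconds resident_time = now - t.response_time;
    return corrected_initial_age + resident_time;
}

// A stale result means the entry is removed and a network request is made instead.
bool is_fresh(Seconds freshness_lifetime, StoredResponseTimes const& t, TimePoint now)
{
    return freshness_lifetime > current_age(t, now);
}
```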