Let's not attempt to cache entries that are excessively large. We limit
the cache data size to be 1/8 of the total disk cache limit, with a cap
of 256 MiB.
Rather than letting our disk cache grow unbounded, let's impose a limit
on the estimated total disk cache size. The limits chosen are vaguely
inspired by Chromium.
We impose a total disk cache limit of 5 GiB. Chromium imposes an overall
limit of 1.25 GiB; I've chosen more here because we currently cache
uncompressed data from cURL.
The limit is further restricted by the amount of available disk space,
which we just check once at startup (as does Chromium). We will choose a
percentage of the free space available on systems with limited space.
Our eviction errs on the side of simplicity. We will remove the least
recently accessed entries until the total estimated cache size does not
exceed our limit. This could potentially be improved in the future. For
example, if the next entry to consider is 40 MiB, and we only need to
free 1 MiB of space, we could try evicting slightly more recently used
entries. This would prevent evicting more than we need to.
Apparently, sqlite will fill this placeholder value in with NULL if we
do not pass a value. The query being executed here is:
UPDATE CacheIndex
SET last_access_time = ?
WHERE cache_key = ? AND vary_key = ?;
HeaderList::delete involves a Vector::remove_all_matching internally.
So if an exempt header appeared again later in the header list, we would
be accessing the name string of the previously deleted header.
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.
Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.
There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
We need to store request headers in order to handle Vary mismatches.
(Note we should also be using BLOB for header storage in sqlite, as they
are not necessarily UTF-8.)
A future commit will format memory cache debug messages similarly to the
disk cache messages. To make it easy to read them both at a glance when
both debug flags are turned on, let's add a prefix to these messages.
We currently have two ongoing implementations of RFC 9111, HTTP caching.
In order to consolidate these, this patch moves the implementation from
RequestServer to LibHTTP for re-use within LibWeb.
2025-11-29 08:35:02 -05:00
Renamed from Services/RequestServer/Cache/CacheIndex.cpp (Browse further)