Store JavaScript bytecode side data in the WebContent HTTP memory
cache and replay it when serving cached responses. Also update an
already-complete memory-cache entry when asynchronous bytecode cache
generation finishes, so the first source-only response does not keep
shadowing the disk-cache sidecar during same-process navigations.
Keep the HTTP memory-cache backfill keyed with the request headers that
populated the memory-cache entry, so Vary responses still receive their
generated bytecode sidecar.
Add LibHTTP coverage for round-tripping bytecode side data through a
memory-cache entry, attaching it after the response body has already
been cached, and matching Vary headers during updates. Add LibWeb
coverage for preserving the memory-cache request headers when cloning
responses.
This exposes the operation through RequestServer.ipc and RequestClient
so the client can create the stub before accessing a content-keyed
side-data file.
Write cache entries to a temporary file and rename them over the final
path only after the entry has been fully flushed. This keeps previously
mapped cache bodies tied to their original inode instead of exposing
them to truncation when the same cache key is replaced.
Clean up only the temporary file for incomplete writers so a failed
replacement does not remove the still-valid existing cache entry. Add
coverage that replaces an entry while an old body mapping remains live
and verifies that it still exposes the original bytes.
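The write-then-rename scheme can be sketched as below. This is a minimal illustration assuming POSIX rename() semantics and std::filesystem. It is not Ladybird's CacheEntryWriter, and the helper names and the ".tmp" suffix are hypothetical. A reader that opened the old file before the replacement keeps reading the old inode.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <string_view>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: writes `body` to a sibling temporary file, then renames
// it over `final_path`. rename() swaps the directory entry instead of
// truncating the old file, so existing readers keep the original bytes.
bool write_entry_atomically(fs::path const& final_path, std::string_view body)
{
    fs::path temp_path = final_path;
    temp_path += ".tmp";

    {
        std::ofstream out(temp_path, std::ios::binary | std::ios::trunc);
        out.write(body.data(), static_cast<std::streamsize>(body.size()));
        out.flush(); // A real implementation would also fsync() here.
        if (!out) {
            // Clean up only the temporary file; the existing entry stays valid.
            std::error_code ignored;
            fs::remove(temp_path, ignored);
            return false;
        }
    }

    std::error_code error;
    fs::rename(temp_path, final_path, error);
    if (error) {
        std::error_code ignored;
        fs::remove(temp_path, ignored);
        return false;
    }
    return true;
}

// Reads an entire stream into a string (used to compare bodies).
std::string read_all(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```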
Map JavaScript bytecode cache sidecars from the HTTP disk cache instead
of copying them into anonymous shared buffers while handing response
headers to WebContent. Store the mapped data as ImmutableBytes on the
fetch response so script fetching can decode directly from the mapped
sidecar bytes.
Add LibHTTP coverage for retrieving associated cache data as a mappable
file, alongside the existing byte-buffer retrieval API.
Return the body length stored on CacheEntryReader instead of using the
footer field inherited from CacheEntry. Readers do not populate the
footer until validation, but RequestServer needs the size before then to
choose between pipe transfer and file-backed mapping.
Let CacheEntryWriter expose an explicit body-file handoff after a
successful flush. RequestServer sends that file to WebContent before the
request finishes, so Fetch can replace its retained memory-cache buffer
with a file-backed ImmutableBytes body.
Keep the plain flush path fd-free for existing callers, and add coverage
for mapping the body file returned by the writer. The focused disk and
memory cache web tests continue to pass.
Carry response chunks through LibRequests and ResourceLoader as a
ResponseData wrapper. A disk cache hit can retain its mapped storage,
while ordinary streamed chunks still use borrowed bytes.
Store completed memory cache entries as Core::ImmutableBytes. This lets
mapped disk-cache responses stay file-backed while retained by the HTTP
memory cache, instead of forcing another ByteBuffer copy.
Expose validated disk cache entry bodies as fd-backed byte ranges. Use
this for cache hits instead of creating a response socket.
Map received body ranges into ImmutableBytes on the client side. Keep
the streamed path for network responses. Adapt buffered callbacks to
receive ImmutableBytes, so cached responses can stay file-backed.
RequestServer, LibRequests, and LibWebView build with the new IPC
message and callback type.
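The following sketch shows one way an fd-backed byte range could be exposed as immutable, file-backed bytes. It assumes POSIX mmap(), and the MappedBytes class is a hypothetical stand-in for Core::ImmutableBytes, not its real API. Because mmap() offsets must be page-aligned, the sketch maps from the enclosing page and offsets into it. It also rejects ranges past EOF, since touching those pages would raise SIGBUS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <utility>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only mapping of [offset, offset + length) within `fd`.
class MappedBytes {
public:
    static std::optional<MappedBytes> map(int fd, off_t offset, std::size_t length)
    {
        struct stat st {};
        if (offset < 0 || fstat(fd, &st) != 0)
            return std::nullopt;
        // Reject ranges extending past EOF.
        if (offset > st.st_size || length > static_cast<std::size_t>(st.st_size - offset))
            return std::nullopt;
        if (length == 0)
            return MappedBytes {}; // mmap() rejects zero-length mappings.

        // Map from the page containing `offset`, then skip the leading bytes.
        off_t page_size = static_cast<off_t>(sysconf(_SC_PAGESIZE));
        off_t aligned_offset = offset - (offset % page_size);
        std::size_t delta = static_cast<std::size_t>(offset - aligned_offset);

        void* base = mmap(nullptr, length + delta, PROT_READ, MAP_PRIVATE, fd, aligned_offset);
        if (base == MAP_FAILED)
            return std::nullopt;
        return MappedBytes(base, length + delta, static_cast<std::byte const*>(base) + delta, length);
    }

    MappedBytes() = default;
    MappedBytes(MappedBytes const&) = delete;
    MappedBytes& operator=(MappedBytes const&) = delete;
    MappedBytes(MappedBytes&& other) noexcept { swap(other); }
    MappedBytes& operator=(MappedBytes&& other) noexcept
    {
        MappedBytes moved(std::move(other));
        swap(moved);
        return *this;
    }
    ~MappedBytes()
    {
        if (m_base)
            munmap(m_base, m_mapped_length);
    }

    // The mapping stays valid after the fd is closed or the file is unlinked.
    std::byte const* data() const { return m_data; }
    std::size_t size() const { return m_length; }

private:
    MappedBytes(void* base, std::size_t mapped_length, std::byte const* data, std::size_t length)
        : m_base(base), m_mapped_length(mapped_length), m_data(data), m_length(length) { }

    void swap(MappedBytes& other) noexcept
    {
        std::swap(m_base, other.m_base);
        std::swap(m_mapped_length, other.m_mapped_length);
        std::swap(m_data, other.m_data);
        std::swap(m_length, other.m_length);
    }

    void* m_base { nullptr };
    std::size_t m_mapped_length { 0 };
    std::byte const* m_data { nullptr };
    std::size_t m_length { 0 };
};
```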
Attach cached JavaScript bytecode sidecars to HTTP response headers so
WebContent can materialize classic and module scripts directly from a
decoded cache blob on cache hits.
Carry the disk cache vary key with the sidecar and reuse it when storing
fresh bytecode, avoiding mismatches against the augmented network
request headers used to create the cache entry.
Keep CORS-filtered module responses intact for status, MIME, and script
creation checks. Read bytecode sidecar data only from the internal
response, and treat decode or materialization failure as a cache miss
that falls back to normal source compilation.
Add disk cache helpers that store and retrieve sidecar payloads with
the same cache key and vary key as the HTTP response entry. The first
consumer is JavaScript bytecode.
Delete sidecars when the owning entry is removed, evicted, or replaced
by a fresh response. Count their on-disk size toward the cache budget so
bytecode data cannot grow outside eviction accounting.
Disk cache tests cover sidecar round-trips, replacement cleanup, and
eviction size accounting.
remove_entries_exceeding_cache_limit() is called after every network
response, but the cache is usually still under budget and nothing needs
to evict. Every one of those calls currently still runs the
window-function eviction SQL over the whole CacheIndex table just to
conclude there is nothing to do.
Short-circuit the call when the cache is already within its configured
size limit. To make that check cheap, maintain m_total_estimated_size
as a running total of the cache's estimated byte size, so the no-op
case becomes a single u64 compare and the DB is only touched when
there is real work.
Bookkeeping:
- Seed the total in CacheIndex::create() via a new
select_total_estimated_size statement (COALESCE(..., 0) so an empty
index returns 0 rather than NULL).
- Each Entry caches serialized_request_headers_size and
serialized_response_headers_size so we don't re-serialize to
recompute its footprint; Entry::estimated_size() centralizes the
arithmetic.
- create_entry() adds the new entry's size. Any row it displaces is
removed via DELETE ... RETURNING so the total stays accurate even
for entries that were never loaded into m_entries.
- remove_entry() and the bulk DELETE statements were extended with
the same RETURNING clause for the same reason.
- update_response_headers() shifts the total by the signed delta
between old and new serialized header size.
Also COALESCEs estimate_cache_size_accessed_since over an empty table
to 0 so callers don't have to special-case NULL.
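The running-total bookkeeping and the early return can be modeled in memory as below. This is a sketch only: CacheIndexModel and its eviction loop are hypothetical, the real rows live in SQLite, and eviction here runs in insertion order purely for brevity. The point is that the common under-budget case costs a single u64 compare, and header updates shift the total by a signed delta.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using u64 = std::uint64_t;
using i64 = std::int64_t;

struct Entry {
    std::string cache_key;
    u64 data_size { 0 };
    u64 serialized_request_headers_size { 0 };
    u64 serialized_response_headers_size { 0 };

    // Centralizes the footprint arithmetic, without re-serializing headers.
    u64 estimated_size() const
    {
        return data_size + serialized_request_headers_size + serialized_response_headers_size;
    }
};

// Hypothetical in-memory model of the cache index bookkeeping.
class CacheIndexModel {
public:
    explicit CacheIndexModel(u64 limit)
        : m_limit(limit) { }

    void create_entry(Entry entry)
    {
        m_total_estimated_size += entry.estimated_size();
        m_entries.push_back(std::move(entry));
    }

    void update_response_headers(std::size_t index, u64 new_headers_size)
    {
        auto& entry = m_entries[index];
        // Shift the total by the signed delta between old and new header sizes.
        i64 delta = static_cast<i64>(new_headers_size) - static_cast<i64>(entry.serialized_response_headers_size);
        m_total_estimated_size = static_cast<u64>(static_cast<i64>(m_total_estimated_size) + delta);
        entry.serialized_response_headers_size = new_headers_size;
    }

    std::size_t remove_entries_exceeding_cache_limit()
    {
        // Common case: already within budget, so skip the expensive pass.
        if (m_total_estimated_size <= m_limit)
            return 0;

        ++m_eviction_passes; // Stands in for running the eviction SQL.
        std::size_t removed = 0;
        while (m_total_estimated_size > m_limit && !m_entries.empty()) {
            m_total_estimated_size -= m_entries.front().estimated_size();
            m_entries.erase(m_entries.begin());
            ++removed;
        }
        return removed;
    }

    u64 total_estimated_size() const { return m_total_estimated_size; }
    std::size_t eviction_passes() const { return m_eviction_passes; }

private:
    u64 m_limit { 0 };
    u64 m_total_estimated_size { 0 };
    std::vector<Entry> m_entries;
    std::size_t m_eviction_passes { 0 };
};
```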
create_entry() issues INSERT OR REPLACE in SQL, so the on-disk row is
correctly overwritten when a (cache_key, vary_key) pair is re-inserted.
But the in-memory m_entries vector was only appended to, leaving the
stale Entry alongside the new one. Subsequent find_entry() calls could
then return the old metadata even though the DB had moved on.
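The in-memory side of the fix can be sketched as "replace if present, else append", mirroring INSERT OR REPLACE. The Entry fields and the free functions here are hypothetical simplifications, not Ladybird's CacheIndex API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    std::string cache_key;
    std::string vary_key;
    std::uint64_t data_size { 0 };
};

// Overwrite a matching (cache_key, vary_key) entry in place instead of
// appending a second, stale copy alongside it.
void create_entry(std::vector<Entry>& entries, Entry entry)
{
    auto it = std::find_if(entries.begin(), entries.end(), [&](Entry const& existing) {
        return existing.cache_key == entry.cache_key && existing.vary_key == entry.vary_key;
    });
    if (it != entries.end())
        *it = std::move(entry);
    else
        entries.push_back(std::move(entry));
}

Entry const* find_entry(std::vector<Entry> const& entries, std::string const& cache_key, std::string const& vary_key)
{
    for (auto const& entry : entries) {
        if (entry.cache_key == cache_key && entry.vary_key == vary_key)
            return &entry;
    }
    return nullptr;
}
```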
This just lets us create fewer cache directories during WPT. We do still
create cache entries on disk, so for WPT, we introduce an extra cache
key to prevent conflicts. There is an existing FIXME about this.
This implements the must-understand response cache directive per RFC
9111 Section 5.2.2.3. When a response contains must-understand, this
cache now ignores the no-store directive for status codes whose
caching behavior it implements. For status codes the cache does not
understand, the response is not stored.
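The storage decision reduces to a small predicate, sketched below under stated assumptions. The set of "understood" status codes is hypothetical and depends on what a given cache implements. Other storability checks from RFC 9111 are deliberately left out.

```cpp
#include <cassert>
#include <set>

// Hypothetical set of status codes whose caching behavior this cache implements.
bool cache_understands_status(int status)
{
    static std::set<int> const understood { 200, 203, 204, 300, 301, 308, 404, 405, 410, 414, 501 };
    return understood.count(status) != 0;
}

// RFC 9111 Section 5.2.2.3: with must-understand, no-store is ignored for
// understood status codes, and responses with any other status are not stored.
bool may_store_response(int status, bool has_no_store, bool has_must_understand)
{
    if (has_must_understand)
        return cache_understands_status(status);
    return !has_no_store;
}
```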
This adds a settings box to about:settings to allow users to limit the
disk cache size. This will override the default 5 GiB limit. We do not
automatically delete cache data if the new limit is suddenly less than
the used disk space; this will happen on the next request. This allows
multiple changes to the settings in a row without thrashing the cache.
In the future, we can add more toggles, such as disabling the disk
cache altogether.
Let's not attempt to cache entries that are excessively large. We limit
the size of a single entry's cached data to 1/8 of the total disk cache
limit, capped at 256 MiB.
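As a worked example of the per-entry cap: with the default 5 GiB limit, 1/8 is 640 MiB, so the 256 MiB cap applies. With a 1 GiB limit, the cap is 128 MiB. The function names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024 * 1024;
constexpr std::uint64_t GiB = 1024 * MiB;

// Largest single entry worth caching: 1/8 of the total limit, capped at 256 MiB.
constexpr std::uint64_t max_entry_size(std::uint64_t total_disk_cache_limit)
{
    return std::min(total_disk_cache_limit / 8, 256 * MiB);
}

constexpr bool is_entry_too_large(std::uint64_t entry_size, std::uint64_t total_disk_cache_limit)
{
    return entry_size > max_entry_size(total_disk_cache_limit);
}
```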
Rather than letting our disk cache grow unbounded, let's impose a limit
on the estimated total disk cache size. The limits chosen are vaguely
inspired by Chromium.
We impose a total disk cache limit of 5 GiB. Chromium imposes an overall
limit of 1.25 GiB; I've chosen more here because we currently cache
uncompressed data from cURL.
The limit is further restricted by the amount of available disk space,
which we just check once at startup (as does Chromium). We will choose a
percentage of the free space available on systems with limited space.
Our eviction errs on the side of simplicity. We will remove the least
recently accessed entries until the total estimated cache size does not
exceed our limit. This could potentially be improved in the future. For
example, if the next entry to consider is 40 MiB, and we only need to
free 1 MiB of space, we could try evicting slightly more recently used
entries. This would prevent evicting more than we need to.
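The simple eviction policy can be sketched in memory as below. The CacheEntry struct and the function name are hypothetical, and the real cache drives this through SQLite rather than a vector. Note that with a limit of 45 and a total of 80, the 40-byte oldest entry is evicted whole even though only 35 bytes needed freeing, which is the overshoot described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct CacheEntry {
    std::string key;
    std::uint64_t estimated_size { 0 };
    std::int64_t last_access_time { 0 };
};

// Evicts least recently accessed entries until the total fits within `limit`.
// Returns the evicted keys, oldest first.
std::vector<std::string> evict_least_recently_accessed(std::vector<CacheEntry>& entries, std::uint64_t limit)
{
    std::uint64_t total = 0;
    for (auto const& entry : entries)
        total += entry.estimated_size;

    // Most recently accessed first, so the oldest entries end up at the back.
    std::sort(entries.begin(), entries.end(), [](auto const& a, auto const& b) {
        return a.last_access_time > b.last_access_time;
    });

    std::vector<std::string> evicted;
    while (total > limit && !entries.empty()) {
        total -= entries.back().estimated_size;
        evicted.push_back(std::move(entries.back().key));
        entries.pop_back();
    }
    return evicted;
}
```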
The caching RFC is quite strict about the format of date strings. If we
received a revalidation attribute with an invalid date string, we would
previously fail a runtime assertion. This was because to start a
revalidation request, we would simply check for the presence of any
revalidation header; but then when we issued the request, we would fail
to parse the header, and end up with all attributes being null.
We no longer parse the revalidation attributes at all. Whatever we
receive in the Last-Modified response header is what we will send in the
If-Modified-Since request header, verbatim. For better or worse, this is
how other browsers behave. So if the server sends us an invalid date
string, it can receive its own date format for revalidation.
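The verbatim copy amounts to treating the header value as an opaque string. The sketch below uses a std::map with exact-case keys as a simplification; the Headers alias and the function name are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// Copy Last-Modified into If-Modified-Since byte-for-byte. The value is never
// parsed as an HTTP-date, so an invalid date cannot trip an assertion.
void add_revalidation_headers(Headers const& cached_response_headers, Headers& revalidation_request_headers)
{
    if (auto it = cached_response_headers.find("Last-Modified"); it != cached_response_headers.end())
        revalidation_request_headers["If-Modified-Since"] = it->second;
}
```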
Apparently, SQLite will fill this placeholder value in with NULL if we
do not pass a value. The query being executed here is:
UPDATE CacheIndex
SET last_access_time = ?
WHERE cache_key = ? AND vary_key = ?;
Our previous implementation was a bit too tolerant of bad header values.
For example, extracting a "max-age" from a header value of "abmax-agecd"
would have incorrectly parsed successfully.
We now find exact (case-insensitive) directive matches. We also handle
quoted string values, which may contain important delimiters that we
would have previously split on.
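The stricter matching can be sketched as follows: split on commas that are outside quoted strings, then compare directive names exactly and case-insensitively. This is a hypothetical helper, not Ladybird's parser. It returns an empty string for bare directives and the unquoted value otherwise.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

static bool equals_ignoring_case(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

static std::string_view trim(std::string_view s)
{
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t'))
        s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t'))
        s.remove_suffix(1);
    return s;
}

// Split on commas, except those inside quoted strings.
static std::vector<std::string_view> split_directives(std::string_view header)
{
    std::vector<std::string_view> parts;
    bool in_quotes = false;
    std::size_t start = 0;
    for (std::size_t i = 0; i < header.size(); ++i) {
        char c = header[i];
        if (in_quotes && c == '\\') {
            ++i; // Skip the escaped character.
            continue;
        }
        if (c == '"') {
            in_quotes = !in_quotes;
        } else if (c == ',' && !in_quotes) {
            parts.push_back(trim(header.substr(start, i - start)));
            start = i + 1;
        }
    }
    parts.push_back(trim(header.substr(start)));
    return parts;
}

std::optional<std::string> find_directive(std::string_view header, std::string_view name)
{
    for (auto directive : split_directives(header)) {
        auto equals = directive.find('=');
        if (!equals_ignoring_case(trim(directive.substr(0, equals)), name))
            continue; // Exact name match only: "abmax-agecd" is not "max-age".
        if (equals == std::string_view::npos)
            return std::string {};
        auto value = trim(directive.substr(equals + 1));
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
            std::string unquoted;
            for (std::size_t i = 1; i + 1 < value.size(); ++i) {
                if (value[i] == '\\' && i + 2 < value.size())
                    ++i; // Keep the escaped character, drop the backslash.
                unquoted += value[i];
            }
            return unquoted;
        }
        return std::string(value);
    }
    return std::nullopt;
}
```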
If a request failed, or was stopped, do not attempt to write the cache
entry footer to disk. Note that at this point, the cache index entry will
not have been created, so this entry will not be used in the future. We do
still delete any partial file on disk.
This serves as a more general fix for the issue addressed in commit
9f2ac14521.
HeaderList::delete involves a Vector::remove_all_matching internally.
So if an exempt header appeared again later in the header list, we would
be accessing the name string of the previously deleted header.
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.
Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.
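Vary partitioning can be sketched as deriving a key from the request values of every header named in Vary, so two requests share an entry only when all of those values match. The key encoding, the lowercase header map, and the function names are hypothetical; Ladybird's actual vary key format may differ. Handling of `Vary: *` is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Request headers keyed by lowercase name (a simplification for this sketch).
using RequestHeaders = std::map<std::string, std::string>;

std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

std::string compute_vary_key(std::vector<std::string> const& vary_header_names, RequestHeaders const& request_headers)
{
    std::vector<std::string> names;
    for (auto const& name : vary_header_names)
        names.push_back(to_lower(name));
    // The order of names in Vary should not affect the key.
    std::sort(names.begin(), names.end());

    std::string key;
    for (auto const& name : names) {
        key += name;
        // Distinguish an absent header ("name") from an empty one ("name=").
        if (auto it = request_headers.find(name); it != request_headers.end()) {
            key += '=';
            key += it->second;
        }
        key += '\n';
    }
    return key;
}
```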
There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
We need to store request headers in order to handle Vary mismatches.
(Note we should also be using BLOB for header storage in sqlite, as they
are not necessarily UTF-8.)
If the cache mode is no-store, we must not interact with the cache at
all.
If the cache mode is reload, we must not use any cached response.
If the cache mode is only-if-cached or force-cache, we are permitted
to respond with stale cache responses.
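The rules above can be summarized as a small policy table, sketched below. The enum and struct are hypothetical. The no-cache mode (revalidate before use) is omitted because it is not discussed here. The write permissions follow the Fetch Standard: reload still stores the fresh response, and only-if-cached never goes to the network.

```cpp
#include <cassert>

enum class CacheMode { Default, NoStore, Reload, ForceCache, OnlyIfCached };

struct CachePolicy {
    bool may_read_from_cache;
    bool may_write_to_cache;
    bool may_use_stale_response;
};

CachePolicy policy_for(CacheMode mode)
{
    switch (mode) {
    case CacheMode::NoStore:
        return { false, false, false }; // Do not interact with the cache at all.
    case CacheMode::Reload:
        return { false, true, false }; // Never use a cached response.
    case CacheMode::ForceCache:
        return { true, true, true }; // Stale responses are acceptable.
    case CacheMode::OnlyIfCached:
        return { true, false, true }; // Stale is acceptable; no network fetch.
    case CacheMode::Default:
        break;
    }
    return { true, true, false };
}
```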
Note that we currently cannot test only-if-cached in test-web. Setting
this cache mode also requires setting the request mode to same-origin, but
our http-test-server infra requires setting the request mode to cors.
Once a cache entry is not fresh, we now remove it from the memory cache.
We will avoid handling revalidation from within WebContent. Instead, we
will just forward the request to RequestServer, where the disk cache
will handle revalidation for itself if needed.
We currently disable the disk cache because the WPT runner will run more
than one RequestServer process at a time. The SQLite database does not
handle this concurrent read/write access well.
We will now enable the disk cache with a per-process database. This is
needed to ensure that WPT Fetch cache tests are sufficiently handled by
RequestServer.
We currently set the response time to when the cache entry writer is
created. This is more or less the same as the request start time, so
this is not correct.
This was a regression from 5384f84550.
That commit changed when the writer was created, but did not move the
setting of the response time to match.
We now set the response time to when the HTTP response headers have been
received (again), which matches how Chromium behaves:
https://source.chromium.org/chromium/chromium/src/+/refs/tags/144.0.7500.0:net/url_request/url_request_job.cc;l=425-433
If we have the response for a non-Range request in the memory cache, we
would previously use it in reply to Range requests. Similar to commit
878b00ae61f998a26aad7f50fae66cf969878ad6, we are just punting on Range
requests in the HTTP caches for now.
A future commit will format memory cache debug messages similarly to the
disk cache messages. To make it easy to read them both at a glance when
both debug flags are turned on, let's add a prefix to these messages.