Commit graph

17 commits

Author SHA1 Message Date
Aliaksandr Kalenik
14dc9e8ca2 LibHTTP: Short-circuit CacheIndex eviction when under its size limit
remove_entries_exceeding_cache_limit() is called after every network
response, but the cache is usually still under budget and nothing needs
to evict. Every one of those calls currently still runs the
window-function eviction SQL over the whole CacheIndex table just to
conclude there is nothing to do.

Short-circuit the call when the cache is already within its configured
size limit. To make that check cheap, maintain m_total_estimated_size
as a running total of the cache's estimated byte size, so the no-op
case becomes a single u64 compare and the DB is only touched when
there is real work.

Bookkeeping:
- Seed the total in CacheIndex::create() via a new
  select_total_estimated_size statement (COALESCE(..., 0) so an empty
  index returns 0 rather than NULL).
- Each Entry caches serialized_request_headers_size and
  serialized_response_headers_size so we don't re-serialize to
  recompute its footprint; Entry::estimated_size() centralizes the
  arithmetic.
- create_entry() adds the new entry's size. Any row it displaces is
  removed via DELETE ... RETURNING so the total stays accurate even
  for entries that were never loaded into m_entries.
- remove_entry() and the bulk DELETE statements were extended with
  the same RETURNING clause for the same reason.
- update_response_headers() shifts the total by the signed delta
  between old and new serialized header size.

Also wrap the estimate_cache_size_accessed_since result in COALESCE(..., 0)
so that it returns 0 rather than NULL for an empty table and callers don't
have to special-case NULL.
2026-04-19 01:31:37 +02:00
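The bookkeeping described in 14dc9e8ca2 can be sketched roughly as follows. This is a simplified illustration, not the LibHTTP code: `SizeTracker` and its member functions are hypothetical names, and only the running total and the cheap comparison are modeled, with no database involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the index; it models only the size bookkeeping.
class SizeTracker {
public:
    // seeded_total plays the role of the value read once at startup.
    SizeTracker(std::uint64_t limit, std::uint64_t seeded_total)
        : m_limit(limit)
        , m_total_estimated_size(seeded_total)
    {
    }

    // Called when an entry is created.
    void add(std::uint64_t size) { m_total_estimated_size += size; }

    // Called with the size reported for each deleted row.
    void remove(std::uint64_t size)
    {
        // Clamp at zero rather than wrapping if the estimate ever drifts.
        m_total_estimated_size -= (size > m_total_estimated_size) ? m_total_estimated_size : size;
    }

    // Shift the total by the signed delta between old and new header sizes.
    void update_header_size(std::uint64_t old_size, std::uint64_t new_size)
    {
        if (new_size >= old_size)
            add(new_size - old_size);
        else
            remove(old_size - new_size);
    }

    // The no-op case is a single integer compare; only when this returns
    // true would the expensive eviction query need to run.
    bool needs_eviction() const { return m_total_estimated_size > m_limit; }

    std::uint64_t total() const { return m_total_estimated_size; }

private:
    std::uint64_t m_limit { 0 };
    std::uint64_t m_total_estimated_size { 0 };
};

inline void size_tracker_example()
{
    SizeTracker tracker(100, 0);
    tracker.add(60);
    assert(!tracker.needs_eviction()); // Under budget: skip the query.
    tracker.add(50);
    assert(tracker.needs_eviction()); // Over budget: real work to do.
}
```

The clamp in `remove()` is a defensive choice made for this sketch; whether the real code guards against underflow is not stated in the commit message.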
Aliaksandr Kalenik
d8b28a68cc LibHTTP: Replace loaded cache entries in CacheIndex
create_entry() issues INSERT OR REPLACE in SQL, so the on-disk row is
correctly overwritten when a (cache_key, vary_key) pair is re-inserted.
But the in-memory m_entries vector was only appended to, leaving the
stale Entry alongside the new one. Subsequent find_entry() calls could
then return the old metadata even though the DB had moved on.
2026-04-19 01:31:37 +02:00
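A minimal sketch of the in-memory half of d8b28a68cc, assuming a flat vector of entries keyed by a (cache_key, vary_key) pair. `LoadedEntry` and `insert_or_replace` are hypothetical names, and the real m_entries container may be organized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, pared-down entry; the real Entry carries more metadata.
struct LoadedEntry {
    std::uint64_t cache_key { 0 };
    std::uint64_t vary_key { 0 };
    std::uint64_t data_size { 0 };
};

// Mirror INSERT OR REPLACE in memory: overwrite an entry with the same
// key pair if one is loaded, otherwise append.
inline void insert_or_replace(std::vector<LoadedEntry>& entries, LoadedEntry entry)
{
    auto it = std::find_if(entries.begin(), entries.end(), [&](LoadedEntry const& existing) {
        return existing.cache_key == entry.cache_key && existing.vary_key == entry.vary_key;
    });

    if (it != entries.end())
        *it = entry;
    else
        entries.push_back(entry);
}

inline void insert_or_replace_example()
{
    std::vector<LoadedEntry> entries;
    insert_or_replace(entries, { 1, 7, 100 });
    insert_or_replace(entries, { 1, 7, 300 }); // Same key pair: replaces.
    assert(entries.size() == 1);
    assert(entries[0].data_size == 300);
}
```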
Timothy Flynn
ef134c940e LibHTTP: Correctly normalize header whitespace in cache utilities
We also shouldn't trim whitespace at all when reading headers from the
cache index. We store them as-is and should therefore read them as-is.
2026-02-26 22:27:46 +01:00
Timothy Flynn
bda0820b8b LibHTTP: Use a memory-backed database for the disk cache in test modes
This just lets us create fewer cache directories during WPT. We do still
create cache entries on disk, so for WPT, we introduce an extra cache
key to prevent conflicts. There is an existing FIXME about this.
2026-02-15 15:25:30 -05:00
Shannon Booth
d3624c328f LibDatabase: Allow creating a memory backed database
2026-02-14 10:25:33 -05:00
Timothy Flynn
7d60d0bfb7 LibHTTP+LibWebView+RequestServer: Allow users to set disk cache limits
This adds a settings box to about:settings to allow users to limit the
disk cache size. This will override the default 5 GiB limit. We do not
automatically delete cache data if the new limit is suddenly less than
the used disk space; this will happen on the next request. This allows
multiple changes to the settings in a row without thrashing the cache.

In the future, we can add more toggles, such as disabling the disk
cache altogether.
2026-02-13 10:20:52 -05:00
Timothy Flynn
16fb2ea3b7 LibHTTP: Impose a limit on singular disk cache entry sizes
Let's not attempt to cache entries that are excessively large. We limit
the cache data size to be 1/8 of the total disk cache limit, with a cap
of 256 MiB.
2026-02-13 10:20:52 -05:00
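The per-entry limit in 16fb2ea3b7 is simple arithmetic, sketched here with a hypothetical helper name (`max_entry_size`); the constants come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024 * 1024;
constexpr std::uint64_t GiB = 1024 * MiB;

// One eighth of the total disk cache limit, capped at 256 MiB.
constexpr std::uint64_t max_entry_size(std::uint64_t total_cache_limit)
{
    return std::min(total_cache_limit / 8, 256 * MiB);
}

inline void max_entry_size_example()
{
    // With the default 5 GiB limit, 1/8 is 640 MiB, so the cap applies.
    assert(max_entry_size(5 * GiB) == 256 * MiB);
    // With a 1 GiB limit, 1/8 is 128 MiB, which is below the cap.
    assert(max_entry_size(1 * GiB) == 128 * MiB);
}
```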
Timothy Flynn
d773ba25cf LibHTTP: Impose a limit on the total disk cache size
Rather than letting our disk cache grow unbounded, let's impose a limit
on the estimated total disk cache size. The limits chosen are vaguely
inspired by Chromium.

We impose a total disk cache limit of 5 GiB. Chromium imposes an overall
limit of 1.25 GiB; I've chosen more here because we currently cache
uncompressed data from cURL.

The limit is further restricted by the amount of available disk space,
which we just check once at startup (as does Chromium). We will choose a
percentage of the free space available on systems with limited space.

Our eviction errs on the side of simplicity. We will remove the least
recently accessed entries until the total estimated cache size does not
exceed our limit. This could potentially be improved in the future. For
example, if the next entry to consider is 40 MiB, and we only need to
free 1 MiB of space, we could try evicting slightly more recently used
entries. This would prevent evicting more than we need to.
2026-02-13 10:20:52 -05:00
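The eviction policy from d773ba25cf can be illustrated in memory. Per a later commit in this history, the eviction itself is done with a SQL query over the CacheIndex table; this sketch only shows the same least-recently-accessed policy on a plain vector, with hypothetical names (`IndexedEntry`, `evict_least_recently_used`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, pared-down index row.
struct IndexedEntry {
    std::uint64_t id { 0 };
    std::uint64_t estimated_size { 0 };
    std::int64_t last_access_time { 0 };
};

// Removes the least recently accessed entries until the total estimated
// size no longer exceeds the limit. Returns the ids that were evicted.
inline std::vector<std::uint64_t> evict_least_recently_used(std::vector<IndexedEntry>& entries, std::uint64_t limit)
{
    std::uint64_t total = 0;
    for (auto const& entry : entries)
        total += entry.estimated_size;

    // Oldest access time first.
    std::sort(entries.begin(), entries.end(), [](IndexedEntry const& a, IndexedEntry const& b) {
        return a.last_access_time < b.last_access_time;
    });

    std::vector<std::uint64_t> evicted;
    std::size_t count = 0;
    while (total > limit && count < entries.size()) {
        total -= entries[count].estimated_size;
        evicted.push_back(entries[count].id);
        ++count;
    }

    entries.erase(entries.begin(), entries.begin() + static_cast<std::ptrdiff_t>(count));
    return evicted;
}

inline void eviction_example()
{
    std::vector<IndexedEntry> entries { { 1, 40, 10 }, { 2, 30, 30 }, { 3, 50, 20 } };
    auto evicted = evict_least_recently_used(entries, 60);
    // Total is 120; entries 1 and 3 are the oldest and go first, leaving 30.
    assert(evicted.size() == 2);
    assert(entries.size() == 1);
}
```

As the commit message notes, this can evict more than strictly necessary: a single large old entry is removed even when a smaller, slightly newer one would have freed enough space.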
Timothy Flynn
5f2063d5d9 LibHTTP: Include request header length in the estimated disk cache size
Request headers were added in 36a826815d,
but this estimation was not updated.
2026-02-13 10:20:52 -05:00
Timothy Flynn
918f6a4c9f LibHTTP: Ensure we use the Vary key when updating last access time
Apparently, SQLite binds NULL to this placeholder if we do not pass a
value, and a comparison against NULL never matches a row. The query being
executed here is:

    UPDATE CacheIndex
    SET last_access_time = ?
    WHERE cache_key = ? AND vary_key = ?;
2026-02-06 16:24:49 +01:00
Timothy Flynn
12552f0d72 LibHTTP: Avoid UAF while deleting exempt cache headers
HeaderList::delete involves a Vector::remove_all_matching internally.
So if an exempt header appeared again later in the header list, we would
be accessing the name string of the previously deleted header.
2026-01-22 13:18:29 -05:00
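One common way to avoid the class of bug described in 12552f0d72 is to never hold a reference to a header name across a mutation of the list. The sketch below is not the actual fix; it uses standard containers and hypothetical names (`Header`, `delete_exempt_headers`), and it compares names exactly, whereas HTTP header names are case-insensitive.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, pared-down header type.
struct Header {
    std::string name;
    std::string value;
};

// Removes every header whose name is in exempt_names in a single pass.
// The predicate only reads the element it is currently visiting, so no
// name string is accessed after its header has been removed.
inline void delete_exempt_headers(std::vector<Header>& headers, std::vector<std::string> const& exempt_names)
{
    auto is_exempt = [&](Header const& header) {
        return std::find(exempt_names.begin(), exempt_names.end(), header.name) != exempt_names.end();
    };

    headers.erase(std::remove_if(headers.begin(), headers.end(), is_exempt), headers.end());
}

inline void delete_exempt_headers_example()
{
    // "X-Exempt" is a made-up name that appears twice, as in the bug report.
    std::vector<Header> headers { { "X-Exempt", "a" }, { "Content-Type", "text/html" }, { "X-Exempt", "b" } };
    delete_exempt_headers(headers, { "X-Exempt" });
    assert(headers.size() == 1);
    assert(headers[0].name == "Content-Type");
}
```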
Timothy Flynn
d3041dc054 LibHTTP+LibWeb: Support the HTTP Vary response header
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.

Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.

There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
2026-01-22 08:54:49 -05:00
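The matching rule described in d3041dc054 can be sketched as below. The names (`Headers`, `lookup`, `vary_matches`) are hypothetical, header names are assumed to be lowercased already, and values are compared byte for byte without normalization. The `*` handling follows RFC 9111 section 4.1 and is not mentioned in the commit message.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Headers = std::map<std::string, std::string>;

inline std::optional<std::string> lookup(Headers const& headers, std::string const& name)
{
    auto it = headers.find(name);
    if (it == headers.end())
        return std::nullopt;
    return it->second;
}

// vary_names: header names listed in the cached response's Vary header.
// stored_request: request headers saved alongside the cached response.
// outgoing_request: headers of the request we are about to send.
inline bool vary_matches(std::vector<std::string> const& vary_names, Headers const& stored_request, Headers const& outgoing_request)
{
    for (auto const& name : vary_names) {
        // Per RFC 9111, a Vary value containing "*" always fails to match.
        if (name == "*")
            return false;

        // A header absent from both requests counts as a match.
        if (lookup(stored_request, name) != lookup(outgoing_request, name))
            return false;
    }
    return true;
}

inline void vary_matches_example()
{
    Headers stored { { "accept-encoding", "gzip" } };
    Headers outgoing { { "accept-encoding", "br" } };
    // Mismatch: the cache entry is not used and a new request is issued.
    assert(!vary_matches({ "accept-encoding" }, stored, outgoing));
}
```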
Timothy Flynn
36a826815d LibHTTP+LibWeb+RequestServer: Store request headers in the HTTP caches
We need to store request headers in order to handle Vary mismatches.

(Note we should also be using BLOB for header storage in sqlite, as they
are not necessarily UTF-8.)
2026-01-22 08:54:49 -05:00
Timothy Flynn
04171d42f0 LibHTTP: Prefix disk cache debug messages with "[disk]" text
A future commit will format memory cache debug messages similarly to the
disk cache messages. To make it easy to read them both at a glance when
both debug flags are turned on, let's add a prefix to these messages.
2026-01-10 09:02:41 -05:00
Timothy Flynn
9c8322d1b3 LibHTTP: Use correct cache key type in disk cache index entry storage
We also don't need to store the cache key itself in the entry struct.
2025-12-21 09:24:51 -06:00
Timothy Flynn
aae8574d25 LibHTTP: Place HTTP disk cache log points behind a debug flag
These log points are quite verbose. Before we enable the disk cache by
default, let's place them behind a debug flag.
2025-12-02 12:19:42 +01:00
Timothy Flynn
21bbbacd07 LibHTTP+RequestServer: Move the HTTP cache implementation to LibHTTP
We currently have two ongoing implementations of RFC 9111, HTTP caching.
In order to consolidate these, this patch moves the implementation from
RequestServer to LibHTTP for re-use within LibWeb.
2025-11-29 08:35:02 -05:00
Renamed from Services/RequestServer/Cache/CacheIndex.cpp