110 commits

Author SHA1 Message Date
Andreas Kling
34d954e2d7 LibRegex: Add ECMAScriptRegex and migrate callers
Add `ECMAScriptRegex`, LibRegex's C++ facade for ECMAScript regexes.

The facade owns compilation, execution, captures, named groups, and
error translation for the Rust backend, which lets callers stop
depending on the legacy parser and matcher types directly. Use it in the
remaining non-LibJS callers: URLPattern, HTML input pattern handling,
and the places in LibHTTP that only needed token validation.

Where a full regex engine was unnecessary, replace those call sites with
direct character checks. Also update focused LibURL, LibHTTP, and WPT
coverage for the migrated callers and corrected surrogate handling.
2026-03-27 17:32:19 +01:00
Shannon Booth
d76330645a LibHTTP: Ignore empty list elements when extracting token headers
It turns out that the validation of header values in db5f16f042
was a bit overaggressive. extract_token_headers previously treated
empty list elements (empty or whitespace-only after trimming) as parse
failures. This is incorrect per RFC 9110, which specifies that
recipients must ignore empty list elements in comma-separated header
values.

> A recipient MUST parse and ignore a reasonable number of empty
> list elements
2026-03-16 13:55:26 +01:00
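A minimal sketch of the behavior described in d76330645a, assuming a plain comma-separated list of tokens. The helper names (`is_token_char`, `trim_ows`, `extract_tokens`) are hypothetical and use the C++ standard library; they are not the LibHTTP API.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// RFC 9110 "tchar": the characters allowed in a token.
static bool is_token_char(char c)
{
    if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return true;
    return std::string_view("!#$%&'*+-.^_`|~").find(c) != std::string_view::npos;
}

// Trim optional whitespace (SP / HTAB) from both ends.
static std::string_view trim_ows(std::string_view s)
{
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t'))
        s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t'))
        s.remove_suffix(1);
    return s;
}

// Hypothetical helper: returns the tokens of a comma-separated header value,
// or nullopt if any non-empty element is not a valid token.
std::optional<std::vector<std::string>> extract_tokens(std::string_view value)
{
    std::vector<std::string> tokens;
    while (true) {
        auto comma = value.find(',');
        auto element = trim_ows(value.substr(0, comma));
        if (!element.empty()) { // Empty elements are ignored, not rejected.
            for (char c : element) {
                if (!is_token_char(c))
                    return std::nullopt;
            }
            tokens.emplace_back(element);
        }
        if (comma == std::string_view::npos)
            break;
        value.remove_prefix(comma + 1);
    }
    return tokens;
}
```

With this sketch, a value such as "a, ,b,,  c ," yields three tokens, while "a, b c" is still rejected because "b c" is not a token.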
Shannon Booth
db5f16f042 LibHTTP: Parse token-list headers according to their ABNF
The previous implementation did not fully align with each
header's ABNF, so it did not reject some headers that it should
have rejected.

Fixes 6 WPT subtests for

https://wpt.live/cors/access-control-expose-headers-parsing.window.html
2026-03-01 18:16:16 +00:00
Timothy Flynn
ef134c940e LibHTTP: Correctly normalize header whitespace in cache utilities
We also shouldn't trim whitespace at all when reading headers from the
cache index. We store them as-is and should therefore read them as-is.
2026-02-26 22:27:46 +01:00
Timothy Flynn
0652a33043 LibHTTP: Return a StringView from HTTP::normalize_header_value
This lets callers that do not need a string avoid a needless allocation.
All callers that do need a string will already either:

* Turn it into a ByteString themselves
* Pass this along to the isomorphic encoder
2026-02-26 22:27:46 +01:00
Timothy Flynn
3b5c5f68bb LibHTTP: Use IdentityHashTraits for HashMaps keyed by the cache key
The cache key itself is already an integral export of a SHA-1 hash of
some request fields. We don't need to hash it again for these maps.
2026-02-24 15:10:59 +01:00
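A sketch of the idea behind 3b5c5f68bb using `std::unordered_map` as a stand-in for AK's HashMap. `IdentityHash` and `CacheMap` are illustrative names, and a 64-bit key type is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// The key is already derived from a hash, so the hasher returns it unchanged
// instead of mixing the bits a second time.
struct IdentityHash {
    std::size_t operator()(std::uint64_t key) const noexcept
    {
        return static_cast<std::size_t>(key);
    }
};

// Hypothetical map from cache key to some per-entry data.
using CacheMap = std::unordered_map<std::uint64_t, std::string, IdentityHash>;
```

This only makes sense when keys are already well distributed, as hash-derived keys are; an identity hash over sequential or clustered integers could bucket poorly.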
Ben Wiederhake
2e51182560 LibHTTP: Remove unused header in HttpRequest 2026-02-23 12:15:23 +01:00
Ben Wiederhake
7ad95c78af LibHTTP: Remove unused header in HeaderList 2026-02-23 12:15:23 +01:00
Ben Wiederhake
2a369a2a26 LibHTTP: Remove unused header in ParsedCookie 2026-02-23 12:15:23 +01:00
Shannon Booth
2e3c59f791 LibHTTP: Simplify serializing a URL for cache storage
The serialization function already has a flag to skip the fragment
or not.
2026-02-18 12:52:19 -05:00
Timothy Flynn
bda0820b8b LibHTTP: Use a memory-backed database for the disk cache in test modes
This just lets us create fewer cache directories during WPT. We do still
create cache entries on disk, so for WPT, we introduce an extra cache
key to prevent conflicts. There is an existing FIXME about this.
2026-02-15 15:25:30 -05:00
Praise-Garfield
3e719be607 LibHTTP: Handle InvalidURL in parse_error_to_string
The ParseError::InvalidURL variant is returned by from_raw_request()
when a query string cannot be converted to a valid String. However,
parse_error_to_string() does not handle this variant, causing it to
fall through to VERIFY_NOT_REACHED() and crash. This adds the missing
case.
2026-02-14 14:35:01 -05:00
Praise-Garfield
9b8e341828 LibHTTP: Implement the must-understand cache directive
This implements the must-understand response cache directive per RFC
9111 Section 5.2.2.3. When a response contains must-understand, this
cache now ignores the no-store directive for status codes whose
caching behavior it implements. For status codes the cache does not
understand, the response is not stored.
2026-02-14 14:34:34 -05:00
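The rule in 9b8e341828 can be sketched as a small predicate. This is a simplification that assumes every other storability requirement is already satisfied; `may_store_response` and its parameters are hypothetical names.

```cpp
// Decides whether a response may be stored, looking only at the
// must-understand and no-store directives.
bool may_store_response(bool has_must_understand, bool has_no_store, bool cache_understands_status)
{
    if (has_must_understand) {
        // no-store is ignored when the status code is understood;
        // otherwise the response is not stored at all.
        return cache_understands_status;
    }
    return !has_no_store;
}
```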
Shannon Booth
d3624c328f LibDatabase: Allow creating a memory backed database 2026-02-14 10:25:33 -05:00
Timothy Flynn
7d60d0bfb7 LibHTTP+LibWebView+RequestServer: Allow users to set disk cache limits
This adds a settings box to about:settings to allow users to limit the
disk cache size. This will override the default 5 GiB limit. We do not
automatically delete cache data if the new limit is suddenly less than
the used disk space; this will happen on the next request. This allows
multiple changes to the settings in a row without thrashing the cache.

In the future, we can add more toggles, such as disabling the disk
cache altogether.
2026-02-13 10:20:52 -05:00
Timothy Flynn
16fb2ea3b7 LibHTTP: Impose a limit on singular disk cache entry sizes
Let's not attempt to cache entries that are excessively large. We limit
the cache data size to be 1/8 of the total disk cache limit, with a cap
of 256 MiB.
2026-02-13 10:20:52 -05:00
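The arithmetic in 16fb2ea3b7 works out as follows. This is a sketch with a hypothetical helper name, not the actual LibHTTP function.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t GiB = 1024ull * MiB;

// Hypothetical helper: 1/8 of the total disk cache limit, capped at 256 MiB.
constexpr std::uint64_t max_entry_size(std::uint64_t total_limit)
{
    return std::min(total_limit / 8, 256 * MiB);
}
```

With the default 5 GiB total limit, 1/8 is 640 MiB, so the 256 MiB cap applies. With a 1 GiB limit, the per-entry limit is 128 MiB.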
Timothy Flynn
d773ba25cf LibHTTP: Impose a limit on the total disk cache size
Rather than letting our disk cache grow unbounded, let's impose a limit
on the estimated total disk cache size. The limits chosen are vaguely
inspired by Chromium.

We impose a total disk cache limit of 5 GiB. Chromium imposes an overall
limit of 1.25 GiB; I've chosen more here because we currently cache
uncompressed data from cURL.

The limit is further restricted by the amount of available disk space,
which we just check once at startup (as does Chromium). We will choose a
percentage of the free space available on systems with limited space.

Our eviction errs on the side of simplicity. We will remove the least
recently accessed entries until the total estimated cache size does not
exceed our limit. This could potentially be improved in the future. For
example, if the next entry to consider is 40 MiB, and we only need to
free 1 MiB of space, we could try evicting slightly more recently used
entries. This would prevent evicting more than we need to.
2026-02-13 10:20:52 -05:00
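The eviction strategy described in d773ba25cf can be sketched over an in-memory list. The real cache index lives in a database; the `Entry` struct and `evict` function here are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Entry {
    std::uint64_t key;
    std::uint64_t size;
    std::uint64_t last_access_time;
};

// Removes the least recently accessed entries until the total size no longer
// exceeds the limit. Returns the keys of the evicted entries.
std::vector<std::uint64_t> evict(std::vector<Entry>& entries, std::uint64_t limit)
{
    std::uint64_t total = 0;
    for (auto const& entry : entries)
        total += entry.size;

    // Sort most recently accessed first, so the oldest entries sit at the back.
    std::sort(entries.begin(), entries.end(), [](Entry const& a, Entry const& b) {
        return a.last_access_time > b.last_access_time;
    });

    std::vector<std::uint64_t> evicted;
    while (total > limit && !entries.empty()) {
        total -= entries.back().size;
        evicted.push_back(entries.back().key);
        entries.pop_back();
    }
    return evicted;
}
```

This also shows the trade-off the message mentions: if the oldest entry is 40 units and the cache is only 1 unit over the limit, the whole 40-unit entry is still evicted.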
Timothy Flynn
5f2063d5d9 LibHTTP: Include request header length in the estimated disk cache size
Request headers were added in 36a826815d,
but this estimation was not updated.
2026-02-13 10:20:52 -05:00
Praise-Garfield
b270b2cacb LibHTTP: Fix inverted Content-Range complete-length parsing
parse_single_byte_content_range_as_values() has the condition on
consume_specific('*') inverted. When the complete-length is a
numeric value like "1000", the negated check causes the wildcard
branch to run, discarding the length. When it is "*" (unknown),
the else branch tries to parse digits after consuming the "*",
which fails entirely.

Removing the "!" fixes both cases so that "*" correctly produces
an empty complete_length, and numeric values are parsed normally.

Also adds an EOF check after parsing to reject trailing garbage,
matching the pattern used by parse_single_range_header_value().
2026-02-13 09:39:49 +01:00
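A standalone sketch of the corrected control flow from b270b2cacb. It assumes a "bytes first-last/complete-length" value and ignores integer overflow; the helpers below are hypothetical and only mirror the shape of a consume_specific-style parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

struct ContentRange {
    std::uint64_t first;
    std::uint64_t last;
    std::optional<std::uint64_t> complete_length; // Empty when the header says "*".
};

// Consumes a run of ASCII digits from the front of `s`, if there is one.
static std::optional<std::uint64_t> consume_number(std::string_view& s)
{
    std::size_t digits = 0;
    std::uint64_t value = 0;
    while (digits < s.size() && s[digits] >= '0' && s[digits] <= '9') {
        value = value * 10 + static_cast<std::uint64_t>(s[digits] - '0');
        ++digits;
    }
    if (digits == 0)
        return std::nullopt;
    s.remove_prefix(digits);
    return value;
}

// Consumes `expected` from the front of `s` and reports whether it was there.
static bool consume_specific(std::string_view& s, std::string_view expected)
{
    if (s.substr(0, expected.size()) != expected)
        return false;
    s.remove_prefix(expected.size());
    return true;
}

std::optional<ContentRange> parse_content_range(std::string_view s)
{
    if (!consume_specific(s, "bytes "))
        return std::nullopt;
    auto first = consume_number(s);
    if (!first || !consume_specific(s, "-"))
        return std::nullopt;
    auto last = consume_number(s);
    if (!last || !consume_specific(s, "/"))
        return std::nullopt;

    std::optional<std::uint64_t> complete_length;
    if (!consume_specific(s, "*")) {
        // Not the wildcard, so a numeric complete-length is required.
        complete_length = consume_number(s);
        if (!complete_length)
            return std::nullopt;
    }

    // Reject trailing garbage after the complete-length.
    if (!s.empty())
        return std::nullopt;
    return ContentRange { *first, *last, complete_length };
}
```

Here the only negation sits on the numeric branch: "*" leaves complete_length empty, and anything else must parse as digits.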
Timothy Flynn
d97a3d9b5a LibHTTP+RequestServer: Send revalidation attributes without parsing
The caching RFC is quite strict about the format of date strings. If we
received a revalidation attribute with an invalid date string, we would
previously fail a runtime assertion. This was because to start a
revalidation request, we would simply check for the presence of any
revalidation header; but then when we issued the request, we would fail
to parse the header, and end up with all attributes being null.

We no longer parse the revalidation attributes at all. Whatever we
receive in the Last-Modified response header is what we will send in the
If-Modified-Since request header, verbatim. For better or worse, this is
how other browsers behave. So if the server sends us an invalid date
string, it can receive its own date format for revalidation.
2026-02-10 09:09:53 -05:00
Timothy Flynn
d75aee2a56 LibHTTP+LibWeb: Move the IncludeCredentials enum to LibHTTP
This will be sent over IPC to RequestServer in an upcoming patch.
2026-02-10 12:21:20 +01:00
Timothy Flynn
8b10a3a39e LibHTTP: Clean up old cookie code a bit
* Transfer cookie-related enums over IPC as u8, rather than an int
* Use AK's safe ASCII ctype alternatives
* Use SCREAMING_CASE for constants
2026-02-10 12:21:20 +01:00
Timothy Flynn
8d97389038 LibHTTP+Everywhere: Move the cookie implementation to LibHTTP
This will allow parsing cookies outside of LibWeb.

LibHTTP is basically becoming the home of HTTP WG specs.
2026-02-10 12:21:20 +01:00
Timothy Flynn
918f6a4c9f LibHTTP: Ensure we use the Vary key when updating last access time
Apparently, SQLite will bind NULL to this placeholder if we do not
pass a value. The query being executed here is:

    UPDATE CacheIndex
    SET last_access_time = ?
    WHERE cache_key = ? AND vary_key = ?;
2026-02-06 16:24:49 +01:00
Zaggy1024
4eb310cd3f LibWeb: Skip range requests for media if the server won't accept them
Currently, this just respects the reported value from Accept-Ranges,
but we could also just try sending a range request and see if the
server rejects it, then fall back to a normal request after. For now,
this is fine, and we can make it use a fallback later if needed.
2026-01-29 05:22:27 -06:00
Zaggy1024
99020b50a3 LibHTTP: Implement extraction of the Content-Range values in HeaderList 2026-01-29 05:22:27 -06:00
Timothy Flynn
4585734696 LibHTTP: Honor the min-fresh Cache-Control request directive 2026-01-28 11:31:04 -05:00
Timothy Flynn
896ecb28ab LibHTTP: Honor the max-stale Cache-Control request directive 2026-01-28 11:31:04 -05:00
Timothy Flynn
4a728b1f29 LibHTTP: Honor the max-age Cache-Control request directive 2026-01-28 11:31:04 -05:00
Timothy Flynn
26ddd0a904 LibHTTP: Honor the no-cache Cache-Control request directive 2026-01-28 11:31:04 -05:00
Timothy Flynn
cb1ad8a904 LibHTTP: Honor the no-store Cache-Control request directive 2026-01-28 11:31:04 -05:00
Timothy Flynn
2918537596 LibHTTP: Add helper to extract a duration from a Cache-Control directive 2026-01-28 11:31:04 -05:00
Timothy Flynn
40800fd91e LibHTTP: Implement a strict method to extract Cache-Control directives
Our previous implementation was a bit too tolerant of bad header values.
For example, extracting a "max-age" from a header value of "abmax-agecd"
would have incorrectly parsed successfully.

We now find exact (case-insensitive) directive matches. We also handle
quoted string values, which may contain important delimiters that we
would have previously split on.
2026-01-28 11:31:04 -05:00
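A sketch of the strict matching described in 40800fd91e: split on commas that are outside quoted strings, then compare the directive name exactly (ignoring ASCII case). `extract_directive` is a hypothetical name, and this sketch does not unescape backslash sequences inside quoted values.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

static bool equals_ignoring_ascii_case(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

static std::string_view trim(std::string_view s)
{
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t'))
        s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t'))
        s.remove_suffix(1);
    return s;
}

// Returns the directive's value (an empty string if it has none),
// or nullopt if the directive is not present.
std::optional<std::string> extract_directive(std::string_view header, std::string_view name)
{
    std::size_t start = 0;
    bool in_quotes = false;
    for (std::size_t i = 0; i <= header.size(); ++i) {
        if (i < header.size()) {
            char c = header[i];
            if (in_quotes && c == '\\') {
                ++i; // Skip the escaped character inside a quoted string.
                continue;
            }
            if (c == '"') {
                in_quotes = !in_quotes;
                continue;
            }
            if (c != ',' || in_quotes)
                continue;
        }

        // header[start, i) is one complete directive.
        auto directive = trim(header.substr(start, i - start));
        start = i + 1;

        auto equals = directive.find('=');
        auto key = trim(directive.substr(0, equals));
        if (!equals_ignoring_ascii_case(key, name))
            continue;
        if (equals == std::string_view::npos)
            return std::string {};

        auto value = trim(directive.substr(equals + 1));
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
            value = value.substr(1, value.size() - 2);
        return std::string(value);
    }
    return std::nullopt;
}
```

Under this sketch, "abmax-agecd" does not match "max-age", and a comma inside a quoted value no longer splits the directive.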
Timothy Flynn
54c2ecedca RequestServer: Do not flush the disk cache for unsuccessful requests
If a request failed, or was stopped, do not attempt to write the cache
entry footer to disk. Note that at this point, the cache index will not
have been created, thus this entry will not be used in the future. We do
still delete any partial file on disk.

This serves as a more general fix for the issue addressed in commit
9f2ac14521.
2026-01-23 14:24:20 +01:00
Timothy Flynn
12552f0d72 LibHTTP: Avoid UAF while deleting exempt cache headers
HeaderList::delete involves a Vector::remove_all_matching internally.
So if an exempt header appeared again later in the header list, we would
be accessing the name string of the previously deleted header.
2026-01-22 13:18:29 -05:00
Timothy Flynn
d3041dc054 LibHTTP+LibWeb: Support the HTTP Vary response header
We now partition the HTTP disk cache based on the Vary response header.
If a cached response contains a Vary header, we look for each of the
header names in the outgoing HTTP request. The outgoing request must
match every header value in the original request for the cache entry
to be used; otherwise, a new request will be issued, and a separate
cache entry will be created.

Note that we must now defer creating the disk cache file itself until we
have received the response headers. The Vary key is computed from these
headers, and affects the partitioned disk cache file name.

There are further optimizations we can make here. If we have a Vary
mismatch, we could find the best candidate cached response and issue a
conditional HTTP request. The content server may then respond with an
HTTP 304 if the mismatched request headers are actually okay. But for
now, if we have a Vary mismatch, we issue an unconditional request as
a purely correctness-oriented patch.
2026-01-22 08:54:49 -05:00
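The Vary matching described in d3041dc054 can be sketched as below. It assumes headers are kept in a map with lowercased names and a single value per name; `Headers` and `vary_matches` are hypothetical. Treating "Vary: *" as never matching follows RFC 9111 and is not stated in the commit message.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <string_view>

// Header name (lowercased, an assumption of this sketch) to header value.
using Headers = std::map<std::string, std::string>;

static std::string_view trim(std::string_view s)
{
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t'))
        s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t'))
        s.remove_suffix(1);
    return s;
}

// Returns true if every header named in `vary` has the same value (or is
// absent) in both the originally cached request and the new request.
bool vary_matches(std::string_view vary, Headers const& cached_request, Headers const& new_request)
{
    while (true) {
        auto comma = vary.find(',');
        std::string name(trim(vary.substr(0, comma)));
        for (auto& c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

        if (name == "*")
            return false;

        if (!name.empty()) {
            auto cached = cached_request.find(name);
            auto fresh = new_request.find(name);
            bool cached_present = cached != cached_request.end();
            bool fresh_present = fresh != new_request.end();
            if (cached_present != fresh_present)
                return false;
            if (cached_present && cached->second != fresh->second)
                return false;
        }

        if (comma == std::string_view::npos)
            break;
        vary.remove_prefix(comma + 1);
    }
    return true;
}
```

On a mismatch, the caller would issue an unconditional request and create a separate cache entry, as the message describes.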
Timothy Flynn
36a826815d LibHTTP+LibWeb+RequestServer: Store request headers in the HTTP caches
We need to store request headers in order to handle Vary mismatches.

(Note we should also be using BLOB for header storage in sqlite, as they
are not necessarily UTF-8.)
2026-01-22 08:54:49 -05:00
Timothy Flynn
24da225b3b LibHTTP: Update disk cache entry format comment with latest format
In commit 20cd19be4d, HTTP headers were
moved from the cache entry to the cache index, but this comment was not
updated.
2026-01-22 08:54:49 -05:00
Timothy Flynn
aa1517b727 LibHTTP+LibWeb+RequestServer: Handle the Fetch API's cache mode
If the cache mode is no-store, we must not interact with the cache at
all.

If the cache mode is reload, we must not use any cached response.

If the cache mode is only-if-cached or force-cache, we are permitted
to respond with stale cache responses.

Note that we currently cannot test only-if-cached in test-web. Setting
this mode also requires setting the cors mode to same-origin, but our
http-test-server infra requires setting the cors mode to cors.
2026-01-22 07:05:06 -05:00
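One way to picture the cache mode rules from aa1517b727 is as a lookup from mode to permitted cache interactions. The `CachePolicy` struct and `policy_for` function are hypothetical. Letting reload still store the new response follows the Fetch specification and is an assumption here, since the commit message does not say so; modes the message does not mention fall through to ordinary freshness rules.

```cpp
enum class CacheMode {
    Default,
    NoStore,
    Reload,
    NoCache,
    ForceCache,
    OnlyIfCached,
};

struct CachePolicy {
    bool may_use_cached_response;
    bool may_store_response;
    bool may_use_stale_response;
};

constexpr CachePolicy policy_for(CacheMode mode)
{
    switch (mode) {
    case CacheMode::NoStore:
        // Do not interact with the cache at all.
        return { false, false, false };
    case CacheMode::Reload:
        // Never use a cached response; storing the new one is an assumption.
        return { false, true, false };
    case CacheMode::ForceCache:
    case CacheMode::OnlyIfCached:
        // Stale cached responses are acceptable.
        return { true, true, true };
    default:
        // Ordinary freshness rules apply.
        return { true, true, false };
    }
}
```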
Timothy Flynn
6b91199253 LibHTTP+LibWeb: Move Infrastructure::Request::CacheMode to LibHTTP
We will need to send this enum over IPC to RequestServer to affect the
disk cache's behavior.
2026-01-22 07:05:06 -05:00
Timothy Flynn
2ac219405f LibHTTP+LibWeb: Purge non-fresh entries from the memory cache
Once a cache entry is not fresh, we now remove it from the memory cache.
We will avoid handling revalidation from within WebContent. Instead, we
will just forward the request to RequestServer, where the disk cache
will handle revalidation for itself if needed.
2026-01-19 08:02:14 -05:00
Timothy Flynn
17d7c2b6bd LibHTTP: Allow revalidating heuristically cacheable responses
This is expected by WPT (the /fetch/http-cache/304-update.any.html test
in particular).
2026-01-19 08:02:14 -05:00
Timothy Flynn
bc1cafc716 LibHTTP+LibWebView+RequestServer: Allow using the disk cache during WPT
We currently disable the disk cache because the WPT runner will run more
than one RequestServer process at a time. The SQLite database does not
handle this concurrent read/write access well.

We will now enable the disk cache with a per-process database. This is
needed to ensure that WPT Fetch cache tests are sufficiently handled by
RequestServer.
2026-01-19 08:02:14 -05:00
Timothy Flynn
457a319cda LibHTTP: Define the DiskCache::cache_directory getter as const 2026-01-19 08:02:14 -05:00
Zaggy1024
84c0eb3dbf LibCore+LibHTTP+RequestServer: Send data via sockets instead of pipes
This brings the implementation on Unix in line with Windows, so we can
drop a few ifdefs.
2026-01-19 06:53:29 -05:00
Timothy Flynn
cb4da2c6c2 LibHTTP: Defer setting the response time until headers are received
We currently set the response time to when the cache entry writer is
created. This is more or less the same as the request start time, so
this is not correct.

This was a regression from 5384f84550.
That commit changed when the writer was created, but did not move the
setting of the response time to match.

We now set the response time to when the HTTP response headers have been
received (again), which matches how Chromium behaves:

https://source.chromium.org/chromium/chromium/src/+/refs/tags/144.0.7500.0:net/url_request/url_request_job.cc;l=425-433
2026-01-10 23:31:42 +01:00
Timothy Flynn
453764d3f0 LibHTTP: Do not respond to Range requests with cached full responses
If we have the response for a non-Range request in the memory cache, we
would previously use it in reply to Range requests. Similar to commit
878b00ae61f998a26aad7f50fae66cf969878ad6, we are just punting on Range
requests in the HTTP caches for now.
2026-01-10 09:02:41 -05:00
Timothy Flynn
b35645523c LibHTTP+LibWeb: Make memory cache debug logs consistent with disk cache
Let's also not yell.
2026-01-10 09:02:41 -05:00
Timothy Flynn
04171d42f0 LibHTTP: Prefix disk cache debug messages with "[disk]" text
A future commit will format memory cache debug messages similarly to the
disk cache messages. To make it easy to read them both at a glance when
both debug flags are turned on, let's add a prefix to these messages.
2026-01-10 09:02:41 -05:00
Timothy Flynn
0d99d54c46 LibHTTP+LibWeb: Do not cache range requests (for now)
We currently do not handle responses for range requests at all in our
HTTP caches. This means if we issue a request for a range of bytes=1-10,
that response will be served to a subsequent request for a range of
bytes=10-20. This is obviously invalid - so until we handle these
requests, just don't cache them for now.
2026-01-08 11:59:12 +01:00