Commit graph

134 commits

Author SHA1 Message Date
Shannon Booth
6719f01a40 LibHTTP: Treat PSL star-rule domains as public suffixes
Use IncludeStarRule::Yes for cookie public-suffix checks so domains not
explicitly listed in the PSL still get treated as public suffixes via
the implicit * rule. This fixes accepting cookies for bare TLD-like
domains.
2026-06-16 06:14:07 +02:00
Shannon Booth
3f7b31fc78 LibURL: Let Host use PublicSuffixData star rule matching
Ever since PublicSuffixData was created, it was using "no star rule"
matching, which is what is needed for the address bar to distinguish
between a domain and a search. URL::Host on the other hand requires
the fallback star rule. Which rule is needed depends on the use case
of the PSL. Support both use cases by a flag in PublicSuffixData.
2026-06-16 06:14:07 +02:00
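The two matching modes described in the PublicSuffixData commits above can be sketched in a few lines. This is a minimal illustration, not the libpsl-backed implementation: the tiny `RULES` set and the function `find_public_suffix` are hypothetical, and the wildcard and exception rules of the real PSL are left out.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List.
RULES = frozenset({"com", "uk", "co.uk"})


def find_public_suffix(host: str, include_star_rule: bool) -> Optional[str]:
    """Return the longest listed suffix of `host`.

    When nothing is listed and `include_star_rule` is set, fall back to the
    implicit "*" rule, which treats the last label as the public suffix.
    """
    labels = host.lower().rstrip(".").split(".")
    # Candidates run from longest to shortest, so the longest rule wins.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in RULES:
            return candidate
    if include_star_rule and labels[-1]:
        return labels[-1]
    return None
```

With the star rule off, `find_public_suffix("localhost", False)` finds nothing, which suits the address-bar use case. With it on, the same call returns `"localhost"`, which is what the cookie check needs in order to treat bare TLD-like domains as public suffixes.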
Shannon Booth
928007356c LibHTTP: Preserve single-dot cookie domains
I suspect this is not an important case, but since both Firefox and
Chromium implement it, let's match their behaviour. While this does
not match the exact letter of the spec, the relevant WPT test was
added alongside this spec text as part of a spec change trying to
align spec behaviour with Chromium and Firefox, so I believe
what is implemented here to be the intention of the specification
authors.
2026-06-16 06:14:07 +02:00
Sam Atkins
76a10bb82d LibHTTP: Expose cookie date parser
DevTools needs to parse the same cookie expiry timestamp syntax that
Set-Cookie accepts. Expose the existing RFC6265 cookie-date parser so
callers do not need to duplicate a stricter HTTP-date parser.
2026-06-11 15:03:47 +01:00
Shannon Booth
5ef34ecb7a LibURL: Use libpsl for public suffix matching
Replace the generated public suffix table and custom matcher with a
direct LibURL PublicSuffixData implementation backed by libpsl. This
drops our PSL download/generator path and uses the same library already
used by libcurl.

Performance is comparable before and after, while LibURL binary size
is smaller.
2026-06-10 11:43:44 +02:00
Shannon Booth
9b80ac00be LibURL: Move registrable-domain PSL lookup to PublicSuffixData
Move the registrable-domain helper from URL into PublicSuffixData and
name it find_matching_registrable_domain().

This keeps it alongside find_matching_public_suffix(), making it clear
that both APIs only return results matched from the PSL data, while
Host::public_suffix() implements the URL Standard fallback to the
top-level domain.
2026-06-08 20:19:06 +02:00
Andreas Kling
72720bc229 LibWeb: Preserve JS bytecode in memory cache
Store JavaScript bytecode side data in the WebContent HTTP memory
cache and replay it when serving cached responses. Also update an
already-complete memory-cache entry when asynchronous bytecode cache
generation finishes, so the first source-only response does not keep
shadowing the disk-cache sidecar during same-process navigations.

Keep the HTTP memory-cache backfill keyed with the request headers that
populated the memory-cache entry, so Vary responses still receive their
generated bytecode sidecar.

Add LibHTTP coverage for round-tripping bytecode side data through a
memory-cache entry, attaching it after the response body has already
been cached, and matching Vary headers during updates. Add LibWeb
coverage for preserving the memory-cache request headers when cloning
responses.
2026-06-06 09:15:09 +02:00
Andreas Kling
164ed80244 Meta: Enable exit-time destructor warnings for libraries
Enable -Wexit-time-destructors for all in-tree library targets and
update process-lifetime library statics so they no longer register
exit-time destructors. Long-lived caches, lookup tables, singleton
registries, and generated constants now use NeverDestroyed or leaked
references where the data is intended to live until process exit.

Update LibWeb, LibLine, and the binding generators so regenerated
sources follow the same rule instead of reintroducing destructed
statics.
2026-06-04 19:20:49 +02:00
Luke Wilde
854d9c7da4 LibWeb+LibHTTP: Consult HSTS preload list before dynamic-store IPC
The preload list is static, immutable data compiled into LibHTTP.
ResourceLoader consults it in-process at the fetch layer before
falling back to the dynamic, per-profile store in the browser
process, so a preloaded host is upgraded without a synchronous IPC.
HSTSStore stays the dynamic store only; the preload list cannot be
unset by a max-age=0 response because it is consulted first, before
the store is ever queried.
2026-05-29 22:23:33 +02:00
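The lookup order in the preload-list commit above can be sketched as follows. All names here (`PRELOADED_HOSTS`, `DynamicHSTSStore`, `should_upgrade_to_https`) are hypothetical, the store is an in-process stand-in for the browser-process store, and subdomain matching is omitted.

```python
import time

# Hypothetical preload list: static data that is never modified at runtime.
PRELOADED_HOSTS = frozenset({"preloaded.example"})


class DynamicHSTSStore:
    """Stand-in for the per-profile store that lives in another process."""

    def __init__(self):
        self._expiry_by_host = {}
        self.lookups = 0  # counts simulated IPC round trips

    def update(self, host, max_age, now=None):
        now = time.time() if now is None else now
        if max_age == 0:
            self._expiry_by_host.pop(host, None)  # max-age=0 removes the policy
        else:
            self._expiry_by_host[host] = now + max_age

    def contains(self, host, now=None):
        now = time.time() if now is None else now
        self.lookups += 1
        expiry = self._expiry_by_host.get(host)
        return expiry is not None and expiry > now


def should_upgrade_to_https(host, store):
    # Check the in-process preload list first. The store (the expensive
    # lookup) is only queried for hosts that are not preloaded.
    if host in PRELOADED_HOSTS:
        return True
    return store.contains(host)
```

Because the preload check happens first, a `max-age=0` update for a preloaded host changes only the dynamic store and never the outcome of the lookup.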
Luke Wilde
08766d47f4 LibWeb+LibHTTP+LibWebView: Implement HSTS
When an HTTPS response carries a Strict-Transport-Security header, the
received policy is now respected. Subsequent HTTP requests to a known
HSTS host are upgraded to HTTPS before the fetch algorithm makes
further decisions such as CORS and mixed content.

Fixes tpexpress.co.uk, where an XHR redirects HTTPS -> HTTP -> HTTPS,
relying on an HSTS policy received on the document response to avoid the
CORS failure.
2026-05-29 22:23:33 +02:00
Ali Mohammad Pur
2419cce8d8 LibWasm+LibWeb: Serialize and save compiled wasm as a cache blob 2026-05-27 09:52:34 +02:00
Ali Mohammad Pur
ab6eac02ff LibHTTP+RequestServer: Add support for synthetic disk-cache entries
This exposes the operation through RequestServer.ipc and RequestClient
so the client can create the stub before accessing a content-keyed
side-data file.
2026-05-27 09:52:34 +02:00
Andreas Kling
efa7c1ec97 LibHTTP: Publish disk cache entries by rename
Write cache entries to a temporary file and rename them over the final
path only after the entry has been fully flushed. This keeps previously
mapped cache bodies tied to their original inode instead of exposing
them to truncation when the same cache key is replaced.

Clean up only the temporary file for incomplete writers so a failed
replacement does not remove the still-valid existing cache entry. Add
coverage that replaces an entry while an old body mapping remains live
and verifies that it still exposes the original bytes.
2026-05-16 08:13:35 +02:00
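The write-then-rename pattern from the commit above, as a minimal sketch. It assumes a POSIX filesystem, where a rename within one directory is atomic and an existing mapping stays attached to the old inode. `publish_entry` is a hypothetical helper, not the LibHTTP API.

```python
import mmap
import os
import tempfile


def publish_entry(final_path: str, body: bytes) -> None:
    """Write `body` to a temporary file, then rename it over `final_path`."""
    directory = os.path.dirname(final_path) or "."
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(body)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # the entry must be fully flushed first
        os.replace(temp_path, final_path)  # readers see the old or the new file
    except BaseException:
        # Clean up only the temporary file; an existing entry stays valid.
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

A body mapped with `mmap.mmap` before a second `publish_entry()` call to the same path keeps exposing the original bytes, while a fresh `open()` of the path sees the replacement.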
Andreas Kling
c0b19ff981 RequestServer: Send bytecode cache sidecars as files
Map JavaScript bytecode cache sidecars from the HTTP disk cache instead
of copying them into anonymous shared buffers while handing response
headers to WebContent. Store the mapped data as ImmutableBytes on the
fetch response so script fetching can decode directly from the mapped
sidecar bytes.

Add LibHTTP coverage for retrieving associated cache data as a mappable
file, alongside the existing byte-buffer retrieval API.
2026-05-16 08:13:35 +02:00
Andreas Kling
8ee6e38fd1 LibHTTP: Report cache reader body sizes
Return the body length stored on CacheEntryReader instead of using the
footer field inherited from CacheEntry. Readers do not populate the
footer until validation, but RequestServer needs the size before then to
choose between pipe transfer and file-backed mapping.
2026-05-16 08:13:35 +02:00
Andreas Kling
ff3c4db3a4 RequestServer: Reuse disk-cache files after downloads
Let CacheEntryWriter expose an explicit body-file handoff after a
successful flush. RequestServer sends that file to WebContent before the
request finishes, so Fetch can replace its retained memory-cache buffer
with a file-backed ImmutableBytes body.

Keep the plain flush path fd-free for existing callers, and add coverage
for mapping the body file returned by the writer. The focused disk and
memory cache web tests continue to pass.
2026-05-16 08:13:35 +02:00
Andreas Kling
54d2395a03 LibWeb: Preserve file-backed HTTP cache bodies
Carry response chunks through LibRequests and ResourceLoader as a
ResponseData wrapper. A disk cache hit can retain its mapped storage,
while ordinary streamed chunks still use borrowed bytes.

Store completed memory cache entries as Core::ImmutableBytes. This lets
mapped disk-cache responses stay file-backed while retained by the HTTP
memory cache, instead of forcing another ByteBuffer copy.
2026-05-16 08:13:35 +02:00
Andreas Kling
b165bdb874 RequestServer: Send disk cache hits as file-backed bodies
Expose validated disk cache entry bodies as fd-backed byte ranges. Use
this for cache hits instead of creating a response socket.

Map received body ranges into ImmutableBytes on the client side. Keep
the streamed path for network responses. Adapt buffered callbacks to
receive ImmutableBytes, so cached responses can stay file-backed.

RequestServer, LibRequests, and LibWebView build with the new IPC
message and callback type.
2026-05-16 08:13:35 +02:00
Timothy Flynn
c1a69e6c9e RequestServer: Delete cache entry files when the cache version is bumped
When we increment the cache version, we previously only wiped the cache
index table; we did not delete the corresponding cache entry files.
2026-05-11 09:37:35 -04:00
Timothy Flynn
f2fc39eabf RequestServer: Log orphaned cache files when HTTP_DISK_CACHE_DEBUG is on
When enabled, this will log all cache entry files which no longer appear
in the cache index.
2026-05-11 09:37:35 -04:00
Andreas Kling
c32b5a3f73 LibWeb+RequestServer: Send cached bytecode with responses
Attach cached JavaScript bytecode sidecars to HTTP response headers so
WebContent can materialize classic and module scripts directly from a
decoded cache blob on cache hits.

Carry the disk cache vary key with the sidecar and reuse it when storing
fresh bytecode, avoiding mismatches against the augmented network
request headers used to create the cache entry.

Keep CORS-filtered module responses intact for status, MIME, and script
creation checks. Read bytecode sidecar data only from the internal
response, and treat decode or materialization failure as a cache miss
that falls back to normal source compilation.
2026-05-06 08:20:06 +02:00
Andreas Kling
f70f485e48 LibHTTP: Store associated data with disk cache entries
Add disk cache helpers that store and retrieve sidecar payloads with
the same cache key and vary key as the HTTP response entry. The first
consumer is JavaScript bytecode.

Delete sidecars when the owning entry is removed, evicted, or replaced
by a fresh response. Count their on-disk size toward the cache budget so
bytecode data cannot grow outside eviction accounting.

Disk cache tests cover sidecar round-trips, replacement cleanup, and
eviction size accounting.
2026-05-06 08:20:06 +02:00
Aliaksandr Kalenik
14dc9e8ca2 LibHTTP: Short-circuit CacheIndex eviction when under its size limit
remove_entries_exceeding_cache_limit() is called after every network
response, but the cache is usually still under budget and nothing needs
to be evicted. Every one of those calls currently still runs the
window-function eviction SQL over the whole CacheIndex table just to
conclude there is nothing to do.

Short-circuit the call when the cache is already within its configured
size limit. To make that check cheap, maintain m_total_estimated_size
as a running total of the cache's estimated byte size, so the no-op
case becomes a single u64 compare and the DB is only touched when
there is real work.

Bookkeeping:
- Seed the total in CacheIndex::create() via a new
  select_total_estimated_size statement (COALESCE(..., 0) so an empty
  index returns 0 rather than NULL).
- Each Entry caches serialized_request_headers_size and
  serialized_response_headers_size so we don't re-serialize to
  recompute its footprint; Entry::estimated_size() centralizes the
  arithmetic.
- create_entry() adds the new entry's size. Any row it displaces is
  removed via DELETE ... RETURNING so the total stays accurate even
  for entries that were never loaded into m_entries.
- remove_entry() and the bulk DELETE statements were extended with
  the same RETURNING clause for the same reason.
- update_response_headers() shifts the total by the signed delta
  between old and new serialized header size.

Also COALESCEs estimate_cache_size_accessed_since over an empty table
to 0 so callers don't have to special-case NULL.
2026-04-19 01:31:37 +02:00
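The running-total idea from the commit above can be sketched with Python's `sqlite3` module. This is a simplification under stated assumptions: the table has a single `size` column instead of separate header and body sizes, displaced rows are found with a SELECT rather than DELETE ... RETURNING, and eviction is a plain loop rather than the window-function query. The `eviction_queries` counter exists only to make the fast path observable.

```python
import sqlite3


class CacheIndex:
    def __init__(self, limit_bytes: int):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE CacheIndex (cache_key INTEGER PRIMARY KEY, "
            "size INTEGER, last_access_time INTEGER)"
        )
        self.limit_bytes = limit_bytes
        self.eviction_queries = 0
        # Seed the running total; COALESCE maps the NULL of an empty table to 0.
        (self.total_estimated_size,) = self.db.execute(
            "SELECT COALESCE(SUM(size), 0) FROM CacheIndex"
        ).fetchone()

    def create_entry(self, cache_key: int, size: int, now: int) -> None:
        # Subtract the size of any row this insert displaces.
        row = self.db.execute(
            "SELECT size FROM CacheIndex WHERE cache_key = ?", (cache_key,)
        ).fetchone()
        if row is not None:
            self.total_estimated_size -= row[0]
        self.db.execute(
            "INSERT OR REPLACE INTO CacheIndex VALUES (?, ?, ?)",
            (cache_key, size, now),
        )
        self.total_estimated_size += size

    def remove_entries_exceeding_cache_limit(self) -> None:
        # Fast path: one integer compare, no SQL at all.
        if self.total_estimated_size <= self.limit_bytes:
            return
        self.eviction_queries += 1
        rows = self.db.execute(
            "SELECT cache_key, size FROM CacheIndex ORDER BY last_access_time"
        ).fetchall()
        for cache_key, size in rows:
            if self.total_estimated_size <= self.limit_bytes:
                break
            self.db.execute(
                "DELETE FROM CacheIndex WHERE cache_key = ?", (cache_key,)
            )
            self.total_estimated_size -= size
```

The important property is that every code path that changes the table also adjusts `total_estimated_size`, so the compare in the fast path can be trusted without consulting the database.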
Aliaksandr Kalenik
d8b28a68cc LibHTTP: Replace loaded cache entries in CacheIndex
create_entry() issues INSERT OR REPLACE in SQL, so the on-disk row is
correctly overwritten when a (cache_key, vary_key) pair is re-inserted.
But the in-memory m_entries vector was only appended to, leaving the
stale Entry alongside the new one. Subsequent find_entry() calls could
then return the old metadata even though the DB had moved on.
2026-04-19 01:31:37 +02:00
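The fix in the commit above amounts to mirroring INSERT OR REPLACE in the in-memory list. A minimal sketch, in which the `Entry` fields and both helper functions are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    cache_key: int
    vary_key: str
    data_size: int


def insert_or_replace(entries: list, new_entry: Entry) -> None:
    """Replace the entry with the same (cache_key, vary_key), else append."""
    new_id = (new_entry.cache_key, new_entry.vary_key)
    for i, existing in enumerate(entries):
        if (existing.cache_key, existing.vary_key) == new_id:
            entries[i] = new_entry  # overwrite the stale entry in place
            return
    entries.append(new_entry)


def find_entry(entries: list, cache_key: int, vary_key: str):
    """Return the first entry matching the key pair, or None."""
    for entry in entries:
        if (entry.cache_key, entry.vary_key) == (cache_key, vary_key):
            return entry
    return None
```

With an append-only list, `find_entry()` would keep returning the first, stale entry for a re-inserted key pair. Replacing in place keeps the list consistent with the on-disk row.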
Andreas Kling
34d954e2d7 LibRegex: Add ECMAScriptRegex and migrate callers
Add `ECMAScriptRegex`, LibRegex's C++ facade for ECMAScript regexes.

The facade owns compilation, execution, captures, named groups, and
error translation for the Rust backend, which lets callers stop
depending on the legacy parser and matcher types directly. Use it in the
remaining non-LibJS callers: URLPattern, HTML input pattern handling,
and the places in LibHTTP that only needed token validation.

Where a full regex engine was unnecessary, replace those call sites with
direct character checks. Also update focused LibURL, LibHTTP, and WPT
coverage for the migrated callers and corrected surrogate handling.
2026-03-27 17:32:19 +01:00
Shannon Booth
d76330645a LibHTTP: Ignore empty list elements when extracting token headers
It turns out that the validation of header values in db5f16f042
was a bit overaggressive. extract_token_headers previously treated
empty list elements (empty or whitespace-only after trimming) as parse
failures. This is incorrect per RFC 9110, which specifies that
recipients must ignore empty list elements in comma-separated header
values.

> A recipient MUST parse and ignore a reasonable number of empty
> list elements
2026-03-16 13:55:26 +01:00
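The list-parsing rule quoted in the commit above, as a minimal sketch. `extract_token_list` is a hypothetical name, and the sketch validates only the RFC 9110 token grammar, not any header-specific ABNF.

```python
import string

# RFC 9110 "tchar": the characters allowed in a token.
TCHAR = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def extract_token_list(value: str):
    """Return the tokens of a comma-separated value, or None on a parse failure."""
    tokens = []
    for element in value.split(","):
        element = element.strip(" \t")
        if not element:
            continue  # empty list elements are ignored, not rejected
        if any(ch not in TCHAR for ch in element):
            return None  # not a valid token
        tokens.append(element)
    return tokens
```

For example, `extract_token_list("a, ,b,,")` returns `["a", "b"]`, while `extract_token_list("a b")` is still rejected because a space is not a token character.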
Shannon Booth
db5f16f042 LibHTTP: Parse token-list headers according to their ABNF
The previous implementation did not fully align with each
header's ABNF, so it did not reject some headers that it
should have.

Fixes 6 WPT subtests for

https://wpt.live/cors/access-control-expose-headers-parsing.window.html
2026-03-01 18:16:16 +00:00
Timothy Flynn
ef134c940e LibHTTP: Correctly normalize header whitespace in cache utilities
We also shouldn't trim whitespace at all when reading headers from the
cache index. We store them as-is and should therefore read them as-is.
2026-02-26 22:27:46 +01:00
Timothy Flynn
0652a33043 LibHTTP: Return a StringView from HTTP::normalize_header_value
This lets callers that do not need a string avoid a needless allocation.
All callers that do need a string will already either:

* Turn it into a ByteString themselves
* Pass this along to the isomorphic encoder
2026-02-26 22:27:46 +01:00
Timothy Flynn
3b5c5f68bb LibHTTP: Use IdentityHashTraits for HashMaps keyed by the cache key
The cache key itself is already an integral export of a SHA-1 hash of
some request fields. We don't need to hash it again for these maps.
2026-02-24 15:10:59 +01:00
Ben Wiederhake
2e51182560 LibHTTP: Remove unused header in HttpRequest 2026-02-23 12:15:23 +01:00
Ben Wiederhake
7ad95c78af LibHTTP: Remove unused header in HeaderList 2026-02-23 12:15:23 +01:00
Ben Wiederhake
2a369a2a26 LibHTTP: Remove unused header in ParsedCookie 2026-02-23 12:15:23 +01:00
Shannon Booth
2e3c59f791 LibHTTP: Simplify serializing a URL for cache storage
The serialization function already has a flag to skip the fragment
or not.
2026-02-18 12:52:19 -05:00
Timothy Flynn
bda0820b8b LibHTTP: Use a memory-backed database for the disk cache in test modes
This just lets us create fewer cache directories during WPT. We do still
create cache entries on disk, so for WPT, we introduce an extra cache
key to prevent conflicts. There is an existing FIXME about this.
2026-02-15 15:25:30 -05:00
Praise-Garfield
3e719be607 LibHTTP: Handle InvalidURL in parse_error_to_string
The ParseError::InvalidURL variant is returned by from_raw_request()
when a query string cannot be converted to a valid String. However,
parse_error_to_string() does not handle this variant, causing it to
fall through to VERIFY_NOT_REACHED() and crash. This adds the missing
case.
2026-02-14 14:35:01 -05:00
Praise-Garfield
9b8e341828 LibHTTP: Implement the must-understand cache directive
This implements the must-understand response cache directive per RFC
9111 Section 5.2.2.3. When a response contains must-understand, this
cache now ignores the no-store directive for status codes whose
caching behavior it implements. For status codes the cache does not
understand, the response is not stored.
2026-02-14 14:34:34 -05:00
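The interaction between must-understand and no-store described in the commit above can be sketched as a small predicate. The set of understood status codes below is a made-up example, and the function considers only these two directives, not the other storability requirements of RFC 9111.

```python
# Hypothetical set of status codes whose caching behaviour the cache implements.
UNDERSTOOD_STATUS_CODES = frozenset({200, 203, 204, 300, 301, 404, 410})


def may_store_response(status: int, cache_control: set) -> bool:
    """Decide storability from the no-store and must-understand directives only."""
    if "must-understand" in cache_control:
        # must-understand limits caching to status codes the cache understands,
        # and lets such a cache ignore no-store.
        return status in UNDERSTOOD_STATUS_CODES
    return "no-store" not in cache_control
```

So a 200 response with `must-understand, no-store` may be stored, while the same directives on a status code outside the understood set mean the response is not stored.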
Shannon Booth
d3624c328f LibDatabase: Allow creating a memory backed database 2026-02-14 10:25:33 -05:00
Timothy Flynn
7d60d0bfb7 LibHTTP+LibWebView+RequestServer: Allow users to set disk cache limits
This adds a settings box to about:settings to allow users to limit the
disk cache size. This will override the default 5 GiB limit. We do not
automatically delete cache data if the new limit is suddenly less than
the used disk space; this will happen on the next request. This allows
multiple changes to the settings in a row without thrashing the cache.

In the future, we can add more toggles, such as disabling the disk
cache altogether.
2026-02-13 10:20:52 -05:00
Timothy Flynn
16fb2ea3b7 LibHTTP: Impose a limit on singular disk cache entry sizes
Let's not attempt to cache entries that are excessively large. We limit
the cache data size to be 1/8 of the total disk cache limit, with a cap
of 256 MiB.
2026-02-13 10:20:52 -05:00
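The per-entry limit from the commit above is simple arithmetic. A sketch, assuming integer division and using hypothetical names:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

MAX_ENTRY_SIZE_CAP = 256 * MIB


def max_entry_size(total_cache_limit: int) -> int:
    """One eighth of the total disk cache limit, capped at 256 MiB."""
    return min(total_cache_limit // 8, MAX_ENTRY_SIZE_CAP)
```

With the default 5 GiB total limit, one eighth is 640 MiB, so the 256 MiB cap applies. With a 1 GiB limit, the per-entry limit is 128 MiB.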
Timothy Flynn
d773ba25cf LibHTTP: Impose a limit on the total disk cache size
Rather than letting our disk cache grow unbounded, let's impose a limit
on the estimated total disk cache size. The limits chosen are vaguely
inspired by Chromium.

We impose a total disk cache limit of 5 GiB. Chromium imposes an overall
limit of 1.25 GiB; I've chosen more here because we currently cache
uncompressed data from cURL.

The limit is further restricted by the amount of available disk space,
which we just check once at startup (as does Chromium). We will choose a
percentage of the free space available on systems with limited space.

Our eviction errs on the side of simplicity. We will remove the least
recently accessed entries until the total estimated cache size does not
exceed our limit. This could potentially be improved in the future. For
example, if the next entry to consider is 40 MiB, and we only need to
free 1 MiB of space, we could try evicting slightly more recently used
entries. This would prevent evicting more than we need to.
2026-02-13 10:20:52 -05:00
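The eviction policy from the commit above, and the over-eviction case it mentions, can be sketched on plain tuples. The function name and the tuple layout are assumptions made for illustration.

```python
MIB = 1024 ** 2


def evict_least_recently_used(entries, limit):
    """Evict the oldest-accessed entries until the total size fits `limit`.

    `entries` is a list of (key, size, last_access_time) tuples.
    Returns (kept_entries, evicted_keys).
    """
    total = sum(size for _, size, _ in entries)
    kept = sorted(entries, key=lambda entry: entry[2])  # oldest access first
    evicted = []
    while kept and total > limit:
        key, size, _ = kept.pop(0)
        evicted.append(key)
        total -= size
    return kept, evicted
```

Given a 40 MiB entry that is the least recently used and a cache that is only 1 MiB over its limit, this policy evicts the whole 40 MiB entry, which is the trade-off the commit message describes.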
Timothy Flynn
5f2063d5d9 LibHTTP: Include request header length in the estimated disk cache size
Request headers were added in 36a826815d,
but this estimation was not updated.
2026-02-13 10:20:52 -05:00
Praise-Garfield
b270b2cacb LibHTTP: Fix inverted Content-Range complete-length parsing
parse_single_byte_content_range_as_values() has the condition on
consume_specific('*') inverted. When the complete-length is a
numeric value like "1000", the negated check causes the wildcard
branch to run, discarding the length. When it is "*" (unknown),
the else branch tries to parse digits after consuming the "*",
which fails entirely.

Removing the "!" fixes both cases so that "*" correctly produces
an empty complete_length, and numeric values are parsed normally.

Also adds an EOF check after parsing to reject trailing garbage,
matching the pattern used by parse_single_range_header_value().
2026-02-13 09:39:49 +01:00
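The intended behaviour after the fix above can be sketched with a regular expression in place of the hand-written lexer. `parse_content_range` is a hypothetical stand-in for parse_single_byte_content_range_as_values() and handles only the `bytes first-last/complete` form.

```python
import re

# Content-Range = "bytes" SP first-pos "-" last-pos "/" ( complete-length / "*" )
_CONTENT_RANGE = re.compile(r"bytes ([0-9]+)-([0-9]+)/([0-9]+|\*)")


def parse_content_range(value: str):
    """Return (first, last, complete_length or None), or None if malformed."""
    match = _CONTENT_RANGE.fullmatch(value)  # fullmatch rejects trailing garbage
    if match is None:
        return None
    first, last, complete = match.groups()
    # "*" means the complete length is unknown; digits are parsed normally.
    complete_length = None if complete == "*" else int(complete)
    return int(first), int(last), complete_length
```

Here `"bytes 0-499/1000"` yields a complete length of 1000, `"bytes 0-499/*"` yields `None`, and input with anything after the complete-length is rejected.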
Timothy Flynn
d97a3d9b5a LibHTTP+RequestServer: Send revalidation attributes without parsing
The caching RFC is quite strict about the format of date strings. If we
received a revalidation attribute with an invalid date string, we would
previously fail a runtime assertion. This was because to start a
revalidation request, we would simply check for the presence of any
revalidation header; but then when we issued the request, we would fail
to parse the header, and end up with all attributes being null.

We now don't parse the revalidation attributes at all. Whatever we
receive in the Last-Modified response header is what we will send in the
If-Modified-Since request header, verbatim. For better or worse, this is
how other browsers behave. So if the server sends us an invalid date
string, it can receive its own date format for revalidation.
2026-02-10 09:09:53 -05:00
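The verbatim forwarding described in the commit above, as a minimal sketch. The function name is hypothetical, headers are modelled as a plain case-sensitive dict, and only the Last-Modified / If-Modified-Since pair named in the message is shown.

```python
def revalidation_request_headers(cached_response_headers: dict) -> dict:
    """Build conditional request headers from a cached response, unparsed."""
    headers = {}
    last_modified = cached_response_headers.get("Last-Modified")
    if last_modified is not None:
        # Forward the stored value as-is, even if it is not a valid HTTP-date.
        headers["If-Modified-Since"] = last_modified
    return headers
```

Because nothing is parsed, the decision to revalidate and the headers actually sent can no longer disagree: if the header is present, its value is sent.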
Timothy Flynn
d75aee2a56 LibHTTP+LibWeb: Move the IncludeCredentials enum to LibHTTP
This will be sent over IPC to RequestServer in an upcoming patch.
2026-02-10 12:21:20 +01:00
Timothy Flynn
8b10a3a39e LibHTTP: Clean up old cookie code a bit
* Transfer cookie-related enums over IPC as u8, rather than an int
* Use AK's safe ASCII ctype alternatives
* Use SCREAMING_CASE for constants
2026-02-10 12:21:20 +01:00
Timothy Flynn
8d97389038 LibHTTP+Everywhere: Move the cookie implementation to LibHTTP
This will allow parsing cookies outside of LibWeb.

LibHTTP is basically becoming the home of HTTP WG specs.
2026-02-10 12:21:20 +01:00
Timothy Flynn
918f6a4c9f LibHTTP: Ensure we use the Vary key when updating last access time
Apparently, sqlite will fill this placeholder value in with NULL if we
do not pass a value. The query being executed here is:

    UPDATE CacheIndex
    SET last_access_time = ?
    WHERE cache_key = ? AND vary_key = ?;
2026-02-06 16:24:49 +01:00
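Why a NULL placeholder makes the query above a silent no-op can be shown with Python's `sqlite3` module. The table layout is a simplified assumption, and NULL is bound explicitly because Python's binding layer refuses to run a statement with a missing parameter.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE CacheIndex (cache_key INTEGER, vary_key TEXT, last_access_time INTEGER)"
)
db.execute("INSERT INTO CacheIndex VALUES (1, 'vary', 100)")

UPDATE = (
    "UPDATE CacheIndex SET last_access_time = ? "
    "WHERE cache_key = ? AND vary_key = ?"
)

# With NULL bound, `vary_key = NULL` is never true, so no row is updated.
rows_with_null = db.execute(UPDATE, (200, 1, None)).rowcount

# With the real Vary key bound, the row is updated as intended.
rows_with_key = db.execute(UPDATE, (200, 1, "vary")).rowcount
```

The first statement succeeds without error but touches zero rows, which is why the missing binding went unnoticed.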
Zaggy1024
4eb310cd3f LibWeb: Skip range requests for media if the server won't accept them
Currently, this just respects the reported value from Accept-Ranges,
but we could also just try sending a range request and see if the
server rejects it, then fall back to a normal request after. For now,
this is fine, and we can make it use a fallback later if needed.
2026-01-29 05:22:27 -06:00
Zaggy1024
99020b50a3 LibHTTP: Implement extraction of the Content-Range values in HeaderList 2026-01-29 05:22:27 -06:00