This is a bit of a blunt hammer, but this hooks an action to clear the
HTTP disk cache into the existing Clear Cache action. Upon invocation,
it stops all existing cache entries from making further progress, and
then deletes the entire cache index and all cache files.
In the future, we will of course want more fine-grained control over
cache deletion, e.g. via an about:history page.
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
This adds an overlay port for curl that adds the features required for
HTTP/3.
This is not quite compatible with the upstream vcpkg.json, because
enabling HTTP/2 makes it use the default SSL backend, which is
sectransp for macOS and schannel on Windows. These backends are not
compatible with ngtcp2. Additionally, we can not build curl with
multiple SSL backends when using ngtcp2.
I couldn't find a way to selectively disable/enable dependencies based
on what features are enabled, so I made HTTP/2 pick OpenSSL in our
overlay port. Upstream vcpkg will likely want to support wolfSSL and
GnuTLS backends for ngtcp2, so they'll be additional work to get this
into upstream.
By default, if multiple requests start to a newly seen origin, curl
will not wait for a connection to open to figure out if the server
supports multiplexing and will instead open a new connection for each
request (including a new TLS session and such)
This is particularly an issue for initial page load, where a complex
website could, for example, request tens of items at once (e.g. a bunch
of scripts).
We can be kinder to servers that support multiplexing by telling curl
to wait till an initial connection is established to determine if
multiplexing is supported.
On my machine and internet connection, this reduces the amount of
connections to github.githubassets.com on initial load of
https://github.com/LadybirdBrowser/ladybird from 12 to 2.
Without these fixes, RequestServer was likely to crash if the client
crashed (e.g. WebContent). This was because there was no error handling
for when writing to the client failed.
This is particularly an issue because RequestServer has shared
instances, so it would then crash every other client of RequestServer.
Then, because another RequestServer instance is not currently spun up,
it becomes impossible to start any clients that need a RequestServer
instance. Recreating a RequestServer should also be handled, but that's
not in the scope of this change.
We can tell curl that we failed to write data to the client and that
the request should be aborted by returning `CURL_WRITEFUNC_ERROR` from
the write callback.
It is also possible for requests to be destroyed with buffered data,
which is normal to happen if the client disappears
(i.e. ConnectionFromClient is destroyed) or the request is cancelled by
the client. We log a warning in case this is not expected, to assist
with debugging related issues.
...to become writable.
Solves triangular deadlock problem that happened in the following case
(copied from https://github.com/LadybirdBrowser/ladybird/issues/1816):
- The WebContent process is spinning on
`send_sync_but_allow_failure` waiting for the UI process to respond
- The UI process is spinning on `send_sync_but_allow_failure`, waiting
for RequestServer to respond
- RequestServer is stuck in this loop, trying to write to the
WebContent's socket file (when I attach to RS, we are always in the
sched_yield call, so we're spinning on EAGAIN).
For me the issue was reliably reproducible on Google Maps and with this
change we no longer deadlock there.
The assertion in `WebSocketImplCurl::did_connect()` keeps failing for
multiple websockets when loading `https://www.speedtest.net/` since
commit 14ebcd4. This fixes that by checking and returning false if
something went wrong and letting the caller function handle it.
The HTTPS server used by WPT will close TLS connections without sending
a "close notify" alert. For responses that did not have a Content-Length
header, curl treats this as an error.
Instead of wrapping all non-movable members of TransportSocket in OwnPtr
to keep it movable, make TransportSocket itself non-movable and wrap it
in OwnPtr.
This is no longer needed since `IPv6Address::from_string` supports
square brackets. After the update to curl, `CURLOPT_RESOLVE` now
supports replacing IPv6 hosts as well.
This has been a longstanding ergonomic issue with our IPC compiler. Non-
trivial types were previously passed by const&. So if we wanted to avoid
expensive copies, we would have to const_cast and move the data.
We now pass ownership of all transferred data to the client subclasses.
This allows us to remove const_cast from these methods, and allows us to
avoid some trivial expensive copies that we didn't bother to const_cast.
Before, if something went wrong with DNS lookup and there were unrelated
records (i.e. not A or AAAA) then we would still attempt to build a
resolve list. This resulted in curl errors related to the option itself
and displayed as "unknown network error" to the user.
Previously, we leaked the `curl_slist`s on every request. This also
validates the pointer we get from `curl_slist_append` before setting the
option.
Also, use the `set_option` helper for CURLOPT_RESOLVE as it will print
when there is an error.
...and make sure it will eventually complete (or fail) by adding a
timeout retry sequence.
Fixes an issue where RequestServer would stick around after exit,
waiting for piled up DNS requests for a long time.