LibRequests+RequestServer: Begin implementing an HTTP disk cache

This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).

The cache is broken into 2 categories of files:

1. An index file. This is a SQL database containing metadata about each
   cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
   amalgamation of all info needed to reconstruct an HTTP response. This
   includes the status code, headers, body, etc.

A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps fail along the way, the cache entry is removed and the index is
left untouched.
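
The write-then-index ordering described above can be sketched in standalone C++. This is only an illustration under stated assumptions, not the code in this commit: `SketchIndex` (an in-memory map standing in for the SQL index), `sketch_store_response`, and the `fail_after` parameter used to simulate a mid-stream failure are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the SQL index: cache key -> URL.
using SketchIndex = std::unordered_map<std::uint64_t, std::string>;

// Streams the body chunks into the file at `path`. The index is only updated
// once every chunk has been written; on any failure the file is removed and
// the index is left untouched. A non-negative `fail_after` simulates a failure
// after that many chunks have been written.
bool sketch_store_response(SketchIndex& index, std::string const& path, std::uint64_t cache_key,
    std::string const& url, std::vector<std::string> const& body_chunks, int fail_after = -1)
{
    assert(!path.empty());

    bool ok = false;
    {
        std::ofstream file(path, std::ios::binary | std::ios::trunc);
        ok = file.is_open();

        int chunks_written = 0;
        for (auto const& chunk : body_chunks) {
            if (!ok)
                break;
            if (chunks_written == fail_after) {
                ok = false;
                break;
            }
            file.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
            ok = file.good();
            ++chunks_written;
        }

        if (ok) {
            file.flush();
            ok = file.good();
        }
    } // The file is closed here, before we decide whether to keep it.

    if (!ok) {
        // Roll back: drop the partial cache file, do not touch the index.
        std::remove(path.c_str());
        return false;
    }

    // Commit: the entry becomes visible to lookups only now.
    index[cache_key] = url;
    return true;
}
```

Because the index is written last, a lookup can never return an entry whose file is only partially written.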

Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.

Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
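
The freshness validation mentioned in the commit message can be illustrated with the age calculation from RFC 9111, section 4.2.3. This is a minimal standalone sketch, assuming times expressed as plain seconds; `SketchResponseTimes`, `sketch_current_age`, and `sketch_lifetime_status` are hypothetical names and are not the `calculate_age` or `cache_lifetime_status` helpers used in the source below.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Seconds = std::chrono::seconds;

// Hypothetical inputs, all expressed as seconds on the same clock.
struct SketchResponseTimes {
    Seconds date_value;    // Value of the Date response header.
    Seconds age_value;     // Value of the Age response header (0 if absent).
    Seconds request_time;  // When the request was sent.
    Seconds response_time; // When the response was received.
};

// Current age per RFC 9111, section 4.2.3.
Seconds sketch_current_age(SketchResponseTimes const& times, Seconds now)
{
    assert(times.response_time >= times.request_time);
    assert(now >= times.response_time);

    auto apparent_age = std::max(Seconds { 0 }, times.response_time - times.date_value);
    auto response_delay = times.response_time - times.request_time;
    auto corrected_age_value = times.age_value + response_delay;
    auto corrected_initial_age = std::max(apparent_age, corrected_age_value);
    auto resident_time = now - times.response_time;

    return corrected_initial_age + resident_time;
}

enum class SketchLifetimeStatus {
    Fresh,
    Expired,
};

// A response is fresh while its freshness lifetime exceeds its current age.
SketchLifetimeStatus sketch_lifetime_status(Seconds freshness_lifetime, Seconds current_age)
{
    return freshness_lifetime > current_age ? SketchLifetimeStatus::Fresh : SketchLifetimeStatus::Expired;
}
```

A caller would treat `Expired` as a cache miss: remove the entry and its index row, then fall back to the network.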
/*
 * Copyright (c) 2025, Tim Flynn <trflynn89@ladybird.org>
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */

#include <AK/HashFunctions.h>
#include <AK/ScopeGuard.h>
#include <LibCore/System.h>
#include <LibFileSystem/FileSystem.h>
#include <LibHTTP/Cache/CacheEntry.h>
#include <LibHTTP/Cache/CacheIndex.h>
#include <LibHTTP/Cache/DiskCache.h>
#include <LibHTTP/Cache/Utilities.h>

namespace HTTP {

ErrorOr<CacheHeader> CacheHeader::read_from_stream(Stream& stream)
{
    CacheHeader header;
    header.magic = TRY(stream.read_value<u32>());
    header.version = TRY(stream.read_value<u32>());
    header.key_hash = TRY(stream.read_value<u32>());
    header.url_size = TRY(stream.read_value<u32>());
    header.url_hash = TRY(stream.read_value<u32>());
    header.status_code = TRY(stream.read_value<u32>());
    header.reason_phrase_size = TRY(stream.read_value<u32>());
    header.reason_phrase_hash = TRY(stream.read_value<u32>());
    return header;
}

ErrorOr<void> CacheHeader::write_to_stream(Stream& stream) const
{
    TRY(stream.write_value(magic));
    TRY(stream.write_value(version));
    TRY(stream.write_value(key_hash));
    TRY(stream.write_value(url_size));
    TRY(stream.write_value(url_hash));
    TRY(stream.write_value(status_code));
    TRY(stream.write_value(reason_phrase_size));
    TRY(stream.write_value(reason_phrase_hash));
    return {};
}

u32 CacheHeader::hash() const
{
    u32 hash = 0;
    hash = pair_int_hash(hash, magic);
    hash = pair_int_hash(hash, version);
    hash = pair_int_hash(hash, key_hash);
    hash = pair_int_hash(hash, url_size);
    hash = pair_int_hash(hash, url_hash);
    hash = pair_int_hash(hash, status_code);
    hash = pair_int_hash(hash, reason_phrase_size);
    hash = pair_int_hash(hash, reason_phrase_hash);
    return hash;
}

ErrorOr<void> CacheFooter::write_to_stream(Stream& stream) const
{
    TRY(stream.write_value(data_size));
    TRY(stream.write_value(header_hash));
    return {};
}

ErrorOr<CacheFooter> CacheFooter::read_from_stream(Stream& stream)
{
    CacheFooter footer;
    footer.data_size = TRY(stream.read_value<u64>());
    footer.header_hash = TRY(stream.read_value<u32>());
    return footer;
}

CacheEntry::CacheEntry(DiskCache& disk_cache, CacheIndex& index, u64 cache_key, String url, LexicalPath path, CacheHeader cache_header)
    : m_disk_cache(disk_cache)
    , m_index(index)
    , m_cache_key(cache_key)
    , m_url(move(url))
    , m_path(move(path))
    , m_cache_header(cache_header)
{
}

void CacheEntry::remove()
{
    (void)FileSystem::remove(m_path.string(), FileSystem::RecursionMode::Disallowed);
    m_index.remove_entry(m_cache_key);
}

void CacheEntry::close_and_destroy_cache_entry()
{
    m_disk_cache.cache_entry_closed({}, *this);
}

ErrorOr<NonnullOwnPtr<CacheEntryWriter>> CacheEntryWriter::create(DiskCache& disk_cache, CacheIndex& index, u64 cache_key, String url, UnixDateTime request_time, AK::Duration current_time_offset_for_testing)
{
    auto path = path_for_cache_key(disk_cache.cache_directory(), cache_key);

    auto unbuffered_file = TRY(Core::File::open(path.string(), Core::File::OpenMode::Write));
    auto file = TRY(Core::OutputBufferedFile::create(move(unbuffered_file)));

    CacheHeader cache_header;
    cache_header.key_hash = u64_hash(cache_key);
    cache_header.url_size = url.byte_count();
    cache_header.url_hash = url.hash();

    return adopt_own(*new CacheEntryWriter { disk_cache, index, cache_key, move(url), move(path), move(file), cache_header, request_time, current_time_offset_for_testing });
}

CacheEntryWriter::CacheEntryWriter(DiskCache& disk_cache, CacheIndex& index, u64 cache_key, String url, LexicalPath path, NonnullOwnPtr<Core::OutputBufferedFile> file, CacheHeader cache_header, UnixDateTime request_time, AK::Duration current_time_offset_for_testing)
    : CacheEntry(disk_cache, index, cache_key, move(url), move(path), cache_header)
    , m_file(move(file))
    , m_request_time(request_time)
    , m_response_time(UnixDateTime::now() + current_time_offset_for_testing)
    , m_current_time_offset_for_testing(current_time_offset_for_testing)
{
}
|
|
|
|
|
|
2025-11-28 10:04:59 -05:00
|
|
|
ErrorOr<void> CacheEntryWriter::write_status_and_reason(u32 status_code, Optional<String> reason_phrase, HeaderList const& response_headers)
|
2025-10-24 10:43:47 -04:00
|
|
|
{
|
|
|
|
|
if (m_marked_for_deletion) {
|
|
|
|
|
close_and_destroy_cache_entry();
|
|
|
|
|
return Error::from_string_literal("Cache entry has been deleted");
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
m_cache_header.status_code = status_code;
|
|
|
|
|
|
|
|
|
|
if (reason_phrase.has_value()) {
|
|
|
|
|
m_cache_header.reason_phrase_size = reason_phrase->byte_count();
|
|
|
|
|
m_cache_header.reason_phrase_hash = reason_phrase->hash();
|
|
|
|
|
}
|
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
auto result = [&]() -> ErrorOr<void> {
|
2025-10-30 07:24:05 -04:00
|
|
|
if (!is_cacheable(status_code, response_headers))
|
2025-10-24 10:43:47 -04:00
|
|
|
return Error::from_string_literal("Response is not cacheable");
|
|
|
|
|
|
2025-11-18 10:36:55 -05:00
|
|
|
auto freshness_lifetime = calculate_freshness_lifetime(status_code, response_headers, m_current_time_offset_for_testing);
|
|
|
|
|
auto current_age = calculate_age(response_headers, m_request_time, m_response_time, m_current_time_offset_for_testing);
|
2025-10-28 17:09:35 -04:00
|
|
|
|
|
|
|
|
// We can cache already-expired responses if there are other cache directives that allow us to revalidate the
|
|
|
|
|
// response on subsequent requests. For example, `Cache-Control: max-age=0, must-revalidate`.
|
|
|
|
|
if (cache_lifetime_status(response_headers, freshness_lifetime, current_age) == CacheLifetimeStatus::Expired)
|
2025-10-24 10:43:47 -04:00
|
|
|
return Error::from_string_literal("Response has already expired");
|
|
|
|
|
|
|
|
|
|
TRY(m_file->write_value(m_cache_header));
|
|
|
|
|
TRY(m_file->write_until_depleted(m_url));
|
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
|
|
|
if (reason_phrase.has_value())
|
2025-10-24 10:43:47 -04:00
|
|
|
TRY(m_file->write_until_depleted(*reason_phrase));
|
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
return {};
|
|
|
|
|
}();
|
|
|
|
|
|
|
|
|
|
if (result.is_error()) {
|
2025-10-29 15:36:33 -04:00
|
|
|
dbgln("\033[31;1mUnable to write status/reason to cache entry for\033[0m {}: {}", m_url, result.error());
|
2025-10-24 10:43:47 -04:00
|
|
|
|
|
|
|
|
remove();
|
|
|
|
|
close_and_destroy_cache_entry();
|
|
|
|
|
|
2025-10-07 19:59:21 -04:00
|
|
|
return result.release_error();
|
|
|
|
|
}
|
|
|
|
|
|
2025-10-24 10:43:47 -04:00
|
|
|
return {};
|
2025-10-07 19:59:21 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ErrorOr<void> CacheEntryWriter::write_data(ReadonlyBytes data)
|
|
|
|
|
{
|
2025-10-09 14:24:47 -04:00
|
|
|
if (m_marked_for_deletion) {
|
2025-10-27 07:31:23 -04:00
|
|
|
close_and_destroy_cache_entry();
|
2025-10-09 14:24:47 -04:00
|
|
|
return Error::from_string_literal("Cache entry has been deleted");
|
|
|
|
|
}
|
|
|
|
|
|
2025-10-07 19:59:21 -04:00
|
|
|
if (auto result = m_file->write_until_depleted(data); result.is_error()) {
|
2025-10-24 10:43:47 -04:00
|
|
|
dbgln("\033[31;1mUnable to write data to cache entry for\033[0m {}: {}", m_url, result.error());
|
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
remove();
|
2025-10-27 07:31:23 -04:00
|
|
|
close_and_destroy_cache_entry();
|
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
return result.release_error();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
m_cache_footer.data_size += data.size();
|
|
|
|
|
return {};
|
|
|
|
|
}
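// Illustrative aside, not part of this file: a minimal standalone sketch of the
// pattern write_data() follows (stream each chunk to the entry file, count the
// bytes written, and remove the partial file on any failure). SketchEntryWriter
// and its members are hypothetical names built on the C++17 standard library;
// they are assumptions for illustration and not the AK/LibCore API used above.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical stand-in for a cache entry writer (assumption, not Ladybird code).
class SketchEntryWriter {
public:
    explicit SketchEntryWriter(std::filesystem::path path)
        : m_path(std::move(path))
        , m_file(m_path, std::ios::binary | std::ios::trunc)
    {
    }

    // Appends one chunk of the body. Returns false on failure, after the
    // partially written file has been removed from disk.
    bool write_data(std::string_view data)
    {
        if (m_marked_for_deletion || !m_file.is_open()) {
            destroy();
            return false;
        }

        m_file.write(data.data(), static_cast<std::streamsize>(data.size()));
        if (!m_file) {
            destroy();
            return false;
        }

        // Only count bytes that were accepted by the stream.
        m_data_size += data.size();
        return true;
    }

    void mark_for_deletion() { m_marked_for_deletion = true; }
    std::uint64_t data_size() const { return m_data_size; }

private:
    // Closes the file and deletes it; errors from removal are ignored.
    void destroy()
    {
        m_file.close();
        std::error_code ec;
        std::filesystem::remove(m_path, ec);
    }

    std::filesystem::path m_path;
    std::ofstream m_file;
    std::uint64_t m_data_size { 0 };
    bool m_marked_for_deletion { false };
};
```

// In this sketch the running byte count plays the role of
// m_cache_footer.data_size: it is only meaningful if every chunk was written.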
|
|
|
|
|
|
2025-11-28 10:04:59 -05:00
|
|
|
ErrorOr<void> CacheEntryWriter::flush(NonnullRefPtr<HeaderList> response_headers)
|
2025-10-07 19:59:21 -04:00
|
|
|
{
|
2025-10-27 07:31:23 -04:00
|
|
|
ScopeGuard guard { [&]() { close_and_destroy_cache_entry(); } };
|
2025-10-07 19:59:21 -04:00
|
|
|
|
2025-10-09 14:24:47 -04:00
|
|
|
if (m_marked_for_deletion)
|
|
|
|
|
return Error::from_string_literal("Cache entry has been deleted");
|
|
|
|
|
|
2025-11-20 15:12:05 -05:00
|
|
|
m_cache_footer.header_hash = m_cache_header.hash();
|
|
|
|
|
|
2025-10-07 19:59:21 -04:00
|
|
|
if (auto result = m_file->write_value(m_cache_footer); result.is_error()) {
|
2025-10-25 08:15:15 -04:00
|
|
|
dbgln("\033[31;1mUnable to flush cache entry for\033[0m {}: {}", m_url, result.error());
|
2025-10-07 19:59:21 -04:00
|
|
|
remove();
|
|
|
|
|
|
|
|
|
|
return result.release_error();
|
|
|
|
|
}
|
|
|
|
|
|
2025-10-30 07:24:05 -04:00
|
|
|
m_index.create_entry(m_cache_key, m_url, move(response_headers), m_cache_footer.data_size, m_request_time, m_response_time);
|
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
dbgln("\033[34;1mFinished caching\033[0m {} ({} bytes)", m_url, m_cache_footer.data_size);
|
|
|
|
|
return {};
|
|
|
|
|
}
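// Illustrative aside, not part of this file: a minimal standalone sketch of the
// kind of header validation the reader performs before trusting an entry
// (magic, version, then a hash check on the stored URL). SketchHeader, the
// constants, and the helper names are hypothetical, and std::hash is only a
// stand-in: it is not guaranteed to be stable across runs or builds, so a real
// on-disk format would need a fixed hash function.

```cpp
#include <cstdint>
#include <cstring>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical fixed-size header; the field layout is an assumption.
struct SketchHeader {
    std::uint32_t magic;
    std::uint32_t version;
    std::uint64_t url_size;
    std::uint64_t url_hash;
};

constexpr std::uint32_t SKETCH_MAGIC = 0xCAC4E001;
constexpr std::uint32_t SKETCH_VERSION = 1;

struct ParseResult {
    std::optional<std::string> url; // set on success
    std::string error;              // set on failure
};

std::uint64_t sketch_hash(std::string_view value)
{
    return std::hash<std::string_view> {}(value);
}

// Serializes a header followed by the URL bytes.
std::string encode_entry(std::string_view url)
{
    SketchHeader header { SKETCH_MAGIC, SKETCH_VERSION, url.size(), sketch_hash(url) };
    std::string out(sizeof(header), '\0');
    std::memcpy(out.data(), &header, sizeof(header));
    out.append(url);
    return out;
}

// Validates each field in order and stops at the first mismatch.
ParseResult decode_entry(std::string_view bytes)
{
    SketchHeader header {};
    if (bytes.size() < sizeof(header))
        return { std::nullopt, "Truncated header" };
    std::memcpy(&header, bytes.data(), sizeof(header));

    if (header.magic != SKETCH_MAGIC)
        return { std::nullopt, "Magic value mismatch" };
    if (header.version != SKETCH_VERSION)
        return { std::nullopt, "Version mismatch" };
    if (bytes.size() - sizeof(header) < header.url_size)
        return { std::nullopt, "Truncated URL" };

    std::string url(bytes.substr(sizeof(header), header.url_size));
    if (sketch_hash(url) != header.url_hash)
        return { std::nullopt, "URL hash mismatch" };

    return { std::move(url), "" };
}
```

// In this sketch the body would start at sizeof(SketchHeader) + url_size, which
// is the same idea as a data offset computed from the header and string sizes.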
|
|
|
|
|
|
2025-11-28 10:04:59 -05:00
|
|
|
ErrorOr<NonnullOwnPtr<CacheEntryReader>> CacheEntryReader::create(DiskCache& disk_cache, CacheIndex& index, u64 cache_key, NonnullRefPtr<HeaderList> response_headers, u64 data_size)
|
2025-10-07 19:59:21 -04:00
|
|
|
{
|
|
|
|
|
auto path = path_for_cache_key(disk_cache.cache_directory(), cache_key);
|
|
|
|
|
|
|
|
|
|
auto file = TRY(Core::File::open(path.string(), Core::File::OpenMode::Read));
|
|
|
|
|
auto fd = file->fd();
|
|
|
|
|
|
|
|
|
|
CacheHeader cache_header;
|
2025-11-18 15:24:24 -05:00
|
|
|
size_t cache_header_size { 0 };
|
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
String url;
|
|
|
|
|
Optional<String> reason_phrase;
|
|
|
|
|
|
|
|
|
|
auto result = [&]() -> ErrorOr<void> {
|
|
|
|
|
cache_header = TRY(file->read_value<CacheHeader>());
|
2025-11-18 15:24:24 -05:00
|
|
|
cache_header_size = TRY(file->tell());
|
2025-10-07 19:59:21 -04:00
|
|
|
|
|
|
|
|
if (cache_header.magic != CacheHeader::CACHE_MAGIC)
|
|
|
|
|
return Error::from_string_literal("Magic value mismatch");
|
2025-10-29 12:39:08 -04:00
|
|
|
if (cache_header.version != CACHE_VERSION)
|
2025-10-07 19:59:21 -04:00
|
|
|
return Error::from_string_literal("Version mismatch");
|
|
|
|
|
|
2025-11-20 15:12:05 -05:00
|
|
|
if (cache_header.key_hash != u64_hash(cache_key))
|
|
|
|
|
return Error::from_string_literal("Key hash mismatch");
|
|
|
|
|
|
2025-10-07 19:59:21 -04:00
|
|
|
url = TRY(String::from_stream(*file, cache_header.url_size));
|
|
|
|
|
if (url.hash() != cache_header.url_hash)
|
|
|
|
|
return Error::from_string_literal("URL hash mismatch");
|
|
|
|
|
|
|
|
|
|
if (cache_header.reason_phrase_size != 0) {
|
|
|
|
|
reason_phrase = TRY(String::from_stream(*file, cache_header.reason_phrase_size));
|
|
|
|
|
if (reason_phrase->hash() != cache_header.reason_phrase_hash)
|
|
|
|
|
return Error::from_string_literal("Reason phrase hash mismatch");
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return {};
|
|
|
|
|
}();
|
|
|
|
|
|
|
|
|
|
if (result.is_error()) {
|
|
|
|
|
(void)FileSystem::remove(path.string(), FileSystem::RecursionMode::Disallowed);
|
|
|
|
|
return result.release_error();
|
|
|
|
|
}
|
|
|
|
|
|
2025-11-18 15:24:24 -05:00
|
|
|
auto data_offset = cache_header_size + cache_header.url_size + cache_header.reason_phrase_size;
|
2025-10-07 19:59:21 -04:00
|
|
|
|
2025-10-30 07:24:05 -04:00
|
|
|
return adopt_own(*new CacheEntryReader { disk_cache, index, cache_key, move(url), move(path), move(file), fd, cache_header, move(reason_phrase), move(response_headers), data_offset, data_size });
|
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
|
|
|
}
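The length-and-hash check applied to the URL and reason phrase above can be sketched in isolation. This is a minimal illustration using only the standard library, not the AK types: `fnv1a` and `read_checked_string` are hypothetical names, and FNV-1a merely stands in for whatever hash `String::hash()` actually computes.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in hash (32-bit FNV-1a); the real cache uses String::hash().
uint32_t fnv1a(std::string const& text)
{
    uint32_t hash = 2166136261u;
    for (unsigned char byte : text) {
        hash ^= byte;
        hash *= 16777619u;
    }
    return hash;
}

// Reads exactly `size` bytes and accepts them only if their hash matches the
// value recorded in the (assumed) cache header. Returns nullopt on a short
// read or a hash mismatch.
std::optional<std::string> read_checked_string(std::istream& in, size_t size, uint32_t expected_hash)
{
    std::string value(size, '\0');
    in.read(value.data(), static_cast<std::streamsize>(size));

    if (static_cast<size_t>(in.gcount()) != size)
        return std::nullopt;
    if (fnv1a(value) != expected_hash)
        return std::nullopt;

    return value;
}
```

A caller would treat a `nullopt` result the same way the reader treats an error here: discard the cache file and fall back to the network.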

CacheEntryReader::CacheEntryReader(DiskCache& disk_cache, CacheIndex& index, u64 cache_key, String url, LexicalPath path, NonnullOwnPtr<Core::File> file, int fd, CacheHeader cache_header, Optional<String> reason_phrase, NonnullRefPtr<HeaderList> response_headers, u64 data_offset, u64 data_size)
    : CacheEntry(disk_cache, index, cache_key, move(url), move(path), cache_header)
    , m_file(move(file))
    , m_fd(fd)
    , m_reason_phrase(move(reason_phrase))
    , m_response_headers(move(response_headers))
    , m_data_offset(data_offset)
    , m_data_size(data_size)
{
}

void CacheEntryReader::revalidation_succeeded(HeaderList const& response_headers)
{
    dbgln("\033[34;1mCache revalidation succeeded for\033[0m {}", m_url);

    update_header_fields(m_response_headers, response_headers);
    m_index.update_response_headers(m_cache_key, m_response_headers);
}

void CacheEntryReader::revalidation_failed()
{
    dbgln("\033[33;1mCache revalidation failed for\033[0m {}", m_url);

    remove();
    close_and_destroy_cache_entry();
}

void CacheEntryReader::pipe_to(int pipe_fd, Function<void(u64)> on_complete, Function<void(u64)> on_error)
{
    VERIFY(m_pipe_fd == -1);
    m_pipe_fd = pipe_fd;

    m_on_pipe_complete = move(on_complete);
    m_on_pipe_error = move(on_error);

    if (m_marked_for_deletion) {
        pipe_error(Error::from_string_literal("Cache entry has been deleted"));
        return;
    }

    m_pipe_write_notifier = Core::Notifier::construct(m_pipe_fd, Core::NotificationType::Write);
    m_pipe_write_notifier->set_enabled(false);

    m_pipe_write_notifier->on_activation = [this]() {
        m_pipe_write_notifier->set_enabled(false);
        pipe_without_blocking();
    };

    pipe_without_blocking();
}

void CacheEntryReader::pipe_without_blocking()
{
    if (m_marked_for_deletion) {
        pipe_error(Error::from_string_literal("Cache entry has been deleted"));
        return;
    }

    auto result = Core::System::transfer_file_through_pipe(m_fd, m_pipe_fd, m_data_offset + m_bytes_piped, m_data_size - m_bytes_piped);

    if (result.is_error()) {
        if (result.error().code() != EAGAIN && result.error().code() != EWOULDBLOCK)
            pipe_error(result.release_error());
        else
            m_pipe_write_notifier->set_enabled(true);

        return;
    }

    m_bytes_piped += result.value();

    if (m_bytes_piped == m_data_size) {
        pipe_complete();
        return;
    }

    pipe_without_blocking();
}
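The resume-on-EAGAIN pattern used by `pipe_without_blocking()` can be illustrated with plain POSIX calls. This is only a sketch under assumptions: it uses `write()` from an in-memory buffer rather than an in-kernel file-to-pipe transfer, and `PumpResult` and `pump_without_blocking` are hypothetical names that do not exist in the code base. The key idea carried over is that progress is tracked in a byte counter, so the transfer can stop when the pipe is full and continue from the same offset once the pipe is writable again.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

enum class PumpResult {
    Complete,   // every byte has been written to the pipe
    WouldBlock, // the pipe is full; retry once it becomes writable
    Failed,     // an unrecoverable error occurred (see errno)
};

// Writes data[bytes_piped..] to a non-blocking pipe, advancing bytes_piped as
// it goes. The caller is expected to wait for writability (for example with a
// notifier or poll()) before calling again after WouldBlock.
PumpResult pump_without_blocking(int pipe_fd, std::vector<char> const& data, size_t& bytes_piped)
{
    while (bytes_piped < data.size()) {
        ssize_t written = ::write(pipe_fd, data.data() + bytes_piped, data.size() - bytes_piped);

        if (written < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return PumpResult::WouldBlock;
            return PumpResult::Failed;
        }

        bytes_piped += static_cast<size_t>(written);
    }

    return PumpResult::Complete;
}
```

In an event loop, a `WouldBlock` result corresponds to re-enabling the write notifier, and `Failed` corresponds to the error path that discards the cache entry.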

void CacheEntryReader::pipe_complete()
{
    if (auto result = read_and_validate_footer(); result.is_error()) {
        dbgln("\033[31;1mError validating cache entry for\033[0m {}: {}", m_url, result.error());
        remove();

        if (m_on_pipe_error)
            m_on_pipe_error(m_bytes_piped);
    } else {
        m_index.update_last_access_time(m_cache_key);

        if (m_on_pipe_complete)
            m_on_pipe_complete(m_bytes_piped);
    }

    close_and_destroy_cache_entry();
}

void CacheEntryReader::pipe_error(Error error)
{
    dbgln("\033[31;1mError transferring cache to pipe for\033[0m {}: {}", m_url, error);

    // FIXME: We may not want to actually remove the cache file for all errors. For now, let's assume the file is not
    //        useable at this point and remove it.
    remove();

    if (m_on_pipe_error)
        m_on_pipe_error(m_bytes_piped);

    close_and_destroy_cache_entry();
}

ErrorOr<void> CacheEntryReader::read_and_validate_footer()
{
    TRY(m_file->seek(m_data_offset + m_data_size, SeekMode::SetPosition));
    m_cache_footer = TRY(m_file->read_value<CacheFooter>());

    if (m_cache_footer.data_size != m_data_size)
        return Error::from_string_literal("Invalid data size in footer");
    if (m_cache_footer.header_hash != m_cache_header.hash())
        return Error::from_string_literal("Invalid header hash in footer");

    return {};
}
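The footer check can be sketched on an in-memory buffer. The layout below (header bytes, then body, then a fixed-size footer) follows what `read_and_validate_footer()` seeks to, but the `Footer` struct, its field widths, and `validate_footer` are assumptions made for illustration; the real `CacheFooter` definition is not shown in this section.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Assumed footer layout: both fields are 64-bit here to avoid padding.
struct Footer {
    uint64_t data_size;
    uint64_t header_hash;
};

// Returns an error message if the footer is missing or disagrees with what the
// reader expects, or nullopt if the entry looks intact.
std::optional<std::string> validate_footer(std::string const& entry, uint64_t data_offset, uint64_t data_size, uint64_t expected_header_hash)
{
    uint64_t footer_offset = data_offset + data_size;
    if (footer_offset > entry.size() || entry.size() - footer_offset < sizeof(Footer))
        return "Truncated cache entry";

    Footer footer {};
    std::memcpy(&footer, entry.data() + footer_offset, sizeof(Footer));

    if (footer.data_size != data_size)
        return "Invalid data size in footer";
    if (footer.header_hash != expected_header_hash)
        return "Invalid header hash in footer";

    return std::nullopt;
}
```

Because the footer is only written once the whole body has been stored, a missing or mismatched footer suggests an entry that was interrupted or modified, which is why the reader removes it.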

}