LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
/*
* Copyright ( c ) 2025 , Tim Flynn < trflynn89 @ ladybird . org >
*
* SPDX - License - Identifier : BSD - 2 - Clause
*/
2025-12-01 15:15:45 -05:00
# include <AK/Debug.h>
2025-10-29 15:36:33 -04:00
# include <AK/StringBuilder.h>
2025-11-28 10:04:59 -05:00
# include <LibHTTP/Cache/CacheIndex.h>
# include <LibHTTP/Cache/Utilities.h>
# include <LibHTTP/Cache/Version.h>
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
2025-11-28 10:04:59 -05:00
namespace HTTP {
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
2025-10-29 12:39:08 -04:00
static constexpr u32 CACHE_METADATA_KEY = 12389u ;
2025-11-28 10:04:59 -05:00
static ByteString serialize_headers ( HeaderList const & headers )
2025-10-29 15:36:33 -04:00
{
StringBuilder builder ;
2025-11-26 14:13:23 -05:00
for ( auto const & header : headers ) {
2025-10-29 15:36:33 -04:00
builder . append ( header . name ) ;
builder . append ( ' : ' ) ;
builder . append ( header . value ) ;
builder . append ( ' \n ' ) ;
}
return builder . to_byte_string ( ) ;
}
2025-11-28 10:04:59 -05:00
static NonnullRefPtr < HeaderList > deserialize_headers ( StringView serialized_headers )
2025-10-29 15:36:33 -04:00
{
2025-11-28 10:04:59 -05:00
auto headers = HeaderList : : create ( ) ;
2025-10-29 15:36:33 -04:00
serialized_headers . for_each_split_view ( ' \n ' , SplitBehavior : : Nothing , [ & ] ( StringView serialized_header ) {
auto index = serialized_header . find ( ' : ' ) ;
if ( ! index . has_value ( ) )
return ;
auto name = serialized_header . substring_view ( 0 , * index ) . trim_whitespace ( ) ;
if ( is_header_exempted_from_storage ( name ) )
return ;
auto value = serialized_header . substring_view ( * index + 1 ) . trim_whitespace ( ) ;
2025-11-26 14:13:23 -05:00
headers - > append ( { name , value } ) ;
2025-10-29 15:36:33 -04:00
} ) ;
return headers ;
}
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
ErrorOr < CacheIndex > CacheIndex : : create ( Database : : Database & database )
{
2025-10-29 12:39:08 -04:00
auto create_cache_metadata_table = TRY ( database . prepare_statement ( R " #(
CREATE TABLE IF NOT EXISTS CacheMetadata (
metadata_key INTEGER ,
version INTEGER ,
PRIMARY KEY ( metadata_key )
) ;
) # " sv));
database . execute_statement ( create_cache_metadata_table , { } ) ;
auto read_cache_version = TRY ( database . prepare_statement ( " SELECT version FROM CacheMetadata WHERE metadata_key = ?; " sv ) ) ;
auto cache_version = 0u ;
database . execute_statement (
read_cache_version ,
[ & ] ( auto statement_id ) { cache_version = database . result_column < u32 > ( statement_id , 0 ) ; } ,
CACHE_METADATA_KEY ) ;
if ( cache_version ! = CACHE_VERSION ) {
2025-11-20 15:11:01 -05:00
if ( cache_version ! = 0 )
2025-12-01 15:15:45 -05:00
dbgln_if ( HTTP_DISK_CACHE_DEBUG , " \033 [31;1mDisk cache version mismatch: \033 [0m stored version = {}, new version = {} " , cache_version , CACHE_VERSION ) ;
2025-10-29 12:39:08 -04:00
// FIXME: We should more elegantly handle minor changes, i.e. use ALTER TABLE to add fields to CacheIndex.
auto delete_cache_index_table = TRY ( database . prepare_statement ( " DROP TABLE IF EXISTS CacheIndex; " sv ) ) ;
database . execute_statement ( delete_cache_index_table , { } ) ;
auto set_cache_version = TRY ( database . prepare_statement ( " INSERT OR REPLACE INTO CacheMetadata VALUES (?, ?); " sv ) ) ;
database . execute_statement ( set_cache_version , { } , CACHE_METADATA_KEY , CACHE_VERSION ) ;
}
auto create_cache_index_table = TRY ( database . prepare_statement ( R " #(
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
CREATE TABLE IF NOT EXISTS CacheIndex (
cache_key INTEGER ,
url TEXT ,
2025-10-29 15:36:33 -04:00
response_headers TEXT ,
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
data_size INTEGER ,
request_time INTEGER ,
response_time INTEGER ,
last_access_time INTEGER ,
PRIMARY KEY ( cache_key )
2025-10-29 12:39:08 -04:00
) ;
) # " sv));
database . execute_statement ( create_cache_index_table , { } ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
Statements statements { } ;
2025-10-29 15:36:33 -04:00
statements . insert_entry = TRY ( database . prepare_statement ( " INSERT OR REPLACE INTO CacheIndex VALUES (?, ?, ?, ?, ?, ?, ?); " sv ) ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
statements . remove_entry = TRY ( database . prepare_statement ( " DELETE FROM CacheIndex WHERE cache_key = ?; " sv ) ) ;
2025-11-02 16:44:30 -05:00
statements . remove_entries_accessed_since = TRY ( database . prepare_statement ( " DELETE FROM CacheIndex WHERE last_access_time >= ? RETURNING cache_key; " sv ) ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
statements . select_entry = TRY ( database . prepare_statement ( " SELECT * FROM CacheIndex WHERE cache_key = ?; " sv ) ) ;
2025-10-28 17:09:35 -04:00
statements . update_response_headers = TRY ( database . prepare_statement ( " UPDATE CacheIndex SET response_headers = ? WHERE cache_key = ?; " sv ) ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
statements . update_last_access_time = TRY ( database . prepare_statement ( " UPDATE CacheIndex SET last_access_time = ? WHERE cache_key = ?; " sv ) ) ;
2025-11-02 13:10:27 -05:00
statements . estimate_cache_size_accessed_since = TRY ( database . prepare_statement ( " SELECT SUM(data_size) + SUM(OCTET_LENGTH(response_headers)) FROM CacheIndex WHERE last_access_time >= ?; " sv ) ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
return CacheIndex { database , statements } ;
}
CacheIndex : : CacheIndex ( Database : : Database & database , Statements statements )
: m_database ( database )
, m_statements ( statements )
{
}
2025-11-28 10:04:59 -05:00
void CacheIndex : : create_entry ( u64 cache_key , String url , NonnullRefPtr < HeaderList > response_headers , u64 data_size , UnixDateTime request_time , UnixDateTime response_time )
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
{
auto now = UnixDateTime : : now ( ) ;
2025-11-26 14:13:23 -05:00
for ( size_t i = 0 ; i < response_headers - > headers ( ) . size ( ) ; ) {
auto const & header = response_headers - > headers ( ) [ i ] ;
2025-11-16 11:49:06 -05:00
if ( is_header_exempted_from_storage ( header . name ) )
2025-11-26 14:13:23 -05:00
response_headers - > delete_ ( header . name ) ;
2025-11-16 11:49:06 -05:00
else
+ + i ;
}
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
Entry entry {
. cache_key = cache_key ,
. url = move ( url ) ,
2025-10-29 15:36:33 -04:00
. response_headers = move ( response_headers ) ,
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
. data_size = data_size ,
. request_time = request_time ,
. response_time = response_time ,
. last_access_time = now ,
} ;
2025-11-28 10:04:59 -05:00
m_database - > execute_statement ( m_statements . insert_entry , { } , entry . cache_key , entry . url , serialize_headers ( entry . response_headers ) , entry . data_size , entry . request_time , entry . response_time , entry . last_access_time ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
m_entries . set ( cache_key , move ( entry ) ) ;
}
void CacheIndex : : remove_entry ( u64 cache_key )
{
2025-11-28 10:04:59 -05:00
m_database - > execute_statement ( m_statements . remove_entry , { } , cache_key ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
m_entries . remove ( cache_key ) ;
}
2025-11-02 16:44:30 -05:00
void CacheIndex : : remove_entries_accessed_since ( UnixDateTime since , Function < void ( u64 cache_key ) > on_entry_removed )
2025-10-09 14:24:47 -04:00
{
2025-11-28 10:04:59 -05:00
m_database - > execute_statement (
2025-11-02 16:44:30 -05:00
m_statements . remove_entries_accessed_since ,
[ & ] ( auto statement_id ) {
2025-11-28 10:04:59 -05:00
auto cache_key = m_database - > result_column < u64 > ( statement_id , 0 ) ;
2025-11-02 16:44:30 -05:00
m_entries . remove ( cache_key ) ;
on_entry_removed ( cache_key ) ;
} ,
since ) ;
2025-10-09 14:24:47 -04:00
}
2025-11-28 10:04:59 -05:00
void CacheIndex : : update_response_headers ( u64 cache_key , NonnullRefPtr < HeaderList > response_headers )
2025-10-28 17:09:35 -04:00
{
auto entry = m_entries . get ( cache_key ) ;
if ( ! entry . has_value ( ) )
return ;
2025-11-28 10:04:59 -05:00
m_database - > execute_statement ( m_statements . update_response_headers , { } , serialize_headers ( response_headers ) , cache_key ) ;
2025-10-28 17:09:35 -04:00
entry - > response_headers = move ( response_headers ) ;
}
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
void CacheIndex : : update_last_access_time ( u64 cache_key )
{
auto entry = m_entries . get ( cache_key ) ;
if ( ! entry . has_value ( ) )
return ;
auto now = UnixDateTime : : now ( ) ;
2025-11-28 10:04:59 -05:00
m_database - > execute_statement ( m_statements . update_last_access_time , { } , now , cache_key ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
entry - > last_access_time = now ;
}
Optional < CacheIndex : : Entry & > CacheIndex : : find_entry ( u64 cache_key )
{
if ( auto entry = m_entries . get ( cache_key ) ; entry . has_value ( ) )
return entry ;
2025-11-28 10:04:59 -05:00
m_database - > execute_statement (
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
m_statements . select_entry , [ & ] ( auto statement_id ) {
int column = 0 ;
2025-11-28 10:04:59 -05:00
auto cache_key = m_database - > result_column < u64 > ( statement_id , column + + ) ;
auto url = m_database - > result_column < String > ( statement_id , column + + ) ;
auto response_headers = m_database - > result_column < ByteString > ( statement_id , column + + ) ;
auto data_size = m_database - > result_column < u64 > ( statement_id , column + + ) ;
auto request_time = m_database - > result_column < UnixDateTime > ( statement_id , column + + ) ;
auto response_time = m_database - > result_column < UnixDateTime > ( statement_id , column + + ) ;
auto last_access_time = m_database - > result_column < UnixDateTime > ( statement_id , column + + ) ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
2025-10-29 15:36:33 -04:00
Entry entry { cache_key , move ( url ) , deserialize_headers ( response_headers ) , data_size , request_time , response_time , last_access_time } ;
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
m_entries . set ( cache_key , move ( entry ) ) ;
} ,
cache_key ) ;
return m_entries . get ( cache_key ) ;
}
2025-11-28 10:04:59 -05:00
Requests : : CacheSizes CacheIndex : : estimate_cache_size_accessed_since ( UnixDateTime since )
2025-11-02 13:10:27 -05:00
{
Requests : : CacheSizes sizes ;
2025-11-28 10:04:59 -05:00
m_database - > execute_statement (
2025-11-02 13:10:27 -05:00
m_statements . estimate_cache_size_accessed_since ,
2025-11-28 10:04:59 -05:00
[ & ] ( auto statement_id ) { sizes . since_requested_time = m_database - > result_column < u64 > ( statement_id , 0 ) ; } ,
2025-11-02 13:10:27 -05:00
since ) ;
2025-11-28 10:04:59 -05:00
m_database - > execute_statement (
2025-11-02 13:10:27 -05:00
m_statements . estimate_cache_size_accessed_since ,
2025-11-28 10:04:59 -05:00
[ & ] ( auto statement_id ) { sizes . total = m_database - > result_column < u64 > ( statement_id , 0 ) ; } ,
2025-11-02 13:10:27 -05:00
UnixDateTime : : earliest ( ) ) ;
return sizes ;
}
LibRequests+RequestServer: Begin implementing an HTTP disk cache
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps failed along the way, the cache entry is removed and the index is
left untouched.
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
2025-10-07 19:59:21 -04:00
}