ladybird/Services/RequestServer/Cache/CacheIndex.h

/*
 * Copyright (c) 2025, Tim Flynn <trflynn89@ladybird.org>
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */

#pragma once

#include <AK/Error.h>
#include <AK/HashMap.h>
#include <AK/String.h>
#include <AK/Time.h>
#include <AK/Types.h>
#include <LibDatabase/Database.h>

namespace RequestServer {

// The cache index is a SQL database containing metadata about each cache entry. An entry in the index is created once
// the entire cache entry has been successfully written to disk.
class CacheIndex {
    struct Entry {
        u64 cache_key { 0 };

        String url;
        u64 data_size { 0 };

        UnixDateTime request_time;
        UnixDateTime response_time;
        UnixDateTime last_access_time;
    };

public:
    static ErrorOr<CacheIndex> create(Database::Database&);

    void create_entry(u64 cache_key, String url, u64 data_size, UnixDateTime request_time, UnixDateTime response_time);
    void remove_entry(u64 cache_key);
    void remove_all_entries();

    Optional<Entry&> find_entry(u64 cache_key);

    void update_last_access_time(u64 cache_key);

private:
    struct Statements {
        Database::StatementID insert_entry { 0 };
        Database::StatementID remove_entry { 0 };
        Database::StatementID remove_all_entries { 0 };
        Database::StatementID select_entry { 0 };
        Database::StatementID update_last_access_time { 0 };
    };

    CacheIndex(Database::Database&, Statements);

    Database::Database& m_database;
    Statements m_statements;

    HashMap<u64, Entry> m_entries;
};

}

LibRequests+RequestServer: Begin implementing an HTTP disk cache (2025-10-07 19:59:21 -04:00)

This adds a disk cache for HTTP responses received from the network. For now, we take a rather conservative approach to caching. We don't cache a response until we're 100% sure it is cacheable (there are heuristics we can implement in the future based on the absence of specific headers).

The cache is broken into 2 categories of files:

1. An index file. This is a SQL database containing metadata about each cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an amalgamation of all info needed to reconstruct an HTTP response. This includes the status code, headers, body, etc.

A cache entry is created once we receive the headers for a response. The index, however, is not updated at this point. We stream the body into the cache entry as it is received. Once we've successfully cached the entire body, we create an index entry in the database. If any of these steps fails along the way, the cache entry is removed and the index is left untouched.

Subsequent requests are checked for cache hits from the index. If a hit is found, we read just enough of the cache entry to inform WebContent (WC) of the status code and headers. The body of the response is piped to WC via syscalls, such that the transfer happens entirely in the kernel; there is no need to allocate memory for the body in userspace (WC still allocates a buffer to hold the data, of course). If an error occurs while piping the body, we currently error out the request. There is a FIXME to switch to a network request.

Cache hits are also validated for freshness before they are used. If a response has expired, we remove it and its index entry, and proceed with a network request.

LibWebView+RequestServer: Support clearing the HTTP disk cache (2025-10-09 14:24:47 -04:00)

This is a bit of a blunt hammer, but this hooks an action to clear the HTTP disk cache into the existing Clear Cache action. Upon invocation, it stops all existing cache entries from making further progress, and then deletes the entire cache index and all cache files. In the future, we will of course want more fine-grained control over cache deletion, e.g. via an about:history page.