LibWeb: Throw out decoded UTF-32 data in HTMLTokenizer after parser runs

This saves quite a bit of memory on many pages, since UTF-32 uses 4 bytes per code point. For example, it reduces the memory footprint on https://gymgrossisten.com/ by 2 MiB.
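Here is a small standalone sketch of the 4-bytes-per-code-point cost. It is not Ladybird code. It compares the size of a UTF-8 input with the size of the same text decoded into one 32-bit unit per code point. The helper names are hypothetical, and the sketch assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Counts code points in valid UTF-8 by skipping continuation bytes (0b10xxxxxx).
std::size_t count_code_points(std::string const& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Size of the same text once decoded into one 32-bit unit per code point.
std::size_t decoded_utf32_bytes(std::string const& utf8)
{
    return count_code_points(utf8) * sizeof(std::uint32_t);
}

// How many times larger the decoded UTF-32 copy is than the UTF-8 input.
double utf32_overhead(std::string const& utf8)
{
    assert(!utf8.empty());
    return static_cast<double>(decoded_utf32_bytes(utf8)) / static_cast<double>(utf8.size());
}
```

For ASCII-only markup, `utf32_overhead` returns 4.0: the decoded copy is four times the size of the input. Text with multi-byte UTF-8 sequences has a smaller ratio. Either way, keeping a fully decoded copy alive after the parser no longer needs it is pure overhead.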
Author: https://github.com/awesomekling
Commit: 3593c3b687
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/6561
2025-12-08 06:09:58 +00:00 · 2025-10-23 21:45:00 +02:00 · 2025-10-23 21:45:00 +02:00 · 3593c3b687 · 2025-10-24 06:54:24 +00:00
commit 3593c3b687
parent b10f2993b3
3 changed files with 18 additions and 0 deletions
--- a/Libraries/LibWeb/HTML/Parser/HTMLTokenizer.h
+++ b/Libraries/LibWeb/HTML/Parser/HTMLTokenizer.h
@@ -145,6 +145,8 @@ public:
    // This permanently cuts off the tokenizer input stream.
    void abort() { m_aborted = true; }

+    void parser_did_run(Badge<HTMLParser>);
+
 private:
    void skip(size_t count);
    Optional<u32> next_code_point(StopAtInsertionPoint);
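This hunk only declares `parser_did_run(Badge<HTMLParser>)`. Its body is in one of the other changed files, which this excerpt does not show. The sketch below is a hypothetical, self-contained model of the idea, not the actual patch. It makes three assumptions:

- `Badge<T>` is re-declared locally as a passkey type that only `T` can construct.
- The tokenizer stores its decoded input in a `std::vector<std::uint32_t>`.
- The condition for discarding the buffer is that every decoded code point has been consumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Minimal passkey type modeled on the idea behind AK's Badge<T>: only T may
// construct one. The constructor is user-provided (not `= default`), so the
// type is never an aggregate and `Badge<T>{}` cannot bypass the private access.
template<typename T>
class Badge {
    friend T;
    constexpr Badge() { }
};

class HTMLParser;

// Hypothetical, heavily simplified tokenizer that keeps its input as UTF-32.
class HTMLTokenizer {
public:
    explicit HTMLTokenizer(std::vector<std::uint32_t> decoded_input)
        : m_decoded_input(std::move(decoded_input))
    {
    }

    std::optional<std::uint32_t> next_code_point()
    {
        if (m_position >= m_decoded_input.size())
            return std::nullopt;
        return m_decoded_input[m_position++];
    }

    // Callable only by HTMLParser, since nobody else can create a Badge<HTMLParser>.
    void parser_did_run(Badge<HTMLParser>)
    {
        // Assumed condition: keep the buffer while unconsumed code points remain.
        if (m_position < m_decoded_input.size())
            return;
        // Swapping with an empty vector releases the storage; clear() alone would not.
        std::vector<std::uint32_t>().swap(m_decoded_input);
        m_position = 0;
    }

    std::size_t decoded_size() const { return m_decoded_input.size(); }
    std::size_t decoded_capacity() const { return m_decoded_input.capacity(); }

private:
    std::vector<std::uint32_t> m_decoded_input;
    std::size_t m_position { 0 };
};

class HTMLParser {
public:
    explicit HTMLParser(HTMLTokenizer& tokenizer)
        : m_tokenizer(tokenizer)
    {
    }

    void run()
    {
        // Stand-in for tokenization and tree construction: drain the input.
        while (m_tokenizer.next_code_point().has_value()) { }
        // HTMLParser is a friend of Badge<HTMLParser>, so `{}` may construct one here.
        m_tokenizer.parser_did_run({});
    }

private:
    HTMLTokenizer& m_tokenizer;
};
```

Only `HTMLParser` can construct a `Badge<HTMLParser>`. A call such as `tokenizer.parser_did_run({})` from any other class therefore fails to compile, which is how a badge parameter restricts who may trigger the cleanup.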