ladybird/Libraries/LibWeb/Fetch/Fetching/FetchedDataReceiver.h
Andreas Kling 37bdcc3488 LibWeb: Support MIME type sniffing for streaming HTTP responses
Previously, when loading a document, we would try to sniff the MIME
type by reading from the response body's source. However, for streaming
HTTP responses, the body source is Empty (the data comes through the
stream instead), so we had no bytes to sniff.

This caused pages like hypr.land (which sends no Content-Type header)
to be misidentified as plain text instead of HTML, since the MIME
sniffing algorithm would receive zero bytes and fall back to the
default type.

The fix captures the first bytes of the response body during fetch,
storing them on the Body object. These bytes are the "resource header"
defined by the MIME Sniffing spec - up to 1445 bytes, which is enough
to identify any MIME type the spec can detect.
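
As a rough sketch of the capture step: a buffer that keeps only the first 1445 bytes it is given and ignores the rest. The names `SniffBuffer`, `append` and `max_resource_header_size` are hypothetical and chosen for illustration; they are not the actual members added to Body.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resource header cap named in the commit message above.
constexpr std::size_t max_resource_header_size = 1445;

// Hypothetical stand-in for the sniff buffer stored on Body.
class SniffBuffer {
public:
    // Appends as many bytes as still fit and returns how many were taken.
    std::size_t append(std::vector<std::uint8_t> const& bytes)
    {
        std::size_t remaining = max_resource_header_size - m_bytes.size();
        std::size_t count = std::min(remaining, bytes.size());
        m_bytes.insert(m_bytes.end(), bytes.begin(),
            bytes.begin() + static_cast<std::ptrdiff_t>(count));
        return count;
    }

    bool is_full() const { return m_bytes.size() == max_resource_header_size; }
    std::vector<std::uint8_t> const& bytes() const { return m_bytes; }

private:
    std::vector<std::uint8_t> m_bytes;
};
```

Once the buffer is full, later chunks are simply dropped from the sniffing point of view; they still reach the stream through the normal path.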

Since bytes may arrive asynchronously during streaming, we use a
callback mechanism: if bytes aren't ready yet when load_document()
needs them, it registers a callback that fires once enough bytes have
been captured (or the stream ends).
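
The callback idea can be sketched roughly as follows, assuming a single waiter at a time. `SniffBytesWaiter`, `on_bytes` and `when_ready` are hypothetical names for illustration only and do not mirror the real LibWeb interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: fires a callback once enough bytes were captured
// or the stream has ended, whichever happens first.
class SniffBytesWaiter {
public:
    using Callback = std::function<void(std::vector<std::uint8_t> const&)>;

    explicit SniffBytesWaiter(std::size_t wanted)
        : m_wanted(wanted)
    {
    }

    // Called for every chunk that arrives from the network.
    void on_bytes(std::vector<std::uint8_t> const& bytes, bool stream_ended)
    {
        for (auto byte : bytes) {
            if (m_captured.size() == m_wanted)
                break;
            m_captured.push_back(byte);
        }
        m_stream_ended = m_stream_ended || stream_ended;
        fire_if_ready();
    }

    // Called by the consumer; runs immediately if the bytes are already there.
    void when_ready(Callback callback)
    {
        m_callback = std::move(callback);
        fire_if_ready();
    }

private:
    bool is_ready() const { return m_stream_ended || m_captured.size() == m_wanted; }

    void fire_if_ready()
    {
        if (!is_ready() || !m_callback)
            return;
        // Clear the stored callback first so it can only fire once.
        auto callback = std::move(m_callback);
        m_callback = nullptr;
        callback(m_captured);
    }

    std::size_t m_wanted { 0 };
    std::vector<std::uint8_t> m_captured;
    bool m_stream_ended { false };
    Callback m_callback;
};
```

Handling both orders (bytes first, or callback first) in one `fire_if_ready()` keeps the consumer free of any "is it ready yet?" branching.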

The flow is:
1. FetchedDataReceiver receives network bytes, buffers them
2. When Body is created, buffered bytes are flushed to Body's sniff
   buffer, and subsequent bytes are appended as they arrive
3. Before calling load_document(), Navigable waits for sniff bytes
4. load_document() passes the bytes to MimeSniff::Resource::sniff()
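
To illustrate why step 4 needs real bytes, here is a heavily simplified stand-in for the sniffing step. It is not MimeSniff::Resource::sniff(): it checks only two HTML signatures and assumes text/plain as the fallback, whereas the MIME Sniffing spec defines many more patterns. `sniff_mime_type` and its helper are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive match of an upper-case pattern at the given offset.
static bool matches_at(std::string const& text, std::size_t offset, std::string const& pattern)
{
    if (text.size() - offset < pattern.size())
        return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        auto byte = static_cast<unsigned char>(text[offset + i]);
        if (std::toupper(byte) != static_cast<unsigned char>(pattern[i]))
            return false;
    }
    // The signature must be followed by a space or '>' to count as a tag.
    std::size_t after = offset + pattern.size();
    return after < text.size() && (text[after] == ' ' || text[after] == '>');
}

// Hypothetical, simplified sniffer: HTML if a known signature is found,
// otherwise the assumed default of text/plain.
std::string sniff_mime_type(std::string const& resource_header)
{
    std::size_t offset = 0;
    while (offset < resource_header.size()
        && std::isspace(static_cast<unsigned char>(resource_header[offset])))
        ++offset;

    if (matches_at(resource_header, offset, "<!DOCTYPE HTML")
        || matches_at(resource_header, offset, "<HTML"))
        return "text/html";
    return "text/plain";
}
```

With an empty resource header, no signature can match, so the result is always the fallback. That is the misidentification described above for streaming responses.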
2026-01-24 15:21:26 +01:00


/*
 * Copyright (c) 2024-2026, Tim Flynn <trflynn89@ladybird.org>
 * Copyright (c) 2025, Aliaksandr Kalenik <kalenik.aliaksandr@gmail.com>
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */

#pragma once

#include <AK/ByteBuffer.h>
#include <LibGC/CellAllocator.h>
#include <LibHTTP/Forward.h>
#include <LibJS/Heap/Cell.h>
#include <LibWeb/Forward.h>

namespace Web::Fetch::Fetching {

class FetchedDataReceiver final : public JS::Cell {
    GC_CELL(FetchedDataReceiver, JS::Cell);
    GC_DECLARE_ALLOCATOR(FetchedDataReceiver);

public:
    virtual ~FetchedDataReceiver() override;

    void set_pending_promise(GC::Ref<WebIDL::Promise>);
    void set_response(GC::Ref<Fetch::Infrastructure::Response const> response) { m_response = response; }
    void set_body(GC::Ref<Fetch::Infrastructure::Body> body);

    enum class NetworkState {
        Ongoing,
        Complete,
        Error,
    };
    void handle_network_bytes(ReadonlyBytes, NetworkState);

private:
    FetchedDataReceiver(GC::Ref<Infrastructure::FetchParams const>, GC::Ref<Streams::ReadableStream>, RefPtr<HTTP::MemoryCache>);

    virtual void visit_edges(Visitor& visitor) override;

    void pull_bytes_into_stream();
    void close_stream();

    bool buffer_is_eof() const { return m_pulled_bytes == m_buffer.size(); }
    ByteBuffer copy_unpulled_bytes();

    GC::Ref<Infrastructure::FetchParams const> m_fetch_params;
    GC::Ptr<Fetch::Infrastructure::Response const> m_response;
    GC::Ptr<Fetch::Infrastructure::Body> m_body;
    GC::Ref<Streams::ReadableStream> m_stream;
    GC::Ptr<WebIDL::Promise> m_pending_promise;
    RefPtr<HTTP::MemoryCache> m_http_cache;

    ByteBuffer m_buffer;
    size_t m_pulled_bytes { 0 };

    enum class LifecycleState {
        Receiving,
        CompletePending,
        ReadyToClose,
        Closed,
    };
    LifecycleState m_lifecycle_state { LifecycleState::Receiving };
    bool m_has_unfulfilled_promise { false };
};

}