Commit graph

179 commits

Author SHA1 Message Date
Shannon Booth
13bae036d8 LibWeb: Do not refer to removed C++ DAFSA code generator 2026-06-05 20:23:24 +02:00
Andreas Kling
164ed80244 Meta: Enable exit-time destructor warnings for libraries
Enable -Wexit-time-destructors for all in-tree library targets and
update process-lifetime library statics so they no longer register
exit-time destructors. Long-lived caches, lookup tables, singleton
registries, and generated constants now use NeverDestroyed or leaked
references where the data is intended to live until process exit.

Update LibWeb, LibLine, and the binding generators so regenerated
sources follow the same rule instead of reintroducing destructed
statics.
2026-06-04 19:20:49 +02:00
Sam Atkins
bb4f8a6621 LibWeb: Track parser-created style sheets
Create parser-blocking style sheets when parser-created `<style>`
elements are popped from the stack of open elements, and ignore dynamic
style updates while those elements are still open in the parser.

Make the shared style-element script-blocking predicate describe the
active style sheet instance. Stale script-blocking entries are removed
when that style sheet is replaced or removed.
2026-06-04 16:39:54 +01:00
R-Goc
520a7c8ebd AK+LibWeb: Centralize FFI helper functions
This commit creates a central FFIHelpers.h header which implements
common conversions from FFI.
2026-05-28 14:15:43 -05:00
Vanand Gasparyan
f3a3488cda Rust: Set import granularity to Item
By default, `rustfmt` preserves the import granularity. In practice, most
Rust code has import granularity "Module" due to LSP's actions.

"Item" gets rid of import groupings and achieves cleaner diffs and
better conflict resolution. Better greppability is a positive side
effect.

Note: it's an unstable rustfmt feature. `cargo +nightly fmt` must be
used instead of `cargo fmt`.
2026-05-28 06:52:18 +02:00
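To illustrate the formatting this commit describes, here is a small sketch of imports laid out at "Item" granularity. It assumes the setting is the rustfmt option `imports_granularity = "Item"`; the helper `group_by_length` is hypothetical and exists only so that both imports are used.

```rust
// Imports written the way "Item" granularity formats them: one `use` line
// per imported item. "Module" granularity would instead group them as
// `use std::collections::{BTreeMap, BTreeSet};`.
use std::collections::BTreeMap;
use std::collections::BTreeSet;

/// Hypothetical helper: groups words by their length in bytes.
pub fn group_by_length(words: &[&str]) -> BTreeMap<usize, BTreeSet<String>> {
    let mut groups: BTreeMap<usize, BTreeSet<String>> = BTreeMap::new();
    for word in words {
        groups.entry(word.len()).or_default().insert(word.to_string());
    }
    groups
}
```

Adding or removing a single import then touches exactly one line, which is where the cleaner diffs and easier conflict resolution come from.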
aplefull
201e380854 LibWeb: Allow UTF-8 detection for file: scheme 2026-05-25 23:45:28 +02:00
Andreas Kling
ca97f68cb7 LibWeb: Normalize decoded HTML string parsing
Preserve leading BOMs when parsing already-decoded HTML strings, since
those strings do not go through the encoded byte decoder path.

Decoded markup from JS strings can also contain WTF-8 for lone surrogate
code units. Keep the common scalar UTF-8 path to a single validation and
copy, but replace surrogates before handing bytes to the Rust tokenizer.

Add text coverage for DOMParser and innerHTML string parsing, including
leading BOMs, text and attributes, lone high and low surrogates, and a
valid surrogate pair.
2026-05-24 10:14:17 +02:00
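The surrogate handling described above can be sketched roughly as follows. This is not the code from the commit: the function name `sanitize_wtf8` is hypothetical, and the sketch assumes lone surrogates arrive in the WTF-8 three-byte form (`ED A0..BF 80..BF`) and are replaced with U+FFFD.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turns possibly-WTF-8 bytes into valid UTF-8 text.
pub fn sanitize_wtf8(bytes: &[u8]) -> Cow<'_, str> {
    // Common path: the input is already scalar UTF-8, so one validation
    // pass is enough and nothing has to be rewritten.
    if let Ok(text) = std::str::from_utf8(bytes) {
        return Cow::Borrowed(text);
    }

    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // WTF-8 encodes a lone surrogate (U+D800..=U+DFFF) as
        // ED A0..BF 80..BF, which is never valid in UTF-8.
        let is_surrogate = bytes[i] == 0xED
            && i + 2 < bytes.len()
            && (0xA0..=0xBF).contains(&bytes[i + 1])
            && (0x80..=0xBF).contains(&bytes[i + 2]);
        if is_surrogate {
            out.extend_from_slice("\u{FFFD}".as_bytes());
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Any other invalid sequence falls back to lossy decoding.
    Cow::Owned(String::from_utf8_lossy(&out).into_owned())
}
```

A valid surrogate pair never reaches the slow path in this sketch, because WTF-8 encodes a pair as the ordinary four-byte UTF-8 sequence for the supplementary code point.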
Shannon Booth
637fd51595 LibWeb: Unify WebIDL C++ type generation
Represent WebIDL C++ types with a single CppType model that tracks
nullability, optional presence, and contained storage.

GC-like values now use GC::Ref/GC::Ptr directly, while containers choose
"plain", "Root", or "Conservative" container types depending on what
they contain. For example, sequence<Element> becomes a RootVector of
GC::Ref values, while sequence<SomeDictionary> becomes a
ConservativeVector only when the dictionary contains GC-like values.
This moves the generated bindings away from wrapping GC values in
GC::Root by default.

This has broad fallout, as the types passed to interfaces for GC
objects change almost across the board.
2026-05-23 18:26:12 +02:00
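The container choice described in this commit can be pictured as a small decision function. The sketch below is an assumption-laden simplification: `ElementKind`, `ContainerKind`, and `container_for` are hypothetical names, and the real CppType model also tracks nullability and optional presence, which are left out here.

```rust
/// Hypothetical, simplified view of what a WebIDL sequence can contain.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElementKind {
    Primitive,
    GcObject,
    Dictionary { contains_gc_values: bool },
}

/// The three container flavours named in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ContainerKind {
    Plain,
    Root,
    Conservative,
}

/// Picks a container flavour from the element kind, following the two
/// examples given in the commit message.
pub fn container_for(element: ElementKind) -> ContainerKind {
    match element {
        // e.g. sequence<Element>: a RootVector of GC::Ref values.
        ElementKind::GcObject => ContainerKind::Root,
        // e.g. sequence<SomeDictionary> whose members hold GC-like values.
        ElementKind::Dictionary { contains_gc_values: true } => ContainerKind::Conservative,
        // Everything else needs no GC-aware storage.
        ElementKind::Primitive | ElementKind::Dictionary { contains_gc_values: false } => {
            ContainerKind::Plain
        }
    }
}
```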
Martin Chrástek
cd3c72dfda LibWeb: Implement chardetng-based encoding detection for HTML parsing 2026-05-23 11:57:33 +02:00
Sam Atkins
34382a2aca LibWeb/HTML: Add missing include for KeywordStyleValue 2026-05-20 13:00:50 +01:00
Andreas Kling
936bb9ca53 LibWeb: Use Rust preload scanner
Replace the C++ speculative HTML parser token walk with the Rust
preload scanner. Keep URL resolution, duplicate suppression, and fetch
issuance in C++ so the scanner only emits base href updates and fetch
candidates.

Use the scanner callback result to stop iteration when the speculative
parser has been stopped.

Update parser comments that still described speculative mock element
production.
2026-05-18 00:23:52 +02:00
Andreas Kling
411c6654e8 LibWeb: Add Rust preload scanner
Add a Rust scanner that walks pending HTML parser input and emits base
href updates or speculative fetch candidates. Keep URL parsing and fetch
issuance in C++ for now, where the Document and request objects live.

Allow the scan callback to stop iteration so the C++ speculative parser
can preserve its stop hook once it is wired up.

Expose a shared Attribute helper for resolving interned local names and
use it from the Rust parser and preload scanner instead of repeating the
same lookup pattern.

Cover link rel handling, preload destination filtering, crossorigin
mapping, and template/foreign-content skipping with Rust unit tests.
2026-05-18 00:23:52 +02:00
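One way to picture a scan callback that can stop iteration is with `std::ops::ControlFlow`. This is only a sketch: `ScanEvent` and `scan` are hypothetical names, and the events are supplied ready-made here, whereas the real scanner derives them from pending parser input.

```rust
use std::ops::ControlFlow;

/// Hypothetical events a preload scanner might emit.
#[derive(Debug, PartialEq, Eq)]
pub enum ScanEvent {
    BaseHref(String),
    FetchCandidate { url: String },
}

/// Delivers events to `callback` until it asks to stop.
/// Returns how many events were delivered.
pub fn scan<F>(events: impl IntoIterator<Item = ScanEvent>, mut callback: F) -> usize
where
    F: FnMut(&ScanEvent) -> ControlFlow<()>,
{
    let mut delivered = 0;
    for event in events {
        delivered += 1;
        // The callback's result decides whether the scan continues, which
        // lets the caller honour a "speculative parser stopped" signal.
        if callback(&event).is_break() {
            break;
        }
    }
    delivered
}
```

Keeping URL resolution and fetch issuance on the caller's side means the scanner itself only needs to know about these two event kinds.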
Andreas Kling
ccf5a278ab LibWeb: Keep deferred document.close cleanup on its parser
document.close() can defer script-created parser cleanup while a
parser-blocking script is pending. If document.open() installs a new
parser before the old parser resumes, the deferred action must clean up
the parser that scheduled it instead of the document's current parser.

Capture that parser before installing the deferred action. This keeps
the parked cleanup from affecting a parser installed by a later
document.open() call.
2026-05-17 15:35:56 +02:00
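The capture-before-deferring idea can be shown with a closure that holds its own reference to the parser. All names below (`Parser`, `Document`, `defer_close_cleanup`, `run_deferred`) are hypothetical stand-ins, not the LibWeb types.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical parser that only records whether it was cleaned up.
#[derive(Default)]
pub struct Parser {
    pub cleaned_up: Cell<bool>,
}

/// Hypothetical document holding the current parser and a parked action.
pub struct Document {
    parser: Rc<Parser>,
    deferred: Option<Box<dyn FnOnce()>>,
}

impl Document {
    pub fn new() -> Self {
        Self { parser: Rc::new(Parser::default()), deferred: None }
    }

    pub fn current_parser(&self) -> Rc<Parser> {
        Rc::clone(&self.parser)
    }

    /// close(): park the cleanup, capturing the parser that scheduled it.
    pub fn defer_close_cleanup(&mut self) {
        let parser = Rc::clone(&self.parser); // captured now, not looked up later
        self.deferred = Some(Box::new(move || parser.cleaned_up.set(true)));
    }

    /// open(): install a fresh parser.
    pub fn open(&mut self) {
        self.parser = Rc::new(Parser::default());
    }

    /// Runs the parked action once the old parser resumes.
    pub fn run_deferred(&mut self) {
        if let Some(action) = self.deferred.take() {
            action();
        }
    }
}
```

If the closure looked up the document's current parser when it ran, a later `open()` would redirect the cleanup to the wrong parser.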
Andreas Kling
29784ea397 LibWeb: Remove the C++ HTML tree builder
Delete the old C++ tree-construction implementation and helper classes
that became unused once the Rust parser became unconditional. Remove the C++
stack of open elements, active formatting elements, speculative mock
element, and tree-builder-only token storage.

Keep the C++ parser entry points that still own LibWeb DOM integration,
encoding detection, tokenizer bridging, incremental parsing, and the
speculative parser support used by resource discovery.
2026-05-17 15:35:56 +02:00
Andreas Kling
a7ece4b062 LibWeb: Make the Rust HTML parser unconditional
Remove the runtime selector between the old C++ tree builder and the new
Rust implementation. Always construct HTML documents and fragments with
the Rust parser now that it matches the existing tests.

Simplify dump-html-tree by dropping the backend option that only made
sense while both parser implementations were available.
2026-05-17 15:35:56 +02:00
Andreas Kling
f49335f210 LibWeb: Align declarative shadow root parsing
Teach the Rust parser to recognize declarative shadow root templates and
pass the parsed mode, slot assignment, clonable, serializable, and
focus-delegation flags to the C++ DOM host.

Expose shadowRootSlotAssignment reflection with the spec-defined named
missing and invalid value defaults, and extend the ShadowDOM text test
coverage for the reflected property and parser-created shadow roots.
2026-05-17 15:35:56 +02:00
Andreas Kling
54879bc916 LibWeb: Complete Rust HTML tree construction
Finish the Rust implementation of the spec tree-construction algorithms
needed by the LibWeb test suite. Add the remaining table modes, foster
parenting, scope helpers, adoption agency handling, ruby/list/form and
select cases, frameset state, foreign-content edge cases, and parser
host callbacks.

Preserve behavior that depends on the C++ DOM integration, including
parser-created custom element reactions, fragment quirks mode, arbitrary
fragment namespaces, template fragment mode, fragment form ownership,
MathML annotation-xml boundaries, contextual fragment scripts, parser
script source positions, document.close() parser state, void-element
insertion, and duplicate attribute tracking.

Add focused tests for the parser edge cases that are easy to regress at
the boundary between the Rust tree builder and the C++ DOM host.
2026-05-17 15:35:56 +02:00
Andreas Kling
de12062515 LibWeb: Wire Rust parser scripts and fragments
Preserve Rust parser state across tokenizer runs and stop cleanly when
a parser-blocking script has to execute. Thread the pending script back
through the existing C++ parser entry point so document.write(), input
insertion points, and script bookkeeping continue to use the normal
LibWeb machinery.

Add the fragment parser setup needed by innerHTML and contextual
fragment parsing, including context elements, form ownership, tokenizer
state selection, text coalescing, and foreign-content integration.
2026-05-17 15:35:56 +02:00
Andreas Kling
2e9875770e LibWeb: Add initial Rust HTML tree construction
Implement the first Rust tree builder pass around the tokenizer and the
LibWeb DOM host hooks. Cover the document setup, insertion-mode
dispatch, ordinary body insertion, basic table handling, active
formatting element reconstruction, and foreign-content routing.

Leave the C++ parser available at runtime so the new path can be tested
against the old implementation while the remaining tree-construction
algorithms are filled in.
2026-05-17 15:35:56 +02:00
Andreas Kling
09296315c2 LibWeb: Add Rust HTML parser host plumbing
Add the C++ and Rust scaffolding that lets the tree builder live in
Rust while the DOM remains owned by LibWeb. Keep the exported surface
small: Rust stores parser state, and C++ provides node creation,
insertion, script, template, and GC hooks.

Route dump-html-tree through the selectable parser backend so the new
implementation can be exercised beside the existing parser while it is
being brought up.
2026-05-17 15:35:56 +02:00
Andreas Kling
5b63cb5f37 LibWeb: Avoid unsafe tokenizer state conversion
Replace the FFI tokenizer state transmute with an explicit conversion
from the incoming numeric value. The old code range-checked against the
last state before transmuting, which matched today's contiguous enum but
left the conversion dependent on that layout detail.

Returning early for unknown values keeps the FFI boundary tolerant while
removing a source of possible invalid enum discriminants.
2026-05-17 15:35:56 +02:00
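A minimal sketch of the explicit conversion, assuming a much smaller state enum than the real tokenizer has. The names `State`, `from_ffi`, and `switch_to` are hypothetical.

```rust
/// Hypothetical subset of tokenizer states; the real enum has many more.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Data = 0,
    Rcdata = 1,
    Rawtext = 2,
    ScriptData = 3,
    Plaintext = 4,
}

impl State {
    /// Explicit mapping from the raw FFI value. Unknown values yield `None`
    /// instead of an invalid discriminant, and nothing here depends on the
    /// variants being numbered contiguously.
    pub fn from_ffi(raw: u8) -> Option<State> {
        match raw {
            0 => Some(State::Data),
            1 => Some(State::Rcdata),
            2 => Some(State::Rawtext),
            3 => Some(State::ScriptData),
            4 => Some(State::Plaintext),
            _ => None,
        }
    }
}

/// Hypothetical FFI-facing setter: returns early for unknown values and
/// leaves the current state untouched. Returns whether the switch happened.
pub fn switch_to(current: &mut State, raw: u8) -> bool {
    let Some(next) = State::from_ffi(raw) else {
        return false;
    };
    *current = next;
    true
}
```

Compared with a range check followed by `transmute`, the `match` keeps working if a future state is given a non-contiguous discriminant.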
Andreas Kling
171e3adf01 LibWeb: Replace the HTML tokenizer with Rust
Replace the C++ HTML tokenizer with a Rust implementation behind the
existing HTMLTokenizer API.

Keep the parser-facing integration points for streaming input,
insertion points, document.write(), EOF insertion, parser aborts,
speculative parser input, and last start tag tracking. The generated
FFI handle stays an implementation detail of HTMLTokenizer, so callers
keep a single tokenizer class.

Preserve duplicate attributes through FFI so C++ token normalization can
record the duplicate-attribute signal used by CSP nonce checks. Keep
bulk tag-name and attribute scans capped at the active insertion point
so streamed parser input is spliced at the right offset.

Use generated DAFSA tables for named character references and intern
common tag and attribute names to reduce FFI marshalling overhead. This
also fixes attribute name source positions, nested old insertion points,
and aborted fast-path handling.

TestHTMLTokenizer covers duplicate attributes and insertion points in
fast tag-name, attribute-name, and quoted-value scans. A CSP text test
covers duplicate nonce attributes on parser-created script elements.
The tokenizer dump fixtures still match, TestHTMLTokenizer passes, and
the full release test-web run passes with 6981 tests and 226 skipped.
2026-05-15 21:01:40 +02:00
Shannon Booth
d595369ae4 LibWeb: Let document.write() reenter parser from parser-blocking scripts
When resuming after an async wait for a pending parser-blocking script,
clear the parser pause flag before executing the script. The spec has
unblocked the tokenizer by this point, and document.write() calls from
the script must be able to synchronously process inserted markup up to
the insertion point.

This fixes ordering for document.write()'d inserted scripts during
external parser-blocking script execution.
2026-05-15 19:49:45 +02:00
Sam Atkins
73c4b77f68 LibWeb/HTML: Support align attributes on table sections and rows
thead, tbody, tfoot, tr, td, and th all have an `align` presentational
attribute with identical definitions. We previously only supported it
for td and th, and also allowed arbitrary text-align values instead of
the 4 dictated by the spec.
2026-04-30 15:20:22 +02:00
Andreas Kling
a538d2b160 Revert "LibWeb: Have speculative HTML parser populate the preload map"
This reverts commit b88cbb1b74.

Appears to be causing large regressions on WPT.
2026-04-30 04:55:31 +02:00
Aliaksandr Kalenik
b88cbb1b74 LibWeb: Have speculative HTML parser populate the preload map
When the regular HTML parser is blocked on an external script, the
speculative parser scans ahead and pre-fetches discoverable
sub-resources. Previously those fetches were tracked only in the
parser's own URL list and never registered in the document's preload
map, so when the regular parser later reached each element, fetch()'s
consume_a_preloaded_resource() lookup found nothing and issued a
duplicate request: every parser-blocked sub-resource was fetched
twice.

issue_speculative_fetch now creates a PreloadEntry, registers it
under create_a_preload_key(request) in the document's preload map,
and supplies a processResponseConsumeBody callback that populates
the entry. The map insertion happens after fetch() starts because
fetch() runs consume_a_preloaded_resource() synchronously, so
registering the entry beforehand would short-circuit the
speculative fetch itself.

The body-handling steps (1, 2, 5 of the preload algorithm's
processResponseConsumeBody) are factored into a shared
deliver_preload_response helper used by both the speculative parser
and HTMLLinkElement::preload.
2026-04-29 15:59:22 +02:00
Aliaksandr Kalenik
4762c4fa5c LibWeb: Add incremental HTML parsing
Introduce IncrementalDocumentParser, which streams the response body
through a TextCodec::StreamingDecoder into the HTMLTokenizer one chunk
at a time. The tokenizer pauses when it runs out of input and resumes
once the next chunk is appended; when the body closes we close the
tokenizer's input stream so it can finish the parse.

DocumentLoading routes HTML responses through the new parser instead of
buffering the full body before handing it to HTMLParser.
2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik
01fa8a27ac LibWeb: Extract HTMLParser::run_until_completion()
Pull the post-parse-action setup, run loop, and post-parse invocation
out of HTMLParser::run(URL, ...) into a new run_until_completion()
method. The URL overload still calls it; behavior is unchanged. The
incremental parser will use this entry point directly without going
through the URL-setting overload.
2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik
f499edefae LibWeb: Track whether HTMLParser is script-created
Add a ScriptCreatedParser flag plumbed through HTMLParser's constructor
and create_for_scripting(). Only document.open()'s parser sets it to
Yes. Document::close() step 3 now checks is_script_created() so it
correctly skips parsers that weren't created via document.open(),
matching the spec.

Previously the check was just `if (!m_parser)`, which incorrectly let
document.close() insert an EOF into a network-driven parser. The bug
was mostly latent because the network parser used to finish quickly,
but it matters once the network parser stays alive for the duration of
a streamed parse.
2026-04-29 04:12:44 +02:00
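The difference between the old and new check can be sketched like this. The types are simplified, hypothetical stand-ins, and `close` here returns a bool purely so the effect is observable.

```rust
/// Mirrors the flag named in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScriptCreatedParser {
    No,
    Yes,
}

/// Hypothetical, heavily simplified parser.
pub struct Parser {
    script_created: ScriptCreatedParser,
    pub eof_inserted: bool,
}

impl Parser {
    pub fn new(script_created: ScriptCreatedParser) -> Self {
        Self { script_created, eof_inserted: false }
    }

    pub fn is_script_created(&self) -> bool {
        self.script_created == ScriptCreatedParser::Yes
    }
}

/// Hypothetical, heavily simplified document.
pub struct Document {
    pub parser: Option<Parser>,
}

impl Document {
    /// Inserts an EOF only into a script-created parser. The previous check
    /// ("is there any parser at all?") would also have hit a network-driven
    /// parser. Returns whether an EOF was inserted.
    pub fn close(&mut self) -> bool {
        match self.parser.as_mut() {
            Some(parser) if parser.is_script_created() => {
                parser.eof_inserted = true;
                true
            }
            _ => false,
        }
    }
}
```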
Aliaksandr Kalenik
c8368882b8 LibWeb: Allow tokenizer to run out of characters mid-tokenization
Add can_run_out_of_characters() and use it in the
NamedCharacterReference state and consume_next_if_match() so that an
open input stream gets the same code-point-at-a-time treatment as an
active document.write insertion point. Without this, a network chunk
that ends partway through a named character reference or a
multi-character match would make the tokenizer commit to a "no match"
decision before the remaining bytes arrive.

No behavior change for existing callers: the new helper still returns
false once the input stream is closed (which the StringView constructor
sets immediately).
2026-04-29 04:12:44 +02:00
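The "do not commit to no match too early" rule for multi-character matches might look like the sketch below. `Match` and `consume_if_match` are hypothetical names, and the comparison is case-sensitive for brevity.

```rust
/// Outcome of trying to match a multi-character sequence.
#[derive(Debug, PartialEq, Eq)]
pub enum Match {
    Yes,
    No,
    NeedMoreInput,
}

/// Hypothetical matcher. `buffer` is the input seen so far and
/// `stream_closed` says whether more input can still arrive.
pub fn consume_if_match(buffer: &str, expected: &str, stream_closed: bool) -> Match {
    if buffer.starts_with(expected) {
        return Match::Yes;
    }
    // The buffer ends partway through the expected text and the stream is
    // still open: wait for the rest instead of deciding "no match".
    if !stream_closed && expected.starts_with(buffer) {
        return Match::NeedMoreInput;
    }
    Match::No
}
```

Once the stream is closed, the third outcome disappears and the function behaves like a plain prefix test, in line with the "no behavior change for existing callers" note.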
Aliaksandr Kalenik
c439f810f2 LibWeb: Track input stream closed state in HTMLTokenizer
Add an explicit "input stream closed" flag plus the streaming-input API
(append_to_input_stream, close_input_stream, is_input_stream_closed) to
let a future incremental driver feed bytes as they arrive. Rewrite
should_pause_before_next_input_character so the tokenizer pauses when
the buffer is exhausted but more bytes may still arrive, including the
case where a chunk ends in CR (CRLF normalization needs one code point
of lookahead).

Existing call sites are unaffected: the StringView constructor
immediately marks the input stream closed, and insert_eof() now also
closes the stream so document.close() drives the same exit path.
2026-04-29 04:12:44 +02:00
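A rough sketch of the pause rule, including the trailing-CR case. `InputStream`, `Next`, and the method names are hypothetical and only loosely modelled on the API listed in the commit message.

```rust
/// Result of asking the stream for the next normalized code point.
#[derive(Debug, PartialEq, Eq)]
pub enum Next {
    Char(char),
    Pause,
    Eof,
}

/// Hypothetical streaming input buffer with an explicit "closed" flag.
#[derive(Default)]
pub struct InputStream {
    chars: Vec<char>,
    pos: usize,
    closed: bool,
}

impl InputStream {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn append(&mut self, chunk: &str) {
        self.chars.extend(chunk.chars());
    }

    pub fn close(&mut self) {
        self.closed = true;
    }

    /// Returns the next code point with CRLF and lone CR normalized to LF.
    pub fn next_code_point(&mut self) -> Next {
        let Some(&c) = self.chars.get(self.pos) else {
            // Buffer exhausted: pause while more input may still arrive.
            return if self.closed { Next::Eof } else { Next::Pause };
        };
        if c != '\r' {
            self.pos += 1;
            return Next::Char(c);
        }
        // A CR needs one code point of lookahead to detect a CRLF pair.
        match self.chars.get(self.pos + 1).copied() {
            Some('\n') => self.pos += 2,
            Some(_) => self.pos += 1,
            None if self.closed => self.pos += 1,
            None => return Next::Pause, // chunk ended in CR: wait
        }
        Next::Char('\n')
    }

    /// Collects code points until the stream pauses or ends.
    pub fn drain(&mut self) -> String {
        let mut out = String::new();
        while let Next::Char(c) = self.next_code_point() {
            out.push(c);
        }
        out
    }
}
```

Feeding `"a\r"` and then `"\nb"` yields `a`, then a single LF followed by `b`, because the CR is held back until its lookahead is available.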
Aliaksandr Kalenik
b6ffd51d1c LibWeb: Pause tokenizer at a CR right before the insertion point
HTML newline normalization collapses CRLF into a single LF, so
next_code_point() needs one code point of lookahead at a CR to decide
whether the CR stands alone or is the first half of a CRLF pair. When
the tokenizer is paused at the insertion point and the next code point
to consume is a CR sitting one position before it, that lookahead has
not been written yet.

Previously the tokenizer consumed the CR and emitted it as LF, so a
subsequent document.write() that began with LF surfaced as a second
LF instead of being absorbed into the original CRLF pair.

Stop one code point earlier in this case and wait for the next write
to arrive. This makes four html5lib write_single WPT tests pass.
2026-04-27 21:44:56 +02:00
Aliaksandr Kalenik
c44c36416e LibWeb: Preserve old insertion points across reentrant scripts
The HTML parser's script end tag algorithms save the current insertion
point in an "old insertion point" local before executing a script, then
restore that local after script execution. Ladybird modeled that local
as a single tokenizer field, so nested script execution via
document.write() could overwrite the outer script's saved value.

Keep a stack of old insertion points instead, and adjust saved offsets
when document.write() inserts new input before them. This keeps the
normal script and SVG script paths aligned with the spec text while
leaving the parser-blocking script resume path to set the insertion
point to undefined again.
2026-04-27 18:02:19 +02:00
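The stack of saved insertion points can be modelled with a few lines. Everything here is a hypothetical simplification: offsets are plain indices, `None` stands for an undefined insertion point, and the sketch assumes that offsets at or after the write position are the ones that shift.

```rust
/// Hypothetical model of the tokenizer's insertion point bookkeeping.
#[derive(Default)]
pub struct InsertionPoints {
    current: Option<usize>,
    old: Vec<Option<usize>>,
}

impl InsertionPoints {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn current(&self) -> Option<usize> {
        self.current
    }

    pub fn set_current(&mut self, offset: Option<usize>) {
        self.current = offset;
    }

    /// Before running a script: push the current insertion point.
    pub fn save(&mut self) {
        self.old.push(self.current);
    }

    /// After the script: restore the most recently saved insertion point.
    pub fn restore(&mut self) {
        if let Some(saved) = self.old.pop() {
            self.current = saved;
        }
    }

    /// document.write() inserted `len` code points at offset `at`:
    /// shift the current and every saved offset at or after that position.
    pub fn did_insert(&mut self, at: usize, len: usize) {
        let saved = self.old.iter_mut().flatten();
        for offset in saved.chain(self.current.iter_mut()) {
            if *offset >= at {
                *offset += len;
            }
        }
    }
}
```

With a single field instead of the stack, the nested `save()` would overwrite the outer script's value and the final `restore()` would bring back the wrong offset.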
Aliaksandr Kalenik
53fa1b19f1 LibWeb: Make external SVG script fetches async
Replace the spin_until in SVGScriptElement::process_the_script_element
with an async fetch that mirrors HTMLScriptElement's mark_as_ready
pattern. External SVG scripts now fetch and execute asynchronously,
matching Chromium's behavior.

For HTML-embedded SVG scripts, the parser pauses via the existing
schedule_resume_check infrastructure, extended to support SVG scripts
through a new pending_parsing_blocking_svg_script slot on Document.
For top-level XML/SVG documents, scripts execute when their fetch
completes; the load event is delayed via DocumentLoadEventDelayer which
the existing XMLDocumentBuilder::document_end already waits on.
2026-04-27 03:04:07 +02:00
Aliaksandr Kalenik
70ac025eff LibWeb: Implement the speculative HTML parser
When the HTML parser blocks on a synchronous external script, run a
separate tokenizer over the unparsed input and issue speculative fetches
for the resources it finds (script src, link rel=stylesheet|preload, img
src), with <base href> tracking and template/foreign-content skipping.

Also fills in the previously-stubbed "consume a preloaded resource"
algorithm and the document's "map of preloaded resources", so that
<link rel="preload"> followed by a matching consumer deduplicates to
a single fetch.
2026-04-26 18:48:29 +02:00
Aliaksandr Kalenik
b1ccab81ad LibWeb: Replace spin_until in HTMLParser::handle_text with async resume
Spinning a nested event loop to wait for a parser-blocking script blocks
the calling thread, can deadlock, and creates reentrancy hazards. Switch
to an event-driven pause/resume model, mirroring the prior
HTMLParserEndState refactor (df96b69e7a).

Three WPT document.write tests flip from Fail to Pass and are
rebaselined: all write an external script via document.write() followed
by inline content. With spin_until, control did not return to the caller
of document.write() between writing the script and observing its effects
so the test's order assertions saw a different sequence than the spec
mandates.
2026-04-26 10:44:45 +02:00
Pavel Shliak
0e98fdccd5 LibWeb/HTML: Fix ruby parse error check for rp/rt 2026-04-22 15:30:41 +01:00
Shannon Booth
8642801889 LibWeb: Set fragment scripting mode from the context document
This corresponds with the editorial change to the HTML standard
introducing the parsing mode enum of:

01c45cede

And a follow up normative change of:

508706c80

Making fragment parsing derive its scripting mode from the context
document.
2026-04-14 23:01:36 +02:00
Andreas Kling
88d4d1b1a6 LibWeb: Use VM helpers for execution context access
Inline JS-to-JS frames no longer live in the raw execution context
vector, so LibWeb callers that need to inspect or pop contexts now go
through VM helpers instead of peeking into that storage directly.

This keeps the execution context bookkeeping encapsulated while
preserving existing microtask and realm-entry checks.
2026-04-13 18:29:43 +02:00
Aliaksandr Kalenik
df96b69e7a LibWeb: Replace spin_until in HTMLParser::the_end() with state machine
HTMLParser::the_end() had three spin_until calls that blocked the event
loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8
(load event delay). This replaces them with an HTMLParserEndState state
machine that progresses asynchronously via callbacks.

The state machine has three phases matching the three spin_until calls:
- WaitingForDeferredScripts: loops executing ready deferred scripts
- WaitingForASAPScripts: waits for ASAP script lists to empty
- WaitingForLoadEventDelay: waits for nothing to delay the load event

Notification triggers re-evaluate the state machine when conditions
change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in
StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and
DocumentLoadEventDelayer decrements. NavigableContainer state changes
(session history readiness, content navigable cleared, lazy load flag)
also trigger re-evaluation of the load event delay check.

Key design decisions and why:

1. Microtask checkpoint in schedule_progress_check(): The old spin_until
   called perform_a_microtask_checkpoint() before checking conditions.
   This is critical because HTMLImageElement::update_the_image_data step
   8 queues a microtask that creates the DocumentLoadEventDelayer.
   Without the checkpoint, check_progress() would see zero delayers and
   complete before images start delaying the load event.

2. deferred_invoke in schedule_progress_check():
   I tried Core::Timer (0ms), queue_global_task, and synchronous calls.
   Timers caused non-deterministic ordering with the HTML event loop's
   task processing timer, leading to image layout tests failing (wrong
   subtest pass/fail patterns). Synchronous calls fired too early during
   image load processing before dimensions were set, causing 0-height
   images in layout tests. queue_global_task had task ordering issues
   with the session history traversal queue. deferred_invoke runs after
   the current callback returns but within the same event loop pump,
   giving the right balance.

3. Navigation load event guard (m_navigation_load_event_guard): During
   cross-document navigation, finalize_a_cross_document_navigation step
   2 calls set_delaying_load_events(false) before the session history
   traversal activates the new document. This creates a transient state
   where the parent's load event delay check sees the about:blank (which
   has ready_for_post_load_tasks=true) as the active document and
   completes prematurely.
2026-03-28 23:14:55 +01:00
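The three-phase progression described above can be sketched as a plain state machine. The phase names follow the commit message, but `Conditions`, its fields, the `Done` state, and `check_progress` as written here are hypothetical; the real phases also execute scripts and run a microtask checkpoint, which this sketch omits.

```rust
/// Phases of the end-of-parse sequence, plus a terminal state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EndState {
    WaitingForDeferredScripts,
    WaitingForAsapScripts,
    WaitingForLoadEventDelay,
    Done,
}

/// Hypothetical snapshot of what each phase is waiting on.
pub struct Conditions {
    pub pending_deferred_scripts: usize,
    pub pending_asap_scripts: usize,
    pub load_event_delayers: usize,
}

/// Advances through as many phases as the conditions allow and returns
/// the state to park in. Intended to be called again from each
/// notification trigger instead of spinning a nested event loop.
pub fn check_progress(mut state: EndState, conditions: &Conditions) -> EndState {
    loop {
        let next = match state {
            EndState::WaitingForDeferredScripts if conditions.pending_deferred_scripts == 0 => {
                EndState::WaitingForAsapScripts
            }
            EndState::WaitingForAsapScripts if conditions.pending_asap_scripts == 0 => {
                EndState::WaitingForLoadEventDelay
            }
            EndState::WaitingForLoadEventDelay if conditions.load_event_delayers == 0 => {
                EndState::Done
            }
            // Still blocked in this phase, or already done.
            _ => return state,
        };
        state = next;
    }
}
```

Because the function returns instead of blocking, nothing stays on the call stack between notifications.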
Sam Atkins
ed6a5f25a0 LibWeb: Implement scoped custom element registries 2026-03-27 19:49:55 +00:00
Luke Wilde
0381c40cb4 LibWeb: Reset non-FACEs and don't associate them to a form during parse
(FACE stands for form-associated custom element)
2026-03-25 13:18:15 +00:00
Jelle Raaijmakers
428a47cb7c LibWeb: Reduce size of Optional<HTMLToken> 2026-03-20 12:03:36 +01:00
Tim Ledbetter
36f59a406e LibWeb: Put HTML parser debug message behind a flag 2026-03-10 11:14:04 +01:00
Andreas Kling
e87f889e31 Everywhere: Abandon Swift adoption
After making no progress on this for a very long time, let's acknowledge
it's not going anywhere and remove it from the codebase.
2026-02-17 10:48:09 -05:00
Aliaksandr Kalenik
30e4779acb AK+LibWeb: Reduce recompilation impact of DOM/Node.h
Remove includes from Node.h that are only needed for forward
declarations (AccessibilityTreeNode.h, XMLSerializer.h,
JsonObjectSerializer.h). Extract StyleInvalidationReason and
FragmentSerializationMode enums into standalone lightweight
headers so downstream headers (CSSStyleSheet.h, CSSStyleProperties.h,
HTMLParser.h) can include just the enum they need instead of all of
Node.h. Replace Node.h with forward declarations in headers that only
use Node by pointer/reference.

This breaks the circular dependency between Node.h and
AccessibilityTreeNode.h, reducing AccessibilityTreeNode.h's
recompilation footprint from ~1399 to ~25 files.
2026-02-11 20:02:28 +01:00
Andreas Kling
9b987baf0e LibWeb: Bail out of the_end() spin_untils for inactive documents
When a document is navigated away from while HTMLParser::the_end() is
spinning the event loop (steps 7 and 8), the spin_until stays on the
call stack indefinitely, causing all subsequent event processing on the
same event loop to happen within nested spin_until pumping. Add
is_fully_active() checks to bail out early in this case.
2026-02-10 21:19:35 +01:00
Jelle Raaijmakers
ae20ecf857 AK+Everywhere: Add Vector::contains(predicate) and use it
No functional changes.
2026-01-08 15:27:30 +00:00
Andreas Kling
a9cc425cde LibJS+LibWeb: Add missing GC marking visits
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.

These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
2026-01-07 12:48:58 +01:00
Feng Yu
b58fcaeecf LibWeb: Add HTMLSelectedContentElement for customizable select
Introduce the HTMLSelectedContentElement and integrate it into
<select>, <option> and HTMLParser.

See whatwg/html#10548.

There are two bugs in the WPT tests which cause the third subtest
in selectedcontent.html and selectedcontent-mutations.html to fail.
See whatwg/html#11882, web-platform-tests/wpt#55849.
2025-12-12 12:06:24 +00:00