Commit graph

104 commits

Author SHA1 Message Date
Shannon Booth
637fd51595 LibWeb: Unify WebIDL C++ type generation
Represent WebIDL C++ types with a single CppType model that tracks
nullability, optional presence, and contained storage.

GC-like values now use GC::Ref/GC::Ptr directly, while containers choose
"plain", "Root", or "Conservative" container types depending on what
they contain. For example, sequence<Element> becomes a RootVector of
GC::Ref values, while sequence<SomeDictionary> becomes a
ConservativeVector only when the dictionary contains GC-like values.
This moves the generated bindings away from wrapping GC values in
GC::Root by default.

This has broad fallout, as the types passed to interfaces for GC
objects change almost everywhere.
2026-05-23 18:26:12 +02:00
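A minimal sketch of the container-selection rule described above, assuming a simplified element descriptor. `ElementInfo`, `Storage`, `container_storage_for()` and `container_name()` are hypothetical names for illustration, not the generator's actual CppType API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the generator's CppType model.
enum class Storage { Plain, Root, Conservative };

struct ElementInfo {
    bool is_gc_object { false };       // e.g. an interface type such as Element
    bool contains_gc_values { false }; // e.g. a dictionary with a GC-typed member
};

// Choose the container flavour for sequence<T> based on what T holds.
Storage container_storage_for(ElementInfo const& element)
{
    if (element.is_gc_object)
        return Storage::Root;         // sequence<Element>: RootVector of GC::Ref values
    if (element.contains_gc_values)
        return Storage::Conservative; // sequence<SomeDictionary> holding GC-like values
    return Storage::Plain;            // no GC values: an ordinary vector is enough
}

// Map the chosen storage to the container names used in the message above.
std::string container_name(Storage storage)
{
    switch (storage) {
    case Storage::Root:
        return "RootVector";
    case Storage::Conservative:
        return "ConservativeVector";
    case Storage::Plain:
        break;
    }
    return "Vector";
}
```

The point of the sketch is that rooting cost is only paid when the element type can actually hold GC values; dictionaries without GC members stay in a plain container.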
Sam Atkins
34382a2aca LibWeb/HTML: Add missing include for KeywordStyleValue 2026-05-20 13:00:50 +01:00
Andreas Kling
936bb9ca53 LibWeb: Use Rust preload scanner
Replace the C++ speculative HTML parser token walk with the Rust
preload scanner. Keep URL resolution, duplicate suppression, and fetch
issuance in C++ so the scanner only emits base href updates and fetch
candidates.

Use the scanner callback result to stop iteration when the speculative
parser has been stopped.

Update parser comments that still described speculative mock element
production.
2026-05-18 00:23:52 +02:00
Andreas Kling
29784ea397 LibWeb: Remove the C++ HTML tree builder
Delete the old C++ tree-construction implementation and helper classes
that became unused once the Rust parser became unconditional. Remove the C++
stack of open elements, active formatting elements, speculative mock
element, and tree-builder-only token storage.

Keep the C++ parser entry points that still own LibWeb DOM integration,
encoding detection, tokenizer bridging, incremental parsing, and the
speculative parser support used by resource discovery.
2026-05-17 15:35:56 +02:00
Andreas Kling
a7ece4b062 LibWeb: Make the Rust HTML parser unconditional
Remove the runtime selector between the old C++ tree builder and the new
Rust implementation. Always construct HTML documents and fragments with
the Rust parser now that it matches the existing tests.

Simplify dump-html-tree by dropping the backend option that only made
sense while both parser implementations were available.
2026-05-17 15:35:56 +02:00
Andreas Kling
f49335f210 LibWeb: Align declarative shadow root parsing
Teach the Rust parser to recognize declarative shadow root templates and
pass the parsed mode, slot assignment, clonable, serializable, and
focus-delegation flags to the C++ DOM host.

Expose shadowRootSlotAssignment reflection with the spec-defined named
missing and invalid value defaults, and extend the ShadowDOM text test
coverage for the reflected property and parser-created shadow roots.
2026-05-17 15:35:56 +02:00
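A sketch of the reflected getter's behavior, assuming the attribute's keywords are `named` and `manual` (matching the `slotAssignment` values accepted by `attachShadow()`) and that `named` is both the missing and the invalid value default, as stated above. The function names are hypothetical, not LibWeb's generated reflection code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// ASCII-only lowercasing; enumerated attribute keywords match ASCII case-insensitively.
char to_ascii_lowercase(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}

bool equals_ignoring_ascii_case(std::string const& a, std::string const& b)
{
    if (a.size() != b.size())
        return false;
    for (std::string::size_type i = 0; i < a.size(); ++i) {
        if (to_ascii_lowercase(a[i]) != to_ascii_lowercase(b[i]))
            return false;
    }
    return true;
}

// Hypothetical getter for shadowRootSlotAssignment, limited to known values.
std::string reflect_slot_assignment(std::optional<std::string> const& attribute_value)
{
    if (!attribute_value.has_value())
        return "named"; // missing value default
    if (equals_ignoring_ascii_case(*attribute_value, "manual"))
        return "manual";
    if (equals_ignoring_ascii_case(*attribute_value, "named"))
        return "named";
    return "named"; // invalid value default
}
```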
Andreas Kling
54879bc916 LibWeb: Complete Rust HTML tree construction
Finish the Rust implementation of the spec tree-construction algorithms
needed by the LibWeb test suite. Add the remaining table modes, foster
parenting, scope helpers, adoption agency handling, ruby/list/form and
select cases, frameset state, foreign-content edge cases, and parser
host callbacks.

Preserve behavior that depends on the C++ DOM integration, including
parser-created custom element reactions, fragment quirks mode, arbitrary
fragment namespaces, template fragment mode, fragment form ownership,
MathML annotation-xml boundaries, contextual fragment scripts, parser
script source positions, document.close() parser state, void-element
insertion, and duplicate attribute tracking.

Add focused tests for the parser edge cases that are easy to regress at
the boundary between the Rust tree builder and the C++ DOM host.
2026-05-17 15:35:56 +02:00
Andreas Kling
de12062515 LibWeb: Wire Rust parser scripts and fragments
Preserve Rust parser state across tokenizer runs and stop cleanly when
a parser-blocking script has to execute. Thread the pending script back
through the existing C++ parser entry point so document.write(), input
insertion points, and script bookkeeping continue to use the normal
LibWeb machinery.

Add the fragment parser setup needed by innerHTML and contextual
fragment parsing, including context elements, form ownership, tokenizer
state selection, text coalescing, and foreign-content integration.
2026-05-17 15:35:56 +02:00
Andreas Kling
2e9875770e LibWeb: Add initial Rust HTML tree construction
Implement the first Rust tree builder pass around the tokenizer and the
LibWeb DOM host hooks. Cover the document setup, insertion-mode
dispatch, ordinary body insertion, basic table handling, active
formatting element reconstruction, and foreign-content routing.

Leave the C++ parser available at runtime so the new path can be tested
against the old implementation while the remaining tree-construction
algorithms are filled in.
2026-05-17 15:35:56 +02:00
Andreas Kling
09296315c2 LibWeb: Add Rust HTML parser host plumbing
Add the C++ and Rust scaffolding that lets the tree builder live in
Rust while the DOM remains owned by LibWeb. Keep the exported surface
small: Rust stores parser state, and C++ provides node creation,
insertion, script, template, and GC hooks.

Route dump-html-tree through the selectable parser backend so the new
implementation can be exercised beside the existing parser while it is
being brought up.
2026-05-17 15:35:56 +02:00
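A minimal sketch of the ownership split described above: the parser side only holds opaque node handles and calls back into the DOM owner through a table of hooks. `ParserHost`, `ToyDom`, and `build_html_body()` are hypothetical; the real boundary between Rust and C++ would need a C-compatible ABI (for example `extern "C"` on the C++ side and `#[repr(C)]` on the Rust side), which this single-language sketch only imitates with plain function pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Opaque handle the parser side uses to refer to a node it does not own.
using NodeHandle = std::uint64_t;

// Hypothetical host table: a context pointer plus function-pointer hooks.
struct ParserHost {
    void* context;
    NodeHandle (*create_element)(void* context, char const* local_name);
    void (*append_child)(void* context, NodeHandle parent, NodeHandle child);
};

// Toy DOM owner standing in for LibWeb: it owns the nodes and their tree links.
struct ToyDom {
    std::vector<std::string> names; // index == handle
    std::vector<std::vector<NodeHandle>> children;

    static NodeHandle create_element(void* context, char const* local_name)
    {
        auto& dom = *static_cast<ToyDom*>(context);
        dom.names.emplace_back(local_name);
        dom.children.emplace_back();
        return dom.names.size() - 1;
    }

    static void append_child(void* context, NodeHandle parent, NodeHandle child)
    {
        auto& dom = *static_cast<ToyDom*>(context);
        dom.children.at(parent).push_back(child);
    }
};

// Stand-in for the tree builder: it keeps only handles and calls back into the host.
NodeHandle build_html_body(ParserHost const& host)
{
    NodeHandle html = host.create_element(host.context, "html");
    NodeHandle body = host.create_element(host.context, "body");
    host.append_child(host.context, html, body);
    return html;
}
```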
Shannon Booth
d595369ae4 LibWeb: Let document.write() reenter parser from parser-blocking scripts
When resuming after an async wait for a pending parser-blocking script,
clear the parser pause flag before executing the script. The spec has
unblocked the tokenizer by this point, and document.write() calls from
the script must be able to synchronously process inserted markup up to
the insertion point.

This fixes ordering for document.write()'d inserted scripts during
external parser-blocking script execution.
2026-05-15 19:49:45 +02:00
Sam Atkins
73c4b77f68 LibWeb/HTML: Support align attributes on table sections and rows
thead, tbody, tfoot, tr, td, and th all have an `align` presentational
attribute with identical definitions. We previously only supported it
for td and th, and also allowed arbitrary text-align values instead of
the 4 dictated by the spec.
2026-04-30 15:20:22 +02:00
Aliaksandr Kalenik
01fa8a27ac LibWeb: Extract HTMLParser::run_until_completion()
Pull the post-parse-action setup, run loop, and post-parse invocation
out of HTMLParser::run(URL, ...) into a new run_until_completion()
method. The URL overload still calls it; behavior is unchanged. The
incremental parser will use this entry point directly without going
through the URL-setting overload.
2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik
f499edefae LibWeb: Track whether HTMLParser is script-created
Add a ScriptCreatedParser flag plumbed through HTMLParser's constructor
and create_for_scripting(). Only document.open()'s parser sets it to
Yes. Document::close() step 3 now checks is_script_created() so it
correctly skips parsers that weren't created via document.open(),
matching the spec.

Previously the check was just `if (!m_parser)`, which incorrectly let
document.close() insert an EOF into a network-driven parser. The bug
was mostly latent because the network parser used to finish quickly,
but it matters once the network parser stays alive for the duration of
a streamed parse.
2026-04-29 04:12:44 +02:00
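A simplified sketch of the corrected check in document.close(), assuming a parser that records only whether it was created by document.open(). `Parser` and `Document` here are hypothetical stand-ins, and the algorithm's earlier steps and the tokenizer run that follows the EOF insertion are omitted.

```cpp
#include <cassert>

enum class ScriptCreatedParser { No, Yes };

// Hypothetical, simplified parser that only tracks how it was created.
class Parser {
public:
    explicit Parser(ScriptCreatedParser script_created)
        : m_script_created(script_created)
    {
    }

    bool is_script_created() const { return m_script_created == ScriptCreatedParser::Yes; }
    void insert_explicit_eof() { m_has_explicit_eof = true; }
    bool has_explicit_eof() const { return m_has_explicit_eof; }

private:
    ScriptCreatedParser m_script_created;
    bool m_has_explicit_eof { false };
};

struct Document {
    Parser* parser { nullptr };

    // Sketch of document.close() from step 3 onward (earlier steps omitted).
    void close()
    {
        // The old check was only `if (!parser) return;`, which let a
        // network-driven parser receive an explicit EOF.
        if (!parser || !parser->is_script_created())
            return;
        parser->insert_explicit_eof();
        // The real algorithm continues by running the tokenizer; omitted here.
    }
};
```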
Aliaksandr Kalenik
c44c36416e LibWeb: Preserve old insertion points across reentrant scripts
The HTML parser's script end tag algorithms save the current insertion
point in an "old insertion point" local before executing a script, then
restore that local after script execution. Ladybird modeled that local
as a single tokenizer field, so nested script execution via
document.write() could overwrite the outer script's saved value.

Keep a stack of old insertion points instead, and adjust saved offsets
when document.write() inserts new input before them. This keeps the
normal script and SVG script paths aligned with the spec text while
leaving the parser-blocking script resume path to set the insertion
point to undefined again.
2026-04-27 18:02:19 +02:00
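A minimal sketch of a stack of old insertion points, assuming insertion points are plain offsets into the input stream (with `std::nullopt` for "undefined"). `OldInsertionPoints` is a hypothetical name, and whether a saved point exactly at the write offset should shift is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// An insertion point is an offset into the input stream, or undefined (nullopt).
using InsertionPoint = std::optional<std::size_t>;

// Hypothetical model: a stack instead of a single saved field, so a nested
// script's save/restore cannot clobber the outer script's saved value.
class OldInsertionPoints {
public:
    void save(InsertionPoint current) { m_stack.push_back(current); }

    InsertionPoint restore()
    {
        assert(!m_stack.empty());
        InsertionPoint saved = m_stack.back();
        m_stack.pop_back();
        return saved;
    }

    // document.write() inserted `length` characters at `offset`. Saved points at
    // or after that offset now refer to input that moved right. (Treating an
    // equal offset as "inserted before" is an assumption of this sketch.)
    void did_insert_input(std::size_t offset, std::size_t length)
    {
        for (auto& saved : m_stack) {
            if (saved.has_value() && *saved >= offset)
                *saved += length;
        }
    }

private:
    std::vector<InsertionPoint> m_stack;
};
```

Restores happen in LIFO order, which matches how nested script end tags unwind.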
Aliaksandr Kalenik
53fa1b19f1 LibWeb: Make external SVG script fetches async
Replace the spin_until in SVGScriptElement::process_the_script_element
with an async fetch that mirrors HTMLScriptElement's mark_as_ready
pattern. External SVG scripts now fetch and execute asynchronously,
matching Chromium's behavior.

For HTML-embedded SVG scripts, the parser pauses via the existing
schedule_resume_check infrastructure, extended to support SVG scripts
through a new pending_parsing_blocking_svg_script slot on Document.
For top-level XML/SVG documents, scripts execute when their fetch
completes; the load event is delayed via DocumentLoadEventDelayer which
the existing XMLDocumentBuilder::document_end already waits on.
2026-04-27 03:04:07 +02:00
Aliaksandr Kalenik
70ac025eff LibWeb: Implement the speculative HTML parser
When the HTML parser blocks on a synchronous external script, run a
separate tokenizer over the unparsed input and issue speculative fetches
for the resources it finds (script src, link rel=stylesheet|preload, img
src), with <base href> tracking and template/foreign-content skipping.

Also fills in the previously-stubbed "consume a preloaded resource"
algorithm and the document's "map of preloaded resources", so that
<link rel="preload"> followed by a matching consumer deduplicates to
a single fetch.
2026-04-26 18:48:29 +02:00
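A sketch of how a map of preloaded resources deduplicates a preload and its matching consumer into one fetch. The key here is simplified to (URL, destination); the spec's preload key carries additional fields. `PreloadedResources` and its methods are hypothetical, and the "fetch" is just a counter.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Simplified key: (URL, destination). The real preload key has more fields.
using PreloadKey = std::pair<std::string, std::string>;

// Hypothetical sketch of a per-document map of preloaded resources.
class PreloadedResources {
public:
    // <link rel="preload">: fetch once and store the response under its key.
    void preload(PreloadKey const& key)
    {
        if (m_entries.count(key) != 0)
            return;
        m_entries.emplace(key, fetch(key));
    }

    // A later consumer with a matching key takes the stored response instead of
    // fetching again; consuming removes the entry.
    std::string fetch_for_consumer(PreloadKey const& key)
    {
        auto it = m_entries.find(key);
        if (it == m_entries.end())
            return fetch(key);
        std::string response = std::move(it->second);
        m_entries.erase(it);
        return response;
    }

    int network_fetch_count() const { return m_network_fetches; }

private:
    std::string fetch(PreloadKey const& key)
    {
        ++m_network_fetches; // stands in for a real network request
        return "body of " + key.first;
    }

    std::map<PreloadKey, std::string> m_entries;
    int m_network_fetches { 0 };
};
```

A consumer whose key differs (for example, the same URL with another destination) misses the map and fetches separately.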
Aliaksandr Kalenik
b1ccab81ad LibWeb: Replace spin_until in HTMLParser::handle_text with async resume
Spinning a nested event loop to wait for a parser-blocking script blocks
the calling thread, can deadlock, and creates reentrancy hazards. Switch
to an event-driven pause/resume model, mirroring the prior
HTMLParserEndState refactor (df96b69e7a).

Three WPT document.write tests flip from Fail to Pass and are
rebaselined: all write an external script via document.write() followed
by inline content. With spin_until, control did not return to the caller
of document.write() between writing the script and observing its effects,
so the tests' order assertions saw a different sequence than the spec
mandates.
2026-04-26 10:44:45 +02:00
Pavel Shliak
0e98fdccd5 LibWeb/HTML: Fix ruby parse error check for rp/rt 2026-04-22 15:30:41 +01:00
Shannon Booth
8642801889 LibWeb: Set fragment scripting mode from the context document
This corresponds to the editorial change to the HTML standard that
introduced the parsing mode enum:

01c45cede

and a follow-up normative change:

508706c80

which makes fragment parsing derive its scripting mode from the context
document.
2026-04-14 23:01:36 +02:00
Andreas Kling
88d4d1b1a6 LibWeb: Use VM helpers for execution context access
Inline JS-to-JS frames no longer live in the raw execution context
vector, so LibWeb callers that need to inspect or pop contexts now go
through VM helpers instead of peeking into that storage directly.

This keeps the execution context bookkeeping encapsulated while
preserving existing microtask and realm-entry checks.
2026-04-13 18:29:43 +02:00
Aliaksandr Kalenik
df96b69e7a LibWeb: Replace spin_until in HTMLParser::the_end() with state machine
HTMLParser::the_end() had three spin_until calls that blocked the event
loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8
(load event delay). This replaces them with an HTMLParserEndState state
machine that progresses asynchronously via callbacks.

The state machine has three phases matching the three spin_until calls:
- WaitingForDeferredScripts: loops executing ready deferred scripts
- WaitingForASAPScripts: waits for ASAP script lists to empty
- WaitingForLoadEventDelay: waits for nothing to delay the load event

Notification triggers re-evaluate the state machine when conditions
change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in
StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and
DocumentLoadEventDelayer decrements. NavigableContainer state changes
(session history readiness, content navigable cleared, lazy load flag)
also trigger re-evaluation of the load event delay check.

Key design decisions and why:

1. Microtask checkpoint in schedule_progress_check(): The old spin_until
   called perform_a_microtask_checkpoint() before checking conditions.
   This is critical because HTMLImageElement::update_the_image_data step
   8 queues a microtask that creates the DocumentLoadEventDelayer.
   Without the checkpoint, check_progress() would see zero delayers and
   complete before images start delaying the load event.

2. deferred_invoke in schedule_progress_check():
   I tried Core::Timer (0ms), queue_global_task, and synchronous calls.
   Timers caused non-deterministic ordering with the HTML event loop's
   task processing timer, leading to image layout tests failing (wrong
   subtest pass/fail patterns). Synchronous calls fired too early during
   image load processing before dimensions were set, causing 0-height
   images in layout tests. queue_global_task had task ordering issues
   with the session history traversal queue. deferred_invoke runs after
   the current callback returns but within the same event loop pump,
   giving the right balance.

3. Navigation load event guard (m_navigation_load_event_guard): During
   cross-document navigation, finalize_a_cross_document_navigation step
   2 calls set_delaying_load_events(false) before the session history
   traversal activates the new document. This creates a transient state
where the parent's load event delay check sees the about:blank document
(which has ready_for_post_load_tasks=true) as the active document and
   completes prematurely.
2026-03-28 23:14:55 +01:00
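A stripped-down sketch of the three-phase state machine, assuming the relevant document state can be summarized as an ordered list of deferred scripts (ready or not), a count of pending ASAP scripts, and a count of load event delayers. `EndConditions` and `EndStateMachine` are hypothetical; the microtask checkpoint and the deferred_invoke scheduling discussed above are deliberately left out.

```cpp
#include <cassert>
#include <deque>

enum class EndPhase {
    WaitingForDeferredScripts,
    WaitingForASAPScripts,
    WaitingForLoadEventDelay,
    Done,
};

// Hypothetical snapshot of the state the_end() waits on.
struct EndConditions {
    std::deque<bool> deferred_scripts; // document order; true == ready to execute
    int asap_scripts { 0 };
    int load_event_delayers { 0 };
};

class EndStateMachine {
public:
    EndPhase phase() const { return m_phase; }
    int executed_deferred_scripts() const { return m_executed_deferred; }

    // Called from notification triggers; advances as far as possible, never blocks.
    void check_progress(EndConditions& conditions)
    {
        if (m_phase == EndPhase::WaitingForDeferredScripts) {
            // Execute in order, stopping at the first script that is not ready yet.
            while (!conditions.deferred_scripts.empty() && conditions.deferred_scripts.front()) {
                conditions.deferred_scripts.pop_front();
                ++m_executed_deferred;
            }
            if (!conditions.deferred_scripts.empty())
                return; // a later mark_as_ready() triggers another check
            m_phase = EndPhase::WaitingForASAPScripts;
        }
        if (m_phase == EndPhase::WaitingForASAPScripts) {
            if (conditions.asap_scripts > 0)
                return;
            m_phase = EndPhase::WaitingForLoadEventDelay;
        }
        if (m_phase == EndPhase::WaitingForLoadEventDelay) {
            if (conditions.load_event_delayers > 0)
                return;
            m_phase = EndPhase::Done; // the real code would continue with the load event
        }
    }

private:
    EndPhase m_phase { EndPhase::WaitingForDeferredScripts };
    int m_executed_deferred { 0 };
};
```

Each early `return` corresponds to one of the former spin_until calls; instead of blocking, the machine waits for the next trigger to call check_progress() again.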
Sam Atkins
ed6a5f25a0 LibWeb: Implement scoped custom element registries 2026-03-27 19:49:55 +00:00
Luke Wilde
0381c40cb4 LibWeb: Reset non-FACEs and don't associate them to a form during parse
(FACE stands for form-associated custom element)
2026-03-25 13:18:15 +00:00
Tim Ledbetter
36f59a406e LibWeb: Put HTML parser debug message behind a flag 2026-03-10 11:14:04 +01:00
Andreas Kling
e87f889e31 Everywhere: Abandon Swift adoption
After making no progress on this for a very long time, let's acknowledge
it's not going anywhere and remove it from the codebase.
2026-02-17 10:48:09 -05:00
Andreas Kling
9b987baf0e LibWeb: Bail out of the_end() spin_untils for inactive documents
When a document is navigated away from while HTMLParser::the_end() is
spinning the event loop (steps 7 and 8), the spin_until stays on the
call stack indefinitely, causing all subsequent event processing on the
same event loop to happen within nested spin_until pumping. Add
is_fully_active() checks to bail out early in this case.
2026-02-10 21:19:35 +01:00
Jelle Raaijmakers
ae20ecf857 AK+Everywhere: Add Vector::contains(predicate) and use it
No functional changes.
2026-01-08 15:27:30 +00:00
Andreas Kling
a9cc425cde LibJS+LibWeb: Add missing GC marking visits
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.

These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
2026-01-07 12:48:58 +01:00
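A toy illustration of the pattern: a helper struct that is not itself a GC cell exposes visit_edges(), and its owning GC object must forward to it, or the pointers it holds are never marked. `Cell`, `Visitor`, `PendingEntry`, and `Owner` are simplified, hypothetical stand-ins, not the real GC::Cell or GC::Cell::Visitor API.

```cpp
#include <cassert>
#include <vector>

struct Cell;

// Toy visitor that records which cells were reached during marking.
struct Visitor {
    std::vector<Cell const*> visited;
    void visit(Cell const* cell)
    {
        if (cell)
            visited.push_back(cell);
    }
};

struct Cell {
    virtual ~Cell() = default;
    virtual void visit_edges(Visitor&) const { }
};

// Helper struct (not a GC cell) that holds GC pointers.
struct PendingEntry {
    Cell* target { nullptr };
    Cell* callback { nullptr };

    void visit_edges(Visitor& visitor) const
    {
        visitor.visit(target);
        visitor.visit(callback);
    }
};

// GC-allocated owner: it must forward to each helper's visit_edges().
struct Owner : Cell {
    std::vector<PendingEntry> entries;

    void visit_edges(Visitor& visitor) const override
    {
        Cell::visit_edges(visitor);
        for (auto const& entry : entries)
            entry.visit_edges(visitor);
    }
};
```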
Feng Yu
b58fcaeecf LibWeb: Add HTMLSelectedContentElement for customizable select
Introduce the HTMLSelectedContentElement and integrate it into
<select>, <option> and HTMLParser.

See whatwg/html#10548.

Two bugs affecting the WPT tests cause the third subtest in
selectedcontent.html and selectedcontent-mutations.html to fail.
See whatwg/html#11882, web-platform-tests/wpt#55849.
2025-12-12 12:06:24 +00:00
Feng Yu
d2029b1814 LibWeb: Relax HTML parser to allow more tags inside <select>
This implements the parsing part of the customizable <select> spec update.
See whatwg/html PR #10548.

Two failing subtests in `html5lib_innerHTML_tests_innerHTML_1.html`
and `customizable-select/select-parsing.html` are due to the spec
still disallowing `<input>` inside `<select>`, even though Chrome
has already implemented this behavior (see whatwg/html#11288).
2025-12-04 17:17:01 +00:00
Sam Atkins
a25cb679fb LibWeb/HTML: Update spec text related to template's content
Corresponds to:
aa52274b5a
2025-11-27 10:26:13 +00:00
Sam Atkins
8ca4833885 LibWeb/HTML: Update spec text in create_element_for()
No behaviour changes.
2025-11-26 09:52:47 +01:00
Sam Atkins
6e2f8166f4 LibWeb/HTML: Combine duplicate parsing branches
These are combined in the current spec. No behaviour change.
2025-11-26 09:52:47 +01:00
Sam Atkins
6a4ab26b48 LibWeb/HTML: Return early from find_appropriate_place_for_inserting_node
Step 2.(a).5 says to abort, but we were instead carrying on and would
run steps 3 and 4. Those steps would not change the result at all, but
this avoids a little unnecessary work.

I wrapped a couple of comments at 120 columns while I was at it.
2025-11-26 09:52:47 +01:00
Sam Atkins
418e22d65a LibWeb/HTML: Bring handle_in_head in HTML parser more up to date
A couple of spec text changes I noticed, and use `has_attribute()`
instead of manually checking it.
2025-11-26 09:52:47 +01:00
Psychpsyo
100f37995f Everywhere: Clean up AD-HOC and FIXME comments without colons 2025-11-13 15:56:04 +01:00
Lorenz A
f8330a2ec5 LibWeb: Do not execute unclosed SVG script tags 2025-11-09 01:43:46 +01:00
Andreas Kling
3593c3b687 LibWeb: Throw out decoded UTF-32 data in HTMLTokenizer after parser runs
This ends up saving quite a bit of memory on many pages, since UTF-32
uses 4 bytes per code point.

As an example, it reduces the footprint on https://gymgrossisten.com/
by 2 MiB.
2025-10-24 08:52:53 +02:00
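A back-of-envelope reading of the 2 MiB figure, assuming (as a simplification) that the entire saving comes from the discarded UTF-32 buffer:

```latex
2\ \text{MiB} = 2 \times 2^{20}\ \text{B} = 2{,}097{,}152\ \text{B},
\qquad
\frac{2{,}097{,}152\ \text{B}}{4\ \text{B per code point}} = 524{,}288\ \text{code points}
```

For mostly-ASCII markup, the same text takes about 1 byte per code point in UTF-8, so a retained UTF-32 copy is roughly four times the size of the source bytes.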
mikiubo
0b715b20a2 LibWeb: Make HTML fragment parsing return ExceptionOr
Update Element::parse_fragment and Node::unsafely_set_html to
propagate exceptions.

This refactor is needed as a prerequisite for implementing the XML
fragment parser, which requires consistent error handling in fragment
parsing.
2025-10-23 11:06:39 +01:00
Lorenz A
6afd39b16a LibWeb: Keep the tokens in ListOfActiveFormattingElements 2025-10-21 23:36:07 +02:00
Tete17
82f56e30ed LibWeb: Adapt the parsing of script elements to accommodate TrustedTypes 2025-09-16 10:57:34 +02:00
Lorenz A
47796e7967 LibWeb: Serialize HTML attribute names as per spec 2025-09-15 10:08:12 +02:00
euro20179
e442aa6e10 LibWeb: Ensure the "parser cannot change the mode" flag is handled
This fixes at least one WPT failure where text/plain documents were rendered in
quirks mode. The test in question: https://wpt.live/html/browsers/browsing-the-web/read-text/load-text-plain.html
2025-09-07 11:11:43 +01:00
Luke Wilde
b17783bb10 Everywhere: Change west consts caught by clang-format-21 to east consts 2025-08-29 18:18:55 +01:00
Tim Ledbetter
cb1a1a5cb5 LibWeb: Replace is<T>() with as_if<T>() where possible 2025-08-25 18:45:00 +02:00
Sam Atkins
99bce9a94d LibWeb/CSS: Replace CSSUnitValue with DimensionStyleValue
CSSUnitValue is a typed-om type which we will implement separately in
the future. However, it still seems useful to give our dimension values
a base class. (Maybe they could be templated in the future?) So instead
of deleting it entirely, rename it to DimensionStyleValue and make its
API match our style better.
2025-08-08 15:19:03 +01:00
Sam Atkins
c57975c9fd LibWeb: Move and rename CSSStyleValue to StyleValues/StyleValue.{h,cpp}
This reverts 0e3487b9ab.

Back when I made that change, I thought we could make our StyleValue
classes match the typed-om definitions directly. However, they have
different requirements. Typed-om types need to be mutable and GCed,
whereas StyleValues are immutable and ideally wouldn't require a JS VM.

While I was already making such a cataclysmic change, I've moved it into
the StyleValues directory, because it *not* being there has bothered me
for a long time. 😅
2025-08-08 15:19:03 +01:00
Timothy Flynn
8b6e3cb735 LibWeb+LibUnicode+WebContent: Port DOM::CharacterData to UTF-16
This replaces the underlying storage of CharacterData with Utf16String
and deals with the fallout.
2025-07-24 19:00:20 +02:00
Andrew Kaster
3040ca4311 LibWeb: Remove noisy debug messages from HTMLParser 2025-07-09 16:26:49 -06:00