Enable -Wexit-time-destructors for all in-tree library targets and
update process-lifetime library statics so they no longer register
exit-time destructors. Long-lived caches, lookup tables, singleton
registries, and generated constants now use NeverDestroyed or leaked
references where the data is intended to live until process exit.
Update LibWeb, LibLine, and the binding generators so regenerated
sources follow the same rule instead of reintroducing destructed
statics.
Create parser-blocking style sheets when parser-created `<style>`
elements are popped from the stack of open elements, and ignore dynamic
style updates while those elements are still open in the parser.
Make the shared style-element script-blocking predicate describe the
active style sheet instance. Stale script-blocking entries are removed
when that style sheet is replaced or removed.
By default, `rustfmt` preserves the existing import granularity. In
practice, most Rust code ends up with "Module" granularity because of
LSP code actions.
"Item" gets rid of import groupings and achieves cleaner diffs and
better conflict resolution. Better greppability is a positive side
effect.
Note: this is an unstable rustfmt feature, so `cargo +nightly fmt` must
be used instead of `cargo fmt`.
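For illustration, this is roughly what "Item" granularity looks like,
assuming `imports_granularity = "Item"` is set in rustfmt.toml. The
count_unique function is a hypothetical placeholder that exists only so
both imports are used.

```rust
// With "Module" granularity these two imports would be merged into:
//     use std::collections::{BTreeMap, HashSet};
// With "Item" granularity each import gets its own line, so adding or
// removing one shows up as a single-line diff.
use std::collections::BTreeMap;
use std::collections::HashSet;

/// Hypothetical helper: count distinct words, bucketed by length.
fn count_unique(words: &[&str]) -> BTreeMap<usize, usize> {
    let unique: HashSet<&str> = words.iter().copied().collect();
    let mut by_length = BTreeMap::new();
    for word in unique {
        *by_length.entry(word.len()).or_insert(0) += 1;
    }
    by_length
}
```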
Preserve leading BOMs when parsing already-decoded HTML strings, since
those strings do not go through the encoded byte decoder path.
Decoded markup from JS strings can also contain WTF-8 for lone surrogate
code units. Keep the common scalar UTF-8 path to a single validation and
copy, but replace surrogates before handing bytes to the Rust tokenizer.
Add text coverage for DOMParser and innerHTML string parsing, including
leading BOMs, text and attributes, lone high and low surrogates, and a
valid surrogate pair.
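A minimal sketch of the two-path idea, not the actual LibWeb code. The
function name is hypothetical, and replacing each lone surrogate with
U+FFFD is an assumption; the message above only says surrogates are
replaced. Unlike the real path, the fast path here borrows instead of
copying.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turn WTF-8 bytes into valid UTF-8 text.
fn replace_lone_surrogates(bytes: &[u8]) -> Cow<'_, str> {
    // Fast path: scalar-only UTF-8 needs one validation pass. A leading
    // BOM (U+FEFF) is ordinary text here and is kept as-is.
    if let Ok(text) = std::str::from_utf8(bytes) {
        return Cow::Borrowed(text);
    }
    // Slow path: WTF-8 encodes a lone surrogate as ED A0..BF 80..BF.
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let is_surrogate = bytes[i] == 0xED
            && i + 2 < bytes.len()
            && (0xA0..=0xBF).contains(&bytes[i + 1])
            && (0x80..=0xBF).contains(&bytes[i + 2]);
        if is_surrogate {
            // Assumption: substitute U+FFFD for the surrogate.
            out.extend_from_slice("\u{FFFD}".as_bytes());
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Stay tolerant if the input was not well-formed WTF-8 either.
    Cow::Owned(String::from_utf8_lossy(&out).into_owned())
}
```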
Represent WebIDL C++ types with a single CppType model that tracks
nullability, optional presence, and contained storage.
GC-like values now use GC::Ref/GC::Ptr directly, while containers choose
"plain", "Root", or "Conservative" container types depending on what
they contain. For example, sequence<Element> becomes a RootVector of
GC::Ref values, while sequence<SomeDictionary> becomes a
ConservativeVector only when the dictionary contains GC-like values.
This moves the generated bindings away from wrapping GC values in
GC::Root by default.
This has broad fallout, as the types passed to interfaces for GC
objects change almost across the board.
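The container choice described above can be sketched as a small
decision function. This is a simplified illustration written in Rust,
not the generator's code; ElementType, ContainerKind, and
container_for_sequence are hypothetical names.

```rust
/// Hypothetical, simplified view of a WebIDL sequence element type.
enum ElementType {
    Primitive,
    /// A GC-allocated interface such as Element.
    Interface,
    /// A dictionary; the flag says whether any member is GC-like.
    Dictionary { contains_gc_values: bool },
}

#[derive(Debug, PartialEq)]
enum ContainerKind {
    Plain,
    Root,
    Conservative,
}

/// Pick the container kind for sequence<T> based on what T contains.
fn container_for_sequence(element: &ElementType) -> ContainerKind {
    match element {
        // sequence<Element>: a RootVector of GC::Ref values.
        ElementType::Interface => ContainerKind::Root,
        // Only dictionaries that hold GC-like values need conservative
        // scanning.
        ElementType::Dictionary { contains_gc_values: true } => ContainerKind::Conservative,
        ElementType::Primitive | ElementType::Dictionary { contains_gc_values: false } => {
            ContainerKind::Plain
        }
    }
}
```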
Replace the C++ speculative HTML parser token walk with the Rust
preload scanner. Keep URL resolution, duplicate suppression, and fetch
issuance in C++ so the scanner only emits base href updates and fetch
candidates.
Use the scanner callback result to stop iteration when the speculative
parser has been stopped.
Update parser comments that still described speculative mock element
production.
Add a Rust scanner that walks pending HTML parser input and emits base
href updates or speculative fetch candidates. Keep URL parsing and fetch
issuance in C++ for now, where the Document and request objects live.
Allow the scan callback to stop iteration so the C++ speculative parser
can preserve its stop hook once it is wired up.
Expose a shared Attribute helper for resolving interned local names and
use it from the Rust parser and preload scanner instead of repeating the
same lookup pattern.
Cover link rel handling, preload destination filtering, crossorigin
mapping, and template/foreign-content skipping with Rust unit tests.
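A minimal sketch of a stop-capable scan callback, assuming the scanner
reports events one at a time. ScanEvent and emit_events are hypothetical
names, and the real scanner walks tokenizer output rather than a
prebuilt list.

```rust
use std::ops::ControlFlow;

/// Hypothetical event types: the scanner only reports what it found and
/// leaves URL resolution and fetching to the caller.
#[derive(Debug, PartialEq)]
enum ScanEvent {
    BaseHref(String),
    FetchCandidate { url: String, destination: String },
}

/// Deliver events until the callback asks to stop. Returns how many
/// events were delivered.
fn emit_events<F>(events: Vec<ScanEvent>, mut callback: F) -> usize
where
    F: FnMut(&ScanEvent) -> ControlFlow<()>,
{
    let mut delivered = 0;
    for event in &events {
        delivered += 1;
        if callback(event).is_break() {
            // The caller (for example a stopped speculative parser) no
            // longer wants results, so stop scanning.
            break;
        }
    }
    delivered
}
```

Returning ControlFlow from the callback keeps the stop decision with the
caller, so the scanner needs no knowledge of why scanning ended.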
document.close() can defer script-created parser cleanup while a
parser-blocking script is pending. If document.open() installs a new
parser before the old parser resumes, the deferred action must clean up
the parser that scheduled it instead of the document's current parser.
Capture that parser before installing the deferred action. This keeps
the parked cleanup from affecting a parser installed by a later
document.open() call.
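The capture can be illustrated with a closure that remembers which
parser scheduled it. This is a sketch written in Rust under simplifying
assumptions: parsers are plain integer ids, and Document,
DeferredAction, and make_deferred_cleanup are hypothetical names.

```rust
struct Document {
    current_parser: Option<u32>,
    cleaned_up: Vec<u32>,
}

type DeferredAction = Box<dyn FnOnce(&mut Document)>;

/// Build the deferred cleanup. The parser is captured now, not looked
/// up on the document when the action finally runs.
fn make_deferred_cleanup(document: &Document) -> Option<DeferredAction> {
    let parser = document.current_parser?;
    Some(Box::new(move |doc: &mut Document| {
        doc.cleaned_up.push(parser);
        // Only clear the document's parser if it is still the one that
        // scheduled this action.
        if doc.current_parser == Some(parser) {
            doc.current_parser = None;
        }
    }))
}
```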
Delete the old C++ tree-construction implementation and helper classes
that are unused now that the Rust parser is unconditional. Remove the C++
stack of open elements, active formatting elements, speculative mock
element, and tree-builder-only token storage.
Keep the C++ parser entry points that still own LibWeb DOM integration,
encoding detection, tokenizer bridging, incremental parsing, and the
speculative parser support used by resource discovery.
Remove the runtime selector between the old C++ tree builder and the new
Rust implementation. Always construct HTML documents and fragments with
the Rust parser now that it matches the existing tests.
Simplify dump-html-tree by dropping the backend option that only made
sense while both parser implementations were available.
Teach the Rust parser to recognize declarative shadow root templates and
pass the parsed mode, slot assignment, clonable, serializable, and
focus-delegation flags to the C++ DOM host.
Expose shadowRootSlotAssignment reflection with the spec-defined named
missing and invalid value defaults, and extend the ShadowDOM text test
coverage for the reflected property and parser-created shadow roots.
Finish the Rust implementation of the spec tree-construction algorithms
needed by the LibWeb test suite. Add the remaining table modes, foster
parenting, scope helpers, adoption agency handling, ruby/list/form and
select cases, frameset state, foreign-content edge cases, and parser
host callbacks.
Preserve behavior that depends on the C++ DOM integration, including
parser-created custom element reactions, fragment quirks mode, arbitrary
fragment namespaces, template fragment mode, fragment form ownership,
MathML annotation-xml boundaries, contextual fragment scripts, parser
script source positions, document.close() parser state, void-element
insertion, and duplicate attribute tracking.
Add focused tests for the parser edge cases that are easy to regress at
the boundary between the Rust tree builder and the C++ DOM host.
Preserve Rust parser state across tokenizer runs and stop cleanly when
a parser-blocking script has to execute. Thread the pending script back
through the existing C++ parser entry point so document.write(), input
insertion points, and script bookkeeping continue to use the normal
LibWeb machinery.
Add the fragment parser setup needed by innerHTML and contextual
fragment parsing, including context elements, form ownership, tokenizer
state selection, text coalescing, and foreign-content integration.
Implement the first Rust tree builder pass around the tokenizer and the
LibWeb DOM host hooks. Cover the document setup, insertion-mode
dispatch, ordinary body insertion, basic table handling, active
formatting element reconstruction, and foreign-content routing.
Leave the C++ parser available at runtime so the new path can be tested
against the old implementation while the remaining tree-construction
algorithms are filled in.
Add the C++ and Rust scaffolding that lets the tree builder live in
Rust while the DOM remains owned by LibWeb. Keep the exported surface
small: Rust stores parser state, and C++ provides node creation,
insertion, script, template, and GC hooks.
Route dump-html-tree through the selectable parser backend so the new
implementation can be exercised beside the existing parser while it is
being brought up.
Replace the FFI tokenizer state transmute with an explicit conversion
from the incoming numeric value. The old code range-checked against the
last state before transmuting, which matched today's contiguous enum but
left the conversion dependent on that layout detail.
Returning early for unknown values keeps the FFI boundary tolerant while
removing a source of possible invalid enum discriminants.
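A minimal sketch of the explicit conversion, using a hypothetical
five-variant State enum in place of the full tokenizer state list.
state_from_ffi is also a hypothetical name.

```rust
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Data = 0,
    RcData = 1,
    RawText = 2,
    ScriptData = 3,
    PlainText = 4,
}

/// Convert a raw value from the FFI boundary without transmuting.
/// Unknown values yield None so the caller can return early.
fn state_from_ffi(value: u8) -> Option<State> {
    match value {
        0 => Some(State::Data),
        1 => Some(State::RcData),
        2 => Some(State::RawText),
        3 => Some(State::ScriptData),
        4 => Some(State::PlainText),
        _ => None,
    }
}
```

Because every accepted value is listed, the conversion stays sound if
the enum ever stops being contiguous; a range check followed by a
transmute would not.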
Replace the C++ HTML tokenizer with a Rust implementation behind the
existing HTMLTokenizer API.
Keep the parser-facing integration points for streaming input,
insertion points, document.write(), EOF insertion, parser aborts,
speculative parser input, and last start tag tracking. The generated
FFI handle stays an implementation detail of HTMLTokenizer, so callers
keep a single tokenizer class.
Preserve duplicate attributes through FFI so C++ token normalization can
record the duplicate-attribute signal used by CSP nonce checks. Keep
bulk tag-name and attribute scans capped at the active insertion point
so streamed parser input is spliced at the right offset.
Use generated DAFSA tables for named character references and intern
common tag and attribute names to reduce FFI marshalling overhead. This
also fixes attribute name source positions, nested old insertion points,
and aborted fast-path handling.
TestHTMLTokenizer covers duplicate attributes and insertion points in
fast tag-name, attribute-name, and quoted-value scans. A CSP text test
covers duplicate nonce attributes on parser-created script elements.
The tokenizer dump fixtures still match, TestHTMLTokenizer passes, and
the full release test-web run passes with 6981 tests and 226 skipped.
When resuming after an async wait for a pending parser-blocking script,
clear the parser pause flag before executing the script. The spec has
unblocked the tokenizer by this point, and document.write() calls from
the script must be able to synchronously process inserted markup up to
the insertion point.
This fixes ordering for document.write()'d inserted scripts during
external parser-blocking script execution.
thead, tbody, tfoot, tr, td, and th all have an `align` presentational
attribute with identical definitions. We previously only supported it
for td and th, and also allowed arbitrary text-align values instead of
the 4 dictated by the spec.
When the regular HTML parser is blocked on an external script, the
speculative parser scans ahead and pre-fetches discoverable
sub-resources. Previously those fetches were tracked only in the
parser's own URL list and never registered in the document's preload
map, so when the regular parser later reached each element, fetch()'s
consume_a_preloaded_resource() lookup found nothing and issued a
duplicate request. As a result, every parser-blocked sub-resource was
fetched twice.
issue_speculative_fetch now creates a PreloadEntry, registers it
under create_a_preload_key(request) in the document's preload map,
and supplies a processResponseConsumeBody callback that populates
the entry. The map insertion happens after fetch() starts because
fetch() runs consume_a_preloaded_resource() synchronously, so
registering the entry beforehand would short-circuit the
speculative fetch itself.
The body-handling steps (1, 2, 5 of the preload algorithm's
processResponseConsumeBody) are factored into a shared
deliver_preload_response helper used by both the speculative parser
and HTMLLinkElement::preload.
Introduce IncrementalDocumentParser, which streams the response body
through a TextCodec::StreamingDecoder into the HTMLTokenizer one chunk
at a time. The tokenizer pauses when it runs out of input and resumes
once the next chunk is appended; when the body closes we close the
tokenizer's input stream so it can finish the parse.
DocumentLoading routes HTML responses through the new parser instead of
buffering the full body before handing it to HTMLParser.
Pull the post-parse-action setup, run loop, and post-parse invocation
out of HTMLParser::run(URL, ...) into a new run_until_completion()
method. The URL overload still calls it; behavior is unchanged. The
incremental parser will use this entry point directly without going
through the URL-setting overload.
Add a ScriptCreatedParser flag plumbed through HTMLParser's constructor
and create_for_scripting(). Only document.open()'s parser sets it to
Yes. Document::close() step 3 now checks is_script_created() so it
correctly skips parsers that weren't created via document.open(),
matching the spec.
Previously the check was just `if (!m_parser)`, which incorrectly let
document.close() insert an EOF into a network-driven parser. The bug
was mostly latent because the network parser used to finish quickly,
but it matters once the network parser stays alive for the duration of
a streamed parse.
Add can_run_out_of_characters() and use it in the
NamedCharacterReference state and consume_next_if_match() so that an
open input stream gets the same code-point-at-a-time treatment as an
active document.write insertion point. Without this, a network chunk
that ends partway through a named character reference or a
multi-character match would make the tokenizer commit to a "no match"
decision before the remaining bytes arrive.
No behavior change for existing callers: the new helper still returns
false once the input stream is closed (which the StringView constructor
sets immediately).
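A simplified sketch of the three-way outcome, assuming an open input
stream is the only reason more characters may arrive (the real helper
also covers document.write insertion points). Input and MatchResult are
hypothetical types.

```rust
#[derive(Debug, PartialEq)]
enum MatchResult {
    Matched,
    NoMatch,
    NeedMoreInput,
}

struct Input {
    buffer: Vec<char>,
    position: usize,
    stream_closed: bool,
}

impl Input {
    /// More characters may still arrive while the stream is open.
    fn can_run_out_of_characters(&self) -> bool {
        !self.stream_closed
    }

    /// Consume `expected` if it is next in the input. If the buffer
    /// ends partway through and more input may arrive, defer the
    /// decision instead of committing to "no match".
    fn consume_next_if_match(&mut self, expected: &str) -> MatchResult {
        let remaining = &self.buffer[self.position..];
        let mut count = 0;
        for (i, ch) in expected.chars().enumerate() {
            match remaining.get(i) {
                Some(&have) if have == ch => count += 1,
                Some(_) => return MatchResult::NoMatch,
                None if self.can_run_out_of_characters() => {
                    return MatchResult::NeedMoreInput;
                }
                None => return MatchResult::NoMatch,
            }
        }
        self.position += count;
        MatchResult::Matched
    }
}
```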
Add an explicit "input stream closed" flag plus the streaming-input API
(append_to_input_stream, close_input_stream, is_input_stream_closed) to
let a future incremental driver feed bytes as they arrive. Rewrite
should_pause_before_next_input_character so the tokenizer pauses when
the buffer is exhausted but more bytes may still arrive, including the
case where a chunk ends in CR (CRLF normalization needs one code point
of lookahead).
Existing call sites are unaffected: the StringView constructor
immediately marks the input stream closed, and insert_eof() now also
closes the stream so document.close() drives the same exit path.
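The pause rule can be sketched as follows. This is a simplified model
over a buffer of chars; Stream and Next are hypothetical types, and the
real tokenizer also has to account for insertion points.

```rust
#[derive(Debug, PartialEq)]
enum Next {
    CodePoint(char),
    Pause,
    EndOfInput,
}

struct Stream {
    buffer: Vec<char>,
    position: usize,
    closed: bool,
}

impl Stream {
    fn next_code_point(&mut self) -> Next {
        let Some(&current) = self.buffer.get(self.position) else {
            // Buffer exhausted: finish only once the stream is closed.
            return if self.closed { Next::EndOfInput } else { Next::Pause };
        };
        if current == '\r' {
            match self.buffer.get(self.position + 1).copied() {
                // CRLF collapses into a single LF.
                Some('\n') => {
                    self.position += 2;
                    return Next::CodePoint('\n');
                }
                // The lookahead has not arrived yet: wait for it.
                None if !self.closed => return Next::Pause,
                // A lone CR is normalized to LF.
                _ => {
                    self.position += 1;
                    return Next::CodePoint('\n');
                }
            }
        }
        self.position += 1;
        Next::CodePoint(current)
    }
}
```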
HTML newline normalization collapses CRLF into a single LF, so
next_code_point() needs one code point of lookahead at a CR to decide
whether the CR stands alone or is the first half of a CRLF pair. When
the tokenizer is paused at the insertion point and the next code point
to consume is a CR sitting one position before it, that lookahead has
not been written yet.
Previously the tokenizer consumed the CR and emitted it as LF, so a
subsequent document.write() that began with LF surfaced as a second
LF instead of being absorbed into the original CRLF pair.
Stop one code point earlier in this case and wait for the next write
to arrive. This makes four html5lib write_single WPT tests pass.
The HTML parser's script end tag algorithms save the current insertion
point in an "old insertion point" local before executing a script, then
restore that local after script execution. Ladybird modeled that local
as a single tokenizer field, so nested script execution via
document.write() could overwrite the outer script's saved value.
Keep a stack of old insertion points instead, and adjust saved offsets
when document.write() inserts new input before them. This keeps the
normal script and SVG script paths aligned with the spec text while
leaving the parser-blocking script resume path to set the insertion
point to undefined again.
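A minimal sketch of the stack and the offset adjustment, using byte
offsets into a String. The struct and method names are hypothetical.
Shifting saved points that sit exactly at the insertion position is an
assumption, based on an insertion point staying attached to the
character that follows it.

```rust
struct Tokenizer {
    input: String,
    insertion_point: Option<usize>,
    old_insertion_points: Vec<Option<usize>>,
}

impl Tokenizer {
    /// Save the current insertion point before running a script.
    fn store_insertion_point(&mut self) {
        self.old_insertion_points.push(self.insertion_point);
    }

    /// Restore the most recently saved insertion point afterwards.
    fn restore_insertion_point(&mut self) {
        if let Some(saved) = self.old_insertion_points.pop() {
            self.insertion_point = saved;
        }
    }

    /// Model of document.write(): splice text in at the insertion point
    /// and shift every saved offset at or after that position.
    fn insert_input_at_insertion_point(&mut self, text: &str) {
        let Some(at) = self.insertion_point else { return };
        self.input.insert_str(at, text);
        for saved in self.old_insertion_points.iter_mut().flatten() {
            if *saved >= at {
                *saved += text.len();
            }
        }
        self.insertion_point = Some(at + text.len());
    }
}
```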
Replace the spin_until in SVGScriptElement::process_the_script_element
with an async fetch that mirrors HTMLScriptElement's mark_as_ready
pattern. External SVG scripts now fetch and execute asynchronously,
matching Chromium's behavior.
For HTML-embedded SVG scripts, the parser pauses via the existing
schedule_resume_check infrastructure, extended to support SVG scripts
through a new pending_parsing_blocking_svg_script slot on Document.
For top-level XML/SVG documents, scripts execute when their fetch
completes; the load event is delayed via DocumentLoadEventDelayer, which
the existing XMLDocumentBuilder::document_end already waits on.
When the HTML parser blocks on a synchronous external script, run a
separate tokenizer over the unparsed input and issue speculative fetches
for the resources it finds (script src, link rel=stylesheet|preload, img
src), with <base href> tracking and template/foreign-content skipping.
Also fill in the previously-stubbed "consume a preloaded resource"
algorithm and the document's "map of preloaded resources", so that
<link rel="preload"> followed by a matching consumer deduplicates to
a single fetch.
Spinning a nested event loop to wait for a parser-blocking script blocks
the calling thread, can deadlock, and creates reentrancy hazards. Switch
to an event-driven pause/resume model, mirroring the prior
HTMLParserEndState refactor (df96b69e7a).
Three WPT document.write tests flip from Fail to Pass and are
rebaselined: all write an external script via document.write() followed
by inline content. With spin_until, control did not return to the caller
of document.write() between writing the script and observing its effects
so the test's order assertions saw a different sequence than the spec
mandates.
This corresponds to the editorial change to the HTML standard that
introduced the parsing mode enum:
01c45cede
and to the follow-up normative change:
508706c80
which makes fragment parsing derive its scripting mode from the context
document.
Inline JS-to-JS frames no longer live in the raw execution context
vector, so LibWeb callers that need to inspect or pop contexts now go
through VM helpers instead of peeking into that storage directly.
This keeps the execution context bookkeeping encapsulated while
preserving existing microtask and realm-entry checks.
HTMLParser::the_end() had three spin_until calls that blocked the event
loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8
(load event delay). This replaces them with an HTMLParserEndState state
machine that progresses asynchronously via callbacks.
The state machine has three phases matching the three spin_until calls:
- WaitingForDeferredScripts: loops executing ready deferred scripts
- WaitingForASAPScripts: waits for ASAP script lists to empty
- WaitingForLoadEventDelay: waits for nothing to delay the load event
Notification triggers re-evaluate the state machine when conditions
change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in
StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and
DocumentLoadEventDelayer decrements. NavigableContainer state changes
(session history readiness, content navigable cleared, lazy load flag)
also trigger re-evaluation of the load event delay check.
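The three phases can be sketched as a small state machine that is
re-run whenever a notification fires. This is a Rust illustration, not
the C++ implementation. The Conditions fields and check_progress
signature are hypothetical, and "executing" ready deferred scripts is
modeled by resetting a counter.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum EndState {
    WaitingForDeferredScripts,
    WaitingForASAPScripts,
    WaitingForLoadEventDelay,
    Done,
}

/// Hypothetical snapshot of what the end steps are waiting on.
struct Conditions {
    ready_deferred_scripts: usize,
    unready_deferred_scripts: usize,
    pending_asap_scripts: usize,
    load_event_delayers: usize,
}

/// Advance as far as current conditions allow, then return the state to
/// resume from on the next notification.
fn check_progress(mut state: EndState, conditions: &mut Conditions) -> EndState {
    loop {
        state = match state {
            EndState::WaitingForDeferredScripts => {
                // Run every deferred script that is ready right now.
                conditions.ready_deferred_scripts = 0;
                if conditions.unready_deferred_scripts > 0 {
                    return state;
                }
                EndState::WaitingForASAPScripts
            }
            EndState::WaitingForASAPScripts => {
                if conditions.pending_asap_scripts > 0 {
                    return state;
                }
                EndState::WaitingForLoadEventDelay
            }
            EndState::WaitingForLoadEventDelay => {
                if conditions.load_event_delayers > 0 {
                    return state;
                }
                EndState::Done
            }
            EndState::Done => return state,
        };
    }
}
```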
Key design decisions and why:
1. Microtask checkpoint in schedule_progress_check(): The old spin_until
called perform_a_microtask_checkpoint() before checking conditions.
This is critical because HTMLImageElement::update_the_image_data step
8 queues a microtask that creates the DocumentLoadEventDelayer.
Without the checkpoint, check_progress() would see zero delayers and
complete before images start delaying the load event.
2. deferred_invoke in schedule_progress_check():
I tried Core::Timer (0ms), queue_global_task, and synchronous calls.
Timers caused non-deterministic ordering with the HTML event loop's
task processing timer, leading to image layout tests failing (wrong
subtest pass/fail patterns). Synchronous calls fired too early during
image load processing before dimensions were set, causing 0-height
images in layout tests. queue_global_task had task ordering issues
with the session history traversal queue. deferred_invoke runs after
the current callback returns but within the same event loop pump,
giving the right balance.
3. Navigation load event guard (m_navigation_load_event_guard): During
cross-document navigation, finalize_a_cross_document_navigation step
2 calls set_delaying_load_events(false) before the session history
traversal activates the new document. This creates a transient state
where the parent's load event delay check sees the about:blank
document (which has ready_for_post_load_tasks=true) as the active
document and completes prematurely.
Remove includes from Node.h that are only needed for forward
declarations (AccessibilityTreeNode.h, XMLSerializer.h,
JsonObjectSerializer.h). Extract StyleInvalidationReason and
FragmentSerializationMode enums into standalone lightweight
headers so downstream headers (CSSStyleSheet.h, CSSStyleProperties.h,
HTMLParser.h) can include just the enum they need instead of all of
Node.h. Replace Node.h with forward declarations in headers that only
use Node by pointer/reference.
This breaks the circular dependency between Node.h and
AccessibilityTreeNode.h, reducing AccessibilityTreeNode.h's
recompilation footprint from ~1399 to ~25 files.
When a document is navigated away from while HTMLParser::the_end() is
spinning the event loop (steps 7 and 8), the spin_until stays on the
call stack indefinitely, causing all subsequent event processing on the
same event loop to happen within nested spin_until pumping. Add
is_fully_active() checks to bail out early in this case.
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.
These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
Introduce the HTMLSelectedContentElement and integrate it into
<select>, <option> and HTMLParser.
See whatwg/html#10548.
There are two bugs in the WPT tests which cause the third subtest
in selectedcontent.html and selectedcontent-mutations.html to fail.
See whatwg/html#11882, web-platform-tests/wpt#55849.