85 commits

Author SHA1 Message Date
Shannon Booth
8642801889 LibWeb: Set fragment scripting mode from the context document
This corresponds with the editorial change to the HTML standard
introducing the parsing mode enum of:

01c45cede

And a follow up normative change of:

508706c80

Making fragment parsing derive its scripting mode from the context
document.
2026-04-14 23:01:36 +02:00
Andreas Kling
88d4d1b1a6 LibWeb: Use VM helpers for execution context access
Inline JS-to-JS frames no longer live in the raw execution context
vector, so LibWeb callers that need to inspect or pop contexts now go
through VM helpers instead of peeking into that storage directly.

This keeps the execution context bookkeeping encapsulated while
preserving existing microtask and realm-entry checks.
2026-04-13 18:29:43 +02:00
Aliaksandr Kalenik
df96b69e7a LibWeb: Replace spin_until in HTMLParser::the_end() with state machine
HTMLParser::the_end() had three spin_until calls that blocked the event
loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8
(load event delay). This replaces them with an HTMLParserEndState state
machine that progresses asynchronously via callbacks.

The state machine has three phases matching the three spin_until calls:
- WaitingForDeferredScripts: loops executing ready deferred scripts
- WaitingForASAPScripts: waits for ASAP script lists to empty
- WaitingForLoadEventDelay: waits for nothing to delay the load event

Notification triggers re-evaluate the state machine when conditions
change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in
StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and
DocumentLoadEventDelayer decrements. NavigableContainer state changes
(session history readiness, content navigable cleared, lazy load flag)
also trigger re-evaluation of the load event delay check.

Key design decisions and why:

1. Microtask checkpoint in schedule_progress_check(): The old spin_until
   called perform_a_microtask_checkpoint() before checking conditions.
   This is critical because HTMLImageElement::update_the_image_data step
   8 queues a microtask that creates the DocumentLoadEventDelayer.
   Without the checkpoint, check_progress() would see zero delayers and
   complete before images start delaying the load event.

2. deferred_invoke in schedule_progress_check():
   I tried Core::Timer (0ms), queue_global_task, and synchronous calls.
   Timers caused non-deterministic ordering with the HTML event loop's
   task processing timer, leading to image layout tests failing (wrong
   subtest pass/fail patterns). Synchronous calls fired too early during
   image load processing before dimensions were set, causing 0-height
   images in layout tests. queue_global_task had task ordering issues
   with the session history traversal queue. deferred_invoke runs after
   the current callback returns but within the same event loop pump,
   giving the right balance.

3. Navigation load event guard (m_navigation_load_event_guard): During
   cross-document navigation, finalize_a_cross_document_navigation step
   2 calls set_delaying_load_events(false) before the session history
   traversal activates the new document. This creates a transient state
   where the parent's load event delay check sees the about:blank document (which
   has ready_for_post_load_tasks=true) as the active document and
   completes prematurely.
2026-03-28 23:14:55 +01:00
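The three phases described in df96b69e7a can be illustrated with a minimal, standalone sketch. It is not the LibWeb implementation: `ParserEndSketch`, its counters, and `on_done` are hypothetical stand-ins, and the microtask checkpoint, deferred_invoke scheduling, and navigation guard are left out. It only shows the idea of re-evaluating a state machine on each notification instead of spinning the event loop.

```cpp
#include <functional>

// Hypothetical stand-in for the HTMLParserEndState phases named in the commit.
enum class EndState {
    WaitingForDeferredScripts,
    WaitingForASAPScripts,
    WaitingForLoadEventDelay,
    Done,
};

struct ParserEndSketch {
    // Conditions the surrounding engine would update (all names are assumptions).
    int ready_deferred_scripts = 0;
    int pending_deferred_scripts = 0;
    int pending_asap_scripts = 0;
    int load_event_delayers = 0;

    EndState state = EndState::WaitingForDeferredScripts;
    std::function<void()> on_done;

    // Called whenever a condition changes; returns instead of blocking.
    void check_progress()
    {
        for (;;) {
            switch (state) {
            case EndState::WaitingForDeferredScripts:
                // "Execute" every deferred script that is ready right now.
                while (ready_deferred_scripts > 0) {
                    --ready_deferred_scripts;
                    --pending_deferred_scripts;
                }
                if (pending_deferred_scripts > 0)
                    return; // Wait for the next notification.
                state = EndState::WaitingForASAPScripts;
                break;
            case EndState::WaitingForASAPScripts:
                if (pending_asap_scripts > 0)
                    return;
                state = EndState::WaitingForLoadEventDelay;
                break;
            case EndState::WaitingForLoadEventDelay:
                if (load_event_delayers > 0)
                    return;
                state = EndState::Done;
                if (on_done)
                    on_done();
                return;
            case EndState::Done:
                return;
            }
        }
    }
};
```

In this sketch a caller would invoke `check_progress()` from each notification trigger (script marked ready, delayer count decremented, and so on); every call either advances through as many phases as currently possible or returns immediately.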
Sam Atkins
ed6a5f25a0 LibWeb: Implement scoped custom element registries 2026-03-27 19:49:55 +00:00
Luke Wilde
0381c40cb4 LibWeb: Reset non-FACEs and don't associate them to a form during parse
(FACE stands for form-associated custom element)
2026-03-25 13:18:15 +00:00
Tim Ledbetter
36f59a406e LibWeb: Put HTML parser debug message behind a flag 2026-03-10 11:14:04 +01:00
Andreas Kling
e87f889e31 Everywhere: Abandon Swift adoption
After making no progress on this for a very long time, let's acknowledge
it's not going anywhere and remove it from the codebase.
2026-02-17 10:48:09 -05:00
Andreas Kling
9b987baf0e LibWeb: Bail out of the_end() spin_untils for inactive documents
When a document is navigated away from while HTMLParser::the_end() is
spinning the event loop (steps 7 and 8), the spin_until stays on the
call stack indefinitely, causing all subsequent event processing on the
same event loop to happen within nested spin_until pumping. Add
is_fully_active() checks to bail out early in this case.
2026-02-10 21:19:35 +01:00
Jelle Raaijmakers
ae20ecf857 AK+Everywhere: Add Vector::contains(predicate) and use it
No functional changes.
2026-01-08 15:27:30 +00:00
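For readers unfamiliar with the pattern in ae20ecf857, a predicate-based contains() can be sketched over std::vector as below. This is an illustration only: the free function `contains` is an assumption, and the actual AK::Vector member signature is not shown in this log.

```cpp
#include <algorithm>
#include <vector>

// Sketch of a predicate-based contains(); AK::Vector's real signature may differ.
template<typename T, typename Predicate>
bool contains(std::vector<T> const& values, Predicate predicate)
{
    // True if the predicate holds for at least one element.
    return std::any_of(values.begin(), values.end(), predicate);
}
```

A call such as `contains(ids, [](int id) { return id > 2; })` replaces a hand-written search loop without changing behaviour.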
Andreas Kling
a9cc425cde LibJS+LibWeb: Add missing GC marking visits
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.

These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
2026-01-07 12:48:58 +01:00
Feng Yu
b58fcaeecf LibWeb: Add HTMLSelectedContentElement for customizable select
Introduce the HTMLSelectedContentElement and integrate it into
<select>, <option> and HTMLParser.

See whatwg/html#10548.

There are two bugs in the WPT tests which cause the third subtest
in selectedcontent.html and selectedcontent-mutations.html to fail.
See whatwg/html#11882, web-platform-tests/wpt#55849.
2025-12-12 12:06:24 +00:00
Feng Yu
d2029b1814 LibWeb: Relax HTML parser to allow more tags inside <select>
This implements the parsing part of the customizable <select> spec update.
See whatwg/html PR #10548.

Two failing subtests in `html5lib_innerHTML_tests_innerHTML_1.html`
and `customizable-select/select-parsing.html` are due to the spec
still disallowing `<input>` inside `<select>`, even though Chrome
has already implemented this behavior (see whatwg/html#11288).
2025-12-04 17:17:01 +00:00
Sam Atkins
a25cb679fb LibWeb/HTML: Update spec text related to template's content
Corresponds to:
aa52274b5a
2025-11-27 10:26:13 +00:00
Sam Atkins
8ca4833885 LibWeb/HTML: Update spec text in create_element_for()
No behaviour changes.
2025-11-26 09:52:47 +01:00
Sam Atkins
6e2f8166f4 LibWeb/HTML: Combine duplicate parsing branches
These are combined in the current spec. No behaviour change.
2025-11-26 09:52:47 +01:00
Sam Atkins
6a4ab26b48 LibWeb/HTML: Return early from find_appropriate_place_for_inserting_node
Step 2.(a).5 says to abort, but we were instead carrying on and would
run steps 3 and 4. Those steps would not change the result at all, but
this avoids a little unnecessary work.

I wrapped a couple of comments at 120 columns while I was at it.
2025-11-26 09:52:47 +01:00
Sam Atkins
418e22d65a LibWeb/HTML: Bring handle_in_head in HTML parser more up to date
A couple of spec text changes I noticed, and use `has_attribute()`
instead of manually checking it.
2025-11-26 09:52:47 +01:00
Psychpsyo
100f37995f Everywhere: Clean up AD-HOC and FIXME comments without colons 2025-11-13 15:56:04 +01:00
Lorenz A
f8330a2ec5 LibWeb: Do not execute unclosed SVG script tags 2025-11-09 01:43:46 +01:00
Andreas Kling
3593c3b687 LibWeb: Throw out decoded UTF-32 data in HTMLTokenizer after parser runs
This ends up saving quite a bit of memory on many pages, since UTF-32
uses 4 bytes per code point.

As an example, it reduces the footprint on https://gymgrossisten.com/
by 2 MiB.
2025-10-24 08:52:53 +02:00
mikiubo
0b715b20a2 LibWeb: Make HTML fragment parsing return ExceptionOr
Update Element::parse_fragment and Node::unsafely_set_html to
propagate exceptions.

This refactor is needed as a prerequisite for implementing the XML
fragment parser, which requires consistent error handling in fragment
parsing.
2025-10-23 11:06:39 +01:00
Lorenz A
6afd39b16a LibWeb: Keep the tokens in ListOfActiveFormattingElements 2025-10-21 23:36:07 +02:00
Tete17
82f56e30ed LibWeb: Adapt the parsing of script elements to accommodate TrustedTypes 2025-09-16 10:57:34 +02:00
Lorenz A
47796e7967 LibWeb: Serialize HTML attribute names as per spec 2025-09-15 10:08:12 +02:00
euro20179
e442aa6e10 LibWeb: Ensure parser cannot change the mode is handled
This fixes at least one WPT bug where text/plain documents are rendered in
quirks mode. The test in question: https://wpt.live/html/browsers/browsing-the-web/read-text/load-text-plain.html
2025-09-07 11:11:43 +01:00
Luke Wilde
b17783bb10 Everywhere: Change west consts caught by clang-format-21 to east consts 2025-08-29 18:18:55 +01:00
Tim Ledbetter
cb1a1a5cb5 LibWeb: Replace is<T>() with as_if<T>() where possible 2025-08-25 18:45:00 +02:00
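The shape of the change in cb1a1a5cb5 can be sketched as follows, assuming as_if returns a pointer that is null when the type does not match. The `as_if` below is a stand-in built on dynamic_cast, and `Node`, `Element`, and `attribute_count_or_zero` are hypothetical; AK's own helper is not reproduced here.

```cpp
// Hypothetical node hierarchy, for illustration only.
struct Node {
    virtual ~Node() = default;
};

struct Element : Node {
    int attribute_count = 0;
};

// Stand-in for as_if<T>(): returns nullptr when the dynamic type does not match.
template<typename T, typename U>
T* as_if(U& input)
{
    return dynamic_cast<T*>(&input);
}

int attribute_count_or_zero(Node& node)
{
    // One checked cast instead of a type test followed by a separate cast.
    if (auto* element = as_if<Element>(node))
        return element->attribute_count;
    return 0;
}
```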
Sam Atkins
99bce9a94d LibWeb/CSS: Replace CSSUnitValue with DimensionStyleValue
CSSUnitValue is a typed-om type which we will implement separately in
the future. However, it still seems useful to give our dimension values
a base class. (Maybe they could be templated in the future?) So instead
of deleting it entirely, rename it to DimensionStyleValue and make its
API match our style better.
2025-08-08 15:19:03 +01:00
Sam Atkins
c57975c9fd LibWeb: Move and rename CSSStyleValue to StyleValues/StyleValue.{h,cpp}
This reverts 0e3487b9ab.

Back when I made that change, I thought we could make our StyleValue
classes match the typed-om definitions directly. However, they have
different requirements. Typed-om types need to be mutable and GCed,
whereas StyleValues are immutable and ideally wouldn't require a JS VM.

While I was already making such a cataclysmic change, I've moved it into
the StyleValues directory, because it *not* being there has bothered me
for a long time. 😅
2025-08-08 15:19:03 +01:00
Timothy Flynn
8b6e3cb735 LibWeb+LibUnicode+WebContent: Port DOM::CharacterData to UTF-16
This replaces the underlying storage of CharacterData with Utf16String
and deals with the fallout.
2025-07-24 19:00:20 +02:00
Andrew Kaster
3040ca4311 LibWeb: Remove noisy debug messages from HTMLParser 2025-07-09 16:26:49 -06:00
Luke Wilde
2368641de5 LibWeb: Track if element was created from token with dupe attributes
This is required for CSP to ignore the nonce attribute to prevent
duplicate attributes hijacking the attribute.

See https://w3c.github.io/webappsec-csp/#security-nonce-hijacking
2025-07-09 15:52:54 -06:00
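A minimal sketch of the bookkeeping described in 2368641de5, assuming the tokenizer drops a repeated attribute and only records that a duplicate was seen. `TagTokenSketch` and its members are hypothetical names, not the LibWeb types.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag token that remembers whether a duplicate attribute was dropped.
struct TagTokenSketch {
    std::vector<std::pair<std::string, std::string>> attributes;
    bool had_duplicate_attribute = false;

    void add_attribute(std::string name, std::string value)
    {
        for (auto const& attribute : attributes) {
            if (attribute.first == name) {
                // Keep the first occurrence and remember the duplicate, so a
                // later CSP check can choose to ignore the nonce.
                had_duplicate_attribute = true;
                return;
            }
        }
        attributes.emplace_back(std::move(name), std::move(value));
    }
};
```

An element created from such a token could copy the flag, which is the information the commit says CSP needs.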
Tim Ledbetter
8828e0d791 LibWeb: Avoid updating muted state on muted content attribute changes
The `muted` content attribute should only affect the state of the
`muted` IDL property when the media element is first created. The
attribute should have no dynamic effect.
2025-06-27 09:14:54 +12:00
mikiubo
ff78746be1 LibWeb: Set readyState to complete for DOMParser documents
Documents created via DOMParser.parseFromString()
are parsed synchronously and do not participate in the
browsing context's loading pipeline.

This patch ensures that if the document has no browsing context
(i.e. was parsed via DOMParser),
its readiness is set to "complete" synchronously.

Fixes WPT:
domparsing/xmldomparser.html
2025-06-25 20:49:03 +12:00
Sam Atkins
423cdd447d LibWeb+LibGfx: Apply editorial punctuation/whitespace/markup fixes
Corresponds to d426109ea1
and fd08f81d06
2025-06-25 03:12:19 +12:00
Sam Atkins
a35d14eab3 LibWeb/HTML: Add or update spec steps in HTML parser
Partly corresponds to this which adds numbering to some substeps:
d426109ea1

This is not a complete review of all the spec steps to check that
they're up to date - I just updated the parts affected by that above
commit, and then added some `->` marks to places I noticed it was
missing. There may be actual spec differences still.

An actual change that needs tackling later is that `handle_in_head()`'s
branch for `<template>` has some new steps related to custom element
registries.
2025-06-25 03:12:19 +12:00
Shannon Booth
1fed3d27c1 LibWeb/HTML: Don't set opaque origin on innerHTML document
This appears to no longer be necessary now that we are properly
handling the loading of image data for the <img> element.
2025-06-24 09:56:14 +02:00
Shannon Booth
e0d7278820 LibURL+LibWeb: Make URL::Origin default constructor private
Instead, port all users over to the newly created
Origin::create_opaque factory function. This also requires porting
over some users of Origin to avoid default construction.
2025-06-17 20:54:03 +02:00
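The pattern in e0d7278820 (a private default constructor plus a named factory) can be sketched as below. `OriginSketch`, `create_tuple`, and the stored fields are assumptions made for illustration; only the name create_opaque comes from the commit message.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical origin type: callers must state which kind of origin they want.
class OriginSketch {
public:
    static OriginSketch create_opaque() { return OriginSketch {}; }

    static OriginSketch create_tuple(std::string scheme, std::string host, int port)
    {
        OriginSketch origin;
        origin.m_scheme = std::move(scheme);
        origin.m_host = std::move(host);
        origin.m_port = port;
        return origin;
    }

    bool is_opaque() const { return !m_host.has_value(); }

private:
    // Private, so an accidental default-constructed origin no longer compiles.
    OriginSketch() = default;

    std::string m_scheme;
    std::optional<std::string> m_host;
    int m_port = 0;
};
```

Code that previously relied on default construction can hold a `std::optional<OriginSketch>` until the origin is known, similar in spirit to the Optional<Origin> mentioned for Document.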
Shannon Booth
5deb8ba2f8 LibWeb: Explicitly set Document's origin
As part of the effort of removing the default constructor of
Origin, since document has the origin set after construction,
port Document's origin over to an Optional<Origin>.

This exposes that we were never setting the origin of the document
during fragment parsing. For now, to maintain previous behaviour,
let's explicitly set it to an opaque origin.
2025-06-17 20:54:03 +02:00
Gingeh
f1eaecc630 LibWeb: Escape "<" and ">" when serializing attribute values
See https://github.com/whatwg/html/pull/6362
2025-05-22 07:55:34 +01:00
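As an illustration of f1eaecc630, here is a small standalone sketch of attribute-value escaping that also covers "<" and ">". It assumes UTF-8 input in a std::string and follows my reading of the HTML "escaping a string" steps in attribute mode; `escape_attribute_value` is a hypothetical name, not the LibWeb function.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch: escape a UTF-8 attribute value for HTML serialization.
std::string escape_attribute_value(std::string_view input)
{
    std::string output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        if (c == '&') {
            output += "&amp;";
        } else if (c == '"') {
            output += "&quot;";
        } else if (c == '<') {
            output += "&lt;";
        } else if (c == '>') {
            output += "&gt;";
        } else if (c == '\xC2' && i + 1 < input.size() && input[i + 1] == '\xA0') {
            // U+00A0 NO-BREAK SPACE is the two bytes C2 A0 in UTF-8.
            output += "&nbsp;";
            ++i;
        } else {
            output += c;
        }
    }
    return output;
}
```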
Shannon Booth
579730d861 LibWeb: Prefer using equals_ignoring_ascii_case
Which has an optimization if both of the strings being passed
through are FlyStrings, which actually ends up being the case
in some places during selector matching comparing attribute names.
Instead of maintaining more overloads of
Infra::is_ascii_case_insensitive_match, switch
everything over to equals_ignoring_ascii_case instead.
2025-05-21 13:45:02 +01:00
Andreas Kling
263b125782 LibWeb: Let HTMLTokenizer walk over code points instead of UTF-8
Instead of using UTF-8 iterators to traverse the HTMLTokenizer input
stream one code point at a time, we now do a one-shot conversion up
front from the input encoding to a Vector<u32> of Unicode code points.

This simplifies the tokenizer logic somewhat, and ends up being faster
as well, so win-win.

1.02x speedup on Speedometer 2.1
2025-05-11 01:13:20 +02:00
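The one-shot conversion described in 263b125782 can be sketched as a function that decodes the whole input into a vector of code points before tokenizing. This is a simplified assumption-laden example: it handles only UTF-8, emits U+FFFD for malformed sequences, and does not reject overlong encodings or surrogates; `decode_utf8` is a hypothetical name and std::vector<char32_t> stands in for Vector<u32>.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch: decode UTF-8 up front so later code can index code points directly.
std::vector<char32_t> decode_utf8(std::string_view input)
{
    std::vector<char32_t> code_points;
    code_points.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size()) {
        auto lead = static_cast<unsigned char>(input[i]);
        std::size_t length = 0;
        char32_t code_point = 0;
        if (lead < 0x80) {
            length = 1;
            code_point = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;
            code_point = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            code_point = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            code_point = lead & 0x07;
        }
        bool valid = length != 0 && i + length <= input.size();
        for (std::size_t k = 1; valid && k < length; ++k) {
            auto byte = static_cast<unsigned char>(input[i + k]);
            if ((byte & 0xC0) != 0x80)
                valid = false;
            else
                code_point = (code_point << 6) | (byte & 0x3F);
        }
        if (!valid) {
            // Malformed input: emit a replacement character and resynchronize.
            code_points.push_back(0xFFFD);
            ++i;
            continue;
        }
        code_points.push_back(code_point);
        i += length;
    }
    return code_points;
}
```

Each decoded element occupies 4 bytes regardless of how many bytes the character used in the input, which is the memory cost that 3593c3b687 later addresses by discarding the decoded data after the parser runs.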
Shannon Booth
084cceab5c LibWeb: Split out SimilarOriginWindowAgent from HTML::Agent
This allows adding the concept of a WorkerAgent that can be reused
between shared and dedicated workers. An event loop is the
commonality between the different agent types, though there
are some differences between those event loops, which we customize
on construction of the HTML::EventLoop.
2025-04-25 14:07:51 +02:00
Andreas Kling
27efb3b140 LibWeb: When declarative shadow attachment fails, continue in right spot
If attachment fails for whatever reason (e.g. the host element is not
allowed to be a host), the HTML spec tells us to insert the template
element anyway and proceed.

Before this change, we were recomputing the insertion location at this
point, which caused it to be *inside* the template element. Inserting
the template element into itself didn't work, and so the DOM would end
up incorrect.

The fix here is to simply use the insertion point we determined earlier
in the same function, before putting a template element on the stack of
open elements. We already do this elsewhere.

Fixes at least 228 subtests on WPT. :^)
2025-04-25 11:01:17 +02:00
Andreas Kling
e5d62e9915 LibWeb: Track whether HTMLLinkElement was enabled when created by parser
This information is needed by the script-blocking style sheet logic, and
its absence was causing a WPT test to crash.
2025-04-24 18:26:54 +02:00
Andrew Kaster
6d11414957 LibWeb: Make storage of CSS::StyleValues const-correct
Now we consistently use `RefPtr<StyleValue const>` for all StyleValues.
2025-04-16 10:41:44 -06:00
Andrew Kaster
8cfac6ed71 LibWeb: Store a SpeculativeHTMLParser on the HTML Parser
The parser was previously added, but unused. Actually attaching one to
the HTML Parser will let us test the limits of Swift interop.
2025-04-16 09:02:27 -06:00
Sam Atkins
7367150536 LibWeb/HTML: Update FIXME to not reset form-associated custom elements
Corresponds to e6bdd0557a
2025-04-02 17:28:06 +01:00
Sam Atkins
415dd1be06 LibWeb/HTML: Remove "flag" word from usage of "page showing"
Corresponds to 30935f3474
2025-03-14 20:33:25 +00:00
InvalidUsernameException
0e1eb4d4a7 LibWeb: Respect scroll position set by script during page load
When setting scroll position during page load we need to consider
whether we actually have a fragment to scroll to. A script may already
have run at that point and may already have set a scroll position.

If there is an actual fragment to scroll to, it is fine to scroll to
that fragment, since it should take precedence. If we don't have a
fragment however, we should not unnecessarily overwrite the scroll
position set by the script back to (0, 0).

Since this problem is caused by a spec bug, I have tested the behavior
in the three major browser engines. Unfortunately they do not agree
fully with each other. If there is no fragment at all (e.g. `foo.html`),
all browsers will respect the scroll position set by the script. If
there is a fragment (e.g. `foo.html#bar`), all browsers will set the
scroll position to the fragment element and ignore the one set by
script. However, when the fragment is empty (e.g. `foo.html#`), then
Blink and WebKit will set scroll position to the fragment, while Gecko
will set scroll position from script. Since all of this is ad-hoc
behavior anyway, I simply implemented the Blink/WebKit behavior because
of the majority vote for now.

This fixes a regression introduced in 51102254b5.
2025-03-10 17:14:13 +01:00
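The decision described in 0e1eb4d4a7 can be summarized in a small sketch, assuming the URL fragment is modelled as an optional string (absent for `foo.html`, empty for `foo.html#`). `ScrollTarget` and `scroll_target_after_load` are hypothetical names; the real code operates on LibWeb's document and URL types.

```cpp
#include <optional>
#include <string>

enum class ScrollTarget {
    KeepScriptPosition,
    ScrollToFragment,
};

// Sketch of the Blink/WebKit-style choice the commit message describes.
ScrollTarget scroll_target_after_load(std::optional<std::string> const& url_fragment)
{
    // No fragment at all: do not overwrite a position set by script.
    if (!url_fragment.has_value())
        return ScrollTarget::KeepScriptPosition;
    // Any fragment, including an empty one, takes precedence over script.
    return ScrollTarget::ScrollToFragment;
}
```

Matching Gecko instead would mean treating an empty fragment like a missing one.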