ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2026-06-27 19:51:03 +00:00

Author	SHA1	Message	Date
Shannon Booth	13bae036d8	LibWeb: Do not refer to removed C++ DAFSA code generator	2026-06-05 20:23:24 +02:00
Andreas Kling	164ed80244	Meta: Enable exit-time destructor warnings for libraries Enable -Wexit-time-destructors for all in-tree library targets and update process-lifetime library statics so they no longer register exit-time destructors. Long-lived caches, lookup tables, singleton registries, and generated constants now use NeverDestroyed or leaked references where the data is intended to live until process exit. Update LibWeb, LibLine, and the binding generators so regenerated sources follow the same rule instead of reintroducing destructed statics.	2026-06-04 19:20:49 +02:00
Sam Atkins	bb4f8a6621	LibWeb: Track parser-created style sheets Create parser-blocking style sheets when parser-created `<style>` elements are popped from the stack of open elements, and ignore dynamic style updates while those elements are still open in the parser. Make the shared style-element script-blocking predicate describe the active style sheet instance. Stale script-blocking entries are removed when that style sheet is replaced or removed.	2026-06-04 16:39:54 +01:00
R-Goc	520a7c8ebd	AK+LibWeb: Centralize FFI helper functions This commit creates a central FFIHelpers.h header which implements common conversions from FFI.	2026-05-28 14:15:43 -05:00
Vanand Gasparyan	f3a3488cda	Rust: Set import granularity to Item By default, `rustfmt` persists the import granularity. In practice, most Rust code has import granularity "Module" due to LSP's actions. "Item" gets rid of import groupings and achieves cleaner diffs and better conflict resolution. Better greppability is a positive side effect. Note: it's an unstable rustfmt feature. `cargo +nightly fmt` must be used instead of `cargo fmt`.	2026-05-28 06:52:18 +02:00
aplefull	201e380854	LibWeb: Allow UTF-8 detection for file: scheme	2026-05-25 23:45:28 +02:00
Andreas Kling	ca97f68cb7	LibWeb: Normalize decoded HTML string parsing Preserve leading BOMs when parsing already-decoded HTML strings, since those strings do not go through the encoded byte decoder path. Decoded markup from JS strings can also contain WTF-8 for lone surrogate code units. Keep the common scalar UTF-8 path to a single validation and copy, but replace surrogates before handing bytes to the Rust tokenizer. Add text coverage for DOMParser and innerHTML string parsing, including leading BOMs, text and attributes, lone high and low surrogates, and a valid surrogate pair.	2026-05-24 10:14:17 +02:00
Shannon Booth	637fd51595	LibWeb: Unify WebIDL C++ type generation Represent WebIDL C++ types with a single CppType model that tracks nullability, optional presence, and contained storage. GC-like values now use GC::Ref/GC::Ptr directly, while containers choose "plain", "Root", or "Conservative" container types depending on what they contain. For example, sequence<Element> becomes a RootVector of GC::Ref values, while sequence<SomeDictionary> becomes a ConservativeVector only when the dictionary contains GC-like values. This moves the generated bindings away from wrapping GC values in GC::Root by default. This has broad fallout, as the types passed to interfaces for GC objects change almost fully across the board.	2026-05-23 18:26:12 +02:00
Martin Chrástek	cd3c72dfda	LibWeb: Implement `chardetng`-based encoding detection for HTML parsing	2026-05-23 11:57:33 +02:00
Sam Atkins	34382a2aca	LibWeb/HTML: Add missing include for KeywordStyleValue	2026-05-20 13:00:50 +01:00
Andreas Kling	936bb9ca53	LibWeb: Use Rust preload scanner Replace the C++ speculative HTML parser token walk with the Rust preload scanner. Keep URL resolution, duplicate suppression, and fetch issuance in C++ so the scanner only emits base href updates and fetch candidates. Use the scanner callback result to stop iteration when the speculative parser has been stopped. Update parser comments that still described speculative mock element production.	2026-05-18 00:23:52 +02:00
Andreas Kling	411c6654e8	LibWeb: Add Rust preload scanner Add a Rust scanner that walks pending HTML parser input and emits base href updates or speculative fetch candidates. Keep URL parsing and fetch issuance in C++ for now, where the Document and request objects live. Allow the scan callback to stop iteration so the C++ speculative parser can preserve its stop hook once it is wired up. Expose a shared Attribute helper for resolving interned local names and use it from the Rust parser and preload scanner instead of repeating the same lookup pattern. Cover link rel handling, preload destination filtering, crossorigin mapping, and template/foreign-content skipping with Rust unit tests.	2026-05-18 00:23:52 +02:00
Andreas Kling	ccf5a278ab	LibWeb: Keep deferred document.close cleanup on its parser document.close() can defer script-created parser cleanup while a parser-blocking script is pending. If document.open() installs a new parser before the old parser resumes, the deferred action must clean up the parser that scheduled it instead of the document's current parser. Capture that parser before installing the deferred action. This keeps the parked cleanup from affecting a parser installed by a later document.open() call.	2026-05-17 15:35:56 +02:00
Andreas Kling	29784ea397	LibWeb: Remove the C++ HTML tree builder Delete the old C++ tree-construction implementation and helper classes that became unused once the Rust parser is unconditional. Remove the C++ stack of open elements, active formatting elements, speculative mock element, and tree-builder-only token storage. Keep the C++ parser entry points that still own LibWeb DOM integration, encoding detection, tokenizer bridging, incremental parsing, and the speculative parser support used by resource discovery.	2026-05-17 15:35:56 +02:00
Andreas Kling	a7ece4b062	LibWeb: Make the Rust HTML parser unconditional Remove the runtime selector between the old C++ tree builder and the new Rust implementation. Always construct HTML documents and fragments with the Rust parser now that it matches the existing tests. Simplify dump-html-tree by dropping the backend option that only made sense while both parser implementations were available.	2026-05-17 15:35:56 +02:00
Andreas Kling	f49335f210	LibWeb: Align declarative shadow root parsing Teach the Rust parser to recognize declarative shadow root templates and pass the parsed mode, slot assignment, clonable, serializable, and focus-delegation flags to the C++ DOM host. Expose shadowRootSlotAssignment reflection with the spec-defined named missing and invalid value defaults, and extend the ShadowDOM text test coverage for the reflected property and parser-created shadow roots.	2026-05-17 15:35:56 +02:00
Andreas Kling	54879bc916	LibWeb: Complete Rust HTML tree construction Finish the Rust implementation of the spec tree-construction algorithms needed by the LibWeb test suite. Add the remaining table modes, foster parenting, scope helpers, adoption agency handling, ruby/list/form and select cases, frameset state, foreign-content edge cases, and parser host callbacks. Preserve behavior that depends on the C++ DOM integration, including parser-created custom element reactions, fragment quirks mode, arbitrary fragment namespaces, template fragment mode, fragment form ownership, MathML annotation-xml boundaries, contextual fragment scripts, parser script source positions, document.close() parser state, void-element insertion, and duplicate attribute tracking. Add focused tests for the parser edge cases that are easy to regress at the boundary between the Rust tree builder and the C++ DOM host.	2026-05-17 15:35:56 +02:00
Andreas Kling	de12062515	LibWeb: Wire Rust parser scripts and fragments Preserve Rust parser state across tokenizer runs and stop cleanly when a parser-blocking script has to execute. Thread the pending script back through the existing C++ parser entry point so document.write(), input insertion points, and script bookkeeping continue to use the normal LibWeb machinery. Add the fragment parser setup needed by innerHTML and contextual fragment parsing, including context elements, form ownership, tokenizer state selection, text coalescing, and foreign-content integration.	2026-05-17 15:35:56 +02:00
Andreas Kling	2e9875770e	LibWeb: Add initial Rust HTML tree construction Implement the first Rust tree builder pass around the tokenizer and the LibWeb DOM host hooks. Cover the document setup, insertion-mode dispatch, ordinary body insertion, basic table handling, active formatting element reconstruction, and foreign-content routing. Leave the C++ parser available at runtime so the new path can be tested against the old implementation while the remaining tree-construction algorithms are filled in.	2026-05-17 15:35:56 +02:00
Andreas Kling	09296315c2	LibWeb: Add Rust HTML parser host plumbing Add the C++ and Rust scaffolding that lets the tree builder live in Rust while the DOM remains owned by LibWeb. Keep the exported surface small: Rust stores parser state, and C++ provides node creation, insertion, script, template, and GC hooks. Route dump-html-tree through the selectable parser backend so the new implementation can be exercised beside the existing parser while it is being brought up.	2026-05-17 15:35:56 +02:00
Andreas Kling	5b63cb5f37	LibWeb: Avoid unsafe tokenizer state conversion Replace the FFI tokenizer state transmute with an explicit conversion from the incoming numeric value. The old code range-checked against the last state before transmuting, which matched today's contiguous enum but left the conversion dependent on that layout detail. Returning early for unknown values keeps the FFI boundary tolerant while removing a source of possible invalid enum discriminants.	2026-05-17 15:35:56 +02:00
Andreas Kling	171e3adf01	LibWeb: Replace the HTML tokenizer with Rust Replace the C++ HTML tokenizer with a Rust implementation behind the existing HTMLTokenizer API. Keep the parser-facing integration points for streaming input, insertion points, document.write(), EOF insertion, parser aborts, speculative parser input, and last start tag tracking. The generated FFI handle stays an implementation detail of HTMLTokenizer, so callers keep a single tokenizer class. Preserve duplicate attributes through FFI so C++ token normalization can record the duplicate-attribute signal used by CSP nonce checks. Keep bulk tag-name and attribute scans capped at the active insertion point so streamed parser input is spliced at the right offset. Use generated DAFSA tables for named character references and intern common tag and attribute names to reduce FFI marshalling overhead. This also fixes attribute name source positions, nested old insertion points, and aborted fast-path handling. TestHTMLTokenizer covers duplicate attributes and insertion points in fast tag-name, attribute-name, and quoted-value scans. A CSP text test covers duplicate nonce attributes on parser-created script elements. The tokenizer dump fixtures still match, TestHTMLTokenizer passes, and the full release test-web run passes with 6981 tests and 226 skipped.	2026-05-15 21:01:40 +02:00
Shannon Booth	d595369ae4	LibWeb: Let document.write() reenter parser from parser-blocking scripts When resuming after an async wait for a pending parser-blocking script, clear the parser pause flag before executing the script. The spec has unblocked the tokenizer by this point, and document.write() calls from the script must be able to synchronously process inserted markup up to the insertion point. This fixes ordering for document.write()'d inserted scripts during external parser-blocking script execution.	2026-05-15 19:49:45 +02:00
Sam Atkins	73c4b77f68	LibWeb/HTML: Support `align` attributes on table sections and rows thead, tbody, tfoot, tr, td, and th all have an `align` presentational attribute with identical definitions. We previously only supported it for td and th, and also allowed arbitrary text-align values instead of the 4 dictated by the spec.	2026-04-30 15:20:22 +02:00
Andreas Kling	a538d2b160	Revert "LibWeb: Have speculative HTML parser populate the preload map" This reverts commit `b88cbb1b74`. Appears to be causing large regressions on WPT.	2026-04-30 04:55:31 +02:00
Aliaksandr Kalenik	b88cbb1b74	LibWeb: Have speculative HTML parser populate the preload map When the regular HTML parser is blocked on an external script, the speculative parser scans ahead and pre-fetches discoverable sub-resources. Previously those fetches were tracked only in the parser's own URL list and never registered in the document's preload map, so when the regular parser later reached each element fetch()'s consume_a_preloaded_resource() lookup found nothing and issued a duplicate request — every parser-blocked sub-resource was fetched twice. issue_speculative_fetch now creates a PreloadEntry, registers it under create_a_preload_key(request) in the document's preload map, and supplies a processResponseConsumeBody callback that populates the entry. The map insertion happens after fetch() starts because fetch() runs consume_a_preloaded_resource() synchronously, so registering the entry beforehand would short-circuit the speculative fetch itself. The body-handling steps (1, 2, 5 of the preload algorithm's processResponseConsumeBody) are factored into a shared deliver_preload_response helper used by both the speculative parser and HTMLLinkElement::preload.	2026-04-29 15:59:22 +02:00
Aliaksandr Kalenik	4762c4fa5c	LibWeb: Add incremental HTML parsing Introduce IncrementalDocumentParser, which streams the response body through a TextCodec::StreamingDecoder into the HTMLTokenizer one chunk at a time. The tokenizer pauses when it runs out of input and resumes once the next chunk is appended; when the body closes we close the tokenizer's input stream so it can finish the parse. DocumentLoading routes HTML responses through the new parser instead of buffering the full body before handing it to HTMLParser.	2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik	01fa8a27ac	LibWeb: Extract HTMLParser::run_until_completion() Pull the post-parse-action setup, run loop, and post-parse invocation out of HTMLParser::run(URL, ...) into a new run_until_completion() method. The URL overload still calls it; behavior is unchanged. The incremental parser will use this entry point directly without going through the URL-setting overload.	2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik	f499edefae	LibWeb: Track whether HTMLParser is script-created Add a ScriptCreatedParser flag plumbed through HTMLParser's constructor and create_for_scripting(). Only document.open()'s parser sets it to Yes. Document::close() step 3 now checks is_script_created() so it correctly skips parsers that weren't created via document.open(), matching the spec. Previously the check was just `if (!m_parser)`, which incorrectly let document.close() insert an EOF into a network-driven parser. The bug was mostly latent because the network parser used to finish quickly, but it matters once the network parser stays alive for the duration of a streamed parse.	2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik	c8368882b8	LibWeb: Allow tokenizer to run out of characters mid-tokenization Add can_run_out_of_characters() and use it in the NamedCharacterReference state and consume_next_if_match() so that an open input stream gets the same code-point-at-a-time treatment as an active document.write insertion point. Without this, a network chunk that ends partway through a named character reference or a multi-character match would make the tokenizer commit to a "no match" decision before the remaining bytes arrive. No behavior change for existing callers: the new helper still returns false once the input stream is closed (which the StringView constructor sets immediately).	2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik	c439f810f2	LibWeb: Track input stream closed state in HTMLTokenizer Add an explicit "input stream closed" flag plus the streaming-input API (append_to_input_stream, close_input_stream, is_input_stream_closed) to let a future incremental driver feed bytes as they arrive. Rewrite should_pause_before_next_input_character so the tokenizer pauses when the buffer is exhausted but more bytes may still arrive, including the case where a chunk ends in CR (CRLF normalization needs one code point of lookahead). Existing call sites are unaffected: the StringView constructor immediately marks the input stream closed, and insert_eof() now also closes the stream so document.close() drives the same exit path.	2026-04-29 04:12:44 +02:00
Aliaksandr Kalenik	b6ffd51d1c	LibWeb: Pause tokenizer at a CR right before the insertion point HTML newline normalization collapses CRLF into a single LF, so next_code_point() needs one code point of lookahead at a CR to decide whether the CR stands alone or is the first half of a CRLF pair. When the tokenizer is paused at the insertion point and the next code point to consume is a CR sitting one position before it, that lookahead has not been written yet. Previously the tokenizer consumed the CR and emitted it as LF, so a subsequent document.write() that began with LF surfaced as a second LF instead of being absorbed into the original CRLF pair. Stop one code point earlier in this case and wait for the next write to arrive. This makes four html5lib write_single WPT tests pass.	2026-04-27 21:44:56 +02:00
Aliaksandr Kalenik	c44c36416e	LibWeb: Preserve old insertion points across reentrant scripts The HTML parser's script end tag algorithms save the current insertion point in an "old insertion point" local before executing a script, then restore that local after script execution. Ladybird modeled that local as a single tokenizer field, so nested script execution via document.write() could overwrite the outer script's saved value. Keep a stack of old insertion points instead, and adjust saved offsets when document.write() inserts new input before them. This keeps the normal script and SVG script paths aligned with the spec text while leaving the parser-blocking script resume path to set the insertion point to undefined again.	2026-04-27 18:02:19 +02:00
Aliaksandr Kalenik	53fa1b19f1	LibWeb: Make external SVG script fetches async Replace the spin_until in SVGScriptElement::process_the_script_element with an async fetch that mirrors HTMLScriptElement's mark_as_ready pattern. External SVG scripts now fetch and execute asynchronously, matching Chromium's behavior. For HTML-embedded SVG scripts, the parser pauses via the existing schedule_resume_check infrastructure, extended to support SVG scripts through a new pending_parsing_blocking_svg_script slot on Document. For top-level XML/SVG documents, scripts execute when their fetch completes; the load event is delayed via DocumentLoadEventDelayer which the existing XMLDocumentBuilder::document_end already waits on.	2026-04-27 03:04:07 +02:00
Aliaksandr Kalenik	70ac025eff	LibWeb: Implement the speculative HTML parser When the HTML parser blocks on a synchronous external script, run a separate tokenizer over the unparsed input and issue speculative fetches for the resources it finds (script src, link rel=stylesheet|preload, img src), with <base href> tracking and template/foreign-content skipping. Also fills in the previously-stubbed "consume a preloaded resource" algorithm and the document's "map of preloaded resources", so that <link rel="preload"> followed by a matching consumer deduplicates to a single fetch.	2026-04-26 18:48:29 +02:00
Aliaksandr Kalenik	b1ccab81ad	LibWeb: Replace spin_until in HTMLParser::handle_text with async resume Spinning a nested event loop to wait for a parser-blocking script blocks the calling thread, can deadlock, and creates reentrancy hazards. Switch to an event-driven pause/resume model, mirroring the prior HTMLParserEndState refactor (`df96b69e7a`). Three WPT document.write tests flip from Fail to Pass and are rebaselined: all write an external script via document.write() followed by inline content. With spin_until, control did not return to the caller of document.write() between writing the script and observing its effects so the test's order assertions saw a different sequence than the spec mandates.	2026-04-26 10:44:45 +02:00
Pavel Shliak	0e98fdccd5	LibWeb/HTML: Fix ruby parse error check for rp/rt	2026-04-22 15:30:41 +01:00
Shannon Booth	8642801889	LibWeb: Set fragment scripting mode from the context document This corresponds with the editorial change to the HTML standard introducing the parsing mode enum of: `01c45cede` And a follow up normative change of: `508706c80` Making fragment parsing derive its scripting mode from the context document.	2026-04-14 23:01:36 +02:00
Andreas Kling	88d4d1b1a6	LibWeb: Use VM helpers for execution context access Inline JS-to-JS frames no longer live in the raw execution context vector, so LibWeb callers that need to inspect or pop contexts now go through VM helpers instead of peeking into that storage directly. This keeps the execution context bookkeeping encapsulated while preserving existing microtask and realm-entry checks.	2026-04-13 18:29:43 +02:00
Aliaksandr Kalenik	df96b69e7a	LibWeb: Replace spin_until in HTMLParser::the_end() with state machine HTMLParser::the_end() had three spin_until calls that blocked the event loop: step 5 (deferred scripts), step 7 (ASAP scripts), and step 8 (load event delay). This replaces them with an HTMLParserEndState state machine that progresses asynchronously via callbacks. The state machine has three phases matching the three spin_until calls: - WaitingForDeferredScripts: loops executing ready deferred scripts - WaitingForASAPScripts: waits for ASAP script lists to empty - WaitingForLoadEventDelay: waits for nothing to delay the load event Notification triggers re-evaluate the state machine when conditions change: HTMLScriptElement::mark_as_ready, stylesheet unblocking in StyleElementBase/HTMLLinkElement, did_stop_being_active_document, and DocumentLoadEventDelayer decrements. NavigableContainer state changes (session history readiness, content navigable cleared, lazy load flag) also trigger re-evaluation of the load event delay check. Key design decisions and why: 1. Microtask checkpoint in schedule_progress_check(): The old spin_until called perform_a_microtask_checkpoint() before checking conditions. This is critical because HTMLImageElement::update_the_image_data step 8 queues a microtask that creates the DocumentLoadEventDelayer. Without the checkpoint, check_progress() would see zero delayers and complete before images start delaying the load event. 2. deferred_invoke in schedule_progress_check(): I tried Core::Timer (0ms), queue_global_task, and synchronous calls. Timers caused non-deterministic ordering with the HTML event loop's task processing timer, leading to image layout tests failing (wrong subtest pass/fail patterns). Synchronous calls fired too early during image load processing before dimensions were set, causing 0-height images in layout tests. queue_global_task had task ordering issues with the session history traversal queue. deferred_invoke runs after the current callback returns but within the same event loop pump, giving the right balance. 3. Navigation load event guard (m_navigation_load_event_guard): During cross-document navigation, finalize_a_cross_document_navigation step 2 calls set_delaying_load_events(false) before the session history traversal activates the new document. This creates a transient state where the parent's load event delay check sees the about:blank (which has ready_for_post_load_tasks=true) as the active document and completes prematurely.	2026-03-28 23:14:55 +01:00
Sam Atkins	ed6a5f25a0	LibWeb: Implement scoped custom element registries	2026-03-27 19:49:55 +00:00
Luke Wilde	0381c40cb4	LibWeb: Reset non-FACEs and don't associate them to a form during parse (FACE stands for form-associated custom element)	2026-03-25 13:18:15 +00:00
Jelle Raaijmakers	428a47cb7c	LibWeb: Reduce size of `Optional<HTMLToken>`	2026-03-20 12:03:36 +01:00
Tim Ledbetter	36f59a406e	LibWeb: Put HTML parser debug message behind a flag	2026-03-10 11:14:04 +01:00
Andreas Kling	e87f889e31	Everywhere: Abandon Swift adoption After making no progress on this for a very long time, let's acknowledge it's not going anywhere and remove it from the codebase.	2026-02-17 10:48:09 -05:00
Aliaksandr Kalenik	30e4779acb	AK+LibWeb: Reduce recompilation impact of DOM/Node.h Remove includes from Node.h that are only needed for forward declarations (AccessibilityTreeNode.h, XMLSerializer.h, JsonObjectSerializer.h). Extract StyleInvalidationReason and FragmentSerializationMode enums into standalone lightweight headers so downstream headers (CSSStyleSheet.h, CSSStyleProperties.h, HTMLParser.h) can include just the enum they need instead of all of Node.h. Replace Node.h with forward declarations in headers that only use Node by pointer/reference. This breaks the circular dependency between Node.h and AccessibilityTreeNode.h, reducing AccessibilityTreeNode.h's recompilation footprint from ~1399 to ~25 files.	2026-02-11 20:02:28 +01:00
Andreas Kling	9b987baf0e	LibWeb: Bail out of the_end() spin_untils for inactive documents When a document is navigated away from while HTMLParser::the_end() is spinning the event loop (steps 7 and 8), the spin_until stays on the call stack indefinitely, causing all subsequent event processing on the same event loop to happen within nested spin_until pumping. Add is_fully_active() checks to bail out early in this case.	2026-02-10 21:19:35 +01:00
Jelle Raaijmakers	ae20ecf857	AK+Everywhere: Add Vector::contains(predicate) and use it No functional changes.	2026-01-08 15:27:30 +00:00
Andreas Kling	a9cc425cde	LibJS+LibWeb: Add missing GC marking visits This adds visit_edges(Cell::Visitor&) methods to various helper structs that contain GC pointers, and makes sure they are called from owning GC-heap-allocated objects as needed. These were found by our Clang plugin after expanding its capabilities. The added rules will be enforced by CI going forward.	2026-01-07 12:48:58 +01:00
Feng Yu	b58fcaeecf	LibWeb: Add HTMLSelectedContentElement for customizable select Introduce the HTMLSelectedContentElement and integrate it into <select>, <option> and HTMLParser. See whatwg/html#10548. There are two bugs with WPT tests which cause the third subtest in selectedcontent.html and selectedcontent-mutations.html to fail. See whatwg/html#11882, web-platform-tests/wpt#55849.	2025-12-12 12:06:24 +00:00


179 commits
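
Commit `5b63cb5f37` above replaces a range-checked transmute at the FFI boundary with an explicit conversion from the incoming numeric value. The idea can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the `State` enum, its five variants, and the `from_ffi` and `set_state_from_ffi` helpers are hypothetical names chosen for the example.

```rust
/// Hypothetical subset of tokenizer states; the real enum is much larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum State {
    Data = 0,
    Rcdata = 1,
    Rawtext = 2,
    ScriptData = 3,
    Plaintext = 4,
}

impl State {
    /// Explicit conversion: every accepted value is listed by hand, so the
    /// result never depends on the enum staying contiguous, and an invalid
    /// discriminant can never be produced.
    pub fn from_ffi(raw: u8) -> Option<State> {
        match raw {
            0 => Some(State::Data),
            1 => Some(State::Rcdata),
            2 => Some(State::Rawtext),
            3 => Some(State::ScriptData),
            4 => Some(State::Plaintext),
            _ => None,
        }
    }
}

/// Hypothetical FFI-facing setter: unknown values are ignored by returning
/// early, which keeps the boundary tolerant of unexpected input.
pub fn set_state_from_ffi(current: &mut State, raw: u8) {
    let Some(state) = State::from_ffi(raw) else {
        return;
    };
    *current = state;
}
```

Compared with a range check followed by `transmute`, the `match` is longer, but adding or reordering variants cannot silently turn an out-of-range integer into an invalid enum value.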