LibWeb: Skip right amount of characters during encoding detection

When detecting an element's opening tag, the spec asks us to skip ahead to the first whitespace or closing chevron ('>') character before trying to read attributes. Instead, we were always skipping 2 positions ahead and then ignoring all whitespace characters and slashes, which was clearly wrong. In theory this could have caused incorrect results if part of the tag name matched an expected attribute name, but that is very unlikely to occur in the wild.
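The difference between the two skipping strategies can be sketched in standalone C++. This is only an illustration under stated assumptions, not the LibWeb implementation: the helper names are hypothetical, `std::string_view` stands in for `ByteBuffer`, and the "old" function is a reading of the behavior described above rather than the removed code itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not a LibWeb function): the whitespace bytes the
// prescan recognizes, i.e. HT, LF, FF, CR and SP.
static bool is_prescan_whitespace(std::uint8_t byte)
{
    return byte == 0x09 || byte == 0x0A || byte == 0x0C || byte == 0x0D || byte == 0x20;
}

// New behavior as described: advance `position` until it points at a
// whitespace byte or '>', or until the end of the input is reached.
static void skip_to_whitespace_or_chevron(std::string_view input, std::size_t& position)
{
    while (position < input.size()) {
        auto byte = static_cast<std::uint8_t>(input[position]);
        if (is_prescan_whitespace(byte) || byte == '>')
            return;
        ++position;
    }
}

// Old behavior as described (an assumption, reconstructed from the commit
// message): always move 2 positions ahead, then skip whitespace and slashes.
static void skip_two_then_whitespace_and_slashes(std::string_view input, std::size_t& position)
{
    position = std::min(position + 2, input.size());
    while (position < input.size()) {
        auto byte = static_cast<std::uint8_t>(input[position]);
        if (!is_prescan_whitespace(byte) && byte != '/')
            return;
        ++position;
    }
}
```

For the input `<meta charset=utf-8>` with the position starting at `<`, the first function stops at the space after `meta`, while the second stops inside the tag name at `e`, so attribute parsing would begin in the middle of `meta`.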
Author: https://github.com/gmta
Commit: f52632d48a
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/6893
2025-12-08 06:09:58 +00:00 · 2025-11-21 11:30:57 +01:00 · 2025-11-21 11:30:57 +01:00 · f52632d48a · 2025-11-21 16:44:15 +00:00
commit f52632d48a
parent 4bcf988e46
2 changed files with 22 additions and 17 deletions
--- a/Libraries/LibWeb/HTML/Parser/HTMLEncodingDetection.h
+++ b/Libraries/LibWeb/HTML/Parser/HTMLEncodingDetection.h
@@ -13,9 +13,6 @@

 namespace Web::HTML {

-bool prescan_should_abort(ByteBuffer const& input, size_t const& position);
-bool prescan_is_whitespace_or_slash(u8 const& byte);
-bool prescan_skip_whitespace_and_slashes(ByteBuffer const& input, size_t& position);
 Optional<StringView> extract_character_encoding_from_meta_element(ByteString const&);
 GC::Ptr<DOM::Attr> prescan_get_attribute(DOM::Document&, ByteBuffer const& input, size_t& position);
 Optional<ByteString> run_prescan_byte_stream_algorithm(DOM::Document&, ByteBuffer const& input);