LibWeb: Skip right amount of characters during encoding detection

When detecting an element's opening tag, the spec asks us to skip ahead
to the first whitespace or closing chevron (>) character before trying to
read attributes. Instead, we were always skipping 2 positions ahead and
then ignoring all whitespace characters and slashes, which was clearly wrong.

Theoretically this could have caused some weird behaviors if part of the
opening tag matched an expected attribute name, but it's very unlikely
to see that in the wild.
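
The skip step described above can be sketched as follows. This is a minimal illustration in standard C++, not the LibWeb code: the function names are hypothetical, and `std::string_view` stands in for the `ByteBuffer` used in the real prescan. It assumes the terminator set from the HTML prescan algorithm (TAB, LF, FF, CR, space, or `>`).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a LibWeb function): true for the bytes that end
// a tag name during the prescan, i.e. TAB, LF, FF, CR, space, or '>'.
static bool is_tag_name_terminator(unsigned char byte)
{
    return byte == '\t' || byte == '\n' || byte == '\f' || byte == '\r'
        || byte == ' ' || byte == '>';
}

// Hypothetical helper: starting at `position`, returns the index of the
// first terminator byte, or input.size() if the input ends first.
// Attribute parsing would begin at the returned position.
static std::size_t skip_to_end_of_tag_name(std::string_view input, std::size_t position)
{
    while (position < input.size()
        && !is_tag_name_terminator(static_cast<unsigned char>(input[position])))
        ++position;
    return position;
}
```

For the input `<meta charset=utf-8>`, starting at the `<`, this stops at the space after `meta`, so the whole tag name is consumed no matter how long it is, rather than a fixed two bytes.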
Jelle Raaijmakers 2025-11-21 11:30:57 +01:00 committed by Jelle Raaijmakers
parent 4bcf988e46
commit f52632d48a
Notes: github-actions[bot] 2025-11-21 16:44:15 +00:00
2 changed files with 22 additions and 17 deletions

@@ -13,9 +13,6 @@
namespace Web::HTML {
bool prescan_should_abort(ByteBuffer const& input, size_t const& position);
bool prescan_is_whitespace_or_slash(u8 const& byte);
bool prescan_skip_whitespace_and_slashes(ByteBuffer const& input, size_t& position);
Optional<StringView> extract_character_encoding_from_meta_element(ByteString const&);
GC::Ptr<DOM::Attr> prescan_get_attribute(DOM::Document&, ByteBuffer const& input, size_t& position);
Optional<ByteString> run_prescan_byte_stream_algorithm(DOM::Document&, ByteBuffer const& input);