cpython/Misc/NEWS.d/next/Security/2025-06-25-14-13-39.gh-issue-135661.idjQ0B.rst
Miss Islington (bot) ad695f5328
[3.12] gh-135661: Fix parsing attributes with whitespaces around the "=" separator in HTMLParser (GH-136908) (GH-136919)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee6501894)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2025-07-22 11:56:39 +02:00

20 lines
899 B
ReStructuredText

Fix parsing start and end tags in :class:`html.parser.HTMLParser`
according to the HTML5 standard.
* Whitespaces no longer accepted between ``</`` and the tag name.
E.g. ``</ script>`` does not end the script section.
* Vertical tabulation (``\v``) and non-ASCII whitespaces no longer recognized
as whitespaces. The only whitespaces are ``\t\n\r\f`` and space.
* Null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
instead of terminating after the first ``>`` in quoted attribute value.
E.g. ``</script/foo=">"/>``.
* Multiple slashes and whitespaces between the last attribute and closing ``>``
are now ignored in both start and end tags. E.g. ``<a foo=bar/ //>``.
* Multiple ``=`` between attribute name and value are no longer collapsed.
E.g. ``<a foo==bar>`` produces attribute "foo" with value "=bar".