gh-86155: Fix data loss after unclosed script or style tag in HTMLParser (GH-22658)

When calling .close() the HTMLParser should flush all remaining content,
even when that content is in an unclosed script or style tag.
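
The behavior described above can be sketched with a small event-collecting subclass. This is an illustration only: the `EventCollector` class and the `content` string are hypothetical names, not part of the patch. On an interpreter that includes this fix, the script body should arrive as data once `.close()` is called; on one without it, the data event may be missing.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: record parser callbacks as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


# Assumed sample input: a script element that is never closed.
content = "let x = 1;"
parser = EventCollector()
parser.feed(f"<script>{content}")
parser.close()  # end of input: any buffered text should be flushed here

events = parser.events
# Join all data events, since a parser may deliver text in several chunks.
flushed = "".join(event[1] for event in events if event[0] == "data")
print(events)
print(repr(flushed))
```

Comparing `flushed` with `content` is a quick way to check whether a given Python build loses the trailing script text.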
Waylan Limberg 2025-05-10 13:36:06 -04:00 committed by GitHub
parent 7dddb4e667
commit 53383e90e4
3 changed files with 13 additions and 1 deletions


@@ -317,6 +317,16 @@ def get_events(self):
                                ("endtag", element_lower)],
                            collector=Collector(convert_charrefs=False))

    def test_EOF_in_cdata(self):
        content = """<!-- not a comment --> &not-an-entity-ref;
<a href="" /> </p><p> <span></span></style>
'</script' + '>'"""
        s = f'<script>{content}'
        self._run_check(s, [
                            ("starttag", 'script', []),
                            ("data", content)
                        ])

    def test_comments(self):
        html = ("<!-- I'm a valid comment -->"
                '<!--me too!-->'