bpo-43323: Fix UnicodeEncodeError in the email module (GH-32137)

It was raised if the charset itself contains characters not encodable
in UTF-8 (in particular \udcxx characters representing non-decodable
bytes in the source).
(cherry picked from commit e91dee87ed)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
This commit is contained in:
Miss Islington (bot) 2022-04-30 05:31:28 -07:00 committed by GitHub
parent 7149b21c2e
commit 19a079690c
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
6 changed files with 36 additions and 6 deletions

View file

@ -5323,6 +5323,15 @@ def test_rfc2231_unknown_encoding(self):
Content-Transfer-Encoding: 8bit
Content-Disposition: inline; filename*=X-UNKNOWN''myfile.txt
"""
msg = email.message_from_string(m)
self.assertEqual(msg.get_filename(), 'myfile.txt')
def test_rfc2231_bad_character_in_encoding(self):
m = """\
Content-Transfer-Encoding: 8bit
Content-Disposition: inline; filename*=utf-8\udce2\udc80\udc9d''myfile.txt
"""
msg = email.message_from_string(m)
self.assertEqual(msg.get_filename(), 'myfile.txt')