mirror of https://github.com/python/cpython.git (synced 2025-12-31 12:33:28 +00:00)
[3.14] Correctly fold unknown-8bit originating from encoded words. (GH-142517) (#143146)
The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that. However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were *successfully* decoded. The fix is
simple: do the unknown-8bit encoding using the utf-8 codec. This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
(cherry picked from commit 1e17ccd030)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
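The failure mode described in the message above can be reproduced without the email package. The following is a minimal sketch, not the library's code: the byte string and the `reencode` helper are hypothetical and chosen only for illustration. It shows that once an encoded word has been decoded with `surrogateescape`, the resulting string can hold both real non-ASCII characters and escaped bytes, so re-encoding it as ASCII fails while re-encoding it as utf-8 restores the original bytes.

```python
# Hypothetical payload: a valid utf-8 character followed by a lone lead byte,
# as if a multi-byte sequence had been cut off mid-character.
raw = bytes.fromhex("e5aea2e7")

# surrogateescape keeps the undecodable byte 0xE7 as the lone surrogate U+DCE7,
# while the first three bytes decode successfully to one non-ASCII character.
text = raw.decode("utf-8", "surrogateescape")


def reencode(s, codec):
    """Illustrative helper: return the encoded bytes, or the exception raised."""
    try:
        return s.encode(codec, "surrogateescape")
    except UnicodeEncodeError as exc:
        return exc


# The ascii codec can undo the surrogate, but not the successfully decoded
# non-ASCII character, so this yields a UnicodeEncodeError.
ascii_result = reencode(text, "ascii")

# The utf-8 codec handles both, so the original bytes round-trip.
utf8_result = reencode(text, "utf-8")

print(type(ascii_result).__name__)
print(utf8_result == raw)
```

The one-line change in the first hunk below corresponds to switching from the first `reencode` call to the second.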
parent b921374c78
commit 6809811036
3 changed files with 13 additions and 1 deletion
@@ -219,7 +219,7 @@ def encode(string, charset='utf-8', encoding=None, lang=''):
 
     """
     if charset == 'unknown-8bit':
-        bstring = string.encode('ascii', 'surrogateescape')
+        bstring = string.encode('utf-8', 'surrogateescape')
     else:
         bstring = string.encode(charset)
     if encoding is None:

@@ -3340,5 +3340,13 @@ def test_fold_unfoldable_element_stealing_whitespace(self):
         token = parser.get_address_list(text)[0]
         self._test(token, expected, policy=policy)
 
+    def test_encoded_word_with_undecodable_bytes(self):
+        self._test(parser.get_address_list(
+            ' =?utf-8?Q?=E5=AE=A2=E6=88=B6=E6=AD=A3=E8=A6=8F=E4=BA=A4=E7?='
+            )[0],
+            ' =?unknown-8bit?b?5a6i5oi25q2j6KaP5Lqk5w==?=\n',
+        )
+
+
 if __name__ == '__main__':
     unittest.main()

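The expected value in the new test can be checked by hand. The sketch below uses only `base64` and string methods rather than the email parser, so it illustrates the byte-level relationship between the two encoded words and is not how the folder itself is implemented. The variable names are illustrative. The Q-encoded payload in the test consists entirely of `=XX` escapes, which is why stripping the `=` signs leaves a plain hex string.

```python
import base64

# Q-encoded payload from the test input: five complete utf-8 characters
# followed by a lone lead byte 0xE7 that cannot be decoded on its own.
q_payload = "=E5=AE=A2=E6=88=B6=E6=AD=A3=E8=A6=8F=E4=BA=A4=E7"
raw = bytes.fromhex(q_payload.replace("=", ""))

# Decode as the declared charset, escaping the byte that cannot be decoded.
text = raw.decode("utf-8", "surrogateescape")

# Re-encode with utf-8 (as in the patched encode()) and wrap the result
# as a base64 encoded word labelled unknown-8bit.
bstring = text.encode("utf-8", "surrogateescape")
b_payload = base64.b64encode(bstring).decode("ascii")
refolded = f"=?unknown-8bit?b?{b_payload}?="

print(refolded)
```

The base64 payload produced here matches the one in the test's expected string, which confirms that the refolded word carries the original sixteen bytes unchanged.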
@@ -0,0 +1,4 @@
+The non-``compat32`` :mod:`email` policies now correctly handle refolding
+encoded words that contain bytes that can not be decoded in their specified
+character set. Previously this resulted in an encoding exception during
+folding.