[3.13] Correctly fold unknown-8bit originating from encoded words. (GH-142517) (#143147)

The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that.  However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were *successfully* decoded.  The fix is
simple: do the unknown-8bit encoding using the utf-8 codec.  This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
(cherry picked from commit 1e17ccd030)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Miss Islington (bot) 2025-12-24 19:19:28 +01:00 committed by GitHub
parent 86504f26bd
commit 88025560aa
3 changed files with 13 additions and 1 deletions

@@ -0,0 +1,4 @@
The non-``compat32`` :mod:`email` policies now correctly handle refolding
encoded words that contain bytes that can not be decoded in their specified
character set. Previously this resulted in an encoding exception during
folding.
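
The failure mode described in the commit message can be reproduced with plain codecs, without the email package. The sketch below is not the code changed by this commit. It assumes that the undecodable bytes are carried as surrogate-escaped code points (the behaviour of Python's "surrogateescape" error handler), and the sample bytes and the reencode() helper are hypothetical names chosen for illustration.

```python
# Hypothetical stand-in for the payload of an encoded word: one valid
# UTF-8 character ("é") followed by a byte (0xFF) that is invalid in UTF-8.
raw = "é".encode("utf-8") + b"\xff"

# Lenient decoding keeps "é" and maps the bad byte to the lone surrogate U+DCFF.
text = raw.decode("utf-8", "surrogateescape")


def reencode(value, codec):
    """Encode value with codec, turning escaped surrogates back into raw bytes."""
    return value.encode(codec, "surrogateescape")


# The ASCII codec can restore the escaped byte, but it cannot encode "é",
# which was decoded successfully and is therefore a real non-ASCII character.
try:
    reencode(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The UTF-8 codec handles both kinds of character and reproduces the input bytes.
utf8_bytes = reencode(text, "utf-8")

print(ascii_failed, utf8_bytes == raw)  # prints: True True
```

For a pure-ASCII message with stray 8-bit bytes, both codecs round-trip, which is consistent with the commit message's remark that the original trick works fine for that case.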