Preserve non-UTF-8 filenames when appending to a ZipFile.
(cherry picked from commit 24c6bbc92b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
It no longer emits text for comments and processing instructions.
(cherry picked from commit 7de4fcd445)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-150175: Fix ThreadingMock call_count race condition (GH-150176)
ThreadingMock._increment_mock_call() was not thread-safe.
Multiple threads calling the mock simultaneously could lose
increments due to race conditions on call_count and other
attributes.
Fix by overriding _increment_mock_call in ThreadingMixin
and wrapping it with the existing _mock_calls_events_lock.
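
The general pattern, serializing a read-modify-write on a counter behind a
lock, can be sketched as follows. This is a minimal illustration, not the
unittest.mock implementation; `CallCounter`, `record_call` and `hammer` are
hypothetical names.

```python
import threading


class CallCounter:
    """Hypothetical stand-in for a mock's call bookkeeping."""

    def __init__(self):
        self.call_count = 0
        self._lock = threading.Lock()

    def record_call(self):
        # "+= 1" is a read-modify-write; the lock keeps concurrent
        # callers from overwriting each other's update.
        with self._lock:
            self.call_count += 1


def hammer(counter, threads=8, calls=10_000):
    """Call counter.record_call() from several threads at once."""
    def worker():
        for _ in range(calls):
            counter.record_call()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.call_count
```

With the lock in place, the final count equals threads * calls regardless of
how the threads interleave.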
(cherry picked from commit 388e023fe1)
Co-authored-by: saisneha196 <156835592+saisneha196@users.noreply.github.com>
This has not been observed in practice, but we cannot be 100% sure that
it will not happen with some weird gzip data.
(cherry picked from commit 28eac9a726)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Some of these docstrings read as if they were written when typing.py was
first written, and things have evolved since then.
A few motivations:
- Call protocols "protocols" instead of ABCs. They are also ABCs, but the fact
that they are protocols is more relevant to typing.
- Avoid recommending direct use of .__annotations__ and steer users to
annotationlib instead.
- For TypedDict, mention NotRequired before total=False since it is more
general and probably more frequently useful.
- For overloads, mention runtime use first instead of stub use. I think early on
there was talk of allowing overload only in stubs, but it is now heavily used at
runtime too and that's more likely to be relevant to users.
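
As an illustration of the last point, here is a minimal sketch of overload
used in ordinary runtime code rather than in a stub. The function `double` is
a made-up example, not something taken from typing.py.

```python
from typing import overload


@overload
def double(x: int) -> int: ...
@overload
def double(x: str) -> str: ...


def double(x):
    # The undecorated definition is the one that runs; the @overload
    # variants above only describe the accepted signatures to type checkers.
    return x * 2
```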
(cherry picked from commit f159419ae2)
gh-142831: Fix use-after-free in json encoder during re-entrant mutation (gh-142851)
User callbacks invoked during JSON encoding (e.g. the `default` callback or
a custom string encoder) can mutate or clear the dict or sequence being
encoded, invalidating borrowed references to items, keys, and values. Hold
strong references unconditionally while iterating.
(cherry picked from commit 235fa7244a)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
RFC 2047 Section 6.2 requires that "any 'linear-white-space' that
separates a pair of adjacent 'encoded-word's is ignored." The modern
header value parser correctly implements that for unstructured headers,
but had missed a case in structured headers. This could cause a parsed
address header to include extraneous spaces in a display-name.
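
The RFC rule itself can be illustrated with a small standalone sketch. This is
not the header value parser's code: the regular expression is a rough
approximation of an encoded-word, and `drop_ws_between_encoded_words` is a
hypothetical helper name.

```python
import re

# Rough pattern for an RFC 2047 encoded-word: =?charset?encoding?text?=
_EW = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?="

# Whitespace that follows one encoded-word and is immediately followed by
# another one. The lookahead leaves the second word available for the
# next match, so runs of three or more words are handled too.
_WS_BETWEEN_EWS = re.compile(rf"({_EW})[ \t\r\n]+(?={_EW})")


def drop_ws_between_encoded_words(value):
    """Remove whitespace that separates adjacent encoded-words only."""
    return _WS_BETWEEN_EWS.sub(r"\1", value)
```

Whitespace between an encoded-word and ordinary text is left alone, which is
the distinction the RFC draws.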
Switch to @bitdancer's fix from review feedback. Recharacterize space
between ews as fws after parsing in get_phrase.
RDM: This fix is dependent on the fact that "subsequent" atoms will never have
leading whitespace because that's been consumed already. I don't think
it's worth adding extra code for the possibility of leading whitespace
because the parser won't produce it. It's a bit of parser fragility in the
face of code changes, but I think that's a minor concern given the
parser design (which is that it consumes whitespace greedily).
(cherry picked from commit 7a4c6dfb88)
Co-authored-by: Mike Edmunds <medmunds@gmail.com>
Co-authored-by: R David Murray <rdmurray@bitdance.com>
gh-87451: Apply CVE-2021-4189 PASV fix to ftplib.ftpcp() (GH-149648)
ftpcp() called parse227() directly and passed the source server's
self-reported PASV IPv4 address to the target server's PORT command,
bypassing the CVE-2021-4189 fix that was applied only to FTP.makepasv().
A malicious source FTP server could use this to redirect the target
server's data connection to an arbitrary host:port (SSRF).
ftpcp() now uses the source server's actual peer address, honoring the
existing trust_server_pasv_ipv4_address opt-out, the same as makepasv().
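
A minimal sketch of the idea, assuming the peer address is already known (in
ftplib it would come from the control socket, e.g. `sock.getpeername()[0]`).
`pasv_endpoint` is a hypothetical helper, not the function added by this
change; only `ftplib.parse227()` is a real API here.

```python
import ftplib


def pasv_endpoint(resp, peer_host, trust_server_pasv_ipv4_address=False):
    """Pick the host/port to use for a data connection.

    resp is the server's 227 reply; peer_host is the address the control
    connection is actually talking to.
    """
    untrusted_host, port = ftplib.parse227(resp)
    # Only use the address the server reported about itself when the
    # caller has explicitly opted in; otherwise use the real peer address.
    host = untrusted_host if trust_server_pasv_ipv4_address else peer_host
    return host, port
```

The port still comes from the reply; only the host is replaced.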
Thanks to Qi Ding at Aurascape AI for the report. (GHSA-w8c5-q2xf-gf7c)
(cherry picked from commit eac4fe3b2c)
Co-authored-by: Gregory P. Smith <68491+gpshead@users.noreply.github.com>
gh-149776: Skip UDP Lite tests if it's not supported (GH-149777)
Fix test_socket on Linux kernel 7.1 and newer: skip UDP Lite tests if
it's not supported.
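
One way to express such a skip, as a sketch only: probe for support by trying
to create a socket. `udplite_supported` and `requires_udplite` are
hypothetical names and are not necessarily what test_socket uses.

```python
import socket
import unittest


def udplite_supported():
    """Return True if a UDP Lite socket can actually be created."""
    proto = getattr(socket, "IPPROTO_UDPLITE", None)
    if proto is None:
        # The constant is missing on platforms without UDP Lite.
        return False
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, proto):
            pass
    except OSError:
        # The constant exists but the kernel refuses the protocol.
        return False
    return True


# Decorator for test classes or methods that need UDP Lite.
requires_udplite = unittest.skipUnless(
    udplite_supported(), "UDP Lite is not supported")
```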
(cherry picked from commit 3cfc249e11)
Co-authored-by: Victor Stinner <vstinner@python.org>
Exclude encodings like 'utf-8-sig', 'iso2022-jp' and 'hz' from the list of
supported encodings.
(cherry picked from commit fa2afa64d9)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-112821: Fix rlcompleter failures on objects with descriptors (GH-149577)
* Confirm no accesses
(cherry picked from commit f23a1837d7)
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
* Add missing import
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Co-authored-by: Michael Droettboom <mdroettboom@nvidia.com>
* gh-149486: tarfile.data_filter: validate written link target (GH-149487)
The data filter rewrote linknames with normpath() but ran the
containment check against the un-normalised value, and computed a
symlink's directory before stripping trailing slashes. Both let a
crafted archive create links pointing outside the destination. Also
reject link members that resolve to the destination directory itself,
which could otherwise replace it with a symlink and redirect all
subsequent members.
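
The ordering of the checks can be sketched lexically as follows. This is an
illustration only: the real filter also resolves paths against the filesystem,
and `symlink_target_is_safe` is a hypothetical helper that assumes POSIX paths
and an absolute destination.

```python
import posixpath


def symlink_target_is_safe(dest, member_name, linkname):
    """Lexical containment check for a symlink member (sketch)."""
    dest = posixpath.normpath(dest)
    # Strip trailing slashes before computing the link's directory, so
    # "a/link/" and "a/link" resolve relative to the same place.
    member = posixpath.normpath(
        posixpath.join(dest, member_name.rstrip("/")))
    link_dir = posixpath.dirname(member)
    # Check the same normalised value that would be written to disk.
    written = posixpath.normpath(linkname)
    if posixpath.isabs(written):
        # Policy of this sketch: absolute targets are rejected outright.
        return False
    target = posixpath.normpath(posixpath.join(link_dir, written))
    if target == dest:
        # A link resolving to the destination itself is rejected too.
        return False
    return posixpath.commonpath([dest, target]) == dest
```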
(Patch by Greg; Petr's just reviewing & merging.)
(cherry picked from commit 578411982c)
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Also, use urllib.request.urlcleanup() in NetworkTestCase.
(cherry picked from commit 57ef219950)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-149096: Remove 'im_*' attribute reference from inspect module docstring (GH-149108)
The im_class/func/self names were removed in 3.0. The prefix appears nowhere else in inspect.py
and nowhere in inspect.rst.
(cherry picked from commit e4444538dc)
Co-authored-by: Vineet Kumar <108144301+whyvineet@users.noreply.github.com>
[3.13] GH-130750: Restore quoting of choices in argparse error messages to match documentation and improve clarity (GH-144983)
(cherry picked from commit 53a7f76501)
* empty lines are always ignored instead of separating groups
* the "user-agent" line after a rule starts a new group
* groups matching the same user agent are now merged
* the rule with the longest match wins instead of the first matching rule
* in case of equal matches, the “Allow” rule wins over “Disallow”
* special characters “$” and “*” are now supported in rules
* prefer full match for user agent
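
The rule-selection part of the list above can be sketched as follows. This is
not the robotparser code: `rule_to_regex` and `is_allowed` are hypothetical
helpers, rules are plain (allow, pattern) tuples, and "longest match" is
approximated by pattern length.

```python
import re


def rule_to_regex(pattern):
    """Translate a rule path with "*" and a trailing "$" into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules: iterable of (allow, pattern) pairs for one user agent."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for allow, pattern in rules:
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            # The longest pattern wins; on a tie, Allow beats Disallow.
            if length > best_len or (length == best_len and allow):
                best_len, allowed = length, allow
    return allowed
```

For example, with a Disallow rule for "/private" and an Allow rule for
"/private/public", a path under "/private/public" is allowed because the
Allow pattern is the longer match.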
(cherry picked from commit bc285e5832)
Instead of reading past the end of the empty buffer.
(cherry picked from commit 0c6d2f64c0)
Co-authored-by: Maurycy Pawłowski-Wieroński <maurycy@maurycy.com>
Previously, identical PickleBuffers did not preserve identity.
Also, empty writable PickleBuffer memoized an empty bytearray object
in place of b'' which is a singleton in CPython, so the following
references to b'' were unpickled as an empty bytearray object.
(cherry picked from commit b89735625d)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-149117: Set `ImportError.name` on errors from `runpy.run_module`/`run_path` (gh-149159)
`runpy.run_module()` and `runpy.run_path()` now set the `name` attribute
of the `ImportError` they raise to the requested module name, matching
the behaviour of a regular import statement (previously `name` was
always `None`, which broke introspection).
The `name=` kwarg is gated on `issubclass(error, ImportError)` because
`_get_module_details()` is also used by `_run_module_as_main()` with
a private `_Error` sentinel class. `_Error` does not subclass
ImportError, and `BaseException.__init__` rejects unknown kwargs at
the C level, so passing `name=` unconditionally would break the
`python -m foo` codepath.
(cherry picked from commit ff35fe4633)
Co-authored-by: W. H. Wang <mattwang44@gmail.com>
As part of fixing bpo-27931, code was introduced to get_bare_quoted_string
that added an empty Terminal if the quoted string was empty. This isn't
the best answer in terms of the parse tree; we really want the token
list to be empty in that case. But having it be empty resulted in
local_part raising the index error. We find that same problem if we
try to parse an address consisting of a single dquote. By fixing
local_part to not raise on an empty token list, we can have the
bare_quoted_string code correctly return an empty token list for
the empty string cases (two dquotes or a single dquote as the
entire addr-spec, at the end of a line).
(cherry picked from commit bdbb55c403)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
When an address in an address-list has garbage at the end, the code will
currently:
1. change the mailbox in the last parsed address into invalid-mailbox by
overriding its token_type;
2. wrap the trailing garbage into another invalid-mailbox and append it
to the last parsed address.
However, that does not take into account that an address may
also contain a Group instead of a single mailbox. In that case,
overwriting token_type leads to undesirable results, e.g. parsing an
email with the following 'To' header:
    unlisted-recipients:; (no To-header on input)
raises an AttributeError from trying to treat the Group as a Mailbox.
Moreover, it is questionable whether the previously parsed mailbox should
be treated as invalid in addition to the trailing garbage.
Address both of the above by wrapping the trailing garbage in a new
Address with a single invalid-mailbox, and append it to the AddressList
directly.
This changes the results of the
test_get_address_list_mailboxes_invalid_addresses test, where the
address list is now parsed into 4 mailboxes instead of 3 (all but the
first one are invalid).
(cherry picked from commit b413bc7a1f)
Co-authored-by: elenril <anton@khirnov.net>