This makes tokenizer.c:valid_utf8 match stringlib/codecs.h:decode_utf8.
It also fixes an off-by-one error introduced in 3.10 for the line number when the tokenizer reports bad UTF8.
(cherry picked from commit 8bc356a7dd)
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
There were two specific areas not covered:
- %(name) syntax
- %*s syntax
Automerge-Triggered-By: GH:iritkatriel
(cherry picked from commit dde15f5879)
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
This doesn't happen naturally, but is allowed by the ASDL and compiler.
We don't want to change ASDL for backward compatibility reasons
(GH-57645, GH-92987)
(cherry picked from commit 200c9a8da0)
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)
The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.
The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```
In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$
From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196838)
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
This backports https://github.com/python/cpython/pull/96499 aka 511ca94520
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
The builtin `property` is not a callable, so was failing the check in
`_test_simple_enum` causing a match failure; this adds `property` to the
bypass list.
Co-authored-by: Alexandru Mărășteanu <alexei@users.noreply.github.com>
find_unused_port() has an inherent race condition, but we can't use
bind_port() as that uses .getsockname() which this test is exercising.
Try binding to unused ports a few times before failing.
Signed-off-by: Ross Burton <ross.burton@arm.com>
(cherry picked from commit df11012697)
Co-authored-by: Ross Burton <ross.burton@arm.com>
Automerge-Triggered-By: GH:tiran
(cherry picked from commit 822955c166)
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Christian Heimes <christian@python.org>
When a task catches CancelledError and raises some other error,
the other error should not silently be suppressed.
Any scenario where a task crashes in cleanup upon cancellation
will now result in an ExceptionGroup wrapping the crash(es)
instead of propagating CancelledError and ignoring the side errors.
NOTE: This represents a change in behavior (hence the need to
change several tests). But it is only an edge case.
Co-authored-by: Thomas Grainger <tagrain@gmail.com>
(cherry picked from commit f51f54f39d)
Co-authored-by: Guido van Rossum <guido@python.org>
This PR fixes the error message from float(s) in the case where s contains only whitespace.
(cherry picked from commit 97e9cfa75a)
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
Under certain build conditions, test_check_c_globals fails. This fix takes the same approach as we took for gh-84236 (via gh-20095). We'll be removing use of distutils in the c-analyzer at some point. Until then we'll hide the warning filter.
(cherry picked from commit 3ff6d9affb)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
`dataclass` was called as a function when it was almost certainly intended to be a decorator.
(cherry picked from commit 59e09efe88)
Co-authored-by: da-woods <dw-git@d-woods.co.uk>
Co-authored-by: da-woods <dw-git@d-woods.co.uk>
It updates links which redirect to HTTPS with different authority or
path.
(cherry picked from commit d0d0154443)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Add test for inheriting explicit __dict__ and weakref.
* Restore 3.10 behavior for multiple inheritance of C extension classes that store their dictionary at the end of the struct.
Adds a regression test for an re slowdown observed by rjsmin.
Uses multiprocessing to kill the test after SHORT_TIMEOUT.
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Christian Heimes <christian@python.org>
(cherry picked from commit fe23c0061d)
Co-authored-by: Miro Hrončok <miro@hroncok.cz>
The API documentation for [findtext](https://docs.python.org/3/library/xml.etree.elementtree.htmlGH-xml.etree.ElementTree.Element.findtext) states that this function gives back an empty string on "no text content." With the previous implementation, this would give back a empty string even on text content values such as 0 or False. This patch attempts to resolve that by only giving back an empty string if the text attribute is set to `None`. Resolves GH-91447.
Automerge-Triggered-By: GH:gvanrossum
(cherry picked from commit a95e60db74)
Co-authored-by: Eugene Triguba <eugenetriguba@gmail.com>