If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623)
(cherry picked from commit 6279eb8c07)
(cherry picked from commit a75953b347)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-113602: Bail out when the parser tries to override existing errors (GH-113607)
(cherry picked from commit 9ed36d533a)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
(cherry picked from commit 48c49739f5)
(cherry picked from commit d58a5f453f)
Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
gh-112388: Fix an error that was causing the parser to try to overwrite tokenizer errors (GH-112410)
(cherry picked from commit 2c8b191742)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-111380: Show SyntaxWarnings only once when parsing if invalid syntax is encouintered (GH-111381)
(cherry picked from commit 3d2f1f0b83)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
There are some warnings if build python via clang:
Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
^
void
Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
^
void
Fix it to make clang happy.
(cherry picked from commit 7703def37e)
Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
Co-authored-by: Chenxi Mao <chenxi.mao@suse.com>
This makes tokenizer.c:valid_utf8 match stringlib/codecs.h:decode_utf8.
It also fixes an off-by-one error introduced in 3.10 for the line number when the tokenizer reports bad UTF8.
(cherry picked from commit 8bc356a7dd)
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
This backports https://github.com/python/cpython/pull/96499 aka 511ca94520
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
* gh-94949: Disallow parsing parenthesised ctx manager with old feature_version
* 📜🤖 Added by blurb_it.
* Allow it with feature_version=(3, 9) as well
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
(cherry picked from commit 0daba82221)
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
(cherry picked from commit ee70c70aa9)
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
Currently, calling Py_EnterRecursiveCall() and
Py_LeaveRecursiveCall() may use a function call or a static inline
function call, depending if the internal pycore_ceval.h header file
is included or not. Use a different name for the static inline
function to ensure that the static inline function is always used in
Python internals for best performance. Similar approach than
PyThreadState_GET() (function call) and _PyThreadState_GET() (static
inline function).
* Rename _Py_EnterRecursiveCall() to _Py_EnterRecursiveCallTstate()
* Rename _Py_LeaveRecursiveCall() to _Py_LeaveRecursiveCallTstate()
* pycore_ceval.h: Rename Py_EnterRecursiveCall() to
_Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() and
_Py_LeaveRecursiveCall()
The warning emitted by the Python parser for a numeric literal
immediately followed by keyword has been changed from deprecation
warning to syntax warning.