gh-140594: Fix an out of bounds read when feeding NUL byte to PyOS_StdioReadline() (GH-140910)
(cherry picked from commit 86a0756234)
Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
gh-144759: Fix undefined behavior from NULL pointer arithmetic in lexer (GH-144788)
Guard against NULL pointer arithmetic in `_PyLexer_remember_fstring_buffers`
and `_PyLexer_restore_fstring_buffers`. When `start` or `multi_line_start`
is NULL (uninitialized in tok_mode_stack[0]), performing `NULL - tok->buf`
is undefined behavior. Add explicit NULL checks that store -1 as a sentinel
offset and restore NULL from it accordingly.
Add test_lexer_buffer_realloc_with_null_start to test_repl.py that
exercises the code path where the lexer buffer is reallocated while
tok_mode_stack[0] has NULL start/multi_line_start pointers. This
triggers _PyLexer_remember_fstring_buffers and verifies the NULL
checks prevent undefined behavior.
(cherry picked from commit e6110efd03)
Co-authored-by: Ramin Farajpour Cami <ramin.blackhat@gmail.com>
gh-144169: Fix three crashes in AST objects with non-str kwargs (GH-144178)
(cherry picked from commit 639c1ad4f1)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
gh-140576: Fixed crash produced by lexer in case of dedented zero byte (GH-140583)
(cherry picked from commit 8706167474)
Co-authored-by: Mikhail Efimov <efimov.mikhail@gmail.com>
* Support non-UTF-8 shebang and comments if non-UTF-8 encoding is specified.
* Detect decoding error in comments for UTF-8 encoding.
* Include the decoding error position for default encoding in SyntaxError.
(cherry picked from commit 5c942f11cd)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
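A minimal sketch of the first bullet above, assuming the encoding is declared with a coding cookie on the first line. The source bytes and the file name `<latin1-example>` are made up for illustration; the comment holds the byte 0xE9, which is not valid UTF-8 on its own but decodes fine under the declared encoding.

```python
import io
import tokenize

# Hypothetical source: a Latin-1 coding cookie on line 1, then a comment
# containing the byte 0xE9 ("é" in Latin-1), followed by ordinary code.
source = b"# -*- coding: latin-1 -*-\n# caf\xe9\nvalue = 'ok'\n"

# tokenize.detect_encoding() reads the cookie and returns the normalized name.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

# compile() honors the same cookie when given bytes, so the non-UTF-8
# comment does not prevent the module from compiling and running.
namespace = {}
exec(compile(source, "<latin1-example>", "exec"), namespace)
result = namespace["value"]
print(encoding, result)
```

This only exercises a cookie on the first line; the shebang and error-position cases listed in the bullets are not covered by the sketch.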
gh-139516: Fix lambda colon start format spec in f-string in tokenizer (GH-139657)
(cherry picked from commit 539461d9ec)
Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
gh-137314: Fix incorrect treatment of format specs in raw fstrings (GH-137328)
(cherry picked from commit 0153d82a5a)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-130077: Properly match full soft keywords in the parser (GH-135317)
(cherry picked from commit ff2b5f40c2)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973)
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().
(cherry picked from commit f49a07b531)
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
If the error handler is used, a new bytes object is created and set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data becomes invalid
once that temporary bytes object is destroyed, so we need another way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have this issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
In the `ast` documentation for Python:
* https://docs.python.org/3/library/ast.html#ast.Dict
it is made clear that:
> When doing dictionary unpacking using dictionary literals the expression to be expanded goes in the values list, with a `None` at the corresponding position in `keys`.
Hence, `keys` is really an `expr?*` and *not* an `expr*`.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
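The `None` entry described in the `ast.Dict` message above can be observed directly. This is a small illustrative check, not part of the change itself; the dictionary literal and the name `other` are arbitrary.

```python
import ast

# Parse a dict literal that mixes a regular key with a ** unpacking.
tree = ast.parse("{'a': 1, **other}", mode="eval")
dict_node = tree.body  # the ast.Dict node

keys = dict_node.keys
values = dict_node.values

# The unpacked expression sits in `values`; its slot in `keys` is None.
unpacked = [ast.unparse(v) for k, v in zip(keys, values) if k is None]
print(unpacked)
```

Code that walks `keys` therefore has to handle `None` entries, which is what the `expr?*` type expresses.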