Guard against NULL pointer arithmetic in `_PyLexer_remember_fstring_buffers`
and `_PyLexer_restore_fstring_buffers`. When `start` or `multi_line_start`
are NULL (uninitialized in tok_mode_stack[0]), performing `NULL - tok->buf`
is undefined behavior. Add explicit NULL checks to store -1 as sentinel
and restore NULL accordingly.
Add test_lexer_buffer_realloc_with_null_start to test_repl.py that
exercises the code path where the lexer buffer is reallocated while
tok_mode_stack[0] has NULL start/multi_line_start pointers. This
triggers _PyLexer_remember_fstring_buffers and verifies the NULL
checks prevent undefined behavior.
Many functions related to compiling or parsing Python code, such as
compile(), ast.parse(), symtable.symtable(),
and importlib.abc.InspectLoader.source_to_code() now allow to pass
the module name used when filtering syntax warnings.
Use PyBytesWriter in action_helpers.c _build_concatenated_bytes().
3x faster bytes concat in the parser.
Co-authored-by: Victor Stinner <vstinner@python.org>
* Support non-UTF-8 shebang and comments if non-UTF-8 encoding is specified.
* Detect decoding error in comments for UTF-8 encoding.
* Include the decoding error position for default encoding in SyntaxError.
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().
Unrelated change to please the linter: remove an unused
import in test_ctypes.
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
In the `ast` documentation for Python:
* https://docs.python.org/3/library/ast.html#ast.Dict
it is made clear that:
> When doing dictionary unpacking using dictionary literals the expression to be expanded goes in the values list, with a `None` at the corresponding position in `keys`.
Hence, `keys` is really a `expr?*` and *not* a `expr*`.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>