the lazy imports PEP initial implementation (3.15 alpha) inadvertently incremented the length of the sys.flags tuple. In a way that did not do anything useful or related to the lazy imports setting (it exposed sys.flags.gil in the tuple). This fixes that to hard code the length to the 3.13 & 3.14 released length of 18 and have our tests and code comments make it clear that we've since stopped making new sys.flags attributes available via sequence index.
* Convert DEOPT_IFs to EXIT_IFs for guards. Keep DEOPT_IF for intentional drops to the interpreter.
* Modify BINARY_OP_SUBSCR_LIST_INT and STORE_SUBSCR_LIST_INT to handle negative indices, to keep EXIT_IFs and DEOPT_IFs in different uops
Cache one datachunk per tstate to prevent alloc/dealloc thrashing when repeatedly hitting the same call depth at exactly the wrong boundary.
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* Drop DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754 macro.
* Use DOUBLE_IS_BIG/LITTLE_ENDIAN_IEEE754 to detect endianness of
float/doubles.
* Drop "unknown_format" code path in PyFloat_Pack/Unpack*().
Co-authored-by: Victor Stinner <vstinner@python.org>
This undoes a change made as a part of PR 137470, for compatibility with EMSDK
4.0.19. It adds `emscripten_trampoline` field in `pycore_runtime_structs.h`
and initializes it from JS initialization code with the wasm-gc based trampoline
if possible. Otherwise we fall back to the JS trampoline.
Add SIMD optimization for `bytes.hex()`, `bytearray.hex()`, and `binascii.hexlify()` as well as `hashlib` `.hexdigest()` methods using platform-agnostic GCC/Clang vector extensions that compile to native SIMD instructions on our [PEP-11 Tier 1 Linux and macOS](https://peps.python.org/pep-0011/#tier-1) platforms.
- 1.1-3x faster for common small data (16-64 bytes, covering md5 through sha512 digest sizes)
- Up to 11x faster for large data (1KB+)
- Retains the existing scalar code for short inputs (<16 bytes) or platforms lacking SIMD instructions, no observable performance regressions there.
## Supported platforms:
- x86-64: the compiler generates SSE2 - always available, no flags or CPU feature checks needed
- ARM64: NEON is always available, always available, no flags or CPU feature checks needed
- ARM32: Requires NEON support and that appropriate compiler flags enable that (e.g., `-march=native` on a Raspberry Pi 3+) - while we _could_ use runtime detection to allow neon when compiled without a recent enough `-march=` flag (`cortex-a53` and later IIRC), there are diminishing returns in doing so. Anyone using 32-bit ARM in a situation where performance matters will already be compiling with such flags. (as opposed to 32-bit Raspbian compilation that defaults to aiming primarily for compatibility with rpi1&0 armv6 arch=armhf which lacks neon)
- Windows/MSVC: Not supported. MSVC lacks `__builtin_shufflevector`, so the existing scalar path is used. Leaving it as an opportunity for the future for someone to figure out how to express the intent to that compiler.
This is compile time detection of features that are always available on the target architectures. No need for runtime feature inspection.
Deprecate PEP-456 [1] support for providing an external definition
of the string hashing scheme. Removal is scheduled for Python 3.19.
Previously, embedders could define the ``Py_HASH_ALGORITHM`` macro to be
``Py_HASH_EXTERNAL`` [2] to indicate that the hashing scheme was provided
externally but this feature was undocumented, untested and most likely
unused.
[1]: https://peps.python.org/pep-0456/
[2]: https://peps.python.org/pep-0456/#hash-function-selection
Align the QSBR thread state array to a 64-byte cache line boundary
and add padding at the end of _PyThreadStateImpl. Depending on heap
layout, the QSBR array could end up sharing a cache line with a
thread's tlbc_index, causing QSBR quiescent state updates to contend
with reads of tlbc_index in RESUME_CHECK. This is sensitive to
earlier allocations during interpreter init and can appear or
disappear with seemingly unrelated changes.
Either change alone is sufficient to fix the specific issue, but both
are worthwhile to avoid similar problems in the future.
Add TYPE_FROZENDICT to the marshal module.
Add C API functions:
* PyAnyDict_Check()
* PyAnyDict_CheckExact()
* PyFrozenDict_Check()
* PyFrozenDict_CheckExact()
* PyFrozenDict_New()
Add PyFrozenDict_Type C type.
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Adam Johnson <me@adamj.eu>
Co-authored-by: Benedikt Johannes <benedikt.johannes.hofer@gmail.com>