Rename from _Py_INTERNAL_ABI_SLOT to _Py_ABI_SLOT
and define the macro using _PyABIInfo_DEFAULT.
Use the ABI slot in stdlib extension modules to enable running
a check of ABI version compatibility.
_tkinter, _tracemalloc and readline don't use the slots, hence they need
explicit handling.
Co-authored-by: Victor Stinner <vstinner@python.org>
Base64 decoder (see binascii.a2b_base64(), base64.b64decode(), etc)
no longer ignores excess data after the first padded quad in non-strict
(default) mode. Instead, in conformance with RFC 4648, it ignores the
pad character, "=", if it is present before the end of the encoded data.
Add base32 encoder and decoder functions implemented in
C to the binascii module and use them to greatly improve the
performance and reduce the memory usage of the existing
base32 codec functions in the base64 module.
* Add the alphabet parameter in functions b2a_base64(), a2b_base64(),
b2a_base85(), and a2b_base85().
* And a number of "*_ALPHABET" constants.
* Remove b2a_z85() and a2b_z85().
* Add the c_init_default attribute which is used to initialize the C variable
if the default is not explicitly provided.
* Add the c_default_init() method which is used to derive c_default from
default if c_default is not explicitly provided.
* Explicit c_default and py_default are now almost always have precedence
over the generated value.
* Add support for bytes literals as default values.
* Improve support for str literals as default values (support non-ASCII
and non-printable characters and special characters like backslash or quotes).
* Fix support for str and bytes literals containing trigraphs, "/*" and "*/".
* Improve support for default values in converters "char" and "int(accept={str})".
* Converter "int(accept={str})" now requires 1-character string instead of
integer as default value.
* Add support for non-None default values in converter "Py_buffer": NULL,
str and bytes literals.
* Improve error handling for invalid default values.
* Rename Null to NullType for consistency.
It could happen if the size of the input is more than 4/5 of sys.maxsize
(only feasible on 32-bit platforms).
Also simplify the integer overflow checks in the Base64 encoder, and
harmonize them with the code for Ascii85 and Base85.
Add Ascii85, Base85, and Z85 encoders and decoders to binascii,
replacing the existing pure Python implementations in base64.
This makes the codecs two orders of magnitude faster and consume
two orders of magnitude less memory.
Note that attempting to decode Ascii85 or Base85 data of length 1 mod 5
(after accounting for Ascii85 quirks) now produces an error, as no
encoder would emit such data. This should be the only significant
externally visible difference compared to the old implementation.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Optimize base64 encoding/decoding by eliminating loop-carried dependencies. Key changes:
- Add `base64_encode_trio()` and `base64_decode_quad()` helper functions that process complete groups independently
- Add `base64_encode_fast()` and `base64_decode_fast()` wrappers
- Update `b2a_base64` and `a2b_base64` to use fast path for complete groups
Performance gains (encode/decode speedup vs main, PGO builds):
```
64 bytes 64K 1M
Zen2: 1.2x/1.8x 1.7x/2.8x 1.5x/2.8x
Zen4: 1.2x/1.7x 1.6x/3.0x 1.5x/3.0x [old data, likely faster]
M4: 1.3x/1.9x 2.3x/2.8x 2.4x/2.9x [old data, likely faster]
RPi5-32: 1.2x/1.2x 2.4x/2.4x 2.0x/2.1x
```
Based on my exploratory work done in https://github.com/python/cpython/compare/main...gpshead:cpython:claude/vectorize-base64-c-S7Hku
See PR and issue for further thoughts on sometimes MUCH faster SIMD vectorized versions of this.
Fix an edge case in `binascii.a2b_base64` strict mode, where
excessive padding was not detected when no padding is necessary.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
Work around a macOS bug, limit zlib crc32 calls to 1GiB.
Without this, `zlib.crc32` and `binascii.crc32` could produce incorrect
results on multi-gigabyte inputs depending on the macOS version's Apple
supplied zlib implementation.
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
When compiled with `USE_ZLIB_CRC32` defined (`configure` sets this on POSIX systems), `binascii.crc32(...)` failed to compute the correct value when the input data was >= 4GiB. Because the zlib crc32 API is limited to a 32-bit length.
This lines it up with the `zlib.crc32(...)` implementation that doesn't have that flaw.
**Performance:** This also adopts the same GIL releasing for larger inputs logic that `zlib.crc32` has, and causes the Windows build to always use zlib's crc32 instead of our slow C code as zlib is a required build dependency on Windows.
setup.py no longer defines Py_BUILD_CORE_MODULE. Instead every
module defines the macro before #include "Python.h" unless
Py_BUILD_CORE_BUILTIN is already defined.
Py_BUILD_CORE_BUILTIN is defined for every module that is built by
Modules/Setup.
The PR also simplifies Modules/Setup. Makefile and makesetup
already define Py_BUILD_CORE_BUILTIN and include Modules/internal
for us.
Signed-off-by: Christian Heimes <christian@python.org>
Move Include/longobject.h non-limited API to a new
Include/cpython/longobject.h header file.
Move the following definitions to the internal C API:
* _PyLong_DigitValue
* _PyLong_FormatAdvancedWriter()
* _PyLong_FormatWriter()
Move Include/pystrhex.h to Include/internal/pycore_strhex.h.
The header file only contains private functions.
The following C extensions are now built with Py_BUILD_CORE_MODULE
macro defined to get access to the internal C API:
* _blake2
* _hashopenssl
* _md5
* _sha1
* _sha3
* _ssl
* binascii
The binhex module, deprecated in Python 3.9, is now removed. The
following binascii functions, deprecated in Python 3.9, are now also
removed:
* a2b_hqx(), b2a_hqx();
* rlecode_hqx(), rledecode_hqx().
The binascii.crc_hqx() function remains available.
* Renamed assertLeadingPadding function to match logic
* Added a separate error message for discontinuous padding
* Updated the tests for discontinuous padding
binascii.a2b_base64 gains a strict_mode= parameter. When enabled it will raise an
error on input that deviates from the base64 spec in any way. The default remains
False for backward compatibility.
Code reviews and minor tweaks by: Gregory P. Smith <greg@krypto.org> [Google]
Extension modules: m_traverse, m_clear and m_free functions of
PyModuleDef are no longer called if the module state was requested
but is not allocated yet. This is the case immediately after the
module is created and before the module is executed (Py_mod_exec
function). More precisely, these functions are not called if m_size is
greater than 0 and the module state (as returned by
PyModule_GetState()) is NULL.
Extension modules without module state (m_size <= 0) are not affected.
Co-Authored-By: Petr Viktorin <encukou@gmail.com>
Deprecate binhex4 and hexbin4 standards. Deprecate the binhex module
and the following binascii functions:
* b2a_hqx(), a2b_hqx()
* rlecode_hqx(), rledecode_hqx()
* crc_hqx()
* bpo-22385: Support output separators in hex methods.
Also in binascii.hexlify aka b2a_hex.
The underlying implementation behind all hex generation in CPython uses the
same pystrhex.c implementation. This adds support to bytes, bytearray,
and memoryview objects.
The binascii module functions exist rather than being slated for deprecation
because they return bytes rather than requiring an intermediate step through a
str object.
This change was inspired by MicroPython which supports sep in its binascii
implementation (and does not yet support the .hex methods).
https://bugs.python.org/issue22385
Improvements:
1. Include the number of valid data characters in the error message.
2. Mention "number of data characters" rather than "length".
https://bugs.python.org/issue34736