The -fprofile-update=atomic flag was added to fix a random GCC
internal error on PGO build (gh-145801) caused by corruption of
profile data (.gcda files). The problem is that it makes the PGO
build way slower (up to 47x slower) on i686. Since the GCC internal
error was not seen on i686 so far, don't use -fprofile-update=atomic
on i686.
This option changes the behavior of --enable-shared to continue to build
the libpython3.x.so shared library, but not use it for linking the
python3 interpreter executable. Instead, the executable is linked
directly against the libpython .o files as it would be with
--disable-shared.
There are two benefits of this change. First, libpython uses
thread-local storage, which is noticeably slower when used in a loaded
module instead of in the main program, because the main program can take
advantage of constant offsets from the thread state pointer but loaded
modules have to dynamically call a function __tls_get_addr() to
potentially allocate their thread-local storage area. (There is another
thread-local storage model for dynamic libraries which mitigates most of
this performance hit, but it comes at the cost of preventing
dlopen("libpython3.x.so"), which is a use case we want to preserve.)
Second, this improves the user experience around relocatable Python a
little bit, in that we don't need to use an $ORIGIN-relative path to
locate libpython3.x.so, which has some mild benefits around musl (which
does not support $ORIGIN-relative DT_NEEDED, only $ORIGIN-relative
DT_RPATH/DT_RUNPATH), users who want to make the interpreter setuid or
setcap (which prevents processing $ORIGIN), etc.
When Python build is optimized with GCC using PGO, use
-fprofile-update=atomic option to use atomic operations when updating
profile information. This option reduces the risk of gcov Data Files
(.gcda) corruption which can cause random GCC crashes.
* Drop DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754 macro.
* Use DOUBLE_IS_BIG/LITTLE_ENDIAN_IEEE754 to detect endianness of
float/doubles.
* Drop "unknown_format" code path in PyFloat_Pack/Unpack*().
Co-authored-by: Victor Stinner <vstinner@python.org>
This undoes a change made as a part of PR 137470, for compatibility with EMSDK
4.0.19. It adds `emscripten_trampoline` field in `pycore_runtime_structs.h`
and initializes it from JS initialization code with the wasm-gc based trampoline
if possible. Otherwise we fall back to the JS trampoline.
* introduce executable specific linker flags
Add PY_CORE_EXE_LDFLAGS and EXE_LDFLAGS which stores executable specific
LDFLAGS, replacing PY_CORE_LDFLAGS for building
executable targets.
If PY_CORE_EXE_LDFLAGS / EXE_LDFLAGS is not provided, then it defaults
to the value of PY_CORE_LDFLAGS which is the existing behaviour.
If both flags are supplied, and there is a need
to distinguish between executable and shared specific LDFLAGS,
in particular, PY_CORE_LDFLAGS should contain the shared specific LDFLAGS.
* documentation for new linker flags
* update Misc folder documentation
* Update Makefile.pre.in
Co-authored-by: Victor Stinner <vstinner@python.org>
---------
Co-authored-by: Filipe Laíns <filipe.lains@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Filipe Laíns <lains@riseup.net>
Add SIMD optimization for `bytes.hex()`, `bytearray.hex()`, and `binascii.hexlify()` as well as `hashlib` `.hexdigest()` methods using platform-agnostic GCC/Clang vector extensions that compile to native SIMD instructions on our [PEP-11 Tier 1 Linux and macOS](https://peps.python.org/pep-0011/#tier-1) platforms.
- 1.1-3x faster for common small data (16-64 bytes, covering md5 through sha512 digest sizes)
- Up to 11x faster for large data (1KB+)
- Retains the existing scalar code for short inputs (<16 bytes) or platforms lacking SIMD instructions, no observable performance regressions there.
## Supported platforms:
- x86-64: the compiler generates SSE2 - always available, no flags or CPU feature checks needed
- ARM64: NEON is always available, always available, no flags or CPU feature checks needed
- ARM32: Requires NEON support and that appropriate compiler flags enable that (e.g., `-march=native` on a Raspberry Pi 3+) - while we _could_ use runtime detection to allow neon when compiled without a recent enough `-march=` flag (`cortex-a53` and later IIRC), there are diminishing returns in doing so. Anyone using 32-bit ARM in a situation where performance matters will already be compiling with such flags. (as opposed to 32-bit Raspbian compilation that defaults to aiming primarily for compatibility with rpi1&0 armv6 arch=armhf which lacks neon)
- Windows/MSVC: Not supported. MSVC lacks `__builtin_shufflevector`, so the existing scalar path is used. Leaving it as an opportunity for the future for someone to figure out how to express the intent to that compiler.
This is compile time detection of features that are always available on the target architectures. No need for runtime feature inspection.
If you are building with `--with-thread-sanitizer` and don't use the
suppression file, then running configure will report a thread leak.
Call `pthread_join()` to avoid the thread leak.
* GH-108819: fix LIBDEST not honoring --with-platlibdir
We look for the pure-Python part of the standard library in
PLATSTDLIBDIR, which may not match the default LIBDIR subdir.
From ``getpath.py``:
```python
...
STDLIB_SUBDIR = f'{platlibdir}/python{VERSION_MAJOR}.{VERSION_MINOR}{ABI_THREAD}'
STDLIB_LANDMARKS = [f'{STDLIB_SUBDIR}/os.py', f'{STDLIB_SUBDIR}/os.pyc']
PLATSTDLIB_LANDMARK = f'{platlibdir}/python{VERSION_MAJOR}.{VERSION_MINOR}{ABI_THREAD}/lib-dynload'
...
```
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Add news
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Always set LIBDEST and BINLIBDEST based on PLATLIBDIR
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Add XXX comment on PLATLIBDIR default value
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Regen configure
Signed-off-by: Filipe Laíns <lains@riseup.net>
---------
Signed-off-by: Filipe Laíns <lains@riseup.net>
On m68k, an fmove instruction accessing %fpcr may only move from
or to a data register or a memory operand. The constraint "g" also
permits the use of address registers, which is invalid. The correct
constraint is "dm". Beginning with GCC 15, the register allocator
picks an address register in the code which causes SIGILL during
runtime.
Co-authored-by: Michael Karcher <github@mkarcher.dialup.fu-berlin.de>
While CPython doesn't support `--enable-wasm-dynamic-linking`, external tools like componentize-py do and they have to patch around it. Since the flag is off by default, allowing the flag so external users can add/inject dynamic linking support seems acceptable.
Adapted from a patch for Python 3.14 submitted to the Debian BTS by John
https://bugs.debian.org/1105111#20
Co-authored-by: John David Anglin <dave.anglin@bell.net>
Some systems have the definitions of the mask bits without having the
corresponding members in struct statx. Add configure checks for members
added after Linux 4.11 (when statx itself was added).
Android has Linux's statx, but MACHDEP is "android" on Android, so
configure doesn't check for statx on Android. Base the check for statx
on ac_sys_system instead, which is "Linux-android" on Android, "Linux"
on other Linux distributions, and "AIX" on AIX (which has an
incompatible function named statx).
stx_atomic_write_unit_max_opt was added in Linux 6.16, but is controlled
by the STATX_WRITE_ATOMIC mask bit added in Linux 6.11. That's safe at
runtime because all kernels clear the reserved space in struct statx and
zero is a valid value for stx_atomic_write_unit_max_opt, and it avoids
allocating another mask bit, which are a limited resource. But it also
means the kernel headers don't provide a way to check whether
stx_atomic_write_unit_max_opt exists, so add a configure check.
Adds tooling to generate and test an iOS XCframework, in a way that will also facilitate
adding other XCframework targets for other Apple platforms (tvOS, watchOS, visionOS and
even macOS, potentially).
---------
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>