The `AC_PATH_TOOL` calls had included a duplicated argument, causing a "`PATH`"
consisting of `notfound` to be searched instead of `$PATH`.
(cherry picked from commit c613f72eee)
Co-authored-by: sendaoYan <yansendao@126.com>
gh-148260: Use at least 1 MiB stack size on musl (GH-149993)
On Linux when Python is linked to the musl C library, use a thread
stack size of at least 1 MiB instead of musl default which is 128
kiB.
(cherry picked from commit df6c157e51)
Co-authored-by: Victor Stinner <vstinner@python.org>
The power ABI specification requires that compilers maintain a back
chain by default, so unwinding already works without a dedicated
frame pointer. Don't use -fno-omit-frame-pointer on ppc64le.
-fno-omit-frame-pointer is not enough to make every target walkable by the
simple manual frame pointer unwinder.
The helper used by test_frame_pointer_unwind used to assume the frame pointer
named a two-word record where fp[0] was the previous frame pointer and fp[1]
was the return address. That is only the generic layout used by some targets.
This patch keeps that default, but moves the slots behind named offsets so
architecture-specific layouts can describe where the backchain and return
address really live.
On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is
enabled. Without it, the unwinder stops at the current C frame and the test
reports no Python frames. Once backchains are present, the helper must also
stop at the current thread's known C stack bounds; otherwise it can follow the
final backchain far enough to dereference an invalid frame and segfault.
For Linux s390x backchain frames, the documented z/Architecture stack-frame
layout saves r14, the return-address register, at byte offset 112 from the
frame pointer, so read the return address from that named slot instead of fp[1].
The 112-byte offset comes from Linux's s390 debugging documentation: its Stack
Frame Layout table shows z/Architecture backchain frames with the backchain at
offset 0 and saved r14 of the caller function at offset 112:
https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout
This helper remains scoped to Linux s390x backchain frames. GNU SFrame's s390x
notes state that the s390x ELF ABI does not generally mandate where RA and FP
are saved, or whether they are saved at all:
https://sourceware.org/binutils/docs/sframe-spec.html#s390x
As Jens Remus noted, -fno-omit-frame-pointer is not needed when -mbackchain is
present.
On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb
prologue keeps the saved frame pointer and link register at offsets that depend
on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper.
Use -marm when it is supported for frame-pointer builds, and teach the helper
the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the
saved LR return address is at fp[0].
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Savannah Ostrowski <savannah@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Emma Smith <emma@emmatyping.dev>
The -fprofile-update=atomic flag was added to fix a random GCC
internal error on PGO build (gh-145801) caused by corruption of
profile data (.gcda files). The problem is that it makes the PGO
build way slower (up to 47x slower) on i686. Since the GCC internal
error was not seen on i686 so far, don't use -fprofile-update=atomic
on i686.
This option changes the behavior of --enable-shared to continue to build
the libpython3.x.so shared library, but not use it for linking the
python3 interpreter executable. Instead, the executable is linked
directly against the libpython .o files as it would be with
--disable-shared.
There are two benefits of this change. First, libpython uses
thread-local storage, which is noticeably slower when used in a loaded
module instead of in the main program, because the main program can take
advantage of constant offsets from the thread state pointer but loaded
modules have to dynamically call a function __tls_get_addr() to
potentially allocate their thread-local storage area. (There is another
thread-local storage model for dynamic libraries which mitigates most of
this performance hit, but it comes at the cost of preventing
dlopen("libpython3.x.so"), which is a use case we want to preserve.)
Second, this improves the user experience around relocatable Python a
little bit, in that we don't need to use an $ORIGIN-relative path to
locate libpython3.x.so, which has some mild benefits around musl (which
does not support $ORIGIN-relative DT_NEEDED, only $ORIGIN-relative
DT_RPATH/DT_RUNPATH), users who want to make the interpreter setuid or
setcap (which prevents processing $ORIGIN), etc.
When Python build is optimized with GCC using PGO, use
-fprofile-update=atomic option to use atomic operations when updating
profile information. This option reduces the risk of gcov Data Files
(.gcda) corruption which can cause random GCC crashes.
* Drop DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754 macro.
* Use DOUBLE_IS_BIG/LITTLE_ENDIAN_IEEE754 to detect endianness of
float/doubles.
* Drop "unknown_format" code path in PyFloat_Pack/Unpack*().
Co-authored-by: Victor Stinner <vstinner@python.org>
This undoes a change made as a part of PR 137470, for compatibility with EMSDK
4.0.19. It adds `emscripten_trampoline` field in `pycore_runtime_structs.h`
and initializes it from JS initialization code with the wasm-gc based trampoline
if possible. Otherwise we fall back to the JS trampoline.
* introduce executable specific linker flags
Add PY_CORE_EXE_LDFLAGS and EXE_LDFLAGS which stores executable specific
LDFLAGS, replacing PY_CORE_LDFLAGS for building
executable targets.
If PY_CORE_EXE_LDFLAGS / EXE_LDFLAGS is not provided, then it defaults
to the value of PY_CORE_LDFLAGS which is the existing behaviour.
If both flags are supplied, and there is a need
to distinguish between executable and shared specific LDFLAGS,
in particular, PY_CORE_LDFLAGS should contain the shared specific LDFLAGS.
* documentation for new linker flags
* update Misc folder documentation
* Update Makefile.pre.in
Co-authored-by: Victor Stinner <vstinner@python.org>
---------
Co-authored-by: Filipe Laíns <filipe.lains@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Filipe Laíns <lains@riseup.net>
Add SIMD optimization for `bytes.hex()`, `bytearray.hex()`, and `binascii.hexlify()` as well as `hashlib` `.hexdigest()` methods using platform-agnostic GCC/Clang vector extensions that compile to native SIMD instructions on our [PEP-11 Tier 1 Linux and macOS](https://peps.python.org/pep-0011/#tier-1) platforms.
- 1.1-3x faster for common small data (16-64 bytes, covering md5 through sha512 digest sizes)
- Up to 11x faster for large data (1KB+)
- Retains the existing scalar code for short inputs (<16 bytes) or platforms lacking SIMD instructions, no observable performance regressions there.
## Supported platforms:
- x86-64: the compiler generates SSE2 - always available, no flags or CPU feature checks needed
- ARM64: NEON is always available, always available, no flags or CPU feature checks needed
- ARM32: Requires NEON support and that appropriate compiler flags enable that (e.g., `-march=native` on a Raspberry Pi 3+) - while we _could_ use runtime detection to allow neon when compiled without a recent enough `-march=` flag (`cortex-a53` and later IIRC), there are diminishing returns in doing so. Anyone using 32-bit ARM in a situation where performance matters will already be compiling with such flags. (as opposed to 32-bit Raspbian compilation that defaults to aiming primarily for compatibility with rpi1&0 armv6 arch=armhf which lacks neon)
- Windows/MSVC: Not supported. MSVC lacks `__builtin_shufflevector`, so the existing scalar path is used. Leaving it as an opportunity for the future for someone to figure out how to express the intent to that compiler.
This is compile time detection of features that are always available on the target architectures. No need for runtime feature inspection.
If you are building with `--with-thread-sanitizer` and don't use the
suppression file, then running configure will report a thread leak.
Call `pthread_join()` to avoid the thread leak.
* GH-108819: fix LIBDEST not honoring --with-platlibdir
We look for the pure-Python part of the standard library in
PLATSTDLIBDIR, which may not match the default LIBDIR subdir.
From ``getpath.py``:
```python
...
STDLIB_SUBDIR = f'{platlibdir}/python{VERSION_MAJOR}.{VERSION_MINOR}{ABI_THREAD}'
STDLIB_LANDMARKS = [f'{STDLIB_SUBDIR}/os.py', f'{STDLIB_SUBDIR}/os.pyc']
PLATSTDLIB_LANDMARK = f'{platlibdir}/python{VERSION_MAJOR}.{VERSION_MINOR}{ABI_THREAD}/lib-dynload'
...
```
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Add news
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Always set LIBDEST and BINLIBDEST based on PLATLIBDIR
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Add XXX comment on PLATLIBDIR default value
Signed-off-by: Filipe Laíns <lains@riseup.net>
* Regen configure
Signed-off-by: Filipe Laíns <lains@riseup.net>
---------
Signed-off-by: Filipe Laíns <lains@riseup.net>
On m68k, an fmove instruction accessing %fpcr may only move from
or to a data register or a memory operand. The constraint "g" also
permits the use of address registers, which is invalid. The correct
constraint is "dm". Beginning with GCC 15, the register allocator
picks an address register in the code which causes SIGILL during
runtime.
Co-authored-by: Michael Karcher <github@mkarcher.dialup.fu-berlin.de>
While CPython doesn't support `--enable-wasm-dynamic-linking`, external tools like componentize-py do and they have to patch around it. Since the flag is off by default, allowing the flag so external users can add/inject dynamic linking support seems acceptable.
Adapted from a patch for Python 3.14 submitted to the Debian BTS by John
https://bugs.debian.org/1105111#20
Co-authored-by: John David Anglin <dave.anglin@bell.net>
Some systems have the definitions of the mask bits without having the
corresponding members in struct statx. Add configure checks for members
added after Linux 4.11 (when statx itself was added).
Android has Linux's statx, but MACHDEP is "android" on Android, so
configure doesn't check for statx on Android. Base the check for statx
on ac_sys_system instead, which is "Linux-android" on Android, "Linux"
on other Linux distributions, and "AIX" on AIX (which has an
incompatible function named statx).
stx_atomic_write_unit_max_opt was added in Linux 6.16, but is controlled
by the STATX_WRITE_ATOMIC mask bit added in Linux 6.11. That's safe at
runtime because all kernels clear the reserved space in struct statx and
zero is a valid value for stx_atomic_write_unit_max_opt, and it avoids
allocating another mask bit, which are a limited resource. But it also
means the kernel headers don't provide a way to check whether
stx_atomic_write_unit_max_opt exists, so add a configure check.