Commit graph

455 commits

Author SHA1 Message Date
Ken Jin
4fa80ce74c
gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Cody Maloney
732224e113
gh-139871: Add bytearray.take_bytes([n]) to efficiently extract bytes (GH-140128)
Update `bytearray` to contain a `bytes` and provide a zero-copy path to
"extract" the `bytes`. This allows making several code paths more efficient.

This does not move any codepaths to make use of this new API. The documentation
changes include common code patterns which can be made more efficient with
this API.

---

When just changing `bytearray` to contain `bytes` I ran pyperformance on a
`--with-lto --enable-optimizations --with-static-libpython` build and don't see
any major speedups or slowdowns with this; all seems to be in the noise of
my machine (Generally changes under 5% or benchmarks that don't touch
bytes/bytearray).


Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Maurycy Pawłowski-Wieroński <5383+maurycy@users.noreply.github.com>
2025-11-13 13:19:44 +00:00
Petr Viktorin
589a03a8ce
gh-140550: Initial implementation of PEP 793 – PyModExport (GH-140556)
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2025-11-05 12:31:42 +01:00
Adam Turner
5443b9e52f
gh-133143: Condense the implementation for `sys.abi_info` (#138672) 2025-09-08 19:21:28 +00:00
Klaus Zimmermann
1acb718ea2
gh-133143: Add sys.abi_info (GH-137476)
This makes information about the interpreter ABI more accessible.

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-09-08 14:35:44 +00:00
Peter Bierma
e8251dc0ae
gh-134170: Add colorization to unraisable exceptions (#134183)
Default implementation of sys.unraisablehook() now uses traceback._print_exception_bltin() to print exceptions with colorized text.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-08-04 14:35:00 +00:00
Serhiy Storchaka
c45f4f3ebe
gh-78465: Fix error message for cls.__new__(cls, ...) where cls is not instantiable (GH-135981)
Previous error message suggested to use cls.__new__(), which
obviously does not work. Now the error message is the same as for
cls(...).
2025-06-27 14:35:55 +03:00
sobolevn
8ca1e4d846
gh-135645: Added supports_isolated_interpreters to sys.implementation (#135667)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2025-06-21 10:56:14 +03:00
Nadeshiko Manju
1ddfe59320
gh-135543: Emit sys.remote_exec audit event when sys.remote_exec is called (GH-135544) 2025-06-19 21:23:38 +01:00
Eric Snow
62143736b6
gh-134939: Add the concurrent.interpreters Module (gh-133958)
PEP-734 has been accepted (for 3.14).

(FTR, I'm opposed to putting this under the concurrent package, but
doing so is the SC condition under which the module can land in 3.14.)
2025-06-11 17:35:48 -06:00
tpburns
54ca55978e
gh-134248 test_getallocatedblocks pre-check to ignore immortalized strings (#134871)
When sanity checking against gettotalrefcount(), we exclude the blocks for
immortalized strings since their references are not tracked/reported. This
now matches refleak.py's book-keeping using the same functions.
2025-06-03 18:00:25 +02:00
CF Bolz-Tereick
895119ec24
skip test for sys._stdlib_dir if that is not present (#134973) 2025-05-31 13:46:22 +02:00
Victor Stinner
ebf6d13567
gh-134745: Change PyThread_allocate_lock() implementation to PyMutex (#134747)
Co-authored-by: Sam Gross <colesbury@gmail.com>
2025-05-30 10:15:47 +00:00
Serhiy Storchaka
2602d8ae98
gh-71339: Use new assertion methods in tests (GH-129046) 2025-05-22 13:17:22 +03:00
Victor Stinner
009e7b3698
gh-134064: Fix sys.remote_exec() error checking (#134067) 2025-05-18 00:24:40 +02:00
Serhiy Storchaka
c09cec5d69
gh-133886: Fix sys.remote_exec() for non-UTF-8 paths (GH-133887)
It now supports non-ASCII paths in non-UTF-8 locales and
non-UTF-8 paths in UTF-8 locales.
2025-05-13 11:55:24 +03:00
Irit Katriel
296cd128bf
Revert "gh-133395: add option for extension modules to specialize BINARY_OP/SUBSCR, apply to arrays (#133396)" (#133498) 2025-05-06 13:12:26 +03:00
Brandt Bucher
b1aa515bd6
GH-133231: Add JIT utilities in sys._jit (GH-133233) 2025-05-05 15:25:22 -07:00
Irit Katriel
082dbf7788
gh-133395: add option for extension modules to specialize BINARY_OP/SUBSCR, apply to arrays (#133396) 2025-05-05 17:46:56 +01:00
littlebutt's workshop
d6078ed6d0
gh-132143: Fix the AssertionError in the test case test.test_sys.TestRemoteExec (#132248) 2025-05-05 17:08:49 +01:00
Adam Turner
3f80165a26
GH-91048: Minor fixes for `_remotedebugging & rename to _remote_debugging` (#133398) 2025-05-05 02:30:14 +02:00
Pablo Galindo Salgado
2bc8365231
GH-91048: Add utils for printing the call stack for asyncio tasks (#133284) 2025-05-04 00:51:57 +00:00
Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి)
8783cec9b6
gh-129027: Raise DeprecationWarning for sys._clear_type_cache (#129043)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
2025-04-25 15:01:48 +03:00
Matt Wozniski
a94c7528b5
gh-132859: Run debugger scripts in their own namespaces (#132860)
Run debugger scripts in their own namespaces

Previously scripts injected by `sys.remote_exec` were run with the
globals of the `__main__` module. Instead, run each injected script
with an empty set of globals. If someone really wants to use the
`__main__` module's namespace, they can always `import __main__`.
2025-04-23 23:40:24 +00:00
Xuehai Pan
26ae05e95c
gh-127405: Add ABIFLAGS to sysconfig variables on Windows (GH-131799) 2025-04-11 16:19:03 +01:00
Neil Schemenauer
d687900f98
gh-128384: Use a context variable for warnings.catch_warnings (gh-130010)
Make `warnings.catch_warnings()` use a context variable for holding
the warning filtering state if the `sys.flags.context_aware_warnings`
flag is set to true.  This makes using the context manager thread-safe in
multi-threaded programs.

Add the `sys.flags.thread_inherit_context` flag.  If true, starting a new
thread with `threading.Thread` will use a copy of the context
from the caller of `Thread.start()`.

Both these flags are set to true by default for the free-threaded build
and false for the default build.

Move the Python implementation of warnings.py into _py_warnings.py.

Make _contextvars a builtin module.

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2025-04-09 16:18:54 -07:00
Pablo Galindo Salgado
943cc1431e
gh-131591: Implement PEP 768 (#131937)
Co-authored-by: Ivona Stojanovic <stojanovic.i@hotmail.com>
Co-authored-by: Matt Wozniski <godlygeek@gmail.com>
2025-04-03 16:20:01 +01:00
mpage
053c285f6b
gh-130704: Strength reduce LOAD_FAST{_LOAD_FAST} (#130708)
Optimize `LOAD_FAST` opcodes into faster versions that load borrowed references onto the operand stack when we can prove that the lifetime of the local outlives the lifetime of the temporary that is loaded onto the stack.
2025-04-01 10:18:42 -07:00
Michael Droettboom
8614f86b71
gh-131525: Cache the result of tuple_hash (#131529)
* gh-131525: Cache the result of tuple_hash

* Fix debug builds

* Add blurb

* Fix formatting

* Pre-compute empty tuple singleton

* Mostly set the cache within tuple_alloc

* Fixes for TSAN

* Pre-compute empty tuple singleton

* Fix for 32-bit platforms

* Assert that op != NULL in _PyTuple_RESET_HASH_CACHE

* Use FT_ATOMIC_STORE_SSIZE_RELAXED macro

* Update Include/internal/pycore_tuple.h

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Fix alignment

* atomic load

* Update Objects/tupleobject.c

Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>

---------

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>
2025-03-27 09:57:06 -04:00
Peter Bierma
90b82f2b61
gh-129900: Fix SystemExit return codes when the REPL is started from the command line (#129901) 2025-03-25 19:48:46 +00:00
Serhiy Storchaka
0ef4ffeefd
gh-130163: Fix crashes related to PySys_GetObject() (GH-130503)
The use of PySys_GetObject() and _PySys_GetAttr(), which return a borrowed
reference, has been replaced by using one of the following functions, which
return a strong reference and distinguish a missing attribute from an error:
_PySys_GetOptionalAttr(), _PySys_GetOptionalAttrString(),
_PySys_GetRequiredAttr(), and _PySys_GetRequiredAttrString().
2025-02-25 23:04:27 +02:00
Mark Shannon
014223649c
GH-130396: Use computed stack limits on linux (GH-130398)
* Implement C recursion protection with limit pointers for Linux, MacOS and Windows

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-25 09:24:48 +00:00
Russell Keith-Magee
8a76eb8469
gh-130384: Skip a test_getallocatedblocks test pre-condition on iOS. (GH-130385) 2025-02-24 15:34:38 +00:00
Sam Gross
a6a8c6f86e
gh-128954: Reorder _PyInterpreterFrame fields for reduced memory usage (#128958)
This reduces the size of _PyInterpreterFrame by 8 bytes on 64-bit
platforms using the free threading build due to alignment requirements.

This allows for slightly more recursive calls into the interpreter (from
C), but `test_call.test_super_deep` still crashes.
2025-01-27 17:14:51 +00:00
mpage
2e95c5ba3b
gh-115999: Implement thread-local bytecode and enable specialization for BINARY_OP (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Sam Gross
ad6110a93f
gh-125842: Fix sys.exit(0xffff_ffff) on Windows (#125896)
On Windows, `long` is a signed 32-bit integer so it can't represent
`0xffff_ffff` without overflow. Windows exit codes are unsigned 32-bit
integers, so if a child process exits with `-1`, it will be represented
as `0xffff_ffff`.

Also fix a number of other possible cases where `_Py_HandleSystemExit`
could return with an exception set, leading to a `SystemError` (or
fatal error in debug builds) later on during shutdown.
2024-10-24 12:03:50 -04:00
Donghee Na
ad7c778546
gh-123990: Good bye WITH_FREELISTS macro (gh-124358) 2024-09-24 01:28:59 +00:00
neonene
646f16bdee
gh-124153: Implement PyType_GetBaseByToken() and Py_tp_token slot (GH-124163) 2024-09-18 09:18:19 +02:00
Sam Gross
dc09301067
gh-122417: Implement per-thread heap type refcounts (#122418)
The free-threaded build partially stores heap type reference counts in
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.

Co-authored-by: Ken Jin <kenjin@python.org>
2024-08-06 14:36:57 -04:00
Sam Gross
4b63cd170e
gh-122527: Fix a crash on deallocation of PyStructSequence (GH-122577)
The `PyStructSequence` destructor would crash if it was deallocated after
its type's dictionary was cleared by the GC, because it couldn't compute
the "real size" of the instance. This could occur with relatively
straightforward code in the free-threaded build or with a reference
cycle involving the type in the default build, due to differing orders
in which `tp_clear()` was called.

Account for the non-sequence fields in `tp_basicsize` and use that,
along with `Py_SIZE()`, to compute the "real" size of a
`PyStructSequence` in the dealloc function. This avoids the accesses to
the type's dictionary during dealloc, which were unsafe.
2024-08-02 18:11:44 +02:00
Mark Shannon
169324c27a
GH-120024: Use pointer for stack pointer (GH-121923) 2024-07-18 12:47:21 +01:00
Tian Gao
e65cb4c6f0
gh-118934: Make PyEval_GetLocals return borrowed reference (#119769)
Co-authored-by: Alyssa Coghlan <ncoghlan@gmail.com>
2024-07-16 12:17:47 -07:00
Petr Viktorin
6f1d448bc1
gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-21 17:19:31 +02:00
Alyssa Coghlan
3859e09e3d
gh-74929: PEP 667 C API documentation (gh-119379)
* Add docs for new APIs
* Add soft-deprecation notices
* Add What's New porting entries
* Update comments referencing `PyFrame_LocalsToFast()` to mention the proxy instead
* Other related cleanups found when looking for refs to the deprecated APIs
2024-06-01 13:59:35 +10:00
Jelle Zijlstra
e9875ecb5d
gh-119180: PEP 649: Add __annotate__ attributes (#119209) 2024-05-22 04:38:12 +02:00
Jeong, YunWon
8d8275b0cf
gh-118473: Fix set_asyncgen_hooks not to be partially set when arguments are invalid (#118474) 2024-05-06 17:02:52 -07:00
Tian Gao
b034f14a4b
gh-74929: Implement PEP 667 (GH-115153) 2024-05-04 12:12:10 +01:00
Brett Simmers
c2627d6eea
gh-116322: Add Py_mod_gil module slot (#116882)
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.

PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.

A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
2024-05-03 11:30:55 -04:00
Sam Gross
2dae505e87
gh-117514: Add sys._is_gil_enabled() function (#118514)
The function returns `True` or `False` depending on whether the GIL is
currently enabled. In the default build, it always returns `True`
because the GIL is always enabled.
2024-05-03 11:09:57 -04:00
Pablo Galindo Salgado
345e1e04ec
gh-112730: Make the test suite resilient to color-activation environment variables (#117672) 2024-04-24 21:25:22 +01:00