In free-threaded builds, concurrent calls to PyDict_AddWatcher, PyDict_ClearWatcher, PyDict_Watch, and PyDict_Unwatch can race on the shared callback array and the per-dict watcher tags. This change adds a mutex to serialize watcher registration and removal, atomic operations for tag updates, and atomic acquire/release synchronization for callback dispatch in _PyDict_SendEvent.
* SEND specialization. Adds 2 new specialized instructions:
* SEND_VIRTUAL: for sends to virtual iterators e.g lists and tuples
* SEND_ASYNC_GEN: for sends to async generators
Tweak FOR_ITER_VIRTUAL so that SEND_VIRTUAL and FOR_ITER_VIRTUAL use equivalent guards
The function__entry and function__return probes stopped working in Python 3.11
when the interpreter was restructured around the new bytecode system. This change
restores these probes by adding DTRACE_FUNCTION_ENTRY() at the start_frame label
in bytecodes.c and DTRACE_FUNCTION_RETURN() in the RETURN_VALUE and YIELD_VALUE
instructions. The helper functions are defined in ceval.c and extract the
filename, function name, and line number from the frame before firing the probe.
This builds on the approach from https://github.com/python/cpython/pull/125019
but avoids modifying the JIT template since the JIT does not currently support
DTrace. The macros are conditionally compiled with WITH_DTRACE and are no-ops
otherwise. The tests have been updated to use modern opcode names (CALL, CALL_KW,
CALL_FUNCTION_EX) and a new bpftrace backend was added for Linux CI alongside
the existing SystemTap tests. Line probe tests were removed since that probe
was never restored after 3.11.
The replaces the incremental GC with a forward port (from 3.13) of the generational GC.
Co-Authored-By: Neil Schemenauer <nas@arctrix.com>
Co-Authored-By: Zanie Blue <contact@zanie.dev>
Co-Authored-By: Sergey Miryanov <sergey.miryanov@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Add a keyword-only `max_threads` argument to `dump_traceback()` and
`dump_traceback_later()`, defaulting to 100 to preserve existing
behavior. Allows server processes with many worker threads to dump
beyond the historical 100-thread cap (previously a hardcoded
`MAX_NTHREADS = 100` in `Python/traceback.c`).
The cap matters in practice: tstates are prepended to the
PyInterpreterState linked list, so the dump walks newest-first. With
more than 100 threads alive, the main thread (oldest, at the tail) is
silently elided from watchdog dumps -- exactly the thread that's
usually wanted.
The hardcoded value is moved to a new internal macro
`_Py_TRACEBACK_MAX_NTHREADS` in `pycore_traceback.h` so the in-tree
fatal-signal callers all reference one source of truth.
* Records the same objects for each member of family before execution
* Records derived values when recording the trace
* This makes sure that specialization, or deoptimization, does not cause invalid values to be recorded
The add_const() function in flowgraph.c uses a linear search over the
consts list to find the index of a constant. After gh-126835 moved
constant folding from the AST optimizer to the CFG optimizer, this
function is now called N times for N inner tuple elements during
fold_tuple_of_constants(), resulting in O(N²) total time.
Fix by maintaining an auxiliary _Py_hashtable_t that maps object
pointers to their indices in the consts list, providing O(1) lookup.
For a file with 100,000 constant 2-tuples:
- Before: 10.38s (add_const occupies 83.76% of CPU time)
- After: 1.48s