* Add FOR_ITER_VIRTUAL to specialize FOR_ITER for virtual iterators
* Add GET_ITER_SELF to specialize GET_ITER for iterators (including generators)
* Add GET_ITER_VIRTUAL to specialize GET_ITER for iterables as virtual iterators
* Add new (internal) _tp_iteritem function slot to PyTypeObject
* Put limited RESUME at start of genexpr for free-threading. Fix up exception handling in genexpr
The watcher-bits read in _PyDict_NotifyEvent needs to use acquire to
synchronize with the release from PyDict_Watch so that the callback
publication is visible before the callback is invoked.
## Summary
- Move the `runtime->initialized = 1` store from before `site.py` import to the end of `init_interp_main()`, so `Py_IsInitialized()` only returns true after initialization has fully completed
- Access `initialized` and `core_initialized` through new inline accessors using acquire/release atomics, to also protect from data race undefined behavior
- `PySys_AddAuditHook()` now uses the accessor, so with the flag move it correctly skips audit hook invocation during all init phases (matching the documented "after runtime initialization" behavior) ... We could argue that running these earlier would be good even if the intent was never explicitly expressed, but that'd be its own issue.
## Motivation
`Py_IsInitialized()` returned 1 while `Py_InitializeEx()` was still running — specifically, before `site.py` had been imported. See https://github.com/PyO3/pyo3/issues/5900 where a second thread could acquire the GIL and start executing Python with an incomplete `sys.path` because `site.py` hadn't finished.
The flag was also a plain `int` with no atomic operations, making concurrent reads a C-standard data race, though unlikely to manifest.
## Regression test:
The added test properly fails on `main` with `ERROR: Py_IsInitialized() was true during site import`.
---
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There are newly documented restrictions on tp_traverse:
The traversal function must not have any side effects.
It must not modify the reference counts of any Python
objects nor create or destroy any Python objects.
* Add several functions that are guaranteed side-effect-free,
with a _DuringGC suffix.
* Use these in ctypes
* Consolidate tp_traverse docs in gcsupport.rst, moving unique
content from typeobj.rst there
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
When compiling for abi3t, define Py_GIL_DISABLED, so that users who
check it to enable additional locking aren't broken.
But also avoid using Py_GIL_DISABLED in Python headers themselves
-- abi3 and abi3t ought to be the same except
the _Py_OPAQUE_PYOBJECT differences.
A check for this is coming in a later PR.
It will require rewriting some preprocessor conditions, some of these
changes are included in this PR.
For _Py_IsOwnedByCurrentThread & supporting functions
I opted to move them to a cpython/ header, as they're rather self-contained.
Vladimir's original overviews, from 1998, are still good, but going
on 30 years later details have changed. Note that, but rather try
to keep up with moving targets in a different file, point to
sys._debugmallocstats() as the sure way to discover precise current
details.
No code changes, just added a block comment.
Co-authored-by: Bartosz Sławecki <bartosz@ilikepython.com>
Store references to pickle.dumps and pickle.loads in _PyXI_state_t
so they are looked up only once per interpreter lifetime, avoiding
repeated PyImport_ImportModuleAttrString calls on every cross-interpreter
data transfer via pickle fallback.
Benchmarks show 1.7x-3.3x speedup for InterpreterPoolExecutor
when transferring mutable types (list, dict) through XIData.
_PyFrame_Copy() copied interpreter frames into generator and
frame-object storage without initializing the visited byte. Incremental
GC later reads frame->visited in mark_stacks() on non-start passes, so
copied frames could expose an uninitialized value once they became live
on a thread stack again.
Reset visited when copying a frame so copied frames start with defined
GC bookkeeping state. Preserve lltrace in Py_DEBUG builds.
Fix huge page leak in datastack chunk allocator
The original fix rounded datastack chunk allocations in pystate.c so that
_PyObject_VirtualFree() would receive the full huge page mapping size.
Change direction and move that logic into _PyObject_VirtualAlloc() and
_PyObject_VirtualFree() instead. The key invariant is that munmap() must see
the full mapped size, so alloc and free now apply the same platform-specific
rounding in the allocator layer.
This keeps _PyStackChunk bookkeeping in requested-size units, avoids a
hardcoded 2 MB assumption, and also covers other small virtual-memory users
such as the JIT tracer state allocation in optimizer.c.
Add the padded parameter in functions related to Base32 and Base64 codecs
in the binascii and base64 modules. In the encoding functions it controls
whether the pad character can be added in the output, in the decoding
functions it controls whether padding is required in input.
Padding of input no longer required in base64.urlsafe_b64decode() by default.
The gc_stats struct contains ring buffers of gc_generation_stats
entries (11 young + 3×2 old on default builds). Embedding it inline
in _gc_runtime_state, which is itself inline in PyInterpreterState,
pushed fields like _gil.locked and threads.head to offsets beyond
what out-of-process profilers and debuggers can reasonably read in
a single buffer (e.g. offset 9384 for _gil.locked vs an 8 KiB read
buffer).
Heap-allocate generation_stats via PyMem_RawCalloc in _PyGC_Init and
free it in _PyGC_Fini. This shrinks PyInterpreterState by ~1.6 KiB
and keeps the GIL, thread-list, and other frequently-inspected fields
at stable, low offsets.