Add `gi_state`, `cr_state`, and `ag_state` attributes to generators,
coroutines, and async generators respectively. These attributes return the
current state as a string (e.g., `GEN_RUNNING`, `CORO_SUSPENDED`).
The `inspect.getgeneratorstate()`, `inspect.getcoroutinestate()`, and
`inspect.getasyncgenstate()` functions now return these attributes directly.
This is in preparation for making `gi_frame` thread-safe, which may involve
stop-the-world synchronization. The new state attributes avoid potential
performance cliffs in `inspect.getgeneratorstate()` and similar functions by
not requiring frame access.
Also removes unused `FRAME_COMPLETED` state and renumbers the frame state enum
to start at 0 instead of -1.
The pymalloc huge page support had two problems. First, on
architectures where the default huge page size exceeds the arena
size (e.g. 32 MiB on PPC, 512 MiB on ARM64 with 64 KB base
pages), mmap with MAP_HUGETLB silently allocates a full huge page
even when the requested size is smaller. The subsequent munmap
with the original arena size then fails with EINVAL, permanently
leaking the entire huge page. Second, huge pages were always
attempted when compiled in, with no way to disable them at
runtime. On Linux, if the huge page pool is exhausted, page
faults including copy-on-write faults after fork deliver SIGBUS
and kill the process.
The arena allocator now queries the system huge page size from
/proc/meminfo and skips MAP_HUGETLB when the arena size is not a
multiple of it. Huge pages also now require explicit opt-in at
runtime via the PYTHON_PYMALLOC_HUGEPAGES environment variable,
which is read through PyConfig and respects -E and -I flags.
The config field pymalloc_hugepages is propagated to the runtime
allocators struct so the low-level arena allocator can check it
without calling getenv directly.
Add a FRAME_SUSPENDED_YIELD_FROM_LOCKED state that acts as a brief
lock, preventing other threads from transitioning the frame state
while gen_getyieldfrom reads the yield-from object off the stack.
Now that the specializing interpreter works with free threading,
replace ENABLE_SPECIALIZATION_FT with ENABLE_SPECIALIZATION and
replace requires_specialization_ft with requires_specialization.
Also limit the uniquely referenced check to FOR_ITER_RANGE. It's not
necessary for FOR_ITER_GEN and would cause test_for_iter_gen to fail.
* Halve size of buffers by reusing combined trace + optimizer buffers for TOS caching
* Add simple buffer struct for more maintainable handling of buffers
* Decouple JIT structs from thread state struct
* Ensure terminator is added to trace, when optimizer gives up
If we are specializing to `LOAD_GLOBAL_MODULE` or `LOAD_ATTR_MODULE`, try
to enable deferred reference counting for the value, if the object is owned by
a different thread. This applies to the free-threaded build only and should
improve scaling of multi-threaded programs.
* Moves the `GET_ITER` instruction into the generator function preamble.
This means the the iterable is converted into an iterator during generator
creation, as documented, but keeps it in the same code object allowing
optimization.
* move JitOptContext to _PyThreadStateImpl
* make _PyUOpInstruction buffer a part of _PyThreadStateImpl
Co-authored-by: Kumar Aditya <kumaraditya@python.org>