Remove spurious Py_DECREF on borrowed ref in LOAD_GLOBAL specialization
_PyDict_LookupIndexAndValue() returns a borrowed reference via
_Py_dict_lookup(), but specialize_load_global_lock_held() called
Py_DECREF(value) on it when bailing out for lazy imports. Each time
the adaptive counter fired while a lazy import was still in globals,
this released a reference the caller never owned, stealing it from the
object stored in the dict. With 8+ threads
racing through LOAD_GLOBAL during concurrent lazy import resolution,
enough triggers accumulated to drive the refcount to zero while the
dict and other threads still referenced the object, causing
use-after-free.
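
The failure mode can be illustrated with a toy refcount model. This is a sketch in Python, not the CPython code: ToyObject, lookup_borrowed, bail_out_buggy, bail_out_fixed and run are hypothetical names, and the starting count of two references is an assumption chosen for the example.

```python
# Toy model of reference counting. All names here are hypothetical and
# only illustrate the borrowed-reference rule; this is not CPython code.

class ToyObject:
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # stands in for deallocation


def lookup_borrowed(globals_dict, key):
    """Return the stored object without touching its refcount (borrowed)."""
    return globals_dict[key]


def bail_out_buggy(globals_dict, key):
    value = lookup_borrowed(globals_dict, key)
    value.decref()  # wrong: releases a reference this code never owned


def bail_out_fixed(globals_dict, key):
    lookup_borrowed(globals_dict, key)  # borrowed: nothing to release


def run(bail_out, triggers):
    obj = ToyObject("lazy_import")
    obj.incref()  # reference owned by the dict
    obj.incref()  # reference held elsewhere (assumed: another thread)
    globals_dict = {"mod": obj}
    for _ in range(triggers):
        if obj.freed:
            break
        bail_out(globals_dict, "mod")
    return obj


buggy = run(bail_out_buggy, triggers=2)
fixed = run(bail_out_fixed, triggers=2)
print(buggy.refcount, buggy.freed)
print(fixed.refcount, fixed.freed)
```

In the buggy variant the object is marked freed while the dict still holds it, which is the use-after-free condition described above. The fixed variant leaves the count untouched.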
Now that the specializing interpreter works with free threading,
replace ENABLE_SPECIALIZATION_FT with ENABLE_SPECIALIZATION and
replace requires_specialization_ft with requires_specialization.
Also limit the uniquely referenced check to FOR_ITER_RANGE. It's not
necessary for FOR_ITER_GEN and would cause test_for_iter_gen to fail.
This reverts gh-144055 and fixes the bug in a different way. Deferred
reference counting relies on the object being tracked by the GC,
otherwise the object will live until interpreter shutdown. So, take
care that we do not enable deferred reference counting for objects that
are untracked. Also, if a tuple has deferred reference counting
enabled, don't untrack it.
If we are specializing to `LOAD_GLOBAL_MODULE` or `LOAD_ATTR_MODULE`, try
to enable deferred reference counting for the value, if the object is owned by
a different thread. This applies to the free-threaded build only and should
improve scaling of multi-threaded programs.
Allow the --enable-pystats build option to be used with free-threading. The
stats are now stored on a per-interpreter basis, rather than process global.
For free-threaded builds, the stats structure is allocated per-thread and
then periodically merged into the per-interpreter stats structure (on thread
exit or when the reporting function is called). Most of the pystats related
code has been moved into the file Python/pystats.c.
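
The per-thread accumulate-then-merge pattern can be sketched as follows. This is a minimal Python model under stated assumptions, not the code in Python/pystats.c: InterpreterStats, record, merge_current_thread and report are hypothetical names, and in this sketch report() merges only the calling thread's counters.

```python
import threading
from collections import Counter


class InterpreterStats:
    """Hypothetical model: per-thread counters merged into shared totals."""

    def __init__(self):
        self._lock = threading.Lock()
        self._merged = Counter()          # per-interpreter totals
        self._local = threading.local()   # per-thread counters

    def _thread_counter(self):
        counter = getattr(self._local, "counter", None)
        if counter is None:
            counter = Counter()
            self._local.counter = counter
        return counter

    def record(self, event, n=1):
        # Hot path: touches only thread-local state, so no lock is needed.
        self._thread_counter()[event] += n

    def merge_current_thread(self):
        # Intended to be called on thread exit or before reporting.
        counter = self._thread_counter()
        with self._lock:
            self._merged.update(counter)  # Counter.update adds counts
        counter.clear()

    def report(self):
        self.merge_current_thread()
        with self._lock:
            return dict(self._merged)


stats = InterpreterStats()


def worker():
    for _ in range(1000):
        stats.record("specialization_hit")
    stats.merge_current_thread()  # merge on thread exit


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

stats.record("specialization_hit")  # one event from the main thread
totals = stats.report()
print(totals)
```

Keeping the hot path lock-free and taking the lock only at merge time is the point of the design: threads never contend while recording.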
* FOR_ITER now either pushes the iterator and NULL, or leaves the iterable in place and pushes a tagged zero
* NEXT_ITER uses the tagged int as the index into the sequence or, if TOS is NULL, iterates as before.
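
The two stack layouts described in this list can be modelled in Python. This is a sketch only: for_iter_setup, next_iter and run_loop are hypothetical helpers, None stands in for NULL, a plain int stands in for the tagged int, and the choice of list and tuple as the indexed types is an assumption.

```python
_EXHAUSTED = object()  # sentinel meaning "no more items"


def for_iter_setup(iterable):
    """Return the two stack slots: (object, index-or-None)."""
    if isinstance(iterable, (list, tuple)):
        # Leave the iterable itself in place and start at index zero.
        return iterable, 0
    # Otherwise use a real iterator; None models the NULL slot.
    return iter(iterable), None


def next_iter(slot, index):
    """Return (value, new_index); value is _EXHAUSTED when the loop ends."""
    if index is None:
        # NULL on top of the stack: iterate as before.
        return next(slot, _EXHAUSTED), None
    if index < len(slot):
        # Use the int as an index into the sequence.
        return slot[index], index + 1
    return _EXHAUSTED, index


def run_loop(iterable):
    slot, index = for_iter_setup(iterable)
    out = []
    while True:
        value, index = next_iter(slot, index)
        if value is _EXHAUSTED:
            break
        out.append(value)
    return out


print(run_loop([1, 2, 3]))
print(run_loop(x * 2 for x in range(3)))
```

The indexed path never allocates an iterator object, which is the saving this layout is aiming for.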
Mark a few functions used by the interpreter loop as noinline
These are all the slow path and should not be inlined into the interpreter
loop. Unfortunately, they end up being inlined with LTO and the current PGO
task.
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
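
The supported pattern, sharing the container but not the iterator, looks like this in Python. This is an illustrative sketch: the names shared, results and worker, along with the thread count, are made up for the example.

```python
import threading

shared = list(range(1000))   # shared between threads, never mutated here
results = [None] * 4


def worker(slot):
    total = 0
    # Each for loop creates its own iterator over the shared list, so no
    # iterator object is ever used by more than one thread.
    for item in shared:
        total += item
    results[slot] = total


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Passing a single `iter(shared)` object to several threads would be the unsupported case: the iterator's position is then mutated concurrently.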
Add free-threaded specialization for COMPARE_OP, and tests for COMPARE_OP specialization in general.
Co-authored-by: Donghee Na <donghee.na92@gmail.com>