Fix also error handling in _BINARY_OP_ADD_FLOAT,
_BINARY_OP_SUBTRACT_FLOAT and _BINARY_OP_MULTIPLY_FLOAT opcodes.
PyStackRef_FromPyObjectSteal() must not be called with a NULL
pointer.
* Convert DEOPT_IFs to EXIT_IFs for guards. Keep DEOPT_IF for intentional drops to the interpreter.
* Modify BINARY_OP_SUBSCR_LIST_INT and STORE_SUBSCR_LIST_INT to handle negative indices, to keep EXIT_IFs and DEOPT_IFs in different uops
Remove PyThread_type_lock (now uses PyMutex internally).
Add new benchmark options:
- work_inside/work_outside: control work inside and outside the critical section to vary contention levels
- num_locks: use multiple independent locks with threads assigned round-robin
- total_iters: fixed iteration count per thread instead of time-based, useful for measuring fairness
- num_acquisitions: lock acquisitions per loop iteration
- random_locks: acquire random lock each iteration
Also return elapsed time from benchmark_locks() and switch lockbench.py to use argparse.
When the interpreter is in a stop-the-world pause, critical sections
don't need to acquire locks since no other threads can be running.
This avoids a potential deadlock where lock fairness hands off ownership
to a thread that has already suspended for stop-the-world.
Add `_Py_type_getattro_stackref`, a variant of type attribute lookup
that returns `_PyStackRef` instead of `PyObject*`. This allows returning
deferred references in the free-threaded build, reducing reference count
contention when accessing type attributes.
This significantly improves scaling of namedtuple instantiation across
multiple threads.
* Add blurb
* Rename PyObject_GetAttrStackRef to _PyObject_GetAttrStackRef
* Apply suggestion from @vstinner
Co-authored-by: Victor Stinner <vstinner@python.org>
* Apply suggestion from @vstinner
Co-authored-by: Victor Stinner <vstinner@python.org>
* format
* Update Include/internal/pycore_function.h
Co-authored-by: Victor Stinner <vstinner@python.org>
---------
Co-authored-by: Victor Stinner <vstinner@python.org>
Now that the specializing interpreter works with free threading,
replace ENABLE_SPECIALIZATION_FT with ENABLE_SPECIALIZATION and
replace requires_specialization_ft with requires_specialization.
Also limit the uniquely referenced check to FOR_ITER_RANGE. It's not
necessary for FOR_ITER_GEN and would cause test_for_iter_gen to fail.
This adds a `_PyRecursiveMutex` type based on `PyMutex` and uses that
for the import lock. This fixes some data races in the free-threaded
build and generally simplifies the import lock code.
Use the new public Raw functions:
* _PyTime_PerfCounterUnchecked() with PyTime_PerfCounterRaw()
* _PyTime_TimeUnchecked() with PyTime_TimeRaw()
* _PyTime_MonotonicUnchecked() with PyTime_MonotonicRaw()
Remove internal functions:
* _PyTime_PerfCounterUnchecked()
* _PyTime_TimeUnchecked()
* _PyTime_MonotonicUnchecked()
Avoid detaching thread state when stopping the world. When re-attaching
the thread state, the thread would attempt to resume the top-most
critical section, which might now be held by a thread paused for our
stop-the-world request.