This roughly follows what was done for dictobject to make a lock-free
lookup operation. With this change, the set contains operation scales much
better when used from multiple-threads. The frozenset contains performance
seems unchanged (as already lock-free).
Summary of changes:
* refactor set_lookkey() into set_do_lookup() which now takes a function
pointer that does the entry comparison. This is similar to dictobject and
do_lookup(). In an optimized build, the comparison function is inlined and
there should be no performance cost to this.
* change set_do_lookup() to return a status separately from the entry value
* add set_compare_frozenset() and use if the object is a frozenset. For the
free-threaded build, this avoids some overhead (locking, atomic operations,
incref/decref on key)
* use FT_ATOMIC_* macros as needed for atomic loads and stores
* use a deferred free on the set table array, if shared (only on free-threaded
build, normal build always does an immediate free)
* for free-threaded build, use explicit for loop to zero the table, rather than memcpy()
* when mutating the set, assign so->table to NULL while the change is a
happening. Assign the real table array after the change is done.
Makes the zlib module thread-safe free-threading build. Even though operations
are protected by locks, attributes exposed via PyMemberDef (eof, needs_input,
unused_data, unconsumed_tail) should still be stored atomically within locked
sections, since they can be read without acquiring the lock.
Added atomic operations to `scanner_begin()` and `scanner_end()` to prevent
race conditions on the `executing` flag in free-threaded builds. Also added
tests for concurrent usage of the `re` module.
Without the atomic operations, `test_scanner_concurrent_access()` triggers
`assert(self->executing)` failures, or a thread sanitizer run emits errors.
Most of the `self.assertTrue(self.called)` checks are flaky because
the worker threads may sometimes finish before the main thread calls
`self.during_threads()`.
If we overflowed the global version counter (i.e., after 2*24 calls to
`_PyMonitoring_SetEvents`), we bailed out after setting global monitoring
events but before instrumenting code objects, which led to assertion errors
later on.
Also add a `time.sleep()` to `test_free_threading.test_monitoring` to avoid
overflowing the global version counter.
Added a critical section to protect the states of `ReaderObj` and `WriterObj` in the free-threading build. Without the critical sections, both new free-threading tests were crashing.
The methods are already wrapped with a lock, which makes them thread-safe in
free-threaded build. This replaces `PyThread_acquire_lock` with `PyMutex` and
removes some macros and allocation handling code.
Also add a test for free-threading to ensure we aren't getting data races and
that the locking is working.
There were a few thread-safety issues when profiling or tracing all
threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads:
* The loop over thread states could crash if a thread exits concurrently
(in both the free threading and default build)
* The modification of `c_profilefunc` and `c_tracefunc` wasn't
thread-safe on the free threading build.
The `PyEval_SetProfileAllThreads` function and other related functions
had a race condition on `tstate->c_profilefunc` that could lead to a
crash when disable profiling or tracing on all threads while another
thread is starting to profile or trace a a call.
There are still potential crashes when threads exit concurrently with
profiling or tracing be enabled/disabled across all threads.
Make the setlogmask() function in the syslog module thread-safe. These changes are relevant for scenarios where the GIL is disabled or when using subinterpreters.
Make the pwd module functions getpwuid(), getpwnam(), and getpwall() thread-safe. These changes apply to scenarios where the GIL is disabled or in subinterpreter use cases.
Make grp module methods getgrgid() and getgrnam() thread-safe when the GIL is disabled and getgrgid_r()/getgrnam_r() C APIs are not available.
---------
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Previously, we assumed that instrumentation would happen for all copies of
the bytecode if the instrumentation version on the code object didn't match
the per-interpreter instrumentation version. That assumption was incorrect:
instrumentation will exit early if there are no new "events," even if there
is an instrumentation version mismatch.
To fix this, include the instrumented opcodes when creating new copies of
the bytecode, rather than replacing them with their uninstrumented variants.
I don't think we have to worry about races between instrumentation and creating
new copies of the bytecode: instrumentation and new bytecode creation cannot happen
concurrently. Instrumentation requires that either the world is stopped or the
code object's per-object lock is held and new bytecode creation requires holding
the code object's per-object lock.
Use critical sections to make heapq methods that update the heap thread-safe when the GIL is disabled.
---------
Co-authored-by: mpage <mpage@meta.com>
Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict.
Make `warnings.catch_warnings()` use a context variable for holding
the warning filtering state if the `sys.flags.context_aware_warnings`
flag is set to true. This makes using the context manager thread-safe in
multi-threaded programs.
Add the `sys.flags.thread_inherit_context` flag. If true, starting a new
thread with `threading.Thread` will use a copy of the context
from the caller of `Thread.start()`.
Both these flags are set to true by default for the free-threaded build
and false for the default build.
Move the Python implementation of warnings.py into _py_warnings.py.
Make _contextvars a builtin module.
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Concurrent accesses from multiple threads to the same `cell` object did not
scale well in the free-threaded build. Use `_PyStackRef` and optimistically
avoid locking to improve scaling.
With the locks around cell reads gone, some of the free threading tests were
prone to starvation: the readers were able to run in a tight loop and the
writer threads weren't scheduled frequently enough to make timely progress.
Adjust the tests to avoid this.
Co-authored-by: Donghee Na <donghee.na@python.org>
Make tuple iteration more thread-safe, and actually test concurrent iteration of tuple, range and list. (This is prep work for enabling specialization of FOR_ITER in free-threaded builds.) The basic premise is:
Iterating over a shared iterable (list, tuple or range) should be safe, not involve data races, and behave like iteration normally does.
Using a shared iterator should not crash or involve data races, and should only produce items regular iteration would produce. It is not guaranteed to produce all items, or produce each item only once. (This is not the case for range iteration even after this PR.)
Providing stronger guarantees is possible for some of these iterators, but it's not always straight-forward and can significantly hamper the common case. Since iterators in general aren't shared between threads, and it's simply impossible to concurrently use many iterators (like generators), better to make sharing iterators without explicit synchronization clearly wrong.
Specific issues fixed in order to make the tests pass:
- List iteration could occasionally fail an assertion when a shared list was shrunk and an item past the new end was retrieved concurrently. There's still some unsafety when deleting/inserting multiple items through for example slice assignment, which uses memmove/memcpy.
- Tuple iteration could occasionally crash when the iterator's reference to the tuple was cleared on exhaustion. Like with list iteration, in free-threaded builds we can't safely and efficiently clear the iterator's reference to the iterable (doing it safely would mean extra, slow refcount operations), so just keep the iterable reference around.
The call to `PySequence_List()` could temporarily unlock and relock the
set, allowing the items to be cleared and return the incorrect
notation `{}` for a empty set (it should be `set()`).
Co-authored-by: T. Wouters <thomas@python.org>
The `dict.get` implementation uses `_Py_dict_lookup_threadsafe`, which is
thread-safe, so we remove the critical section from the argument clinic.
Add a test for concurrent dict get and set operations.