gh-146452: Fix segfault in pickle on concurrent mutation of a dict (GH-146470)
(cherry picked from commit e62a61177f)
Co-authored-by: Farhan Saif <fsaif@uic.edu>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Reduce NUMITEMS from 100000 to 5000. Peak RSS for the full
test_free_threading suite drops from ~850 MB to ~175 MB.
(cherry picked from commit 61f12211fc)
Co-authored-by: Sam Gross <colesbury@gmail.com>
In free-threaded builds, concurrent calls to PyDict_AddWatcher, PyDict_ClearWatcher, PyDict_Watch, and PyDict_Unwatch can race on the shared callback array and the per-dict watcher tags. This change adds a mutex to serialize watcher registration and removal, atomic operations for tag updates, and atomic acquire/release synchronization for callback dispatch in _PyDict_SendEvent.
(cherry picked from commit 8a4895985f)
Co-authored-by: Alper <alperyoney@fb.com>
Avoid racing with the owning thread's refcount operations when
immortalizing an interned string: if we don't own it and its refcount
isn't merged, intern a copy we own instead. Use atomic stores in
_Py_SetImmortalUntracked so concurrent atomic reads are race-free.
Fix thread-safety issues when accessing frame attributes while another
thread is executing the frame:
- Add critical section to frame_repr() to prevent races when accessing
  the frame's code object and line number.
- Add _Py_NO_SANITIZE_THREAD to PyUnstable_InterpreterFrame_GetLasti()
  to allow intentional racy reads of instr_ptr.
- Fix take_ownership() to not write to the original frame's f_executable.
Add a FRAME_SUSPENDED_YIELD_FROM_LOCKED state that acts as a brief
lock, preventing other threads from transitioning the frame state
while gen_getyieldfrom reads the yield-from object off the stack.
In `_PyDict_GetMethodStackRef`, only use the fast-path unicode lookup
when the dict is owned by the current thread or already marked as shared.
This prevents a race between the lookup and concurrent dict resizes,
which may free the PyDictKeysObject (i.e., it ensures that the resize
uses QSBR).
Address a similar issue in `_Py_dict_lookup_threadsafe_stackref` by
calling `ensure_shared_on_read()`.
This makes generator frame state transitions atomic in the free
threading build, which avoids segfaults when trying to execute
a generator from multiple threads concurrently.
There are still a few operations that aren't thread-safe and may crash
if performed concurrently on the same generator/coroutine:
* Accessing gi_yieldfrom/cr_await/ag_await
* Accessing gi_frame/cr_frame/ag_frame
* Async generator operations
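As a rough illustration of the scenario above (not the test added by this change), the following sketch drains one generator from several threads. The helper name `consume_shared_generator` and its parameters are made up for the example. It assumes that a thread which loses the race sees `ValueError` ("generator already executing") and can simply retry.

```python
import threading


def consume_shared_generator(n_items=1000, n_threads=4):
    """Drain a single generator from several threads; return sorted values."""
    gen = (i for i in range(n_items))
    seen = []
    seen_lock = threading.Lock()

    def worker():
        while True:
            try:
                value = next(gen)
            except ValueError:
                # Another thread is currently executing the generator
                # frame, so try again.
                continue
            except StopIteration:
                return
            with seen_lock:
                seen.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)
```

With atomic state transitions, each value should be delivered to exactly one thread, so the sorted result is expected to equal `list(range(n_items))`.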
Make the attributes in _bz2 module thread-safe on the free-threading build.
Attributes (eof, needs_input, unused_data) are now stored atomically or
accessed via mutex-protected getters.
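A minimal sketch of the access pattern this is meant to make safe: one thread feeds a `bz2.BZ2Decompressor` while another only reads its attributes. The function name, payload, and chunk size are illustrative assumptions, not part of the change.

```python
import bz2
import threading


def decompress_while_polling(payload=b"free-threading " * 4096, chunk_size=8):
    """Decompress in the calling thread while a helper reads attributes."""
    compressed = bz2.compress(payload)
    decomp = bz2.BZ2Decompressor()
    stop = threading.Event()

    def poll():
        # Reads only; these are the attributes named in the message above.
        while not stop.is_set():
            _ = (decomp.eof, decomp.needs_input, decomp.unused_data)

    poller = threading.Thread(target=poll)
    poller.start()
    try:
        out = []
        for start in range(0, len(compressed), chunk_size):
            if decomp.eof:
                break
            out.append(decomp.decompress(compressed[start:start + chunk_size]))
    finally:
        stop.set()
        poller.join()
    return b"".join(out), decomp.eof, decomp.unused_data
```

Only one thread calls `decompress()` here; the polling thread never mutates the object.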
This roughly follows what was done for dictobject to make the lookup
operation lock-free. With this change, the set contains operation scales much
better when used from multiple threads. The frozenset contains performance
seems unchanged (it was already lock-free).
Summary of changes:
* refactor set_lookkey() into set_do_lookup() which now takes a function
pointer that does the entry comparison. This is similar to dictobject and
do_lookup(). In an optimized build, the comparison function is inlined and
there should be no performance cost to this.
* change set_do_lookup() to return a status separately from the entry value
* add set_compare_frozenset() and use it if the object is a frozenset. For the
free-threaded build, this avoids some overhead (locking, atomic operations,
incref/decref on key)
* use FT_ATOMIC_* macros as needed for atomic loads and stores
* use a deferred free on the set table array, if shared (only on free-threaded
build, normal build always does an immediate free)
* for free-threaded build, use explicit for loop to zero the table, rather than memset()
* when mutating the set, assign so->table to NULL while the change is
happening. Assign the real table array after the change is done.
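The kind of workload that benefits can be sketched as follows: reader threads run membership tests while a writer grows and shrinks the same set, forcing table resizes. This is an illustrative sketch under assumed names and sizes (`probe_set_during_mutation`, `n_stable`, `n_rounds`), not the benchmark or test from this change.

```python
import threading


def probe_set_during_mutation(n_stable=1000, n_rounds=200, n_readers=3):
    """Return keys that were wrongly reported missing (expected: none)."""
    stable = frozenset(range(n_stable))   # keys that are never removed
    shared = set(stable)
    misses = []
    done = threading.Event()

    def writer():
        extra = range(n_stable, n_stable + 500)
        for _ in range(n_rounds):
            shared.update(extra)              # may grow the table
            shared.difference_update(extra)   # removes only the extra keys
        done.set()

    def reader():
        while True:
            for key in stable:
                if key not in shared:
                    misses.append(key)
            if done.is_set():
                return

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return misses
```

Keys in `stable` are never removed, so a correct lookup should always find them regardless of concurrent resizes.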
Make the zlib module thread-safe on the free-threading build. Even though operations
are protected by locks, attributes exposed via PyMemberDef (eof, needs_input,
unused_data, unconsumed_tail) should still be stored atomically within locked
sections, since they can be read without acquiring the lock.
Added atomic operations to `scanner_begin()` and `scanner_end()` to prevent
race conditions on the `executing` flag in free-threaded builds. Also added
tests for concurrent usage of the `re` module.
Without the atomic operations, `test_scanner_concurrent_access()` triggers
`assert(self->executing)` failures, or a thread sanitizer run emits errors.
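For illustration only, concurrent scanner use might look like the sketch below. It relies on `Pattern.scanner()`, which is an undocumented CPython method, and it assumes that a thread which finds the scanner busy gets a `ValueError` and may retry. The function name `scan_from_threads` is hypothetical, and this is not the test added here.

```python
import re
import threading


def scan_from_threads(n_tokens=500, n_threads=4):
    """Share one scanner between threads; return the sorted integers found."""
    text = " ".join(str(i) for i in range(n_tokens))
    scanner = re.compile(r"\d+").scanner(text)  # undocumented CPython API
    found = []
    found_lock = threading.Lock()

    def worker():
        while True:
            try:
                match = scanner.search()
            except ValueError:
                # Assumed: raised when the scanner is already executing.
                continue
            if match is None:
                return
            with found_lock:
                found.append(int(match.group()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(found)
```

Each token should be returned to exactly one thread, so the result is expected to equal `list(range(n_tokens))`.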
Most of the `self.assertTrue(self.called)` checks are flaky because
the worker threads may sometimes finish before the main thread calls
`self.during_threads()`.
If we overflowed the global version counter (i.e., after 2**24 calls to
`_PyMonitoring_SetEvents`), we bailed out after setting global monitoring
events but before instrumenting code objects, which led to assertion errors
later on.
Also add a `time.sleep()` to `test_free_threading.test_monitoring` to avoid
overflowing the global version counter.
Added a critical section to protect the states of `ReaderObj` and `WriterObj` in the free-threading build. Without the critical sections, both new free-threading tests were crashing.
The methods are already wrapped with a lock, which makes them thread-safe in
the free-threaded build. This replaces `PyThread_acquire_lock` with `PyMutex`
and removes some macros and allocation handling code.
Also add a test for free-threading to ensure we aren't getting data races and
that the locking is working.
There were a few thread-safety issues when profiling or tracing all
threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads:
* The loop over thread states could crash if a thread exits concurrently
(in both the free threading and default build)
* The modification of `c_profilefunc` and `c_tracefunc` wasn't
thread-safe on the free threading build.
The `PyEval_SetProfileAllThreads` function and other related functions
had a race condition on `tstate->c_profilefunc` that could lead to a
crash when disabling profiling or tracing on all threads while another
thread is starting to profile or trace a call.
There are still potential crashes when threads exit concurrently with
profiling or tracing being enabled/disabled across all threads.
Make the setlogmask() function in the syslog module thread-safe. These changes are relevant for scenarios where the GIL is disabled or when using subinterpreters.
Make the pwd module functions getpwuid(), getpwnam(), and getpwall() thread-safe. These changes apply to scenarios where the GIL is disabled or in subinterpreter use cases.
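A small sketch of the usage this targets: several threads look up the same user concurrently and should all see the same entry. It assumes a Unix platform where `pwd` and `os.getuid()` exist. The helper name `lookup_from_threads` is illustrative, and a missing passwd entry is recorded as `None`.

```python
import os
import pwd
import threading


def lookup_from_threads(n_threads=8, n_calls=50):
    """Call pwd.getpwuid() from several threads and collect the results."""
    uid = os.getuid()
    results = []
    results_lock = threading.Lock()

    def worker():
        for _ in range(n_calls):
            try:
                entry = pwd.getpwuid(uid)
            except KeyError:
                entry = None  # no passwd entry for this uid
            with results_lock:
                results.append(entry)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the lookups are properly serialized, every collected entry should compare equal to the first one.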
Make grp module methods getgrgid() and getgrnam() thread-safe when the GIL is disabled and getgrgid_r()/getgrnam_r() C APIs are not available.
---------
Co-authored-by: Kumar Aditya <kumaraditya@python.org>