Setting the size to 0 turns the list contents into overallocated memory that the deallocator will free.
Ownership is transferred to the new tuple so no refcount adjustment is needed.
The use of memmove and _Py_memory_repeat were not thread-safe in the
free threading build in some cases. In theory, memmove and
_Py_memory_repeat can copy byte-by-byte instead of pointer-by-pointer,
so concurrent readers could see uninitialized data or tearing.
Additionally, we should be using "release" (or stronger) ordering to be
compliant with the C11 memory model when copying objects within a list.
New scheme from Stefan Pochmann for picking minimum run lengths.
By allowing them to change a little from one run to the next, it's possible to
arrange for that all merges, at all levels, strongly tend to be as evenly balanced
as possible, for randomly ordered data. Meaning the number of initial runs is a
power of 2, and all merges involve runs whose lengths differ by no more than 1.
The free threading build uses QSBR to delay the freeing of dictionary
keys and list arrays when the objects are accessed by multiple threads
in order to allow concurrent reads to proceed with holding the object
lock. The requests are processed in batches to reduce execution
overhead, but for large memory blocks this can lead to excess memory
usage.
Take into account the size of the memory block when deciding when to
process QSBR requests.
Also track the amount of memory being held by QSBR for mimalloc pages. Advance the write sequence if this memory exceeds a limit. Advancing the sequence will allow it to be freed more quickly.
Process the held QSBR items from the "eval breaker", rather than from `_PyMem_FreeDelayed()`. This gives a higher chance that the global read sequence has advanced enough so that items can be freed.
Co-authored-by: Sam Gross <colesbury@gmail.com>
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
* Fix use after free in list objects
Set the items pointer in the list object to NULL after the items array
is freed during list deallocation. Otherwise, we can end up with a list
object added to the free list that contains a pointer to an already-freed
items array.
* Mark `_PyList_FromStackRefStealOnSuccess` as escaping
I think technically it's not escaping, because the only object that
can be decrefed if allocation fails is an exact list, which cannot
execute arbitrary code when it is destroyed. However, this seems less
intrusive than trying to special cases objects in the assert in `_Py_Dealloc`
that checks for non-null stackpointers and shouldn't matter for performance.
Make tuple iteration more thread-safe, and actually test concurrent iteration of tuple, range and list. (This is prep work for enabling specialization of FOR_ITER in free-threaded builds.) The basic premise is:
Iterating over a shared iterable (list, tuple or range) should be safe, not involve data races, and behave like iteration normally does.
Using a shared iterator should not crash or involve data races, and should only produce items regular iteration would produce. It is not guaranteed to produce all items, or produce each item only once. (This is not the case for range iteration even after this PR.)
Providing stronger guarantees is possible for some of these iterators, but it's not always straight-forward and can significantly hamper the common case. Since iterators in general aren't shared between threads, and it's simply impossible to concurrently use many iterators (like generators), better to make sharing iterators without explicit synchronization clearly wrong.
Specific issues fixed in order to make the tests pass:
- List iteration could occasionally fail an assertion when a shared list was shrunk and an item past the new end was retrieved concurrently. There's still some unsafety when deleting/inserting multiple items through for example slice assignment, which uses memmove/memcpy.
- Tuple iteration could occasionally crash when the iterator's reference to the tuple was cleared on exhaustion. Like with list iteration, in free-threaded builds we can't safely and efficiently clear the iterator's reference to the iterable (doing it safely would mean extra, slow refcount operations), so just keep the iterable reference around.
In the free threading build, if a non-owning thread resizes a list,
it must use QSBR to free the old list array because there may be a
concurrent access (without a lock) from the owning thread.
To match the pattern in dictobject.c, we just mark the list as "shared"
before resizing if it's from a non-owning thread and not already marked
as shared.
Replace the private _PyUnicodeWriter with the public PyUnicodeWriter.
Replace PyObject_Repr() + _PyUnicodeWriter_WriteStr()
with PyUnicodeWriter_WriteRepr().
This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of memory block.
If configured with freelists disabled, these operations are essentially
no-ops.
Make error message for index() methods consistent
Remove the repr of the searched value (which can be arbitrary large)
from ValueError messages for list.index(), range.index(), deque.index(),
deque.remove() and ShareableList.index(). Make the error messages
consistent with error messages for other index() and remove()
methods.
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.
* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`
The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
The `list_preallocate_exact` function did not zero initialize array
contents. In the free-threaded build, this could expose uninitialized
memory to concurrent readers between the call to
`list_preallocate_exact` and the filling of the array contents with
items.