The free threading build uses QSBR to delay the freeing of dictionary
keys and list arrays when the objects are accessed by multiple threads
in order to allow concurrent reads to proceed without holding the object
lock. The requests are processed in batches to reduce execution
overhead, but for large memory blocks this can lead to excess memory
usage.
Take into account the size of the memory block when deciding when to
process QSBR requests.
Also track the amount of memory being held by QSBR for mimalloc pages.
Advance the write sequence if this memory exceeds a limit. Advancing
the sequence will allow it to be freed more quickly.
Process the held QSBR items from the "eval breaker", rather than from
`_PyMem_FreeDelayed()`. This gives a higher chance that the global read
sequence has advanced enough so that items can be freed.
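For illustration, here is a minimal sketch of the size-aware batching idea. The names (`free_delayed()`, `QSBR_HELD_LIMIT`, `delayed_block`, `write_seq`) are hypothetical stand-ins, not the real QSBR machinery, and the real code hands processing off to the eval breaker rather than freeing inline:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: each delayed free records its size, and once
 * the total held memory crosses a limit, the shared write sequence is
 * advanced so the batch becomes eligible for freeing sooner. */

#define QSBR_HELD_LIMIT (1024 * 1024)   /* flush once ~1 MiB is pending */

struct delayed_block {
    void *ptr;
    size_t size;
    uint64_t seq;                  /* write sequence at time of request */
    struct delayed_block *next;
};

static struct delayed_block *held_blocks;
static size_t held_bytes;
static uint64_t write_seq = 1;

static void
free_delayed(void *ptr, size_t size)
{
    struct delayed_block *b = malloc(sizeof(*b));
    if (b == NULL) {
        free(ptr);                 /* cannot defer; free eagerly */
        return;
    }
    b->ptr = ptr;
    b->size = size;
    b->seq = write_seq;
    b->next = held_blocks;
    held_blocks = b;
    held_bytes += size;

    /* The decision to process depends on how much memory is held,
     * not just on how many requests are batched. */
    if (held_bytes > QSBR_HELD_LIMIT) {
        write_seq++;               /* advance so held items can be freed */
        /* real code would request processing from the eval breaker here */
    }
}
```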
(cherry picked from commit 113de8545f)
Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>
Co-authored-by: Sam Gross <colesbury@gmail.com>
Clean up redundant ifdef in list getitem (GH-128257)
It's already inside a `Py_GIL_DISABLED` block, so the `#else` clause is always unused.
(cherry picked from commit 42f7a00ae8)
Co-authored-by: da-woods <dw-git@d-woods.co.uk>
In the free threading build, if a non-owning thread resizes a list,
it must use QSBR to free the old list array because there may be a
concurrent access (without a lock) from the owning thread.
To match the pattern in dictobject.c, we just mark the list as "shared"
before resizing if it's from a non-owning thread and not already marked
as shared.
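A sketch of that pattern, assuming the internal free-threaded-build helpers `_Py_IsOwnedByCurrentThread()`, `_PyObject_GC_IS_SHARED()`, and `_PyObject_GC_SET_SHARED()`:

```c
#ifdef Py_GIL_DISABLED
/* Before a non-owning thread resizes the list, mark it as shared so
 * the old ob_item array is freed via QSBR rather than immediately,
 * protecting lock-free readers in the owning thread. */
static void
ensure_shared_on_resize(PyListObject *self)
{
    if (!_Py_IsOwnedByCurrentThread((PyObject *)self)
        && !_PyObject_GC_IS_SHARED(self))
    {
        _PyObject_GC_SET_SHARED(self);
    }
}
#endif
```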
(cherry picked from commit c7dec02de2)
Co-authored-by: Sam Gross <colesbury@gmail.com>
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.
* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`
The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
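For example, a two-object critical section can be used to read a consistent combined view of two lists; in the default GIL build the macros are effectively no-ops:

```c
#include "Python.h"

/* Reads both list sizes while holding both per-object locks in the
 * free-threaded build. */
static Py_ssize_t
combined_size(PyObject *a, PyObject *b)
{
    Py_ssize_t n;
    Py_BEGIN_CRITICAL_SECTION2(a, b);
    n = PyList_GET_SIZE(a) + PyList_GET_SIZE(b);
    Py_END_CRITICAL_SECTION2();
    return n;
}
```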
(cherry picked from commit 8f17d69b7b)
gh-119053: Implement the fast path for list.__getitem__ (gh-119112)
(cherry picked from commit ab4263a82a)
Co-authored-by: Donghee Na <donghee.na@python.org>
The `list_preallocate_exact` function did not zero initialize array
contents. In the free-threaded build, this could expose uninitialized
memory to concurrent readers between the call to
`list_preallocate_exact` and the filling of the array contents with
items.
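A simplified sketch of the fix, modeled on the preallocation path:

```c
#include <string.h>
#include "Python.h"

/* Zero the freshly allocated array so a concurrent lock-free reader
 * can never observe uninitialized pointers between allocation and the
 * subsequent item stores. */
static int
list_preallocate_exact(PyListObject *self, Py_ssize_t size)
{
    PyObject **items = PyMem_New(PyObject *, size);
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    memset(items, 0, size * sizeof(PyObject *));  /* the added step */
    self->ob_item = items;
    self->allocated = size;
    return 0;
}
```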
(cherry picked from commit 2402715e10)
Co-authored-by: Sam Gross <colesbury@gmail.com>
Add a special case for `list.extend(dict)` and `list(dict)` so that those
patterns behave atomically with respect to modifications to the list or
dictionary.
This is required by multiprocessing, which assumes that
`list(_finalizer_registry)` is atomic.
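A sketch of the special case, assuming listobject.c's internal `list_resize()` helper. Both locks are taken with a two-object critical section so the dict is copied from a single consistent snapshot:

```c
#include "Python.h"

/* Extend `self` with the dict's keys while holding both locks, so
 * list(dict) observes one consistent state of the dictionary. */
static int
list_extend_dict_keys(PyListObject *self, PyObject *dict)
{
    int err = -1;
    Py_BEGIN_CRITICAL_SECTION2(self, dict);
    Py_ssize_t n = Py_SIZE(self);
    if (list_resize(self, n + PyDict_Size(dict)) == 0) {
        Py_ssize_t pos = 0;
        PyObject *key, *value;
        while (PyDict_Next(dict, &pos, &key, &value)) {
            self->ob_item[n++] = Py_NewRef(key);
        }
        err = 0;
    }
    Py_END_CRITICAL_SECTION2();
    return err;
}
```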
Rewrote binarysort() for clarity.
Also changed the signature to be more coherent (it was mixing sortslice with raw pointers).
No change in method or functionality. However, I left some experiments in, disabled for now
via `#if` tricks. Since this code was first written, some kinds of comparisons have gotten
enormously faster (like for lists of floats), which changes the tradeoffs.
For example, plain insertion sort's simpler innermost loop and highly predictable branches
leave it very competitive with (even beating, by a bit) binary insertion when comparisons are
very cheap, despite that it can do many more compares. And it wins big on runs that
are already sorted (moving the next one in takes only 1 compare then).
So I left code for a plain insertion sort, to make future experimenting easier.
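For reference (plain ints rather than PyObject comparisons, and not the listobject.c code itself), this is the shape of the loop in question:

```c
#include <stddef.h>

/* Plain insertion sort: the inner loop is tiny and its comparison
 * branch is highly predictable on nearly-sorted data; on an already
 * sorted run, moving the next element in costs exactly one compare. */
static void
insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && key < a[j - 1]) {
            a[j] = a[j - 1];   /* shift larger elements right */
            j--;
        }
        a[j] = key;
    }
}
```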
Also made the maximum value of minrun a `#define` (`MAX_MINRUN`) to make
experimenting with that easier too.
And another bit of `#if`-disabled code rewrites binary insertion's innermost loop to
remove its unpredictable branch. Surprisingly, this doesn't really seem to help
overall. I'm unclear on why not. It certainly adds more instructions, but they're very
simple, and it's hard to believe they cost as much as a branch miss.
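To illustrate the branch-removal idea (my own sketch over ints, not the disabled code in listobject.c): the data-dependent `if` in a binary search can be replaced by arithmetic on the 0/1 comparison result, trading a likely branch miss for a few always-executed instructions:

```c
#include <stddef.h>

/* Branchless bisect-right: position where x should be inserted into
 * the sorted array a[0..n-1]. The comparison result scales the step
 * instead of steering a conditional jump. */
static size_t
bisect_right_branchless(const int *a, size_t n, int x)
{
    size_t lo = 0;
    while (n > 1) {
        size_t half = n >> 1;
        lo += (size_t)(a[lo + half - 1] <= x) * half;  /* no branch */
        n -= half;
    }
    if (n == 1) {
        lo += (size_t)(a[lo] <= x);
    }
    return lo;
}
```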
GH-116554: Relax list.sort()'s notion of "descending" run
Rewrote `count_run()` so that sub-runs of equal elements no longer end a descending run. Both ascending and descending runs can have arbitrarily many sub-runs of arbitrarily many equal elements now. This is tricky, because we only use `<` comparisons, so checking for equality doesn't come "for free". Surprisingly, it turned out there's a very cheap (one comparison) way to determine whether an ascending run consisted of all-equal elements. That sealed the deal.
In addition, after a descending run is reversed in-place, we now go on to see whether it can be extended by an ascending run that just happens to be adjacent. This succeeds in finding at least one additional element to append about half the time, and so appears to more than repay its cost (the savings come from getting to skip a binary search, when a short run is artificially forced to length MINRUN later, for each new element `count_run()` can add to the initial run).
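The cheap equality check falls out of the run's own invariant; a small sketch of the reasoning over ints:

```c
#include <stddef.h>

/* For a weakly ascending run a[0] <= a[1] <= ... <= a[n-1], the whole
 * run is all-equal exactly when its endpoints are equal. Under a pure
 * `<` discipline, that is a single extra comparison. */
static int
ascending_run_all_equal(const int *a, size_t n)
{
    return !(a[0] < a[n - 1]);
}
```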
While these have been in the back of my mind for years, a question on StackOverflow pushed it to action:
https://stackoverflow.com/questions/78108792/
They were wondering why it took about 4x longer to sort a list like:
[999_999, 999_999, ..., 2, 2, 1, 1, 0, 0]
than "similar" lists. Of course that runs very much faster after this patch.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
The new `PyList_GetItemRef` is similar to `PyList_GetItem`, but returns
a strong reference instead of a borrowed reference. Additionally, if the
passed "list" object is not a list, the function sets a `TypeError`
instead of calling `PyErr_BadInternalCall()`.
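Typical use, with the caller owning (and eventually releasing) the returned reference:

```c
#include "Python.h"

/* Fetch the first element as a strong reference: safe in the
 * free-threaded build, where a borrowed reference from PyList_GetItem
 * could be deallocated by another thread while still in use. */
static PyObject *
first_item(PyObject *list)
{
    PyObject *item = PyList_GetItemRef(list, 0);
    if (item == NULL) {
        return NULL;   /* IndexError (or TypeError for non-lists) set */
    }
    return item;       /* caller now owns this strong reference */
}
```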
Fix undefined behavior warnings (UBSan -fsanitize=function), for example:
Objects/object.c:674:11: runtime error: call to function list_repr through pointer to incorrect function type 'struct _object *(*)(struct _object *)'
listobject.c:382: note: list_repr defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Objects/object.c:674:11 in
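The usual fix for this class of report is to give the slot function the exact signature the slot expects and cast inside the body, rather than casting the function pointer; a sketch (placeholder body, not the real repr logic):

```c
#include "Python.h"

/* tp_repr is called through a reprfunc pointer,
 * PyObject *(*)(PyObject *), so defining the function with a
 * PyListObject * parameter and casting the pointer is UB.
 * Fixed shape: match reprfunc exactly, cast the argument inside. */
static PyObject *
list_repr(PyObject *self)
{
    PyListObject *v = (PyListObject *)self;
    /* ... build and return the repr of v ... */
    return PyUnicode_FromFormat("<list of %zd items>", Py_SIZE(v));
}
```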
* Split list_extend() into two sub-functions: list_extend_fast() and
list_extend_iter().
* list_inplace_concat() no longer has to call Py_DECREF() on the
list_extend() result, since list_extend() now returns an int.
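A simplified sketch of the resulting shape:

```c
#include "Python.h"

/* One entry point that returns an int, dispatching to a sized fast
 * path or the generic iterator path. */
static int
list_extend(PyListObject *self, PyObject *iterable)
{
    if (PyList_CheckExact(iterable) || PyTuple_CheckExact(iterable)) {
        return list_extend_fast(self, iterable);  /* known length */
    }
    return list_extend_iter(self, iterable);      /* arbitrary iterable */
}
```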
gh-106168: Update the size only after setting the item, to avoid temporary inconsistencies.
Also remove the "what's new" sentence regarding the size setting since tuples cannot grow after allocation.
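A sketch of the ordering, with `ensure_capacity()` as a hypothetical stand-in for the real resize logic:

```c
#include "Python.h"

/* Store the new element before publishing the larger size, so no
 * observer can see a size that counts a not-yet-set slot. */
static int
append_item(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = Py_SIZE(self);
    if (ensure_capacity(self, n + 1) < 0) {  /* hypothetical helper */
        return -1;
    }
    self->ob_item[n] = Py_NewRef(v);  /* 1: set the item           */
    Py_SET_SIZE(self, n + 1);         /* 2: only then grow the size */
    return 0;
}
```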
Move private _PyEval functions to the internal C API
(pycore_ceval.h):
* _PyEval_GetBuiltin()
* _PyEval_GetBuiltinId()
* _PyEval_GetSwitchInterval()
* _PyEval_MakePendingCalls()
* _PyEval_SetProfile()
* _PyEval_SetSwitchInterval()
* _PyEval_SetTrace()
No longer export most of these functions.
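Code that still needs these functions after the move must opt into the internal C API; a minimal sketch:

```c
/* The internal C API requires building as part of the core (or a
 * core extension module) and including the pycore header directly. */
#define Py_BUILD_CORE_MODULE 1
#include "Python.h"
#include "pycore_ceval.h"   /* _PyEval_SetProfile(), _PyEval_SetTrace(), ... */
```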