We already have a stop-the-world pause elsewhere in this code path
(type_set_bases), and this will make it easier to avoid contention
on the TYPE_LOCK when looking up names in the MRO hierarchy.
Also use deferred reference counting for non-immortal MROs.
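For context, type_set_bases is the C setter behind assigning to a class's `__bases__`. The Python-level sketch below exercises that path. The class names are illustrative only. Reassigning `__bases__` recomputes the MRO of the class and of its subclasses, and later name lookups walk the new MRO.

```python
class A:
    tag = "A"

class B:
    tag = "B"

class C(A):
    pass

class D(C):
    pass

assert C.__mro__ == (C, A, object)
assert C.tag == "A"  # found on A via the MRO

# In CPython this assignment goes through type_set_bases.
C.__bases__ = (B,)

# The MRO of C and of its subclass D is recomputed,
# so name lookups now resolve through B instead of A.
print(C.__mro__, D.__mro__, C.tag)
```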
Fix three issues that caused mimalloc pages to be leaked until the
owning thread exited:
1. In _PyMem_mi_page_maybe_free(), move pages out of the full queue
when relying on QSBR to defer freeing the page. Pages in the full
queue are never searched by mi_page_queue_find_free_ex(), so a page
left there is unusable for allocations.
2. Move _PyMem_mi_page_clear_qsbr() from _mi_page_free_collect() to
_mi_page_thread_free_collect() where it only fires when all blocks
on the page are free (used == 0). The previous placement was too
broad: it cleared QSBR state whenever local_free was non-NULL, but
_mi_page_free_collect() is called from non-allocation paths (e.g.,
page visiting in mi_heap_visit_blocks) where the page is not being
reused.
3. In _PyMem_mi_page_maybe_free(), use the page's heap tld to find the
correct thread state for QSBR list insertion instead of
PyThreadState_GET(). During stop-the-world pauses, the function may
process pages belonging to other threads, so the current thread
state is not necessarily the owner of the page.
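The three fixes can be illustrated with a deliberately simplified toy model. It is a sketch under stated assumptions, not mimalloc's or CPython's actual data structures. `Page`, `Heap`, `page_maybe_free`, `thread_free_collect` and the per-thread `qsbr_lists` dict are all hypothetical stand-ins, loosely named after the functions above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)  # compare pages by identity, like pointers
class Page:
    owner: str                        # thread whose heap owns this page
    used: int = 0                     # blocks currently allocated
    capacity: int = 4
    qsbr_goal: Optional[int] = None   # set while freeing is deferred by QSBR

class Heap:
    def __init__(self, owner):
        self.owner = owner
        self.full = []     # never searched for free blocks
        self.normal = []   # searched by find_free()

    def find_free(self):
        # Toy analogue of mi_page_queue_find_free_ex(): the full queue is skipped.
        for page in self.normal:
            if page.used < page.capacity:
                return page
        return None

qsbr_lists = {}  # hypothetical per-thread lists of pages awaiting QSBR

def page_maybe_free(heaps, page, goal):
    # Fix 3: pick the page's owning heap/thread, not the calling thread.
    heap = heaps[page.owner]
    page.qsbr_goal = goal
    qsbr_lists.setdefault(page.owner, []).append(page)
    # Fix 1: move the page out of the full queue so allocation can find it.
    if page in heap.full:
        heap.full.remove(page)
        heap.normal.append(page)

def thread_free_collect(page, freed):
    page.used -= freed
    # Fix 2: clear QSBR state only once every block on the page is free.
    if page.used == 0 and page.qsbr_goal is not None:
        page.qsbr_goal = None
        qsbr_lists[page.owner].remove(page)
```

In this model, a page left in `full` is invisible to `find_free()`, which is the leak that fix 1 addresses.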
Optimize memoryview comparison: a memoryview is always equal to itself,
so there is no need to compare values, except when it uses a float
format (NaN is not equal to itself).
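The following sketch shows the user-visible semantics the shortcut must preserve, using only the standard `array` module. A byte view compared with itself is equal. A float-format view containing NaN still compares element by element, so it is not equal to itself.

```python
from array import array

data = memoryview(b"\x00" * 16)
same = data == data  # byte format: a view is always equal to itself

floats = memoryview(array("d", [1.0, float("nan")]))
# Float formats are still compared value by value, and NaN != NaN.
nan_self_equal = floats == floats

print(same, nan_self_equal)
```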
Benchmark comparing 1 MiB:
from timeit import timeit
with open("/dev/random", 'br') as fp:
    data = fp.read(2**20)
view = memoryview(data)
LOOPS = 1_000
b = timeit('x == x', number=LOOPS, globals={'x': data})
m = timeit('x == x', number=LOOPS, globals={'x': view})
print("bytes %f seconds" % b)
print("mview %f seconds" % m)
print("=> %f time slower" % (m / b))
Result before the change:
bytes 0.000026 seconds
mview 1.445791 seconds
=> 55660.873940 time slower
Result after the change:
bytes 0.000026 seconds
mview 0.000028 seconds
=> 1.104382 time slower
This missed optimization was discovered by Pierre-Yves David
while working on Mercurial.
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
PyDict_Contains() and PyDict_ContainsString() now fail with
SystemError if the first argument is not a dict, frozendict, dict
subclass or frozendict subclass.
PyDict_MergeFromSeq2() now fails with SystemError if the first
argument is not a dict or a dict subclass.
PyDict_Update(), PyDict_Merge() and _PyDict_MergeEx() no longer
accept frozendict.
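The argument checks above can be modeled in Python. The sketch below is only an illustration of which types each function accepts. It is not the C implementation. `any_dict_check`, `dict_contains` and `dict_update` are hypothetical names. If the running Python has no built-in `frozendict`, a minimal hypothetical stand-in is used that, like `frozendict`, is not a `dict` subclass.

```python
import builtins
from collections.abc import Mapping

# Use the real frozendict if available, else a hypothetical stand-in.
frozendict = getattr(builtins, "frozendict", None)
if frozendict is None:
    class frozendict(Mapping):
        def __init__(self, *args, **kwargs):
            self._data = dict(*args, **kwargs)
        def __getitem__(self, key):
            return self._data[key]
        def __iter__(self):
            return iter(self._data)
        def __len__(self):
            return len(self._data)

def any_dict_check(op):
    # Models PyAnyDict_Check(): dict, frozendict, or a subclass of either.
    return isinstance(op, (dict, frozendict))

def dict_contains(op, key):
    # Models PyDict_Contains(): any non-dict, non-frozendict fails.
    if not any_dict_check(op):
        raise SystemError("bad argument to internal function")
    return key in op

def dict_update(op, other):
    # Models PyDict_Update(): only dict and dict subclasses are accepted.
    if not isinstance(op, dict):
        raise SystemError("bad argument to internal function")
    op.update(other)
```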
can_modify_dict() is stricter than ASSERT_DICT_LOCKED() for
frozendict. It uses PyUnstable_Object_IsUniquelyReferenced(), which
matters for free-threaded builds.
Replace anydict_setitem_take2() with setitem_take2_lock_held(). It's
no longer useful to have two functions.
Add TYPE_FROZENDICT to the marshal module.
Add C API functions:
* PyAnyDict_Check()
* PyAnyDict_CheckExact()
* PyFrozenDict_Check()
* PyFrozenDict_CheckExact()
* PyFrozenDict_New()
Add PyFrozenDict_Type C type.
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Adam Johnson <me@adamj.eu>
Co-authored-by: Benedikt Johannes <benedikt.johannes.hofer@gmail.com>
When integrating slots-based module creation with the inittab,
which currently requires PyModuleDef, it would be convenient to
reuse the same slots array for the PyModuleDef.
Allow slots that match what's already present in the PyModuleDef.
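The rule "allow slots that match what's already present" can be sketched as a merge that tolerates duplicates only when their values agree. This is a hypothetical Python model, not the C logic. The dict keys such as `"name"` stand in for PyModuleDef fields and their corresponding slots. Raising `ValueError` on a conflict is an assumption made for illustration.

```python
def merge_slots(moduledef, slots):
    """Hypothetical model: combine PyModuleDef-like fields with a slots array."""
    merged = dict(moduledef)
    for name, value in slots:
        # A slot duplicating a field already in the def is allowed only if it matches.
        if name in merged and merged[name] != value:
            # The exception type is an assumption for this sketch.
            raise ValueError(f"slot {name!r} conflicts with the module definition")
        merged[name] = value
    return merged

print(merge_slots({"name": "spam", "doc": None}, [("name", "spam")]))
```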
Fix thread-safety issues when accessing frame attributes while another
thread is executing the frame:
- Add critical section to frame_repr() to prevent races when accessing
the frame's code object and line number
- Add _Py_NO_SANITIZE_THREAD to PyUnstable_InterpreterFrame_GetLasti()
to allow intentional racy reads of instr_ptr
- Fix take_ownership() to not write to the original frame's f_executable
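The racy access pattern described above can be triggered from pure Python with the CPython-specific `sys._current_frames()`. One thread spins in a loop while another repeatedly calls `repr()` on its frame, which goes through frame_repr(), and reads `f_lineno`. This sketch only exercises the pattern. It does not demonstrate a crash or prove that the fix is correct.

```python
import sys
import threading

running = True

def worker():
    # Busy loop so the frame keeps executing while the main thread inspects it.
    n = 0
    while running:
        n += 1

t = threading.Thread(target=worker)
t.start()
reprs = []
try:
    for _ in range(100):
        frame = sys._current_frames().get(t.ident)
        if frame is not None:
            reprs.append(repr(frame))  # calls frame_repr() on a running frame
            frame.f_lineno             # reads the line number concurrently
finally:
    running = False
    t.join()

print(reprs[:1])
```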
Add `_Py_type_getattro_stackref`, a variant of type attribute lookup
that returns `_PyStackRef` instead of `PyObject*`. This allows returning
deferred references in the free-threaded build, reducing reference count
contention when accessing type attributes.
This significantly improves scaling of namedtuple instantiation across
multiple threads.
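A rough way to observe the scaling claim is to time namedtuple instantiation as the number of threads grows. The sketch below uses arbitrary thread counts and iteration numbers, and it reports no expected timings. Any improvement should only show up on a free-threaded build, and absolute numbers depend on the machine.

```python
import threading
import time
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

def work(n):
    # Each call looks up attributes on the Point type during instantiation.
    for i in range(n):
        Point(i, i)

def run(num_threads, n=200_000):
    threads = [threading.Thread(target=work, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(f"{k} threads: {run(k):.3f} s")
```

With perfect scaling, the wall-clock time would stay roughly flat as the number of threads increases, because each thread does the same amount of work.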
* Add blurb
* Rename PyObject_GetAttrStackRef to _PyObject_GetAttrStackRef
* Apply suggestion from @vstinner
Co-authored-by: Victor Stinner <vstinner@python.org>
* format
* Update Include/internal/pycore_function.h
Co-authored-by: Victor Stinner <vstinner@python.org>
---------
Co-authored-by: Victor Stinner <vstinner@python.org>