Add special cases for classmethod and staticmethod descriptors in
_PyObject_GetMethodStackRef() to avoid calling tp_descr_get, which
avoids reference count contention on the bound method and underlying
callable. This improves scaling when calling classmethods and
staticmethods from multiple threads.
Also refactor method_vectorcall in classobject.c into a new _PyObject_VectorcallPrepend() helper so that it can be used by
PyObject_VectorcallMethod as well.
* Add the c_init_default attribute which is used to initialize the C variable
if the default is not explicitly provided.
* Add the c_default_init() method which is used to derive c_default from
default if c_default is not explicitly provided.
* Explicit c_default and py_default are now almost always have precedence
over the generated value.
* Add support for bytes literals as default values.
* Improve support for str literals as default values (support non-ASCII
and non-printable characters and special characters like backslash or quotes).
* Fix support for str and bytes literals containing trigraphs, "/*" and "*/".
* Improve support for default values in converters "char" and "int(accept={str})".
* Converter "int(accept={str})" now requires 1-character string instead of
integer as default value.
* Add support for non-None default values in converter "Py_buffer": NULL,
str and bytes literals.
* Improve error handling for invalid default values.
* Rename Null to NullType for consistency.
Setting the size to 0 turns the list contents into overallocated memory that the deallocator will free.
Ownership is transferred to the new tuple so no refcount adjustment is needed.
* Drop DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754 macro.
* Use DOUBLE_IS_BIG/LITTLE_ENDIAN_IEEE754 to detect endianness of
float/doubles.
* Drop "unknown_format" code path in PyFloat_Pack/Unpack*().
Co-authored-by: Victor Stinner <vstinner@python.org>
We already have a stop-the-world pause elsewhere in this code path
(type_set_bases) and this makes will make it easier to avoid contention
on the TYPE_LOCK when looking up names in the MRO hierarchy.
Also use deferred reference counting for non-immortal MROs.
Fix three issues that caused mimalloc pages to be leaked until the
owning thread exited:
1. In _PyMem_mi_page_maybe_free(), move pages out of the full queue
when relying on QSBR to defer freeing the page. Pages in the full
queue are never searched by mi_page_queue_find_free_ex(), so a page
left there is unusable for allocations.
2. Move _PyMem_mi_page_clear_qsbr() from _mi_page_free_collect() to
_mi_page_thread_free_collect() where it only fires when all blocks
on the page are free (used == 0). The previous placement was too
broad: it cleared QSBR state whenever local_free was non-NULL, but
_mi_page_free_collect() is called from non-allocation paths (e.g.,
page visiting in mi_heap_visit_blocks) where the page is not being
reused.
3. In _PyMem_mi_page_maybe_free(), use the page's heap tld to find the
correct thread state for QSBR list insertion instead of
PyThreadState_GET(). During stop-the-world pauses, the function may
process pages belonging to other threads, so the current thread
state is not necessarily the owner of the page.
Optimize memoryview comparison: a memoryview is equal to itself, there is no
need to compare values, except if it uses float format.
Benchmark comparing 1 MiB:
from timeit import timeit
with open("/dev/random", 'br') as fp:
data = fp.read(2**20)
view = memoryview(data)
LOOPS = 1_000
b = timeit('x == x', number=LOOPS, globals={'x': data})
m = timeit('x == x', number=LOOPS, globals={'x': view})
print("bytes %f seconds" % b)
print("mview %f seconds" % m)
print("=> %f time slower" % (m / b))
Result before the change:
bytes 0.000026 seconds
mview 1.445791 seconds
=> 55660.873940 time slower
Result after the change:
bytes 0.000026 seconds
mview 0.000028 seconds
=> 1.104382 time slower
This missed optimization was discovered by Pierre-Yves David
while working on Mercurial.
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>