Previously, the _BlocksOutputBuffer code created a list of bytes objects to hold the output from compression libraries. This was slow because, at the end of compression, the output buffer code had to copy each bytes element of the list into the final bytes object.
The new PyBytesWriter API introduced in PEP 782 is an ergonomic and fast way to write data into a buffer that is later turned into a bytes object. Benchmarks show that the PyBytesWriter API is 10-30% faster for decompression across a variety of settings. The gains are greatest when the decompressor itself is very fast, as with Zstandard (and likely zlib-ng); otherwise the decompressor is the bottleneck and the gains are more modest, but still sizable (e.g. 10% faster for zlib)!
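As a rough illustration, here is a minimal sketch of the new pattern. The `PyBytesWriter_*` names are assumed from PEP 782, and `Codec`, `has_more()` and `decompress_chunk()` are hypothetical stand-ins for a real codec such as zstd:

```c
#include "Python.h"

/* Sketch only: API names assumed from PEP 782; Codec, has_more()
 * and decompress_chunk() are hypothetical stand-ins for a codec. */
static PyObject *
decompress_all(Codec *codec)
{
    const Py_ssize_t CHUNK = 64 * 1024;
    Py_ssize_t used = 0;

    PyBytesWriter *writer = PyBytesWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    while (has_more(codec)) {
        /* Grow one buffer in place instead of appending another
         * bytes object to a list. */
        if (PyBytesWriter_Grow(writer, CHUNK) < 0) {
            PyBytesWriter_Discard(writer);
            return NULL;
        }
        char *buf = (char *)PyBytesWriter_GetData(writer);
        used += decompress_chunk(codec, buf + used, CHUNK);
    }
    /* The buffer becomes the bytes object; no final copy pass. */
    return PyBytesWriter_FinishWithSize(writer, used);
}
```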
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Fix a memory leak in sub-interpreter creation caused by overwriting the previously used `_malloced` field. The pointer is now stored in the first word of the memory block so that it cannot be overwritten accidentally.
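For illustration, the underlying technique looks roughly like this (a sketch, not the actual CPython code):

```c
#include <stdlib.h>
#include <string.h>

/* Over-allocate by one pointer, store the raw pointer in the first
 * word of the block, and hand the caller the payload after it. */
static void *
alloc_block(size_t size)
{
    void *raw = malloc(sizeof(void *) + size);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &raw, sizeof(void *));
    return (char *)raw + sizeof(void *);
}

/* Recover the raw pointer from the word just before the payload. */
static void
free_block(void *payload)
{
    void *raw;
    memcpy(&raw, (char *)payload - sizeof(void *), sizeof(void *));
    free(raw);
}
```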
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
* Move PyUnicode_Format() implementation from unicodeobject.c
to unicode_format.c.
* Replace unicode_modifiable() with _PyUnicode_IsModifiable().
* Add blank lines so that there are two blank lines between functions.
* Move Python/formatter_unicode.c to Objects/unicode_formatter.c.
* Move Objects/stringlib/localeutil.h content into
unicode_formatter.c. Remove localeutil.h.
* Move _PyUnicode_InsertThousandsGrouping() to unicode_formatter.c
and mark the function as static.
* Rename unicode_fill() to _PyUnicode_Fill() and export it in
pycore_unicodeobject.h.
* Move MAX_UNICODE to pycore_unicodeobject.h as _Py_MAX_UNICODE.
Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the
PyBytesWriter API.
Add _PyBytesWriter_GetSize() and _PyBytesWriter_GetData() static
inline functions.
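Schematically, the change replaces the allocate-then-shrink pattern with a writer. The first function below uses the real old APIs; the second is a sketch assuming the PEP 782 names:

```c
#include "Python.h"
#include <string.h>

/* Old pattern (real APIs): allocate at an upper bound, then shrink,
 * which may reallocate and copy. */
static PyObject *
build_old(const char *src, Py_ssize_t upper_bound, Py_ssize_t used)
{
    PyObject *b = PyBytes_FromStringAndSize(NULL, upper_bound);
    if (b == NULL) {
        return NULL;
    }
    memcpy(PyBytes_AS_STRING(b), src, used);
    if (_PyBytes_Resize(&b, used) < 0) {
        return NULL;
    }
    return b;
}

/* New pattern (sketch): the writer owns the buffer and is finished
 * at the final size in one step. */
static PyObject *
build_new(const char *src, Py_ssize_t upper_bound, Py_ssize_t used)
{
    PyBytesWriter *w = PyBytesWriter_Create(upper_bound);
    if (w == NULL) {
        return NULL;
    }
    memcpy(PyBytesWriter_GetData(w), src, used);
    return PyBytesWriter_FinishWithSize(w, used);
}
```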
Now that https://github.com/llvm/llvm-project/pull/150201 has been merged, there is a better way to generate the Emscripten trampoline than including hand-generated binary WASM content. Requires Emscripten 4.0.12.
There was a deadlock originally seen by Memray when a daemon thread
enabled or disabled profiling while the interpreter was shutting down.
I think this could also happen with garbage collection, but I haven't
seen that in practice.
The daemon thread could hang while trying to acquire the global rwmutex
that prevents overlapping global and per-interpreter stop-the-world events.
Since it already held the main interpreter's stop-the-world lock, it
also deadlocked the main thread, which was trying to perform interpreter
finalization.
Swap the order of lock acquisition to prevent this deadlock.
Additionally, refactor `_PyParkingLot_Park` so that the global buckets
hashtable is left in a clean state if the thread is hung in
`PyEval_AcquireThread`.
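The fix follows the classic lock-hierarchy rule: every thread must acquire the two locks in the same order. An illustrative pthreads sketch (not the actual CPython locks):

```c
#include <pthread.h>

/* G stands in for the global stop-the-world rwmutex, P for a
 * per-interpreter stop-the-world lock. Illustrative only. */
static pthread_mutex_t G = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t P = PTHREAD_MUTEX_INITIALIZER;

/* Deadlock-prone order: hold P, then block on G while another
 * thread holds G and waits for P during finalization. */
static void
before_fix(void)
{
    pthread_mutex_lock(&P);
    pthread_mutex_lock(&G);   /* may block forever at shutdown */
    /* ... stop-the-world work ... */
    pthread_mutex_unlock(&G);
    pthread_mutex_unlock(&P);
}

/* Fixed order: always take the global lock first, so all threads
 * agree on a single lock hierarchy. */
static void
after_fix(void)
{
    pthread_mutex_lock(&G);
    pthread_mutex_lock(&P);
    /* ... stop-the-world work ... */
    pthread_mutex_unlock(&P);
    pthread_mutex_unlock(&G);
}
```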
Follow-up refactoring after GH-133143, where we used `_Py_ID` for "big" and "little"
in `abi_info.byteorder`.
This change uses `_Py_ID` for `sys.byteorder`, and also for `float_repr_style` and a module name.
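For illustration, the pattern looks like this (`_Py_ID` is an internal-only macro; the identifier must already appear in the generated global strings table):

```c
/* Before: allocates (or interns) a string object at runtime and
 * requires an error check. */
PyObject *byteorder = PyUnicode_FromString("big");

/* After: a statically allocated, immortal interned string; no
 * allocation and no failure path (internal API, Py_BUILD_CORE). */
PyObject *byteorder2 = _Py_ID(big);
```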
This partially reverts #137047, keeping the tests for GC collectability of the
original class that dataclass adds `__slots__` to.
The reference leaks fixed there are instead solved by no longer tying the
`__dict__` and `__weakref__` descriptors to (and having them reference) their class.
Instead, the descriptors are shared between all classes that need them (within
an interpreter).
The `__objclass__` of the descriptors is set to `object`, since these
descriptors work with *any* object. (The appropriate checks were already
made in the get/set code, so the `__objclass__` check was redundant.)
The repr of these descriptors (and of any others whose `__objclass__` is `object`)
no longer mentions the objclass.
This change required adjustment of introspection code that checks
`__objclass__` to determine an object's “own” (i.e. not inherited) `__dict__`.
Third-party code that does similar introspection of the internals will also
need adjusting.
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
There were a few thread-safety issues when profiling or tracing all
threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads:
* The loop over thread states could crash if a thread exited concurrently
  (in both the free-threaded and default builds).
* The modification of `c_profilefunc` and `c_tracefunc` wasn't
  thread-safe on the free-threaded build.
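For context, a usage sketch of the affected (documented) entry point:

```c
#include "Python.h"

/* Py_tracefunc has the documented signature. */
static int
my_profile(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
{
    /* what is PyTrace_CALL, PyTrace_RETURN, PyTrace_C_CALL, ... */
    return 0;
}

static void
enable_profiling_everywhere(void)
{
    /* Installs the profile function on every thread of the current
     * interpreter; call with an attached thread state. */
    PyEval_SetProfileAllThreads(my_profile, NULL);
}

static void
disable_profiling_everywhere(void)
{
    PyEval_SetProfileAllThreads(NULL, NULL);
}
```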