Commit graph

78 commits

Author SHA1 Message Date
Alper
c13b59204a
gh-116738: use PyMutex in lzma module (#140711)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
2025-11-12 02:01:55 +05:30
Emma Smith
f262297d52
gh-139877: Use PyBytesWriter in pycore_blocks_output_buffer.h (#139976)
Previously, the _BlocksOutputBuffer code creates a list of bytes objects to handle the output data from compression libraries. This ends up being slow due to the output buffer code needing to copy each bytes element of the list into the final bytes object buffer at the end of compression.

The new PyBytesWriter API introduced in PEP 782 is an ergonomic and fast method of writing data into a buffer that will later turn into a bytes object. Benchmarks show that using the PyBytesWriter API is 10-30% faster for decompression across a variety of settings. The performance gains are greatest when the decompressor is very performant, such as for Zstandard (and likely zlib-ng). Otherwise the decompressor can bottleneck decompression and the gains are more modest, but still sizable (e.g. 10% faster for zlib)!

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-10-14 10:03:55 -07:00
Sergey Miryanov
1588413ca7
gh-116946: remove unnecessary gc from immutable types (#139073) 2025-10-01 13:15:58 +05:30
Victor Stinner
7168e98c80
gh-129813, PEP 782: Use PyBytesWriter in lzma and zlib (#138832)
Replace PyBytes_FromStringAndSize(NULL, size) with the new public
PyBytesWriter API.
2025-09-13 19:27:04 +02:00
Victor Stinner
06b7891f12
gh-129813, PEP 782: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#138830)
Replace PyBytes_FromStringAndSize(NULL, 0) with
Py_GetConstant(Py_CONSTANT_EMPTY_BYTES). Py_GetConstant() cannot
fail.
2025-09-13 18:30:25 +02:00
Bénédikt Tran
c9b252c2c0
gh-116946: Revert GC protocol for immutable empty heap types (GH-138266, GH-138288, GH-138289) (#138338)
* Revert "gh-116946: fully implement GC protocol for `bz2` objects (#138266)"

This reverts commit 9be91f6a20.

* Revert "gh-116946: fully implement GC protocol for `lzma` objects (#138288)"

This reverts commit 3ea16f990f.

* Revert "gh-116946: fully implement GC protocol for `_hashlib` objects (#138289)"

This reverts commit 6f1dd9551a.
2025-09-01 21:15:11 +05:30
Bénédikt Tran
3ea16f990f
gh-116946: fully implement GC protocol for lzma objects (#138288) 2025-09-01 10:22:43 +02:00
Adam Turner
918e3ba6c0
GH-137623: Use an AC decorator for docstring line length enforcement (#137690) 2025-08-18 18:29:00 +01:00
Serhiy Storchaka
4c914e7a36
gh-133583: Add support for fixed size unsigned integers in argument parsing (GH-133584)
* Add Argument Clinic converters: uint8, uint16, uint32, uint64.
* Add private C API: _PyLong_UInt8_Converter(),
  _PyLong_UInt16_Converter(), _PyLong_UInt32_Converter(),
  _PyLong_UInt64_Converter().
2025-05-08 12:27:50 +03:00
Serhiy Storchaka
a64fdc7513
gh-132987: Support __index__() in the lzma module (GH-133099) 2025-04-29 14:14:33 +00:00
Bénédikt Tran
edbf7fb129
gh-111178: remove redundant casts for functions with correct signatures (#131673) 2025-04-01 17:18:11 +02:00
Bénédikt Tran
185ac5adaf
gh-111178: fix UBSan failures in Modules/_lzmamodule.c (GH-129783)
fix UBSan failures for LZMA objects

suppress unused return values
2025-02-18 14:48:21 +01:00
Brett Simmers
c2627d6eea
gh-116322: Add Py_mod_gil module slot (#116882)
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.

PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.

A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
2024-05-03 11:30:55 -04:00
Radislav Chugunov
0154405350
gh-104282: Fix null pointer dereference in lzma._decode_filter_properties (GH-104283) 2024-01-17 13:15:44 +00:00
Victor Stinner
21c0844742
gh-108220: Internal header files require Py_BUILD_CORE to be defined (#108221)
* pycore_intrinsics.h does nothing if included twice
  (add #ifndef and #define).
* Update Tools/cases_generator/generate_cases.py to generate the
  Py_BUILD_CORE test.
* _bz2, _lzma, _opcode and zlib extensions now define the
  Py_BUILD_CORE_MODULE macro to use internal headers
  (pycore_code.h, pycore_intrinsics.h and pycore_blocks_output_buffer.h).
2023-08-21 19:15:52 +02:00
Victor Stinner
1a3faba9f1
gh-106869: Use new PyMemberDef constant names (#106871)
* Remove '#include "structmember.h"'.
* If needed, add <stddef.h> to get offsetof() function.
* Update Parser/asdl_c.py to regenerate Python/Python-ast.c.
* Replace:

  * T_SHORT => Py_T_SHORT
  * T_INT => Py_T_INT
  * T_LONG => Py_T_LONG
  * T_FLOAT => Py_T_FLOAT
  * T_DOUBLE => Py_T_DOUBLE
  * T_STRING => Py_T_STRING
  * T_OBJECT => _Py_T_OBJECT
  * T_CHAR => Py_T_CHAR
  * T_BYTE => Py_T_BYTE
  * T_UBYTE => Py_T_UBYTE
  * T_USHORT => Py_T_USHORT
  * T_UINT => Py_T_UINT
  * T_ULONG => Py_T_ULONG
  * T_STRING_INPLACE => Py_T_STRING_INPLACE
  * T_BOOL => Py_T_BOOL
  * T_OBJECT_EX => Py_T_OBJECT_EX
  * T_LONGLONG => Py_T_LONGLONG
  * T_ULONGLONG => Py_T_ULONGLONG
  * T_PYSSIZET => Py_T_PYSSIZET
  * T_NONE => _Py_T_NONE
  * READONLY => Py_READONLY
  * PY_AUDIT_READ => Py_AUDIT_READ
  * READ_RESTRICTED => Py_AUDIT_READ
  * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED
  * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)
2023-07-25 15:28:30 +02:00
Serhiy Storchaka
329e4a1a3f
gh-86493: Modernize modules initialization code (GH-106858)
Use PyModule_Add() or PyModule_AddObjectRef() instead of soft deprecated
PyModule_AddObject().
2023-07-25 14:34:49 +03:00
Serhiy Storchaka
4bf43710d1
gh-106307: C API: Add PyMapping_GetOptionalItem() function (GH-106308)
Also add PyMapping_GetOptionalItemString() function.
2023-07-11 23:04:12 +03:00
Inada Naoki
d5bd32fb48
gh-104922: remove PY_SSIZE_T_CLEAN (#106315) 2023-07-02 15:07:46 +09:00
Eric Snow
a9c6e0618f
gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules.  We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
2023-05-05 21:11:27 +00:00
Zackery Spytz
665730d217
bpo-23224: Fix segfaults and multiple leaks in the lzma and bz2 modules (GH-7822)
lzma.LZMADecompressor and bz2.BZ2Decompressor objects caused
segfaults when their `__init__()` methods were not called.

lzma.LZMADecompressor, lzma.LZMACompressor, bz2.BZ2Compressor,
and bz2.BZ2Decompressor objects would leak locks and internal buffers
when their `__init__()` methods were called multiple times.


https://bugs.python.org/issue23224
2023-02-23 06:00:58 -08:00
Dong-hee Na
d168c728f7
bpo-46541: Remove usage of _Py_IDENTIFIER from lzma module (GH-31683) 2022-03-05 01:38:56 +09:00
Eric Snow
81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Victor Stinner
aac29af678
bpo-45434: pyport.h no longer includes <stdlib.h> (GH-28914)
Include <stdlib.h> explicitly in C files.

Python.h includes <wchar.h>.
2021-10-13 19:25:53 +02:00
Victor Stinner
c63623a0a6
bpo-45434: bytearrayobject.h no longer includes <stdarg.h> (GH-28913)
bytearrayobject.h and _lzmamodule.c don't use va_list and so don't
need to include <stdarg.h>.
2021-10-13 04:37:55 +02:00
Erlend Egeberg Aasland
00710e6346
bpo-43908: Make heap types converted during 3.10 alpha immutable (GH-26351)
* Make functools types immutable

* Multibyte codec types are now immutable

* pyexpat.xmlparser is now immutable

* array.arrayiterator is now immutable

* _thread types are now immutable

* _csv types are now immutable

* _queue.SimpleQueue is now immutable

* mmap.mmap is now immutable

* unicodedata.UCD is now immutable

* sqlite3 types are now immutable

* _lsprof.Profiler is now immutable

* _overlapped.Overlapped is now immutable

* _operator types are now immutable

* winapi__overlapped.Overlapped is now immutable

* _lzma types are now immutable

* _bz2 types are now immutable

* _dbm.dbm and _gdbm.gdbm are now immutable
2021-06-17 11:06:09 +01:00
Ma Lin
251ffa9d2b
bpo-41486: Fix initial buffer size can't > UINT32_MAX in zlib module (GH-25738)
* Fix initial buffer size can't > UINT32_MAX in zlib module

After commit f9bedb630e, in 64-bit build,
if the initial buffer size > UINT32_MAX, ValueError will be raised.

These two functions are affected:
1. zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
2. zlib.Decompress.flush([length])

This commit re-allows the size > UINT32_MAX.

* adds curly braces per PEP 7.

* Renames `Buffer_*` to `OutputBuffer_*` for clarity
2021-04-30 16:32:49 -07:00
Ma Lin
f9bedb630e
bpo-41486: Faster bz2/lzma/zlib via new output buffering (GH-21740)
Faster bz2/lzma/zlib via new output buffering.
Also adds .readall() function to _compression.DecompressReader class
to take best advantage of this in the consume-all-output at once scenario.

Often a 5-20% speedup in common scenarios due to less data copying.

Contributed by Ma Lin.
2021-04-27 23:58:54 -07:00
Serhiy Storchaka
8cd1dbae32
bpo-41052: Fix pickling heap types implemented in C with protocols 0 and 1 (GH-22870) 2020-10-24 21:14:23 +03:00
Dong-hee Na
1937edd376
bpo-1635741: Port _lzma module to multiphase initialization (GH-19382) 2020-06-23 00:53:07 +09:00
Victor Stinner
4a21e57fe5
bpo-40268: Remove unused structmember.h includes (GH-19530)
If only offsetof() is needed: include stddef.h instead.

When structmember.h is used, add a comment explaining that
PyMemberDef is used.
2020-04-15 02:35:41 +02:00
Victor Stinner
62183b8d6d
bpo-40268: Remove explicit pythread.h includes (#19529)
Remove explicit pythread.h includes: it is always included
by Python.h.
2020-04-15 02:04:42 +02:00
Andy Lester
7668a8bc93
Use calloc-based functions, not malloc. (GH-19152) 2020-03-24 23:26:44 -05:00
Dong-hee Na
05e4a296ec
bpo-40024: Add PyModule_AddType() helper function (GH-19088) 2020-03-22 17:17:34 +01:00
animalize
4ffd05d7ec bpo-21872: fix lzma library decompresses data incompletely (GH-14048)
* 1. add test case with wrong behavior
* 2. fix bug when max_length == -1
* 3. allow b"" as valid input data for decompress_buf()
* 4. when max_length >= 0, let needs_input mechanism works
* add more asserts to test case
2019-09-12 15:20:37 +01:00
Jeroen Demeyer
530f506ac9 bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464)
Automatically replace
tp_print -> tp_vectorcall_offset
tp_compare -> tp_as_async
tp_reserved -> tp_as_async
2019-05-30 19:13:39 -07:00
Serhiy Storchaka
d53fe5f407
bpo-36254: Fix invalid uses of %d in format strings in C. (GH-12264) 2019-03-13 22:59:55 +02:00
Serhiy Storchaka
0353b4eaaf
bpo-33138: Change standard error message for non-pickleable and non-copyable types. (GH-6239) 2018-10-31 02:28:07 +02:00
Alexey Izbyshev
3d4fabb2a4 bpo-35090: Fix potential division by zero in allocator wrappers (GH-10174)
* Fix potential division by zero in BZ2_Malloc()
* Avoid division by zero in PyLzma_Malloc()
* Avoid division by zero and integer overflow in PyZlib_Malloc()

Reported by Svace static analyzer.
2018-10-28 17:45:50 +01:00
Victor Stinner
9b7cf75721
bpo-33916: Fix bz2 and lzma init when called twice (GH-7843)
bz2, lzma: When Decompressor.__init__() is called twice, free the old
lock to not leak memory.
2018-06-23 10:35:23 +02:00
Oren Milman
d019bc8319 bpo-31787: Prevent refleaks when calling __init__() more than once (GH-3995) 2018-02-13 19:28:33 +09:00
Antoine Pitrou
a6a4dc816d bpo-31370: Remove support for threads-less builds (#3385)
* Remove Setup.config
* Always define WITH_THREAD for compatibility.
2017-09-07 18:56:24 +02:00
Ville Skyttä
49b2734bf1 Spelling fixes (#2902) 2017-08-03 09:00:59 +03:00
Serhiy Storchaka
88b2219358 Issue #27517: LZMA compressor and decompressor no longer raise exceptions if
given empty data twice.  Patch by Benjamin Fogle.
2016-10-31 08:31:13 +02:00
Serhiy Storchaka
04f17f103a Issue #27517: LZMA compressor and decompressor no longer raise exceptions if
given empty data twice.  Patch by Benjamin Fogle.
2016-10-31 08:30:09 +02:00
Serhiy Storchaka
a12e7842a5 Issue #28275: Fixed possible use adter free in LZMADecompressor.decompress().
Original patch by John Leitch.
2016-09-27 20:23:41 +03:00
Serhiy Storchaka
c0b7037d4f Issue #28275: Fixed possible use adter free in LZMADecompressor.decompress().
Original patch by John Leitch.
2016-09-27 20:14:26 +03:00
Benjamin Peterson
af580dff4a replace PY_LONG_LONG with long long 2016-09-06 10:46:49 -07:00
Serhiy Storchaka
2954f83999 - Issue #27332: Fixed the type of the first argument of module-level functions
generated by Argument Clinic.  Patch by Petr Viktorin.
2016-07-07 18:20:03 +03:00
Serhiy Storchaka
1a2b24f02d Issue #27332: Fixed the type of the first argument of module-level functions
generated by Argument Clinic.  Patch by Petr Viktorin.
2016-07-07 17:35:15 +03:00