Commit graph

90 commits

Author SHA1 Message Date
Victor Stinner
7aa353c414
gh-142217: Deprecate the private _Py_Identifier C API (#142221)
Deprecate functions:

* _PyObject_CallMethodId()
* _PyObject_GetAttrId()
* _PyUnicode_FromId()
2025-12-12 14:10:25 +01:00
Victor Stinner
3bacae5598
gh-131510: Use PyUnstable_Unicode_GET_CACHED_HASH() (GH-141520)
Replace code that directly accesses PyASCIIObject.hash with
PyUnstable_Unicode_GET_CACHED_HASH().

Remove redundant "assert(PyUnicode_Check(op))" from
PyUnstable_Unicode_GET_CACHED_HASH(), _PyASCIIObject_CAST() already
implements the check.
2025-11-14 11:13:24 +01:00
Arseniy Terekhin
a6566e49e6
gh-92536: Fix comment about number of unicode string types (#136439) 2025-07-08 23:49:55 +02:00
Petr Viktorin
49d72365cd
gh-127545: Add _Py_ALIGNED_DEF(N, T) and use it for PyObject (GH-135209)
* Replace _Py_ALIGN_AS(V) by _Py_ALIGNED_DEF(N, T)

This is now a common façade for the various `_Alignas` alternatives,
which behave in interesting ways -- see the source comment.

The new macro (and MSVC's `__declspec(align)`) should not be used
on a variable/member declaration that includes a struct declaraton.
A workaround is to separate the struct definition.
Do that for `PyASCIIObject.state`.

* Specify minimum PyGC_Head and PyObject alignment

As documented in InternalDocs/garbage_collector.md, the garbage collector
stores flags in the least significant two bits of the _gc_prev pointer
in struct PyGC_Head. Consequently, this pointer is only capable of storing
a location that's aligned to a 4-byte boundary.

Encode this requirement using _Py_ALIGNED_DEF.

This patch fixes a segfault in m68k, which was previously investigated
by Adrian Glaubitz here:
https://lists.debian.org/debian-68k/2024/11/msg00020.html
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087600
Original patch (using the GCC-only Py_ALIGNED) by Finn Thain.

Co-authored-by: Finn Thain <fthain@linux-m68k.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-06-11 12:44:58 +02:00
Petr Viktorin
e413e26719
gh-134891: Add PyUnstable_Unicode_GET_CACHED_HASH (GH-134892) 2025-06-06 15:51:00 +02:00
Victor Stinner
f49a07b531
gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973)
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().

Unrelated change to please the linter: remove an unused
import in test_ctypes.

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-05-29 14:54:30 +00:00
Petr Viktorin
987e45e632
gh-128972: Add _Py_ALIGN_AS and revert PyASCIIObject memory layout. (GH-133085)
Add `_Py_ALIGN_AS` as per C API WG vote: https://github.com/capi-workgroup/decisions/issues/61
This patch only adds it to free-threaded builds; the `#ifdef Py_GIL_DISABLED`
can be removed in the future.

Use this to revert `PyASCIIObject` memory layout for non-free-threaded builds.
The long-term plan is to deprecate the entire struct; until that happens
it's better to keep it unchanged, as courtesy to people that rely on it despite
it not being stable ABI.
2025-05-02 18:30:40 +02:00
Sergey Miryanov
3a7f17c7e2
gh-130790: Remove references about unicode's readiness from comments (#130801) 2025-03-03 19:18:09 +00:00
Petr Viktorin
e21863ce78
gh-46236: PyUnicode docs improvements (GH-129966)
Move deprecated PyUnicode API docs to new section

Move Py_UNICODE to a new "Deprecated API" section.

Formally soft-deprecate PyUnicode_READY, and move it

Document and soft-deprecate PyUnicode_IS_READY, and move it

Document PyUnicode_IS_ASCII, PyUnicode_CHECK_INTERNED

PyUnicode_New docs: Clarify requirements for "fresh" strings

PyUnicodeWriter_DecodeUTF8Stateful: Link "error-handlers"



Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2025-02-28 15:11:44 +01:00
Victor Stinner
519c2c6740
gh-128863: Deprecate the private _PyUnicodeWriter API (#129245)
Deprecate private C API functions:

* _PyUnicodeWriter_Init()
* _PyUnicodeWriter_Finish()
* _PyUnicodeWriter_Dealloc()
* _PyUnicodeWriter_WriteChar()
* _PyUnicodeWriter_WriteStr()
* _PyUnicodeWriter_WriteSubstring()
* _PyUnicodeWriter_WriteASCIIString()
* _PyUnicodeWriter_WriteLatin1String()

These functions are not deprecated in the internal C API (if the
Py_BUILD_CORE macro is defined).
2025-02-20 14:02:02 +01:00
Victor Stinner
a810cb89f1
gh-89188: Implement PyUnicode_KIND() as a function (#129412)
Implement PyUnicode_KIND() and PyUnicode_DATA() as function, in
addition to the macros with the same names. The macros rely on C bit
fields which have compiler-specific layout.
2025-01-30 11:27:27 +00:00
Victor Stinner
9012fa741d
gh-128863: Deprecate private C API functions (#128864)
Deprecate private C API functions:

* _PyBytes_Join()
* _PyDict_GetItemStringWithError()
* _PyDict_Pop()
* _PyThreadState_UncheckedGet()
* _PyUnicode_AsString()
* _Py_HashPointer()
* _Py_fopen_obj()

Replace _Py_HashPointer() with Py_HashPointer().

Remove references to deprecated functions.
2025-01-22 11:04:19 +00:00
Donghee Na
ae23a012e6
gh-128137: Update PyASCIIObject to handle interned field with the atomic operation (gh-128196) 2025-01-05 18:17:06 +09:00
Victor Stinner
2e157851e3
gh-119182: Add PyUnicodeWriter_WriteUCS4() function (#120849) 2024-06-24 17:40:39 +02:00
Victor Stinner
4123226bbd
gh-119182: Add PyUnicodeWriter_DecodeUTF8Stateful() (#120639)
Add PyUnicodeWriter_WriteWideChar() and
PyUnicodeWriter_DecodeUTF8Stateful() functions.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-06-21 19:33:15 +02:00
Victor Stinner
5c4235cd8c
gh-119182: Add PyUnicodeWriter C API (#119184) 2024-06-17 17:10:52 +02:00
Victor Stinner
58469244be
gh-112026: Restore removed private C API (#112115)
Restore removed private C API functions, macros and structures which
have no simple replacement for now:

* _PyDict_GetItem_KnownHash()
* _PyDict_NewPresized()
* _PyHASH_BITS
* _PyHASH_IMAG
* _PyHASH_INF
* _PyHASH_MODULUS
* _PyHASH_MULTIPLIER
* _PyLong_Copy()
* _PyLong_FromDigits()
* _PyLong_New()
* _PyLong_Sign()
* _PyObject_CallMethodId()
* _PyObject_CallMethodNoArgs()
* _PyObject_CallMethodOneArg()
* _PyObject_CallOneArg()
* _PyObject_EXTRA_INIT
* _PyObject_FastCallDict()
* _PyObject_GetAttrId()
* _PyObject_Vectorcall()
* _PyObject_VectorcallMethod()
* _PyStack_AsDict()
* _PyThread_CurrentFrames()
* _PyUnicodeWriter structure
* _PyUnicodeWriter_Dealloc()
* _PyUnicodeWriter_Finish()
* _PyUnicodeWriter_Init()
* _PyUnicodeWriter_Prepare()
* _PyUnicodeWriter_PrepareKind()
* _PyUnicodeWriter_WriteASCIIString()
* _PyUnicodeWriter_WriteChar()
* _PyUnicodeWriter_WriteLatin1String()
* _PyUnicodeWriter_WriteStr()
* _PyUnicodeWriter_WriteSubstring()
* _PyUnicode_AsString()
* _PyUnicode_FromId()
* _PyVectorcall_Function()
* _Py_HashDouble()
* _Py_HashPointer()
* _Py_IDENTIFIER()
* _Py_c_abs()
* _Py_c_diff()
* _Py_c_neg()
* _Py_c_pow()
* _Py_c_prod()
* _Py_c_quot()
* _Py_c_sum()
* _Py_static_string()
* _Py_static_string_init()
2023-11-15 16:38:31 +00:00
Victor Stinner
11e83488c5
gh-111089: Revert PyUnicode_AsUTF8() changes (#111833)
* Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)"

This reverts commit d9b606b3d0.

* Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)"

This reverts commit cde1071b2a.

* Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)"

This reverts commit d731579bfb.

* Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)"

This reverts commit d8f32be5b6.

* Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)"

This reverts commit 37e4e20eaa.
2023-11-07 22:36:13 +00:00
Victor Stinner
d8f32be5b6
gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)
Add PyUnicode_AsUTF8() function to the limited C API.

multiprocessing posixshmem now uses PyUnicode_AsUTF8() instead of
PyUnicode_AsUTF8AndSize(): the extension is built with the limited C
API. The function now raises an exception if the filename contains an
embedded null character instead of truncating silently the filename.
2023-10-20 19:29:27 +02:00
Victor Stinner
d731579bfb
gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)
* PyUnicode_AsUTF8() now raises an exception if the string contains
  embedded null characters.
* Update related C API tests (test_capi.test_unicode).
* type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently
  truncate doc containing null bytes.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-10-20 17:59:29 +02:00
Eric Snow
b72947a8d2
gh-106931: Intern Statically Allocated Strings Globally (gh-107272)
We tried this before with a dict and for all interned strings.  That ran into problems due to interpreter isolation.  However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning.  Here we circle back to using a global cache, but only for statically allocated strings.  We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.

Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter.  Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
2023-07-27 13:56:59 -06:00
Victor Stinner
d27eb1e406
gh-106320: Remove private _PyUnicode C API (#107185)
Move private _PyUnicode functions to the internal C API
(pycore_unicodeobject.h):

* _PyUnicode_IsCaseIgnorable()
* _PyUnicode_IsCased()
* _PyUnicode_IsXidContinue()
* _PyUnicode_IsXidStart()
* _PyUnicode_ToFoldedFull()
* _PyUnicode_ToLowerFull()
* _PyUnicode_ToTitleFull()
* _PyUnicode_ToUpperFull()

No longer export these functions.
2023-07-24 18:26:29 +00:00
Victor Stinner
463b56da12
gh-106320: Remove private _PyUnicode_AsString() alias (#107021)
Remove private _PyUnicode_AsString() alias to PyUnicode_AsUTF8(). It
was kept for backward compatibility with Python 3.0 - 3.2.

The PyUnicode_AsUTF8() is available since Python
3.3. The PyUnicode_AsUTF8String() function can be used to keep
compatibility with Python 3.2 and older.
2023-07-22 13:15:05 +00:00
Victor Stinner
8a73b57b9b
gh-106320: Remove _PyUnicode_TransformDecimalAndSpaceToASCII() (#106398)
Remove private _PyUnicode_TransformDecimalAndSpaceToASCII() and other
private _PyUnicode C API functions: move them to the internal C API
(pycore_unicodeobject.h). No longer most of these functions.

Replace _testcapi.unicode_transformdecimalandspacetoascii() with
_testinternal._PyUnicode_TransformDecimalAndSpaceToASCII().
2023-07-04 08:59:09 +00:00
Victor Stinner
d8c5d76da2
gh-106320: Remove private _PyUnicode codecs C API functions (#106385)
Remove private _PyUnicode codecs C API functions: move them to the
internal C API (pycore_unicodeobject.h). No longer export most of
these functions.
2023-07-04 07:29:52 +00:00
Victor Stinner
506cfdf141
gh-106320: Remove more private _PyUnicode C API functions (#106382)
Remove more private _PyUnicode C API functions:
move them to the internal C API (pycore_unicodeobject.h).

No longer export most pycore_unicodeobject.h functions.
2023-07-03 22:35:46 +00:00
Victor Stinner
5ccbbe5bb9
gh-106320: Move _PyUnicodeWriter to the internal C API (#106342)
Move also _PyUnicode_FormatAdvancedWriter().

CJK codecs and multibytecodec.c now define the Py_BUILD_CORE_MODULE
macro.
2023-07-03 10:23:43 +02:00
Victor Stinner
7d07e5891d
gh-105156: Cleanup usage of old Py_UNICODE type (#105158)
* refcounts.dat:

  * Remove Py_UNICODE functions.
  * Replace Py_UNICODE argument type with wchar_t.

* _PyUnicode_ToLowercase(), _PyUnicode_ToUppercase(),
  _PyUnicode_ToTitlecase() are no longer deprecated in comments.
  It's no longer needed since they now use Py_UCS4 type, rather than
  the deprecated Py_UNICODE type.
* gdb: Remove unused char_width() method.
2023-06-01 07:18:09 +00:00
Victor Stinner
8ed705c083
gh-105156: Deprecate the old Py_UNICODE type in C API (#105157)
Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API:
use wchar_t instead.

Replace Py_UNICODE with wchar_t in multiple C files.

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2023-06-01 08:56:35 +02:00
Eddie Elizondo
ea2c001650
gh-84436: Implement Immortal Objects (gh-19474)
This is the implementation of PEP683

Motivation:

The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.

Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
2023-04-22 13:39:37 -06:00
Victor Stinner
abbe4482ab
PyUnicode_KIND() uses _Py_RVALUE() (#100060)
The PyUnicode_KIND() macro is modified to use _Py_RVALUE(), so it can
no longer be used as a l-value.
2022-12-06 23:40:05 +01:00
Victor Stinner
02f72b8b93
gh-89653: PEP 670: Convert macros to functions (#99843)
Convert macros to static inline functions to avoid macro pitfalls,
like duplication of side effects:

* DK_ENTRIES()
* DK_UNICODE_ENTRIES()
* PyCode_GetNumFree()
* PyFloat_AS_DOUBLE()
* PyInstanceMethod_GET_FUNCTION()
* PyMemoryView_GET_BASE()
* PyMemoryView_GET_BUFFER()
* PyMethod_GET_FUNCTION()
* PyMethod_GET_SELF()
* PySet_GET_SIZE()
* _PyHeapType_GET_MEMBERS()

Changes:

* PyCode_GetNumFree() casts PyCode_GetNumFree.co_nfreevars from int
  to Py_ssize_t to be future proof, and because Py_ssize_t is
  commonly used in the C API.
* PyCode_GetNumFree() doesn't cast its argument: the replaced macro
  already required the exact type PyCodeObject*.
* Add assertions in some functions using "CAST" macros to check
  the arguments type when Python is built with assertions
  (debug build).
* Remove an outdated comment in unicodeobject.h.
2022-11-28 16:40:08 +01:00
David Hewitt
b4d54a332e
gh-99706: unicodeobject: Fix padding in PyASCIIObject.state (GH-99707) 2022-11-24 17:21:59 +09:00
Nikita Sobolev
76f989dc3e
gh-98783: Fix crashes when str subclasses are used in _PyUnicode_Equal (#98806) 2022-10-30 02:23:20 -04:00
Victor Stinner
df22eec421
gh-89653: PEP 670: Macros always cast arguments in cpython/ (#93766)
Header files in the Include/cpython/ are only included if
the Py_LIMITED_API macro is not defined.
2022-06-13 20:09:40 +02:00
Wenzel Jakob
b2694ab469
gh-92536: Mark PyUnicode_READY() argument as unused (#93011) 2022-05-23 16:15:09 +02:00
Victor Stinner
1305832362
gh-89653: Add assertions on PyUnicode_READ() index (#92883)
Add assertions on the index argument of PyUnicode_READ(),
PyUnicode_READ_CHAR() and PyUnicode_WRITE() functions.
2022-05-17 19:43:19 +02:00
Victor Stinner
e6fd7992a9
gh-89653: PEP 670: Fix PyUnicode_READ() cast (#92872)
_Py_CAST() cannot be used with a constant type: use _Py_STATIC_CAST()
instead.
2022-05-17 19:20:37 +02:00
Victor Stinner
90e7230073
gh-92781: Avoid mixing declarations and code in C API (#92783)
Avoid mixing declarations and code in the C API to fix the compiler
warning: "ISO C90 forbids mixed declarations and code"
[-Werror=declaration-after-statement].
2022-05-15 11:19:52 +02:00
Victor Stinner
059b5baf98
gh-85858: Remove PyUnicode_InternImmortal() function (#92579)
Remove the PyUnicode_InternImmortal() function and the
SSTATE_INTERNED_IMMORTAL macro.

The PyUnicode_InternImmortal() function is still exported in the
stable ABI. The function is removed from the API.

PyASCIIObject.state.interned size is now a single bit, rather than 2
bits.

Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for
backward compatibility, but no longer use them internally since the
interned member is now a single bit and so can only have two values
(interned or not interned).

Update stats of _PyUnicode_ClearInterned().
2022-05-13 13:40:22 +02:00
Victor Stinner
f62ad4f2c4
gh-89653: Use int type for Unicode kind (#92704)
Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.
2022-05-13 12:41:05 +02:00
Victor Stinner
db388df1d9
gh-89653: PEP 670: Convert PyUnicode_KIND() macro to function (#92705)
In the limited C API version 3.12, PyUnicode_KIND() is now
implemented as a static inline function. Keep the macro for the
regular C API and for the limited C API version 3.11 and older to
prevent introducing new compiler warnings.

Update _decimal.c and stringlib/eq.h for PyUnicode_KIND().
2022-05-13 11:49:56 +02:00
Inada Naoki
f9c9354a7a
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537) 2022-05-12 14:48:38 +09:00
Victor Stinner
d0c9353a79
gh-89653: PEP 670: unicodeobject.h uses _Py_CAST() (#92696)
Use _Py_CAST() and _Py_STATIC_CAST() in macros wrapping static inline
functions of unicodeobject.h.

Change also the kind type from unsigned int to int: same parameter
type than PyUnicode_FromKindAndData().

The limited API version 3.11 no longer casts arguments to expected
types.
2022-05-12 01:35:41 +02:00
Victor Stinner
d492f0ab2a
gh-89653: Add assertions to unicodeobject.h functions (#92692) 2022-05-12 00:12:42 +02:00
Victor Stinner
eb88f21301
gh-89653: PEP 670: Convert unicodeobject.h macros to functions (#92648)
Convert the following Unicode macros to static inline functions.

Surrogate functions:

* Py_UNICODE_IS_SURROGATE()
* Py_UNICODE_IS_HIGH_SURROGATE()
* Py_UNICODE_IS_LOW_SURROGATE()
* Py_UNICODE_HIGH_SURROGATE()
* Py_UNICODE_LOW_SURROGATE()
* Py_UNICODE_JOIN_SURROGATES()

"Is" functions:

* Py_UNICODE_ISALNUM()
* Py_UNICODE_ISSPACE()

In the implementation of these functions, the character type is now
well defined to Py_UCS4.
2022-05-11 23:28:39 +02:00
Victor Stinner
551d02b3e6
gh-91321: Add _Py_NULL macro (#92253)
Fix C++ compiler warnings: "zero as null pointer constant"
(clang -Wzero-as-null-pointer-constant).

* Add the _Py_NULL macro used by static inline functions to use
  nullptr in C++.
* Replace NULL with nullptr in _testcppext.cpp.
2022-05-03 21:38:37 +02:00
Victor Stinner
ff3e9cdf33
gh-91320: Fix more old-style cast warnings in C++ (#92247)
Use _Py_CAST(), _Py_STATIC_CAST() and _PyASCIIObject_CAST() in
static inline functions to fix C++ compiler warnings:
"use of old-style cast" (clang -Wold-style-cast).

test_cppext now builds the C++ test extension with -Wold-style-cast.
2022-05-03 20:47:29 +02:00
Victor Stinner
fbd5539a54
gh-92135: Rename _Py_reinterpret_cast() to _Py_CAST() (#92230)
Rename also _Py_static_cast() to _Py_STATIC_CAST().
2022-05-03 16:37:06 +02:00
Victor Stinner
29e2245ad5
gh-91320: Add _Py_reinterpret_cast() macro (#91959)
Fix C++ compiler warnings about "old-style cast"
(g++ -Wold-style-cast) in the Python C API.  Use C++
reinterpret_cast<> and static_cast<> casts when the Python C API is
used in C++.

Example of fixed warning:

    Include/object.h:107:43: error: use of old-style cast to
    ‘PyObject*’ {aka ‘struct _object*’} [-Werror=old-style-cast]
    #define _PyObject_CAST(op) ((PyObject*)(op))

Add _Py_reinterpret_cast() and _Py_static_cast() macros.
2022-04-27 10:40:57 +02:00