Commit graph

151 commits

Author SHA1 Message Date
Victor Stinner
db9e7556c3
[3.13] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269) (#151283) (#151287)
[3.14][3.15] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269) (#151283)

[3.15] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269)

gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250)

If "import encodings" fails at Python startup, dump the Python path
configuration to help users debugging their configuration. The
encodings module is the first module imported during Python startup.

(cherry picked from commit 7b6e98911e)
(cherry picked from commit 10f616cf39)
(cherry picked from commit b3a7758d8a)
2026-06-10 23:24:46 +02:00
Miss Islington (bot)
f2d6931656
[3.13] gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (GH-137415) (#137461)
gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (GH-137415)

Fix name of the Python encoding in Unicode errors of the code page
codec: use "cp65000" and "cp65001" instead of "CP_UTF7" and "CP_UTF8"
which are not valid Python code names.
(cherry picked from commit ce1b747ff6)

Co-authored-by: Victor Stinner <vstinner@python.org>
2025-08-06 12:59:11 +00:00
Petr Viktorin
9769b7ae06
[3.13] gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) (GH-120945)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-24 20:24:19 +02:00
Brett Simmers
f8290df63f
gh-116738: Make _codecs module thread-safe (#117530)
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:

- Move codecs-related state that lives on `PyInterpreterState` to a
  struct declared in `pycore_codecs.h`.

- In free-threaded builds, add a mutex to `codecs_state` to synchronize
  operations on `search_path`. Because `search_path_mutex` is used as a
  normal mutex and not a critical section, we must be extremely careful
  with operations called while holding it.

- The codec registry is explicitly initialized as part of
  `_PyUnicode_InitEncodings` to simplify thread-safety.
2024-05-02 18:25:36 -04:00
Kirill Podoprigora
0785c68559
gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249) 2023-11-30 11:12:49 +01:00
Serhiy Storchaka
aa438bdd6d
gh-111789: Use PyDict_GetItemRef() in Python/codecs.c (gh-112082) 2023-11-27 18:53:43 +01:00
Victor Stinner
03c4080c71
gh-108765: Python.h no longer includes <ctype.h> (#108831)
Remove <ctype.h> in C files which don't use it; only sre.c and
_decimal.c still use it.

Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:

* Code added by commit b5047fd019
  in 2004 for MacOSX and FreeBSD.
* Test removed by commit 52ddaefb6b
  in 2007, since Python str type now uses locale independent
  functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode
  database.

Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new
functions: sre_isalnum(), sre_tolower(), sre_toupper().

Remove unused includes:

* _localemodule.c: remove <stdio.h>.
* getargs.c: remove <float.h>.
* dynload_win.c: remove <direct.h>, it no longer calls _getcwd()
  since commit fb1f68ed7c (in 2001).
2023-09-03 18:54:27 +02:00
Victor Stinner
4dc9f48930
gh-108308: Replace _PyDict_GetItemStringWithError() (#108372)
Replace _PyDict_GetItemStringWithError() calls with
PyDict_GetItemStringRef() which returns a strong reference to the
item.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-08-23 22:59:00 +02:00
Victor Stinner
615f6e946d
gh-106320: Remove _PyDict_GetItemStringWithError() function (#108313)
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.

* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
  Avoid using private functions in _testcapi which tests the public C
  API.
2023-08-22 18:17:25 +00:00
Serhiy Storchaka
be1b968dc1
gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) 2023-07-12 08:57:10 +03:00
Victor Stinner
bc7eb17084
gh-106320: Use _PyInterpreterState_GET() (#106336)
Replace PyInterpreterState_Get() with inlined
_PyInterpreterState_GET().
2023-07-02 16:37:37 +00:00
Irit Katriel
55c99d97e1
gh-77757: replace exception wrapping by PEP-678 notes in typeobject's __set_name__ (#103402) 2023-04-11 11:53:06 +01:00
Irit Katriel
76350e85eb
gh-102406: replace exception chaining by PEP-678 notes in codecs (#102407) 2023-03-21 21:36:31 +00:00
Victor Stinner
8211cf5d28
gh-99300: Replace Py_INCREF() with Py_NewRef() (#99530)
Replace Py_INCREF() and Py_XINCREF() using a cast with Py_NewRef()
and Py_XNewRef().
2022-11-16 18:34:24 +01:00
Victor Stinner
d8f239d86e
gh-99300: Use Py_NewRef() in Python/ directory (#99302)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Python/ directory.
2022-11-10 09:03:39 +01:00
Eric Snow
81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Kumar Aditya
41026c3155
bpo-45855: Replaced deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Victor Stinner
d943d19172
bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895)
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
  PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
  defined to get access to internal _PyObject_CallNoArgs().
2021-10-12 08:38:19 +02:00
Victor Stinner
ce3489cfdb
bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891)
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
2021-10-12 00:42:23 +02:00
Victor Stinner
920cb647ba
bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:

  * Remove size and state members
  * Remove state and self parameters of getcode() and getname()
    functions

* Remove global_module_state
2020-10-26 19:19:36 +01:00
Victor Stinner
47e1afd2a1
bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.

* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
2020-10-26 16:43:47 +01:00
Hai Shi
c9f696cb96
bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Hai Shi
d332e7b816
bpo-41842: Add codecs.unregister() function (GH-22360)
Add codecs.unregister() and PyCodec_Unregister() functions
to unregister a codec search function.
2020-09-28 23:41:11 +02:00
Victor Stinner
e5014be049
bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) 2020-04-14 17:52:15 +02:00
Victor Stinner
81a7be3fa2
bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).

Add also "assert(tstate != NULL);" to the function.
2020-04-14 15:14:01 +02:00
Victor Stinner
4a3fe08353
bpo-40268: Include explicitly pycore_interp.h (GH-19505)
pycore_pystate.h no longer includes pycore_interp.h:
it's now included explicitly in files accessing PyInterpreterState.
2020-04-14 14:26:24 +02:00
Serhiy Storchaka
cd8295ff75
bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Victor Stinner
ff4584caca
bpo-39947: Use _PyInterpreterState_GET_UNSAFE() (GH-18978)
Replace _PyInterpreterState_Get() function call with
_PyInterpreterState_GET_UNSAFE() macro which is more efficient but
don't check if tstate or interp is NULL.

_Py_GetConfigsAsDict() now uses _PyThreadState_GET().
2020-03-13 18:03:56 +01:00
Andy Lester
7386a70746
closes bpo-39630: Update pointers to string literals to be const char *. (GH-18510) 2020-02-13 20:42:56 -08:00
Petr Viktorin
ffd9753a94
bpo-39245: Switch to public API for Vectorcall (GH-18460)
The bulk of this patch was generated automatically with:

    for name in \
        PyObject_Vectorcall \
        Py_TPFLAGS_HAVE_VECTORCALL \
        PyObject_VectorcallMethod \
        PyVectorcall_Function \
        PyObject_CallOneArg \
        PyObject_CallMethodNoArgs \
        PyObject_CallMethodOneArg \
    ;
    do
        echo $name
        git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
    done

    old=_PyObject_FastCallDict
    new=PyObject_VectorcallDict
    git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

and then cleaned up:

- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
2020-02-11 17:46:57 +01:00
Victor Stinner
a102ed7d2f
bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391)
Replace direct access to PyObject.ob_type with Py_TYPE().
2020-02-07 02:24:48 +01:00
Victor Stinner
d3a1de2270
bpo-38631: Avoid Py_FatalError() in _PyCodecRegistry_Init() (GH-18217)
_PyCodecRegistry_Init() now reports exceptions to the caller,
rather than calling Py_FatalError().
2020-01-27 23:23:12 +01:00
Jordon Xu
20f59fe1f7 bpo-37751: Fix codecs.lookup() normalization (GH-15092)
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
2019-08-21 14:26:20 +01:00
Jeroen Demeyer
1dbd084f1f bpo-29548: no longer use PyEval_Call* functions (GH-14683) 2019-07-12 00:57:32 +09:00
Jeroen Demeyer
6e43d07324 bpo-37483: fix reference leak in _PyCodec_Lookup (GH-14600) 2019-07-05 19:57:32 +09:00
Jeroen Demeyer
196a530e00 bpo-37483: add _PyObject_CallOneArg() function (#14558) 2019-07-04 19:31:34 +09:00
Serhiy Storchaka
a24107b04c
bpo-35459: Use PyDict_GetItemWithError() instead of PyDict_GetItem(). (GH-11112) 2019-02-25 17:59:46 +02:00
Serhiy Storchaka
8905fcc85a
bpo-35454: Fix miscellaneous minor issues in error handling. (#11077)
* bpo-35454: Fix miscellaneous minor issues in error handling.

* Fix a null pointer dereference.
2018-12-11 08:38:03 +02:00
Victor Stinner
621cebe81b
bpo-35081: Rename internal headers (GH-10275)
Rename Include/internal/ headers:

* pycore_hash.h -> pycore_pyhash.h
* pycore_lifecycle.h -> pycore_pylifecycle.h
* pycore_mem.h -> pycore_pymem.h
* pycore_state.h -> pycore_pystate.h

Add missing headers to Makefile.pre.in and PCbuild:

* pycore_condvar.h.
* pycore_hamt.h
* pycore_pyhash.h
2018-11-12 16:53:38 +01:00
Victor Stinner
27e2d1f219
bpo-35081: Add pycore_ prefix to internal header files (GH-10263)
* Rename Include/internal/ header files:

  * pyatomic.h -> pycore_atomic.h
  * ceval.h -> pycore_ceval.h
  * condvar.h -> pycore_condvar.h
  * context.h -> pycore_context.h
  * pygetopt.h -> pycore_getopt.h
  * gil.h -> pycore_gil.h
  * hamt.h -> pycore_hamt.h
  * hash.h -> pycore_hash.h
  * mem.h -> pycore_mem.h
  * pystate.h -> pycore_state.h
  * warnings.h -> pycore_warnings.h

* PCbuild project, Makefile.pre.in, Modules/Setup: add the
  Include/internal/ directory to the search paths of header files.
* Update includes. For example, replace #include "internal/mem.h"
  with #include "pycore_mem.h".
2018-11-01 00:52:28 +01:00
Victor Stinner
caba55b3b7
bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592)
sys_setcheckinterval() now uses a local variable to parse arguments,
before writing into interp->check_interval.
2018-08-03 15:33:52 +02:00
INADA Naoki
0c1c4563a6
bpo-33231: Fix potential leak in normalizestring() (GH-6386) 2018-04-06 15:51:24 +09:00
Serhiy Storchaka
f320be77ff bpo-32571: Avoid raising unneeded AttributeError and silencing it in C code (GH-5222)
Add two new private APIs: _PyObject_LookupAttr() and _PyObject_LookupAttrId()
2018-01-25 17:49:40 +09:00
Eric Snow
2ebc5ce42a bpo-30860: Consolidate stateful runtime globals. (#3397)
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals

Other globals are excluded (see globals.txt and check-c-globals.py).
2017-09-07 23:51:28 -06:00
Victor Stinner
7bfb42d5b7 Issue #28858: Remove _PyObject_CallArg1() macro
Replace
   _PyObject_CallArg1(func, arg)
with
   PyObject_CallFunctionObjArgs(func, arg, NULL)

Using the _PyObject_CallArg1() macro increases the usage of the C stack, which
was unexpected and unwanted. PyObject_CallFunctionObjArgs() doesn't have this
issue.
2016-12-05 17:04:32 +01:00
Victor Stinner
4778eab1f2 Replace PyObject_CallFunction() with fastcall
Replace
    PyObject_CallFunction(func, "O", arg)
and
    PyObject_CallFunction(func, "O", arg, NULL)
with
    _PyObject_CallArg1(func, arg)

Replace
    PyObject_CallFunction(func, NULL)
with
    _PyObject_CallNoArg(func)

_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.
2016-12-01 14:51:04 +01:00
Serhiy Storchaka
85b0f5beb1 Added the const qualifier to char* variables that refer to readonly internal
UTF-8 represenatation of Unicode objects.
2016-11-20 10:16:47 +02:00
Serhiy Storchaka
cb33a01bbc Issue #28510: Clean up decoding error handlers.
Since PyUnicodeDecodeError_GetObject() always returns bytes, following
PyBytes_AsString() can be replaced with PyBytes_AS_STRING().
2016-10-23 09:44:50 +03:00
Martin Panter
6245cb3c01 Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
This affects documentation, code comments, and a debugging messages.
2016-04-15 02:14:19 +00:00
Victor Stinner
38b8ae0f5b Issue #24993: Handle import error in namereplace error handler
Handle PyCapsule_Import() failure (exception) in PyCodec_NameReplaceErrors():
return immedialty NULL.
2015-09-03 16:19:40 +02:00