Commit graph

170 commits

Author SHA1 Message Date
Victor Stinner
10f616cf39
[3.15] gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250) (#151269)
gh-151253: Dump the Python path configuration on _PyCodec_InitRegistry() failure (#151250)

If "import encodings" fails at Python startup, dump the Python path
configuration to help users debugging their configuration. The
encodings module is the first module imported during Python startup.

(cherry picked from commit 7b6e98911e)
2026-06-10 22:03:27 +02:00
Stan Ulbrych
12805ef9da
Python/codecs.c: Remove unused forward declaration (#139511) 2025-10-03 13:33:49 +02:00
Serhiy Storchaka
af58a6f883
gh-88886: Remove excessive encoding name normalization (GH-137167)
The codecs lookup function now performs only minimal normalization of
the encoding name before passing it to the search functions:
all ASCII letters are converted to lower case, spaces are replaced
with hyphens.

Excessive normalization broke third-party codecs providers, like
python-iconv.

Revert "bpo-37751: Fix codecs.lookup() normalization (GH-15092)"

This reverts commit 20f59fe1f7.
2025-09-09 21:07:21 +03:00
Peter Bierma
082f370cdd
gh-137514: Add a free-threading wrapper for mutexes (GH-137515)
Add `FT_MUTEX_LOCK`/`FT_MUTEX_UNLOCK`, which call `PyMutex_Lock` and `PyMutex_Unlock` on the free-threaded build, and no-op otherwise.
2025-08-07 11:24:50 -04:00
Victor Stinner
ce1b747ff6
gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (#137415)
Fix name of the Python encoding in Unicode errors of the code page
codec: use "cp65000" and "cp65001" instead of "CP_UTF7" and "CP_UTF8"
which are not valid Python code names.
2025-08-06 14:35:27 +02:00
Inada Naoki
4e294f6feb
gh-133036: Deprecate codecs.open (#133038)
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-04-30 10:11:09 +09:00
Victor Stinner
20c5f969dd
gh-131238: Remove more includes from pycore_interp.h (#131480) 2025-03-19 23:01:32 +01:00
Victor Stinner
978e37bb5f
gh-131238: Add explicit includes to pycore headers (#131257) 2025-03-17 12:32:43 +01:00
Sergey Miryanov
3a7f17c7e2
gh-130790: Remove references about unicode's readiness from comments (#130801) 2025-03-03 19:18:09 +00:00
Bénédikt Tran
3146a25e97
gh-129173: refactor PyCodec_BackslashReplaceErrors into separate functions (#129895)
The logic of `PyCodec_BackslashReplaceErrors` is now split into separate functions,
each of which handling a specific exception type.
2025-03-03 13:58:15 +01:00
Bénédikt Tran
f693f84227
gh-129173: simplify PyCodec_XMLCharRefReplaceErrors logic (#129894)
Writing the decimal representation of a Unicode codepoint only requires to know the number of digits.

---------

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2025-03-03 11:43:22 +00:00
Bénédikt Tran
fa6a8140dd
gh-129173: refactor PyCodec_ReplaceErrors into separate functions (#129893)
The logic of `PyCodec_ReplaceErrors` is now split into separate functions,
each of which handling a specific exception type.
2025-02-25 14:24:46 +01:00
Bénédikt Tran
e24a1ac17c
gh-129173: Use _PyUnicodeError_GetParams in PyCodec_SurrogateEscapeErrors (GH-129175) 2025-02-20 13:18:47 +00:00
Bénédikt Tran
1775091dc1
gh-129173: Use _PyUnicodeError_GetParams in PyCodec_SurrogatePassErrors (GH-129134) 2025-02-14 18:34:32 +01:00
Bénédikt Tran
a56ead089c
gh-129173: Use _PyUnicodeError_GetParams in PyCodec_NameReplaceErrors (GH-129135) 2025-02-08 16:01:57 +01:00
Bénédikt Tran
36bb229933
gh-129173: Use _PyUnicodeError_GetParams in PyCodec_IgnoreErrors (#129174)
We also cleanup `PyCodec_StrictErrors` and the error message rendered
when an object of incorrect type is passed to codec error handlers.
2025-01-24 11:25:03 +01:00
Bénédikt Tran
25a614a502
gh-126004: Fix positions handling in codecs.backslashreplace_errors (#127676)
This fixes how `PyCodec_BackslashReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
2025-01-23 14:28:33 +01:00
Bénédikt Tran
225296cd5b
gh-126004: Fix positions handling in codecs.replace_errors (#127674)
This fixes how `PyCodec_ReplaceErrors` handles the `start` and `end` attributes
of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
2025-01-23 11:44:18 +01:00
Bénédikt Tran
70dcc847df
gh-126004: Fix positions handling in codecs.xmlcharrefreplace_errors (#127675)
This fixes how `PyCodec_XMLCharRefReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
2025-01-23 11:42:38 +01:00
Victor Stinner
b9a8ca0a6a
gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125194)
Replace PyUnicode_New(0, 0), PyUnicode_FromString("")
and PyUnicode_FromStringAndSize("", 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
2024-10-09 17:15:23 +02:00
Bénédikt Tran
c00964ecd5
gh-124665: Add _PyCodec_UnregisterError and _codecs._unregister_error (#124677) 2024-09-29 02:25:23 +02:00
Petr Viktorin
6f1d448bc1
gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.

* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`

* Switch uses of `PyUnicode_InternInPlace` to those.

* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization, otherwise you're possibly
    interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.

* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.

* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

* Intern `_Py_STR` singletons at startup.

* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

* Beef up the tests. Cover internal details (marked with `@cpython_only`).

* Add lots of assertions

Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
2024-06-21 17:19:31 +02:00
Brett Simmers
f8290df63f
gh-116738: Make _codecs module thread-safe (#117530)
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:

- Move codecs-related state that lives on `PyInterpreterState` to a
  struct declared in `pycore_codecs.h`.

- In free-threaded builds, add a mutex to `codecs_state` to synchronize
  operations on `search_path`. Because `search_path_mutex` is used as a
  normal mutex and not a critical section, we must be extremely careful
  with operations called while holding it.

- The codec registry is explicitly initialized as part of
  `_PyUnicode_InitEncodings` to simplify thread-safety.
2024-05-02 18:25:36 -04:00
Kirill Podoprigora
0785c68559
gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249) 2023-11-30 11:12:49 +01:00
Serhiy Storchaka
aa438bdd6d
gh-111789: Use PyDict_GetItemRef() in Python/codecs.c (gh-112082) 2023-11-27 18:53:43 +01:00
Victor Stinner
03c4080c71
gh-108765: Python.h no longer includes <ctype.h> (#108831)
Remove <ctype.h> in C files which don't use it; only sre.c and
_decimal.c still use it.

Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:

* Code added by commit b5047fd019
  in 2004 for MacOSX and FreeBSD.
* Test removed by commit 52ddaefb6b
  in 2007, since Python str type now uses locale independent
  functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode
  database.

Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new
functions: sre_isalnum(), sre_tolower(), sre_toupper().

Remove unused includes:

* _localemodule.c: remove <stdio.h>.
* getargs.c: remove <float.h>.
* dynload_win.c: remove <direct.h>, it no longer calls _getcwd()
  since commit fb1f68ed7c (in 2001).
2023-09-03 18:54:27 +02:00
Victor Stinner
4dc9f48930
gh-108308: Replace _PyDict_GetItemStringWithError() (#108372)
Replace _PyDict_GetItemStringWithError() calls with
PyDict_GetItemStringRef() which returns a strong reference to the
item.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-08-23 22:59:00 +02:00
Victor Stinner
615f6e946d
gh-106320: Remove _PyDict_GetItemStringWithError() function (#108313)
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.

* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
  Avoid using private functions in _testcapi which tests the public C
  API.
2023-08-22 18:17:25 +00:00
Serhiy Storchaka
be1b968dc1
gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) 2023-07-12 08:57:10 +03:00
Victor Stinner
bc7eb17084
gh-106320: Use _PyInterpreterState_GET() (#106336)
Replace PyInterpreterState_Get() with inlined
_PyInterpreterState_GET().
2023-07-02 16:37:37 +00:00
Irit Katriel
55c99d97e1
gh-77757: replace exception wrapping by PEP-678 notes in typeobject's __set_name__ (#103402) 2023-04-11 11:53:06 +01:00
Irit Katriel
76350e85eb
gh-102406: replace exception chaining by PEP-678 notes in codecs (#102407) 2023-03-21 21:36:31 +00:00
Victor Stinner
8211cf5d28
gh-99300: Replace Py_INCREF() with Py_NewRef() (#99530)
Replace Py_INCREF() and Py_XINCREF() using a cast with Py_NewRef()
and Py_XNewRef().
2022-11-16 18:34:24 +01:00
Victor Stinner
d8f239d86e
gh-99300: Use Py_NewRef() in Python/ directory (#99302)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Python/ directory.
2022-11-10 09:03:39 +01:00
Eric Snow
81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Kumar Aditya
41026c3155
bpo-45855: Replaced deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Victor Stinner
d943d19172
bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895)
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
  PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
  defined to get access to internal _PyObject_CallNoArgs().
2021-10-12 08:38:19 +02:00
Victor Stinner
ce3489cfdb
bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891)
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
2021-10-12 00:42:23 +02:00
Victor Stinner
920cb647ba
bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:

  * Remove size and state members
  * Remove state and self parameters of getcode() and getname()
    functions

* Remove global_module_state
2020-10-26 19:19:36 +01:00
Victor Stinner
47e1afd2a1
bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.

* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
2020-10-26 16:43:47 +01:00
Hai Shi
c9f696cb96
bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Hai Shi
d332e7b816
bpo-41842: Add codecs.unregister() function (GH-22360)
Add codecs.unregister() and PyCodec_Unregister() functions
to unregister a codec search function.
2020-09-28 23:41:11 +02:00
Victor Stinner
e5014be049
bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) 2020-04-14 17:52:15 +02:00
Victor Stinner
81a7be3fa2
bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).

Add also "assert(tstate != NULL);" to the function.
2020-04-14 15:14:01 +02:00
Victor Stinner
4a3fe08353
bpo-40268: Include explicitly pycore_interp.h (GH-19505)
pycore_pystate.h no longer includes pycore_interp.h:
it's now included explicitly in files accessing PyInterpreterState.
2020-04-14 14:26:24 +02:00
Serhiy Storchaka
cd8295ff75
bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Victor Stinner
ff4584caca
bpo-39947: Use _PyInterpreterState_GET_UNSAFE() (GH-18978)
Replace _PyInterpreterState_Get() function call with
_PyInterpreterState_GET_UNSAFE() macro which is more efficient but
don't check if tstate or interp is NULL.

_Py_GetConfigsAsDict() now uses _PyThreadState_GET().
2020-03-13 18:03:56 +01:00
Andy Lester
7386a70746
closes bpo-39630: Update pointers to string literals to be const char *. (GH-18510) 2020-02-13 20:42:56 -08:00
Petr Viktorin
ffd9753a94
bpo-39245: Switch to public API for Vectorcall (GH-18460)
The bulk of this patch was generated automatically with:

    for name in \
        PyObject_Vectorcall \
        Py_TPFLAGS_HAVE_VECTORCALL \
        PyObject_VectorcallMethod \
        PyVectorcall_Function \
        PyObject_CallOneArg \
        PyObject_CallMethodNoArgs \
        PyObject_CallMethodOneArg \
    ;
    do
        echo $name
        git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
    done

    old=_PyObject_FastCallDict
    new=PyObject_VectorcallDict
    git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

and then cleaned up:

- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
2020-02-11 17:46:57 +01:00
Victor Stinner
a102ed7d2f
bpo-39573: Use Py_TYPE() macro in Python and Include directories (GH-18391)
Replace direct access to PyObject.ob_type with Py_TYPE().
2020-02-07 02:24:48 +01:00