mirror of https://github.com/python/cpython.git, synced 2025-10-31 13:41:24 +00:00

gh-46236: PyUnicode docs improvements (GH-129966)

* Move deprecated PyUnicode API docs to new section

  - Move Py_UNICODE to a new "Deprecated API" section.
  - Formally soft-deprecate PyUnicode_READY, and move it
  - Document and soft-deprecate PyUnicode_IS_READY, and move it

* Document PyUnicode_IS_ASCII, PyUnicode_CHECK_INTERNED
* PyUnicode_New docs: Clarify requirements for "fresh" strings
* PyUnicodeWriter_DecodeUTF8Stateful: Link "error-handlers"

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

parent 9f0879baf1
commit e21863ce78
2 changed files with 122 additions and 48 deletions

		|  | @ -31,6 +31,12 @@ Unicode Type | |||
| These are the basic Unicode object types used for the Unicode implementation in | ||||
| Python: | ||||
| 
 | ||||
| .. c:var:: PyTypeObject PyUnicode_Type | ||||
| 
 | ||||
|    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It | ||||
|    is exposed to Python code as :py:class:`str`. | ||||
| 
 | ||||
| 
 | ||||
| .. c:type:: Py_UCS4 | ||||
|             Py_UCS2 | ||||
|             Py_UCS1 | ||||
|  | @ -42,19 +48,6 @@ Python: | |||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
| 
 | ||||
| .. c:type:: Py_UNICODE | ||||
| 
 | ||||
|    This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type | ||||
|    depending on the platform. | ||||
| 
 | ||||
|    .. versionchanged:: 3.3 | ||||
|       In previous versions, this was a 16-bit type or a 32-bit type depending on | ||||
|       whether you selected a "narrow" or "wide" Unicode version of Python at | ||||
|       build time. | ||||
| 
 | ||||
|    .. deprecated-removed:: 3.13 3.15 | ||||
| 
 | ||||
| 
 | ||||
| .. c:type:: PyASCIIObject | ||||
|             PyCompactUnicodeObject | ||||
|             PyUnicodeObject | ||||
|  | @ -66,12 +59,6 @@ Python: | |||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
| 
 | ||||
| .. c:var:: PyTypeObject PyUnicode_Type | ||||
| 
 | ||||
|    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It | ||||
|    is exposed to Python code as ``str``. | ||||
| 
 | ||||
| 
 | ||||
| The following APIs are C macros and static inlined functions for fast checks and | ||||
| access to internal read-only data of Unicode objects: | ||||
| 
 | ||||
|  | @ -87,16 +74,6 @@ access to internal read-only data of Unicode objects: | |||
|    subtype.  This function always succeeds. | ||||
| 
 | ||||
| 
 | ||||
| .. c:function:: int PyUnicode_READY(PyObject *unicode) | ||||
| 
 | ||||
|    Returns ``0``. This API is kept only for backward compatibility. | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
|    .. deprecated:: 3.10 | ||||
|       This API does nothing since Python 3.12. | ||||
| 
 | ||||
| 
 | ||||
| .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode) | ||||
| 
 | ||||
|    Return the length of the Unicode string, in code points.  *unicode* has to be a | ||||
|  | @ -149,12 +126,16 @@ access to internal read-only data of Unicode objects: | |||
| .. c:function:: void PyUnicode_WRITE(int kind, void *data, \ | ||||
|                                      Py_ssize_t index, Py_UCS4 value) | ||||
| 
 | ||||
|    Write into a canonical representation *data* (as obtained with | ||||
|    :c:func:`PyUnicode_DATA`).  This function performs no sanity checks, and is | ||||
|    intended for usage in loops.  The caller should cache the *kind* value and | ||||
|    *data* pointer as obtained from other calls.  *index* is the index in | ||||
|    the string (starts at 0) and *value* is the new code point value which should | ||||
|    be written to that location. | ||||
|    Write the code point *value* to the given zero-based *index* in a string. | ||||
| 
 | ||||
|    The *kind* value and *data* pointer must have been obtained from a | ||||
|    string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA` | ||||
|    respectively. You must hold a reference to that string while calling | ||||
|    :c:func:`!PyUnicode_WRITE`. All requirements of | ||||
|    :c:func:`PyUnicode_WriteChar` also apply. | ||||
| 
 | ||||
|    The function performs no checks for any of its requirements, | ||||
|    and is intended for usage in loops. | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
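The reworded :c:func:`PyUnicode_WRITE` text says the caller obtains *kind* and *data* once and then writes in a loop with no checks. The following is only a standalone model of that idea, not CPython code: it assumes that *kind* names the width in bytes of each stored element (1, 2 or 4), and `model_kind`, `model_write` and `model_fill` are hypothetical names invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kind values (assumption: the kind is the
 * element width in bytes). */
enum model_kind { MODEL_1BYTE = 1, MODEL_2BYTE = 2, MODEL_4BYTE = 4 };

/* Store `value` at `index`, narrowing it to the element width named by
 * `kind`. Like the documented contract of the real API, this performs no
 * bounds, type, or range checks; they are the caller's responsibility. */
static void model_write(int kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case MODEL_1BYTE:
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case MODEL_2BYTE:
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    default: /* MODEL_4BYTE */
        ((uint32_t *)data)[index] = value;
        break;
    }
}

/* Fill `length` elements; `kind` and `data` are fetched once by the caller
 * and reused for every iteration. */
static void model_fill(int kind, void *data, size_t length, uint32_t value)
{
    for (size_t i = 0; i < length; i++) {
        model_write(kind, data, i, value);
    }
}
```

In real extension code the *kind* and *data* values would come from :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA` on a string the caller holds a reference to, and a value too wide for the string's storage would be silently truncated by an unchecked write like this one.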
|  | @ -196,6 +177,14 @@ access to internal read-only data of Unicode objects: | |||
|       is not ready. | ||||
| 
 | ||||
| 
 | ||||
| .. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode) | ||||
| 
 | ||||
|    Return true if the string only contains ASCII characters. | ||||
|    Equivalent to :py:meth:`str.isascii`. | ||||
| 
 | ||||
|    .. versionadded:: 3.2 | ||||
| 
 | ||||
| 
 | ||||
| Unicode Character Properties | ||||
| """""""""""""""""""""""""""" | ||||
| 
 | ||||
|  | @ -330,11 +319,29 @@ APIs: | |||
|    to be placed in the string.  As an approximation, it can be rounded up to the | ||||
|    nearest value in the sequence 127, 255, 65535, 1114111. | ||||
| 
 | ||||
|    This is the recommended way to allocate a new Unicode object.  Objects | ||||
|    created using this function are not resizable. | ||||
| 
 | ||||
|    On error, set an exception and return ``NULL``. | ||||
| 
 | ||||
|    After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`, | ||||
|    :c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`, | ||||
|    :c:func:`PyUnicode_WRITE` or similar. | ||||
|    Since strings are supposed to be immutable, take care to not “use” the | ||||
|    result while it is being modified. In particular, before it's filled | ||||
|    with its final contents, a string: | ||||
| 
 | ||||
|    - must not be hashed, | ||||
|    - must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`, | ||||
|      or another non-"canonical" representation, | ||||
|    - must not have its reference count changed, | ||||
|    - must not be shared with code that might do one of the above. | ||||
| 
 | ||||
|    This list is not exhaustive. Avoiding these uses is your responsibility; | ||||
|    Python does not always check these requirements. | ||||
| 
 | ||||
|    To avoid accidentally exposing a partially-written string object, prefer | ||||
|    using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*`` | ||||
|    functions below. | ||||
| 
 | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
| 
 | ||||
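The context lines of this hunk note that *maxchar* can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111. A minimal sketch of that rounding follows; `round_up_maxchar` is a hypothetical helper written for illustration, not part of the CPython API, and it does not validate inputs above 1114111.

```c
#include <stdint.h>

/* Hypothetical helper: round the largest code point a string will hold up
 * to the next value in the sequence 127, 255, 65535, 1114111.
 * Inputs above 1114111 (0x10FFFF) are not valid code points; this sketch
 * simply maps them to 1114111 instead of reporting an error. */
static uint32_t round_up_maxchar(uint32_t largest)
{
    if (largest <= 127u) {
        return 127u;       /* ASCII only */
    }
    if (largest <= 255u) {
        return 255u;       /* fits in one byte */
    }
    if (largest <= 65535u) {
        return 65535u;     /* fits in two bytes */
    }
    return 1114111u;       /* 0x10FFFF, the highest Unicode code point */
}
```

The rounded value could then be passed as the *maxchar* argument of :c:func:`PyUnicode_New`. As the added text stresses, the resulting string should be fully filled before it is hashed, converted, or shared.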
|  | @ -636,6 +643,9 @@ APIs: | |||
|    possible.  Returns ``-1`` and sets an exception on error, otherwise returns | ||||
|    the number of copied characters. | ||||
| 
 | ||||
|    The string must not have been “used” yet. | ||||
|    See :c:func:`PyUnicode_New` for details. | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
| 
 | ||||
|  | @ -648,6 +658,9 @@ APIs: | |||
|    Fail if *fill_char* is bigger than the string maximum character, or if the | ||||
|    string has more than 1 reference. | ||||
| 
 | ||||
|    The string must not have been “used” yet. | ||||
|    See :c:func:`PyUnicode_New` for details. | ||||
| 
 | ||||
|    Return the number of written characters, or return ``-1`` and raise an | ||||
|    exception on error. | ||||
| 
 | ||||
|  | @ -657,15 +670,16 @@ APIs: | |||
| .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \ | ||||
|                                         Py_UCS4 character) | ||||
| 
 | ||||
|    Write a character to a string.  The string must have been created through | ||||
|    :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable, | ||||
|    the string must not be shared, or have been hashed yet. | ||||
|    Write a *character* to the string *unicode* at the zero-based *index*. | ||||
|    Return ``0`` on success, ``-1`` on error with an exception set. | ||||
| 
 | ||||
|    This function checks that *unicode* is a Unicode object, that the index is | ||||
|    not out of bounds, and that the object can be modified safely (i.e. that it | ||||
|    its reference count is one). | ||||
|    not out of bounds, and that the object's reference count is one. | ||||
|    See :c:func:`PyUnicode_WRITE` for a version that skips these checks, | ||||
|    making them your responsibility. | ||||
| 
 | ||||
|    Return ``0`` on success, ``-1`` on error with an exception set. | ||||
|    The string must not have been “used” yet. | ||||
|    See :c:func:`PyUnicode_New` for details. | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
|  | @ -1649,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs. | |||
|       Strings interned this way are made :term:`immortal`. | ||||
| 
 | ||||
| 
 | ||||
| .. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str) | ||||
| 
 | ||||
|    Return a non-zero value if *str* is interned, zero if not. | ||||
|    The *str* argument must be a string; this is not checked. | ||||
|    This function always succeeds. | ||||
| 
 | ||||
|    .. impl-detail:: | ||||
| 
 | ||||
|       A non-zero return value may carry additional information | ||||
|       about *how* the string is interned. | ||||
|       The meaning of such non-zero values, as well as each specific string's | ||||
|       intern-related details, may change between CPython versions. | ||||
| 
 | ||||
| 
 | ||||
| PyUnicodeWriter | ||||
| ^^^^^^^^^^^^^^^ | ||||
| 
 | ||||
|  | @ -1769,8 +1797,8 @@ object. | |||
|    *size* is the string length in bytes. If *size* is equal to ``-1``, call | ||||
|    ``strlen(str)`` to get the string length. | ||||
| 
 | ||||
|    *errors* is an error handler name, such as ``"replace"``. If *errors* is | ||||
|    ``NULL``, use the strict error handler. | ||||
|    *errors* is an :ref:`error handler <error-handlers>` name, such as | ||||
|    ``"replace"``. If *errors* is ``NULL``, use the strict error handler. | ||||
| 
 | ||||
|    If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded | ||||
|    bytes on success. | ||||
|  | @ -1781,3 +1809,49 @@ object. | |||
|    On error, set an exception, leave the writer unchanged, and return ``-1``. | ||||
| 
 | ||||
|    See also :c:func:`PyUnicodeWriter_WriteUTF8`. | ||||
| 
 | ||||
| Deprecated API | ||||
| ^^^^^^^^^^^^^^ | ||||
| 
 | ||||
| The following API is deprecated. | ||||
| 
 | ||||
| .. c:type:: Py_UNICODE | ||||
| 
 | ||||
|    This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type | ||||
|    depending on the platform. | ||||
|    Please use :c:type:`wchar_t` directly instead. | ||||
| 
 | ||||
|    .. versionchanged:: 3.3 | ||||
|       In previous versions, this was a 16-bit type or a 32-bit type depending on | ||||
|       whether you selected a "narrow" or "wide" Unicode version of Python at | ||||
|       build time. | ||||
| 
 | ||||
|    .. deprecated-removed:: 3.13 3.15 | ||||
| 
 | ||||
| 
 | ||||
| .. c:function:: int PyUnicode_READY(PyObject *unicode) | ||||
| 
 | ||||
|    Do nothing and return ``0``. | ||||
|    This API is kept only for backward compatibility, but there are no plans | ||||
|    to remove it. | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
|    .. deprecated:: 3.10 | ||||
|       This API does nothing since Python 3.12. | ||||
|       Previously, this needed to be called for each string created using | ||||
|       the old API (:c:func:`!PyUnicode_FromUnicode` or similar). | ||||
| 
 | ||||
| 
 | ||||
| .. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode) | ||||
| 
 | ||||
|    Do nothing and return ``1``. | ||||
|    This API is kept only for backward compatibility, but there are no plans | ||||
|    to remove it. | ||||
| 
 | ||||
|    .. versionadded:: 3.3 | ||||
| 
 | ||||
|    .. deprecated:: next | ||||
|       This API does nothing since Python 3.12. | ||||
|       Previously, this could be called to check if | ||||
|       :c:func:`PyUnicode_READY` is necessary. | ||||
|  |  | |||
	 Petr Viktorin