[3.14] gh-135676: Lexical analysis: Reword String literals and related sections (GH-135942) (#137048)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
This commit is contained in:
Miss Islington (bot) 2025-07-23 18:23:25 +02:00 committed by GitHub
parent 9f25781bf9
commit 4832ceaa78
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 443 additions and 206 deletions

View file

@ -133,13 +133,18 @@ Literals
Python supports string and bytes literals and various numeric literals: Python supports string and bytes literals and various numeric literals:
.. productionlist:: python-grammar .. grammar-snippet::
literal: `stringliteral` | `bytesliteral` | `NUMBER` :group: python-grammar
literal: `strings` | `NUMBER`
Evaluation of a literal yields an object of the given type (string, bytes, Evaluation of a literal yields an object of the given type (string, bytes,
integer, floating-point number, complex number) with the given value. The value integer, floating-point number, complex number) with the given value. The value
may be approximated in the case of floating-point and imaginary (complex) may be approximated in the case of floating-point and imaginary (complex)
literals. See section :ref:`literals` for details. literals.
See section :ref:`literals` for details.
See section :ref:`string-concatenation` for details on ``strings``.
.. index:: .. index::
triple: immutable; data; type triple: immutable; data; type
@ -152,6 +157,58 @@ occurrence) may obtain the same object or a different object with the same
value. value.
.. _string-concatenation:
String literal concatenation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation::
>>> "hello" 'world'
"helloworld"
Formally:
.. grammar-snippet::
:group: python-grammar
strings: ( `STRING` | fstring)+ | tstring+
This feature is defined at the syntactical level, so it only works with literals.
To concatenate string expressions at run time, the '+' operator may be used::
>>> greeting = "Hello"
>>> space = " "
>>> name = "Blaise"
>>> print(greeting + space + name) # not: print(greeting space name)
Hello Blaise
Literal concatenation can freely mix raw strings, triple-quoted strings,
and formatted string literals.
For example::
>>> "Hello" r', ' f"{name}!"
"Hello, Blaise!"
This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings. For example::
re.compile("[A-Za-z_]" # letter or underscore
"[A-Za-z0-9_]*" # letter, digit or underscore
)
However, bytes literals may only be combined with other byte literals;
not with string literals of any kind.
Also, template string literals may only be combined with other template
string literals::
>>> t"Hello" t"{name}!"
Template(strings=('Hello', '!'), interpolations=(...))
.. _parenthesized: .. _parenthesized:
Parenthesized forms Parenthesized forms

View file

@ -10,11 +10,8 @@ error recovery.
The notation used here is the same as in the preceding docs, The notation used here is the same as in the preceding docs,
and is described in the :ref:`notation <notation>` section, and is described in the :ref:`notation <notation>` section,
except for a few extra complications: except for an extra complication:
* ``&e``: a positive lookahead (that is, ``e`` is required to match but
not consumed)
* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
* ``~`` ("cut"): commit to the current alternative and fail the rule * ``~`` ("cut"): commit to the current alternative and fail the rule
even if this fails to parse even if this fails to parse

View file

@ -145,15 +145,23 @@ The definition to the right of the colon uses the following syntax elements:
* ``e?``: A question mark has exactly the same meaning as square brackets: * ``e?``: A question mark has exactly the same meaning as square brackets:
the preceding item is optional. the preceding item is optional.
* ``(e)``: Parentheses are used for grouping. * ``(e)``: Parentheses are used for grouping.
The following notation is only used in
:ref:`lexical definitions <notation-lexical-vs-syntactic>`.
* ``"a"..."z"``: Two literal characters separated by three dots mean a choice * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
of any single character in the given (inclusive) range of ASCII characters. of any single character in the given (inclusive) range of ASCII characters.
This notation is only used in
:ref:`lexical definitions <notation-lexical-vs-syntactic>`.
* ``<...>``: A phrase between angular brackets gives an informal description * ``<...>``: A phrase between angular brackets gives an informal description
of the matched symbol (for example, ``<any ASCII character except "\">``), of the matched symbol (for example, ``<any ASCII character except "\">``),
or an abbreviation that is defined in nearby text (for example, ``<Lu>``). or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
This notation is only used in
:ref:`lexical definitions <notation-lexical-vs-syntactic>`. .. _lexical-lookaheads:
Some definitions also use *lookaheads*, which indicate that an element
must (or must not) match at a given position, but without consuming any input:
* ``&e``: a positive lookahead (that is, ``e`` is required to match)
* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
The unary operators (``*``, ``+``, ``?``) bind as tightly as possible; The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
the vertical bar (``|``) binds most loosely. the vertical bar (``|``) binds most loosely.

View file

@ -39,7 +39,8 @@ The end of a logical line is represented by the token :data:`~token.NEWLINE`.
Statements cannot cross logical line boundaries except where :data:`!NEWLINE` Statements cannot cross logical line boundaries except where :data:`!NEWLINE`
is allowed by the syntax (e.g., between statements in compound statements). is allowed by the syntax (e.g., between statements in compound statements).
A logical line is constructed from one or more *physical lines* by following A logical line is constructed from one or more *physical lines* by following
the explicit or implicit *line joining* rules. the :ref:`explicit <explicit-joining>` or :ref:`implicit <implicit-joining>`
*line joining* rules.
.. _physical-lines: .. _physical-lines:
@ -47,17 +48,30 @@ the explicit or implicit *line joining* rules.
Physical lines Physical lines
-------------- --------------
A physical line is a sequence of characters terminated by an end-of-line A physical line is a sequence of characters terminated by one the following
sequence. In source files and strings, any of the standard platform line end-of-line sequences:
termination sequences can be used - the Unix form using ASCII LF (linefeed),
the Windows form using the ASCII sequence CR LF (return followed by linefeed),
or the old Macintosh form using the ASCII CR (return) character. All of these
forms can be used equally, regardless of platform. The end of input also serves
as an implicit terminator for the final physical line.
When embedding Python, source code strings should be passed to Python APIs using * the Unix form using ASCII LF (linefeed),
the standard C conventions for newline characters (the ``\n`` character, * the Windows form using the ASCII sequence CR LF (return followed by linefeed),
representing ASCII LF, is the line terminator). * the '`Classic Mac OS`__' form using the ASCII CR (return) character.
__ https://en.wikipedia.org/wiki/Classic_Mac_OS
Regardless of platform, each of these sequences is replaced by a single
ASCII LF (linefeed) character.
(This is done even inside :ref:`string literals <strings>`.)
Each line can use any of the sequences; they do not need to be consistent
within a file.
The end of input also serves as an implicit terminator for the final
physical line.
Formally:
.. grammar-snippet::
:group: python-grammar
newline: <ASCII LF> | <ASCII CR> <ASCII LF> | <ASCII CR>
.. _comments: .. _comments:
@ -106,6 +120,16 @@ If an encoding is declared, the encoding name must be recognized by Python
encoding is used for all lexical analysis, including string literals, comments encoding is used for all lexical analysis, including string literals, comments
and identifiers. and identifiers.
All lexical analysis, including string literals, comments
and identifiers, works on Unicode text decoded using the source encoding.
Any Unicode code point, except the NUL control character, can appear in
Python source.
.. grammar-snippet::
:group: python-grammar
source_character: <any Unicode code point, except NUL>
.. _explicit-joining: .. _explicit-joining:
@ -474,80 +498,110 @@ Literals
Literals are notations for constant values of some built-in types. Literals are notations for constant values of some built-in types.
In terms of lexical analysis, Python has :ref:`string, bytes <strings>`
and :ref:`numeric <numbers>` literals.
Other "literals" are lexically denoted using :ref:`keywords <keywords>`
(``None``, ``True``, ``False``) and the special
:ref:`ellipsis token <lexical-ellipsis>` (``...``).
.. index:: string literal, bytes literal, ASCII .. index:: string literal, bytes literal, ASCII
single: ' (single quote); string literal single: ' (single quote); string literal
single: " (double quote); string literal single: " (double quote); string literal
single: u'; string literal
single: u"; string literal
.. _strings: .. _strings:
String and Bytes literals String and Bytes literals
------------------------- =========================
String literals are described by the following lexical definitions: String literals are text enclosed in single quotes (``'``) or double
quotes (``"``). For example:
.. productionlist:: python-grammar .. code-block:: python
stringliteral: [`stringprefix`](`shortstring` | `longstring`)
stringprefix: "r" | "u" | "R" | "U" | "f" | "F" | "t" | "T"
: | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
: | "tr" | "Tr" | "tR" | "TR" | "rt" | "rT" | "Rt" | "RT"
shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
shortstringitem: `shortstringchar` | `stringescapeseq`
longstringitem: `longstringchar` | `stringescapeseq`
shortstringchar: <any source character except "\" or newline or the quote>
longstringchar: <any source character except "\">
stringescapeseq: "\" <any source character>
.. productionlist:: python-grammar "spam"
bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`) 'eggs'
bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
shortbytesitem: `shortbyteschar` | `bytesescapeseq`
longbytesitem: `longbyteschar` | `bytesescapeseq`
shortbyteschar: <any ASCII character except "\" or newline or the quote>
longbyteschar: <any ASCII character except "\">
bytesescapeseq: "\" <any ASCII character>
One syntactic restriction not indicated by these productions is that whitespace The quote used to start the literal also terminates it, so a string literal
is not allowed between the :token:`~python-grammar:stringprefix` or can only contain the other quote (except with escape sequences, see below).
:token:`~python-grammar:bytesprefix` and the rest of the literal. The source For example:
character set is defined by the encoding declaration; it is UTF-8 if no encoding
declaration is given in the source file; see section :ref:`encodings`.
.. index:: triple-quoted string, Unicode Consortium, raw string .. code-block:: python
'Say "Hello", please.'
"Don't do that!"
Except for this limitation, the choice of quote character (``'`` or ``"``)
does not affect how the literal is parsed.
Inside a string literal, the backslash (``\``) character introduces an
:dfn:`escape sequence`, which has special meaning depending on the character
after the backslash.
For example, ``\"`` denotes the double quote character, and does *not* end
the string:
.. code-block:: pycon
>>> print("Say \"Hello\" to everyone!")
Say "Hello" to everyone!
See :ref:`escape sequences <escape-sequences>` below for a full list of such
sequences, and more details.
.. index:: triple-quoted string
single: """; string literal single: """; string literal
single: '''; string literal single: '''; string literal
In plain English: Both types of literals can be enclosed in matching single quotes Triple-quoted strings
(``'``) or double quotes (``"``). They can also be enclosed in matching groups ---------------------
of three single or double quotes (these are generally referred to as
*triple-quoted strings*). The backslash (``\``) character is used to give special Strings can also be enclosed in matching groups of three single or double
meaning to otherwise ordinary characters like ``n``, which means 'newline' when quotes.
escaped (``\n``). It can also be used to escape characters that otherwise have a These are generally referred to as :dfn:`triple-quoted strings`::
special meaning, such as newline, backslash itself, or the quote character.
See :ref:`escape sequences <escape-sequences>` below for examples. """This is a triple-quoted string."""
In triple-quoted literals, unescaped quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the literal,
if they are of the same kind (``'`` or ``"``) used at the start::
"""This string has "quotes" inside."""
Unescaped newlines are also allowed and retained::
'''This triple-quoted string
continues on the next line.'''
.. index:: .. index::
single: b'; bytes literal single: u'; string literal
single: b"; bytes literal single: u"; string literal
Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an String prefixes
instance of the :class:`bytes` type instead of the :class:`str` type. They ---------------
may only contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.
.. index:: String literals can have an optional :dfn:`prefix` that influences how the
single: r'; raw string literal content of the literal is parsed, for example:
single: r"; raw string literal
Both string and bytes literals may optionally be prefixed with a letter ``'r'`` .. code-block:: python
or ``'R'``; such constructs are called :dfn:`raw string literals`
and :dfn:`raw bytes literals` respectively and treat backslashes as b"data"
literal characters. As a result, in raw string literals, ``'\U'`` and ``'\u'`` f'{result=}'
escapes are not treated specially.
The allowed prefixes are:
* ``b``: :ref:`Bytes literal <bytes-literal>`
* ``r``: :ref:`Raw string <raw-strings>`
* ``f``: :ref:`Formatted string literal <f-strings>` ("f-string")
* ``t``: :ref:`Template string literal <t-strings>` ("t-string")
* ``u``: No effect (allowed for backwards compatibility)
See the linked sections for details on each type.
Prefixes are case-insensitive (for example, ``B`` works the same as ``b``).
The ``r`` prefix can be combined with ``f``, ``t`` or ``b``, so ``fr``,
``rf``, ``tr``, ``rt``, ``br`` and ``rb`` are also valid prefixes.
.. versionadded:: 3.3 .. versionadded:: 3.3
The ``'rb'`` prefix of raw bytes literals has been added as a synonym The ``'rb'`` prefix of raw bytes literals has been added as a synonym
@ -557,18 +611,35 @@ escapes are not treated specially.
to simplify the maintenance of dual Python 2.x and 3.x codebases. to simplify the maintenance of dual Python 2.x and 3.x codebases.
See :pep:`414` for more information. See :pep:`414` for more information.
.. index::
single: f'; formatted string literal
single: f"; formatted string literal
A string literal with ``f`` or ``F`` in its prefix is a Formal grammar
:dfn:`formatted string literal`; see :ref:`f-strings`. The ``f`` may be --------------
combined with ``r``, but not with ``b`` or ``u``, therefore raw
formatted strings are possible, but formatted bytes literals are not.
In triple-quoted literals, unescaped newlines and quotes are allowed (and are String literals, except :ref:`"f-strings" <f-strings>` and
retained), except that three unescaped quotes in a row terminate the literal. (A :ref:`"t-strings" <t-strings>`, are described by the
"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.) following lexical definitions.
These definitions use :ref:`negative lookaheads <lexical-lookaheads>` (``!``)
to indicate that an ending quote ends the literal.
.. grammar-snippet::
:group: python-grammar
STRING: [`stringprefix`] (`stringcontent`)
stringprefix: <("r" | "u" | "b" | "br" | "rb"), case-insensitive>
stringcontent:
| "'" ( !"'" `stringitem`)* "'"
| '"' ( !'"' `stringitem`)* '"'
| "'''" ( !"'''" `longstringitem`)* "'''"
| '"""' ( !'"""' `longstringitem`)* '"""'
stringitem: `stringchar` | `stringescapeseq`
stringchar: <any `source_character`, except backslash and newline>
longstringitem: `stringitem` | newline
stringescapeseq: "\" <any `source_character`>
Note that as in all lexical definitions, whitespace is significant.
In particular, the prefix (if any) must be immediately followed by the starting
quote.
.. index:: physical line, escape sequence, Standard C, C .. index:: physical line, escape sequence, Standard C, C
single: \ (backslash); escape sequence single: \ (backslash); escape sequence
@ -587,63 +658,57 @@ retained), except that three unescaped quotes in a row terminate the literal. (
.. _escape-sequences: .. _escape-sequences:
Escape sequences Escape sequences
^^^^^^^^^^^^^^^^ ----------------
Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
bytes literals are interpreted according to rules similar to those used by bytes literals are interpreted according to rules similar to those used by
Standard C. The recognized escape sequences are: Standard C. The recognized escape sequences are:
+-------------------------+---------------------------------+-------+ .. list-table::
| Escape Sequence | Meaning | Notes | :widths: auto
+=========================+=================================+=======+ :header-rows: 1
| ``\``\ <newline> | Backslash and newline ignored | \(1) |
+-------------------------+---------------------------------+-------+
| ``\\`` | Backslash (``\``) | |
+-------------------------+---------------------------------+-------+
| ``\'`` | Single quote (``'``) | |
+-------------------------+---------------------------------+-------+
| ``\"`` | Double quote (``"``) | |
+-------------------------+---------------------------------+-------+
| ``\a`` | ASCII Bell (BEL) | |
+-------------------------+---------------------------------+-------+
| ``\b`` | ASCII Backspace (BS) | |
+-------------------------+---------------------------------+-------+
| ``\f`` | ASCII Formfeed (FF) | |
+-------------------------+---------------------------------+-------+
| ``\n`` | ASCII Linefeed (LF) | |
+-------------------------+---------------------------------+-------+
| ``\r`` | ASCII Carriage Return (CR) | |
+-------------------------+---------------------------------+-------+
| ``\t`` | ASCII Horizontal Tab (TAB) | |
+-------------------------+---------------------------------+-------+
| ``\v`` | ASCII Vertical Tab (VT) | |
+-------------------------+---------------------------------+-------+
| :samp:`\\\\{ooo}` | Character with octal value | (2,4) |
| | *ooo* | |
+-------------------------+---------------------------------+-------+
| :samp:`\\x{hh}` | Character with hex value *hh* | (3,4) |
+-------------------------+---------------------------------+-------+
Escape sequences only recognized in string literals are: * * Escape Sequence
* Meaning
* * ``\``\ <newline>
* :ref:`string-escape-ignore`
* * ``\\``
* :ref:`Backslash <string-escape-escaped-char>`
* * ``\'``
* :ref:`Single quote <string-escape-escaped-char>`
* * ``\"``
* :ref:`Double quote <string-escape-escaped-char>`
* * ``\a``
* ASCII Bell (BEL)
* * ``\b``
* ASCII Backspace (BS)
* * ``\f``
* ASCII Formfeed (FF)
* * ``\n``
* ASCII Linefeed (LF)
* * ``\r``
* ASCII Carriage Return (CR)
* * ``\t``
* ASCII Horizontal Tab (TAB)
* * ``\v``
* ASCII Vertical Tab (VT)
* * :samp:`\\\\{ooo}`
* :ref:`string-escape-oct`
* * :samp:`\\x{hh}`
* :ref:`string-escape-hex`
* * :samp:`\\N\\{{name}\\}`
* :ref:`string-escape-named`
* * :samp:`\\u{xxxx}`
* :ref:`Hexadecimal Unicode character <string-escape-long-hex>`
* * :samp:`\\U{xxxxxxxx}`
* :ref:`Hexadecimal Unicode character <string-escape-long-hex>`
+-------------------------+---------------------------------+-------+ .. _string-escape-ignore:
| Escape Sequence | Meaning | Notes |
+=========================+=================================+=======+
| :samp:`\\N\\{{name}\\}` | Character named *name* in the | \(5) |
| | Unicode database | |
+-------------------------+---------------------------------+-------+
| :samp:`\\u{xxxx}` | Character with 16-bit hex value | \(6) |
| | *xxxx* | |
+-------------------------+---------------------------------+-------+
| :samp:`\\U{xxxxxxxx}` | Character with 32-bit hex value | \(7) |
| | *xxxxxxxx* | |
+-------------------------+---------------------------------+-------+
Notes: Ignored end of line
^^^^^^^^^^^^^^^^^^^
(1)
A backslash can be added at the end of a line to ignore the newline:: A backslash can be added at the end of a line to ignore the newline::
>>> 'This string will not include \ >>> 'This string will not include \
@ -653,54 +718,177 @@ Notes:
The same result can be achieved using :ref:`triple-quoted strings <strings>`, The same result can be achieved using :ref:`triple-quoted strings <strings>`,
or parentheses and :ref:`string literal concatenation <string-concatenation>`. or parentheses and :ref:`string literal concatenation <string-concatenation>`.
.. _string-escape-escaped-char:
(2) Escaped characters
As in Standard C, up to three octal digits are accepted. ^^^^^^^^^^^^^^^^^^
To include a backslash in a non-:ref:`raw <raw-strings>` Python string
literal, it must be doubled. The ``\\`` escape sequence denotes a single
backslash character::
>>> print('C:\\Program Files')
C:\Program Files
Similarly, the ``\'`` and ``\"`` sequences denote the single and double
quote character, respectively::
>>> print('\' and \"')
' and "
.. _string-escape-oct:
Octal character
^^^^^^^^^^^^^^^
The sequence :samp:`\\\\{ooo}` denotes a *character* with the octal (base 8)
value *ooo*::
>>> '\120'
'P'
Up to three octal digits (0 through 7) are accepted.
In a bytes literal, *character* means a *byte* with the given value.
In a string literal, it means a Unicode character with the given value.
.. versionchanged:: 3.11 .. versionchanged:: 3.11
Octal escapes with value larger than ``0o377`` produce a Octal escapes with value larger than ``0o377`` (255) produce a
:exc:`DeprecationWarning`. :exc:`DeprecationWarning`.
.. versionchanged:: 3.12 .. versionchanged:: 3.12
Octal escapes with value larger than ``0o377`` produce a Octal escapes with value larger than ``0o377`` (255) produce a
:exc:`SyntaxWarning`. In a future Python version they will be eventually :exc:`SyntaxWarning`.
a :exc:`SyntaxError`. In a future Python version they will raise a :exc:`SyntaxError`.
.. _string-escape-hex:
Hexadecimal character
^^^^^^^^^^^^^^^^^^^^^
The sequence :samp:`\\x{hh}` denotes a *character* with the hex (base 16)
value *hh*::
>>> '\x50'
'P'
(3)
Unlike in Standard C, exactly two hex digits are required. Unlike in Standard C, exactly two hex digits are required.
(4) In a bytes literal, *character* means a *byte* with the given value.
In a bytes literal, hexadecimal and octal escapes denote the byte with the In a string literal, it means a Unicode character with the given value.
given value. In a string literal, these escapes denote a Unicode character
with the given value. .. _string-escape-named:
Named Unicode character
^^^^^^^^^^^^^^^^^^^^^^^
The sequence :samp:`\\N\\{{name}\\}` denotes a Unicode character
with the given *name*::
>>> '\N{LATIN CAPITAL LETTER P}'
'P'
>>> '\N{SNAKE}'
'🐍'
This sequence cannot appear in :ref:`bytes literals <bytes-literal>`.
(5)
.. versionchanged:: 3.3 .. versionchanged:: 3.3
Support for name aliases [#]_ has been added. Support for `name aliases <https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt>`__
has been added.
(6) .. _string-escape-long-hex:
Exactly four hex digits are required.
(7) Hexadecimal Unicode characters
Any Unicode character can be encoded this way. Exactly eight hex digits ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
are required.
These sequences :samp:`\\u{xxxx}` and :samp:`\\U{xxxxxxxx}` denote the
Unicode character with the given hex (base 16) value.
Exactly four digits are required for ``\u``; exactly eight digits are
required for ``\U``.
The latter can encode any Unicode character.
.. code-block:: pycon
>>> '\u1234'
'ሴ'
>>> '\U0001f40d'
'🐍'
These sequences cannot appear in :ref:`bytes literals <bytes-literal>`.
.. index:: unrecognized escape sequence .. index:: unrecognized escape sequence
Unlike Standard C, all unrecognized escape sequences are left in the string Unrecognized escape sequences
unchanged, i.e., *the backslash is left in the result*. (This behavior is ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the Unlike in Standard C, all unrecognized escape sequences are left in the string
escape sequences only recognized in string literals fall into the category of unchanged, that is, *the backslash is left in the result*::
unrecognized escapes for bytes literals.
>>> print('\q')
\q
>>> list('\q')
['\\', 'q']
Note that for bytes literals, the escape sequences only recognized in string
literals (``\N...``, ``\u...``, ``\U...``) fall into the category of
unrecognized escapes.
.. versionchanged:: 3.6 .. versionchanged:: 3.6
Unrecognized escape sequences produce a :exc:`DeprecationWarning`. Unrecognized escape sequences produce a :exc:`DeprecationWarning`.
.. versionchanged:: 3.12 .. versionchanged:: 3.12
Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future Unrecognized escape sequences produce a :exc:`SyntaxWarning`.
Python version they will be eventually a :exc:`SyntaxError`. In a future Python version they will raise a :exc:`SyntaxError`.
.. index::
single: b'; bytes literal
single: b"; bytes literal
.. _bytes-literal:
Bytes literals
--------------
:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an
instance of the :class:`bytes` type instead of the :class:`str` type.
They may only contain ASCII characters; bytes with a numeric value of 128
or greater must be expressed with escape sequences (typically
:ref:`string-escape-hex` or :ref:`string-escape-oct`):
.. code-block:: pycon
>>> b'\x89PNG\r\n\x1a\n'
b'\x89PNG\r\n\x1a\n'
>>> list(b'\x89PNG\r\n\x1a\n')
[137, 80, 78, 71, 13, 10, 26, 10]
Similarly, a zero byte must be expressed using an escape sequence (typically
``\0`` or ``\x00``).
.. index::
single: r'; raw string literal
single: r"; raw string literal
.. _raw-strings:
Raw string literals
-------------------
Both string and bytes literals may optionally be prefixed with a letter ``'r'``
or ``'R'``; such constructs are called :dfn:`raw string literals`
and :dfn:`raw bytes literals` respectively and treat backslashes as
literal characters.
As a result, in raw string literals, :ref:`escape sequences <escape-sequences>`
are not treated specially:
.. code-block:: pycon
>>> r'\d{4}-\d{2}-\d{2}'
'\\d{4}-\\d{2}-\\d{2}'
Even in a raw literal, quotes can be escaped with a backslash, but the Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, ``r"\""`` is a valid string backslash remains in the result; for example, ``r"\""`` is a valid string
@ -712,29 +900,6 @@ that a single backslash followed by a newline is interpreted as those two
characters as part of the literal, *not* as a line continuation. characters as part of the literal, *not* as a line continuation.
.. _string-concatenation:
String literal concatenation
----------------------------
Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation. Thus, ``"hello" 'world'`` is equivalent to
``"helloworld"``. This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings, for example::
re.compile("[A-Za-z_]" # letter or underscore
"[A-Za-z0-9_]*" # letter, digit or underscore
)
Note that this feature is defined at the syntactical level, but implemented at
compile time. The '+' operator must be used to concatenate string expressions
at run time. Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple quoted strings),
and formatted string literals may be concatenated with plain string literals.
.. index:: .. index::
single: formatted string literal single: formatted string literal
single: interpolated string literal single: interpolated string literal
@ -742,6 +907,8 @@ and formatted string literals may be concatenated with plain string literals.
single: string; interpolated literal single: string; interpolated literal
single: f-string single: f-string
single: fstring single: fstring
single: f'; formatted string literal
single: f"; formatted string literal
single: {} (curly brackets); in formatted string literal single: {} (curly brackets); in formatted string literal
single: ! (exclamation); in formatted string literal single: ! (exclamation); in formatted string literal
single: : (colon); in formatted string literal single: : (colon); in formatted string literal
@ -958,7 +1125,7 @@ the following differences:
.. _numbers: .. _numbers:
Numeric literals Numeric literals
---------------- ================
.. index:: number, numeric literal, integer literal .. index:: number, numeric literal, integer literal
floating-point literal, hexadecimal literal floating-point literal, hexadecimal literal
@ -991,7 +1158,7 @@ actually an expression composed of the unary operator '``-``' and the literal
.. _integers: .. _integers:
Integer literals Integer literals
^^^^^^^^^^^^^^^^ ----------------
Integer literals denote whole numbers. For example:: Integer literals denote whole numbers. For example::
@ -1064,7 +1231,7 @@ Formally, integer literals are described by the following lexical definitions:
.. _floating: .. _floating:
Floating-point literals Floating-point literals
^^^^^^^^^^^^^^^^^^^^^^^ -----------------------
Floating-point (float) literals, such as ``3.14`` or ``1.5``, denote Floating-point (float) literals, such as ``3.14`` or ``1.5``, denote
:ref:`approximations of real numbers <datamodel-float>`. :ref:`approximations of real numbers <datamodel-float>`.
@ -1126,7 +1293,7 @@ lexical definitions:
.. _imaginary: .. _imaginary:
Imaginary literals Imaginary literals
^^^^^^^^^^^^^^^^^^ ------------------
Python has :ref:`complex number <typesnumeric>` objects, but no complex Python has :ref:`complex number <typesnumeric>` objects, but no complex
literals. literals.
@ -1214,14 +1381,26 @@ The following tokens serve as delimiters in the grammar:
( ) [ ] { } ( ) [ ] { }
, : ! . ; @ = , : ! . ; @ =
The period can also occur in floating-point and imaginary literals.
.. _lexical-ellipsis:
A sequence of three periods has a special meaning as an
:py:data:`Ellipsis` literal:
.. code-block:: none
...
The following *augmented assignment operators* serve
lexically as delimiters, but also perform an operation:
.. code-block:: none
-> += -= *= /= //= %= -> += -= *= /= //= %=
@= &= |= ^= >>= <<= **= @= &= |= ^= >>= <<= **=
The period can also occur in floating-point and imaginary literals. A sequence
of three periods has a special meaning as an ellipsis literal. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.
The following printing ASCII characters have special meaning as part of other The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer: tokens or are otherwise significant to the lexical analyzer:
@ -1236,7 +1415,3 @@ occurrence outside string literals and comments is an unconditional error:
$ ? ` $ ? `
.. rubric:: Footnotes
.. [#] https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt