This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.
It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:
- parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
- normalizes the name
- validates the name, using the xid_start/xid_continue sets
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Loïc Simon <loic.pano@gmail.com>
Co-authored-by: pauleveritt <pauleveritt@me.com>
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.
* Update lexical analysis docs for t-string prefixes.
Add glossary entry for `token`, and link to it.
Avoid talking about tokens in the SyntaxError intro (errors.rst); at this point
tokenization is too much of a technical detail. (Even to an advanced reader,
the fact that a *single* token is highlighted isn't too relevant. Also, we don't
need to guarantee that it's a single token.)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
In Lexical Analysis f-strings section, NULL in the description
of 'literal character' means '\0'. In the format_spec grammar
production, it is wrong with that meaning and redundant if
instead interpreted as <nothing>. Remove it there.
A backslash-character pair that is not a valid escape sequence now
generates a SyntaxWarning, instead of DeprecationWarning. For
example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an
invalid escape sequence), use raw strings for regular expression:
re.compile(r"\d+\.\d+"). In a future Python version, SyntaxError will
eventually be raised, instead of SyntaxWarning.
Octal escapes with value larger than 0o377 (ex: "\477"), deprecated
in Python 3.11, now produce a SyntaxWarning, instead of
DeprecationWarning. In a future Python version they will be
eventually a SyntaxError.
codecs.escape_decode() and codecs.unicode_escape_decode() are left
unchanged: they still emit DeprecationWarning.
* The parser only emits SyntaxWarning for Python 3.12 (feature
version), and still emits DeprecationWarning on older Python
versions.
* Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and
wasm_build.py.
Also:
* Expand the discussion into its own entry. (Even before this,
text on ``_`` was longet than the text on ``_*``.)
* Briefly note the other common convention for `_`: naming unused
variables.
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
This is a first edition, ready to go out with the implementation. We'll iterate during the rest of the period leading up to 3.10.0.
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Brandt Bucher <brandt@python.org>
Co-authored-by: Raymond Hettinger <1623689+rhettinger@users.noreply.github.com>
Co-authored-by: Guido van Rossum <guido@python.org>
Use an unique identifier for the different grammars documented using
the Sphinx productionlist markup.
productionlist markups of the same grammar, like "expressions" or
"compound statements", use the same identifier "python-grammar".