Commit graph

128 commits

Author SHA1 Message Date
Petr Viktorin
2ff8608b4d
gh-135676: Simplify docs on lexing names (GH-140464)
This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

- parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
- normalizes the name
- validates the name, using the xid_start/xid_continue sets


Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>
2025-11-26 16:10:44 +01:00
Stan Ulbrych
1624c646b0
gh-140065: Lexical analysis: Correct note about leading zeros in floating point numbers (GH-140066) 2025-10-15 15:15:45 +00:00
Petr Viktorin
59a6f9d8c5
gh-135676: Add a summary of source characters (GH-138194)
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>
2025-10-08 16:34:19 +02:00
Benjamin Peterson
5bd4bf04c4
closes gh-138706: update Unicode to 17.0.0 (#138719) 2025-09-11 09:58:39 -07:00
Petr Viktorin
cfcfbdd923
gh-135676: Reword the Operators & Delimiters section(s) (GH-137713)
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-09-03 15:29:15 +00:00
Stan Ulbrych
0a0cbd43a7
gh-110936: Reorder string literal definition in Lexical Analysis (GH-138063) 2025-09-02 13:48:26 +02:00
Adam Turner
4dae9b1ff1
gh-132661: PEP 750 documentation: second pass (#137020) 2025-08-04 22:45:51 +01:00
Petr Viktorin
777159fa31
gh-135676: Lexical analysis: Reword String literals and related sections (GH-135942)
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-07-23 15:57:54 +00:00
Dave Peck
22c8658906
gh-132661: Document t-strings and templatelib (#135229)
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Loïc Simon <loic.pano@gmail.com>
Co-authored-by: pauleveritt <pauleveritt@me.com>
2025-07-22 12:44:13 +03:00
Petr Viktorin
21f3d15534
gh-135676: lexical analysis: Improve section on Numeric literals (GH-134850) 2025-06-18 16:34:18 +02:00
Eric V. Smith
08c78e02fa
gh-134675: Add t-string prefixes to tokenizer module, lexical analysis doc, and add a test to make sure we catch this error in the future. (#134734)
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.

* Update lexical analysis docs for t-string prefixes.
2025-05-26 13:49:39 -04:00
Petr Viktorin
c7364f79b2
gh-127833: lexical analysis: Improve section on Names (GH-131474)
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
2025-05-21 16:01:52 +02:00
Petr Viktorin
45bb5ba61a
gh-127833: Add links to token types to the lexical analysis intro (#131468)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-05-08 09:38:29 +00:00
Stan Ulbrych
0552ce0fb2
gh-127833: lexical analysis: Add backticks to BOM example (#132407) 2025-05-08 02:34:48 +01:00
Petr Viktorin
30d5205849
gh-116666: Add "token" glossary term (GH-130888)
Add glossary entry for `token`, and link to it.
Avoid talking about tokens in the SyntaxError intro (errors.rst); at this point
tokenization is too much of a technical detail. (Even to an advanced reader,
the fact that a *single* token is highlighted isn't too relevant. Also, we don't
need to guarantee that it's a single token.)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-03-17 17:05:47 +01:00
Paul Hoffman
5dac0dceda
gh-125461: Remove Python 2 from identifiers in doc (GH-125462)
Remove Python 2 from identifiers in doc
2024-10-14 15:26:57 +00:00
Benjamin Peterson
bb904e063d
closes gh-124016: update Unicode to 16.0.0 (#124017) 2024-09-13 07:47:04 -07:00
Shaygan Hooshyari
68fe5758bf
gh-123579: Document exclamation token (#123612) 2024-09-03 16:49:38 +02:00
sobolevn
ea70439bd2
gh-122701: Fix wording of raw strings/bytes in lexical_analysis.rst (#122702)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2024-08-11 21:02:50 +00:00
Serhiy Storchaka
1a0c7b9ba4
gh-121905: Consistently use "floating-point" instead of "floating point" (GH-121907) 2024-07-19 08:06:02 +00:00
Sunghyun Kim
7f64ae30dd
gh-107607: Update comment about utf-8 BOM being ignored (#107858)
---------
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2024-03-19 11:51:12 -04:00
Terry Jan Reedy
4e45c6c54a
gh-116881: Remove erroneous or redundant grammar NULL (GH-116885)
In Lexical Analysis f-strings section, NULL in the description
of 'literal character' means '\0'.  In the format_spec grammar
production, it is wrong with that meaning and redundant if
instead interpreted as <nothing>.  Remove it there.
2024-03-18 10:31:13 +01:00
Serhiy Storchaka
808a77612f
gh-115664: Fix ordering of more versionadded and versionchanged directives (GH-116298) 2024-03-07 10:05:03 +02:00
Hugo van Kemenade
5bf7580d72
Docs: Use 'f-strings' as header (#112888) 2023-12-10 10:39:51 +02:00
Ezio Melotti
41d8ec5a1b
gh-110631: Fix reST indentation in Doc/reference (#110708)
Fix wrong indentation in the Doc/reference dir.
2023-10-11 22:50:55 +02:00
Jacob Coffee
e27adc68cc
gh-109634: Fix :samp: syntax (GH-110073) 2023-09-29 14:21:34 +03:00
Serhiy Storchaka
92af0cc580
gh-109634: Use :samp: role (GH-109635) 2023-09-23 09:31:20 +03:00
James Gerity
def828995a
fixes gh-109559: Update unicodedata for Unicode 15.1.0 (GH-109560)
---------

Co-authored-by: Benjamin Peterson <benjamin@python.org>
2023-09-19 22:07:47 -07:00
Serhiy Storchaka
f2d07d3289
gh-101100: Sphinx warnings: pick the low hanging fruits (GH-107386) 2023-07-29 08:48:10 +03:00
wulmer
0af247da09
gh-102111: Add link to string escape sequences in re module (#106995)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-07-23 02:50:38 -06:00
Jelle Zijlstra
060277d96b
gh-103921: Document PEP 695 (#104642)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-05-26 10:48:17 -07:00
Lysandros Nikolaou
8e5b3b90c8
gh-102856: Update "Formatted string literals" docs section after PEP701 (#104861) 2023-05-24 15:38:37 +02:00
Victor Stinner
a60ddd31be
gh-98401: Invalid escape sequences emits SyntaxWarning (#99011)
A backslash-character pair that is not a valid escape sequence now
generates a SyntaxWarning, instead of DeprecationWarning.  For
example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an
invalid escape sequence), use raw strings for regular expression:
re.compile(r"\d+\.\d+"). In a future Python version, SyntaxError will
eventually be raised, instead of SyntaxWarning.

Octal escapes with value larger than 0o377 (ex: "\477"), deprecated
in Python 3.11, now produce a SyntaxWarning, instead of
DeprecationWarning. In a future Python version they will be
eventually a SyntaxError.

codecs.escape_decode() and codecs.unicode_escape_decode() are left
unchanged: they still emit DeprecationWarning.

* The parser only emits SyntaxWarning for Python 3.12 (feature
  version), and still emits DeprecationWarning on older Python
  versions.
* Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and
  wasm_build.py.
2022-11-03 17:53:25 +01:00
Benjamin Peterson
fd1e477f53
closes gh-96734: Update to Unicode 15.0.0. (GH-96809) 2022-09-13 15:45:12 -07:00
Ezio Melotti
c3d591fd06
gh-95994: Clarify escaped newlines. (#96066)
* gh-95994: clarify escaped newlines.

* Rephrase ambiguous sentence.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

* Use `<newline>` in escape sequences table.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
2022-08-26 21:05:01 +02:00
Ned Batchelder
3440d197a5
Docs: remove redundant "adverb-adjective" hyphens from compound modifiers (GH-94551)
Discussion: https://discuss.python.org/t/slight-grammar-fix-throughout-adverbs-dont-need-hyphen/17021
2022-07-05 11:16:10 +02:00
slateny
549567c6e7
gh-80143: Add clarification for escape characters (#92292) 2022-05-10 11:12:29 -05:00
Serhiy Storchaka
3483299a24
gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Terry Jan Reedy
01be5d6446
bpo-24563: Link encoding names to encoding declarations (GH-32274) 2022-04-02 20:13:37 -04:00
Arthur Milchior
32959108f9
bpo-45640: [docs] Tokens are now clickable (GH-29260)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-11-18 17:06:38 +01:00
Petr Viktorin
3dee0cb621
[docs] lexical_analysis: Expand the text on `_` (GH-28903)
Also:
* Expand the discussion into its own entry. (Even before this,
  text on ``_`` was longet than the text on ``_*``.)

* Briefly note the other common convention for `_`: naming unused
  variables.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-10-13 18:34:01 +02:00
Benjamin Peterson
024fda47d4
closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336) 2021-09-14 11:00:38 -07:00
Daniel F Moisset
a22bca6b1e
bpo-42128: Add documentation for pattern matching (PEP 634) (#24664)
This is a first edition, ready to go out with the implementation. We'll iterate during the rest of the period leading up to 3.10.0.

Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Brandt Bucher <brandt@python.org>
Co-authored-by: Raymond Hettinger <1623689+rhettinger@users.noreply.github.com>
Co-authored-by: Guido van Rossum <guido@python.org>
2021-02-28 20:08:38 -08:00
Victor Stinner
8af239eacf
bpo-41762: Fix usage of productionlist markup in the doc (GH-22281)
Use an unique identifier for the different grammars documented using
the Sphinx productionlist markup.

productionlist markups of the same grammar, like "expressions" or
"compound statements", use the same identifier "python-grammar".
2020-09-18 09:10:15 +02:00
Andre Delfino
788b79fa7b
[doc] Remove superfluous comment about equal in f-strings (GH-22006)
Automerge-Triggered-By: @kushaldas
2020-09-09 23:33:13 -07:00
amaajemyfren
13efaec2e0
bpo-41045: Document debug feature of f-strings ('=') (GH-21509)
Co-Authored-By: Rishi <rishi93dev@gmail.com>

Automerge-Triggered-By: @gvanrossum
2020-07-27 15:31:02 -07:00
Géry Ogam
e2fb8a2c42
Update lexical_analysis.rst (GH-17508)
Use Sphinx role markup for `str.format`.

Automerge-Triggered-By: @csabella
2020-06-12 05:54:29 -07:00
Matteo Bertucci
af23f0d3cf
bpo-40439: Update broken link in lexical analysis docs (GH-20184)
Automerge-Triggered-By: @csabella
2020-05-22 18:12:09 -07:00
Javad Mokhtari
5f9c131c09
bpo-40045: Make "dunder" method documentation easier to locate (#19153)
* issue 40045

* Update lexical_analysis.rst

Make "dunder" method documentation easier(GH-19153)

Co-authored-by: Joannah Nanjekye <33177550+nanjekyejoannah@users.noreply.github.com>
2020-03-27 16:02:51 -03:00
Benjamin Peterson
51796e5d26
Update some www.unicode.org URLs to use HTTPS. (GH-18912) 2020-03-10 21:10:59 -07:00