146 commits

Author SHA1 Message Date
Pablo Galindo Salgado
9216e69a87
gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (#105070) 2023-05-30 22:43:34 +01:00
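The commit above concerns the tokenizer's internal input handling. As a rough illustration of what a readline-like callable looks like from the public side, the sketch below feeds `tokenize.generate_tokens()` a hand-written callable. `make_readline` is a hypothetical helper introduced here, not part of the standard library or of the commit.

```python
import tokenize


def make_readline(lines):
    """Return a readline-like callable (hypothetical helper).

    Each call returns the next line; an empty string signals end of input.
    """
    iterator = iter(lines)

    def readline():
        return next(iterator, "")

    return readline


source_lines = ["x = 1\n", "y = x + 2\n"]
tokens = list(tokenize.generate_tokens(make_readline(source_lines)))
kinds = [tokenize.tok_name[tok.type] for tok in tokens]
name_strings = [tok.string for tok in tokens if tok.type == tokenize.NAME]

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

The tokenizer pulls one line per call, so the input never has to be held in memory as a single string.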
Pablo Galindo Salgado
46b52e6e2b
gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (#104980)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-26 22:02:26 +01:00
Marta Gómez Macías
8817886ae5
gh-102856: Tokenize performance improvement (#104731) 2023-05-22 00:29:04 +00:00
Marta Gómez Macías
ffe47cb623
gh-104719: Restore Tokenize module constants (#104722) 2023-05-21 17:07:28 +01:00
Marta Gómez Macías
6715f91edc
gh-102856: Python tokenizer implementation for PEP 701 (#104323)
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.

As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer. This mode is currently used only via
the extension module that exposes it to the Python layer. It forces the C tokenizer to emit these extra tokens and add the metadata needed to match the old Python implementation.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-21 01:03:02 +01:00
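The extra tokens mentioned in the commit message above (comments and non-semantic newlines) are visible through the public `tokenize` API. A minimal sketch, assuming nothing beyond the standard library:

```python
import io
import tokenize

source = "# leading comment\n\nx = 1  # trailing comment\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

kinds = [tokenize.tok_name[tok.type] for tok in tokens]
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

# COMMENT and NL are tokens the compiler's tokenizer normally skips,
# but the tokenize module reports them for tools that need them.
print(kinds)
print(comments)
```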
Nikita Sobolev
0cbdd21311
bpo-46565: del loop vars that are leaking into module namespaces (GH-30993) 2022-02-03 11:20:08 +02:00
Pablo Galindo Salgado
a24676bedc
Add tests for the C tokenizer and expose it as a private module (GH-27924) 2021-08-24 17:50:05 +01:00
Pablo Galindo Salgado
b6bde9fc42
bpo-44667: Treat correctly lines ending with comments and no newlines in the Python tokenizer (GH-27499) 2021-07-31 02:17:09 +01:00
Anthony Sottile
15bd9efd01
bpo-43014: Improve performance of tokenize.tokenize by 20-30% 2021-01-24 12:23:17 +03:00
Anthony Sottile
2a58b0636d bpo-5028: Fix up rest of documentation for tokenize documenting line (GH-13686)
https://bugs.python.org/issue5028
2019-05-30 15:06:32 -07:00
Andrew Carr
1e36f75d63 bpo-5028: fix doc bug for tokenize (GH-11683)
https://bugs.python.org/issue5028
2019-05-30 12:31:51 -07:00
penguindustin
9646630895 bpo-36766: Typos in docs and code comments (GH-13116) 2019-05-06 14:57:17 -04:00
Serhiy Storchaka
8ac658114d
bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)
"Include/token.h", "Lib/token.py" (now containing some data moved from
"Lib/tokenize.py") and new files "Parser/token.c" (containing the code
moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included
in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by
"Tools/scripts/generate_token.py". The script overwrites files only if
needed and can be used on the read-only sources tree.

"Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py"
instead of being executable itself.

Added new make targets "regen-token" and "regen-symbol" which are now
dependencies of "regen-all".

The documentation now contains strings for operators and punctuation tokens.
2018-12-22 11:18:40 +02:00
Ammar Askar
c4ef4896ea bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891)
Most of the change involves fixing up the test suite, which previously made
the assumption that there wouldn't be a new line if the input didn't end in
one.

Contributed by Ammar Askar.
2018-07-06 10:19:08 +03:00
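On interpreters that include the change above, input that does not end in a newline still produces a NEWLINE token before ENDMARKER. A small sketch of that behavior:

```python
import io
import tokenize

source_without_newline = "x = 1"  # note: no trailing "\n"
tokens = list(
    tokenize.generate_tokens(io.StringIO(source_without_newline).readline)
)
kinds = [tokenize.tok_name[tok.type] for tok in tokens]

# The end of the file is treated as the end of the last line.
print(kinds)
```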
Thomas Kluyver
c56b17bd8c bpo-12486: Document tokenize.generate_tokens() as public API (#6957)
* Document tokenize.generate_tokens()

* Add news file

* Add test for generate_tokens

* Document behaviour around ENCODING token

* Add generate_tokens to __all__
2018-06-05 10:26:39 -07:00
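The difference between the two public entry points named in the commit above can be sketched as follows: `tokenize.tokenize()` reads bytes and reports an ENCODING token first, while `tokenize.generate_tokens()` reads text and does not.

```python
import io
import tokenize

source = "x = 1\n"

# Bytes in: the first token reports the detected encoding.
byte_tokens = list(
    tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline)
)

# Text in: no ENCODING token is produced.
text_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

print(tokenize.tok_name[byte_tokens[0].type], byte_tokens[0].string)
print(tokenize.tok_name[text_tokens[0].type], text_tokens[0].string)
```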
Łukasz Langa
c2d384dbd7
bpo-33338: [tokenize] Minor code cleanup (#6573)
This change contains minor things that make diffing between Lib/tokenize.py and
Lib/lib2to3/pgen2/tokenize.py cleaner.
2018-04-23 01:07:11 -07:00
Serhiy Storchaka
d08972fdb9
bpo-33260: Regenerate token.py after removing ASYNC and AWAIT. (GH-6447) 2018-04-11 19:15:51 +03:00
Jelle Zijlstra
ac317700ce bpo-30406: Make async and await proper keywords (#1669)
Per PEP 492, 'async' and 'await' should become proper keywords in 3.7.
2017-10-05 23:24:46 -04:00
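One way to observe the keyword status described above on a current interpreter is through the `keyword` module. The sketch also shows that the `tokenize` module reports both words as ordinary NAME tokens, as it does for other keywords.

```python
import io
import keyword
import tokenize

async_is_keyword = keyword.iskeyword("async")
await_is_keyword = keyword.iskeyword("await")

source = "async def f():\n    await g()\n"
name_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NAME
]

print(async_is_keyword, await_is_keyword)
print(name_tokens)
```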
Albert-Jan Nijburg
fc354f0785 bpo-25324: copy tok_name before changing it (#1608)
* add test to check if we're modifying token

* copy list so import tokenize doesn't have side effects on token

* shorten line

* add tokenize tokens to token.h to get them to show up in token

* move ERRORTOKEN back to its previous location, and fix nitpick

* copy comments from token.h automatically

* fix whitespace and make more pythonic

* change to fix comments from @haypo

* update token.rst and Misc/NEWS

* change wording

* some more wording changes
2017-05-31 16:00:21 +02:00
Albert-Jan Nijburg
c471ca448c bpo-30377: Simplify handling of COMMENT and NL in tokenize.py (#1607) 2017-05-24 14:31:57 +03:00
Jon Dufresne
3972628de3 bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
  expression>) when supported (e.g. any(), all(), tuple(), min(), &
  max())
2017-05-18 07:35:54 -07:00
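The kinds of replacement listed in the commit above can be illustrated with a few toy values. The data here is invented for the example and is not taken from the commit.

```python
values = [3, 1, 2]

# list(<generator expression>) -> list comprehension
squares_list = [v * v for v in values]

# dict(<generator expression>) -> dict comprehension
squares_dict = {v: v * v for v in values}

# set(<list literal>) -> set literal
small_numbers = {1, 2, 3}

# func(<list comprehension>) -> func(<generator expression>)
any_even = any(v % 2 == 0 for v in values)
largest_square = max(v * v for v in values)

print(squares_list, squares_dict, small_numbers, any_even, largest_square)
```

Each rewritten form avoids building an intermediate container that is immediately thrown away.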
Jim Fasarakis-Hilliard
d4914e9041 Add ELLIPSIS and RARROW. Add tests (#666) 2017-03-14 21:16:15 +01:00
Brett Cannon
a721abac29 Issue #26331: Implement the parsing part of PEP 515.
Thanks to Georg Brandl for the patch.
2016-09-09 14:57:09 -07:00
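PEP 515 allows underscores in numeric literals. A short sketch of how such literals come out of the `tokenize` module, each as a single NUMBER token:

```python
import io
import tokenize

source = "n = 1_000_000 + 0x_FF\n"
numbers = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NUMBER
]

# The underscores stay inside the token text; int() accepts them as well.
print(numbers, int(numbers[0]))
```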
Serhiy Storchaka
a051bf3afb Issue #26581: Use the first coding cookie on a line, not the last one. 2016-03-20 23:47:48 +02:00
Serhiy Storchaka
e431d3c9aa Issue #26581: Use the first coding cookie on a line, not the last one. 2016-03-20 23:36:29 +02:00
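The rule named in the two commits above can be checked with `tokenize.detect_encoding()`. The sample line carrying two cookies is made up for this sketch.

```python
import io
import tokenize

# One comment line that contains two coding cookies.
data = b"# coding: latin-1 coding: utf-8\nx = 1\n"
encoding, consumed_lines = tokenize.detect_encoding(io.BytesIO(data).readline)

# The first cookie on the line wins; the name is reported in normalized form.
print(encoding)
```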
Berker Peksag
a7161e7fac Issue #25977: Fix typos in Lib/tokenize.py
Patch by John Walker.
2015-12-30 01:42:43 +02:00
Berker Peksag
ff8d0873aa Issue #25977: Fix typos in Lib/tokenize.py
Patch by John Walker.
2015-12-30 01:41:58 +02:00
Eric V. Smith
1c8222c80a Issue #25311: Add support for f-strings to tokenize.py. Also added some comments to explain what's happening, since it's not so obvious. 2015-10-26 04:37:55 -04:00
Yury Selivanov
96ec934e75 Issue #24619: Simplify async/await tokenization.
This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py.  The previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember the position of the outermost async-def block.

This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.
2015-07-23 15:01:58 +03:00
Yury Selivanov
8fb307cd65 Issue #24619: New approach for tokenizing async/await.
This commit fixes how one-line async-defs and defs are tracked
by the tokenizer.  It makes it possible to correctly handle invalid
code such as:

>>> async def f():
...     def g(): pass
...     async = 10

and valid code such as:

>>> async def f():
...     async def g(): pass
...     await z

As a consequence, it is now possible to have one-line
'async def foo(): await ..' functions:

>>> async def foo(): return await bar()
2015-07-22 13:33:45 +03:00
Jason R. Coombs
a95a476b3a Issue #20387: Merge test and patch from 3.4.4 2015-06-28 11:13:30 -04:00
Dingyuan Wang
e411b6629f Issue #20387: Restore retention of indentation during untokenize. 2015-06-22 10:01:12 +08:00
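A round trip through `tokenize.untokenize()` with full five-element tokens gives a feel for what retaining indentation means here. This is a sketch using tab-indented input chosen for the example.

```python
import io
import tokenize

source = "if x:\n\ty = 1\n\tz = 2\n"  # tab-indented block
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

rebuilt = tokenize.untokenize(tokens)
retokenized = list(tokenize.generate_tokens(io.StringIO(rebuilt).readline))

# Compare token types and strings, including the INDENT token text.
same_tokens = [(t.type, t.string) for t in tokens] == [
    (t.type, t.string) for t in retokenized
]
print(repr(rebuilt), same_tokens)
```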
Victor Stinner
24d262af0b (Merge 3.5) Issue #23840: tokenize.open() now closes the temporary binary file
on error to fix a resource warning.
2015-05-26 00:46:44 +02:00
Victor Stinner
387729e183 Issue #23840: tokenize.open() now closes the temporary binary file on error to
fix a resource warning.
2015-05-26 00:43:58 +02:00
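For context, `tokenize.open()` opens a source file in text mode using the encoding declared in the file. A minimal sketch with a temporary file; the file contents are invented for the example.

```python
import os
import tempfile
import tokenize

raw = "# -*- coding: latin-1 -*-\nname = 'caf\u00e9'\n".encode("latin-1")

with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as handle:
    handle.write(raw)
    path = handle.name

try:
    # The encoding is detected from the coding cookie on the first line.
    with tokenize.open(path) as stream:
        detected = stream.encoding
        text = stream.read()
finally:
    os.remove(path)

print(detected, repr(text))
```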
Yury Selivanov
7544508f02 PEP 0492 -- Coroutines with async and await syntax. Issue #24017. 2015-05-11 22:57:16 -04:00
Serhiy Storchaka
ca8b64461d Issue #23615: Modules bz2, tarfile and tokenize now can be reloaded with
imp.reload().  Patch by Thomas Kluyver.
2015-03-11 17:31:33 +02:00
Serhiy Storchaka
cf4a2f29ad Issue #23615: Modules bz2, tarfile and tokenize now can be reloaded with
imp.reload().  Patch by Thomas Kluyver.
2015-03-11 17:18:03 +02:00
Serhiy Storchaka
845b14cc8e Removed duplicated dict entries. 2015-01-11 12:48:17 +02:00
Victor Stinner
969175091c Issue #22599: Enhance tokenize.open() to be able to call it during Python
finalization.

Previously, the module kept a reference to the builtins module, but the module
attributes are cleared during Python finalization. Instead, keep a direct
reference to the open() function.

This enhancement is not perfect, calling tokenize.open() can still fail if
called very late during Python finalization.  Usually, the function is called
by the linecache module which is called to display a traceback or emit a
warning.
2014-12-05 10:17:10 +01:00
Victor Stinner
9d279b87d8 (Merge 3.4) Issue #22599: Enhance tokenize.open() to be able to call it during
Python finalization.

Previously, the module kept a reference to the builtins module, but the module
attributes are cleared during Python finalization. Instead, keep a direct
reference to the open() function.

This enhancement is not perfect, calling tokenize.open() can still fail if
called very late during Python finalization.  Usually, the function is called
by the linecache module which is called to display a traceback or emit a
warning.
2014-12-05 10:18:30 +01:00
Benjamin Peterson
d51374ed78 PEP 465: a dedicated infix operator for matrix multiplication (closes #21176) 2014-04-09 23:55:56 -04:00
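The matrix multiplication operator shows up in the `tokenize` module as a generic OP token whose `exact_type` distinguishes `@` from `@=`. A brief sketch:

```python
import io
import tokenize

source = "c = a @ b\nc @= a\n"
exact_names = [
    tokenize.tok_name[tok.exact_type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.OP
]

# tok.type is OP for every operator; exact_type gives the specific token.
print(exact_names)
```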
Terry Jan Reedy
58719a7de7 Merge with 3.3 2014-02-23 23:40:16 -05:00
Terry Jan Reedy
f106f8f29c whitespace 2014-02-23 23:39:57 -05:00
Terry Jan Reedy
40f8c6774b Merge with 3.3 2014-02-23 23:33:44 -05:00
Terry Jan Reedy
9dc3a36c84 Issue #9974: When untokenizing, use row info to insert backslash+newline.
Original patches by A. Kuchling and G. Rees (#12691).
2014-02-23 23:33:08 -05:00
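The effect of the fix above can be sketched with a statement that uses a backslash continuation. The row information in the tokens lets `untokenize()` put the continuation back, so the rebuilt source still runs.

```python
import io
import tokenize

source = "total = 1 + \\\n    2\n"  # statement split with backslash + newline
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
rebuilt = tokenize.untokenize(tokens)

# Execute the rebuilt source to confirm that it is still valid Python.
namespace = {}
exec(rebuilt, namespace)

print(repr(rebuilt), namespace["total"])
```

The exact spacing around the backslash may differ from the input, so the sketch checks behavior rather than comparing strings.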
Terry Jan Reedy
79bf89986c Merge with 3.3 2014-02-17 23:12:37 -05:00
Terry Jan Reedy
5b8d2c3af7 Issue #8478: Untokenizer.compat now processes first token from iterator input.
Patch based on lines from Georg Brandl, Eric Snow, and Gareth Rees.
2014-02-17 23:12:16 -05:00
Terry Jan Reedy
8c8d77254f Untokenize, bad assert: Merge with 3.3 2014-02-17 16:46:43 -05:00
Terry Jan Reedy
5e6db31368 Untokenize: A logically incorrect assert tested user input validity.
Replace it with correct logic that raises ValueError for bad input.
Issues #8478 and #12691 reported the incorrect logic.
Add an Untokenize test case and an initial test method.
2014-02-17 16:45:48 -05:00
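A sketch of the bad input case described above: the token list is hand-built so that the second token starts before the first one ends, which `untokenize()` rejects with `ValueError`.

```python
import tokenize

bad_tokens = [
    (tokenize.NAME, "a", (1, 0), (1, 1), "a b\n"),
    # Starts at column 0, before the previous token's end at column 1.
    (tokenize.NAME, "b", (1, 0), (1, 1), "a b\n"),
]

try:
    tokenize.untokenize(bad_tokens)
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```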
Serhiy Storchaka
7282ff6d5b Issue #18960: Fix bugs with Python source code encoding in the second line.
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.

* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.

* As a consequence, 'python -x' now works again with files that have the source
encoding declaration specified on the second line, and can be used again
to make Python batch files on Windows.

* The tokenize module now ignores the source encoding declaration on the second
line if the first line contains anything except a comment.

* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.

* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
2014-01-09 18:41:59 +02:00
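The second-line rule from the commit above can be checked with `tokenize.detect_encoding()`. The helper `detect` is a hypothetical convenience wrapper defined only for this sketch.

```python
import io
import tokenize


def detect(data):
    """Return the encoding name detected for the given source bytes."""
    return tokenize.detect_encoding(io.BytesIO(data).readline)[0]


# First line is code: the declaration on the second line is ignored.
after_code = detect(b"x = 1\n# coding: latin-1\n")

# First line is only a comment: the second-line declaration is honored.
after_comment = detect(b"#!/usr/bin/env python\n# coding: latin-1\n")

print(after_code, after_comment)
```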