# cpython/Lib/test/test_tokenize.py

import os, glob, random
from cStringIO import StringIO
from test.test_support import (verbose, findfile, is_resource_enabled,
                               TestFailed)
from tokenize import (tokenize, generate_tokens, untokenize,
                      NUMBER, NAME, OP, STRING)

# Test roundtrip for `untokenize`.  `f` is a file path.  The source code in f
# is tokenized, converted back to source code via tokenize.untokenize(),
# and tokenized again from the latter.  The test fails if the (type, string)
# pairs from the second tokenization don't match those from the first.
def test_roundtrip(f):
    ## print 'Testing:', f
    fobj = open(f)
    try:
        fulltok = list(generate_tokens(fobj.readline))
    finally:
        fobj.close()

    t1 = [tok[:2] for tok in fulltok]
    newtext = untokenize(t1)
    readline = iter(newtext.splitlines(True)).next
    t2 = [tok[:2] for tok in generate_tokens(readline)]
    if t1 != t2:
        raise TestFailed("untokenize() roundtrip failed for %r" % f)
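
# Sketch (hypothetical helper, not part of the tokenize API): the same
# untokenize() roundtrip check as test_roundtrip(), applied to an in-memory
# string instead of a file path.  It assumes Python 2.6+ or 3.x (for the
# next() builtin) and, like test_roundtrip(), compares only (type, string)
# pairs, because untokenize() in 2-tuple mode does not preserve the
# original spacing.
import functools
from tokenize import generate_tokens, untokenize

def _roundtrip_source(source):
    # next(it, '') returns '' once the lines are exhausted, which
    # generate_tokens() treats as end of input.
    readline = functools.partial(next, iter(source.splitlines(True)), '')
    t1 = [tok[:2] for tok in generate_tokens(readline)]
    newtext = untokenize(t1)
    readline = functools.partial(next, iter(newtext.splitlines(True)), '')
    t2 = [tok[:2] for tok in generate_tokens(readline)]
    return t1 == t2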

# This is an example from the docs, set up as a doctest.
def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print +21.3e-5*-.1234/81.7'
    >>> decistmt(s)
    "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows).  Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.

    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7

    Output from calculations with Decimal should be identical across all
    platforms.

    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """

    result = []
    g = generate_tokens(StringIO(s).readline)   # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval:  # replace float NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result)
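
# Sketch (hypothetical helpers, not from the tokenize docs): the same
# NUMBER -> Decimal('...') substitution as decistmt(), applied to a single
# expression so the rewritten source can be passed to eval().  As in
# decistmt(), only literals containing '.' are replaced; a literal such as
# 1e5 is left as a float.  Assumes Python 2.6+ or 3.x for next().
import functools
from decimal import Decimal
from tokenize import generate_tokens, untokenize, NUMBER, NAME, OP, STRING

def _decimal_expr(expr):
    readline = functools.partial(next, iter(expr.splitlines(True)), '')
    result = []
    for tok in generate_tokens(readline):
        toknum, tokval = tok[:2]
        if toknum == NUMBER and '.' in tokval:
            # Wrap the literal text in Decimal('...') so no binary float
            # is ever created for it.
            result.extend([(NAME, 'Decimal'), (OP, '('),
                           (STRING, repr(tokval)), (OP, ')')])
        else:
            result.append((toknum, tokval))
    return untokenize(result)

def _decimal_eval(expr):
    # Evaluate the rewritten expression with only Decimal in scope.
    return eval(_decimal_expr(expr), {"Decimal": Decimal})

# With binary floats 0.1 + 0.2 != 0.3, whereas
# _decimal_eval("0.1 + 0.2") == Decimal("0.3").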

def test_main():
    if verbose:
        print 'starting...'

    # This displays the tokenization of tokenize_tests.txt to stdout, and
    # regrtest.py checks that this equals the expected output (in the
    # test/output/ directory).
    f = open(findfile('tokenize_tests' + os.extsep + 'txt'))
    tokenize(f.readline)
    f.close()

    # Now run test_roundtrip() over tokenize_tests.txt too, and over the
    # test*.py files: all of them if the "compiler" resource is enabled,
    # otherwise a random sample of 10.
    f = findfile('tokenize_tests' + os.extsep + 'txt')
    test_roundtrip(f)

    testdir = os.path.dirname(f) or os.curdir
    testfiles = glob.glob(testdir + os.sep + 'test*.py')
    if not is_resource_enabled('compiler'):
        testfiles = random.sample(testfiles, 10)

    for f in testfiles:
        test_roundtrip(f)

    # Test detection of IndentationError.
    sampleBadText = """\
def foo():
    bar
  baz
"""

    try:
        for tok in generate_tokens(StringIO(sampleBadText).readline):
            pass
    except IndentationError:
        pass
    else:
        raise TestFailed("Did not detect IndentationError")

    # Run the doctests in this module.
    from test import test_tokenize  # i.e., this module
    from test.test_support import run_doctest
    run_doctest(test_tokenize)

    if verbose:
        print 'finished'
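
# Sketch (hypothetical helper, not part of the tokenize API): the
# IndentationError check from test_main() as a reusable predicate.
# generate_tokens() is a generator, so the error only surfaces while the
# tokens are being consumed; the loop below forces that.  Assumes
# Python 2.6+ or 3.x for next().
import functools
from tokenize import generate_tokens

def _detects_indentation_error(source):
    readline = functools.partial(next, iter(source.splitlines(True)), '')
    try:
        for tok in generate_tokens(readline):
            pass
    except IndentationError:
        # Raised for an unindent that matches no outer indentation level.
        return True
    return False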

if __name__ == "__main__":
    test_main()