Guido van Rossum 41bc101dd6

GH-98831: "Generate" the interpreter (#98830)

The switch cases (really TARGET(opcode) macros) have been moved from ceval.c to generated_cases.c.h. That file is generated from instruction definitions in bytecodes.c, which masquerades as a C file so that the C code it contains can be edited without custom support in editors such as VS Code.

The code generator lives in Tools/cases_generator (it has a README.md explaining how it works). The DSL used to describe the instructions is a work in progress, described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md.

This is surely a work in progress. An easy next step could be auto-generating super-instructions.

**IMPORTANT: Merge Conflicts**

If you get a merge conflict for instruction implementations in ceval.c, your best bet is to port your changes to bytecodes.c. That file looks almost the same as the original cases, except instead of `TARGET(NAME)` it uses `inst(NAME)`, and the trailing `DISPATCH()` call is omitted (the code generator adds it automatically).
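The porting rule is mechanical enough to sketch in a few lines of Python. The helper `port_case_to_inst` below is hypothetical (it is not part of Tools/cases_generator) and assumes each case ends with a `DISPATCH();` statement on its own line, directly before the case's closing brace.

```python
import re


def port_case_to_inst(case_text: str) -> str:
    """Rewrite a ceval.c-style TARGET(NAME) case in bytecodes.c inst(NAME) form.

    Assumption: the case's last statement is DISPATCH(); on its own line,
    directly before the case's closing brace.
    """
    # Swap the header macro: TARGET(NAME) becomes inst(NAME).
    text = re.sub(r"\bTARGET\((\w+)\)", r"inst(\1)", case_text, count=1)
    lines = text.splitlines()
    i = len(lines) - 1
    while i >= 0 and not lines[i].strip():
        i -= 1  # skip trailing blank lines
    if i >= 0 and lines[i].strip() == "}":
        i -= 1  # step over the case's closing brace
    if i >= 0 and lines[i].strip() == "DISPATCH();":
        del lines[i]  # the code generator re-adds this call
    return "\n".join(lines)


case = """TARGET(POP_TOP) {
    PyObject *value = POP();
    Py_DECREF(value);
    DISPATCH();
}"""
print(port_case_to_inst(case))
# inst(POP_TOP) {
#     PyObject *value = POP();
#     Py_DECREF(value);
# }
```

Only a `DISPATCH();` in trailing position is removed; one nested inside an `if` earlier in the body is left alone.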

2022-11-02 21:31:26 -07:00

Tooling to generate interpreters

What's currently here:

lexer.py: lexer for C, originally written by Mark Shannon
plexer.py: OO interface on top of lexer.py; main class: PLexer
parser.py: parser for the instruction definition DSL; main class: Parser
generate_cases.py: driver script to read Python/bytecodes.c and write Python/generated_cases.c.h
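The core transformation performed by generate_cases.py can be approximated as follows: each `inst(NAME) { ... }` body becomes a `TARGET(NAME) { ... }` case with `DISPATCH();` appended. This is a rough sketch, not the real script. `generate_cases` is a hypothetical name, it locates bodies with a regex and brace counting instead of using lexer.py and parser.py, and it does not handle braces inside comments or string literals.

```python
import re
import textwrap

# Simplified header pattern (an assumption): inst(NAME) followed by "{".
INST_HEADER = re.compile(r"\binst\((\w+)\)\s*\{")


def generate_cases(source: str) -> str:
    """Rewrite each inst(NAME) { ... } block as a TARGET(NAME) case ending in DISPATCH();."""
    cases = []
    pos = 0
    while (m := INST_HEADER.search(source, pos)) is not None:
        name = m.group(1)
        # Scan forward, tracking brace depth, to find the body's closing brace.
        depth, i = 1, m.end()
        while depth:
            if i >= len(source):
                raise SyntaxError(f"unterminated body for {name}")
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        # The body lies between the opening brace and the closing brace at i - 1.
        body = textwrap.dedent(source[m.end():i - 1]).strip("\n")
        case = [f"TARGET({name}) {{"]
        if body:
            case.append(textwrap.indent(body, "    "))
        case += ["    DISPATCH();", "}"]
        cases.append("\n".join(case))
        pos = i
    return "\n\n".join(cases) + "\n"


bytecodes = """
inst(NOP) {
}

inst(POP_TOP) {
    PyObject *value = POP();
    Py_DECREF(value);
}
"""
print(generate_cases(bytecodes), end="")
```

Appending `DISPATCH();` in one place is what lets the definitions omit it.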

Temporarily also:

extract_cases.py: script to extract cases from Python/ceval.c and write them to Python/bytecodes.c
bytecodes_template.h: template used by extract_cases.py

The DSL for the instruction definitions in Python/bytecodes.c is described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md. Note that there is some dummy C code at the top and bottom of the file to fool text editors like VS Code into believing this is valid C code.
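A generator must skip that dummy wrapper and read only the instruction definitions. One way to do this is to bracket the definitions with marker comments. The marker strings `// BEGIN BYTECODES //` and `// END BYTECODES //`, the wrapper function `dummy_func`, and the helper `strip_dummy_wrapper` are assumptions for illustration; the real bytecodes.c may delimit its contents differently.

```python
# Assumed marker lines; the real file may use different delimiters.
BEGIN_MARKER = "// BEGIN BYTECODES //"
END_MARKER = "// END BYTECODES //"


def strip_dummy_wrapper(text: str) -> str:
    """Return only the lines strictly between BEGIN_MARKER and END_MARKER.

    Everything outside the markers is the editor-friendly dummy C code,
    which the generator ignores.
    """
    lines = text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == BEGIN_MARKER)
        end = next(
            i for i in range(start + 1, len(lines)) if lines[i].strip() == END_MARKER
        )
    except StopIteration:
        raise ValueError("missing BEGIN or END marker") from None
    return "\n".join(lines[start + 1:end])


source = """\
#include "Python.h"
void dummy_func(int opcode) {
    switch (opcode) {
// BEGIN BYTECODES //
        inst(NOP) {
        }
// END BYTECODES //
    }
}
"""
print(strip_dummy_wrapper(source))
```

Because the markers are ordinary C comments, the editor still sees a valid function while the generator sees only the definitions.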

A bit about the parser

The parser class uses a fairly standard recursive-descent scheme, but with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts, and we do not run the C preprocessor. Each parsing method either returns an AST node (a Node instance), returns None, or raises SyntaxError (showing the error in the C source).
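Tokenizing everything up front means backtracking only has to move an index over a list of tokens. The sketch below illustrates that with a regex-based tokenizer; `Token`, `TOKEN_RE`, and `tokenize_all` are hypothetical and far simpler than lexer.py. Since no preprocessor runs, a `#define` line simply comes through as a single PREPROC token.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int


# Simplified token classes (hypothetical); lexer.py recognizes much more of C.
TOKEN_RE = re.compile(
    r"""
      (?P<PREPROC>\#[^\n]*)            # directives stay tokens: no preprocessor
    | (?P<COMMENT>//[^\n]*|/\*.*?\*/)
    | (?P<IDENTIFIER>[A-Za-z_]\w*)
    | (?P<NUMBER>\d+)
    | (?P<PUNCT>[{}()\[\];,=*+\-<>!&|.])
    | (?P<NEWLINE>\n)
    | (?P<SPACE>[ \t]+)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize_all(src: str) -> list[Token]:
    """Tokenize the whole input before any parsing, dropping spaces and comments."""
    tokens: list[Token] = []
    line, pos = 1, 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} on line {line}")
        if m.lastgroup not in ("COMMENT", "NEWLINE", "SPACE"):
            tokens.append(Token(m.lastgroup, m.group(), line))
        line += m.group().count("\n")  # block comments may span lines
        pos = m.end()
    return tokens


for tok in tokenize_all("#define FOO 1\ninst(NOP) {\n    /* nothing */\n}\n"):
    print(tok)
```

Recording the line number on each token is what lets a later SyntaxError point back into the C source.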

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. Parsing methods may also raise SyntaxError, which is irrecoverable. Because a method that returns None leaves the position where it started, a different parsing method may then try again from the same point and return a valid AST.
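The behaviour of @contextual can be sketched as a plain decorator that saves the position before calling the method and restores it on None. Everything here is a hypothetical stand-in for illustration: the `Parser` and `Node` classes, the `pos` attribute, and the three grammar rules are not parser.py's actual code, and the tokens are a pre-split list of strings to keep the example self-contained.

```python
import functools
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    kind: str
    value: str


def contextual(method):
    """Restore the parser's token position whenever `method` returns None."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        saved = self.pos
        result = method(self, *args, **kwargs)
        if result is None:
            self.pos = saved  # backtrack so another alternative can start here
        return result

    return wrapper


class Parser:
    """Toy recursive-descent parser over a pre-split token list."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def expect(self, text: str) -> Optional[str]:
        """Consume the next token if it equals `text`; otherwise return None."""
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def identifier(self) -> Optional[str]:
        """Consume and return the next token if it is an identifier."""
        if self.pos < len(self.tokens) and self.tokens[self.pos].isidentifier():
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    @contextual
    def inst_header(self) -> Optional[Node]:
        # inst_header: 'inst' '(' NAME ')'
        if not (self.expect("inst") and self.expect("(")):
            return None  # not an inst header; @contextual rewinds
        name = self.identifier()
        if name is None or not self.expect(")"):
            # Committed after 'inst(': report an irrecoverable error.
            raise SyntaxError(f"malformed inst header at token {self.pos}")
        return Node("inst", name)

    @contextual
    def call(self) -> Optional[Node]:
        # call: NAME '(' ')'
        name = self.identifier()
        if name is not None and self.expect("(") and self.expect(")"):
            return Node("call", name)
        return None

    @contextual
    def name(self) -> Optional[Node]:
        # name: NAME
        ident = self.identifier()
        return Node("name", ident) if ident is not None else None

    def item(self) -> Optional[Node]:
        # Ordered alternatives; each failed attempt has already been rewound.
        return self.inst_header() or self.call() or self.name()


p = Parser(["foo", "+"])
print(p.item(), p.pos)  # call() consumed "foo" then failed, so name() re-reads it
# Node(kind='name', value='foo') 1
```

Returning None means "try something else", while raising SyntaxError means "this input is definitely wrong"; the decorator keeps the first case cheap to express.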

Neither the lexer nor the parser is complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.
