Tooling to generate interpreters

Documentation for the instruction definitions in Python/bytecodes.c ("the DSL") is in interpreter_definition.md.

What's currently here:

  • analyzer.py: code for converting the AST generated by Parser into a higher-level structure that is easier to work with
  • lexer.py: lexer for C, originally written by Mark Shannon
  • plexer.py: OO interface on top of lexer.py; main class: PLexer
  • parsing.py: parser for the instruction definition DSL; main class: Parser
  • parser.py: helpers for interacting with parsing.py
  • tierN_generator.py: driver scripts (tier1_generator.py, tier2_generator.py) that read Python/bytecodes.c and write Python/generated_cases.c.h (and several other files)
  • optimizer_generator.py: reads Python/bytecodes.c and Python/optimizer_bytecodes.c and writes Python/optimizer_cases.c.h
  • stack.py: code to handle generalized stack effects
  • cwriter.py: code that understands tokens and knows how to format C code; main class: CWriter
  • generators_common.py: helpers shared by the generators
  • opcode_id_generator.py: generates the list of opcodes and writes it to Include/opcode_ids.h
  • opcode_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_opcode_metadata.h
  • py_metadata_generator.py: reads the instruction definitions and writes the metadata to Lib/_opcode_metadata.py
  • target_generator.py: generates targets for computed goto dispatch and writes them to Python/opcode_targets.h
  • uop_id_generator.py: generates the list of uop IDs and writes it to Include/internal/pycore_uop_ids.h
  • uop_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_uop_metadata.h
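To make "generalized stack effects" a little more concrete: in the DSL, an instruction declares what it pops and pushes as a stack effect such as (lhs, rhs -- res), and an item can be an array whose size is a C expression, as in (values[oparg] -- list). The following is a hypothetical toy, not the API of stack.py: StackItem, parse_effect and net_effect are made-up names, and the sketch ignores types, unused items and conditional items. It only handles plain and array-sized items, and computes the net change in stack depth as a C-like expression.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StackItem:
    name: str
    size: str = ""  # "" for a single item, else a C size expression such as "oparg"


def parse_effect(text: str) -> tuple[list[StackItem], list[StackItem]]:
    """Split a stack effect like "(values[oparg] -- list)" into inputs and outputs."""
    inner = text.strip().removeprefix("(").removesuffix(")")
    before, after = inner.split("--")

    def items(side: str) -> list[StackItem]:
        result = []
        for part in side.split(","):
            part = part.strip()
            if not part:
                continue
            if "[" in part:
                # Array item written as "name[size]"
                name, size = part.rstrip("]").split("[", 1)
                result.append(StackItem(name.strip(), size.strip()))
            else:
                result.append(StackItem(part))
        return result

    return items(before), items(after)


def net_effect(inputs: list[StackItem], outputs: list[StackItem]) -> str:
    """Net change in stack depth (pushed minus popped) as a C-like expression."""
    # Single items contribute a fixed count; array items contribute their size expression.
    fixed = sum(not item.size for item in outputs) - sum(not item.size for item in inputs)
    expr = str(fixed)
    for item in outputs:
        if item.size:
            expr += f" + ({item.size})"
    for item in inputs:
        if item.size:
            expr += f" - ({item.size})"
    return expr
```

For example, net_effect(*parse_effect("(values[oparg] -- list)")) gives "1 - (oparg)": one item is pushed and oparg items are popped. Keeping array sizes symbolic is the point of the exercise, since the depth change is only known once oparg is.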

Note that there is some dummy C code at the top and bottom of Python/bytecodes.c to fool text editors like VS Code into treating the file as valid C code.

A bit about the parser

The Parser class uses a pretty standard recursive descent scheme, but with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts. We do not run the C preprocessor. Each parsing method either returns an AST node (a Node instance) or None, or raises SyntaxError (reporting the error location in the C source).
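The scheme above can be illustrated with a toy parser for a two-rule grammar, item := NAME "(" NAME ")" | NAME. This is a minimal sketch, not parsing.py: tokenize, Parser, Name and Call are hypothetical names. Because the whole input is tokenized into a list first, backtracking is just saving and restoring an index, so the parser can rewind any distance. A method that does not match returns None, while input that can no longer be valid raises SyntaxError.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# One token: optional leading whitespace, then an identifier or a punctuation character.
TOKEN_RE = re.compile(r"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[(),]))")


@dataclass
class Token:
    kind: str  # "NAME", or the punctuation character itself
    text: str


def tokenize(src: str) -> list[Token]:
    """Tokenize the entire input before parsing starts (no preprocessor)."""
    tokens: list[Token] = []
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group("NAME"):
            tokens.append(Token("NAME", m.group("NAME")))
        else:
            op = m.group("OP")
            tokens.append(Token(op, op))
        pos = m.end()
    return tokens


@dataclass
class Name:
    name: str


@dataclass
class Call:
    func: str
    arg: str


class Parser:
    def __init__(self, src: str):
        self.tokens = tokenize(src)
        self.pos = 0

    def expect(self, kind: str) -> Token | None:
        """Consume and return the next token if it has this kind, else return None."""
        if self.pos < len(self.tokens) and self.tokens[self.pos].kind == kind:
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    def item(self) -> Name | Call | None:
        # Try each alternative in turn, rewinding to the same position after a failure.
        begin = self.pos
        if (node := self.call()) is not None:
            return node
        self.pos = begin
        if (node := self.name()) is not None:
            return node
        self.pos = begin
        return None

    def call(self) -> Call | None:
        func = self.expect("NAME")
        if func is None or self.expect("(") is None:
            return None  # not a call; item() resets the position
        arg = self.expect("NAME")
        if arg is None or self.expect(")") is None:
            # After seeing NAME "(", no alternative can succeed, so this is a hard error.
            raise SyntaxError(f"malformed call to {func.text!r}")
        return Call(func.text, arg.text)

    def name(self) -> Name | None:
        tok = self.expect("NAME")
        return Name(tok.text) if tok is not None else None
```

Here Parser("f(x)").item() returns Call("f", "x"), Parser("x").item() backtracks out of call() and returns Name("x"), and Parser("f(x").item() raises SyntaxError.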

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. Parsing methods may also raise SyntaxError, which is irrecoverable. A None result, by contrast, is not an error: after backtracking, a different parsing method may still return a valid AST.
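A decorator in the spirit of @contextual might look like the following sketch. It assumes a parser object with a pos attribute indexing a pre-tokenized list; TokenCursor and the grammar methods are hypothetical, and the sketch implements only the position reset described above (the real decorator may do more).

```python
from __future__ import annotations

import functools
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenCursor:
    """Pre-tokenized input with a movable position (a hypothetical stand-in for PLexer)."""

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def expect(self, text: str) -> str | None:
        # Consume the next token only if it matches.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None


def contextual(method: Callable[..., T | None]) -> Callable[..., T | None]:
    """Reset the input position if the wrapped parsing method returns None."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        begin = self.pos
        result = method(self, *args, **kwargs)
        if result is None:
            self.pos = begin  # undo any tokens the failed attempt consumed
        return result

    return wrapper


class Parser(TokenCursor):
    @contextual
    def pair(self) -> tuple[str, str] | None:
        # Matches "a" "b"; may consume "a" before failing on the second token.
        if self.expect("a") and self.expect("b"):
            return ("a", "b")
        return None

    @contextual
    def single_a(self) -> str | None:
        return self.expect("a")

    def either(self) -> tuple[str, str] | str | None:
        # No manual rewind needed: @contextual already restored pos if pair() failed.
        result = self.pair()
        if result is None:
            result = self.single_a()
        return result
```

With the tokens ["a"], pair() consumes "a", fails on the missing "b" and is rewound, so either() can still fall back to single_a() and return "a".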

Neither the lexer nor the parsers are complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.