gh-119786: cleanup internal docs and fix internal links (#127485)

Bénédikt Tran 2024-12-01 18:12:22 +01:00 committed by GitHub
parent 1bc4f076d1
commit 04673d2f14
11 changed files with 152 additions and 148 deletions

@ -1,4 +1,3 @@
# CPython Internals Documentation
The documentation in this folder is intended for CPython maintainers.

@ -96,6 +96,7 @@ ### Choice of specializations
Specialized instructions must be fast. To be fast,
specialized instructions should be tailored to a particular
set of values that allows them to:
1. Verify that the incoming value is part of that set with low overhead.
2. Perform the operation quickly.
@ -107,9 +108,11 @@ ### Choice of specializations
dictionaries whose keys object has the expected version.
This can be tested quickly:
* `globals->keys->dk_version == expected_version`
and the operation can be performed quickly:
* `value = entries[cache->index].me_value;`.
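As a rough illustration (not part of the original text), the result of this kind of specialization can be observed from Python with `dis`; the exact specialized instruction names (here `LOAD_GLOBAL_MODULE` is assumed) vary between CPython versions:

```python
import dis

CONSTANT = 42

def read_global():
    return CONSTANT

# Run the function enough times for the adaptive interpreter to specialize it.
for _ in range(1000):
    read_global()

# With adaptive=True, dis shows the quickened bytecode; on recent CPython
# versions the LOAD_GLOBAL typically appears as LOAD_GLOBAL_MODULE once the
# dict-keys version guard described above has been installed.
dis.dis(read_global, adaptive=True)
```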
Because it is impossible to measure the performance of an instruction without
@ -122,6 +125,7 @@ ### Choice of specializations
### Implementation of specialized instructions
In general, specialized instructions should be implemented in two parts:
1. A sequence of guards, each of the form
`DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
2. The operation, which should ideally have no branches and

@ -32,7 +32,7 @@ ## Checklist
[`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and
[`Python/Python-ast.c`](../Python/Python-ast.c).
* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code.
* [`Parser/lexer/`](../Parser/lexer) contains the tokenization code.
This is where you would add a new type of comment or string literal, for example.
* [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects
@ -60,4 +60,4 @@ ## Checklist
to the tokenizer.
* Documentation must be written! Specifically, one or more of the pages in
[`Doc/reference/`](../Doc/reference/) will need to be updated.
[`Doc/reference/`](../Doc/reference) will need to be updated.

@ -1,4 +1,3 @@
Compiler design
===============
@ -7,8 +6,8 @@
In CPython, the compilation from source code to bytecode involves several steps:
1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
and [Parser/tokenizer/](../Parser/tokenizer/).
1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
and [Parser/tokenizer/](../Parser/tokenizer).
2. Parse the stream of tokens into an Abstract Syntax Tree
[Parser/parser.c](../Parser/parser.c).
3. Transform AST into an instruction sequence
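As a small sketch (using only standard-library modules), the first stages of this pipeline have Python-level counterparts in `tokenize`, `ast`, and `compile()`/`dis`:

```python
import ast
import dis
import io
import tokenize

source = "x = 1 + 2\n"

# 1. Tokenization (the C tokenizer lives in Parser/lexer/ and Parser/tokenizer/).
for token in tokenize.generate_tokens(io.StringIO(source).readline):
    print(token.type, repr(token.string))

# 2. Parsing the token stream into an AST (Parser/parser.c in C).
tree = ast.parse(source)
print(ast.dump(tree))

# 3. Compiling the AST down to an instruction sequence / bytecode,
#    shown here via compile() and dis.
dis.dis(compile(tree, "<example>", "exec"))
```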
@ -134,9 +133,8 @@
`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
initializes the *name*, *args*, *body*, and *attributes* fields.
See also
[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
by Thomas Kluyver.
See also [Green Tree Snakes - The missing Python AST docs](
https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.
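For illustration, the same fields can be inspected from Python with the `ast` module (a sketch, not part of the original text):

```python
import ast

module = ast.parse("def greet(name):\n    return 'hi ' + name\n")
func = module.body[0]

# The Python-level node mirrors the C struct: the node's class plays the role
# of 'kind', and name/args/body are the fields the constructor fills in.
print(type(func).__name__)                  # FunctionDef
print(func.name)                            # greet
print([arg.arg for arg in func.args.args])  # ['name']
print(len(func.body))                       # 1
```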
Memory management
=================
@ -260,11 +258,11 @@
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
Functions and macros for creating `asdl_xx_seq *` types are as follows:
`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
* `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
Allocate memory for an `asdl_generic_seq` of the specified length
`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
* `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
Allocate memory for an `asdl_identifier_seq` of the specified length
`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
* `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
Allocate memory for an `asdl_int_seq` of the specified length
In addition to the three types mentioned above, some ASDL sequence types are
@ -273,19 +271,19 @@
Macros for using both manually defined and automatically generated ASDL
sequence types are as follows:
`asdl_seq_GET(asdl_xx_seq *, int)`
* `asdl_seq_GET(asdl_xx_seq *, int)`:
Get item held at a specific position in an `asdl_xx_seq`
`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
* `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
Set a specific index in an `asdl_xx_seq` to the specified value
Untyped counterparts exist for some of the typed macros. These are useful
when a function needs to manipulate a generic ASDL sequence:
`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
* `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
Get item held at a specific position in an `asdl_seq`
`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
* `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
Set a specific index in an `asdl_seq` to the specified value
`asdl_seq_LEN(asdl_seq *)`
* `asdl_seq_LEN(asdl_seq *)`:
Return the length of an `asdl_seq` or `asdl_xx_seq`
Note that typed macros and functions are recommended over their untyped
@ -379,14 +377,14 @@
Emission of bytecode is handled by the following macros:
* `ADDOP(struct compiler *, location, int)`
* `ADDOP(struct compiler *, location, int)`:
add a specified opcode
* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
* `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
like `ADDOP`, but also exits current scope; used for adding return value
opcodes in lambdas and closures
* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
add an opcode that takes an integer argument
* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
add an opcode with the proper argument based on the position of the
specified PyObject in the PyObject sequence object, but with no handling of
mangled names; used for when you
@ -394,17 +392,17 @@
parameters where name mangling is not possible and the scope of the
name is known; *TYPE* is the name of PyObject sequence
(`names` or `varnames`)
* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
just like `ADDOP_O`, but steals a reference to PyObject
* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
just like `ADDOP_O`, but name mangling is also handled; used for
attribute loading or importing based on name
* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
add the `LOAD_CONST` opcode with the proper argument based on the
position of the specified PyObject in the consts table.
* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
create a jump to a basic block
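Although these macros are internal C API, their effect is visible from Python. For example, the argument emitted by `ADDOP_LOAD_CONST` is an index into what ends up as the code object's `co_consts` table (a minimal sketch, not the compiler itself):

```python
import dis

def f():
    return (1, "two", 3.0)

# LOAD_CONST's argument is an index into co_consts, the finished form of the
# compiler's consts table.
print(f.__code__.co_consts)
dis.dis(f)
```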
The `location` argument is a struct with the source location to be
@ -433,7 +431,7 @@
bytecode. This includes transforming pseudo instructions into actual instructions,
converting jump targets from logical labels to relative offsets, and
constructing the [exception table](exception_handling.md) and
[locations table](locations.md).
[locations table](code_objects.md#source-code-locations).
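The finished locations table can be inspected from Python 3.11 onward via `code.co_positions()` (a quick illustrative sketch):

```python
def add_one(x):
    return x + 1

# Each instruction has an entry in the locations table that maps it back to a
# (start_line, end_line, start_col, end_col) span in the source.
for position in add_one.__code__.co_positions():
    print(position)
```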
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
metadata, including the `consts` and `names` arrays, information about the
function, and a reference to the source code (filename, etc.). All of this is implemented by
@ -453,7 +451,7 @@
Important files
===============
* [Parser/](../Parser/)
* [Parser/](../Parser)
* [Parser/Python.asdl](../Parser/Python.asdl):
ASDL syntax file.
@ -534,7 +532,7 @@
* [Python/instruction_sequence.c](../Python/instruction_sequence.c):
A data structure representing a sequence of bytecode-like pseudo-instructions.
* [Include/](../Include/)
* [Include/](../Include)
* [Include/cpython/code.h](../Include/cpython/code.h)
: Header file for [Objects/codeobject.c](../Objects/codeobject.c);
@ -570,7 +568,7 @@
by
[Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).
* [Objects/](../Objects/)
* [Objects/](../Objects)
* [Objects/codeobject.c](../Objects/codeobject.c)
: Contains PyCodeObject-related code.
@ -579,7 +577,7 @@
: Contains the `frame_setlineno()` function, which determines whether it is allowed
to make a jump between two points in the bytecode.
* [Lib/](../Lib/)
* [Lib/](../Lib)
* [Lib/opcode.py](../Lib/opcode.py)
: opcode utilities exposed to Python.
@ -591,7 +589,7 @@
Objects
=======
* [Locations](locations.md): Describes the location table
* [Locations](code_objects.md#source-code-locations): Describes the location table
* [Frames](frames.md): Describes frames and the frame stack
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table

@ -107,13 +107,12 @@
-----------------------------
Conceptually, the exception table consists of a sequence of 5-tuples:
```
1. `start-offset` (inclusive)
2. `end-offset` (exclusive)
3. `target`
4. `stack-depth`
5. `push-lasti` (boolean)
```
All offsets and lengths are in code units, not bytes.
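On CPython 3.11 and later the encoded table is exposed as `code.co_exceptiontable`, and `dis` prints a decoded view of these entries; a small sketch:

```python
import dis

def f():
    try:
        return 1 / 0
    except ZeroDivisionError:
        return None

# The raw, encoded table (a bytes object), followed by dis's decoded
# "ExceptionTable:" section listing start/end/target/depth/lasti entries.
print(f.__code__.co_exceptiontable)
dis.dis(f)
```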
@ -129,12 +128,13 @@
It also happens that depth is generally quite small.
So, we need to encode:
```
`start` (up to 30 bits)
`size` (up to 30 bits)
`target` (up to 30 bits)
`depth` (up to ~8 bits)
`lasti` (1 bit)
start (up to 30 bits)
size (up to 30 bits)
target (up to 30 bits)
depth (up to ~8 bits)
lasti (1 bit)
```
We need a marker for the start of the entry, so the first byte of the entry will have the most significant bit set.
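As a sketch (not the actual CPython encoder), the scheme can be written out in Python, assuming the variable-length layout CPython uses: six value bits per byte, bit 6 (`0x40`) as a "more bytes follow" flag, and bit 7 (`0x80`) marking the first byte of an entry. It is consistent with the worked example below:

```python
def write_varint(out: bytearray, value: int) -> None:
    # Emit the value six bits at a time, most significant chunk first,
    # setting the "extend" bit (0x40) on every byte except the last.
    chunks = []
    while True:
        chunks.append(value & 0x3F)
        value >>= 6
        if not value:
            break
    chunks.reverse()
    for i, chunk in enumerate(chunks):
        out.append(chunk | (0x40 if i < len(chunks) - 1 else 0))


def encode_entry(start: int, end: int, target: int, depth: int, lasti: bool) -> bytes:
    out = bytearray()
    write_varint(out, start)
    out[0] |= 0x80                           # entry-start marker (the MSB)
    write_varint(out, end - start)           # size
    write_varint(out, target)
    write_varint(out, (depth << 1) + lasti)  # depth and lasti combined
    return bytes(out)


# The worked example below (start=20, end=28, target=100, depth=3, lasti=False)
# begins with 148 (MSB + 20 for start) and 8 (size):
print(list(encode_entry(20, 28, 100, 3, False)))
```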
@ -145,23 +145,26 @@
In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.
For example, the exception entry:
```
`start`: 20
`end`: 28
`target`: 100
`depth`: 3
`lasti`: False
start: 20
end: 28
target: 100
depth: 3
lasti: False
```
is encoded by first converting to the more compact four-value form:
```
`start`: 20
`size`: 8
`target`: 100
`depth<<1+lasti`: 6
start: 20
size: 8
target: 100
depth<<1+lasti: 6
```
which is then encoded as:
```
148 (MSB + 20 for start)
8 (size)

@ -27,6 +27,7 @@ # Allocation
## Layout
Each activation record is laid out as:
* Specials
* Locals
* Stack

@ -1,4 +1,3 @@
Garbage collector design
========================

@ -1,4 +1,3 @@
Generators
==========

@ -1,4 +1,3 @@
The bytecode interpreter
========================

@ -1,4 +1,3 @@
Guide to the parser
===================
@ -444,15 +443,15 @@
Once you have made the changes to the grammar files, regenerate the `C`
parser (the one used by the interpreter) by executing:
```
make regen-pegen
```shell
$ make regen-pegen
```
using the `Makefile` in the main directory. If you are on Windows, you can
use the Visual Studio project files to regenerate the parser, or execute:
```
./PCbuild/build.bat --regen
```dos
PCbuild/build.bat --regen
```
The generated parser file is located at [`Parser/parser.c`](../Parser/parser.c).
@ -468,15 +467,15 @@
need to regenerate the meta-parser (the parser that parses the grammar files).
To do so just execute:
```
make regen-pegen-metaparser
```shell
$ make regen-pegen-metaparser
```
If you are on Windows, you can use the Visual Studio project files
to regenerate the parser, or execute:
```
./PCbuild/build.bat --regen
```dos
PCbuild/build.bat --regen
```
@ -516,15 +515,15 @@
file. If you change this file to add new tokens, make sure to regenerate the
files by executing:
```
make regen-token
```shell
$ make regen-token
```
If you are on Windows, you can use the Visual Studio project files to regenerate
the tokens, or execute:
```
./PCbuild/build.bat --regen
```dos
PCbuild/build.bat --regen
```
How tokens are generated and the rules governing this are completely up to the tokenizer
@ -593,7 +592,7 @@
meaning in context. Trying to use a hard keyword as a variable will always
fail:
```
```pycon
>>> class = 3
File "<stdin>", line 1
class = 3
@ -609,7 +608,7 @@
Soft keywords, by contrast, don't have this limitation when used in a context other than
the one where they are defined as keywords:
```
```pycon
>>> match = 45
>>> foo(match="Yeah!")
```
@ -621,7 +620,7 @@
You can get a list of all keywords defined in the grammar from Python:
```
```pycon
>>> import keyword
>>> keyword.kwlist
['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break',
@ -632,7 +631,7 @@
as well as soft keywords:
```
```pycon
>>> import keyword
>>> keyword.softkwlist
['_', 'case', 'match']
@ -798,7 +797,7 @@
tests, depending on the nature of the new feature you are adding.
Tests for the parser generator itself can be found in the
[test_peg_generator](../Lib/test_peg_generator) directory.
[test_peg_generator](../Lib/test/test_peg_generator) directory.
Debugging generated parsers
@ -816,14 +815,14 @@
parser. To do this, you can go to the [Tools/peg_generator](../Tools/peg_generator)
directory in the CPython repository and manually call the parser generator by executing:
```
```shell
$ python -m pegen python <PATH TO YOUR GRAMMAR FILE>
```
This will generate a file called `parse.py` in the same directory that you
can use to parse some input:
```
```shell
$ python parse.py file_with_source_code_to_test.py
```
@ -848,7 +847,7 @@
To activate verbose mode you can add the `-d` flag when executing Python:
```
```shell
$ python -d file_to_test.py
```

@ -2,6 +2,7 @@ # String interning
*Interned* strings are conceptually part of an interpreter-global
*set* of interned strings, meaning that:
- no two interned strings have the same content (across an interpreter);
- two interned strings can be safely compared using pointer equality
(Python `is`).
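A quick illustration from Python using `sys.intern` (the strings are built at run time so that compile-time constant deduplication doesn't get in the way):

```python
import sys

a = "".join(["interned", " ", "example"])
b = " ".join(["interned", "example"])

print(a == b, a is b)   # True False -- equal, but distinct objects

a = sys.intern(a)
b = sys.intern(b)
print(a is b)           # True -- both now refer to the single interned string
```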
@ -61,6 +62,7 @@ ## Immortality and reference counting
The converse is not true: interned strings can be mortal.
For mortal interned strings:
- the 2 references from the interned dict (key & value) are excluded from
their refcount
- the deallocator (`unicode_dealloc`) removes the string from the interned dict
@ -90,6 +92,7 @@ ## Internal API
The functions take ownership of (“steal”) the reference to their argument,
and update the argument with a *new* reference.
This means:
- They're “reference neutral”.
- They must not be called with a borrowed reference.