
Commit d84dd7a

Bring compiler docs up to speed with Python 3.10 (python#706)
1 parent c8f80ae commit d84dd7a

1 file changed: +165 -53 lines changed

compiler.rst

Lines changed: 165 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -139,8 +139,8 @@ that is needed to completely free all memory used by the compiler.
139139
In general, unless you are working on the critical core of the compiler, memory
140140
management can be completely ignored. But if you are working at either the
141141
very beginning of the compiler or the end, you need to care about how the arena
142-
works. All code relating to the arena is in either :file:`Include/pyarena.h` or
143-
:file:`Python/pyarena.c`.
142+
works. All code relating to the arena is in either
143+
:file:`Include/internal/pycore_pyarena.h` or :file:`Python/pyarena.c`.
144144

145145
``PyArena_New()`` will create a new arena. The returned ``PyArena`` structure
146146
will store pointers to all memory given to it. This does the bookkeeping of
@@ -159,49 +159,131 @@ are very rare. However, if you've allocated a PyObject, you must tell
159159
the arena about it by calling ``PyArena_AddPyObject()``.
160160

161161

162-
Parse Tree to AST
163-
-----------------
162+
Source Code to AST
163+
------------------
164+
165+
The AST is generated from source code using the function
166+
``_PyParser_ASTFromString()`` or ``_PyParser_ASTFromFile()``
167+
(from :file:`Parser/peg_api.c`) depending on the input type.
168+
169+
After some checks, a helper function in :file:`Parser/parser.c` begins applying
170+
production rules on the source code it receives, converting source code to
171+
tokens and matching these tokens recursively to their corresponding rule. The
172+
rule's corresponding rule function is called on every match. These rule
173+
functions follow the format :samp:`{xx}_rule`, where *xx* is the grammar rule
174+
that the function handles and is automatically derived from
175+
:file:`Grammar/python.gram` by :file:`Tools/peg_generator/pegen/c_generator.py`.
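As a rough illustration of the "source code to tokens" step, the sketch below uses the standard-library ``tokenize`` module from Python rather than the C tokenizer the parser actually drives. It is an assumption that the two agree closely enough for a one-line example, and the numeric token types used inside :file:`Parser/parser.c` (keywords in particular) are not shown here.

```python
import io
import tokenize

source = "import sys\n"

# Collect (token type name, token text) pairs for the source line.
# tokenize.generate_tokens() takes a readline callable that returns str.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for name, text in tokens:
    print(f"{name:10} {text!r}")
```

At this level ``import`` is reported as a plain ``NAME`` token. Recognising it as a keyword is left to the parser.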
176+
177+
Each rule function in turn creates an AST node as it goes along. It does this
178+
by allocating all the new nodes it needs, calling the proper AST node creation
179+
functions for any required supporting functions and connecting them as needed.
180+
This continues until all nonterminal symbols are replaced with terminals. If an
181+
error occurs, the rule functions backtrack and try another rule function. If
182+
there are no more rules, an error is set and the parsing ends.
183+
184+
The AST node creation helper functions have the name :samp:`_PyAST_{xx}`
185+
where *xx* is the AST node that the function creates. These are defined by the
186+
ASDL grammar and contained in :file:`Python/Python-ast.c` (which is generated by
187+
:file:`Parser/asdl_c.py` from :file:`Parser/Python.asdl`). This all leads to a
188+
sequence of AST nodes stored in ``asdl_seq`` structs.
189+
190+
To demonstrate everything explained so far, here's the
191+
rule function responsible for a simple named import statement such as
192+
``import sys``. Note that error-checking and debugging code has been
193+
omitted. Removed parts are represented by ``...``.
194+
Furthermore, some comments have been added for explanation. These comments
195+
may not be present in the actual code.
196+
197+
.. code-block:: c
164198
165-
The AST is generated from the parse tree (see :file:`Python/ast.c`) using the
166-
function ``PyAST_FromNode()``.
199+
// This is the production rule (from python.gram) the rule function
200+
// corresponds to:
201+
// import_name: 'import' dotted_as_names
202+
static stmt_ty
203+
import_name_rule(Parser *p)
204+
{
205+
...
206+
stmt_ty _res = NULL;
207+
{ // 'import' dotted_as_names
208+
...
209+
Token * _keyword;
210+
asdl_alias_seq* a;
211+
// The tokenizing steps.
212+
if (
213+
(_keyword = _PyPegen_expect_token(p, 513)) // token='import'
214+
&&
215+
(a = dotted_as_names_rule(p)) // dotted_as_names
216+
)
217+
{
218+
...
219+
// Generate an AST for the import statement.
220+
_res = _PyAST_Import ( a , ...);
221+
...
222+
goto done;
223+
}
224+
...
225+
}
226+
_res = NULL;
227+
done:
228+
...
229+
return _res;
230+
}
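The result of this rule function can be inspected from Python with the standard-library ``ast`` module. The following is a minimal sketch, not part of the parser itself. It parses the same ``import sys`` statement and looks at the node that corresponds to ``_PyAST_Import``.

```python
import ast

# Parse a single statement; the module body holds one stmt node.
tree = ast.parse("import sys")
stmt = tree.body[0]

# The Import node carries a sequence of alias nodes plus position info.
names = [alias.name for alias in stmt.names]
print(type(stmt).__name__, names, stmt.lineno)
```

The ``names`` field plays the role of the ``asdl_alias_seq`` passed as ``a`` in the C code, and ``lineno`` is part of the position information elided as ``...`` above.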
167231
168-
The function begins a tree walk of the parse tree, creating various AST
169-
nodes as it goes along. It does this by allocating all new nodes it
170-
needs, calling the proper AST node creation functions for any required
171-
supporting functions, and connecting them as needed.
172232
173-
Do realize that there is no automated nor symbolic connection between
174-
the grammar specification and the nodes in the parse tree. No help is
175-
directly provided by the parse tree as in yacc.
233+
To improve backtracking performance, some rules (chosen by applying a
234+
``(memo)`` flag in the grammar file) are memoized. Each rule function checks if
235+
a memoized version exists and returns that if so, else it continues in the
236+
manner stated in the previous paragraphs.
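The idea behind memoization can be sketched with a toy recursive-descent parser. Everything in this example (the ``ToyParser`` class, its two-rule grammar and the ``calls`` counter) is hypothetical and only illustrates the technique. It is not how the generated C parser stores its memo entries.

```python
class ToyParser:
    """Toy backtracking parser for: expr: term '+' expr | term."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.memo = {}   # (rule name, start position) -> (result, end position)
        self.calls = 0   # number of times a memoized rule body actually ran

    def memoize(self, name, body):
        key = (name, self.pos)
        if key in self.memo:
            # Reuse the earlier result and jump to where that parse ended.
            result, self.pos = self.memo[key]
            return result
        self.calls += 1
        start = self.pos
        result = body()
        if result is None:
            self.pos = start  # failed parse consumes nothing
        self.memo[key] = (result, self.pos)
        return result

    def expect(self, text):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def term(self):
        def body():
            if self.pos < len(self.tokens) and self.tokens[self.pos].isdigit():
                tok = self.tokens[self.pos]
                self.pos += 1
                return int(tok)
            return None
        return self.memoize("term", body)

    def expr(self):
        start = self.pos
        # First alternative: term '+' expr
        left = self.term()
        if left is not None and self.expect("+") is not None:
            right = self.expr()
            if right is not None:
                return left + right
        # Backtrack and try the second alternative: term
        self.pos = start
        return self.term()


parser = ToyParser(["1", "+", "2"])
result = parser.expr()
print(result, parser.calls)
```

When the first alternative fails at the final ``2``, the parser backtracks and asks for ``term`` at the same position again. The memo answers that second request, so the ``term`` body runs twice in total instead of three times.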
176237

177-
For instance, one must keep track of which node in the parse tree
178-
one is working with (e.g., if you are working with an 'if' statement
179-
you need to watch out for the ':' token to find the end of the conditional).
238+
There are macros for creating and using ``asdl_xx_seq *`` types, where *xx* is
239+
a type of the ASDL sequence. Three main types are defined
240+
manually -- ``generic``, ``identifier`` and ``int``. These types are found in
241+
:file:`Python/asdl.c` and its corresponding header file
242+
:file:`Include/internal/pycore_asdl.h`. Functions and macros
243+
for creating ``asdl_xx_seq *`` types are as follows:
180244

181-
The functions called to generate AST nodes from the parse tree all have
182-
the name :samp:`ast_for_{xx}` where *xx* is the grammar rule that the function
183-
handles (``alias_for_import_name`` is the exception to this). These in turn
184-
call the constructor functions as defined by the ASDL grammar and
185-
contained in :file:`Python/Python-ast.c` (which was generated by
186-
:file:`Parser/asdl_c.py`) to create the nodes of the AST. This all leads to a
187-
sequence of AST nodes stored in ``asdl_seq`` structs.
245+
``_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)``
246+
Allocate memory for an ``asdl_generic_seq`` of the specified length
247+
``_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)``
248+
Allocate memory for an ``asdl_identifier_seq`` of the specified length
249+
``_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)``
250+
Allocate memory for an ``asdl_int_seq`` of the specified length
188251

189-
Function and macros for creating and using ``asdl_seq *`` types as found
190-
in :file:`Python/asdl.c` and :file:`Include/asdl.h` are as follows:
252+
In addition to the three types mentioned above, some ASDL sequence types are
253+
automatically generated by :file:`Parser/asdl_c.py` and found in
254+
:file:`Include/internal/pycore_ast.h`. Macros for using both manually defined
255+
and automatically generated ASDL sequence types are as follows:
191256

192-
``_Py_asdl_seq_new(Py_ssize_t, PyArena *)``
193-
Allocate memory for an ``asdl_seq`` for the specified length
194-
``asdl_seq_GET(asdl_seq *, int)``
257+
``asdl_seq_GET(asdl_xx_seq *, int)``
258+
Get item held at a specific position in an ``asdl_xx_seq``
259+
``asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)``
260+
Set a specific index in an ``asdl_xx_seq`` to the specified value
261+
262+
Untyped counterparts exist for some of the typed macros. These are useful
263+
when a function needs to manipulate a generic ASDL sequence:
264+
265+
``asdl_seq_GET_UNTYPED(asdl_seq *, int)``
195266
Get item held at a specific position in an ``asdl_seq``
196-
``asdl_seq_SET(asdl_seq *, int, stmt_ty)``
267+
``asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)``
197268
Set a specific index in an ``asdl_seq`` to the specified value
198269
``asdl_seq_LEN(asdl_seq *)``
199-
Return the length of an ``asdl_seq``
270+
Return the length of an ``asdl_seq`` or ``asdl_xx_seq``
271+
272+
Note that typed macros and functions are recommended over their untyped
273+
counterparts. Typed macros carry out checks in debug mode and aid
274+
debugging errors caused by incorrectly casting from ``void *``.
200275

201276
If you are working with statements, you must also worry about keeping
202277
track of what line number generated the statement. Currently the line
203278
number is passed as the last parameter to each ``stmt_ty`` function.
204279

280+
.. versionchanged:: 3.9
281+
The new PEG parser generates an AST directly without creating a
282+
parse tree. ``Python/ast.c`` is now only used to validate the AST for
283+
debugging purposes.
284+
285+
.. seealso:: :pep:`617` (PEP 617 -- New PEG parser for CPython)
286+
205287

206288
Control Flow Graphs
207289
-------------------
@@ -248,13 +330,13 @@ global). With that done, the second pass essentially flattens the CFG
248330
into a list and calculates jump offsets for final output of bytecode.
249331

250332
The conversion process is initiated by a call to the function
251-
``PyAST_CompileObject()`` in :file:`Python/compile.c`. This function does both the
333+
``_PyAST_Compile()`` in :file:`Python/compile.c`. This function does both the
252334
conversion of the AST to a CFG and outputting final bytecode from the CFG.
253335
The AST to CFG step is handled mostly by two functions called by
254-
``PyAST_CompileObject()``; ``PySymtable_BuildObject()`` and ``compiler_mod()``. The former
336+
``_PyAST_Compile()``: ``_PySymtable_Build()`` and ``compiler_mod()``. The former
255337
is in :file:`Python/symtable.c` while the latter is in :file:`Python/compile.c`.
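Both steps can be observed from Python, which is sometimes handy when checking a change. The sketch below assumes that the built-in ``compile()`` and the standard-library ``symtable`` module end up in the C functions named above. That assumption is not verified here, and the function ``f`` and the ``<demo>`` filename are made up for the example.

```python
import ast
import symtable

source = "def f(x):\n    y = x + 1\n    return y\n"

# AST -> code object: compile() accepts an ast.Module directly.
tree = ast.parse(source, filename="<demo>")
code = compile(tree, filename="<demo>", mode="exec")
namespace = {}
exec(code, namespace)
result = namespace["f"](41)

# Symbol table: one child block per function defined at module level.
table = symtable.symtable(source, "<demo>", "exec")
f_table = table.get_children()[0]
local_names = sorted(f_table.get_locals())
print(result, f_table.get_name(), local_names)
```

The symbol table for ``f`` lists both the parameter ``x`` and the assigned name ``y`` as locals, which is the scoping information the compiler later relies on when emitting bytecode.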
256338

257-
``PySymtable_BuildObject()`` begins by entering the starting code block for the
339+
``_PySymtable_Build()`` begins by entering the starting code block for the
258340
AST (passed-in) and then calling the proper :samp:`symtable_visit_{xx}` function
259341
(with *xx* being the AST node type). Next, the AST tree is walked with
260342
the various code blocks that delineate the reach of a local variable
@@ -369,6 +451,9 @@ command before adding the new bytecode target to :file:`Python/ceval.c` will
369451
result in an error. You should only run ``make regen-importlib`` after the new
370452
bytecode target has been added.
371453

454+
.. note:: On Windows, running the ``./build.bat`` script will automatically
455+
regenerate the required files without requiring additional arguments.
456+
372457
Finally, you need to introduce the use of the new bytecode. Altering
373458
:file:`Python/compile.c` and :file:`Python/ceval.c` will be the primary places
374459
to change. You must add the case for a new opcode into the 'switch'
@@ -419,7 +504,28 @@ Important Files
419504

420505
asdl_c.py
421506
"Generate C code from an ASDL description." Generates
422-
:file:`Python/Python-ast.c` and :file:`Include/Python-ast.h`.
507+
:file:`Python/Python-ast.c` and :file:`Include/internal/pycore_ast.h`.
508+
509+
parser.c
510+
The new PEG parser introduced in Python 3.9.
511+
Generated by :file:`Tools/peg_generator/pegen/c_generator.py`
512+
from the grammar :file:`Grammar/python.gram`. Creates the AST from
513+
source code. Rule functions for their corresponding production rules
514+
are found here.
515+
516+
peg_api.c
517+
Contains high-level functions which are used by the interpreter to
518+
create an AST from source code.
519+
520+
pegen.c
521+
Contains helper functions which are used by functions in
522+
:file:`Parser/parser.c` to construct the AST. Also contains helper
523+
functions which help raise better error messages when parsing source
524+
code.
525+
526+
pegen.h
527+
Header file for the corresponding :file:`Parser/pegen.c`. Also contains
528+
definitions of the ``Parser`` and ``Token`` structs.
423529

424530
+ Python/
425531

@@ -437,7 +543,7 @@ Important Files
437543
identifier. Used by :file:`Python-ast.c` for marshalling AST nodes.
438544

439545
ast.c
440-
Converts Python's parse tree into the abstract syntax tree.
546+
Used for validating the AST.
441547

442548
ast_opt.c
443549
Optimizes the AST.
@@ -469,31 +575,37 @@ Important Files
469575

470576
+ Include/
471577

472-
Python-ast.h
473-
Contains the actual definitions of the C structs as generated by
474-
:file:`Python/Python-ast.c`.
475-
"Automatically generated by :file:`Parser/asdl_c.py`".
476-
477-
asdl.h
478-
Header for the corresponding :file:`Python/ast.c`.
479-
480-
ast.h
481-
Declares ``PyAST_FromNode()`` external (from :file:`Python/ast.c`).
482-
483578
code.h
484579
Header file for :file:`Objects/codeobject.c`; contains definition of
485580
``PyCodeObject``.
486581

487-
symtable.h
488-
Header for :file:`Python/symtable.c`. ``struct symtable`` and
489-
``PySTEntryObject`` are defined here.
490-
491-
pyarena.h
492-
Header file for the corresponding :file:`Python/pyarena.c`.
493-
494582
opcode.h
495583
One of the files that must be modified if :file:`Lib/opcode.py` is.
496584

585+
+ internal/
586+
587+
pycore_ast.h
588+
Contains the actual definitions of the C structs as generated by
589+
:file:`Python/Python-ast.c`.
590+
"Automatically generated by :file:`Parser/asdl_c.py`".
591+
592+
pycore_asdl.h
593+
Header for the corresponding :file:`Python/asdl.c`.
594+
595+
pycore_ast.h
596+
Declares ``_PyAST_Validate()`` external (from :file:`Python/ast.c`).
597+
598+
pycore_symtable.h
599+
Header for :file:`Python/symtable.c`. ``struct symtable`` and
600+
``PySTEntryObject`` are defined here.
601+
602+
pycore_parser.h
603+
Header for the corresponding :file:`Parser/peg_api.c`.
604+
605+
pycore_pyarena.h
606+
Header file for the corresponding :file:`Python/pyarena.c`.
607+
608+
497609
+ Objects/
498610

499611
codeobject.c
