Merged
Changes from 1 commit
cancel some edits
picnixz committed Dec 1, 2024
commit f6476e86390cdbd643b62c80fc95baa10ace0897
2 changes: 1 addition & 1 deletion InternalDocs/changing_grammar.md
@@ -18,7 +18,7 @@ Below is a checklist of things that may need to change.
to regenerate [`Parser/parser.c`](../Parser/parser.c).
(This runs Python's parser generator, [`Tools/peg_generator`](../Tools/peg_generator)).

* [`Grammar/Tokens`](../Grammar/Tokens) is a place for adding new token types. After
changing it, run ``make regen-token`` to regenerate
[`Include/internal/pycore_token.h`](../Include/internal/pycore_token.h),
[`Parser/token.c`](../Parser/token.c), [`Lib/token.py`](../Lib/token.py)
117 changes: 58 additions & 59 deletions InternalDocs/compiler.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion InternalDocs/exception_handling.md
@@ -190,5 +190,5 @@ Exception Chaining Implementation
refers to setting the `__context__` and `__cause__` fields of an exception as it is
being raised. The `__context__` field is set by `_PyErr_SetObject()` in
[Python/errors.c](../Python/errors.c) (which is ultimately called by all
`PyErr_Set*()` functions). The `__cause__` field (explicit chaining) is set by
the `RAISE_VARARGS` bytecode.
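Both chaining mechanisms described above are observable from pure Python; a minimal sketch:

```python
# Implicit chaining: raising inside an `except` block makes
# _PyErr_SetObject() record the in-flight exception as __context__.
try:
    try:
        1 / 0
    except ZeroDivisionError:
        raise KeyError("lookup failed")
except KeyError as exc:
    implicit = exc

assert type(implicit.__context__) is ZeroDivisionError
assert implicit.__cause__ is None

# Explicit chaining: `raise ... from ...` (two-argument RAISE_VARARGS)
# additionally sets __cause__.
try:
    try:
        1 / 0
    except ZeroDivisionError as inner:
        raise KeyError("lookup failed") from inner
except KeyError as exc:
    explicit = exc

assert type(explicit.__cause__) is ZeroDivisionError
```

Note that explicit chaining still sets `__context__` as well; `__cause__` merely overrides it for traceback display.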
4 changes: 2 additions & 2 deletions InternalDocs/frames.md
@@ -125,10 +125,10 @@ the next instruction to be executed. During a call to a Python function,
to see in an exception traceback.

The `return_offset` field determines where a `RETURN` should go in the caller,
relative to `instr_ptr`. It is only meaningful to the callee, so it needs to
be set in any instruction that implements a call (to a Python function),
including `CALL`, `SEND` and `BINARY_SUBSCR_GETITEM`, among others. If there is no
callee, then `return_offset` is meaningless. It is necessary to have a separate
field for the return offset because (1) if we apply this offset to `instr_ptr`
while executing the `RETURN`, this is too early and would lose us information
about the previous instruction which we could need for introspecting and
56 changes: 28 additions & 28 deletions InternalDocs/garbage_collector.md
@@ -55,7 +55,7 @@ Starting in version 3.13, CPython contains two GC implementations:
performing a collection for thread safety.

Both implementations use the same basic algorithms, but operate on different
data structures. See the section on
[Differences between GC implementations](#Differences-between-GC-implementations)
for the details.

@@ -64,7 +64,7 @@ Memory layout and object structure
==================================

The garbage collector requires additional fields in Python objects to support
garbage collection. These extra fields are different in the default and the
free-threaded builds.


@@ -111,11 +111,11 @@ that in the [Optimization: incremental collection](#Optimization-incremental-col
they are also reused to fulfill other purposes when the full doubly linked list
structure is not needed as a memory optimization.

Doubly linked lists are used because they efficiently support the most frequently required operations. In
general, the collection of all objects tracked by GC is partitioned into disjoint sets, each in its own
doubly linked list. Between collections, objects are partitioned into "generations", reflecting how
often they've survived collection attempts. During collections, the generation(s) being collected
are further partitioned into, for example, sets of reachable and unreachable objects. Doubly linked lists
support moving an object from one partition to another, adding a new object, removing an object
entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
isn't running at all!), and merging partitions, all with a small constant number of pointer updates.
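The constant-time pointer juggling can be sketched in Python; `Node` and `GCList` below are illustrative stand-ins for `PyGC_Head` and the generation lists, not CPython's actual structures:

```python
class Node:
    """Stand-in for PyGC_Head: intrusive prev/next links live in the object."""
    def __init__(self, name):
        self.name = name
        self.prev = self.next = None

class GCList:
    """Circular doubly linked list with a sentinel head node."""
    def __init__(self):
        self.head = Node("<head>")
        self.head.prev = self.head.next = self.head

    def append(self, node):            # O(1): add to a partition
        last = self.head.prev
        last.next = node
        node.prev = last
        node.next = self.head
        self.head.prev = node

    @staticmethod
    def unlink(node):                  # O(1): remove entirely
        node.prev.next = node.next
        node.next.prev = node.prev

    def move_to(self, node, other):    # O(1): move between partitions
        self.unlink(node)
        other.append(node)

    def names(self):
        out, cur = [], self.head.next
        while cur is not self.head:
            out.append(cur.name)
            cur = cur.next
        return out

young, old = GCList(), GCList()
a, b = Node("a"), Node("b")
young.append(a)
young.append(b)
young.move_to(a, old)                  # "promote" a with 4 pointer updates
assert young.names() == ["b"] and old.names() == ["a"]
```

Because the links are embedded in each object, no auxiliary allocation is needed to move an object between partitions.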
@@ -128,7 +128,7 @@ GC for the free-threaded build
In the free-threaded build, Python objects contain a 1-byte field
`ob_gc_bits` that is used to track garbage collection related state. The
field exists in all objects, including ones that do not support cyclic
garbage collection. The field is used to identify objects that are tracked
by the collector, ensure that finalizers are called only once per object,
and, during garbage collection, differentiate reachable vs. unreachable objects.

@@ -192,7 +192,7 @@ the interpreter create cycles everywhere. Some notable examples:
have internal links to themselves.

To correctly dispose of these objects once they become unreachable, they need
to be identified first. To understand how the algorithm works, let’s take
the case of a circular linked list which has one link referenced by a
variable `A`, and one self-referencing object which is completely
unreachable:
@@ -220,15 +220,15 @@ unreachable:
```
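Both shapes can be reproduced directly, using `weakref.finalize` to observe when the cycle detector (rather than reference counting) actually reclaims the self-referencing object:

```python
import gc
import weakref

class Link:
    """A node whose `next` pointer makes it its own referent."""
    def __init__(self):
        self.next = self

obj = Link()                 # completely unreachable once `obj` is dropped
collected = []
finalizer = weakref.finalize(obj, collected.append, "freed")

del obj                      # refcount stays > 0: the object references itself
gc.collect()                 # the cycle detector finds and reclaims it
assert collected == ["freed"]
```

Without the collector, `del obj` alone would leak the object forever, since its refcount never reaches zero.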

The GC starts with a set of candidate objects it wants to scan. In the
default build, these "objects to scan" might be all container objects or a
smaller subset (or "generation"). In the free-threaded build, the collector
always scans all container objects.

The objective is to identify all the unreachable objects. The collector does
this by identifying reachable objects; the remaining objects must be
unreachable. The first step is to identify all of the "to scan" objects that
are **directly** reachable from outside the set of candidate objects. These
objects have a refcount larger than the number of incoming references from
within the candidate set.
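This first step can be modeled with plain dictionaries. The function below is an illustrative sketch, not the C implementation; the edge lists stand in for what each object's `tp_traverse` would report:

```python
def externally_reachable(refcounts, edges):
    """Return the candidates whose refcount exceeds the number of
    references coming from inside the candidate set (gc_ref > 0)."""
    gc_ref = dict(refcounts)          # copy the real refcounts into gc_ref
    for src in edges:
        for dst in edges[src]:        # what tp_traverse would visit
            if dst in gc_ref:
                gc_ref[dst] -= 1      # discount one internal reference
    return {obj for obj, n in gc_ref.items() if n > 0}

# Mirrors the example above: variable `A` holds the only external
# reference (to link_1); the self-referencing object has none.
refcounts = {"link_1": 2, "link_2": 1, "link_3": 1, "self_ref": 1}
edges = {"link_1": ["link_2"], "link_2": ["link_3"],
         "link_3": ["link_1"], "self_ref": ["self_ref"]}
assert externally_reachable(refcounts, edges) == {"link_1"}
```

Only `link_1` survives the subtraction, which matches the diagram: every other reference originates inside the candidate set.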

@@ -241,7 +241,7 @@ interpreter will not modify the real reference count field.
![gc-image1](images/python-cyclic-gc-1-new-page.png)

The GC then iterates over all containers in the first list and decrements by one the
`gc_ref` field of any other object that container is referencing. Doing
this makes use of the `tp_traverse` slot in the container class (implemented
using the C API or inherited by a superclass) to know what objects are referenced by
each container. After all the objects have been scanned, only the objects that have
@@ -273,7 +273,7 @@ When the GC encounters an object which is reachable (`gc_ref > 0`), it traverses
its references using the `tp_traverse` slot to find all the objects that are
reachable from it, moving them to the end of the list of reachable objects (where
they started originally) and setting its `gc_ref` field to 1. This is what happens
to `link_2` and `link_3` below as they are reachable from `link_1`. From the
state in the previous image and after examining the objects referred to by `link_1`
the GC knows that `link_3` is reachable after all, so it is moved back to the
original list and its `gc_ref` field is set to 1 so that if the GC visits it again,
@@ -293,7 +293,7 @@ list are really unreachable and can thus be garbage collected.

Pragmatically, it's important to note that no recursion is required by any of this,
and neither does it in any other way require additional memory proportional to the
number of objects, number of pointers, or the lengths of pointer chains. Apart from
`O(1)` storage for internal C needs, the objects themselves contain all the storage
the GC algorithms require.
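The whole two-phase idea fits in a short model: subtract internal references, then flood-fill from the externally reachable roots; whatever is never visited is garbage. This is a sketch with illustrative names, not CPython's list-splicing implementation:

```python
def find_unreachable(refcounts, edges):
    """Phase 1: compute gc_ref. Phase 2: traverse from gc_ref > 0 roots.
    Objects never reached in phase 2 are unreachable."""
    gc_ref = dict(refcounts)
    for src in edges:
        for dst in edges[src]:        # simulate tp_traverse
            if dst in gc_ref:
                gc_ref[dst] -= 1

    to_visit = [obj for obj, n in gc_ref.items() if n > 0]
    reachable = set()
    while to_visit:                   # iterative, so no recursion needed
        obj = to_visit.pop()
        if obj in reachable:
            continue
        reachable.add(obj)
        to_visit.extend(edges.get(obj, []))
    return set(gc_ref) - reachable

refcounts = {"link_1": 2, "link_2": 1, "link_3": 1, "self_ref": 1}
edges = {"link_1": ["link_2"], "link_2": ["link_3"],
         "link_3": ["link_1"], "self_ref": ["self_ref"]}
assert find_unreachable(refcounts, edges) == {"self_ref"}
```

The real collector achieves the same closure by moving objects between its linked lists in place, which is why it needs no extra memory proportional to the graph.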

@@ -317,8 +317,8 @@ list.
So instead of not moving at all, the reachable objects B and A are each moved twice.
Why is this a win? A straightforward algorithm to move the reachable objects instead
would move A, B, and C once each. The key is that this dance leaves the objects in
order C, B, A - it's reversed from the original order. On all *subsequent* scans,
none of them will move. Since most objects aren't in cycles, this can save an
unbounded number of moves across an unbounded number of later collections. The only
time the cost can be higher is the first time the chain is scanned.

@@ -331,7 +331,7 @@ follows these steps in order:

1. Handle and clear weak references (if any). Weak references to unreachable objects
are set to `None`. If the weak reference has an associated callback, the callback
is enqueued to be called once the clearing of weak references is finished. We only
invoke callbacks for weak references that are themselves reachable. If both the weak
reference and the pointed-to object are unreachable we do not execute the callback.
This is partly for historical reasons: the callback could resurrect an unreachable
@@ -490,7 +490,7 @@ to the size of the data, often a word or multiple thereof. This discrepancy
leaves a few of the least significant bits of the pointer unused, which can be
used for tags or to keep other information – most often as a bit field (each
bit a separate tag) – as long as code that uses the pointer masks out these
bits before accessing memory. For example, on a 32-bit architecture (for both
addresses and word size), a word is 32 bits = 4 bytes, so word-aligned
addresses are always a multiple of 4, hence end in `00`, leaving the last 2 bits
available; while on a 64-bit architecture, a word is 64 bits = 8 bytes, so
@@ -519,10 +519,10 @@ of `PyGC_Head` discussed in the `Memory layout and object structure`_ section:
- The `_gc_next` field is used as the "next" pointer to maintain the doubly linked
list but during collection its lowest bit is used to keep the
`NEXT_MASK_UNREACHABLE` flag that indicates if an object is tentatively
unreachable during the cycle detection algorithm. This is a drawback to using only
doubly linked lists to implement partitions: while most needed operations are
constant-time, there is no efficient way to determine which partition an object is
currently in. Instead, when that's needed, ad hoc tricks (like the
`NEXT_MASK_UNREACHABLE` flag) are employed.
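The flag is ordinary low-bit pointer tagging. A small Python model, with integers standing in for word-aligned addresses (the mask name mirrors the C constant, but the values here are made up):

```python
NEXT_MASK = 0x1                      # stand-in for NEXT_MASK_UNREACHABLE

def set_flag(ptr):
    return ptr | NEXT_MASK           # store the flag in the free low bit

def clear_flag(ptr):
    return ptr & ~NEXT_MASK          # mask off before "dereferencing"

def has_flag(ptr):
    return bool(ptr & NEXT_MASK)

addr = 0x7F00_5A18                   # 8-byte aligned, so low bits are 0
tagged = set_flag(addr)
assert has_flag(tagged)
assert clear_flag(tagged) == addr    # the real pointer is recoverable
assert not has_flag(addr)
```

The scheme is free in space but costs a mask on every traversal of the `_gc_next` chain while the flag may be set.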

Optimization: delayed untracking containers
Expand Down Expand Up @@ -581,29 +581,29 @@ structure, while the free-threaded build implementation does not use that
data structure.

- The default build implementation stores all tracked objects in a doubly
linked list using `PyGC_Head`. The free-threaded build implementation
instead relies on the embedded mimalloc memory allocator to scan the heap
for tracked objects.
- The default build implementation uses `PyGC_Head` for the unreachable
object list. The free-threaded build implementation repurposes the
`ob_tid` field to store a linked list of unreachable objects.
- The default build implementation stores flags in the `_gc_prev` field of
`PyGC_Head`. The free-threaded build implementation stores these flags
in `ob_gc_bits`.


The default build implementation relies on the
[global interpreter lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock)
for thread safety. The free-threaded build implementation has two "stop the
world" pauses, in which all other executing threads are temporarily paused so
that the GC can safely access reference counts and object attributes.

The default build implementation is a generational collector. The
free-threaded build is non-generational; each collection scans the entire
heap.

- Keeping track of object generations is simple and inexpensive in the default
build. The free-threaded build relies on mimalloc for finding tracked
objects; identifying "young" objects without scanning the entire heap would
be more difficult.
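On the default build, the generational machinery is visible through the documented `gc` module API; a sketch (the free-threaded build accepts the same calls but scans the whole heap):

```python
import gc

# Three generations, each with its own collection threshold.
thresholds = gc.get_threshold()      # (700, 10, 10) unless tuned
assert len(thresholds) == 3

# Allocation counts that trigger collection of each generation.
counts = gc.get_count()
assert len(counts) == 3

# A collection can target a single (youngest) generation; collect()
# returns the number of unreachable objects it found.
found = gc.collect(0)
assert found >= 0
```

Tuning `gc.set_threshold()` trades collection frequency against how long garbage cycles linger.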

32 changes: 16 additions & 16 deletions InternalDocs/interpreter.md
@@ -31,7 +31,7 @@ It also has a reference to the `CodeObject` itself.
In addition to the frame, `_PyEval_EvalFrame()` also receives a
[`Thread State`](https://docs.python.org/3/c-api/init.html#c.PyThreadState)
object, `tstate`, which includes things like the exception state and the
recursion depth. The thread state also provides access to the per-interpreter
state (`tstate->interp`), which has a pointer to the per-runtime (that is,
truly global) state (`tstate->interp->runtime`).

@@ -130,15 +130,15 @@ The size of the inline cache for a particular instruction is fixed by its `opcod
Moreover, the inline cache size for all instructions in a
[family of specialized/specializable instructions](adaptive.md)
(for example, `LOAD_ATTR`, `LOAD_ATTR_SLOT`, `LOAD_ATTR_MODULE`) must all be
the same. Cache entries are reserved by the compiler and initialized with zeros.
Although they are represented by code units, cache entries do not conform to the
`opcode` / `oparg` format.
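The two-byte code-unit layout, caches included, can be inspected from Python; a sketch using the `dis` module (the `CACHE` check is guarded because cache entries only appear in bytecode from 3.11 on):

```python
import dis

def f(x):
    return x.attr              # LOAD_ATTR is a specializable instruction

raw = f.__code__.co_code       # bytes: opcode, oparg, opcode, oparg, ...
assert len(raw) % 2 == 0       # everything, caches included, is code units
units = [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]

# On 3.11+, inline cache entries show up as CACHE pseudo-instructions.
cache_op = dis.opmap.get("CACHE")
if cache_op is not None:
    assert any(op == cache_op for op, _ in units)
```

Running `dis.dis(f, show_caches=True)` on a recent CPython makes the reserved cache entries visible in the disassembly.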

If an instruction has an inline cache, the layout of its cache is described by
a `struct` definition in [`pycore_code.h`](../Include/internal/pycore_code.h).
This allows us to access the cache by casting `next_instr` to a pointer to this `struct`.
The size of such a `struct` must be independent of the machine architecture, word size
and alignment requirements. For a 32-bit field, the `struct` should use `_Py_CODEUNIT field[2]`.

The instruction implementation is responsible for advancing `next_instr` past the inline cache.
For example, if an instruction's inline cache is four bytes (that is, two code units) in size,
@@ -210,12 +210,12 @@ In 3.10 and before, this was the case even when a Python function called
another Python function:
The `CALL` opcode would call the `tp_call` dispatch function of the
callee, which would extract the code object, create a new frame for the call
stack, and then call back into the interpreter. This approach is very general
but consumes several C stack frames for each nested Python call, thereby
increasing the risk of an (unrecoverable) C stack overflow.

Since 3.11, the `CALL` instruction special-cases function objects to "inline"
the call. When a call gets inlined, a new frame gets pushed onto the call
stack and the interpreter "jumps" to the start of the callee's bytecode.
When an inlined callee executes a `RETURN_VALUE` instruction, the frame is
popped off the call stack and the interpreter returns to its caller,
@@ -248,12 +248,12 @@ In this case we allocate a proper `PyFrameObject` and initialize it from the

Things get more complicated when generators are involved, since those do not
follow the push/pop model. This includes async functions, which are based on
the same mechanism. A generator object has space for a `_PyInterpreterFrame`
structure, including the variable-size part (used for locals and the eval stack).
When a generator (or async) function is first called, a special opcode
`RETURN_GENERATOR` is executed, which is responsible for creating the
generator object. The generator object's `_PyInterpreterFrame` is initialized
with a copy of the current stack frame. The current stack frame is then popped
off the frame stack and the generator object is returned.
(Details differ depending on the `is_entry` flag.)
When the generator is resumed, the interpreter pushes its `_PyInterpreterFrame`
@@ -317,9 +317,9 @@ With a new bytecode you must also change what is called the "magic number" for
.pyc files: bump the value of the variable `MAGIC_NUMBER` in
[`Lib/importlib/_bootstrap_external.py`](../Lib/importlib/_bootstrap_external.py).
Changing this number will cause all .pyc files with the old `MAGIC_NUMBER`
to be recompiled by the interpreter on import. Whenever `MAGIC_NUMBER` is
changed, the ranges in the `magic_values` array in
[`PC/launcher.c`](../PC/launcher.c) may also need to be updated. Changes to
[`Lib/importlib/_bootstrap_external.py`](../Lib/importlib/_bootstrap_external.py)
will take effect only after running `make regen-importlib`.
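The running interpreter's magic number is exposed through `importlib.util`, which is handy for checking that a bump took effect; a sketch:

```python
import struct
from importlib.util import MAGIC_NUMBER

# Every .pyc file begins with these four bytes: a version-specific
# 16-bit little-endian value followed by b'\r\n'.
assert len(MAGIC_NUMBER) == 4
assert MAGIC_NUMBER[2:] == b'\r\n'

version_word = struct.unpack('<H', MAGIC_NUMBER[:2])[0]
assert version_word >= 3000        # Python 3.x values are all above 3000
```

Comparing this word before and after editing `MAGIC_NUMBER` confirms the bump reached the interpreter you are testing with.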

@@ -333,12 +333,12 @@ will take effect only after running `make regen-importlib`.
> On Windows, running the `./build.bat` script will automatically
> regenerate the required files without requiring additional arguments.

Finally, you need to introduce the use of the new bytecode. Update
[`Python/codegen.c`](../Python/codegen.c) to emit code with this bytecode.
Optimizations in [`Python/flowgraph.c`](../Python/flowgraph.c) may also
need to be updated. If the new opcode affects a control flow or the block
stack, you may have to update the `frame_setlineno()` function in
[`Objects/frameobject.c`](../Objects/frameobject.c). It may also be necessary
to update [`Lib/dis.py`](../Lib/dis.py) if the new opcode interprets its
argument in a special way (like `FORMAT_VALUE` or `MAKE_FUNCTION`).

@@ -347,12 +347,12 @@ is already in existence and you do not change the magic number, make
sure to delete your old .py(c|o) files! Even though you will end up changing
the magic number if you change the bytecode, while you are debugging your work
you may be changing the bytecode output without constantly bumping up the
magic number. This can leave you with stale .pyc files that will not be
recreated.
Running `find . -name '*.py[co]' -exec rm -f '{}' +` should delete all .pyc
files you have, forcing new ones to be created and thus allowing you to test out
your new bytecode properly. Run `make regen-importlib` to update the
bytecode of frozen importlib files. You have to run `make` again after this
to recompile the generated C files.

Additional resources