Stick to 80 char limit and add reference.

python · markshannon · Jul 1, 2021 · Jun 29, 2021 · Jun 30, 2021 · Jun 30, 2021
commit b0c9e51408c801487ea1ef75ae7b6d54bbf9af3f
diff --git a/Python/adaptive.md b/Python/adaptive.md
@@ -2,23 +2,30 @@
 
 ## Families of instructions
 
-The core part of PEP 659 (specializing adaptive interpreter) is the families of instructions that perform the adaptive specialization.
+The core part of PEP 659 (specializing adaptive interpreter) is the families
+of instructions that perform the adaptive specialization.
 
 A family of instructions has the following fundamental properties:
 
-* It corresponds to a single instruction in the code generated by the bytecode compiler.
+* It corresponds to a single instruction in the code
+  generated by the bytecode compiler.
 * It has a single adaptive instruction that records an execution count and,
-   at regular intervals, attempts to specialize itself. If not specializing, it executes
-  the non-adaptive instruction.
-* It has at least one specialized form of the instruction that is tailored for a particular value or set of values at runtime.
+  at regular intervals, attempts to specialize itself. If not specializing,
+  it executes the non-adaptive instruction.
+* It has at least one specialized form of the instruction that is tailored 
+  for a particular value or set of values at runtime.
 * All members of the family have access to the same number of cache entries.
   Individual family members do not need to use all of the entries.
 
-The current implementation also requires the following, although these are not fundamental and may change:
+The current implementation also requires the following,
+although these are not fundamental and may change:
 
-* If a family uses one or more entries, then the first entry must be a `_PyAdaptiveEntry` entry.
-* If a family uses no cache entries, then the `oparg` is used as the counter for the adaptive instruction.
-* All instruction names should start with the name of the non-adaptive instruction.
+* If a family uses one or more entries, then the first entry must be a
+  `_PyAdaptiveEntry` entry.
+* If a family uses no cache entries, then the `oparg` is used as the
+  counter for the adaptive instruction.
+* All instruction names should start with the name of the non-adaptive
+  instruction.
 * The adaptive instruction should end in `_ADAPTIVE`.
 * Specialized forms should have names describing their specialization.
 
@@ -32,66 +39,81 @@ and `Tadaptive` is the mean time to execute the specialized and adaptive forms.
 
 `Tadaptive = (sum(Ti*Ni) + Tmiss*Nmiss)/(sum(Ni)+Nmiss)`
 
-`Ti` is the time to execute the `i`th instruction in the family and `Ni` is the number of times that instruction is executed.
-`Tmiss` is the time to process a miss, including de-optimzation and the time to execute the base instruction.
+`Ti` is the time to execute the `i`th instruction in the family and `Ni` is
+the number of times that instruction is executed.
+`Tmiss` is the time to process a miss, including de-optimization
+and the time to execute the base instruction.
 
-The ideal situation is where misses are rare and the specialized forms are much faster than the base instruction.
+The ideal situation is where misses are rare and the specialized
+forms are much faster than the base instruction.
 `LOAD_GLOBAL` is near ideal: `Nmiss/sum(Ni) ≈ 0`,
 in which case we have `Tadaptive ≈ sum(Ti*Ni)/sum(Ni)`.
-Since we can expect the specialized forms `LOAD_GLOBAL_MODULE` and `LOAD_GLOBAL_BUILTIN` to be much faster than the adaptive base instruction, we would expect the specialization of `LOAD_GLOBAL` to be profitable.
+Since we can expect the specialized forms `LOAD_GLOBAL_MODULE` and
+`LOAD_GLOBAL_BUILTIN` to be much faster than the adaptive base instruction,
+we would expect the specialization of `LOAD_GLOBAL` to be profitable.
 
 ## Design considerations
 
-While `LOAD_GLOBAL` may be ideal, instructions like `LOAD_ATTR` and `CALL_FUNCTION` are not.
-For maximum performance we want to keep `Ti` low for all specialized instructions and `Nmiss` as low as possible.
+While `LOAD_GLOBAL` may be ideal, instructions like `LOAD_ATTR` and
+`CALL_FUNCTION` are not. For maximum performance we want to keep `Ti`
+low for all specialized instructions and `Nmiss` as low as possible.
 
 Keeping `Nmiss` low means that there should be specializations for almost
-all values seen by the base instruction. Keeping `sum(Ti*Ni)` low means keeping `Ti`
-low which means minimizing branches and dependent memory accesses (pointer chasing).
-These two objectives may be in conflict, requiring judgement and experimentation to
-design the family of instructions.
+all values seen by the base instruction. Keeping `sum(Ti*Ni)` low means
+keeping `Ti` low, which means minimizing branches and dependent memory
+accesses (pointer chasing). These two objectives may be in conflict,
+requiring judgement and experimentation to design the family of instructions.
 
 ### Gathering data
 
-Before choosing how to specialize an instruction, it is important to gather some data. What are the pattern of usage of the base instruction?
-Data can best be gathered by instrumenting the interpreter.
-Since a specialization function and adaptive instruction are going to be required,
+Before choosing how to specialize an instruction, it is important to gather
+some data. What are the patterns of usage of the base instruction?
+Data can best be gathered by instrumenting the interpreter. Since a 
+specialization function and adaptive instruction are going to be required,
 instrumentation can most easily be added in the specialization function.
 
 ### Choice of specializations
 
-The performance of the specializing adaptive interpreter relies on the quality of
-specialization and keeping the overhead of specialization low.
+The performance of the specializing adaptive interpreter relies on the
+quality of specialization and keeping the overhead of specialization low.
 
-Specialized instructions must be fast. In order to be fast, specialized instructions should be tailored 
-for a particular set of values that allows them to:
+Specialized instructions must be fast. In order to be fast,
+specialized instructions should be tailored for a particular
+set of values that allows them to:
 1. Verify that the incoming value is part of that set with low overhead.
 2. Perform the operation quickly.
 
-This requires that the set of values is chosen such that membership can be tested quickly and
-that membership is sufficient to allow the operation to performed quickly.
+This requires that the set of values is chosen such that membership can be
+tested quickly and that membership is sufficient to allow the operation to
+be performed quickly.
 
-For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()` dictionaries that have a keys with the expected version.
+For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()`
+dictionaries that have keys with the expected version.
 
 This can be tested quickly:
 * `globals->keys->dk_version == expected_version`
 
 and the operation can be performed quickly:
 * `value = globals->keys->entries[index].value`.
 
-Because it is impossible to measure the performance of an instruction without also
-measuring unrelated factors, the assessment of the quality of a specialization will require some judgement.
+Because it is impossible to measure the performance of an instruction without
+also measuring unrelated factors, the assessment of the quality of a
+specialization will require some judgement.
 
-As a general rule, specialized instructions should be much faster than the base instruction.
+As a general rule, specialized instructions should be much faster than the
+base instruction.
 
 ### Implementation of specialized instructions
 
 In general, specialized instructions should be implemented in two parts:
-1. A sequence of guards, each of the form `DEOPT_IF(guard-condition-is-false, BASE_NAME)`,
+1. A sequence of guards, each of the form
+  `DEOPT_IF(guard-condition-is-false, BASE_NAME)`,
   followed by a `record_cache_hit()`.
-2. The operation, which should ideally have no branches and a minimum number of dependent memory accesses.
+2. The operation, which should ideally have no branches and
+  a minimum number of dependent memory accesses.
 
-In practice, the parts may overlap, as data required for guards can be re-used in the operation.
+In practice, the parts may overlap, as data required for guards
+can be re-used in the operation.
 
-If there are branches in the operation, then consider further specialization to eliminate
-the branches.
+If there are branches in the operation, then consider further specialization
+to eliminate the branches.
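
Review note on the cost model in `adaptive.md`: the formula
`Tadaptive = (sum(Ti*Ni) + Tmiss*Nmiss)/(sum(Ni)+Nmiss)` can be evaluated
directly when comparing candidate specializations. The C sketch below is only
an illustration of that arithmetic. The function `mean_adaptive_time` and its
parameter names are hypothetical and do not exist in CPython.

```c
#include <stddef.h>

/* Mean time per execution of an adaptive family:
 *   Tadaptive = (sum(Ti*Ni) + Tmiss*Nmiss) / (sum(Ni) + Nmiss)
 * t[i] is Ti, n[i] is Ni. All names are illustrative assumptions. */
double
mean_adaptive_time(const double *t, const double *n, size_t count,
                   double t_miss, double n_miss)
{
    /* Start with the contribution of the misses. */
    double total_time = t_miss * n_miss;
    double total_runs = n_miss;
    /* Add the contribution of each specialized form. */
    for (size_t i = 0; i < count; i++) {
        total_time += t[i] * n[i];
        total_runs += n[i];
    }
    /* Avoid dividing by zero when nothing was executed. */
    return total_runs > 0.0 ? total_time / total_runs : 0.0;
}
```

With two specialized forms costing 1 and 2 time units, executed 300 and 100
times, and no misses, the mean is 1.25 units. Adding 100 misses at 10 units
each raises the mean to 3.0, which shows how quickly misses can dominate.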
diff --git a/Python/specialize.c b/Python/specialize.c
@@ -7,6 +7,10 @@
 #include "opcode.h"
 #include "structmember.h"         // struct PyMemberDef, T_OFFSET_EX
 
+/* For guidance on adding or extending families of instructions see
+ * ./adaptive.md
+ */
+
 
 /* We lay out the quickened data as a bi-directional array:
  * Instructions upwards, cache entries downwards.
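
Review note on the "guards, then operation" structure that `adaptive.md`
describes for specialized instructions: a minimal, self-contained C sketch of
the `LOAD_GLOBAL_MODULE` example might look like the following. The types
`entry_t`, `keys_t`, `dict_t` and `cache_t` and the function
`load_global_module` are hypothetical stand-ins, not the real CPython
structures, and a `false` return stands in for what `DEOPT_IF` would do in
the interpreter loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the dictionary layout (assumption only). */
typedef struct {
    const char *name;
    int value;
} entry_t;

typedef struct {
    uint32_t dk_version;   /* assumed to change when the keys change */
    entry_t entries[4];
} keys_t;

typedef struct {
    keys_t *keys;
} dict_t;

/* Hypothetical cache contents written by the specialization function. */
typedef struct {
    uint32_t expected_version;
    size_t index;
} cache_t;

/* Returns true and stores the value on a hit. Returns false where the
 * real instruction would de-optimize to the base instruction. */
bool
load_global_module(const dict_t *globals, const cache_t *cache, int *value)
{
    /* Part 1: the guard, a single cheap comparison. */
    if (globals->keys->dk_version != cache->expected_version) {
        return false;
    }
    /* Part 2: the operation, with no further branches. */
    *value = globals->keys->entries[cache->index].value;
    return true;
}
```

The guard and the operation both go through `globals->keys`, which is the
kind of overlap between the two parts that the document mentions.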