This is a continuation proposal of PEP-489 and later PEPs. PEP-630 notes:
Currently (as of Python 3.10), heap types have no good API to write Py*_Check functions (like PyUnicode_Check exists for str, a static type), and so it is not easy to ensure that instances have a particular C layout.
One known solution is to assign a C layout ID to particular heap types. This would help with subclass checks in tp slot methods (e.g. nb_add, tp_dealloc), especially in the final phase, where the module state cannot be relied upon[1].
For more context, see: https://discuss.python.org/t/55598/2
Proposal
-
Adding a pointer member to heap types, then asking module authors to assign it a suitable value (a token) if they agree that:
- The pointer outlives the class, so it's not reused for something else while the class exists.
- It is "owned" by the module where the class lives, so it won't clash with other modules.
For example, an extension module that automatically wraps C++ classes could assign the typeid.
-
Introducing a Py_tp_token slot to set the member:
PyType_Slot foo_slots[] = {
    {Py_tp_token, &pointee_in_the_module},
    ...
};
Unlike other type slots, this slot will accept NULL through the new dedicated Py_TP_USE_SPEC identifier:
{Py_tp_token, Py_TP_USE_SPEC}
The option above will instruct the PyType_FromMetaclass function to use its spec argument as a token (the slot's actual value).
If the slot is absent, the feature is disabled.
-
Introducing a PyType_GetBaseByToken(type, token, ...) helper function.
It will search the type and its superclasses for a class whose token is valid and equal to the given one.
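The following minimal C model illustrates the three cases above: an explicit token, Py_TP_USE_SPEC, and no slot at all. It shows how a type's token could be derived from its spec. This is a sketch under stated assumptions, not CPython code. It does not include Python.h. MockSlot, MockSpec, MOCK_TP_TOKEN, MOCK_TP_DOC, MOCK_TP_USE_SPEC and resolve_token() are hypothetical stand-ins for PyType_Slot, PyType_Spec, Py_tp_token, some unrelated slot, Py_TP_USE_SPEC, and the corresponding logic inside PyType_FromMetaclass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT CPython's definitions. */
#define MOCK_TP_TOKEN    1000  /* plays the role of Py_tp_token */
#define MOCK_TP_DOC      1001  /* some unrelated slot ID */
#define MOCK_TP_USE_SPEC NULL  /* plays the role of Py_TP_USE_SPEC (a NULL value) */

typedef struct {
    int slot;     /* slot ID; 0 terminates the array */
    void *pfunc;  /* slot value */
} MockSlot;

typedef struct {
    const char *name;
    MockSlot *slots;
} MockSpec;

/* Returns the value a new heap type would store in its ht_token member:
 *   slot absent               -> NULL (feature disabled)
 *   value == MOCK_TP_USE_SPEC -> the spec pointer itself
 *   any other value           -> that value, unchanged                */
static void *resolve_token(MockSpec *spec)
{
    assert(spec != NULL && spec->slots != NULL);
    for (const MockSlot *s = spec->slots; s->slot != 0; s++) {
        if (s->slot == MOCK_TP_TOKEN) {
            return s->pfunc == MOCK_TP_USE_SPEC ? (void *)spec : s->pfunc;
        }
    }
    return NULL;
}

/* Example specs (all names hypothetical). */
static char foo_token;  /* module-owned object; its address is the token */

static MockSlot explicit_slots[] = {{MOCK_TP_TOKEN, &foo_token}, {0, NULL}};
static MockSlot use_spec_slots[] = {{MOCK_TP_TOKEN, MOCK_TP_USE_SPEC}, {0, NULL}};
static MockSlot no_token_slots[] = {{MOCK_TP_DOC, NULL}, {0, NULL}};

static MockSpec explicit_spec = {"mod.Explicit", explicit_slots};
static MockSpec use_spec_spec = {"mod.UseSpec", use_spec_slots};
static MockSpec no_token_spec = {"mod.NoToken", no_token_slots};
```

With these definitions, resolve_token(&explicit_spec) yields &foo_token, resolve_token(&use_spec_spec) yields the spec's own address, and resolve_token(&no_token_spec) yields NULL. The address of a static object in the module, or of the spec itself, meets both requirements stated for a token. It lives as long as the module, and no other module can own the same address.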
Specification
-
The PyHeapTypeObject struct will have a new member, the ht_token void pointer (NULL by default), which will not be inherited by subclasses.
-
The existing PyType_FromMetaclass(..., spec, ...) function will do the following, when the proposed slot ID, Py_tp_token, is detected in spec->slots:
if PyType_Slot.pfunc == Py_TP_USE_SPEC:  # NULL check
    ht_token = spec
else:
    ht_token = PyType_Slot.pfunc
-
PyType_GetSlot(type, Py_tp_token) will return NULL if a static type is given.
-
The helper function will have the following signature:
int PyType_GetBaseByToken(PyTypeObject *type, void *token,
                          PyTypeObject **result)
It scans only the heap types that have a non-NULL token, walking the type's tp_mro if it exists, or otherwise walking tp_bases recursively.
- On error, set *result to NULL, set an exception, and return -1.
- If there is no type whose token is equal to the given one, set *result to NULL and return 0.
- Otherwise, set *result to the first type found and return 1.
- (UPDATE) Raise SystemError when token is NULL.
- (UPDATE) Raise TypeError when PyType_Check(type) returns false.
The result argument accepts NULL, in which case no reference is assigned (check-only mode).
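The lookup rules above can be modelled in plain C to make the return convention concrete. This is a simplified sketch, not the reference implementation. It does not include Python.h, and several names are hypothetical: MockType (with its is_type, is_heap, token, mro and bases fields), get_base_by_token(), foo_can_add() and the example hierarchy. Errors are reported as strings through an out-parameter instead of Python exceptions, and reference counting is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a type object (not PyTypeObject). */
typedef struct MockType {
    const char *name;
    int is_type;              /* 0 models an object that fails PyType_Check() */
    int is_heap;              /* static types never carry a token */
    void *token;              /* models ht_token: NULL by default, not inherited */
    struct MockType **mro;    /* NULL-terminated, starts with the type itself; may be NULL */
    struct MockType **bases;  /* NULL-terminated direct bases, or NULL */
} MockType;

/* Only heap types with a non-NULL token take part in the scan. */
static int token_matches(const MockType *t, const void *token)
{
    return t->is_heap && t->token != NULL && t->token == token;
}

/* Recursive depth-first walk over the bases, used when there is no MRO. */
static MockType *find_in_bases(MockType *t, const void *token)
{
    if (token_matches(t, token)) {
        return t;
    }
    for (MockType **b = t->bases; b != NULL && *b != NULL; b++) {
        MockType *found = find_in_bases(*b, token);
        if (found != NULL) {
            return found;
        }
    }
    return NULL;
}

/* Models the proposed convention: -1 on error, 0 if not found, 1 if found.
 * result may be NULL (check-only mode). */
static int get_base_by_token(MockType *type, void *token,
                             MockType **result, const char **error)
{
    MockType *found = NULL;
    assert(type != NULL);
    if (result != NULL) {
        *result = NULL;  /* *result stays NULL on error and when not found */
    }
    if (token == NULL) {
        if (error != NULL) *error = "SystemError: token is NULL";
        return -1;
    }
    if (!type->is_type) {
        if (error != NULL) *error = "TypeError: not a type";
        return -1;
    }
    if (type->mro != NULL) {
        for (MockType **m = type->mro; *m != NULL; m++) {
            if (token_matches(*m, token)) {
                found = *m;  /* first match in MRO order */
                break;
            }
        }
    } else {
        found = find_in_bases(type, token);
    }
    if (found == NULL) {
        return 0;
    }
    if (result != NULL) {
        *result = found;
    }
    return 1;
}

/* Example hierarchy (hypothetical): object <- foo.Foo <- Sub <- Bare. */
static char foo_token;  /* owned by a hypothetical module "foo" */
static MockType object_t, foo_t, sub_t, bare_t;  /* tentative definitions */
static MockType *object_mro[] = {&object_t, NULL};
static MockType *foo_mro[] = {&foo_t, &object_t, NULL};
static MockType *foo_bases[] = {&object_t, NULL};
static MockType *sub_mro[] = {&sub_t, &foo_t, &object_t, NULL};
static MockType *sub_bases[] = {&foo_t, NULL};
static MockType *bare_bases[] = {&sub_t, NULL};

static MockType object_t = {"object", 1, 0, NULL, object_mro, NULL};
static MockType foo_t = {"foo.Foo", 1, 1, &foo_token, foo_mro, foo_bases};
static MockType sub_t = {"Sub", 1, 1, NULL, sub_mro, sub_bases};  /* token not inherited */
static MockType bare_t = {"Bare", 1, 1, NULL, NULL, bare_bases}; /* no MRO yet */
static MockType not_a_type = {"not a type", 0, 0, NULL, NULL, NULL};

/* Usage sketch: an nb_add-like slot that only handles subclasses of foo.Foo. */
static int foo_can_add(MockType *other_type)
{
    return get_base_by_token(other_type, &foo_token, NULL, NULL) == 1;
}
```

Because ht_token is not inherited, Sub and Bare find foo.Foo only by walking their ancestry: via the MRO for Sub, and via the bases for Bare. Passing NULL as result gives the check-only mode, which is all an nb_add-style slot needs to decide whether it can handle the other operand.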
Reference implementation
gh-124153: Introduce PyType_GetBaseByToken function (PoC) python/cpython#121079
A performance evaluation (outdated)
Performance
A subclass check in a slot method currently consists of the following steps:
1. PyType_GetModuleByDef (walks MRO)
2. PyModule_GetState
3. Py*_CheckExact
4. PyType_IsSubtype (walks MRO)
PyType_GetBaseByToken is cheaper than (1)+(2)+(3), but a little more expensive than (4), PyType_IsSubtype[2]. In most cases, using the new function alone will be efficient enough. The exception is code that stays in C functions and repeats (3) and (4) with a module state passed around[3].
Backwards Compatibility
- One new pointer, ht_token, is added to heap types.
- One slot ID, Py_tp_token, is added with an identifier, Py_TP_USE_SPEC.
- One helper function, PyType_GetBaseByToken, is added, whose documentation will mention the new slot above.
UPDATE: Py_tp_token, Py_TP_USE_SPEC and PyType_GetBaseByToken will be documented individually.
Previous discussions
Footnotes
1. The GC can clear the module state or erase the references to the module from heap types: gh-115874
2. PyType_IsSubtype can be slower on recent Windows PGO builds due to unstable optimization.
3. PyType_GetModuleState can be used on the type found by PyType_GetBaseByToken.