diff --git a/Doc/extending/error-handling.rst b/Doc/extending/error-handling.rst new file mode 100644 index 00000000000000..68025d49e4c12c --- /dev/null +++ b/Doc/extending/error-handling.rst @@ -0,0 +1,260 @@ +.. highlight:: c + + +.. _error-handling: + + +*************************** +Error handling in the C API +*************************** + +This chapter covers how Python's C API expresses errors +and how to interact with Python exceptions. + +The exception indicator +======================= + +Python has a thread-local indicator for the state of the current exception. +This indicator is just a ``PyObject *`` referencing an instance of +:class:`BaseException`. You can think of this like the ``errno`` variable in C. + +If a C API function fails, it may set the exception indicator to a Python +exception object. For example, creating a new object may fail and set the +exception indicator to a :class:`MemoryError` object to denote that an +allocation failed. + +Generally speaking, you must not call C API functions while the exception +indicator is set. This is explained in more detail later on. + + +The failure protocol +==================== + +In the C API, ``NULL`` is never a valid ``PyObject *``, so it is used as a +sentinel to indicate failure for functions that return a ``PyObject *``. +In fact, we've already used this! Going back to our ``system`` function, +we can see this in action: + +.. code-block:: c + :emphasize-lines: 6 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + PyObject *result = PyLong_FromLong(status); + return result; + } + + +``spam_system`` returns a ``PyObject *``, so we indicate failure by returning +``NULL``. + +.. note:: + + Some functions in the C API return an ``int`` instead of a reference, so they + cannot use ``NULL`` for failure. 
These functions will usually return ``-1`` + for failure, and ``0`` otherwise. + +To expand on this, let's try to modify ``spam_system`` to raise an +exception if the result is non-zero: + +.. code-block:: c + :emphasize-lines: 9-11 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + if (status != 0) { + return NULL; + } + + // We don't know how to return None yet, so let's do this for now. + return PyLong_FromLong(status); + } + +Because ``system`` is not from Python's C API, it has no knowledge of Python's +exception indicator, and thus does not set any exceptions. So, if we were to +run this code with an invalid command, the interpreter would raise a +:class:`SystemError`: + +.. code-block:: pycon + + >>> import spam + >>> result = spam.system('noexist') + SystemError: returned NULL without setting an exception + +To manually raise an exception, we can use :c:func:`PyErr_SetString`, which +takes a reference to an exception class and a C string to use as the +message. All of Python's built-in exceptions are available as global C +variables prefixed with ``PyExc_`` followed by their name in Python. +For example, :class:`RuntimeError` is available as :c:var:`PyExc_RuntimeError`. +The full list is available at :ref:`standardexceptions`. + +With this knowledge, let's make our function raise a ``RuntimeError`` upon +failure: + +.. code-block:: c + :emphasize-lines: 10 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + if (status != 0) { + PyErr_SetString(PyExc_RuntimeError, "system() call failed"); + return NULL; + } + + // We don't know how to return None yet, so let's do this for now. + return PyLong_FromLong(status); + } + +Now, if we run this: + +.. 
code-block:: pycon + + >>> import spam + >>> result = spam.system('noexist') + RuntimeError: system() call failed + + +Yay! But, this isn't a very descriptive error message. It'd be nice if users +of ``system`` knew exactly what went wrong when invoking their command. + +We can do this by using :c:func:`PyErr_Format`, which takes a format +string followed by variadic arguments instead of a single constant string. +This is similar to ``printf`` in C. Let's try it: + +.. code-block:: c + :emphasize-lines: 10-11 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + if (status != 0) { + PyErr_Format(PyExc_RuntimeError, + "system() returned non-zero exit code %d", status); + return NULL; + } + + // We don't know how to return None yet, so let's do this for now. + return PyLong_FromLong(status); + } + + +And if we try it, everything works as expected: + + +.. code-block:: pycon + + >>> import spam + >>> result = spam.system('noexist') + RuntimeError: system() returned non-zero exit code 127 + + +But, our function still returns ``0`` if it succeeds, which is now useless. +Ideally, we should return ``None``, like a normal Python function would. +Our first instinct might be to return ``NULL``, so let's try it: + +.. code-block:: c + :emphasize-lines: 15 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + if (status != 0) { + PyErr_Format(PyExc_RuntimeError, + "system() returned non-zero exit code %d", status); + return NULL; + } + + return NULL; + } + +.. code-block:: pycon + + >>> import spam + >>> spam.system('true') + SystemError: returned NULL without setting an exception + + +Nope -- again, ``NULL`` is reserved for exceptions. 
In Python, ``None`` is still +an object, so we have to return a reference to it. We can do this by returning +a strong reference to :c:var:`Py_None`: + + +.. code-block:: c + :emphasize-lines: 16 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + if (status != 0) { + PyErr_Format(PyExc_RuntimeError, + "system() returned non-zero exit code %d", status); + return NULL; + } + + // Py_NewRef() is just a shorthand for Py_INCREF() with an expression + return Py_NewRef(Py_None); + } + +.. note:: + + In CPython, :const:`None` is actually an :term:`immortal` object, meaning + that it has a fixed reference count and is never deallocated, and thus + ``Py_INCREF`` has no real effect here. + + +In fact, this is so common that the C API has a macro for it: + + +.. code-block:: c + :emphasize-lines: 15 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + if (status != 0) { + PyErr_Format(PyExc_RuntimeError, + "system() returned non-zero exit code %d", status); + return NULL; + } + + Py_RETURN_NONE; + } diff --git a/Doc/extending/first-extension-module.rst b/Doc/extending/first-extension-module.rst index 894f5bdbb8f09c..5ad9302388c9a2 100644 --- a/Doc/extending/first-extension-module.rst +++ b/Doc/extending/first-extension-module.rst @@ -475,7 +475,7 @@ So, we'll need to *encode* the data, and we'll use the UTF-8 encoding for it. and the C API has special support for it.) The function to encode a Python string into a UTF-8 buffer is named -:c:func:`PyUnicode_AsUTF8AndSize` [#why-pyunicodeasutf8]_. +:c:func:`PyUnicode_AsUTF8` [#why-pyunicodeasutf8]_. Call it like this: .. 
code-block:: c @@ -484,13 +484,13 @@ Call it like this: static PyObject * spam_system(PyObject *self, PyObject *arg) { - const char *command = PyUnicode_AsUTF8AndSize(arg, NULL); + const char *command = PyUnicode_AsUTF8(arg); int status = 3; PyObject *result = PyLong_FromLong(status); return result; } -If :c:func:`PyUnicode_AsUTF8AndSize` is successful, *command* will point to the +If :c:func:`PyUnicode_AsUTF8` is successful, *command* will point to the resulting C string -- a zero-terminated array of bytes [#embedded-nul]_. This buffer is managed by the *arg* object, which means we don't need to free it, but we must follow some rules: @@ -500,14 +500,14 @@ it, but we must follow some rules: garbage-collected. * We must not modify it. This is why we use ``const``. -If :c:func:`PyUnicode_AsUTF8AndSize` was *not* successful, it returns a ``NULL`` +If :c:func:`PyUnicode_AsUTF8` was *not* successful, it returns a ``NULL`` pointer. When calling *any* Python C API, we always need to handle such error cases. The way to do this in general is left for later chapters of this documentation. For now, be assured that we are already handling errors from :c:func:`PyLong_FromLong` correctly. -For the :c:func:`PyUnicode_AsUTF8AndSize` call, the correct way to handle +For the :c:func:`PyUnicode_AsUTF8` call, the correct way to handle errors is returning ``NULL`` from ``spam_system``. 
Add an ``if`` block for this: @@ -518,7 +518,7 @@ Add an ``if`` block for this: static PyObject * spam_system(PyObject *self, PyObject *arg) { - const char *command = PyUnicode_AsUTF8AndSize(arg); + const char *command = PyUnicode_AsUTF8(arg); if (command == NULL) { return NULL; } @@ -548,7 +548,7 @@ the ``char *`` buffer, and using its result instead of the ``3``: static PyObject * spam_system(PyObject *self, PyObject *arg) { - const char *command = PyUnicode_AsUTF8AndSize(arg); + const char *command = PyUnicode_AsUTF8(arg); if (command == NULL) { return NULL; } diff --git a/Doc/extending/index.rst b/Doc/extending/index.rst index c0c494c3059d99..0cc4b79c30a017 100644 --- a/Doc/extending/index.rst +++ b/Doc/extending/index.rst @@ -46,6 +46,8 @@ source file by including the header ``"Python.h"``. :hidden: first-extension-module.rst + reference-counting.rst + error-handling.rst extending.rst newtypes_tutorial.rst newtypes.rst @@ -77,6 +79,8 @@ as part of this version of CPython. #. :ref:`first-extension-module` +#. :ref:`reference-counting-intro` +#. :ref:`error-handling` Guides for intermediate topics diff --git a/Doc/extending/reference-counting.rst b/Doc/extending/reference-counting.rst new file mode 100644 index 00000000000000..b251398c6f5591 --- /dev/null +++ b/Doc/extending/reference-counting.rst @@ -0,0 +1,191 @@ +.. highlight:: c + + +.. _reference-counting-intro: + + +************************************* +An introduction to reference counting +************************************* + +This chapter covers the basics of CPython's garbage collection scheme. + +What is reference counting? +=========================== + +In CPython, objects are garbage collected through a scheme known as +"reference counting". This means that every object keeps a count of the number +of references to it. + +For example, take the following code: + +.. 
code-block:: python + + a = object() # refcount: 1 + +In the above code, the ``object()`` has a single reference (``a``), so it has +a reference count of 1. If we add more references, the reference count will +increase: + +.. code-block:: python + + a = object() # refcount: 1 + b = a # refcount: 2 + c = b # refcount: 3 + + +When a name is unbound, the reference count is decremented. If the reference +count of an object reaches zero, the object is immediately deallocated. + +We can visualize this using the :meth:`~object.__del__` method: + +.. code-block:: pycon + + >>> class Test: + ... def __del__(self): + ... print("Deleting") + >>> a = Test() # refcount: 1 + >>> del a # refcount: 0 + Deleting + + +Object references in the C API +============================== + +In the C API, all objects are represented by a pointer to a :c:type:`PyObject`. +This is known as a "reference". +For our purposes, the ``PyObject`` structure contains two important pieces of +information: + +1. The object's type, accessible through :c:macro:`Py_TYPE`. +2. The object's :term:`reference count`, accessible through :c:macro:`Py_REFCNT`. + +When using the C API, we need to manage the reference count of an object on our +own. In other words, we need to tell Python where and when we are using an +object. This is done through two macros: + +1. :c:macro:`Py_INCREF`, which increments the object's reference count. +2. :c:macro:`Py_DECREF`, which decrements the object's reference count. + If the object's reference count becomes zero, the object's destructor is + invoked. + +To understand how this works in practice, let's go back to our ``system`` +function, taking note of ``PyObject *`` uses this time: + +.. 
code-block:: c + :emphasize-lines: 1-2, 9 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + if (command == NULL) { + return NULL; + } + int status = system(command); + PyObject *result = PyLong_FromLong(status); + return result; + } + +Again, each ``PyObject *`` is a reference. There are two types of references +in the C API: + +1. :term:`Strong references <strong reference>`, in which you are responsible + for calling :c:macro:`Py_DECREF` (or otherwise handing off the reference). + At the end of a function, all strong references should have either been + destroyed or handed off (such as by returning them). +2. :term:`Borrowed references <borrowed reference>`, in which you are *not* + responsible for destroying the reference. + +In the ``spam_system`` function, ``self`` and ``arg`` are borrowed references +(meaning we must not decrement their reference count), but ``result`` is a +strong reference. ``result`` is returned, so the strong reference is given to +the caller. This is also called "stealing" a reference (so, in the above +example, the caller steals our strong reference to ``result``). + + +Reference counting patterns +=========================== + +In Python's C API, most functions will return a strong reference, and as such, +you need to release those references when you are done with them. For example, +let's say that we wanted to change our ``system`` function to only accept ASCII +strings as an input. We would first call :c:func:`PyUnicode_AsASCIIString` to +convert the string to a Python :class:`bytes` object, and then use +:c:macro:`PyBytes_AS_STRING` to extract the internal ``const char *`` buffer +from it. + +To visualize: + +.. 
code-block:: c + :emphasize-lines: 4-8, 10 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + PyObject *bytes = PyUnicode_AsASCIIString(arg); // Strong reference + if (bytes == NULL) { + return NULL; + } + const char *command = PyBytes_AS_STRING(bytes); + int status = system(command); + Py_DECREF(bytes); // Release the strong reference + PyObject *result = PyLong_FromLong(status); + return result; + } + +Note that we have to call ``Py_DECREF(bytes)`` *after* we call ``system``. +If we did it before, then the string returned by ``PyBytes_AS_STRING`` +might be freed and cause a crash upon trying to use it in ``system``. + + +The pitfalls of reference counting +================================== + +As mentioned previously, *most* functions will return a strong reference, but not +all of them! In the above example, if ``PyUnicode_AsASCIIString`` were to +return a borrowed reference, then there would be a use-after-free somewhere +down the call stack. + +Unfortunately, there is no way to determine whether a reference is strong or +borrowed just by looking at it. This can lead to many memory-safety bugs, +and to make matters worse, debugging bugs of this nature is often very difficult. + +For example, let's add a bug to ``spam_system`` where we release a borrowed +reference: + +.. code-block:: c + :emphasize-lines: 5 + + static PyObject * + spam_system(PyObject *self, PyObject *arg) + { + const char *command = PyUnicode_AsUTF8(arg); + Py_DECREF(arg); // refcount: 0!!!! + if (command == NULL) { + return NULL; + } + int status = system(command); + PyObject *result = PyLong_FromLong(status); + return result; + } + + +Running the above code will result in a crash, but *not* in the +``spam_system`` function. In fact, ``spam_system`` won't even show up in the +stack trace. The crash occurs after ``spam_system`` returns and the *caller* +tries to release its reference to ``arg``, but since we stole the reference, +``arg`` is now invalid. 
This can make it very difficult to track down where +a reference counting error was made. + +Another common error is forgetting to release a strong reference, in which case +the object will leak its memory. This is known as a "reference leak". +In this case, tools such as `Memray `_ +are able to identify which objects are leaking, which does make debugging +a little bit easier, but objects often hold references to many other objects, +which will *also* leak, making it even harder to find the cause of the leak. + +Because CPython does not track where reference counts are incremented and +decremented, reference counting bugs are notoriously difficult to identify and +fix. This is one of the reasons many developers choose to use other programming +languages and tools when interfacing with Python's C API.