24 commits
2bc82a6
gh-108337: Add pyatomic.h header
colesbury Oct 17, 2019
9adf4f8
Fix blurb name
colesbury Aug 22, 2023
d70e8ae
Move pyatomic_std/gcc/msc.h to Include/cpython
colesbury Aug 22, 2023
927430d
Fix _InterlockedCompareExchange64 calls on MSVC x86
colesbury Aug 22, 2023
2d6f950
Move pyatomic.h to Include/cpython
colesbury Aug 23, 2023
bf27448
Remove volatile from function signature.
colesbury Aug 23, 2023
60b56f1
Add code documentation to pyatomic.h
colesbury Aug 23, 2023
0131868
Fix typo
colesbury Aug 23, 2023
0474e2f
Revert inadvertent change to stable_abi.py
colesbury Aug 23, 2023
462c20a
Update Include/cpython/pyatomic.h
colesbury Aug 23, 2023
3078328
Update Include/cpython/pyatomic.h
colesbury Aug 23, 2023
4daf1a2
Update Include/cpython/pyatomic.h
colesbury Aug 23, 2023
f932c77
Move code documentation around.
colesbury Aug 23, 2023
e720736
Clean-up pyatomic_msc.h and add Py_BUILD_ASSERT
colesbury Aug 23, 2023
ee6e49f
Add _Py_atomic_load_ptr_acquire.
colesbury Aug 23, 2023
ca8c3b3
Format files
colesbury Aug 24, 2023
2568ad9
Add note about atomics in whatsnew
colesbury Aug 24, 2023
2d08290
Describe volatile meaning on MSVC.
colesbury Aug 24, 2023
71d981e
Add link to atomic_thread_fence documentation
colesbury Aug 24, 2023
457ce21
Fix _Py_atomic_exchange_int64 on 32-bit Windows
colesbury Aug 24, 2023
9dd0f0b
Add pyatomic_gcc/std.h to vcxproj filters
colesbury Aug 29, 2023
7611965
Changes from review:
colesbury Aug 29, 2023
a8e2538
Update Include/cpython/pyatomic.h
colesbury Aug 30, 2023
433319f
Rename ptr to obj
colesbury Aug 30, 2023
397 changes: 397 additions & 0 deletions Include/cpython/pyatomic.h
@@ -0,0 +1,397 @@
#ifndef Py_ATOMIC_H
#define Py_ATOMIC_H

// This header provides cross-platform low-level atomic operations
// similar to C11 atomics.
//
// Operations are sequentially consistent unless they have a suffix indicating
// otherwise. If in doubt, prefer the sequentially consistent operations.
//
// The "_relaxed" suffix for load and store operations indicates the "relaxed"
// memory order. They don't provide synchronization, but (roughly speaking)
// guarantee somewhat sane behavior for races instead of undefined behavior.
// In practice, they correspond to "normal" hardware load and store instructions,
// so they are almost as inexpensive as plain loads and stores in C.
//
// Note that atomic read-modify-write operations like _Py_atomic_add_* return
// the previous value of the atomic variable, not the new value.
//
// See https://en.cppreference.com/w/c/atomic for more information on C11 atomics.
// See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2055r0.pdf
// "A Relaxed Guide to memory_order_relaxed" for a discussion of common usage
// of relaxed atomics.
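As an illustration of the two conventions documented above (operations are sequentially consistent by default, and read-modify-write operations return the previous value), here is a minimal sketch using standard C11 atomics, which this API is modeled on. The function name `fetch_add_demo` is invented for this example:

```c
#include <stdatomic.h>

// Demonstrates the fetch-add convention described above using C11
// atomics: the *previous* value is returned, not the new one.
static int
fetch_add_demo(void)
{
    atomic_int counter = 5;
    // atomic_fetch_add returns the value held before the addition.
    int previous = atomic_fetch_add(&counter, 3);
    // The counter itself now holds 8; previous is 5.
    return previous * 100 + atomic_load(&counter);
}
```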

// Atomically adds `value` to `address` and returns the previous value
static inline int
_Py_atomic_add_int(int *address, int value);
Member:
The GLib library uses atomic for the first argument. Maybe it's a better name than address or pointer, what do you think?

Contributor Author:
I slightly prefer address or ptr rather than atomic

Member:
I dislike the address name. What you pass is not a void* pointer or an uintptr_t address, but a reference to an atomic variable. I suggest using the atomic or obj name.

For the add operation, the C11 API uses arg for the second parameter name. I'm fine with value.

Contributor Author:
I don't think obj makes as much sense in this API because, unlike the C11 API, they are not pointers to atomic objects (e.g., _Atomic type). They are just pointers to plain C variables. I think that's an important distinction.

If you dislike address, I can change it to ptr, which we commonly use and is a bit more concise.

Contributor Author:
For context, the GCC documentation for built-in atomic functions uses ptr. This is the API that pyatomic.h most closely matches.

https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
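For comparison, here is a sketch of how this API lines up with the GCC builtins mentioned above. The function `my_atomic_add_int` is a hypothetical stand-in for illustration, not the actual pyatomic implementation, and it requires GCC or Clang:

```c
// Sketch (assumption: GCC/Clang) of how an implementation might map the
// pyatomic.h-style API onto GCC's __atomic builtins.
static inline int
my_atomic_add_int(int *ptr, int value)
{
    // __atomic_fetch_add returns the previous value, matching the
    // documented contract of _Py_atomic_add_int.
    return __atomic_fetch_add(ptr, value, __ATOMIC_SEQ_CST);
}
```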


static inline int8_t
_Py_atomic_add_int8(int8_t *address, int8_t value);

static inline int16_t
_Py_atomic_add_int16(int16_t *address, int16_t value);

static inline int32_t
_Py_atomic_add_int32(int32_t *address, int32_t value);

static inline int64_t
_Py_atomic_add_int64(int64_t *address, int64_t value);

static inline intptr_t
_Py_atomic_add_intptr(intptr_t *address, intptr_t value);

static inline unsigned int
_Py_atomic_add_uint(unsigned int *address, unsigned int value);

static inline uint8_t
_Py_atomic_add_uint8(uint8_t *address, uint8_t value);

static inline uint16_t
_Py_atomic_add_uint16(uint16_t *address, uint16_t value);

static inline uint32_t
_Py_atomic_add_uint32(uint32_t *address, uint32_t value);

static inline uint64_t
_Py_atomic_add_uint64(uint64_t *address, uint64_t value);

static inline uintptr_t
_Py_atomic_add_uintptr(uintptr_t *address, uintptr_t value);

static inline Py_ssize_t
_Py_atomic_add_ssize(Py_ssize_t *address, Py_ssize_t value);
Member:
Not sure it's worth exposing atomic ops for all int sizes and signednesses?

Contributor Author:
I expect to use at least one atomic operation on each of the data types here (but not every atomic op on every data type). I tried to be consistent on what's defined because it makes understanding what's available easier and testing easier.


// Performs an atomic compare-and-exchange. If `*address` and `expected` are equal,
// then `value` is stored in `*address`. Returns 1 on success and 0 on failure.
// These correspond to the "strong" variations of the C11 atomic_compare_exchange_* functions.
Member:
C11 passes expected as a pointer, so that it's updated with the actual value when the latter doesn't match the former. Why not keep that convention here?

Contributor Author:
The motivation was two-fold: First, many of the _Py_atomic_compare_exchange calls in the nogil fork use constants as expected and this would be more verbose if it needs to be a pointer, e.g.:

_Py_atomic_compare_exchange_uint8(&m->v, LOCKED, UNLOCKED)

vs.

uint8_t expected = LOCKED;
_Py_atomic_compare_exchange_uint8(&m->v, &expected, UNLOCKED)

Second, I find this style (no pointer for expected) to be a bit less error-prone. I've been tripped up once or twice by having expected be modified when I didn't expect it.

I don't feel terribly strongly about this, so if there is a general preference for sticking closer to the C11-style API here, I can change it.

Member:
AFAIU the main motivation for the C11 style APIs is for retry loops in lockless data structure implementations. A simplistic example (this may be embarrassingly wrong):

struct ListNode;

typedef struct ListNode {
  int value;
  struct ListNode* next;
} ListNode;

void ListAppend(ListNode* list, int new_value) {
  ListNode* new_node = (ListNode*) malloc(sizeof(ListNode));
  new_node->value = new_value;
  new_node->next = NULL;
  ListNode* expected = NULL;
  // Retry until the CAS succeeds; on failure, expected holds the actual
  // next pointer, so advance to it and try again from there.
  while (!_Py_atomic_compare_exchange_ptr(&list->next, &expected, new_node)) {
    list = expected;
    expected = NULL;
  }
}

Member:
If the second argument is a reference (&expected), does it mean that it changes the value of the second argument (*expected)?

For C11 atomic_exchange(), the second argument is not a pointer, but a value (integer), no? https://en.cppreference.com/w/c/atomic/atomic_exchange

C11 atomic_compare_exchange_strong() and atomic_compare_exchange_weak() use a pointer for expected. But this API writes into *expected if the *obj is not equal to *expected.

The behavior of the atomic_compare_exchange_* family is as if the following was executed atomically:

if (memcmp(obj, expected, sizeof *obj) == 0) {
    memcpy(obj, &desired, sizeof *obj);
    return true;
} else {
    memcpy(expected, obj, sizeof *obj);
    return false;
}

For this header file, I would prefer not to have two flavors; the API is already quite long! I would prefer to have a single flavor. If there is a use case where setting expected is relevant, I suggest using a pointer for the second argument.

In short, I agree to change the API to int _Py_atomic_compare_exchange_int32(int32_t *obj, int32_t *expected, int32_t desired). The behavior should be well documented.

obj, expected and desired names come from the C11 API.

Contributor Author:
I'll make the second argument a pointer like the C11 API.
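For reference, the C11-style behavior agreed on here (on failure, expected is overwritten with the value actually found) can be demonstrated with stdatomic.h directly; `cas_demo` is a name invented for this illustration:

```c
#include <stdatomic.h>

// Demonstrates the C11 compare-exchange contract: on failure the
// `expected` argument is updated with the value actually observed.
static int
cas_demo(void)
{
    atomic_int obj = 7;
    int expected = 3;  // deliberately wrong
    // Fails: obj (7) != expected (3); expected becomes 7, obj unchanged.
    int first = atomic_compare_exchange_strong(&obj, &expected, 9);
    // Succeeds: obj (7) == expected (7); obj becomes 9.
    int second = atomic_compare_exchange_strong(&obj, &expected, 9);
    return first * 100 + second * 10 + atomic_load(&obj);
}
```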

static inline int
_Py_atomic_compare_exchange_int(int *address, int expected, int value);

static inline int
_Py_atomic_compare_exchange_int8(int8_t *address, int8_t expected, int8_t value);

static inline int
_Py_atomic_compare_exchange_int16(int16_t *address, int16_t expected, int16_t value);

static inline int
_Py_atomic_compare_exchange_int32(int32_t *address, int32_t expected, int32_t value);

static inline int
_Py_atomic_compare_exchange_int64(int64_t *address, int64_t expected, int64_t value);

static inline int
_Py_atomic_compare_exchange_intptr(intptr_t *address, intptr_t expected, intptr_t value);

static inline int
_Py_atomic_compare_exchange_uint(unsigned int *address, unsigned int expected, unsigned int value);

static inline int
_Py_atomic_compare_exchange_uint8(uint8_t *address, uint8_t expected, uint8_t value);

static inline int
_Py_atomic_compare_exchange_uint16(uint16_t *address, uint16_t expected, uint16_t value);

static inline int
_Py_atomic_compare_exchange_uint32(uint32_t *address, uint32_t expected, uint32_t value);

static inline int
_Py_atomic_compare_exchange_uint64(uint64_t *address, uint64_t expected, uint64_t value);

static inline int
_Py_atomic_compare_exchange_uintptr(uintptr_t *address, uintptr_t expected, uintptr_t value);

static inline int
_Py_atomic_compare_exchange_ssize(Py_ssize_t *address, Py_ssize_t expected, Py_ssize_t value);

static inline int
_Py_atomic_compare_exchange_ptr(void *address, void *expected, void *value);
Member:
Should it be void** address for clarity?

Contributor Author:
The problem is that void ** requires an explicit cast for almost every use, because things like PyObject ** are not implicitly convertible to void **. In other words, currently we can write things like:

PyObject *old_exc = _Py_atomic_exchange_ptr(&tstate->async_exc, exc);

but if address was void **address, we'd have to write:

PyObject *old_exc = _Py_atomic_exchange_ptr((void **)&tstate->async_exc, exc);

Member:
A large part of the Python C API uses macros to convert arguments to PyObject*. Would it make sense to do the same here?

#define _Py_atomic_exchange_ptr(atomic, value) _Py_atomic_exchange_ptr(_Py_CAST(void**, atomic), (value))

Contributor Author:
I don't see the benefit of that style over the current approach, and it would silently allow passing some integer types to _Py_atomic_exchange_ptr.

Member:
Right, in that case I'm fine with the surprising void* type.
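The ergonomic point made in this thread can be sketched in plain C: any object pointer converts implicitly to void*, but not to void**, so the cast happens once inside the function rather than at every call site. `exchange_ptr` below is a non-atomic stand-in invented for illustration, not the real _Py_atomic_exchange_ptr:

```c
// Illustrative, NON-atomic sketch of the void* parameter convention:
// callers pass e.g. a hypothetical PyObject** without a cast, because
// any object pointer converts implicitly to void*.
static void *
exchange_ptr(void *address, void *value)
{
    void **p = (void **)address;  // the one cast lives here, not at call sites
    void *old = *p;
    *p = value;
    return old;
}
```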


// Atomically replaces `*address` with `value` and returns the previous value of `*address`.
static inline int
_Py_atomic_exchange_int(int *address, int value);

static inline int8_t
_Py_atomic_exchange_int8(int8_t *address, int8_t value);

static inline int16_t
_Py_atomic_exchange_int16(int16_t *address, int16_t value);

static inline int32_t
_Py_atomic_exchange_int32(int32_t *address, int32_t value);

static inline int64_t
_Py_atomic_exchange_int64(int64_t *address, int64_t value);

static inline intptr_t
_Py_atomic_exchange_intptr(intptr_t *address, intptr_t value);

static inline unsigned int
_Py_atomic_exchange_uint(unsigned int *address, unsigned int value);

static inline uint8_t
_Py_atomic_exchange_uint8(uint8_t *address, uint8_t value);

static inline uint16_t
_Py_atomic_exchange_uint16(uint16_t *address, uint16_t value);

static inline uint32_t
_Py_atomic_exchange_uint32(uint32_t *address, uint32_t value);

static inline uint64_t
_Py_atomic_exchange_uint64(uint64_t *address, uint64_t value);

static inline uintptr_t
_Py_atomic_exchange_uintptr(uintptr_t *address, uintptr_t value);

static inline Py_ssize_t
_Py_atomic_exchange_ssize(Py_ssize_t *address, Py_ssize_t value);

static inline void *
_Py_atomic_exchange_ptr(void *address, void *value);

// Performs `*address &= value` atomically and returns the previous value of `*address`.
static inline uint8_t
_Py_atomic_and_uint8(uint8_t *address, uint8_t value);

static inline uint16_t
_Py_atomic_and_uint16(uint16_t *address, uint16_t value);

static inline uint32_t
_Py_atomic_and_uint32(uint32_t *address, uint32_t value);

static inline uint64_t
_Py_atomic_and_uint64(uint64_t *address, uint64_t value);

static inline uintptr_t
_Py_atomic_and_uintptr(uintptr_t *address, uintptr_t value);

// Performs `*address |= value` atomically and returns the previous value of `*address`.
static inline uint8_t
_Py_atomic_or_uint8(uint8_t *address, uint8_t value);

static inline uint16_t
_Py_atomic_or_uint16(uint16_t *address, uint16_t value);

static inline uint32_t
_Py_atomic_or_uint32(uint32_t *address, uint32_t value);

static inline uint64_t
_Py_atomic_or_uint64(uint64_t *address, uint64_t value);

static inline uintptr_t
_Py_atomic_or_uintptr(uintptr_t *address, uintptr_t value);

// Atomically loads `*address` (sequential consistency)
static inline int
_Py_atomic_load_int(const int *address);

static inline int8_t
_Py_atomic_load_int8(const int8_t *address);

static inline int16_t
_Py_atomic_load_int16(const int16_t *address);

static inline int32_t
_Py_atomic_load_int32(const int32_t *address);

static inline int64_t
_Py_atomic_load_int64(const int64_t *address);

static inline intptr_t
_Py_atomic_load_intptr(const intptr_t *address);

static inline uint8_t
_Py_atomic_load_uint8(const uint8_t *address);

static inline uint16_t
_Py_atomic_load_uint16(const uint16_t *address);

static inline uint32_t
_Py_atomic_load_uint32(const uint32_t *address);

static inline uint64_t
_Py_atomic_load_uint64(const uint64_t *address);

static inline uintptr_t
_Py_atomic_load_uintptr(const uintptr_t *address);

static inline unsigned int
_Py_atomic_load_uint(const unsigned int *address);

static inline Py_ssize_t
_Py_atomic_load_ssize(const Py_ssize_t *address);

static inline void *
_Py_atomic_load_ptr(const void *address);

// Loads `*address` (relaxed consistency, i.e., no ordering)
static inline int
_Py_atomic_load_int_relaxed(const int *address);

static inline int8_t
_Py_atomic_load_int8_relaxed(const int8_t *address);

static inline int16_t
_Py_atomic_load_int16_relaxed(const int16_t *address);

static inline int32_t
_Py_atomic_load_int32_relaxed(const int32_t *address);

static inline int64_t
_Py_atomic_load_int64_relaxed(const int64_t *address);

static inline intptr_t
_Py_atomic_load_intptr_relaxed(const intptr_t *address);

static inline uint8_t
_Py_atomic_load_uint8_relaxed(const uint8_t *address);

static inline uint16_t
_Py_atomic_load_uint16_relaxed(const uint16_t *address);

static inline uint32_t
_Py_atomic_load_uint32_relaxed(const uint32_t *address);

static inline uint64_t
_Py_atomic_load_uint64_relaxed(const uint64_t *address);

static inline uintptr_t
_Py_atomic_load_uintptr_relaxed(const uintptr_t *address);

static inline unsigned int
_Py_atomic_load_uint_relaxed(const unsigned int *address);

static inline Py_ssize_t
_Py_atomic_load_ssize_relaxed(const Py_ssize_t *address);

static inline void *
_Py_atomic_load_ptr_relaxed(const void *address);

// Atomically performs `*address = value` (sequential consistency)
static inline void
_Py_atomic_store_int(int *address, int value);

static inline void
_Py_atomic_store_int8(int8_t *address, int8_t value);

static inline void
_Py_atomic_store_int16(int16_t *address, int16_t value);

static inline void
_Py_atomic_store_int32(int32_t *address, int32_t value);

static inline void
_Py_atomic_store_int64(int64_t *address, int64_t value);

static inline void
_Py_atomic_store_intptr(intptr_t *address, intptr_t value);

static inline void
_Py_atomic_store_uint8(uint8_t *address, uint8_t value);

static inline void
_Py_atomic_store_uint16(uint16_t *address, uint16_t value);

static inline void
_Py_atomic_store_uint32(uint32_t *address, uint32_t value);

static inline void
_Py_atomic_store_uint64(uint64_t *address, uint64_t value);

static inline void
_Py_atomic_store_uintptr(uintptr_t *address, uintptr_t value);

static inline void
_Py_atomic_store_uint(unsigned int *address, unsigned int value);

static inline void
_Py_atomic_store_ptr(void *address, void *value);

static inline void
_Py_atomic_store_ssize(Py_ssize_t* address, Py_ssize_t value);

// Stores `*address = value` (relaxed consistency, i.e., no ordering)
static inline void
_Py_atomic_store_int_relaxed(int *address, int value);

static inline void
_Py_atomic_store_int8_relaxed(int8_t *address, int8_t value);

static inline void
_Py_atomic_store_int16_relaxed(int16_t *address, int16_t value);

static inline void
_Py_atomic_store_int32_relaxed(int32_t *address, int32_t value);

static inline void
_Py_atomic_store_int64_relaxed(int64_t *address, int64_t value);

static inline void
_Py_atomic_store_intptr_relaxed(intptr_t *address, intptr_t value);

static inline void
_Py_atomic_store_uint8_relaxed(uint8_t* address, uint8_t value);

static inline void
_Py_atomic_store_uint16_relaxed(uint16_t *address, uint16_t value);

static inline void
_Py_atomic_store_uint32_relaxed(uint32_t *address, uint32_t value);

static inline void
_Py_atomic_store_uint64_relaxed(uint64_t *address, uint64_t value);

static inline void
_Py_atomic_store_uintptr_relaxed(uintptr_t *address, uintptr_t value);

static inline void
_Py_atomic_store_uint_relaxed(unsigned int *address, unsigned int value);

static inline void
_Py_atomic_store_ptr_relaxed(void *address, void *value);

static inline void
_Py_atomic_store_ssize_relaxed(Py_ssize_t *address, Py_ssize_t value);

// Stores `*address = value` (release operation)
static inline void
_Py_atomic_store_uint64_release(uint64_t *address, uint64_t value);

static inline void
_Py_atomic_store_ptr_release(void *address, void *value);
Member:
I'm not an expert, but why is it useful to expose "release" operations if no "acquire" operations are exposed?

Contributor Author:
You can always use stronger orderings for correctness (i.e., "seq_cst" everywhere instead of "acquire"). From a performance view, "release" is substantially faster than "seq_cst" stores on x86/x86-64, but "acquire" generates the same code as "seq_cst" loads on both x86/x86-64 and aarch64.

Member:
I don't think CPython support is limited to x86 and ARM variants, so the set of atomic ops exposed should probably be made consistent nevertheless?

Also, using "seq_cst" in combination with "release" will probably make the code more difficult to reason about, than if "acquire" is exposed (and memory ordering is already hard to reason about!).

Contributor Author:
I'll add a _Py_atomic_load_ptr_acquire for consistency (and I think I can remove the _Py_atomic_store_uint64_release).
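For context, the acquire/release pairing discussed in this thread is the standard publish pattern, sketched here with C11 atomics. The names `publish` and `consume` are invented for the example; a real use would involve two threads:

```c
#include <stdatomic.h>

// Publish pattern: the release store makes `data` visible to any thread
// whose acquire load observes ready != 0.
static int data;
static atomic_int ready;  // zero-initialized

static void
publish(int value)
{
    data = value;                                            // plain store
    atomic_store_explicit(&ready, 1, memory_order_release);  // publish
}

static int
consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        return data;  // guaranteed to see the published value
    }
    return -1;  // nothing published yet
}
```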



// Sequential consistency fence
Member:
I would prefer more elaborate documentation; "fence" alone is kind of weak.

__faststorefence doc:

Guarantees that every previous memory reference, including both load and store memory references, is globally visible before any subsequent memory reference.

_ReadWriteBarrier doc:

Limits the compiler optimizations that can reorder memory accesses across the point of the call.

dmb ish:

The data memory barrier ensures that all preceding writes are issued before any subsequent memory operations (including speculative memory access).

Contributor Author:
The challenge I have is that it's really hard to describe what fences do in a way that's helpful and accurate. The above documentation is too strong for C11 fences.

There's https://en.cppreference.com/w/c/atomic/atomic_thread_fence, but I find it vague. And the C++ documentation (https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence) is more detailed but really hard to understand.

Member:
Please add one or two of these links here. It's ok to have references to external doc, it's better than no doc :-)

static inline void
_Py_atomic_fence_seq_cst(void);

// Release fence
static inline void
_Py_atomic_fence_release(void);
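A minimal sketch of one fence idiom, pairing relaxed stores with a C11 seq_cst fence as described at the cppreference atomic_thread_fence page cited in the discussion above; `fenced_publish` and the variables are names invented for this illustration:

```c
#include <stdatomic.h>

// Relaxed stores separated by a seq_cst fence (the C11 operation
// analogous to _Py_atomic_fence_seq_cst): the fence orders the two
// stores with respect to other threads' similarly fenced accesses.
static atomic_int payload;
static atomic_int flag;

static void
fenced_publish(int value)
{
    atomic_store_explicit(&payload, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}
```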


#ifndef _Py_USE_GCC_BUILTIN_ATOMICS
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
#define _Py_USE_GCC_BUILTIN_ATOMICS 1
#elif defined(__clang__)
#if __has_builtin(__atomic_load)
#define _Py_USE_GCC_BUILTIN_ATOMICS 1
#endif
#endif
#endif

#if _Py_USE_GCC_BUILTIN_ATOMICS
#define Py_ATOMIC_GCC_H
#include "cpython/pyatomic_gcc.h"
#elif __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_ATOMICS__)
#define Py_ATOMIC_STD_H
#include "cpython/pyatomic_std.h"
#elif defined(_MSC_VER)
#define Py_ATOMIC_MSC_H
#include "cpython/pyatomic_msc.h"
#else
#error "define pyatomic for this platform"
#endif

#endif /* Py_ATOMIC_H */
