groupby_next data race on free-threaded builds #150791

@KowalskiThomas

Bug report

Bug description:

Two threads calling next() on the same groupby object concurrently on a free-threaded build race on the currgrouper field. This corrupts the iterator's internal state and produces AttributeError on slot accesses of live objects.

groupby_next has no Py_BEGIN_CRITICAL_SECTION guard. The first thing it does is write gbo->currgrouper = NULL:

static PyObject *
groupby_next(PyObject *op)
{
    PyObject *grouper;
    groupbyobject *gbo = groupbyobject_CAST(op);
    gbo->currgrouper = NULL;
    /* skip to next iteration group */
    for (;;) {
        if (gbo->currkey == NULL)

Later, after the loop exits, it calls _grouper_create, which writes parent->currgrouper = igo:

static PyObject *
_grouper_create(groupbyobject *parent, PyObject *tgtkey)
{
    itertools_state *state = parent->state;
    _grouperobject *igo = PyObject_GC_New(_grouperobject, state->_grouper_type);
    if (igo == NULL)
        return NULL;
    igo->parent = Py_NewRef(parent);
    igo->tgtkey = Py_NewRef(tgtkey);
    parent->currgrouper = igo;  /* borrowed reference */
    PyObject_GC_Track(igo);
    return (PyObject *)igo;
}

If Thread A has just executed line 633 of itertoolsmodule.c (the parent->currgrouper = igo store in _grouper_create) while Thread B is at line 537 (the gbo->currgrouper = NULL store at the top of groupby_next), Thread B overwrites the pointer Thread A just wrote. Both are plain pointer stores with no synchronisation.
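To make the interleaving concrete, here is a deterministic toy model of the two stores in Python. It does not touch the real groupby object: FakeGroupby, step_clear, step_create and simulate_lost_store are hypothetical names invented for this sketch, and the "threads" are just a fixed ordering of calls.

```python
class FakeGroupby:
    """Toy stand-in for groupbyobject; models only the currgrouper field."""

    def __init__(self):
        self.currgrouper = None


def step_clear(gbo):
    # Models `gbo->currgrouper = NULL` at the top of groupby_next.
    gbo.currgrouper = None


def step_create(gbo, owner):
    # Models `parent->currgrouper = igo` in _grouper_create.
    grouper = f"grouper-{owner}"
    gbo.currgrouper = grouper
    return grouper


def simulate_lost_store():
    gbo = FakeGroupby()
    step_clear(gbo)                      # Thread A enters groupby_next
    a_grouper = step_create(gbo, "A")    # Thread A stores its new grouper
    seen_after_a = gbo.currgrouper       # what A believes the field holds
    step_clear(gbo)                      # Thread B enters groupby_next
    return a_grouper, seen_after_a, gbo.currgrouper


a_grouper, seen_after_a, final = simulate_lost_store()
print(a_grouper, seen_after_a, final)
```

After the last step, Thread A still holds a grouper it believes is current, but the parent's field no longer points at it.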


The following script reproduces the condition under which this happens.

import itertools, threading

class K:
    __slots__ = ("v",)
    def __init__(self, v): self.v = v
    def __eq__(self, o): return isinstance(o, K) and self.v == o.v
    def __hash__(self): return hash(self.v)

def consume(g):
    try:
        while True:
            _, _ = next(g)
    except StopIteration:
        pass

keys = [K(i) for i in range(500_000)]
g = itertools.groupby(keys)
threads = [threading.Thread(target=consume, args=(g,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
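Until groupby_next is synchronised internally, callers that must share one groupby iterator across threads could serialise access themselves. The following is a minimal sketch of such a user-level mitigation, not the fix for this issue; LockedIterator and count_groups are hypothetical names. It only protects next() on the outer iterator: the grouper objects it returns still share state with the parent, so iterating them outside the lock would remain unsynchronised.

```python
import itertools
import threading


class LockedIterator:
    """Serialises next() calls on a shared iterator with a lock."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration propagates out of the `with` block as usual.
        with self._lock:
            return next(self._it)


def count_groups(n_keys=2000, n_threads=8):
    # Every key is distinct, so groupby yields exactly n_keys groups.
    shared = LockedIterator(itertools.groupby(range(n_keys)))
    counts = [0] * n_threads

    def consume(idx):
        for _key, _group in shared:
            counts[idx] += 1  # each thread writes only its own slot

    threads = [threading.Thread(target=consume, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)


print(count_groups())
```

With the lock in place, the group counts across all threads should add up to the number of distinct keys.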

On a free-threaded build, this results in the following Python logs...

Exception in Thread-15 (consume):
  File "<python-input-2>", line 6, in __eq__
    def __eq__(self, o): return isinstance(o, K) and self.v == o.v
                                                     ^^^^^^
AttributeError: 'K' object has no attribute 'v'

... and the following TSan logs.

WARNING: ThreadSanitizer: data race (pid=20464)
  Write of size 8 at 0x0003026e4a88 by thread T2:
    #0 groupby_next itertoolsmodule.c:537 (python.exe:arm64+0x10042a8b8)
    #1 builtin_next bltinmodule.c:1770

  Previous write of size 8 at 0x0003026e4a88 by thread T1:
    #0 _grouper_create itertoolsmodule.c:633 (python.exe:arm64+0x10042ac5c)
    #1 groupby_next itertoolsmodule.c:570 (python.exe:arm64+0x10042ac5c)
    #2 builtin_next bltinmodule.c:1770
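The output above is specific to a free-threaded build. As a quick sanity check before running the reproducer, the sketch below reports whether the interpreter was built with free-threading and whether the GIL is currently enabled. It assumes CPython 3.13 or later for sys._is_gil_enabled() and falls back to "GIL enabled" when that function is missing; the helper names are hypothetical.

```python
import sys
import sysconfig


def is_free_threaded_build():
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_currently_enabled():
    # sys._is_gil_enabled() exists on CPython 3.13+; older versions always have a GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else bool(probe())


print("free-threaded build:", is_free_threaded_build())
print("GIL enabled at runtime:", gil_currently_enabled())
```

The race described here is only expected when the first value is True and the second is False.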

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS
