Bug report
Bug description:
Two threads calling next on the same groupby object concurrently in a free-threaded build race on the currgrouper field, corrupting the iterator's internal state and producing AttributeError on slot accesses of live objects.
groupby_next has no Py_BEGIN_CRITICAL_SECTION guard. The first thing it does is write gbo->currgrouper = NULL:
    static PyObject *
    groupby_next(PyObject *op)
    {
        PyObject *grouper;
        groupbyobject *gbo = groupbyobject_CAST(op);

        gbo->currgrouper = NULL;
        /* skip to next iteration group */
        for (;;) {
            if (gbo->currkey == NULL)
Later, after the loop exits, it calls _grouper_create, which writes parent->currgrouper = igo:
    static PyObject *
    _grouper_create(groupbyobject *parent, PyObject *tgtkey)
    {
        itertools_state *state = parent->state;
        _grouperobject *igo = PyObject_GC_New(_grouperobject, state->_grouper_type);
        if (igo == NULL)
            return NULL;
        igo->parent = Py_NewRef(parent);
        igo->tgtkey = Py_NewRef(tgtkey);
        parent->currgrouper = igo;  /* borrowed reference */

        PyObject_GC_Track(igo);
        return (PyObject *)igo;
    }
If Thread A is past line 633 (just stored the new grouper pointer) while Thread B is at line 537 (about to store NULL), Thread B overwrites the pointer Thread A just wrote. Both are plain pointer stores with no synchronisation.
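The interleaving described above can be replayed deterministically in plain Python. This is only a rough sketch: `FakeGroupby`, `clear_currgrouper`, `create_grouper` and `simulate_interleaving` are hypothetical names that do not exist in CPython, and the sketch models just the lost update on `currgrouper`, not the downstream state corruption.

```python
class FakeGroupby:
    """Hypothetical Python stand-in for the C groupbyobject (assumption)."""

    def __init__(self):
        self.currgrouper = None


def clear_currgrouper(gbo):
    # Models the store at the top of groupby_next: gbo->currgrouper = NULL
    gbo.currgrouper = None


def create_grouper(gbo):
    # Models _grouper_create: parent->currgrouper = igo
    igo = object()
    gbo.currgrouper = igo
    return igo


def simulate_interleaving():
    gbo = FakeGroupby()
    clear_currgrouper(gbo)       # Thread A enters groupby_next
    igo_a = create_grouper(gbo)  # Thread A stores its new grouper (line 633)
    clear_currgrouper(gbo)       # Thread B enters groupby_next (line 537)
    return igo_a, gbo.currgrouper


igo_a, current = simulate_interleaving()
# Thread A still holds its grouper, but the parent no longer points at it.
print(igo_a is not None, current is None)  # → True True
```

In the real code the two stores are not ordered by anything, so this is only one of the possible outcomes.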
The following script reproduces the condition under which this happens.
    import itertools, threading

    class K:
        __slots__ = ("v",)
        def __init__(self, v): self.v = v
        def __eq__(self, o): return isinstance(o, K) and self.v == o.v
        def __hash__(self): return hash(self.v)

    def consume(g):
        try:
            while True:
                _, _ = next(g)
        except StopIteration:
            pass

    keys = [K(i) for i in range(500_000)]
    g = itertools.groupby(keys)
    threads = [threading.Thread(target=consume, args=(g,)) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
On a free-threaded build, this results in the following Python logs...
    Exception in Thread-15 (consume):
      File "<python-input-2>", line 6, in __eq__
        def __eq__(self, o): return isinstance(o, K) and self.v == o.v
                                                         ^^^^^^
    AttributeError: 'K' object has no attribute 'v'
... and the following TSan logs.
    WARNING: ThreadSanitizer: data race (pid=20464)
      Write of size 8 at 0x0003026e4a88 by thread T2:
        #0 groupby_next itertoolsmodule.c:537 (python.exe:arm64+0x10042a8b8)
        #1 builtin_next bltinmodule.c:1770

      Previous write of size 8 at 0x0003026e4a88 by thread T1:
        #0 _grouper_create itertoolsmodule.c:633 (python.exe:arm64+0x10042ac5c)
        #1 groupby_next itertoolsmodule.c:570 (python.exe:arm64+0x10042ac5c)
        #2 builtin_next bltinmodule.c:1770
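As long as groupby_next itself has no guard, callers that share one groupby object between threads can serialise access themselves. The following is a minimal sketch of that idea, not a proposed fix for CPython: `LockedGroupby` is a hypothetical wrapper that is not part of itertools. It materialises each group while holding the lock, on the assumption that the lazy grouper shares state with its parent and so must not be consumed concurrently either.

```python
import itertools
import threading


class LockedGroupby:
    """Hypothetical wrapper (not part of itertools) that serialises next()."""

    def __init__(self, iterable, key=None):
        self._it = itertools.groupby(iterable, key)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            k, group = next(self._it)
            # Consume the lazy group before releasing the lock, so no other
            # thread advances the underlying groupby object in the meantime.
            return k, list(group)


def consume(g, out):
    for k, items in g:
        out.append((k, len(items)))


g = LockedGroupby([1, 1, 2, 3, 3, 3])
results = []
threads = [threading.Thread(target=consume, args=(g, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # → [(1, 2), (2, 1), (3, 3)]
```

The trade-off is that groups are no longer lazy: each one is copied into a list, which costs memory for large groups.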
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS
Source references:
cpython/Modules/itertoolsmodule.c, lines 531 to 540 in c5516e7 (groupby_next)
cpython/Modules/itertoolsmodule.c, lines 624 to 637 in c5516e7 (_grouper_create)
Linked PRs
groupby.next#150792