Presize the list for batched()
rhettinger committed Oct 18, 2022
commit ea9511e781dea7b25d8929332534afefdb6dadcd
Modules/itertoolsmodule.c — 12 changes: 7 additions & 5 deletions

@@ -149,7 +149,7 @@ batched_next(batchedobject *bo)
     if (it == NULL) {
         return NULL;
     }
-    result = PyList_New(0);
+    result = PyList_New(bo->batch_size);
     if (result == NULL) {
         return NULL;
     }
@@ -158,12 +158,14 @@ batched_next(batchedobject *bo)
if (item == NULL) {
break;
Member commented:
Do we care how this handles strange iterators that keep going after having raised StopIteration?

>>> from itertools import batched
>>> class A:
...     def __init__(self):
...         self.i = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         i = self.i
...         self.i += 1
...         if i == 7:
...             raise StopIteration()
...         return i
...
>>> x = batched(A(), 5)
>>> next(x)
[0, 1, 2, 3, 4]
>>> next(x)
[5, 6]
>>> next(x)
[8, 9, 10, 11, 12]

Maybe the Py_CLEAR(bo->it) could be moved to before this break;.

@rhettinger (Contributor, Author) commented Oct 20, 2022:
I'm thinking of leaving this as is. The current PR matches the behavior of the pure Python version, and the strange iterator works the same way with other tools:

>>> it = zip(range(10), A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> list(it)
[(8, 8), (9, 9)]

>>> it = enumerate(A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> next(it)
(7, 8)
>>> next(it)
(8, 9)

That said, I'm not really sure about this one and could easily be persuaded to add a Py_CLEAR(bo->it) to the code path for the short list.
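For context, the pure Python version being matched is essentially the batched() recipe from the itertools documentation; the sketch below is a close approximation and may differ from the actual recipe in minor details:

```python
from itertools import islice

def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    # islice calls next() afresh on each pass, so an iterator that
    # "resumes" after raising StopIteration will produce more batches.
    while (batch := list(islice(it, n))):
        yield batch
```

Because each pass simply calls next() on the same underlying iterator again, this version also resumes a strange iterator after StopIteration, which is the behavior the C code currently reproduces.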

@sweeneyde (Member) commented Oct 20, 2022:

This does seem like an edge case that could go either way.

I'm noticing that some iterators do Py_CLEAR(obj->it) as soon as the next() call fails, while other objects have internal iterators that are never NULL. My concern is that batched() clears the iterator sometimes (when the first next() of a batch fails), but not other times (when a subsequent next() fails).

So I would think to do one or the other, but not both:

  1. Clear the iterator immediately after any internal next() call fails, similar to pairwise, islice, chain, and cycle (sort of).

  2. Never clear the iterator and remove the if (it == NULL) check altogether, similar to dropwhile, takewhile, accumulate, compress, zip, and enumerate. Unfortunately, that would mean allocating and immediately discarding a PyList_New(n) list, but only in the rare case that someone is abusing the iterator.

Maybe removing the Py_CLEAR and the NULL check is most natural?
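To make the two options concrete, here is a pure-Python model of each policy. This is a sketch for illustration only; the function names batched_clearing and batched_nonclearing are invented, and neither is the actual C implementation:

```python
class Strange:
    """Iterator that raises StopIteration once at i == 7, then keeps yielding."""
    def __init__(self):
        self.i = 0
    def __iter__(self):
        return self
    def __next__(self):
        i = self.i
        self.i += 1
        if i == 7:
            raise StopIteration
        return i

def batched_clearing(iterable, n):
    """Option 1: permanently drop the source iterator after the first
    failed next(), modeling a Py_CLEAR on every exhaustion path."""
    it = iter(iterable)
    while it is not None:
        batch = []
        for _ in range(n):
            try:
                batch.append(next(it))
            except StopIteration:
                it = None          # never consult the source again
                break
        if batch:
            yield batch

def batched_nonclearing(iterable, n):
    """Option 2: never clear; always call next() again on the next batch,
    so a strange iterator can resume after StopIteration."""
    it = iter(iterable)
    while True:
        batch = []
        for _ in range(n):
            try:
                batch.append(next(it))
            except StopIteration:
                break
        if not batch:
            return
        yield batch
```

With Strange(), option 1 yields [0, 1, 2, 3, 4] and [5, 6] and then stops for good, while option 2 resumes with [8, 9, 10, 11, 12], matching the transcript above.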

         }
-        if (PyList_Append(result, item) < 0) {
-            Py_DECREF(item);
-            Py_DECREF(result);
+        PyList_SET_ITEM(result, i, item);
+    }
+    if (i < bo->batch_size) {
+        PyObject *short_list = PyList_GetSlice(result, 0, i);
+        Py_SETREF(result, short_list);
+        if (result == NULL) {
             return NULL;
         }
-        Py_DECREF(item);
     }
     if (PyList_GET_SIZE(result) > 0) {
         return result;