Skip to content

Remove unnecessary reversed() from random.shuffle().#25698

Closed
rhettinger wants to merge 1 commit into
python:masterfrom
rhettinger:shuffle_direction
Closed

Remove unnecessary reversed() from random.shuffle().#25698
rhettinger wants to merge 1 commit into
python:masterfrom
rhettinger:shuffle_direction

Conversation

@rhettinger
Copy link
Copy Markdown
Contributor

Originally, I was going to just add a comment, but then thought we might as well remove the unnecessary code.

# The use of reversed() in the code below isn't necessary.
# The algorithm works equally well when running forward.
# We run in reverse only because "we've always done it this way".

In CPython, a reversed range runs as fast a forward range:

$ python3.10 -m timeit 'sum(range(10000))'
1000 loops, best of 5: 199 usec per loop
$ python3.10 -m timeit 'sum(reversed(range(10000)))'
2000 loops, best of 5: 199 usec per loop 

With PyPy, reversed() does add some overhead:

$ pypy3 -m timeit 'sum(range(10000))'
50000 loops, average of 7: 9.66 +- 0.0393 usec per loop
$ pypy3 -m timeit 'sum(reversed(range(10000)))'
10000 loops, average of 7: 25.9 +- 0.321 usec per loop

Copy link
Copy Markdown
Member

@tim-one tim-one left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK by me either way, but there are points on both sides. It's not just us - virtually everyone runs the loop from highest to lowest, presumably because that makes correctness easier to see. Python is unique in that it requires "more words" to go in the decreasing direction.

More troublesome to me: this can change shuffle results for people who force the seed. It's a related algorithm, not the same algorithm. For that reason I'm -0 overall - "marginal potential speedup under PyPy, but potentially changes results under CPython" isn't an attractive tradeoff to me.

@rhettinger
Copy link
Copy Markdown
Contributor Author

rhettinger commented Apr 28, 2021

For me, correctness seems equally easy to see. The loop invariant is that x[:i] is shuffled. A new element is mixed in at every step. It's just an insertion-shuffle vs a selection shuffle. The reverseless code is a bit easier because reasoning about reverse loops is more awkward than for forward loops. I hear what you're saying about changing existing seeded shuffles. That is my only misgiving. Thanks for looking at this.

@tim-one
Copy link
Copy Markdown
Member

tim-one commented Apr 29, 2021

Yes, both ways fold in a new item on each iteration - but only the backwards way also removes an item from consideration on each iteration. In the standard way, after the first iteration each item has an equal chance of being swapped into the final position, and that's obvious, and it's thereafter impossible for the final position to change value again. That's what makes it very obvious - you can forget about it, it's done. Each iteration determines the final value at one position.

Go forward instead, and no position is eliminated as iterations go on. Even on the last iteration, you still can't know coming in what the final value of any position will be - you can only state an invariant involving all positions. Try writing formal proofs, and I bet you too will find that harder to account for.

@iritkatriel
Copy link
Copy Markdown
Member

Would it be faster to replace reversed(range(1, len(x))) by range(len(x)-1, 0, -1)?

@tim-one
Copy link
Copy Markdown
Member

tim-one commented Apr 29, 2021

Would it be faster to replace reversed(range(1, len(x))) by range(len(x)-1, 0, -1)?

Under PyPy, almost certainly. But I happen to know that Raymond has an aversion to range() with a negative step so strong that it wasn't worth mentioning 😉.

It so happens I prefer the reversed() spelling too, but then "speed" means nothing to me here - the cost of iteration in this context is bound to be insignificant compared to the cost of calling _randbelow() on each iteration, even under PyPy.

@rhettinger rhettinger closed this Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants