bpo-46258: Streamline isqrt fast path #30333

mdickinson · 2022-01-02T11:24:40Z

This PR makes some minor simplifications to the implementation of math.isqrt. Those simplifications improve the speed of math.isqrt for inputs smaller than 2**64 by around 20% on my machine.

In detail:

- In _approximate_sqrt, use a lookup table based on the topmost 8 bits to replace the first two Newton iterations. This reduces the total number of divisions needed from 4 to 2.
- Change the return type of _approximate_sqrt from uint64_t to uint32_t.
- Replace the u * u - 1U >= m test with the simpler but equivalent test u * u > m.
- Add casts to make it clear to the compiler that a 32-bit-by-32-bit division is enough for the first of the two divisions in _approximate_sqrt (though I'd expect most compilers not to need this).
- Suggest to the compiler that _approximate_sqrt be inlined.
- Simplify the shift computation.
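For readers who want to experiment with the idea, here is a rough Python model of the fast path described in the list above. It is a sketch under assumptions, not the code in Modules/mathmodule.c: the names ISQRT_TAB, approximate_isqrt and isqrt_small are made up for illustration, and the table is generated here by a simple rule (largest k with k*k <= 256*i + 255 for top byte i), which may differ from the table the PR actually ships. The masks stand in for the C uint32_t casts.

```python
MASK32 = (1 << 32) - 1

# Hypothetical 192-entry table, one byte per top byte 64..255 of the
# normalized input. Entry i approximates sqrt(n >> 48). Generated here
# for illustration; not copied from mathmodule.c.
ISQRT_TAB = bytes(
    max(k for k in range(256) if k * k <= (i << 8) + 255)
    for i in range(64, 256)
)


def approximate_isqrt(n):
    """Approximate sqrt(n) to within 1, assuming 2**62 <= n < 2**64."""
    u = ISQRT_TAB[(n >> 56) - 64]                # replaces the early Newton steps
    u = (u << 7) + ((n >> 41) & MASK32) // u     # Newton step, 32-bit operands
    return ((u << 15) + (n >> 17) // u) & MASK32  # Newton step, 64-bit dividend


def isqrt_small(m):
    """floor(sqrt(m)) for 0 <= m < 2**64, modelling the fast path."""
    if m == 0:
        return 0
    # Shift m left by an even number of bits so that it lands in [2**62, 2**64).
    shift = 31 - (m.bit_length() - 1) // 2
    u = approximate_isqrt(m << (2 * shift)) >> shift
    # u is either the floor square root or one more than it.
    return u - (u * u > m)
```

Only two divisions remain in approximate_isqrt, matching the count given above. The final correction relies on the approximation being within 1 of the true square root.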

On my Intel x64 machine, the assembly produced by Clang involves one divl instruction and one divq instruction.

There's one subtle but important change here: in the previous implementation of _approximate_sqrt, it was possible for the returned value to be exactly 2**32 (but no larger), so we couldn't assume that it would fit in a uint32_t. With the new code, the returned value is always < 2**32. It's this fact that makes it possible to change the return type, and to replace the u*u - 1 >= m test with u*u > m.

To prove this bound, ignore overflow for the moment. Since the result of _approximate_sqrt is always within 1 of the true square root (following the proof already outlined in the comments), the only way that the result of the final addition can be 2**32 (overflowing to 0) is if the input n exceeds (2**32 - 1)**2. But in that case we know most of the top bits of n: the top 32 bits are either 0xfffffffe or 0xffffffff, so we can trace through the first steps of _approximate_sqrt. The value of u retrieved from the lookup table is 255, and the value assigned to u in the following line is 65536. With u = 65536, it's easy to see that (n >> 17) / u < 2**31 and u << 15 = 2**31, so the result of the addition fits in a uint32_t.
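The argument above can be checked numerically with a small Python model. This is only an illustration: old_test, new_test and final_steps are hypothetical helper names, the 64-bit mask imitates uint64_t wraparound, and the table value 255 is taken from the description above rather than from the source file.

```python
MASK64 = (1 << 64) - 1


def old_test(u, m):
    """Model of `u * u - 1U >= m` with uint64_t wraparound."""
    return ((u * u - 1) & MASK64) >= m


def new_test(u, m):
    """Model of `u * u > m` with uint64_t wraparound."""
    return ((u * u) & MASK64) > m


def final_steps(n, table_value=255):
    """Trace the two remaining Newton steps for n close to 2**64.

    Returns the intermediate u and the final sum, computed without
    any truncation so that an overflow past 32 bits would be visible.
    """
    u = (table_value << 7) + (n >> 41) // table_value
    return u, (u << 15) + (n >> 17) // u


# For 1 <= u < 2**32 the two tests agree, because u * u cannot wrap.
assert all(
    old_test(u, m) == new_test(u, m)
    for u in (1, 2, 65535, 65536, 2**32 - 1)
    for m in (1, 4, 2**32, 2**63, 2**64 - 1)
)

# If u could be exactly 2**32, u * u would wrap to 0 and only the old
# form would still ask for the decrement.
assert old_test(2**32, 2**64 - 1) and not new_test(2**32, 2**64 - 1)

# Inputs above (2**32 - 1)**2 give u == 65536 and a sum below 2**32.
for n in ((2**32 - 1)**2 + 1, 2**64 - 1):
    u, result = final_steps(n)
    assert u == 65536 and result < 2**32
```

The second assertion is presumably why the earlier code used the u * u - 1 form: it gave the right answer even when the approximation was 2**32.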

https://bugs.python.org/issue46258

Modules/mathmodule.c

mdickinson · 2022-01-04T20:40:14Z

Turning this into a draft: I think we can go one better and remove another division, at the expense of expanding the lookup table from 12 bytes to 192 bytes. I may not have time to make those changes before this weekend, though.

mdickinson · 2022-01-05T17:36:10Z

> I think we can go one better and remove another division, at the expense of expanding the lookup table from 12 bytes to 192 bytes.

Done. The speedup on my machine for a random selection of 64-bit integers is now over 20%.
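A timing of this kind can be reproduced roughly as follows. This is a minimal sketch, not the benchmark used for the figure above: the function name time_isqrt, the sample size, the repeat counts and the seed are all arbitrary choices made for illustration. Run it against builds with and without the change and compare the results.

```python
import math
import random
import timeit


def time_isqrt(count=10_000, repeat=5, number=10, seed=12345):
    """Return the best observed time per math.isqrt call, in seconds,
    over a fixed random selection of integers below 2**64."""
    rng = random.Random(seed)
    values = [rng.getrandbits(64) for _ in range(count)]

    def run():
        isqrt = math.isqrt  # local alias keeps lookup cost out of the loop
        for v in values:
            isqrt(v)

    best = min(timeit.repeat(run, number=number, repeat=repeat))
    return best / (number * count)


if __name__ == "__main__":
    print(f"{time_isqrt() * 1e9:.1f} ns per call")
```

The per-call figure includes loop overhead, so the relative speedup of isqrt itself will be somewhat larger than the ratio of two such measurements suggests.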

mdickinson · 2022-01-15T09:57:57Z

Merging after self-review.

bedevere-bot · 2022-01-15T09:58:08Z

@mdickinson: Please replace # with GH- in the commit message next time. Thanks!

Streamline isqrt fast path

bb22e63

mdickinson added the skip issue and skip news labels Jan 2, 2022

the-knights-who-say-ni added the CLA signed label Jan 2, 2022

bedevere-bot added the awaiting core review label Jan 2, 2022

mdickinson changed the title ~~Streamline isqrt fast path~~ bpo-46258: Streamline isqrt fast path Jan 4, 2022

mdickinson removed the skip issue label Jan 4, 2022

mdickinson commented on Modules/mathmodule.c Jan 4, 2022 (outdated, resolved)

Add news entry

cbb08eb

mdickinson removed the skip news label Jan 4, 2022

mdickinson marked this pull request as draft January 4, 2022 20:39

mdickinson added 2 commits January 5, 2022 17:11

Save another Newton step by further expanding the lookup table

1f1a0c7

Update news entry

ac99031

mdickinson marked this pull request as ready for review January 5, 2022 17:36

mdickinson added 2 commits January 5, 2022 17:40

Silence a compiler warning about possible loss of data

dbbcd7d

More direct comment

ecd01b1

mdickinson merged commit d02c5e9 into python:main Jan 15, 2022

bedevere-bot removed the awaiting core review label Jan 15, 2022

mdickinson deleted the streamline-isqrt-fast-path branch January 15, 2022 09:58
