rp2: Build with -fno-math-errno to use the hardware sqrt instruction. #19356
Gadgetoid wants to merge 1 commit into
Without -fno-math-errno the compiler must keep sqrt()/sqrtf() as library calls so they can set errno on a domain error, even though the Cortex-M33 FPU has a single VSQRT.F32 instruction.
MicroPython's math module detects domain errors via isnan/isinf checks on the result rather than errno, so disabling errno here is safe and lets sqrt leverage hardware.
Benchmarked on a Pico 2 (RP2350, perfbench N=150 M=100, avg of 3): misc_mandel (complex abs() in its inner loop) improves by ~12%, with no measurable change to non-sqrt benchmarks and a slightly smaller binary.
Signed-off-by: Phil Howard <github@gadgetoid.com>
Code size report:
Is this only relevant for RP2350, and possibly only ARM mode? If so is it worth wrapping this in
Or, does it generate faster code for all archs?
I would not expect any change to RP2040 or RP2350 RISC-V, but I'll run the tests just in case. (God knows I need to do more of that.) Edit: No change on RISC-V, but it did net an 8 byte size saving. I would suggest it's correct to include it in RISC-V builds, but more to let the compiler make informed choices than for any specific size/perf improvement. The 8 bytes was the net result of both a gain and a loss, so much of a muchness really:
The RP2040 has similar results: the layout in RAM changes and some things perform better while others perform worse. There is no clear-cut performance advantage, and any results would be extremely brittle depending on how the code lands in cache.
Without -fno-math-errno the compiler must keep sqrt()/sqrtf() as library calls so they can set errno on a domain error, even though the Cortex-M33 FPU has a single VSQRT.F32 instruction.
MicroPython's math module detects domain errors via isnan/isinf checks on the result rather than errno, so disabling errno here is safe and lets sqrt leverage hardware.
Summary
Make RP2 go brr. Seriously, when you start to write apps and games with vector graphics and all sorts of shiny stuff, you want to eke out every last iota of performance you can!
Testing
Benchmarked on a Pico 2 (RP2350, perfbench N=150 M=100, avg of 3): misc_mandel (complex abs() in its inner loop) improves by ~12+%, with no measurable change to non-sqrt benchmarks and a slightly smaller binary.
Ran the test suite:
937 tests performed (26410 individual testcases)
937 tests passed
122 tests skipped:
Trade-offs and Alternatives
None, as far as I'm aware. This is a win/win.
Generative AI
I used generative AI tools when creating this PR, but a human has checked the
code and is responsible for the code and the description above.