rp2: Revert newlib nano on RP2350, move nano libc to RAM on RP2040.#19352
rp2: Revert newlib nano on RP2350, move nano libc to RAM on RP2040.#19352projectgus wants to merge 3 commits into
Conversation
These are thin wrappers around the ROM functions for memcpy and memset, just a few bytes - this way avoids a cache miss when calling them. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
Fixes performance regression on RP2350 when switching to nano.specs in 6552836. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
As these are the "nano" versions the impact is relatively small. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
|
Code size report: |
I don't understand why code size difference hasn't picked up any increase of static RAM use here. |
Ah OK,something else weird is going on here. Here's ... and as built in latest master branch commit: Somehow this PR is using 3KB more RAM, but if I build these here then the difference is +700 bytes of RAM. Need to investigate more, probably this is a newlib version thing. |
Octoprobe PR report
FailuresGroup: run-tests.py --test-dirs=extmod_hardware
Group: run-tests.py --via-mpy --emit native
Group: run-tests.py --via-mpy
|
| if(PICO_RP2040) | ||
| # Enable nano.specs for RP2040 only. | ||
| # | ||
| # Pico-sdk already enables nosys.specs to stub out syscall handlers, |
There was a problem hiding this comment.
I just saw this related pico-sdk PR: raspberrypi/pico-sdk#3014
That will be in the next pico-sdk release. Not sure if it means anything for us?
Oh wow! These have been there since the beginning and I'm certain I bench marked the effect of putting this in RAM... would be worth revisiting this to test performance of them actually being in RAM. |
Summary
This is a follow-up to fix some performance regressions from #19299:
In this PR:
Testing
Using perfbench, regarding any variation <4% as within the margin of error for cache effects caused by different binaries.
RP2040
Comparing pre-nano MicroPython commit to current master:
Most of these changes are within the margin of error, but notable ones include
core_qstr.pyandcore_str.pybecoming slower.Comparing pre-nano to this PR:
These results are all relatively noisy and it's hard to draw clear conclusions, but overall the second set of changes look to have less significant regression to me. The -6% on
core_import_mpy_single.pyis odd, but this didn't appear in an earlier version of this PR so it's probably noise due to cache layout.(EDIT: Previous version of this analysis I read something backwards!)
However at least we can say there's no obvious regression, and we still have smaller binary size & RAM usage compared to pre-nano. (339608 flash & 12692 RAM pre-nano, 338720 & 12388 with this PR.)
RP2350
Pre-nano vs current master:
😬 Not great, I should have checked this before merging 19929!
Pre-nano versus this PR:
Would expect these results to be basically the same as pre-nano, so I think the main cause of changes here is noise...
Trade-offs and Alternatives
shared/libc/string0.cinstead. This has a version of memcpy that uses full word operations, for example. Initial testing on RP2350 this showed quite small size, and performance mid-way between "nano" libc and the default newlib functions.*gc.c.obj *vm.c.obj *parse.c.objto RAM, but these don't match anything as the pico-sdk setsCMAKE_C_OUTPUT_EXTENSIONto.o. So we should either remove these, or experiment with the RAM/Performance trade-off of putting these parts of MicroPython into RAM.Generative AI
I did not use generative AI tools when creating this PR.