rp2: Revert newlib nano on RP2350, move nano libc to RAM on RP2040. by projectgus · Pull Request #19352 · micropython/micropython

projectgus · 2026-06-18T04:27:26Z

Summary

This is a follow-up to fix some performance regressions from #19299:

Some nano libc functions weren't being picked up by the linker script to link into RAM, causing more flash cache misses. (This probably explains the +720 bytes free .bss in PR 19299!)
RP2350 performance was significantly impacted by switching from the standard libc memcpy & memset, which unroll the loop, to the nano versions which only do byte by byte copies. Thanks @kilograham for bringing this to our attention.
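To give a rough feel for why a byte-by-byte copy loop costs more than one that moves several bytes per iteration, here is a minimal sketch that only counts loop iterations. It is a toy model, not newlib's implementation: the function names and the 4-byte word size are assumptions for illustration, and it ignores alignment handling and real loop unrolling.

```python
def copy_bytewise(dst, src):
    """Copy one byte per loop iteration; return the iteration count."""
    steps = 0
    for i in range(len(src)):
        dst[i] = src[i]
        steps += 1
    return steps


def copy_wordwise(dst, src, word=4):
    """Copy `word` bytes per iteration, then finish the tail bytewise."""
    n = len(src)
    i = 0
    steps = 0
    while n - i >= word:
        dst[i:i + word] = src[i:i + word]  # one "word" move (assumed 4 bytes)
        i += word
        steps += 1
    while i < n:  # leftover bytes that do not fill a whole word
        dst[i] = src[i]
        i += 1
        steps += 1
    return steps


src = bytes(range(67))
a, b = bytearray(len(src)), bytearray(len(src))
print(copy_bytewise(a, src), copy_wordwise(b, src))
```

Both functions produce identical output buffers; only the number of loop iterations differs, which is the effect the nano versus standard comparison is about.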

In this PR:

Only enable nano.specs for RP2040.
On RP2040, link all libc string functions, mem functions, and the pico_mem_ops functions to RAM. The RAM cost is relatively small because these are the nano libc functions, and the "pico_mem_ops" functions are very small shim wrappers around calls to ROM memcpy and memset. Most of these functions were not in RAM in earlier MicroPython versions.

Testing

Tested using perfbench, treating any variation under 4% as within the margin of error for cache effects caused by different binaries.

RP2040

Comparing pre-nano MicroPython commit to current master:

diff of scores (higher is better)
N=168 M=100                rp2040_pre_nano.txt -> rp2040_master.txt         diff      diff% (error%)
bm_chaos.py                    154.11 ->     152.00 :      -2.11 =  -1.369% (+/-0.09%)
bm_fannkuch.py                  56.47 ->      55.06 :      -1.41 =  -2.497% (+/-0.03%)
bm_fft.py                     1372.81 ->    1439.68 :     +66.87 =  +4.871% (+/-0.02%)
bm_float.py                   1776.33 ->    1772.48 :      -3.85 =  -0.217% (+/-0.09%)
bm_hexiom.py                    23.13 ->      23.23 :      +0.10 =  +0.432% (+/-0.06%)
bm_nqueens.py                 1965.06 ->    2051.67 :     +86.61 =  +4.407% (+/-0.14%)
bm_pidigits.py                 404.07 ->     412.89 :      +8.82 =  +2.183% (+/-0.06%)
bm_wordcount.py                 38.95 ->      34.11 :      -4.84 = -12.426% (+/-0.10%)
core_import_mpy_multi.py       232.11 ->     230.26 :      -1.85 =  -0.797% (+/-0.10%)
core_import_mpy_single.py       44.16 ->      42.96 :      -1.20 =  -2.717% (+/-0.27%)
core_locals.py                  32.57 ->      32.52 :      -0.05 =  -0.154% (+/-0.01%)
core_qstr.py                   125.85 ->     116.52 :      -9.33 =  -7.414% (+/-0.10%)
core_str.py                     17.67 ->      16.61 :      -1.06 =  -5.999% (+/-0.04%)
core_yield_from.py             225.63 ->     225.63 :      +0.00 =  +0.000% (+/-0.01%)
misc_aes.py                    239.24 ->     240.31 :      +1.07 =  +0.447% (+/-0.09%)
misc_mandel.py                1188.01 ->    1244.04 :     +56.03 =  +4.716% (+/-0.05%)
misc_pystone.py               1092.67 ->    1049.90 :     -42.77 =  -3.914% (+/-0.08%)
misc_raytrace.py               164.20 ->     161.21 :      -2.99 =  -1.821% (+/-0.05%)
viper_call0.py                 319.99 ->     319.98 :      -0.01 =  -0.003% (+/-0.00%)
viper_call1a.py                312.78 ->     312.78 :      +0.00 =  +0.000% (+/-0.01%)
viper_call1b.py                235.10 ->     235.10 :      +0.00 =  +0.000% (+/-0.00%)
viper_call1c.py                236.88 ->     236.88 :      +0.00 =  +0.000% (+/-0.00%)
viper_call2a.py                308.16 ->     308.16 :      +0.00 =  +0.000% (+/-0.01%)
viper_call2b.py                206.72 ->     206.72 :      +0.00 =  +0.000% (+/-0.00%)

Most of these changes are within the margin of error, but notable ones include bm_wordcount.py (-12.4%), core_qstr.py (-7.4%) and core_str.py (-6.0%) becoming slower.
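As a sketch of how the 4% margin can be applied mechanically, the following parses rows in the diff format shown in the table above and lists those whose diff% falls outside the margin. The regular expression and the `outside_margin` helper are assumptions written for this illustration; they are not part of run-perfbench.py.

```python
import re

# One result row of the diff output, e.g.
# "bm_fft.py   1372.81 ->  1439.68 :  +66.87 =  +4.871% (+/-0.02%)"
ROW = re.compile(
    r"^(?P<name>\S+\.py)\s+(?P<old>[\d.]+)\s+->\s+(?P<new>[\d.]+)\s+:"
    r"\s+(?P<diff>[+-][\d.]+)\s+=\s+(?P<pct>[+-][\d.]+)%"
)


def outside_margin(lines, margin_pct=4.0):
    """Return (name, diff%) for rows whose |diff%| exceeds margin_pct."""
    flagged = []
    for line in lines:
        m = ROW.match(line.strip())
        if m and abs(float(m["pct"])) > margin_pct:
            flagged.append((m["name"], float(m["pct"])))
    return flagged


# Four rows copied from the RP2040 pre-nano -> master table.
SAMPLE = """\
bm_fft.py                     1372.81 ->    1439.68 :     +66.87 =  +4.871% (+/-0.02%)
bm_wordcount.py                 38.95 ->      34.11 :      -4.84 = -12.426% (+/-0.10%)
core_qstr.py                   125.85 ->     116.52 :      -9.33 =  -7.414% (+/-0.10%)
core_locals.py                  32.57 ->      32.52 :      -0.05 =  -0.154% (+/-0.01%)
""".splitlines()

for name, pct in outside_margin(SAMPLE):
    print(f"{name}: {pct:+.3f}%")
```

Header lines such as `N=168 M=100 ...` do not match the pattern and are skipped, so a whole table can be passed in unchanged.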

Comparing pre-nano to this PR:

diff of scores (higher is better)
N=168 M=100                rp2040_pre_nano.txt -> rp2040_pr_branch.txt         diff      diff% (error%)
bm_chaos.py                    154.11 ->     152.08 :      -2.03 =  -1.317% (+/-0.08%)
bm_fannkuch.py                  56.47 ->      54.81 :      -1.66 =  -2.940% (+/-0.01%)
bm_fft.py                     1372.81 ->    1429.48 :     +56.67 =  +4.128% (+/-0.01%)
bm_float.py                   1776.33 ->    1743.59 :     -32.74 =  -1.843% (+/-0.11%)
bm_hexiom.py                    23.13 ->      22.75 :      -0.38 =  -1.643% (+/-0.05%)
bm_nqueens.py                 1965.06 ->    2125.50 :    +160.44 =  +8.165% (+/-0.05%)
bm_pidigits.py                 404.07 ->     408.43 :      +4.36 =  +1.079% (+/-0.06%)
bm_wordcount.py                 38.95 ->      38.44 :      -0.51 =  -1.309% (+/-0.03%)
core_import_mpy_multi.py       232.11 ->     228.28 :      -3.83 =  -1.650% (+/-0.09%)
core_import_mpy_single.py       44.16 ->      41.46 :      -2.70 =  -6.114% (+/-0.25%)
core_locals.py                  32.57 ->      32.55 :      -0.02 =  -0.061% (+/-0.01%)
core_qstr.py                   125.85 ->     123.39 :      -2.46 =  -1.955% (+/-0.11%)
core_str.py                     17.67 ->      17.33 :      -0.34 =  -1.924% (+/-0.03%)
core_yield_from.py             225.63 ->     225.71 :      +0.08 =  +0.035% (+/-0.01%)
misc_aes.py                    239.24 ->     230.58 :      -8.66 =  -3.620% (+/-0.09%)
misc_mandel.py                1188.01 ->    1229.93 :     +41.92 =  +3.529% (+/-0.06%)
misc_pystone.py               1092.67 ->    1065.89 :     -26.78 =  -2.451% (+/-0.06%)
misc_raytrace.py               164.20 ->     160.63 :      -3.57 =  -2.174% (+/-0.08%)
viper_call0.py                 319.99 ->     319.98 :      -0.01 =  -0.003% (+/-0.00%)
viper_call1a.py                312.78 ->     312.78 :      +0.00 =  +0.000% (+/-0.01%)
viper_call1b.py                235.10 ->     235.09 :      -0.01 =  -0.004% (+/-0.00%)
viper_call1c.py                236.88 ->     236.88 :      +0.00 =  +0.000% (+/-0.01%)
viper_call2a.py                308.16 ->     308.15 :      -0.01 =  -0.003% (+/-0.01%)
viper_call2b.py                206.72 ->     206.71 :      -0.01 =  -0.005% (+/-0.00%)

These results are all relatively noisy and it's hard to draw clear conclusions, but overall the second set of changes looks to me to have a less significant regression. The -6% on core_import_mpy_single.py is odd, but it didn't appear in an earlier version of this PR so it's probably noise due to cache layout.

(EDIT: In a previous version of this analysis I read something backwards!)

However at least we can say there's no obvious regression, and we still have smaller binary size & RAM usage compared to pre-nano. (339608 flash & 12692 RAM pre-nano, 338720 & 12388 with this PR.)
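For reference, the savings implied by those figures can be worked out directly. This is plain arithmetic on the four numbers quoted above; the dictionary names are illustrative and the unit is assumed to be bytes.

```python
# Sizes quoted above (assumed to be bytes).
pre_nano = {"flash": 339608, "ram": 12692}
this_pr = {"flash": 338720, "ram": 12388}

# Positive values mean this PR is smaller than the pre-nano build.
saved = {key: pre_nano[key] - this_pr[key] for key in pre_nano}

for key, value in saved.items():
    print(f"{key}: {value} smaller ({100 * value / pre_nano[key]:.2f}%)")
```

That is 888 less flash and 304 less RAM than the pre-nano build.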

RP2350

Pre-nano vs current master:

diff of scores (higher is better)
N=168 M=100                rp2350_pre_nano.txt -> rp2350_master.txt         diff      diff% (error%)
bm_chaos.py                    307.38 ->     274.25 :     -33.13 = -10.778% (+/-0.08%)
bm_fannkuch.py                  92.60 ->      84.65 :      -7.95 =  -8.585% (+/-0.05%)
bm_fft.py                     2989.25 ->    2776.11 :    -213.14 =  -7.130% (+/-0.03%)
bm_float.py                   4833.23 ->    4362.35 :    -470.88 =  -9.743% (+/-0.09%)
bm_hexiom.py                    43.77 ->      37.62 :      -6.15 = -14.051% (+/-0.05%)
bm_nqueens.py                 3708.24 ->    3124.45 :    -583.79 = -15.743% (+/-0.06%)
bm_pidigits.py                 773.64 ->     578.45 :    -195.19 = -25.230% (+/-0.07%)
bm_wordcount.py                 67.40 ->      67.69 :      +0.29 =  +0.430% (+/-0.03%)
core_import_mpy_multi.py       430.47 ->     445.18 :     +14.71 =  +3.417% (+/-0.08%)
core_import_mpy_single.py       89.38 ->      92.21 :      +2.83 =  +3.166% (+/-0.22%)
core_locals.py                  59.85 ->      57.90 :      -1.95 =  -3.258% (+/-0.04%)
core_qstr.py                   188.09 ->     199.66 :     +11.57 =  +6.151% (+/-0.07%)
core_str.py                     29.84 ->      29.35 :      -0.49 =  -1.642% (+/-0.05%)
core_yield_from.py             401.27 ->     362.21 :     -39.06 =  -9.734% (+/-0.03%)
misc_aes.py                    432.89 ->     393.01 :     -39.88 =  -9.213% (+/-0.07%)
misc_mandel.py                3298.48 ->    3183.31 :    -115.17 =  -3.492% (+/-0.06%)
misc_pystone.py               2012.00 ->    1884.39 :    -127.61 =  -6.342% (+/-0.09%)
misc_raytrace.py               323.51 ->     296.71 :     -26.80 =  -8.284% (+/-0.04%)
viper_call0.py                 559.93 ->     559.91 :      -0.02 =  -0.004% (+/-0.01%)
viper_call1a.py                546.13 ->     546.11 :      -0.02 =  -0.004% (+/-0.01%)
viper_call1b.py                449.52 ->     449.52 :      +0.00 =  +0.000% (+/-0.00%)
viper_call1c.py                456.37 ->     456.35 :      -0.02 =  -0.004% (+/-0.00%)
viper_call2a.py                536.34 ->     536.33 :      -0.01 =  -0.002% (+/-0.01%)
viper_call2b.py                400.32 ->     400.32 :      +0.00 =  +0.000% (+/-0.01%)

😬 Not great, I should have checked this before merging #19299!

Pre-nano versus this PR:

diff of scores (higher is better)
N=168 M=100                rp2350_pre_nano.txt -> rp2350_pr_branch.txt         diff      diff% (error%)
bm_chaos.py                    307.38 ->     309.29 :      +1.91 =  +0.621% (+/-0.08%)
bm_fannkuch.py                  92.60 ->      93.53 :      +0.93 =  +1.004% (+/-0.06%)
bm_fft.py                     2989.25 ->    2818.90 :    -170.35 =  -5.699% (+/-0.03%)
bm_float.py                   4833.23 ->    4707.45 :    -125.78 =  -2.602% (+/-0.12%)
bm_hexiom.py                    43.77 ->      44.29 :      +0.52 =  +1.188% (+/-0.05%)
bm_nqueens.py                 3708.24 ->    3708.21 :      -0.03 =  -0.001% (+/-0.07%)
bm_pidigits.py                 773.64 ->     765.10 :      -8.54 =  -1.104% (+/-0.09%)
bm_wordcount.py                 67.40 ->      65.27 :      -2.13 =  -3.160% (+/-0.04%)
core_import_mpy_multi.py       430.47 ->     429.81 :      -0.66 =  -0.153% (+/-0.07%)
core_import_mpy_single.py       89.38 ->      88.22 :      -1.16 =  -1.298% (+/-0.15%)
core_locals.py                  59.85 ->      59.72 :      -0.13 =  -0.217% (+/-0.04%)
core_qstr.py                   188.09 ->     191.20 :      +3.11 =  +1.653% (+/-0.09%)
core_str.py                     29.84 ->      29.00 :      -0.84 =  -2.815% (+/-0.03%)
core_yield_from.py             401.27 ->     401.29 :      +0.02 =  +0.005% (+/-0.01%)
misc_aes.py                    432.89 ->     433.29 :      +0.40 =  +0.092% (+/-0.06%)
misc_mandel.py                3298.48 ->    3006.84 :    -291.64 =  -8.842% (+/-0.05%)
misc_pystone.py               2012.00 ->    1955.00 :     -57.00 =  -2.833% (+/-0.10%)
misc_raytrace.py               323.51 ->     319.08 :      -4.43 =  -1.369% (+/-0.04%)
viper_call0.py                 559.93 ->     564.12 :      +4.19 =  +0.748% (+/-0.01%)
viper_call1a.py                546.13 ->     550.12 :      +3.99 =  +0.731% (+/-0.01%)
viper_call1b.py                449.52 ->     452.24 :      +2.72 =  +0.605% (+/-0.00%)
viper_call1c.py                456.37 ->     459.16 :      +2.79 =  +0.611% (+/-0.00%)
viper_call2a.py                536.34 ->     540.22 :      +3.88 =  +0.723% (+/-0.01%)
viper_call2b.py                400.32 ->     402.48 :      +2.16 =  +0.540% (+/-0.01%)

I would expect these results to be basically the same as pre-nano, so I think the main cause of the changes here is noise...

Trade-offs and Alternatives

We could completely revert "rp2: Build with nano.specs, add linker cref table" (#19299) and keep using default.specs everywhere. The differences are not that big either way, so this might be the simpler approach.
We could link our versions of the libc memory & string functions from shared/libc/string0.c instead. This has a version of memcpy that uses full word operations, for example. In initial testing on RP2350 this showed quite small size, and performance midway between the "nano" libc and the default newlib functions.
We could also look at moving string functions and memcpy/memcmp to RAM on RP2350 to get more performance at the cost of less free RAM. This might be worth looking into in a follow-up PR.
While making these changes I noted that the rp2 port linker scripts try to link *gc.c.obj *vm.c.obj *parse.c.obj to RAM, but these don't match anything as the pico-sdk sets CMAKE_C_OUTPUT_EXTENSION to .o. So we should either remove these, or experiment with the RAM/Performance trade-off of putting these parts of MicroPython into RAM.
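The pattern mismatch in the last point can be reproduced with shell-style globbing, which is close to how GNU ld matches input file names in a linker script. This is only a sketch: the object names below are hypothetical examples of what a `.o` output extension would produce, and Python's `fnmatch` is an approximation of ld's wildcard matching.

```python
from fnmatch import fnmatchcase

# Patterns currently in the rp2 linker scripts, as quoted above.
LINKER_PATTERNS = ["*gc.c.obj", "*vm.c.obj", "*parse.c.obj"]

# Hypothetical object names when CMAKE_C_OUTPUT_EXTENSION is ".o".
OBJECTS = ["gc.c.o", "vm.c.o", "parse.c.o"]


def matched(patterns, names):
    """Return the names matched by at least one glob pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


print(matched(LINKER_PATTERNS, OBJECTS))  # nothing matches the .obj patterns

# Same patterns with the extension changed from ".obj" to ".o".
FIXED_PATTERNS = [p[: -len(".obj")] + ".o" for p in LINKER_PATTERNS]
print(matched(FIXED_PATTERNS, OBJECTS))
```

With the `.obj` patterns nothing is selected, so those sections silently stay in flash; with `.o` all three objects are selected.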

Generative AI

I did not use generative AI tools when creating this PR.

github-actions · 2026-06-18T04:42:57Z

Code size report:

Reference:  unix/README: Update the supported targets list. [d901e98]
Comparison: rp2: Link libc string functions to ram on RP2040. [merge of ce1e884]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:    +0 +0.000% TEENSY40
        rp2:  +256 +0.028% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

projectgus · 2026-06-18T04:43:54Z

rp2: +256 +0.028% RPI_PICO_W

I don't understand why the code size difference hasn't picked up any increase in static RAM use here.

projectgus · 2026-06-18T04:50:15Z

Ah OK, something else weird is going on here.

Here's RPI_PICO_W built in this PR in CI:

2026-06-18T04:30:58.1819030Z Memory region         Used Size  Region Size  %age Used
2026-06-18T04:30:58.1819609Z            FLASH:      878316 B      1200 KB     71.48%
2026-06-18T04:30:58.1820041Z         FLASH_FS:           0 B       848 KB      0.00%
2026-06-18T04:30:58.1820462Z              RAM:       55984 B       256 KB     21.36%
2026-06-18T04:30:58.1820861Z        SCRATCH_X:           0 B          0 B
2026-06-18T04:30:58.1821266Z        SCRATCH_Y:          8 KB         8 KB    100.00%

... and as built in latest master branch commit:

2026-06-12T08:07:33.2710825Z Memory region         Used Size  Region Size  %age Used
2026-06-12T08:07:33.2711362Z            FLASH:      878068 B      1200 KB     71.46%
2026-06-12T08:07:33.2711765Z         FLASH_FS:           0 B       848 KB      0.00%
2026-06-12T08:07:33.2712172Z              RAM:       52944 B       256 KB     20.20%
2026-06-12T08:07:33.2712541Z        SCRATCH_X:           0 B          0 B
2026-06-12T08:07:33.2712924Z        SCRATCH_Y:          8 KB         8 KB    100.00%

Somehow this PR is using 3KB more RAM, but if I build these here then the difference is +700 bytes of RAM.
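The 3KB figure can be checked by diffing the two "Memory region" reports above. The parser below is a minimal sketch written for this comparison; the `used_bytes` helper and its regular expression are assumptions, and only the FLASH and RAM lines are copied in.

```python
import re

# "<REGION>: <used> B|KB ..." as printed in the linker's memory usage report.
REGION = re.compile(r"^\s*(?P<name>\w+):\s+(?P<used>\d+)\s+(?P<unit>B|KB)\b")


def used_bytes(report):
    """Map region name -> used size in bytes for each report line."""
    sizes = {}
    for line in report.splitlines():
        line = line.split("Z ", 1)[-1]  # drop the CI timestamp prefix, if any
        m = REGION.match(line)
        if m:
            scale = 1024 if m["unit"] == "KB" else 1
            sizes[m["name"]] = int(m["used"]) * scale
    return sizes


PR_BUILD = """\
2026-06-18T04:30:58.1819609Z            FLASH:      878316 B      1200 KB     71.48%
2026-06-18T04:30:58.1820462Z              RAM:       55984 B       256 KB     21.36%
"""
MASTER_BUILD = """\
2026-06-12T08:07:33.2711362Z            FLASH:      878068 B      1200 KB     71.46%
2026-06-12T08:07:33.2712172Z              RAM:       52944 B       256 KB     20.20%
"""

pr, master = used_bytes(PR_BUILD), used_bytes(MASTER_BUILD)
for region in pr:
    print(f"{region}: {pr[region] - master[region]:+d} B")
```

For the CI builds this gives +248 B of flash and +3040 B of RAM.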

Need to investigate more, probably this is a newlib version thing.

octoprobe-bot · 2026-06-18T05:01:04Z

Octoprobe PR report

Duration: 27min, 16266 tests run, 100ms/test
Tested ports: rp2
Summary Report , Logdirectory

Test	Tests passed	Tests skipped	Tests xfailed	Tests failed
format flash	5
run-tests.py	4727	565
run-tests.py --via-mpy --emit native	4661	630		1
run-tests.py --via-mpy	4723	567		2
run-perfbench.py	120
run-natmodtests.py	180	23	2
run-tests.py --test-dirs=extmod_hardware	7	30	11	2
run-tests.py --test-dirs=extmod_hardware --emit-native	9	30	11
Total	14432	1845	24	5

Failures

Group: run-tests.py --test-dirs=extmod_hardware

Test	rp2 5334- RPI_PICO2	rp2 5334- RPI_PICO2- RISCV	rp2 552b- RPI_PICO2_W	rp2 5f2c- RPI_PICO_W	rp2 6038- RPI_PICO_W
extmod_hardware/machine_pwm.py	XFAIL (xfail_master_478.json)	XFAIL (xfail_master_478.json)	pass	FAIL	FAIL

Group: run-tests.py --via-mpy --emit native

Test	rp2 5334- RPI_PICO2	rp2 5334- RPI_PICO2- RISCV	rp2 552b- RPI_PICO2_W	rp2 5f2c- RPI_PICO_W	rp2 6038- RPI_PICO_W
extmod/select_poll_udp.py	skip	skip	pass	FAIL	pass

Group: run-tests.py --via-mpy

Test	rp2 5334- RPI_PICO2	rp2 5334- RPI_PICO2- RISCV	rp2 552b- RPI_PICO2_W	rp2 5f2c- RPI_PICO_W	rp2 6038- RPI_PICO_W
extmod/select_poll_eintr.py	skip	skip	pass	FAIL	pass
extmod/select_poll_udp.py	skip	skip	pass	FAIL	pass

dpgeorge · 2026-06-18T05:39:38Z

+if(PICO_RP2040)
+    # Enable nano.specs for RP2040 only.
+    #
+    # Pico-sdk already enables nosys.specs to stub out syscall handlers,


I just saw this related pico-sdk PR: raspberrypi/pico-sdk#3014

That will be in the next pico-sdk release. Not sure if it means anything for us?

dpgeorge · 2026-06-18T05:51:11Z

While making these changes I noted that the rp2 port linker scripts try to link *gc.c.obj *vm.c.obj *parse.c.obj to RAM, but these don't match anything as the pico-sdk sets CMAKE_C_OUTPUT_EXTENSION to .o.

Oh wow! These have been there since the beginning and I'm certain I benchmarked the effect of putting this in RAM... It would be worth revisiting this to test the performance of them actually being in RAM.

projectgus added 3 commits June 18, 2026 12:20

rp2: Link the RP2040 pico_mem_ops wrappers to RAM.

c01bdd1

These are thin wrappers around the ROM functions for memcpy and memset, just a few bytes - this way avoids a cache miss when calling them. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

rp2: Only enable nano.specs for RP2040.

c7ed497

Fixes performance regression on RP2350 when switching to nano.specs in 6552836. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

rp2: Link libc string functions to ram on RP2040.

ce1e884

As these are the "nano" versions the impact is relatively small. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>

projectgus added the port-rp2 label Jun 18, 2026

projectgus marked this pull request as draft June 18, 2026 04:50

projectgus mentioned this pull request Jun 18, 2026

rp2: Revert "rp2: Build with nano.specs for newlib-nano.". #19353

Open

dpgeorge reviewed Jun 18, 2026

projectgus wants to merge 3 commits into micropython:master from projectgus:bugfix/rp2_nano_performance
