py/gc: Track data and skip scan, MICROPY_GC_NO_SCAN#19367
Draft
Gadgetoid wants to merge 3 commits into
The mark phase conservatively scans every word of every reachable block for pointers, so a large bytearray/array buffer is scanned in full on every collection despite holding no pointers.

Add an optional per-block "no-scan table" (NTB, 1 bit/block, like the finaliser/weakref tables) and a GC_ALLOC_FLAG_NO_SCAN; tagged head blocks are marked but their contents are not scanned. A no-scan block has no child pointers, so the mark phase also skips the chain-walk for it (n_blocks left 0) and avoids re-reading the allocation table for every block of the buffer just to mark it. This matters for large buffers in slow PSRAM.

The tag is written on every allocation (so a reused block never inherits a stale bit) and preserved across realloc moves.

Exposed as m_new_no_scan() / m_malloc_no_scan(), which alias plain m_new()/gc_alloc() when disabled, and gated behind MICROPY_GC_NO_SCAN (default off). This commit adds the mechanism only; callers are converted separately.

Signed-off-by: Phil Howard <github@gadgetoid.com>
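To make the mechanism in the commit message above concrete, here is a toy model of a conservative mark phase with a 1-bit-per-block no-scan table. It is a sketch, not the code from this PR: the names (`toy_alloc`, `toy_mark`, `atb`, `ntb`, `words_scanned`), the one-byte-per-block allocation table, the bump allocator and the recursive marker are all simplifying assumptions chosen to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORDS_PER_BLOCK 4
#define NUM_BLOCKS 64
#define BLOCK_BYTES (WORDS_PER_BLOCK * sizeof(uintptr_t))

enum { AT_FREE, AT_HEAD, AT_TAIL, AT_MARK };

static uintptr_t heap[NUM_BLOCKS * WORDS_PER_BLOCK];
static uint8_t atb[NUM_BLOCKS];           // toy allocation table: one state byte per block
static uint8_t ntb[(NUM_BLOCKS + 7) / 8]; // hypothetical no-scan table: 1 bit per block
static size_t next_free;                  // bump-allocator cursor
static size_t words_scanned;              // counts words examined by the mark phase

static int ntb_get(size_t block) {
    return (ntb[block / 8] >> (block % 8)) & 1;
}

static void ntb_write(size_t block, int no_scan) {
    // Written on every allocation, so a stale bit can never survive.
    if (no_scan) {
        ntb[block / 8] |= (uint8_t)(1u << (block % 8));
    } else {
        ntb[block / 8] &= (uint8_t)~(1u << (block % 8));
    }
}

static void toy_reset(void) {
    memset(heap, 0, sizeof(heap));
    memset(atb, AT_FREE, sizeof(atb));
    memset(ntb, 0, sizeof(ntb));
    next_free = 0;
    words_scanned = 0;
}

// Allocate n_blocks contiguous blocks; returns NULL when the toy heap is full.
static void *toy_alloc(size_t n_blocks, int no_scan) {
    if (n_blocks == 0 || next_free + n_blocks > NUM_BLOCKS) {
        return NULL;
    }
    size_t head = next_free;
    next_free += n_blocks;
    atb[head] = AT_HEAD;
    for (size_t i = 1; i < n_blocks; i++) {
        atb[head + i] = AT_TAIL;
    }
    ntb_write(head, no_scan);
    return &heap[head * WORDS_PER_BLOCK];
}

// Conservatively treat `candidate` as a possible pointer to a head block.
static void toy_mark(uintptr_t candidate) {
    uintptr_t base = (uintptr_t)heap;
    if (candidate < base || candidate >= base + sizeof(heap)) {
        return; // not inside the heap
    }
    size_t off = (size_t)(candidate - base);
    if (off % BLOCK_BYTES != 0) {
        return; // not block-aligned
    }
    size_t block = off / BLOCK_BYTES;
    if (atb[block] != AT_HEAD) {
        return; // free, a tail block, or already marked
    }
    atb[block] = AT_MARK;
    if (ntb_get(block)) {
        return; // tagged: no child pointers, so skip the chain walk and the scan
    }
    size_t n = 1;
    while (block + n < NUM_BLOCKS && atb[block + n] == AT_TAIL) {
        n++; // chain walk: count the tail blocks of this allocation
    }
    for (size_t i = 0; i < n * WORDS_PER_BLOCK; i++) {
        words_scanned++;
        toy_mark(heap[block * WORDS_PER_BLOCK + i]);
    }
}
```

In this model, a 1-block root object pointing at a 32-block buffer costs 4 scanned words when the buffer is tagged and 132 when it is not, and the buffer is marked as live in both cases.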
Tag the buffers that only ever hold raw data (never heap pointers) with m_new_no_scan(), so the GC mark phase skips scanning them once MICROPY_GC_NO_SCAN is enabled (a no-op otherwise):

- py/objarray.c: array/bytearray item storage.
- py/objstr.c: str/bytes payloads.
- py/vstr.c: the vstr builder; growth via gc_realloc preserves the tag.

Signed-off-by: Phil Howard <github@gadgetoid.com>
For CI, build tests only. Signed-off-by: Phil Howard <github@gadgetoid.com>
Codecov Report: All modified and coverable lines are covered by tests.

Coverage diff (master vs #19367):

            master    #19367    +/-
Coverage    98.51%    98.51%
Files          176       176
Lines        22904     22905     +1
Hits         22563     22564     +1
Misses         341       341
Since this ties in strongly with the optimised tail scan (#19363) here's a graph of them working together, again just SRAM:
Summary
Buffers don't contain pointers. Don't scan them for pointers. This was never really a problem until we had 8MB of PSRAM with in-RAM font data and images.
Currently this brings new m_new_no_scan() and m_malloc_no_scan() methods for data that's guaranteed (by a conscientious user who understands how painful use-after-free bugs are to trace) to contain absolutely 100% no pointers. These mirror the existing methods, using a new GC_ALLOC_FLAG_NO_SCAN flag.
For better or worse we'll be carrying this change downstream for Tufty 2350, since GC hangups are painful when trying to hit 30-60FPS screen updates. This doesn't eliminate them, but turns a 300ms pause into a 30ms one, the rest of which is handled by #19363.
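As a rough illustration of how the new allocators are meant to be used, the sketch below stubs out the allocator so it compiles on its own. Everything except the names m_new_no_scan(), m_malloc_no_scan(), GC_ALLOC_FLAG_NO_SCAN and MICROPY_GC_NO_SCAN is an assumption: `stub_gc_alloc`, `last_alloc_flags`, `image_t`, `alloc_image` and the flag value are hypothetical, and the macro bodies only approximate the aliasing behaviour described here.

```c
#include <stddef.h>
#include <stdlib.h>

#ifndef MICROPY_GC_NO_SCAN
#define MICROPY_GC_NO_SCAN (1) // off by default in the PR; enabled here to show the tagged path
#endif

#define GC_ALLOC_FLAG_NO_SCAN (4) // placeholder value, not taken from the PR

static unsigned int last_alloc_flags; // demo only: records the flags of the latest allocation

// Stand-in for the real GC allocator so this example is self-contained.
static void *stub_gc_alloc(size_t n_bytes, unsigned int alloc_flags) {
    last_alloc_flags = alloc_flags;
    return calloc(1, n_bytes);
}

#define m_malloc(n) stub_gc_alloc((n), 0)
#define m_new(type, num) ((type *)m_malloc(sizeof(type) * (num)))

#if MICROPY_GC_NO_SCAN
#define m_malloc_no_scan(n) stub_gc_alloc((n), GC_ALLOC_FLAG_NO_SCAN)
#define m_new_no_scan(type, num) ((type *)m_malloc_no_scan(sizeof(type) * (num)))
#else
// When the feature is disabled the no-scan variants fall back to the plain ones.
#define m_malloc_no_scan(n) m_malloc(n)
#define m_new_no_scan(type, num) m_new(type, num)
#endif

// Hypothetical image object: the struct holds a heap pointer, the pixels do not.
typedef struct {
    unsigned char *pixels;
    size_t width;
    size_t height;
} image_t;

static image_t *alloc_image(size_t width, size_t height) {
    // The struct contains a pointer, so it must stay scannable: plain m_new().
    image_t *img = m_new(image_t, 1);
    if (img == NULL) {
        return NULL;
    }
    // Raw pixel bytes never hold heap pointers, so they can be tagged no-scan.
    img->pixels = m_new_no_scan(unsigned char, width * height);
    img->width = width;
    img->height = height;
    return img;
}
```

The rule of thumb is that only the leaf buffer gets the no-scan variant. Tagging the struct itself would hide `pixels` from the collector and lead to exactly the use-after-free the description warns about.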
Might be of interest to @sfe-SparkFro
Testing
Aggressive multi-hour tests of both real-world examples (on Tufty 2350) and synthetic GC thrashing benchmarks.
Again this change does not make much of an impact on perfbench since we don't really benchmark GC, and in some cases it can cause a net loss (RP2040 XIP cache lottery).
Trade-offs and Alternatives
This feature sacrifices heap for the additional flag bit in the allocation table, and is thus disabled by default. I'd recommend everyone shipping a board with PSRAM enable it as a matter of course, and suggest leaving the RAM/performance trade-off to the vendor of each board.
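For a feel of the heap cost, here is a back-of-the-envelope estimate of the table size at 1 bit per block. It assumes a 16-byte GC block (4 words on a 32-bit target) and ignores the fact that the tables are themselves carved out of the heap, so treat the numbers as approximate. `BYTES_PER_BLOCK_ASSUMED` and `no_scan_table_bytes` are names invented for this example.

```c
#include <stddef.h>

// Assumption: 4 machine words per GC block on a 32-bit target.
#define BYTES_PER_BLOCK_ASSUMED 16

// Approximate size in bytes of a 1-bit-per-block table covering heap_bytes of heap.
static size_t no_scan_table_bytes(size_t heap_bytes) {
    size_t blocks = heap_bytes / BYTES_PER_BLOCK_ASSUMED;
    return (blocks + 7) / 8; // round up to whole bytes
}
```

Under these assumptions an 8 MiB heap needs about 64 KiB for the table (1/128 of the heap, roughly 0.8%), and a 256 KiB heap needs about 2 KiB.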
This is a big change to a scary part of MicroPython and as such I'm raising it as a draft in the hope others will exercise it downstream and feed back. I don't expect or need it to be merged, but it's fun to share!
Generative AI
I used generative AI tools when creating this PR, but a human has checked the
code and is responsible for the code and the description above.