py/modstruct: Check for integer overflow in calc_size_items.#19189
Open
beastbroak30 wants to merge 1 commit into
Open
py/modstruct: Check for integer overflow in calc_size_items.#19189beastbroak30 wants to merge 1 commit into
beastbroak30 wants to merge 1 commit into
Conversation
Member
|
Please check the failing CI. |
Without a bounds check a format string like `struct.calcsize("NI")`
with a large repeat count N can overflow the 32-bit size_t accumulator,
wrapping it to a small value. A subsequent struct.pack call then
allocates a tiny buffer and writes far past its end.
Add an O(1) guard in calc_size_items(): before entering the per-item
loop, check whether cnt items of sz bytes each would exceed the
remaining size_t budget. The check is:
if (sz > 0 && cnt > (SIZE_MAX - size) / sz) { raise ValueError }
This is safe against integer overflow in the right-hand side because
size <= SIZE_MAX is maintained as a loop invariant.
The test format `"1I1073741823I"` is used rather than the more
obvious `"1500000000I"` because format counts are parsed via
get_fmt_num(), which calls MP_OBJ_SMALL_INT_VALUE() and therefore
only works correctly for values within MicroPython's small-integer
range (0 .. 2**30 - 1 on 32-bit). The value 1073741823 (= 2**30 - 1)
is the largest count that is a 32-bit small integer; combined with a
4-byte prefix it just exceeds the overflow threshold (1073741822).
Signed-off-by: beastbroak30 <akantarip30@gmail.com>
b50bc6a to
1bfc154
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #19189 +/- ##
==========================================
- Coverage 98.46% 98.45% -0.01%
==========================================
Files 176 176
Lines 22811 22813 +2
==========================================
+ Hits 22460 22461 +1
- Misses 351 352 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Code size report: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
calc_size_items()inpy/modstruct.caccumulates the total buffer sizefor a struct format string into a
size_tvariable without any overflowcheck. On 32-bit targets where
size_tis 4 bytes -- ESP32, RP2040,STM32, SAMD21/51, nRF52 -- a format string such as
'1500000000I'silently wraps the accumulator from ~6 GB back to a small positive value.
vstr_init_len()allocates only that small buffer. Whenstruct_pack_into_internal()writes the real payload it walks far pastthe end of the allocation, corrupting the MicroPython heap.
On embedded targets there is no ASLR, no heap canary, and no MMU
protection between heap regions, so the corruption is reachable from pure
Python with no native-code access required:
The fix adds a single pre-loop overflow guard in
calc_size_items():szequalsalignfor every MicroPython struct format code (allprimitive types), so consecutive items of the same type add exactly
szbytes each. The check is therefore exact for all single-type format
strings and conservative (safe) for mixed layouts.
Testing
I reproduced the overflow by compiling a standalone C harness with a
uint32_taccumulator to simulate 32-bitsize_t(gcc -O2). The buggypath allocates 4 bytes and then attempts a ~6 GB write; the fixed path
raises immediately with no allocation.
A regression test is added at
tests/basics/struct_overflow.py. On32-bit targets the test expects
ValueError,OverflowError, orMemoryErrorto be raised. On 64-bit hostsSIZE_MAXis large enoughthat a
uint32_trepeat count cannot trigger the overflow, so the testprints
SKIPand exits early.Tested on: unix port x86-64 (test prints
SKIPas expected). The fixtargets 32-bit embedded ports; I do not have physical hardware available,
but the guard path is straightforward and covered by the simulated
experiment.
Trade-offs and Alternatives
Code size increase is 32--80 bytes depending on target (see CI code size
report). The added cost is one integer division per format specifier.
Because
szis always a power of two for every MicroPython struct formatcode, the compiler can replace the division with a right shift on
architectures that support it, making the runtime cost negligible.
An alternative per-iteration check (
if (size + sz < size) overflow)would be simpler but would fire on mixed-type format strings where
alignment padding temporarily makes
size + sznon-monotone. Thepre-loop check is safe for all layouts.
Generative AI
I used generative AI tools when creating this PR, but a human has checked
the code and is responsible for the code and the description above.