
Core/fix src invalid utf#19321

Draft
Josverl wants to merge 26 commits into micropython:master from Josverl:core/fix_src_invalid_utf

@Josverl (Contributor) commented Jun 7, 2026

Summary

This PR has been split off from PR #18854 (and should be based on it).

  • Only the last 3 commits are unique to this PR.

It includes the changes that fix #17855 (all changes to py/objexcept.c, py/gc.h and py/gc.c).
Because these fix a very niche bug, they are split into this PR so they can be evaluated independently (in particular their code size impact).

Testing

The relevant tests are included in this PR.

Generative AI

I used generative AI tools when creating this PR, but a human has checked the
code and is responsible for the code and the description above.

Josverl and others added 26 commits May 20, 2026 20:09
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Raises LookupError for error handlers that are not implemented.
Improves repr() rendering for unicode.

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Added feature detection at the start of the bytes_decode_errors.py test so
that it skips gracefully when the decode method is not available
(it requires MICROPY_CPYTHON_COMPAT).

This fixes test failures on minimal builds and Windows builds that
may not have this feature enabled.

The test now:
- Checks if decode method exists before running tests
- Prints "SKIP" and exits cleanly if decode is not available
- Works correctly on both full-featured and minimal builds

Verified:
- Standard unix build: All tests pass (14 testcases)
- Minimal unix build: Test skips cleanly
- All bytes/bytearray/string tests pass (82 tests, 2191 testcases)

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Only accepts `utf-8`, `utf8` or `ascii`

Fixes micropython#15849

Signed-off-by: Jos Verlinde <Jos_Verlinde@hotmail.com>
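A minimal sketch of the encoding-name check described in the commit above, assuming an exact, case-sensitive match. `encoding_is_supported` is a hypothetical helper, not MicroPython's actual function, and whether the real code also accepts other spellings (such as `UTF-8`) is not stated in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns true only for the encoding names that the
// commit says are accepted. Assumes an exact, case-sensitive comparison.
static bool encoding_is_supported(const char *name) {
    assert(name != NULL);
    static const char *const accepted[] = { "utf-8", "utf8", "ascii" };
    for (size_t i = 0; i < sizeof(accepted) / sizeof(accepted[0]); i++) {
        if (strcmp(name, accepted[i]) == 0) {
            return true;
        }
    }
    return false; // any other name is rejected
}
```

Keeping the accepted names in a small table makes it easy to see at a glance which encodings are supported.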
Fixes: issue 3364
Fixes: issue 13084

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Fixes Issue 17827

Signed-off-by: Jos Verlinde <Jos_Verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Prevent the test from failing by not testing known unsupported characters.
These will be documented in a cpydiff test.

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
This allows simpler skipping of tests based on enabled capabilities.

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Removed the dead \U%08x branch in uni_print_quoted.
Characters ≥ 0x110000 are impossible in valid UTF-8, so the branch
was unreachable. It's replaced by a single else that handles
surrogates (0xD800–0xDFFF) with \u%04x.

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
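A minimal sketch of the reduced branch structure described above, under stated assumptions: `repr_codepoint` and `encode_utf8` are hypothetical helpers, the sketch assumes non-ASCII, non-surrogate code points are emitted as raw UTF-8 rather than escaped, and it omits the quote and backslash escaping that a real repr needs. It shows why no `\U%08x` case is required once the input is known to be below 0x110000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: encode one code point as UTF-8 into out.
// Returns the number of bytes written (1 to 4).
static size_t encode_utf8(uint32_t ch, char *out) {
    if (ch < 0x80) {
        out[0] = (char)ch;
        return 1;
    } else if (ch < 0x800) {
        out[0] = (char)(0xC0 | (ch >> 6));
        out[1] = (char)(0x80 | (ch & 0x3F));
        return 2;
    } else if (ch < 0x10000) {
        out[0] = (char)(0xE0 | (ch >> 12));
        out[1] = (char)(0x80 | ((ch >> 6) & 0x3F));
        out[2] = (char)(0x80 | (ch & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (ch >> 18));
    out[1] = (char)(0x80 | ((ch >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((ch >> 6) & 0x3F));
    out[3] = (char)(0x80 | (ch & 0x3F));
    return 4;
}

// Hypothetical helper: write the repr() form of one code point into out
// (NUL-terminated) and return its length. out must hold at least 8 bytes.
static size_t repr_codepoint(uint32_t ch, char out[8]) {
    // Decoding valid UTF-8 never yields ch >= 0x110000, so no \U%08x branch.
    assert(ch < 0x110000);
    size_t len;
    if (ch < 0x80) {
        if (ch >= 0x20 && ch < 0x7F) {
            out[0] = (char)ch; // printable ASCII is emitted as-is
            len = 1;
        } else {
            // ASCII control characters (0x00-0x1F and 0x7F)
            len = (size_t)snprintf(out, 8, "\\x%02x", (unsigned)ch);
        }
    } else if (ch < 0xD800 || ch > 0xDFFF) {
        // Assumption: other code points are printed literally as UTF-8.
        len = encode_utf8(ch, out);
    } else {
        // The single remaining case: surrogates 0xD800-0xDFFF.
        len = (size_t)snprintf(out, 8, "\\u%04x", (unsigned)ch);
    }
    out[len] = '\0';
    return len;
}
```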
Added multi-byte sequences to improve test coverage.

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <Jos_Verlinde@hotmail.com>
+ Correct a few typos in comments.

Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <Jos_Verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <Jos_Verlinde@hotmail.com>
The MP_IS_COMPRESSED_ROM_STRING macro in qstr.h only checks
whether the first byte of a string is 0xff (the compression marker).
This caused user-allocated strings on the heap that happened
to start with 0xff (a byte that never occurs in valid UTF-8, but can
appear in source code that is not valid UTF-8) to be incorrectly
treated as compressed ROM strings.

Modified decompress_error_text_maybe() to validate the pointer against
the heap before attempting decompression. The fix checks whether the
pointer is in the GC heap: if it is, it cannot be a compressed ROM
string and must not be decompressed.
The validation uses the same logic as the VERIFY_PTR macro from gc.c.

Alternative to: micropython#17862

Fixes: micropython#17855
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
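A simplified sketch of the idea behind this fix, assuming a single contiguous heap area. `heap_area_t`, `ptr_in_heap`, `should_decompress` and `COMPRESSED_MARKER` are hypothetical names for illustration, not MicroPython's real GC state or API, and the range check here is only a stand-in for the VERIFY_PTR logic mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical model of a single GC heap area (not MicroPython's real struct).
typedef struct {
    const uint8_t *pool_start;
    const uint8_t *pool_end; // one past the last byte of the pool
} heap_area_t;

// Hypothetical name for the first byte that marks a compressed ROM string.
#define COMPRESSED_MARKER 0xff

// True if p points inside [pool_start, pool_end).
static bool ptr_in_heap(const heap_area_t *area, const void *p) {
    uintptr_t u = (uintptr_t)p;
    return u >= (uintptr_t)area->pool_start && u < (uintptr_t)area->pool_end;
}

// Treat text as compressed only if it starts with the marker byte AND does
// not live on the GC heap (heap text can never be a ROM string).
static bool should_decompress(const heap_area_t *area, const char *text) {
    assert(area != NULL && text != NULL);
    if ((uint8_t)text[0] != COMPRESSED_MARKER) {
        return false;
    }
    return !ptr_in_heap(area, text);
}
```

The marker test alone cannot tell ROM text from heap text that happens to begin with 0xff, so the pointer's location is used as the tie-breaker.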
Signed-off-by: Jos Verlinde <jos_verlinde@hotmail.com>
Signed-off-by: Jos Verlinde <Jos_Verlinde@hotmail.com>
codecov Bot commented Jun 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.48%. Comparing base (dc33f04) to head (0aa9852).
⚠️ Report is 71 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19321   +/-   ##
=======================================
  Coverage   98.47%   98.48%           
=======================================
  Files         176      176           
  Lines       22845    22905   +60     
=======================================
+ Hits        22497    22558   +61     
+ Misses        348      347    -1     


github-actions Bot commented Jun 7, 2026

Code size report:

Reference:  samd/mphalport: Run events at least once in mp_hal_delay_ms. [af38ee1]
Comparison: tests: Test exception handling with heap-allocated unicode-like strings. [merge of 0aa9852]
  mpy-cross:  +464 +0.122% 
   bare-arm:    +0 +0.000% 
minimal x86:   +20 +0.011% 
   unix x64:  +864 +0.101% standard
      stm32:  +472 +0.117% PYBV10
      esp32:  +712 +0.041% ESP32_GENERIC[incl +48(data)]
     mimxrt:  +416 +0.106% TEENSY40
        rp2:  +488 +0.053% RPI_PICO_W
       samd:  +416 +0.150% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +610 +0.133% VIRT_RV32



Development

Successfully merging this pull request may close these issues.

Crash printing exception detail when source code is not valid UTF-8
