[mypyc] Add `librt.strings.toupper` and `tolower` codepoint primitives by VaggelisD · Pull Request #21553 · python/mypy

VaggelisD · 2026-05-28T08:18:39Z

6th PR of #21418.

This PR introduces two i32 -> i32 case-conversion helpers, `toupper` and `tolower`, alongside the existing codepoint classifiers.

The constraint to flag: a single i32 holds one codepoint, but some Unicode case mappings expand to multiple codepoints, e.g. `'ß'.upper()` becomes `'SS'` and `'ﬁ'.upper()` becomes `'FI'`.
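These one-to-many mappings can be observed directly from Python. A minimal check using only built-in `str` methods (the helper `expands` is hypothetical and not part of `librt.strings`):

```python
def expands(ch: str, upper: bool = True) -> bool:
    """Return True if changing the case of one character yields more than one codepoint.

    Hypothetical helper for illustration only; it does not call librt.
    """
    if len(ch) != 1:
        raise ValueError("expects exactly one codepoint")
    mapped = ch.upper() if upper else ch.lower()
    return len(mapped) != 1


# One-to-many mappings that cannot be represented as a single i32 result.
print("ß".upper())                     # SS
print("\ufb01".upper())                # FI  (U+FB01 LATIN SMALL LIGATURE FI)
print(expands("ß"))                    # True
print(expands("\u0130", upper=False))  # True: 'İ'.lower() == 'i' + U+0307
print(expands("a"))                    # False
```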

For those inputs the primitive returns the input unchanged. This mirrors the split CPython makes between `Py_UNICODE_TOUPPER` (per codepoint) and `str.upper()` (per string), except that `Py_UNICODE_TOUPPER` returns the first codepoint of the expansion rather than the input.
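To make the difference between the two fallback policies concrete, here is a pure-Python model of each. Both function names are hypothetical; the second only approximates the `Py_UNICODE_TOUPPER` behavior described above by taking the first codepoint of `str.upper()`, and is not CPython's implementation.

```python
def toupper_keep_input(cp: int) -> int:
    """Hypothetical model of this PR's policy: return the input on multi-codepoint expansion."""
    mapped = chr(cp).upper()
    return ord(mapped) if len(mapped) == 1 else cp


def toupper_first_codepoint(cp: int) -> int:
    """Hypothetical model of the first-codepoint policy: keep only the head of the expansion."""
    return ord(chr(cp).upper()[0])


sharp_s = ord("ß")
print(hex(toupper_keep_input(sharp_s)))       # 0xdf  (unchanged)
print(hex(toupper_first_codepoint(sharp_s)))  # 0x53  ('S')
print(chr(toupper_keep_input(ord("é"))))      # É
```

Returning the input keeps the result a valid, lossless answer for "no single-codepoint mapping exists", whereas the first-codepoint policy silently truncates `'SS'` to `'S'`.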

Users needing full Unicode case conversion should call `s.upper()` / `s.lower()` on the string, for which we already have mypyc primitives (#20948). On ASCII benchmarks the codepoint primitives are ~5x faster than their `str` counterparts, since they avoid allocating a 1-character string.
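Applying a codepoint-level helper to every character of a string is therefore not equivalent to `s.upper()` once an expanding codepoint appears. A small pure-Python emulation (the function `upper_per_codepoint` is hypothetical and does not call `librt`) shows the divergence:

```python
def upper_per_codepoint(s: str) -> str:
    """Uppercase s one codepoint at a time, emulating an i32 -> i32 helper."""
    out = []
    for ch in s:
        mapped = ch.upper()
        # Keep the original codepoint when the mapping expands, as this PR does.
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


word = "straße"
print(upper_per_codepoint(word))  # STRAßE
print(word.upper())               # STRASSE
```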

Two i32 -> i32 case-conversion helpers that mirror the existing codepoint classifiers. The ASCII fast paths are inlined (`a..z <-> A..Z` via add/sub 32); non-ASCII codepoints delegate to `str.upper` / `str.lower` on a 1-character string through a shared `LibRTStrings_ChangeCase_slow` helper. When uppercasing or lowercasing expands to multiple codepoints (e.g. `'ß'.upper() == 'SS'`, `'ﬁ'.upper() == 'FI'`), the helper returns the input unchanged so that the signature stays i32 -> i32.

Allocation failure aborts via `CPyError_OutOfMemory`, matching how `LibRTStrings_IsIdentifier` handles OOM and keeping the helpers `ERR_NEVER`.

Following the inline-in-header pattern introduced for isidentifier (python#21522), the bodies live as `static inline` functions in `librt_strings.h`, so they compile directly into both the `librt.strings` module and mypyc-emitted code with no capsule indirection.

Stack: depends on the `librt.strings.isidentifier` primitive (python#21522).
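The control flow described above can be sketched in Python as follows. This is an illustrative model, not the C code in `librt_strings.h`: `_change_case_slow`, `toupper_sketch`, and `tolower_sketch` are hypothetical names, OOM handling via `CPyError_OutOfMemory` is not modeled, and out-of-range i32 values simply make `chr` raise `ValueError`, which may differ from the real helpers.

```python
def _change_case_slow(cp: int, to_upper: bool) -> int:
    """Hypothetical analogue of LibRTStrings_ChangeCase_slow.

    Case-converts a 1-character string and falls back to the input on expansion.
    """
    one_char = chr(cp)
    mapped = one_char.upper() if to_upper else one_char.lower()
    # One codepoint in, several out (e.g. 'ß' -> 'SS'): return the input unchanged.
    return ord(mapped) if len(mapped) == 1 else cp


def toupper_sketch(cp: int) -> int:
    if cp < 128:
        # ASCII fast path: 'a'..'z' sit exactly 32 above 'A'..'Z'.
        return cp - 32 if ord("a") <= cp <= ord("z") else cp
    return _change_case_slow(cp, to_upper=True)


def tolower_sketch(cp: int) -> int:
    if cp < 128:
        # ASCII fast path: 'A'..'Z' sit exactly 32 below 'a'..'z'.
        return cp + 32 if ord("A") <= cp <= ord("Z") else cp
    return _change_case_slow(cp, to_upper=False)


# The fast path agrees with str.upper()/str.lower() on every ASCII codepoint.
assert all(toupper_sketch(c) == ord(chr(c).upper()) for c in range(128))
assert all(tolower_sketch(c) == ord(chr(c).lower()) for c in range(128))
```

The ASCII branch is where the C helpers save work, since it needs no string allocation; in this Python model it only mirrors the branch structure.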

github-actions · 2026-05-28T09:23:39Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅


VaggelisD force-pushed the `librt-strings-toupper-tolower` branch from 84a8652 to c2e43f3 on May 28, 2026 09:06

VaggelisD wants to merge 1 commit into `python:master` from `VaggelisD:librt-strings-toupper-tolower`.
