[mypyc] Add librt.strings.toupper and tolower codepoint primitives #21553
Open
VaggelisD wants to merge 1 commit into
Two i32 -> i32 case-conversion helpers mirroring the existing codepoint classifiers.

ASCII fast paths are inline (`a..z <-> A..Z` via add/sub 32); non-ASCII delegates to `str.upper` / `str.lower` on a 1-character string via a shared `LibRTStrings_ChangeCase_slow` helper. When uppercasing or lowercasing expands to multiple codepoints (e.g. 'ß'.upper() == 'SS', 'ﬁ'.upper() == 'FI'), the helper returns the input unchanged so the signature stays i32 -> i32. Allocation failure aborts via `CPyError_OutOfMemory`, matching how `LibRTStrings_IsIdentifier` handles OOM and keeping the helpers `ERR_NEVER`.

Following the inline-in-header pattern landed for isidentifier (python#21522), the bodies live as `static inline` in librt_strings.h so they compile directly into both the librt.strings module and mypyc-emitted code with no capsule indirection.

Stack: depends on the librt.strings.isidentifier primitive (python#21522).
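As a rough illustration of the semantics described above, here is a pure-Python reference model. It is not the C code from this PR: the names `toupper_model`, `tolower_model`, and `_change_case_slow` are hypothetical, and the sketch assumes the input is a valid codepoint whenever it is non-ASCII.

```python
def _change_case_slow(cp: int, upper: bool) -> int:
    """Hypothetical stand-in for the slow path: case-map a 1-character string."""
    ch = chr(cp)  # assumes cp is a valid codepoint
    mapped = ch.upper() if upper else ch.lower()
    # If the mapping expands to several codepoints, return the input unchanged
    # so the result still fits in a single integer.
    return ord(mapped) if len(mapped) == 1 else cp


def toupper_model(cp: int) -> int:
    # ASCII fast path: a..z -> A..Z by subtracting 32.
    if 0x61 <= cp <= 0x7A:
        return cp - 32
    # Any other ASCII value has no uppercase mapping.
    if cp < 0x80:
        return cp
    return _change_case_slow(cp, upper=True)


def tolower_model(cp: int) -> int:
    # ASCII fast path: A..Z -> a..z by adding 32.
    if 0x41 <= cp <= 0x5A:
        return cp + 32
    if cp < 0x80:
        return cp
    return _change_case_slow(cp, upper=False)
```

In this model `toupper_model(ord('é'))` gives the codepoint of 'É', while `toupper_model(ord('ß'))` returns the codepoint of 'ß' itself because 'ß'.upper() is two characters long.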
84a8652 to c2e43f3
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
6th PR of #21418.

This PR introduces two `i32 -> i32` case-conversion helpers, alongside the existing classifiers.

The constraint to flag: a single i32 holds one codepoint, but some Unicode case mappings expand to multiple codepoints, e.g. `'ß'.upper()` becomes `'SS'`, `'ﬁ'.upper()` becomes `'FI'`, etc.

For those inputs the primitive returns the input unchanged. This is the same split CPython makes between `Py_UNICODE_TOUPPER` (codepoint) and `str.upper()` (string), with the former returning the first codepoint of the expansion.

Users needing full Unicode case conversion should call `s.upper()` / `s.lower()` on the string, for which we already have mypyc primitives (#20948). For ASCII benchmarks, the codepoint primitives are ~5x faster than their `str` counterparts, avoiding the 1-char allocation.
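To make the trade-off concrete, the sketch below contrasts a length-preserving, per-codepoint uppercase with full `str.upper()`. It uses plain Python only; `upper_codepoint` and `per_codepoint_upper` are hypothetical helpers that imitate the "return the input unchanged on expansion" rule described in this PR, not the actual primitives.

```python
def upper_codepoint(cp: int) -> int:
    """Hypothetical model: uppercase one codepoint, unchanged if it would expand."""
    mapped = chr(cp).upper()
    return ord(mapped) if len(mapped) == 1 else cp


def per_codepoint_upper(s: str) -> str:
    """Apply the single-codepoint rule to each character; length never changes."""
    return "".join(chr(upper_codepoint(ord(ch))) for ch in s)


word = "straße"
print(per_codepoint_upper(word))  # 'ß' is left as-is because it would expand
print(word.upper())               # full Unicode case conversion may grow the string
```

The per-codepoint result keeps the original length, which is why code that needs full Unicode case conversion should still go through `s.upper()` / `s.lower()`.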