[mypyc] Add librt.strings.isspace char primitive #21462
Open
VaggelisD wants to merge 3 commits into
Adds a codepoint-taking `librt.strings.isspace(c: i32) -> bool` that wraps `Py_UNICODE_ISSPACE`. Combined with the existing `ord(s[i])` specialization (python#20578), this lets per-character hot loops avoid the 1-character `PyUnicode` materialization that `s[i].isspace()` forces. Microbenchmark (counting whitespace in a 12 KB SQL fragment, 5000 iterations): mypyc-compiled `s[i].isspace()` takes 0.075 ms; the codepoint path `c: i32 = i32(ord(s[i])); isspace(c)` takes 0.034 ms, roughly 2.2x faster. Wins compound for tokenizer-shaped workloads mixing classification and literal compares.
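The two loop shapes compared in the microbenchmark can be sketched roughly as follows. This is only an illustration, not the benchmark code from the PR: the function names `count_spaces_str` and `count_spaces_codepoint` are hypothetical, and the `try`/`except` fallbacks are assumptions added so the snippet also runs under plain CPython when `mypy_extensions` or a `librt` build with `isspace` is not installed.

```python
try:
    from mypy_extensions import i32  # native integer type understood by mypyc
except ImportError:
    i32 = int  # assumption: plain int stand-in so the sketch runs uncompiled

try:
    from librt.strings import isspace  # the primitive this PR adds
except ImportError:
    def isspace(c: i32) -> bool:
        # Hypothetical stand-in, only meaningful for valid codepoints.
        return chr(c).isspace()


def count_spaces_str(s: str) -> int:
    """Baseline: s[i] materializes a 1-character str on every iteration."""
    n = 0
    for i in range(len(s)):
        if s[i].isspace():
            n += 1
    return n


def count_spaces_codepoint(s: str) -> int:
    """Codepoint path: classify the integer codepoint directly."""
    n = 0
    for i in range(len(s)):
        c: i32 = i32(ord(s[i]))
        if isspace(c):
            n += 1
    return n
```

Both functions should return the same count for any input; the speedup reported above would only show up once the module is compiled with mypyc against a `librt` that ships `isspace`.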
Contributor
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
Without the _librt suffix, has_test_name_tag returns False and the test imports the installed PyPI librt 0.11.0, which lacks isspace.
This PR serves as the foundation for the smaller alternative of the `char` proposal request.

It builds on top of the existing `librt.strings` and `ord(char)` specialization, so code like `c: i32 = i32(ord(s[i])); isspace(c)` can lower to a direct codepoint check, i.e. without materializing a 1-character `str`.

Semantics:
- the argument is an `i32` codepoint
- […] `False`
- […] `str.isspace()` on the corresponding 1-character string (ensured by an exhaustive test too)

This PR adds only `isspace`; it does not add any new type-system surface, and it does not introduce the rest of the codepoint helper family. I'll be contributing these next if this direction looks good.
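For reviewers who want to reproduce the equivalence with `str.isspace()` locally, a check along the following lines may be enough. It is a minimal sketch and not the test added in this PR: `find_mismatches` is a hypothetical helper that takes any codepoint predicate and sweeps the full Unicode range.

```python
import sys
from typing import Callable, List


def find_mismatches(pred: Callable[[int], bool]) -> List[int]:
    """Return every codepoint where pred disagrees with str.isspace()."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if pred(cp) != chr(cp).isspace()
    ]
```

Passing `librt.strings.isspace` as `pred` (once a build containing it is installed) and checking for an empty result would exercise every valid codepoint; inputs outside that range are not covered by this sketch.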