Skip to content

Match CPython's islower/isupper exactly#7646

Merged
youknowone merged 1 commit into
RustPython:mainfrom
joshuamegnauth54:improve-islower-isupper
Apr 24, 2026
Merged

Match CPython's islower/isupper exactly#7646
youknowone merged 1 commit into
RustPython:mainfrom
joshuamegnauth54:improve-islower-isupper

Conversation

@joshuamegnauth54
Copy link
Copy Markdown
Contributor

@joshuamegnauth54 joshuamegnauth54 commented Apr 21, 2026

This PR fixes a regression from my last islower/isupper patch. Python's Bytes doesn't assume an encoding, so methods like islower should only consider ASCII casing.

I updated islower/isupper for UTF-8 and WTF-8 to match CPython exactly. The two functions now use the same properties as CPython and in the exact same way that CPython.

I updated the unit tests to pass on Python 3.15. Unicode updates sometimes cause properties to shift. I previously tested everything on Python 3.14, but that lead to failures that I assumed were bugs but were actually due to Unicode differences. For example, U+0295 is a lower case letter in older Unicode versions but is NOT in newer versions.


With Python 3.14:

assert "\u0295".islower()
assert not "\u0295".isupper()

With Python 3.15 and RustPython:

assert not "\u0295".islower()
assert not "\u0295".isupper()
assert not "\u0295".istitle()

^It's imperative to test with the latest Python because of Unicode differences. I, uh, spent like three hours trying to fix a bug that didn't exist because I was crosschecking the tests with Python 3.14... Oops. I bumped CI's Python version to 3.15 as well. Reverted because it's more complicated than I thought.

Here is CPython's isupper implementation for comparison.

CPython's core logic for islower (isupper is basically the same)

/* Special case for empty strings */
if (length == 0)
    Py_RETURN_FALSE;

cased = 0;
for (i = 0; i < length; i++) {
    const Py_UCS4 ch = PyUnicode_READ(kind, data, i);

    if (Py_UNICODE_ISUPPER(ch) || Py_UNICODE_ISTITLE(ch))
        Py_RETURN_FALSE;
    else if (!cased && Py_UNICODE_ISLOWER(ch))
        cased = 1;
}
return PyBool_FromLong(cased);

RustPython's islower:

let mut all_cased = false;
for chunk in self.as_bytes().utf8_chunks().map(|c| c.valid()) {
    if chunk.chars().any(|c| {
        INVALID::for_char(c)
            || GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
    }) {
        return false;
    }
    if !all_cased && chunk.chars().any(VALID::for_char) {
        all_cased = true;
    }
}
all_cased

They're the same now! 😁

Summary by CodeRabbit

  • Bug Fixes

    • Improved case-checking: text strings use Unicode casedness detection while byte/encoded checks use ASCII-only scanning, preventing misclassification for mixed/invalid byte sequences and certain rare or non-cased characters.
  • Tests

    • Updated and expanded tests for Unicode edge cases, version-dependent casing, non-cased scripts, and bytes case-checking with invalid/mixed sequences.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Reworked string casedness detection: added a generic AnyStr::is_cased<VALID, INVALID> helper, changed py_islower/py_isupper to use an ASCII-fast-path plus is_cased UTF-8 scan (moving away from specific CodePointSetData checks), updated builtin wiring and tests, and made a small cast simplification in _thread.rs.

Changes

Cohort / File(s) Summary
Case detection core
crates/vm/src/anystr.rs
Added fn is_cased<VALID, INVALID>(&self) -> bool; reimplemented py_islower/py_isupper to use an ASCII byte scan for quick decisions and to call the generic ICU-property-based is_cased for UTF-8 characters, removing reliance on specific CodePointSetData/Alphabetic checks.
Builtins wiring
crates/vm/src/builtins/str.rs
Brought Lowercase/Uppercase predicates into scope and updated AnyStr impls for str/Wtf8 to use is_cased::<Lowercase, Uppercase>() and vice versa; minor import regrouping.
Thread util minor change
crates/vm/src/stdlib/_thread.rs
On Unix, changed handle.as_pthread_t() as u64 to as _ to rely on the function's return-type context; no behavioral change.
Tests updated
extra_tests/snippets/builtin_str.py, extra_tests/snippets/builtin_str_encode.py
Adjusted casing tests to be version-aware for \u0295, added non-cased checks (e.g., "ジョジョ"), and expanded str/bytes .isupper()/.islower() coverage including invalid-UTF8 byte scenarios.

Sequence Diagram(s)

(omitted — changes are focused on algorithmic/case-detection logic within the VM and do not introduce a multi-component sequential flow requiring visualization)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh
  • youknowone

Poem

🐰 I hopped through bytes and Unicode trees,
I sniffed at Lowercase and Uppercase with ease.
A helper sprang up, tidy and bright,
ASCII first, then UTF-8 light.
Nose twitch—case sorted, nibble of delight! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: aligning islower/isupper implementations with CPython's exact behavior, which is the primary objective across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/anystr.rs`:
- Around line 442-456: The is_cased helper (fn is_cased<VALID, INVALID>)
currently only rejects INVALID property and thus lets titlecase characters slip;
update the loop to also detect and reject titlecase characters for each char in
chunk.chars() (e.g., call CharExt::is_titlecase(ch) or check
GeneralCategory::for_char(ch) == GeneralCategory::TitlecaseLetter) so any
titlecase char causes an early return false, preserving the existing VALID
detection logic and final all_cased result.

In `@crates/vm/src/stdlib/_thread.rs`:
- Line 469: The return in function thread_to_id currently returns
handle.as_pthread_t() which is a platform-dependent RawPthread; explicitly cast
that value to u64 before returning to match the function's u64 return type and
ensure Unix portability — locate thread_to_id and change the return to cast
handle.as_pthread_t() into u64 (e.g., as u64) so the conversion is explicit and
clear.

In `@extra_tests/snippets/builtin_str_encode.py`:
- Around line 24-26: Replace the incorrect 6-digit escape and wrong expectation:
use the 8-digit escape sequence \U0001f431 for the emoji in both assertions and
change the first assertion to expect True (emoji bytes are ignored) by asserting
"\U0001f431 CAT".encode().isupper(), while keeping the second assertion as
"\U0001f431 cat".encode().islower().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 4a7cc4e6-e414-4b7d-9ca0-4066c74a3fa2

📥 Commits

Reviewing files that changed from the base of the PR and between d5921d1 and c5810cc.

📒 Files selected for processing (5)
  • crates/vm/src/anystr.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py

Comment thread crates/vm/src/anystr.rs
Comment thread crates/vm/src/stdlib/_thread.rs Outdated
Comment thread extra_tests/snippets/builtin_str_encode.py Outdated
@joshuamegnauth54 joshuamegnauth54 marked this pull request as draft April 21, 2026 03:36
@joshuamegnauth54 joshuamegnauth54 force-pushed the improve-islower-isupper branch 2 times, most recently from a3559f2 to 5c0fa80 Compare April 21, 2026 18:43
@joshuamegnauth54 joshuamegnauth54 changed the title Match CPython's islower/upper more closely Match CPython's islower/isupper exactly Apr 21, 2026
@fanninpm
Copy link
Copy Markdown
Contributor

Are there any relevant tests in, say, Lib/test/string_tests.py or Lib/test/test_str.py?

@joshuamegnauth54
Copy link
Copy Markdown
Contributor Author

It looks like extra_tests/snippets/stdlib_imp.py fails with Python 3.15 but passes with Python 3.14. 🤔 I'll take a look at this later to see if I can fix it.

@fanninpm I skimmed them but couldn't find any tests for Unicode properties. It makes sense, though, because the properties may change across Unicode versions (e.g. \u0295 above). RustPython can probably remove its Unicode tests too once its string implementations match CPython exactly.

@joshuamegnauth54 joshuamegnauth54 marked this pull request as ready for review April 21, 2026 23:08
@joshuamegnauth54 joshuamegnauth54 force-pushed the improve-islower-isupper branch from 5c0fa80 to dde4ac3 Compare April 21, 2026 23:08
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/vm/src/anystr.rs (1)

444-462: Optional: fold is_cased into a single pass over chars.

Functionally correct and the titlecase rejection addresses the regression flagged on the previous revision. A minor refinement: each chunk is traversed twice (chunk.chars().any(...) for the invalid/titlecase check, then again for the valid check). A single for c in ... loop is a touch more efficient on long strings and mirrors CPython's unicode_islower_impl control flow more directly.

♻️ Proposed single-pass refactor
     fn is_cased<VALID, INVALID>(&self) -> bool
     where
         VALID: BinaryProperty,
         INVALID: BinaryProperty,
     {
         let mut all_cased = false;
-        for chunk in self.as_bytes().utf8_chunks().map(|c| c.valid()) {
-            if chunk.chars().any(|c| {
-                INVALID::for_char(c)
-                    || GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
-            }) {
-                return false;
-            }
-            if !all_cased && chunk.chars().any(VALID::for_char) {
-                all_cased = true;
-            }
+        for c in self
+            .as_bytes()
+            .utf8_chunks()
+            .flat_map(|c| c.valid().chars())
+        {
+            if INVALID::for_char(c)
+                || GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
+            {
+                return false;
+            }
+            if !all_cased && VALID::for_char(c) {
+                all_cased = true;
+            }
         }
         all_cased
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/anystr.rs` around lines 444 - 462, The current is_cased
implementation iterates each utf8 chunk twice (once to check
INVALID::for_char/TitlecaseLetter via GeneralCategory::for_char and again to
check VALID::for_char), so refactor is_cased to perform a single pass: for each
chunk from self.as_bytes().utf8_chunks().map(|c| c.valid()) iterate each char
once, immediately return false if INVALID::for_char(c) or
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) is
true, and otherwise set a flag when VALID::for_char(c) is seen; after processing
all chunks return that flag. Keep the same trait bounds (VALID: BinaryProperty,
INVALID: BinaryProperty) and behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/vm/src/anystr.rs`:
- Around line 444-462: The current is_cased implementation iterates each utf8
chunk twice (once to check INVALID::for_char/TitlecaseLetter via
GeneralCategory::for_char and again to check VALID::for_char), so refactor
is_cased to perform a single pass: for each chunk from
self.as_bytes().utf8_chunks().map(|c| c.valid()) iterate each char once,
immediately return false if INVALID::for_char(c) or
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) is
true, and otherwise set a flag when VALID::for_char(c) is seen; after processing
all chunks return that flag. Keep the same trait bounds (VALID: BinaryProperty,
INVALID: BinaryProperty) and behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d31dd5c7-e5bc-4027-bdd5-f17b032670cf

📥 Commits

Reviewing files that changed from the base of the PR and between c5810cc and dde4ac3.

📒 Files selected for processing (5)
  • crates/vm/src/anystr.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py

@joshuamegnauth54 joshuamegnauth54 force-pushed the improve-islower-isupper branch from dde4ac3 to a771228 Compare April 21, 2026 23:21
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/vm/src/anystr.rs (1)

441-465: is_cased now faithfully mirrors CPython's unicode_is{upper,lower}_impl (titlecase regression fixed).

The previous concern about titlecase code points (e.g. "\u01c5A".isupper() returning true) is resolved by the GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) guard, which is equivalent to CPython's Py_UNICODE_ISTITLE (set on Lt). The control flow — early-return false on INVALID or titlecase, set all_cased once on VALID, return all_cased — matches CPython line-for-line.

The as_bytes().utf8_chunks().flat_map(|c| c.valid().chars()) pattern is the right choice for the Wtf8 impl too: isolated surrogates are dropped, but since surrogates are never Lu/Ll/Lt in CPython either, the observable behavior is identical.

Optional readability nit (non-blocking): the titlecase test could be written more directly, in line with the nearby isdecimal style on str.rs:

♻️ Optional tweak
-            if INVALID::for_char(c)
-                || GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
-            {
+            if INVALID::for_char(c)
+                || matches!(GeneralCategory::for_char(c), GeneralCategory::TitlecaseLetter)
+            {
                 return false;
             }

This also lets you drop GeneralCategoryGroup from the imports at the top of the file if no other user remains.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/anystr.rs` around lines 441 - 465, Replace the Titlecase guard
in is_cased to check the GeneralCategory directly (e.g. compare
GeneralCategory::for_char(c) to the TitlecaseLetter variant or call a dedicated
is_titlecase method) instead of using
GeneralCategoryGroup::TitlecaseLetter.contains(...); update the import list to
remove GeneralCategoryGroup if unused (keep GeneralCategory) so the check
mirrors the nearby isdecimal style and removes the extra import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/vm/src/anystr.rs`:
- Around line 441-465: Replace the Titlecase guard in is_cased to check the
GeneralCategory directly (e.g. compare GeneralCategory::for_char(c) to the
TitlecaseLetter variant or call a dedicated is_titlecase method) instead of
using GeneralCategoryGroup::TitlecaseLetter.contains(...); update the import
list to remove GeneralCategoryGroup if unused (keep GeneralCategory) so the
check mirrors the nearby isdecimal style and removes the extra import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 5635961c-c3cc-4e5a-bf28-03108999e18b

📥 Commits

Reviewing files that changed from the base of the PR and between dde4ac3 and a771228.

📒 Files selected for processing (5)
  • crates/vm/src/anystr.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py
✅ Files skipped from review due to trivial changes (2)
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str_encode.py

@joshuamegnauth54 joshuamegnauth54 force-pushed the improve-islower-isupper branch from ead1069 to 0acaac3 Compare April 23, 2026 17:42
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/vm/src/anystr.rs (1)

444-465: is_cased correctly rejects opposite-case and titlecase — matches CPython.

The titlecase rejection (previously flagged) is now in place, so strings like "DžA".isupper() and "Dža".islower() will return false as CPython does.

Two minor notes:

  1. The titlecase test is slightly indirect — a direct equality is clearer and avoids constructing a group:
♻️ Optional simplification
-            if INVALID::for_char(c)
-                || GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
-            {
+            if INVALID::for_char(c) || GeneralCategory::for_char(c) == GeneralCategory::TitlecaseLetter {
                 return false;
             }
  1. For Wtf8, unpaired surrogates are encoded as invalid UTF-8 bytes and get skipped by utf8_chunks().flat_map(|c| c.valid().chars()). Surrogates aren't cased, so this is semantically fine, but worth a brief comment so future readers don't mistake the chunking for a bug.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/anystr.rs` around lines 444 - 465, The is_cased function should
use a direct titlecase comparison and document surrogate behavior: replace the
indirect
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
with an equality check GeneralCategory::for_char(c) ==
GeneralCategory::TitlecaseLetter inside is_cased (keeping the early return where
INVALID::for_char(c) triggers false), and add a short comment near the
utf8_chunks().flat_map(...) explaining that Wtf8 encodes unpaired surrogates as
invalid UTF‑8 bytes which are skipped by utf8_chunks so surrogates won’t be
seen/cased; reference the is_cased function and the VALID/INVALID,
GeneralCategory, and utf8_chunks symbols when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/vm/src/anystr.rs`:
- Around line 444-465: The is_cased function should use a direct titlecase
comparison and document surrogate behavior: replace the indirect
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
with an equality check GeneralCategory::for_char(c) ==
GeneralCategory::TitlecaseLetter inside is_cased (keeping the early return where
INVALID::for_char(c) triggers false), and add a short comment near the
utf8_chunks().flat_map(...) explaining that Wtf8 encodes unpaired surrogates as
invalid UTF‑8 bytes which are skipped by utf8_chunks so surrogates won’t be
seen/cased; reference the is_cased function and the VALID/INVALID,
GeneralCategory, and utf8_chunks symbols when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 4741774e-310b-4e4d-88cf-c5c207167f99

📥 Commits

Reviewing files that changed from the base of the PR and between a771228 and 0acaac3.

📒 Files selected for processing (5)
  • crates/vm/src/anystr.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py
✅ Files skipped from review due to trivial changes (1)
  • crates/vm/src/stdlib/_thread.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • extra_tests/snippets/builtin_str_encode.py

This PR fixes a regression from my last islower/isupper patch.
Python's Bytes doesn't assume an encoding, so methods like islower
should only consider ASCII casing.

I updated islower/isupper for UTF-8 and WTF-8 to match CPython more
closely. The two functions now use the same properties as CPython and
now match CPython exactly.

I updated the unit tests to pass on Python 3.15. Unicode updates
sometimes cause properties to shift. I previously tested everything on
Python 3.14, but that lead to failures that I assumed were bugs but were
actually due to Unicode differences. For example, U+0295 is a lower case
letter in older Unicode versions but is NOT in newer versions.

One of the new tests is disabled on Python 3.14 for now because it will
fail in CI till CI is bumped to 3.15.
@joshuamegnauth54 joshuamegnauth54 force-pushed the improve-islower-isupper branch from 0acaac3 to 043dbc5 Compare April 24, 2026 00:56
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/vm/src/anystr.rs (1)

455-457: Prefer direct GeneralCategory equality over group containment for single-category checks.

GeneralCategoryGroup::TitlecaseLetter is a single-category group, so .contains(...) is unnecessary. The codebase already uses this pattern in str.rs with matches!(GeneralCategory::for_char(c), GeneralCategory::DecimalNumber) for equivalent checks.

♻️ Proposed tweak
-            if INVALID::for_char(c)
-                || GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c))
-            {
+            if INVALID::for_char(c)
+                || GeneralCategory::for_char(c) == GeneralCategory::TitlecaseLetter
+            {
                 return false;
             }

After this change, GeneralCategoryGroup can be removed from the import list at lines 7–9, as it has no other uses in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/anystr.rs` around lines 455 - 457, The code uses
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) for
a single-category check; replace that containment call with a direct
GeneralCategory comparison (e.g., match or matches! against
GeneralCategory::TitlecaseLetter) alongside the existing INVALID::for_char(c)
check in the same if, and then remove GeneralCategoryGroup from the imports
(it's no longer used in this file); update the condition that currently reads
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) to
use direct equality/matching against GeneralCategory::TitlecaseLetter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/vm/src/anystr.rs`:
- Around line 455-457: The code uses
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) for
a single-category check; replace that containment call with a direct
GeneralCategory comparison (e.g., match or matches! against
GeneralCategory::TitlecaseLetter) alongside the existing INVALID::for_char(c)
check in the same if, and then remove GeneralCategoryGroup from the imports
(it's no longer used in this file); update the condition that currently reads
GeneralCategoryGroup::TitlecaseLetter.contains(GeneralCategory::for_char(c)) to
use direct equality/matching against GeneralCategory::TitlecaseLetter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 4021c102-3e15-47ce-80f5-80b372772a1d

📥 Commits

Reviewing files that changed from the base of the PR and between 0acaac3 and 043dbc5.

📒 Files selected for processing (5)
  • crates/vm/src/anystr.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/stdlib/_thread.rs
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py
✅ Files skipped from review due to trivial changes (1)
  • crates/vm/src/stdlib/_thread.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • extra_tests/snippets/builtin_str.py
  • extra_tests/snippets/builtin_str_encode.py

Copy link
Copy Markdown
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@youknowone youknowone merged commit dc0c814 into RustPython:main Apr 24, 2026
20 checks passed
@joshuamegnauth54 joshuamegnauth54 deleted the improve-islower-isupper branch April 24, 2026 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants