Emit more specific syntax errors for incompatible string prefixes by dylwil3 · Pull Request #19466 · astral-sh/ruff

dylwil3 · 2025-07-21T15:08:35Z

Following the CPython changes here:

Improve error message for strings with conflicting prefixes python/cpython#133197
gh-133197: Improve error message for ft"" and bt"" cases python/cpython#133202
gh-133197: Improve error message for incompatible string / bytes prefixes python/cpython#133242

we modify the lexing of double character string prefixes to emit a more useful syntax error in the case of invalid double character prefixes. Our implementation differs slightly from that of CPython in that our error message retains the casing of the prefixes and we emit a separate error for repeated prefixes (like uu"hello").

Example

def foo():
    x = uf"a more complicated {example}"
    return x

Before:

❯ ruff check --no-cache --isolated ex.py
ex.py:2:11: SyntaxError: Simple statements must be separated by newlines or semicolons
  |
1 | def foo():
2 |     x = uf"a more complicated {example}"
  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 |     return x
  |

Found 1 error.

After:

❯ cargo run -p ruff -- check --no-cache --isolated ex.py
ex.py:2:9: SyntaxError: "u" and "f" prefixes are incompatible
  |
1 | def foo():
2 |     x = uf"a more complicated {example}"
  |         ^^
3 |     return x
  |

Found 1 error.

Closes #19342

github-actions · 2025-07-21T15:52:37Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/mcp/databricks_mcp_cookbook.ipynb:12:1:8: Simple statements must be separated by newlines or semicolons

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/mcp/databricks_mcp_cookbook.ipynb:12:1:8: Simple statements must be separated by newlines or semicolons

MichaReiser

Nice.

I'm curious to see how this looks in a parsed AST to better understand the impact of this change.

MichaReiser · 2025-07-21T15:49:47Z

+source: crates/ruff_python_parser/src/lexer.rs
+expression: "lex(r#\"uU\"hello\"\"#, Mode::Module, TextSize::default())"
+---
+## Tokens


Let's also add a parser test. I'm curious to see how the parser recovers overall.

Added.

What happens is that the parser will behave as though the string had no prefixes and parse it as an ordinary string literal. What may be potentially misleading is that the range of this string will include the skipped prefix. So, for example, if you added up the lengths for the content of the string, the quote (single or triple), and the prefix (apparently 0), you would come up short.

I don't think this trips up the parser, and we don't do anything downstream with source that has invalid syntax - i.e. no lint rules are run - so I don't think we have to worry.

But I could try to arrange for the range to exclude the prefix? Then the AST would behave as if there was nothing there - like it does when we skip trivia.

I don't think this trips up the parser, and we don't do anything downstream with source that has invalid syntax - i.e. no lint rules are run - so I don't think we have to worry.

I think we do run all token based rules.

But I could try to arrange for the range to exclude the prefix? Then the AST would behave as if there was nothing there - like it does when we skip trivia.

We would need to emit the prefix as an Unknown token because the lexer isn't allowed to skip over any content. I suspect that this will undo much of the improvements you made (as the Unknown token will confuse the parser)

Hmm... what if instead we parsed the thing as a string, like we're currently doing, but then expanded the possible prefixes to include invalid ones. So instead of the AST showing prefix: Empty it was like prefix: InvalidPair(first,second)? Then all the characters would be accounted for.

I think we do run all token based rules.

oh interesting! I'll add a test fixture for some of those then

That's certainly worth exploring. One challenge is that the data needs to fit into TokenFlags, which is a u8. The other, that you still need to pick a TokenKind.

The last challenge is that it requires designing how the different StringFlags represent this new state (most of those currently assert that the flags are valid).

I'm not sure I can help you much with this right now as it seems fairly low priority (while a nice improvement).

For now, I suggest making a local improvement (which might be to go with Unknown)

MichaReiser · 2025-07-21T15:53:16Z

+                        // need to record it.
+                        self.push_error(LexicalError::new(


It would be helpful to include an explaination why it's okay to drop the unknown token (push_token should probably be marked with #[must_use]).

Is there a risk that any downstream user of the tokens or AST will drip over the fact that the token/ast prefix doesn't match the source prefix?

MichaReiser · 2025-07-21T16:45:59Z

+                                        value: "there",
+                                        flags: StringLiteralFlags {
+                                            quote_style: Double,
+                                            prefix: Empty,


Empty seems suboptimal. I think we should try to preserve the most important prefix (e.g. if f is present, favor it over u)

What is the most important prefix for ft"hello {there}"? or bf"hey"?

Not sure for ft but I would ignore the binary prefix for the second case.

My concern with the current solution is that ignoring both prefixes is the least correct. Ignoring just one would be closer to what the user intended.

fwiw in the current state of affairs we are parsing the full prefix as a Name and then the string literal as an ordinary string. So I feel like even if we parsed this as an ordinary string with an invalid prefix it would be an improvement.

but I'll try to think of a way to implement what you're suggesting!

MichaReiser · 2025-08-18T07:38:39Z

@dylwil3 what's the status of this PR?

dylwil3 · 2025-08-18T12:26:19Z

I'll revert to draft and re-open when I get a chance to visit it again - apologies for cluttering the open PRs!

dylwil3 · 2025-12-10T18:02:50Z

Gonna close this since I won't be able to get back to it until higher priority items are cleared - which may not be soon. Others are welcome to pick it up!

dylwil3 added 3 commits July 21, 2025 10:02

add lexical error for prefixes

1a46fa8

emit specific error for try lex double char prefix

4c450cf

update snapshots

430bb7b

dylwil3 requested review from MichaReiser and dhruvmanila as code owners July 21, 2025 15:08

dylwil3 added the parser Related to the parser label Jul 21, 2025

MichaReiser reviewed Jul 21, 2025

View reviewed changes

dylwil3 added 3 commits July 21, 2025 11:42

update doc comment

7d7ac63

add parser test

cf45718

clarify comment

f535779

MichaReiser reviewed Jul 21, 2025

View reviewed changes

dylwil3 marked this pull request as draft August 18, 2025 12:26

dylwil3 closed this Dec 10, 2025

Conversation

dylwil3 commented Jul 21, 2025

Example

Uh oh!

github-actions Bot commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dylwil3 Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser commented Aug 18, 2025

Uh oh!

dylwil3 commented Aug 18, 2025

Uh oh!

dylwil3 commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jul 21, 2025 •

edited

Loading

`ruff-ecosystem` results

dylwil3 Jul 21, 2025 •

edited

Loading