bpo-37621: Don't emit NEWLINE tokens on blank line continuations#14840

Closed
miedzinski wants to merge 5 commits into python:main from miedzinski:blank-linecont

Conversation

@miedzinski
Contributor

@miedzinski miedzinski commented Jul 18, 2019

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

Contributor

@asottile asottile left a comment

feels correct to me -- mind adding some tests for this?

Contributor

@asottile asottile left a comment

Are there any measurable side effects on the C tokenizer? Not sure -- tests look good, though.

@miedzinski
Contributor Author

Yes, just found one. Currently (pre-PR)

def f():
    \
  x = 1
  return x

raises an IndentationError (unmatched dedent) on line 4, because line 3 inherits the indentation level of line 2. IMO the indent from line 2 should be ignored, because that line is empty apart from the line continuation.
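
One way to check how a given interpreter treats this snippet is to compile it and report the outcome. The sketch below is only illustrative: `compile_outcome` and `SAMPLE` are hypothetical names, and the result for `SAMPLE` depends on the Python version in use, so no expected output is claimed for it.

```python
def compile_outcome(source):
    """Return "ok" if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__
    return "ok"


# The snippet from the comment above: line 2 holds only an indented backslash.
SAMPLE = "def f():\n    \\\n  x = 1\n  return x\n"

print(compile_outcome(SAMPLE))
```

Running this on builds with and without the change would show whether the indent of the continuation-only line still affects the following lines.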

Comment thread Lib/test/test_tokenize.py
OP ':' (1, 8) (1, 9)
NEWLINE '\\n' (1, 9) (1, 10)
NL '\\n' (2, 5) (2, 6)
INDENT '  ' (3, 0) (3, 2)
Contributor Author

Before this PR, the indent would be 4 spaces, not 2.

@miedzinski
Contributor Author

Now this code is accepted, because we calculate the indentation after the line continuation and only then push it onto the indent stack. The C and Python tokenizers should behave the same.
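
To see which indentation the `tokenize` module records for this snippet, the INDENT tokens can be collected directly. This is a rough sketch, not part of the PR: `indent_widths` and `SAMPLE` are hypothetical names, and because the result for `SAMPLE` varies by Python version (older tokenizers may raise partway through), errors are caught and reported rather than assumed.

```python
import io
import tokenize


def indent_widths(source):
    """Return (widths, error_name): the length of each INDENT token seen,
    and the name of any tokenizer error raised (None if tokenizing finished)."""
    widths = []
    error = None
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.INDENT:
                widths.append(len(tok.string))
    except (tokenize.TokenError, SyntaxError) as exc:
        error = type(exc).__name__
    return widths, error


# Same snippet as in the discussion: an indented, continuation-only line 2.
SAMPLE = "def f():\n    \\\n  x = 1\n  return x\n"

print(indent_widths("def f():\n    x = 1\n"))  # ordinary 4-space body
print(indent_widths(SAMPLE))                   # version-dependent
```

An INDENT width of 2 for `SAMPLE` would match the expected tokens in the test above, while a width of 4 would correspond to the pre-PR behavior described in the review comment.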

@lysnikolaou
Member

This is no longer relevant after the changes to the tokenizer in 3.12.

@asottile
Contributor

@lysnikolaou this change was also fixing the C tokenizer

@lysnikolaou
Member

lysnikolaou commented Oct 12, 2023

The C tokenizer tokenizes the file attached to the issue correctly as far as I can tell.

cpython on main [$?] via C v15.0.0-clang via 🐍 pyenv 3.11.3
❯ cat linecont.py
\

\

\


cpython on main [$?] via C v15.0.0-clang via 🐍 pyenv 3.11.3
❯ ./python.exe -m tokenize linecont.py
0,0-0,0:            ENCODING       'utf-8'        
2,0-2,1:            NL             '\n'           
4,0-4,1:            NL             '\n'           
6,0-6,1:            NL             '\n'           
7,0-7,0:            ENDMARKER      ''             
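
The same check can be run without a file on disk. The sketch below assumes that `linecont.py` consists of three backslash-only lines, each followed by a blank line, which is reconstructed from the `cat` output above; `LINECONT` and `token_names` are hypothetical names.

```python
import io
import tokenize

# Assumed contents of linecont.py, reconstructed from the `cat` output:
# three backslash-only lines, each followed by a blank line.
LINECONT = "\\\n\n\\\n\n\\\n\n"


def token_names(source):
    """Return (token name, token text) pairs for the given source string."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.tokenize(readline)]


for name, text in token_names(LINECONT):
    print(f"{name:<10} {text!r}")
```

On interpreters that predate the 3.12 tokenizer changes, the blank lines may be reported as NEWLINE rather than NL, which is the behavior the issue describes.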
