Skip to content

bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec#28944

Merged
serhiy-storchaka merged 2 commits into
python:mainfrom
serhiy-storchaka:incremental-raw-unicode-escape-codec
Oct 14, 2021
Merged

bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec#28944
serhiy-storchaka merged 2 commits into
python:mainfrom
serhiy-storchaka:incremental-raw-unicode-escape-codec

Conversation

@serhiy-storchaka
Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka commented Oct 14, 2021

They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.

https://bugs.python.org/issue45467

…e-escape" codec

They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just proposed a minor change.

Comment thread Objects/unicodeobject.c Outdated
@@ -6803,7 +6814,10 @@ PyUnicode_DecodeRawUnicodeEscape(const char *s,
startinpos = s - starts - 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you initialize startinpos to 0? It is not techincally needed, but it helps me to follow the code for the if (s >= end) { case above.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. There was a bug, and it was unnoticed due to bugs in tests which hided also other bug.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha ha, ok, good that I helped you to find bugs. I didn't notice them. The code only looked suspicious, I failed to follow goto.

Copy link
Copy Markdown
Member

@malemburg malemburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks, Serhiy.

@serhiy-storchaka serhiy-storchaka merged commit 39aa983 into python:main Oct 14, 2021
@miss-islington
Copy link
Copy Markdown
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒⛏🤖

@miss-islington
Copy link
Copy Markdown
Contributor

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 39aa98346d5dd8ac591a7cafb467af21c53f1e5d 3.10

@serhiy-storchaka serhiy-storchaka deleted the incremental-raw-unicode-escape-codec branch October 14, 2021 17:04
@miss-islington
Copy link
Copy Markdown
Contributor

Sorry @serhiy-storchaka, I had trouble checking out the 3.9 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 39aa98346d5dd8ac591a7cafb467af21c53f1e5d 3.9

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 14, 2021
…e-escape" codec (pythonGH-28944)

They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.

(cherry picked from commit 39aa983)
@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Oct 14, 2021
@bedevere-bot
Copy link
Copy Markdown

GH-28952 is a backport of this pull request to the 3.10 branch.

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 14, 2021
…-unicode-escape" codec (pythonGH-28944)

They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
(cherry picked from commit 39aa983)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-bot
Copy link
Copy Markdown

GH-28953 is a backport of this pull request to the 3.9 branch.

@bedevere-bot
Copy link
Copy Markdown

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot ARM64 macOS 3.x has failed when building commit 39aa983.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/725/builds/157) and take a look at the build logs.
  4. Check if the failure is related to this commit (39aa983) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/725/builds/157

Failed tests:

  • test_importlib

Failed subtests:

  • test_multiprocessing_pool_circular_import - test.test_importlib.test_threaded_import.ThreadedImportTests

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

408 tests OK.

10 slowest tests:

  • test_concurrent_futures: 3 min 33 sec
  • test_multiprocessing_spawn: 2 min 32 sec
  • test_multiprocessing_forkserver: 1 min 48 sec
  • test_unparse: 1 min 31 sec
  • test_asyncio: 1 min 20 sec
  • test_tokenize: 1 min 15 sec
  • test_logging: 1 min 6 sec
  • test_lib2to3: 58.8 sec
  • test_capi: 51.0 sec
  • test_io: 46.2 sec

1 test failed:
test_importlib

16 tests skipped:
test_devpoll test_epoll test_gdb test_ioctl test_msilib
test_multiprocessing_fork test_ossaudiodev test_spwd
test_startfile test_tix test_tk test_ttk_guionly test_winconsoleio
test_winreg test_winsound test_zipfile64

1 re-run test:
test_importlib

Total duration: 10 min 30 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/test/test_importlib/test_threaded_import.py", line 258, in test_multiprocessing_pool_circular_import
    script_helper.assert_python_ok(fn)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/test/support/script_helper.py", line 160, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/test/support/script_helper.py", line 145, in _assert_python
    res.fail(cmd_line)
    ^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/test/support/script_helper.py", line 72, in fail
    raise AssertionError("Process return code is %d\n"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Process return code is 1
command line: ['/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/python.exe', '-X', 'faulthandler', '-I', '/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/test/test_importlib/partial/pool_in_threads.py']


Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/resource_tracker.py", line 209, in main
    cache[rtype].remove(name)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/psm_424153d9'


Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/resource_tracker.py", line 209, in main
    cache[rtype].remove(name)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/psm_6840eece'


Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/test/test_importlib/partial/pool_in_threads.py", line 9, in t
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/context.py", line 119, in Pool
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/pool.py", line 196, in __init__
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/context.py", line 113, in SimpleQueue
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/queues.py", line 342, in __init__
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/context.py", line 68, in Lock
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/synchronize.py", line 162, in __init__
  File "/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/synchronize.py", line 57, in __init__
OSError: [Errno 24] Too many open files
/Users/buildbot/buildarea/3.x.pablogsal-macos-m1.macos-with-brew/build/Lib/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 116 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
---

serhiy-storchaka added a commit that referenced this pull request Oct 14, 2021
…-unicode-escape" codec (GH-28944) (GH-28952)

They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
(cherry picked from commit 39aa983)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this pull request Oct 14, 2021
…e-escape" codec (GH-28944) (GH-28953)

They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.

(cherry picked from commit 39aa983)
@vstinner
Copy link
Copy Markdown
Member

Thanks for this interesting fix! I like incremental encoders and decoders ;-) Sometimes, writing a correct implementation is challenging, but when it works, it's very convenient!

@serhiy-storchaka
Copy link
Copy Markdown
Member Author

What is left it is fixing the rest of broken incremental decoders (idna, punycode, and binary codecs like hex_codec, base64_codec, etc). A lot of work.

@serhiy-storchaka serhiy-storchaka removed their assignment Jun 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants