bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec #28944
…e-escape" codec They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior.
vstinner
left a comment
LGTM, I just proposed a minor change.
@@ -6803,7 +6814,10 @@ PyUnicode_DecodeRawUnicodeEscape(const char *s,
        startinpos = s - starts - 2;
Could you initialize startinpos to 0? It is not technically needed, but it helps me follow the code for the if (s >= end) { case above.
Good catch. There was a bug, and it went unnoticed because of bugs in the tests, which also hid another bug.
Ha ha, ok, good that I helped you find bugs. I didn't notice them. The code only looked suspicious; I failed to follow the goto.
malemburg
left a comment
Looks good. Thanks, Serhiy.
|
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10. |
|
Sorry, @serhiy-storchaka, I could not cleanly backport this to |
|
Sorry @serhiy-storchaka, I had trouble checking out the |
…e-escape" codec (pythonGH-28944) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior. (cherry picked from commit 39aa983)
|
GH-28952 is a backport of this pull request to the 3.10 branch. |
…-unicode-escape" codec (pythonGH-28944) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior. (cherry picked from commit 39aa983) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
GH-28953 is a backport of this pull request to the 3.9 branch. |
|
…-unicode-escape" codec (GH-28944) (GH-28952) They support now splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior. (cherry picked from commit 39aa983) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
Thanks for this interesting fix! I like incremental encoders and decoders ;-) Sometimes, writing a correct implementation is challenging, but when it works, it's very convenient! |
|
What is left is fixing the rest of the broken incremental decoders (idna, punycode, and binary codecs like hex_codec, base64_codec, etc). A lot of work. |
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
https://bugs.python.org/issue45467