gh-143927: Normalize all line endings (CR, CRLF, and LF) in configparser#143929
Conversation
| value) | ||
| if value is not None or not self._allow_no_value: | ||
| value = delimiter + str(value).replace('\n', '\n\t') | ||
| # Convert all possible line-endings into '\n\t' |
There was a problem hiding this comment.
Not sure the exact underlying security issue. As is, I think this will break round tripping on Windows; Text I/O used by the open expects to encode/decode newlines in particular formats on Windows by default (https://github.com/sethmlarson/cpython/blob/2a122fd420f5b425fd39848a48f5eb196cee2aa7/Lib/configparser.py#L753). See #143428 (comment) for a recent related case + more background links.
In the open calls can we use newline="" potentially?
There was a problem hiding this comment.
Byte-for-byte round-tripping is already kind of broken for newlines. This is a text file; relying on newline format (or e.g. trailing whitespace) is a bad idea to begin with.
In the open calls can we use
newline=""potentially?
Potentially, but it would change the behavior for all users on Windows.
And it wouldn't solve the issue, since users can pass in a file they opened themselves.
|
This PR is stale because it has been open for 30 days with no activity. |
| value) | ||
| if value is not None or not self._allow_no_value: | ||
| value = delimiter + str(value).replace('\n', '\n\t') | ||
| # Convert all possible line-endings into '\n\t' |
There was a problem hiding this comment.
Byte-for-byte round-tripping is already kind of broken for newlines. This is a text file; relying on newline format (or e.g. trailing whitespace) is a bad idea to begin with.
In the open calls can we use
newline=""potentially?
Potentially, but it would change the behavior for all users on Windows.
And it wouldn't solve the issue, since users can pass in a file they opened themselves.
| value = (delimiter + str(value).replace('\r\n', '\n') | ||
| .replace('\r', '\n').replace('\n', '\n\t')) |
There was a problem hiding this comment.
On my system, re is significantly faster than three replaces for reasonable-sized input.
| value = (delimiter + str(value).replace('\r\n', '\n') | |
| .replace('\r', '\n').replace('\n', '\n\t')) | |
| value = delimiter + re.sub('\r\n?|\n', '\n\t', str(value)) |
There was a problem hiding this comment.
Interesting, re is about four times slower than three replacess on my macOS:
❯ p --version --version
Python 3.15.0b2 (v3.15.0b2:94a64bbc6ce, Jun 2 2026, 11:57:29) [Clang 21.0.0 (clang-2100.1.1.101)]
❯ p -m timeit -s "v = 'lorem ipsum dolor sit amet '*5" "v.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\n\t')"
2000000 loops, best of 5: 189 nsec per loop
❯ p -m timeit -s "import re; v = 'lorem ipsum dolor sit amet '*5" "re.sub('\r\n?|\n', '\n\t', v)"
500000 loops, best of 5: 773 nsec per loop
❯ p -m timeit -s "v = ('lorem ipsum dolor sit amet '*5 + '\n')*5" "v.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\n\t')"
500000 loops, best of 5: 936 nsec per loop
❯ p -m timeit -s "import re; v = ('lorem ipsum dolor sit amet '*5 + '\n')*5" "re.sub('\r\n?|\n', '\n\t', v)"
100000 loops, best of 5: 3.43 usec per loop
❯ p -m timeit -s "v = 'a\r\nb\nc\rd ' * 200" "v.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\n\t')"
20000 loops, best of 5: 10.2 usec per loop
❯ p -m timeit -s "import re; v = 'a\r\nb\nc\rd ' * 200" "re.sub('\r\n?|\n', '\n\t', v)"
5000 loops, best of 5: 43.1 usec per loop
❯ p -m timeit -s "v = 'a\r\nb\nc\rd ' * 10000" "v.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\n\t')"
500 loops, best of 5: 516 usec per loop
❯ p -m timeit -s "import re; v = 'a\r\nb\nc\rd ' * 10000" "re.sub('\r\n?|\n', '\n\t', v)"
100 loops, best of 5: 2.07 msec per loop|
Thanks @sethmlarson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11, 3.12, 3.13, 3.14, 3.15. |
|
GH-152002 is a backport of this pull request to the 3.15 branch. |
|
GH-152003 is a backport of this pull request to the 3.14 branch. |
|
GH-152004 is a backport of this pull request to the 3.13 branch. |
|
GH-152005 is a backport of this pull request to the 3.12 branch. |
|
GH-152006 is a backport of this pull request to the 3.11 branch. |
|
GH-152007 is a backport of this pull request to the 3.10 branch. |
Uh oh!
There was an error while loading. Please reload this page.