gh-146333: Fix quadratic regex backtracking in configparser option parsing#146399
Conversation
12c55b1 to
fe7efda
Compare
| # Compiled regular expression for matching sections | ||
| SECTCRE = re.compile(_SECT_TMPL, re.VERBOSE) | ||
| # Compiled regular expression for matching options with typical separators | ||
| OPTCRE = re.compile(_OPT_TMPL.format(delim="=|:"), re.VERBOSE) |
There was a problem hiding this comment.
This is safe because the option name is already stripped via .rstrip() in _handle_option (line 1160), and the value is stripped via .strip() (line 1169).
The regexes are publicly exposed, this breaks it for people who use them directly.
fe7efda to
85407ee
Compare
|
Good point, thanks. Updated to keep the regexes unchanged. Instead, |
|
Overriding Would it work to add negative lookahead, @joshuaswanson, please don't force-push to CPython PR branches -- it makes the changes a little harder to follow for reviewers, and every PR gets squashed anyway. |
|
Won't force-push again, sorry about that. The simple The fix restructures the option group to |
|
This looks good, thank you! |
|
The issue was tagged with "security". Should we backport this change to all supported Python versions (3.10-3.14)? |
|
Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. |
|
Yes. |
|
Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11. |
|
Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12. |
|
Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
|
Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
|
Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to |
|
Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to |
|
Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to |
|
Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to |
…ion parsing (pythonGH-146399) Use negative lookahead in option regex to prevent backtracking, and to avoid changing logic outside the regexes (since people could use the regex directly). (cherry picked from commit 7e0a0be) Co-authored-by: Joshua Swanson <22283299+joshuaswanson@users.noreply.github.com>
|
GH-148287 is a backport of this pull request to the 3.14 branch. |
| parser = configparser.RawConfigParser() | ||
| content = "[section]\n" + "x" + " " * 40000 + "y" + "\n" | ||
| # This should complete almost instantly. Before the fix, | ||
| # it would take over a minute due to catastrophic backtracking. |
There was a problem hiding this comment.
On my system it takes ~8 seconds, I suggest increasing it in backports to a larger number of whitespace, as 8 seconds is short enough that it could be missed.
The
_OPT_TMPLand_OPT_NV_TMPLregexes have quadratic backtracking when a line contains many spaces between non-delimiter characters. The lazy.*?in the option group and the\s*before the delimiter overlap on whitespace, so the engine tries every possible split point.The fix removes
\s*before the delimiter. This is safe because the option name is already stripped via.rstrip()in_handle_option(line 1160), and the value is stripped via.strip()(line 1169).Before:
x+ 40000 spaces +ytakes ~86 secondsAfter: ~0.004 seconds
configparser.RawConfigParser.{OPTCRE,OPTCRE_NV}regexes vulnerable to quadratic backtracking #146333