gh-146333: Fix quadratic regex backtracking in configparser option parsing #146399

Merged
encukou merged 2 commits into python:main from joshuaswanson:fix/configparser-regex-backtracking
Apr 7, 2026

Conversation

@joshuaswanson
Contributor

@joshuaswanson joshuaswanson commented Mar 25, 2026

The _OPT_TMPL and _OPT_NV_TMPL regexes have quadratic backtracking when a line contains many spaces between non-delimiter characters. The lazy .*? in the option group and the \s* before the delimiter overlap on whitespace, so the engine tries every possible split point.
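
To see the overlap concretely, here is a minimal sketch using a simplified one-line stand-in for the option pattern (an assumption for illustration; the real _OPT_TMPL is a verbose-mode template with a {delim} placeholder). The input sizes are kept small so it finishes quickly even on an unpatched interpreter. With quadratic behavior, the elapsed time should grow roughly fourfold each time the number of spaces doubles.

```python
import re
import time

# Simplified stand-in (assumption, not the exact stdlib template):
# lazy option group, optional whitespace, delimiter, value.
OLD = re.compile(r"(?P<option>.*?)\s*(?P<vi>=|:)\s*(?P<value>.*)$")


def time_failed_match(n_spaces):
    """Match a delimiter-free line and return (match result, seconds)."""
    line = "x" + " " * n_spaces + "y"  # no "=" or ":", so the match must fail
    start = time.perf_counter()
    result = OLD.match(line)
    return result, time.perf_counter() - start


for n in (500, 1000, 2000):
    result, elapsed = time_failed_match(n)
    print(n, result, f"{elapsed:.4f}s")
```

Every position where the lazy group stops inside the run of spaces lets \s* consume the remaining spaces and then give them back one at a time, which is where the quadratic cost comes from.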

The fix removes \s* before the delimiter. This is safe because the option name is already stripped via .rstrip() in _handle_option (line 1160), and the value is stripped via .strip() (line 1169).

Before: x + 40000 spaces + y takes ~86 seconds
After: ~0.004 seconds

@joshuaswanson joshuaswanson requested a review from jaraco as a code owner March 25, 2026 00:09
@python-cla-bot

python-cla-bot bot commented Mar 25, 2026

All commit authors signed the Contributor License Agreement.

# Compiled regular expression for matching sections
SECTCRE = re.compile(_SECT_TMPL, re.VERBOSE)
# Compiled regular expression for matching options with typical separators
OPTCRE = re.compile(_OPT_TMPL.format(delim="=|:"), re.VERBOSE)
Member

> This is safe because the option name is already stripped via .rstrip() in _handle_option (line 1160), and the value is stripped via .strip() (line 1169).

The regexes are publicly exposed, so this change breaks them for people who use them directly.
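
The difference is visible when the pattern is used directly. A small sketch, again with simplified one-line stand-ins rather than the real templates (both patterns are assumptions for illustration): once \s* is removed before the delimiter, the option group keeps the trailing spaces that _handle_option would otherwise strip afterwards.

```python
import re

# Stand-in for the original shape: whitespace before the delimiter is
# consumed outside the option group.
OLD = re.compile(r"(?P<option>.*?)\s*(?P<vi>=|:)\s*(?P<value>.*)$")
# Stand-in for the first version of the fix: no \s* before the delimiter.
NO_WS = re.compile(r"(?P<option>.*?)(?P<vi>=|:)\s*(?P<value>.*)$")

line = "key   = value"
old_option = OLD.match(line).group("option")
new_option = NO_WS.match(line).group("option")

print(repr(old_option))  # 'key'
print(repr(new_option))  # 'key   '
```

Code that calls the parser is unaffected because of the later .rstrip(), but code that reads the option group straight from the match object would see the extra whitespace.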

@joshuaswanson joshuaswanson force-pushed the fix/configparser-regex-backtracking branch from fe7efda to 85407ee Compare March 25, 2026 12:06
@joshuaswanson
Contributor Author

Good point, thanks. Updated to keep the regexes unchanged. Instead, _handle_option now checks for delimiter presence before matching. If no delimiter is found, it skips the regex entirely and either treats the line as a valueless option (when allow_no_value=True) or reports a parsing error. All 341 existing tests pass.
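
The pre-check described here can be sketched roughly as follows. This is not the code from the PR: parse_option_line and DELIMITERS are hypothetical names, the pattern is a simplified stand-in, and ValueError stands in for the parser's own error reporting.

```python
import re

# Simplified stand-in for the unchanged option pattern (assumption).
OPTCRE = re.compile(r"(?P<option>.*?)\s*(?P<vi>=|:)\s*(?P<value>.*)$")
DELIMITERS = ("=", ":")  # hypothetical constant for this sketch


def parse_option_line(line, allow_no_value=False):
    """Return (option, value), skipping the regex when no delimiter exists."""
    if not any(delim in line for delim in DELIMITERS):
        if allow_no_value:
            # Valueless option: the whole line is the option name.
            return line.strip(), None
        raise ValueError(f"no delimiter in line: {line[:20]!r}")
    mo = OPTCRE.match(line)
    return mo.group("option"), mo.group("value")


print(parse_option_line("key = value"))            # ('key', 'value')
print(parse_option_line("flag", allow_no_value=True))  # ('flag', None)
```

The substring test is linear in the line length, so a delimiter-free line never reaches the backtracking-prone match.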

@encukou
Member

encukou commented Mar 25, 2026

Overriding OPTCRE is mentioned (though discouraged) in the docs. IMO, that needs to remain usable for specifying different delimiters. We shouldn't skip it.
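
For context, an override of this kind might look like the sketch below. ArrowParser and its "->" delimiter are hypothetical, and the example assumes the parser picks up the class-level OPTCRE when the default delimiters are in effect; the delimiters= constructor argument remains the supported way to change delimiters.

```python
import configparser
import re


class ArrowParser(configparser.RawConfigParser):
    """Hypothetical subclass that accepts 'name -> value' lines."""

    # Must expose the same named groups the parser reads: option, vi, value.
    # The option group excludes whitespace, "-" and ">" so it cannot overlap
    # with the \s* that follows it.
    OPTCRE = re.compile(r"(?P<option>[^\s>-]+)\s*(?P<vi>->)\s*(?P<value>.*)$")


parser = ArrowParser()
parser.read_string("[section]\nkey -> value\n")
print(parser.get("section", "key"))
```

A fix that bypasses the regex whenever the line lacks "=" or ":" would reject every line in this file, which is the concern raised above.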

Would it work to add negative lookahead, (?!{delim}), to <option>?


@joshuaswanson, please don't force-push to CPython PR branches -- it makes the changes a little harder to follow for reviewers, and every PR gets squashed anyway.

@joshuaswanson
Contributor Author

Won't force-push again, sorry about that.

The simple (?!{delim}) on .*? didn't eliminate the backtracking on its own because whitespace characters aren't delimiters, so .*? and \s* still overlapped on spaces. Took a bit more work to get it right.
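
A tiny check of that point, using one-character patterns written for this illustration (they are not taken from the PR): a lookahead-guarded "." still accepts a space, so the option group and \s* can still divide a run of spaces between them, whereas a guarded \S cannot.

```python
import re

# One character that does not start a delimiter.
GUARDED_ANY = re.compile(r"(?!=|:).")
# One non-space character that does not start a delimiter.
GUARDED_NONSPACE = re.compile(r"(?!=|:)\S")

print(bool(GUARDED_ANY.fullmatch("=")))       # False
print(bool(GUARDED_ANY.fullmatch(" ")))       # True
print(bool(GUARDED_NONSPACE.fullmatch(" ")))  # False
```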

The fix restructures the option group to (?:(?!{delim})\S)*(?:\s+(?:(?!{delim})\S)+)* which matches words separated by whitespace, where each word is a run of non-delimiter, non-space characters. The option group can never end in whitespace, so there's no overlap with \s*. Captured groups are identical and all 341 existing tests pass.
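
The claim that the captured groups are identical can be spot-checked with a short sketch. The option group below is the one quoted in this comment; the surrounding tail, the sample lines, and the variable names are assumptions made for the illustration, written as one-line patterns rather than the verbose templates.

```python
import re

DELIM = "=|:"
TAIL = r"\s*(?P<vi>{delim})\s*(?P<value>.*)$"
OLD_OPTION = r"(?P<option>.*?)"
NEW_OPTION = r"(?P<option>(?:(?!{delim})\S)*(?:\s+(?:(?!{delim})\S)+)*)"

old = re.compile((OLD_OPTION + TAIL).format(delim=DELIM))
new = re.compile((NEW_OPTION + TAIL).format(delim=DELIM))

samples = [
    "key = value",
    "multi word key: v",
    "k=v=w",
    "  indented = 1",
    " = empty name",
]
for line in samples:
    # Both patterns should capture the same option, delimiter, and value.
    assert old.match(line).groupdict() == new.match(line).groupdict()

# The line from the report: no delimiter, so the match fails, but the
# restructured group gives up after a linear number of steps.
pathological = "x" + " " * 40000 + "y"
print(new.match(pathological))  # None
```

Only the new pattern is run on the 40000-space line, since the old one is the slow case the PR describes.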

@encukou
Member

encukou commented Apr 7, 2026

This looks good, thank you!

@encukou encukou merged commit 7e0a0be into python:main Apr 7, 2026
51 checks passed
@vstinner
Member

vstinner commented Apr 7, 2026

The issue was tagged with "security". Should we backport this change to all supported Python versions (3.10-3.14)?

@encukou encukou added the needs backport to 3.10 (only security fixes), needs backport to 3.11 (only security fixes), needs backport to 3.12 (only security fixes), needs backport to 3.13 (bugs and security fixes), and needs backport to 3.14 (bugs and security fixes) labels Apr 9, 2026
@miss-islington-app

Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@encukou
Member

encukou commented Apr 9, 2026

Yes.

@miss-islington-app

Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington-app

Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington-app

Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Thanks @joshuaswanson for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 7e0a0be4097f9d29d66fe23f5af86f18a34ed7dd 3.10

@miss-islington-app

Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 7e0a0be4097f9d29d66fe23f5af86f18a34ed7dd 3.11

@miss-islington-app

Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 7e0a0be4097f9d29d66fe23f5af86f18a34ed7dd 3.12

@miss-islington-app

Sorry, @joshuaswanson and @encukou, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 7e0a0be4097f9d29d66fe23f5af86f18a34ed7dd 3.13

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 9, 2026
…ion parsing (pythonGH-146399)

Use negative lookahead in option regex to prevent backtracking, and to avoid changing logic outside the regexes (since people could use the regex directly).
(cherry picked from commit 7e0a0be)

Co-authored-by: Joshua Swanson <22283299+joshuaswanson@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Apr 9, 2026

GH-148287 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Apr 9, 2026
parser = configparser.RawConfigParser()
content = "[section]\n" + "x" + " " * 40000 + "y" + "\n"
# This should complete almost instantly. Before the fix,
# it would take over a minute due to catastrophic backtracking.
Member

@StanFromIreland StanFromIreland Apr 9, 2026

On my system it takes ~8 seconds. I suggest using a larger number of whitespace characters in the backports, as 8 seconds is short enough that a regression could be missed.
