
gh-152100: Move re compiler optimizations to Lib/re/_optimizer.py #152154

Merged
serhiy-storchaka merged 1 commit into python:main from serhiy-storchaka:re-optimizer
Jun 25, 2026

Conversation

@serhiy-storchaka

Member

Move the compile-time optimizations out of _compiler.py into a new Lib/re/_optimizer.py: _optimize_charset, _compile_charset, _simple, _compile_info, the literal/charset prefix helpers, _combine_flags and the related constants. _compiler.py keeps only the bytecode emitter and imports them.

The dependency is now one-directional (_compiler -> _optimizer -> _constants/_sre/_parser). There is no behavior change: the compiled bytecode is identical and test_re passes unchanged, apart from one test that was repointed to the new home of _generate_overlap_table.

This is groundwork for follow-up compile-side optimizations under gh-152100 that would otherwise accumulate in _compiler.py.

🤖 Generated with Claude Code

Move the compile-time optimizations (_optimize_charset, _compile_charset,
_simple, _compile_info and the literal/charset prefix helpers) out of
_compiler.py into a new Lib/re/_optimizer.py.  _compiler.py keeps only the
bytecode emitter and imports them.  This is groundwork for a follow-up
optimization; there is no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) June 25, 2026 07:13
@serhiy-storchaka serhiy-storchaka merged commit a00464b into python:main Jun 25, 2026
94 of 96 checks passed

@eendebakpt eendebakpt left a comment

Contributor

I see the PR was just merged. Two minor comments anyway.

Comment thread Lib/re/_optimizer.py
from ._constants import *

_CHARSET_ALL = [(NEGATE, None)]
_UNIT_CODES = {LITERAL, NOT_LITERAL, ANY, IN, CATEGORY}

Contributor

Since we are moving quite a bit of code around anyway, should we change this to a frozenset?

Suggested change
- _UNIT_CODES = {LITERAL, NOT_LITERAL, ANY, IN, CATEGORY}
+ _UNIT_CODES = frozenset({LITERAL, NOT_LITERAL, ANY, IN, CATEGORY})

Member Author

It was a set before. frozenset does not have any advantage over set.
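
For readers following this thread, here is a minimal sketch of the trade-off under discussion. The opcode names are stand-in strings rather than the real constants from `._constants`, and `is_unit` is a hypothetical helper, not a function from the PR. Under those assumptions, membership tests behave the same for both types; the differences are that a frozenset is immutable and hashable.

```python
# Stand-in values (assumption); the real opcodes come from
# `from ._constants import *` in the module under review.
LITERAL, NOT_LITERAL, ANY, IN, CATEGORY = (
    "LITERAL", "NOT_LITERAL", "ANY", "IN", "CATEGORY")

unit_codes_set = {LITERAL, NOT_LITERAL, ANY, IN, CATEGORY}
unit_codes_frozen = frozenset(unit_codes_set)


def is_unit(op, codes):
    """Hypothetical helper: True if `op` is one of the unit opcodes."""
    return op in codes


# Lookups give the same answers for both container types.
same_lookups = all(
    is_unit(op, unit_codes_set) == is_unit(op, unit_codes_frozen)
    for op in (LITERAL, ANY, "BRANCH")
)

# A set and a frozenset with the same elements compare equal.
equal_contents = unit_codes_set == unit_codes_frozen

# Only the frozenset is hashable; hashing a plain set raises TypeError.
frozen_is_hashable = isinstance(hash(unit_codes_frozen), int)
try:
    hash(unit_codes_set)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
```

For a module-level constant that is only ever read, either type gives the same lookup results, so the choice mostly signals intent.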

Comment thread Lib/re/_optimizer.py

"""Internal support module for sre.

Optimization passes used by the compiler: character-set optimization

Contributor

_compile_charset is not listed here, although it is imported from this module.

Since lists like this tend to get outdated, I am also fine with leaving out this docstring.

Member Author

It will be updated in a follow-up anyway.
