gh-113304: Add pos/endpos parameters to re module functions #113306
Open
adamsilkey wants to merge 11 commits into python:main
The current docs for Pattern.search/match/fullmatch/finditer/findall
imply that `pos`/`endpos` are positional-only arguments. In fact,
they also accept keyword arguments, as seen:
>>> import re
>>> pattern = re.compile('abc')
>>> pattern.search('012abc678', pos=3)
<re.Match object; span=(3, 6), match='abc'>
>>> pattern.search('012abc678', endpos=6)
<re.Match object; span=(3, 6), match='abc'>
>>> pattern.search('012abc678', pos=3, endpos=6)
<re.Match object; span=(3, 6), match='abc'>
The interactive help also shows this:
>>> help(pattern.search)
Help on built-in function search:
search(string, pos=0, endpos=9223372036854775807) method of re.Pattern
instance
Scan through string looking for a match, and return a corresponding
match object instance.
Return None if no position in the string matches.
(END)
This commit updates the signatures of the affected methods in the docs
to reflect this behavior.
Add a special characters section to the docs so that it can be found via the table of contents, which improves discoverability.
This commit adds the `pos` and `endpos` parameters to the following
top-level `re` module functions:
- `re.match()`
- `re.fullmatch()`
- `re.search()`
- `re.findall()`
- `re.finditer()`
Prior to this commit, the `pos` and `endpos` parameters were only
available to users by first compiling a pattern using `re.compile`.
Adding these optional arguments standardizes the behavior between
the two and prevents users from being forced to compile if they wish
to use the `pos`/`endpos` arguments.
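One reason these parameters matter is that, per the `re` documentation, `pos` is not completely equivalent to slicing the string. The short sketch below illustrates the difference using the compiled-pattern API that already exists; the sample text and variable names are illustrative only.

```python
import re

text = '012abc678'
anchored = re.compile(r'^abc')

# Slicing creates a new string, so '^' matches at the start of the slice
# and the reported span is relative to the slice, not the original text.
sliced = anchored.search(text[3:])
print(sliced.span())  # (0, 3)

# With pos, '^' still refers to the real start of the string,
# so the anchored pattern does not match at index 3.
print(anchored.search(text, 3))  # None

# An unanchored pattern with pos reports spans in the original text.
unanchored = re.compile(r'abc')
print(unanchored.search(text, 3).span())  # (3, 6)
```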
Rationale:
There are a number of methods in the Python Regex Pattern class
that support optional positional arguments (pos/endpos):
- `Pattern.match(string[, pos[, endpos]])`
- `Pattern.fullmatch(string[, pos[, endpos]])`
- `Pattern.search(string[, pos[, endpos]])`
- `Pattern.findall(string[, pos[, endpos]])`
- `Pattern.finditer(string[, pos[, endpos]])`
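As a quick illustration of the five methods listed above, the following sketch calls each one with `pos` and/or `endpos` on a compiled pattern. The sample pattern and text are made up for the example.

```python
import re

pattern = re.compile(r'\d+')
text = 'a1 b22 c333 d4444'

# match: anchors the match attempt at index 1 instead of index 0.
print(pattern.match(text, 1).span())          # (1, 2)

# fullmatch: the whole region [4, 6) must match.
print(pattern.fullmatch(text, 4, 6).group())  # 22

# search: scanning starts at index 7.
print(pattern.search(text, 7).span())         # (8, 11)

# findall / finditer: only the region [3, 11) is scanned.
print(pattern.findall(text, 3, 11))           # ['22', '333']
spans = [m.span() for m in pattern.finditer(text, 3, 11)]
print(spans)                                  # [(4, 6), (8, 11)]
```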
Additionally, Python provides access to these pattern methods as
top-level convenience functions in the module itself:
- `re.search()`
- `re.match()`
- `re.fullmatch()`
- `re.findall()`
- `re.finditer()`
However, these top-level convenience functions do not support the
optional arguments. If anyone wants to utilize the optional arguments,
they must first compile a pattern with `re.compile()` and then call
the method with the optional arguments.
But all the top-level convenience functions do is compile the pattern,
and then execute the pattern, as seen in the commit diff.
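The compile-then-call pattern described here can be sketched roughly as follows. This is not the code from the PR: `search_bounded` and `findall_bounded` are hypothetical helper names, and the signatures are an assumption about what a forwarding wrapper could look like, built only on the public `re.compile` API.

```python
import re
import sys

# Hypothetical helpers (names and signatures are assumptions, not the PR's
# code): compile the pattern, then forward pos/endpos to the Pattern method.
def search_bounded(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    return re.compile(pattern, flags).search(string, pos, endpos)

def findall_bounded(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    return re.compile(pattern, flags).findall(string, pos, endpos)

print(search_bounded('abc', '012abc678', pos=3, endpos=6).span())  # (3, 6)
print(search_bounded('abc', '012abc678', pos=4))                   # None
print(findall_bounded(r'\d', '012abc678', pos=1, endpos=8))        # ['1', '2', '6', '7']
```

Because the defaults mirror the Pattern methods' own defaults, calling these helpers without `pos`/`endpos` behaves like the existing top-level functions.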
Looking at the underlying C code for these methods, each method
initializes `pos` and `endpos` to `0` and `PY_SSIZE_T_MAX` respectively.
It only changes those values if the argument parser detects that `pos`
or `endpos` was passed.
Here is an example from the match function:
```c
static PyObject *
_sre_SRE_Pattern_match(PatternObject *self, PyTypeObject *cls, PyObject
*const *args, Py_ssize_t nargs, PyObject *kwnames)
{
(...)
Py_ssize_t pos = 0;
Py_ssize_t endpos = PY_SSIZE_T_MAX;
(...)
pos = ival;
(...)
endpos = ival;
(...)
return_value = _sre_SRE_Pattern_match_impl(self, cls, string, pos, endpos);
```
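From the Python side, these defaults can be checked with a small sketch. It relies on `sys.maxsize` being the largest value a `Py_ssize_t` can hold, which is how the `sys` documentation defines it; the sample text is illustrative.

```python
import re
import sys

pattern = re.compile('abc')
text = '012abc678'

# Omitting pos/endpos should behave like passing the C-level defaults.
default_call = pattern.search(text)
explicit_call = pattern.search(text, 0, sys.maxsize)
print(default_call.span(), explicit_call.span())  # (3, 6) (3, 6)

# An endpos beyond the end of the string is treated as the string's length.
oversized = pattern.search(text, endpos=10**6)
print(oversized.span())  # (3, 6)
```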
- Add new header section describing the string indexing arguments
- Update function signatures to reflect changes
Summary
This commit adds the `pos` and `endpos` parameters to the following
top-level `re` module functions:
- `re.match()`
- `re.fullmatch()`
- `re.search()`
- `re.findall()`
- `re.finditer()`
Prior to this commit, the `pos` and `endpos` parameters were only
available to users by first compiling a pattern using `re.compile`.
Adding these optional arguments standardizes the behavior between
the two and prevents users from being forced to compile if they wish
to use the `pos`/`endpos` arguments.
Additionally, this commit updates the documentation for the
`pos`/`endpos` string indexing arguments.
Rationale
There are a number of methods in the Python Regex Pattern class
that support optional string indexing parameters (pos/endpos):
- `Pattern.match(string[, pos[, endpos]])`
- `Pattern.fullmatch(string[, pos[, endpos]])`
- `Pattern.search(string[, pos[, endpos]])`
- `Pattern.findall(string[, pos[, endpos]])`
- `Pattern.finditer(string[, pos[, endpos]])`
Additionally, Python provides access to these Pattern methods as
top-level convenience functions in the module itself:
- `re.match()`
- `re.fullmatch()`
- `re.search()`
- `re.findall()`
- `re.finditer()`
However, these top-level convenience functions do not support the
optional arguments. If anyone wants to utilize the optional parameters,
they must first compile a pattern with `re.compile()` and then call
the method with the optional arguments.
But all the top-level convenience functions do is compile the pattern,
and then execute the pattern, as seen in the commit diff.
Looking at the underlying C code for these methods, each method
initializes `pos` and `endpos` to `0` and `PY_SSIZE_T_MAX` respectively.
It only changes those values if the argument parser detects that `pos`
or `endpos` was passed. The excerpt from the match function in the
commit message above illustrates this.
This commit adds `pos=0` and `endpos=sys.maxsize` to match the
internal behavior of the underlying C code.
Additional Documentation Updates
Add Special Characters section to re docs
Add a special characters section to the docs so that it can be found
via the table of contents, which improves discoverability.
Update Pattern method signatures to reflect actual behavior
The current docs for Pattern.search/match/fullmatch/finditer/findall
imply that `pos`/`endpos` are positional-only arguments. In fact,
they also accept keyword arguments, as the interactive session and
`help()` output in the commit message above show.
📚 Documentation preview 📚: https://cpython-previews--113306.org.readthedocs.build/