bpo-29995: re.escape() now escapes only special characters.#1007
Conversation
|
@serhiy-storchaka, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @tiran and @ezio-melotti to be potential reviewers. |
bz2
left a comment
There was a problem hiding this comment.
Change looks good to me, as random third party that has written a function like this in the past.
Bunch of teeny inline comments, none of which should block merge.
The automated coverage report failure is bogus, seems to be percentage change from deleting code plus random noise.
| # SPECIAL_CHARS | ||
| # closing ')', '}' and ']' | ||
| # '-' (a range in character set) | ||
| # '#' (comment) and WHITESPACE (ignored) in verbose mode |
There was a problem hiding this comment.
For my own reference, re.VERBOSE does only look at ascii whitespace:
>>> re.compile("a \f\n\r\v\t\u3000", re.DEBUG|re.VERBOSE)
LITERAL 97
LITERAL 12288
re.compile('a \x0c\n\r\x0b\t\u3000', re.VERBOSE|re.DEBUG)
There was a problem hiding this comment.
SPECIAL_CHARS and WHITESPACE are constants in the sre_parse module. WHITESPACE contains only ascii whitespace.
| self.assertMatch(re.escape(p), p) | ||
| for c in '-.]{}': | ||
| self.assertEqual(re.escape(c)[:1], '\\') | ||
| literal_chars = (string.ascii_letters + string.digits + |
There was a problem hiding this comment.
This redefines literal_chars to the same as it was before. Either delete, or split the test, or hoist to a suite level constant?
There was a problem hiding this comment.
Oh, I copied this definition rather than move!
| Library | ||
| ------- | ||
|
|
||
| - bpo-29995: re.escape() now escapes only special characters. |
There was a problem hiding this comment.
To "now escapes only regex special characters" just for clarity?
| for i in b'-.]{}': | ||
| b = bytes([i]) | ||
| self.assertEqual(re.escape(b)[:1], b'\\') | ||
| literal_chars = ((string.ascii_letters + string.digits).encode() + |
There was a problem hiding this comment.
To literal_bytes = self.LITERAL_CHARS.encode() if hoisting above.
There was a problem hiding this comment.
Hmm, this looks a good idea.
| @@ -786,13 +786,17 @@ form. | |||
|
|
|||
| .. function:: escape(string) | |||
There was a problem hiding this comment.
To escape(pattern) to match implementation naming?
| .. function:: escape(string) | ||
|
|
||
| Escape all the characters in pattern except ASCII letters, numbers and ``'_'``. | ||
| Escape special characters in a string. |
There was a problem hiding this comment.
Not sure what the style on naming for functions that take either str or bytes is, but can see argument for "...characters in string or bytes pattern".
There was a problem hiding this comment.
I want to avoid the word "pattern". The argument of escape() is not a pattern, it is an arbitrary string, and escape() makes a pattern from it.
The name "string" is often used for text and byte strings in this module. It is the name of parameters in many functions and the name of the attribute in the match object.
There was a problem hiding this comment.
The normal way to write this would be "escape special characters in pattern", since pattern is the formal argument name in the source. That is, I agree with your previous comment that string should be replaced by pattern. The existing docs even use 'pattern' in the body of the text, so I'm not sure how we ended up with 'string' as the formal argument name in the docs.
There was a problem hiding this comment.
Opened bpo-30045 for renaming the parameter to "string".
There was a problem hiding this comment.
In #1048 the string parameter name was replaced by pattern and it is referred as *pattern* in the description.
|
Mentioning bpo-29995 to create a link to it. |
|
This change is likely implicated in pypa/setuptools#1015. |
|
@serhiy-storchaka Is the intention that tests like these that are asserting a certain result from |
|
I think that it would be better to not depend on exact result of Or test the pattern indirectly. For example, it should match Note that compiling the pattern |
No description provided.