@@ -10,41 +10,101 @@ This module implements regular expression operations. Regular expression
1010syntax supported is a subset of the CPython ``re`` module (and actually is
1111a subset of POSIX extended regular expressions).
1212
13- Supported operators are:
13+ Supported operators and special sequences are:
1414
15- ``'.'``
15+ ``.``
1616 Match any character.
1717
18- ``'[...]'``
18+ ``[...]``
1919 Match set of characters. Individual characters and ranges are supported,
2020 including negated sets (e.g. ``[^a-c]``).
2121
22- ``'^'``
22+ ``^``
23+ Match the start of the string.
2324
24- ``'$'``
25+ ``$``
26+ Match the end of the string.
2527
26- ``'?'``
28+ ``?``
29+ Match zero or one occurrence of the previous sub-pattern.
2730
28- ``'*'``
31+ ``*``
32+ Match zero or more occurrences of the previous sub-pattern.
2933
30- ``'+'``
34+ ``+``
35+ Match one or more occurrences of the previous sub-pattern.
3136
32- ``'??'``
37+ ``??``
38+ Non-greedy version of ``?``, match zero or one, with the preference
39+ for zero.
3340
34- ``'*?'``
41+ ``*?``
42+ Non-greedy version of ``*``, match zero or more, with the preference
43+ for the shortest match.
3544
36- ``'+?'``
45+ ``+?``
46+ Non-greedy version of ``+``, match one or more, with the preference
47+ for the shortest match.
3748
38- ``'|'``
49+ ``|``
50+ Match either the left-hand side or the right-hand side sub-pattern of
51+ this operator.
3952
40- ``'(...)'``
53+ ``(...)``
4154 Grouping. Each group is capturing (a substring it captures can be accessed
4255 with the `match.group()` method).
4356
44- **NOT SUPPORTED**: Counted repetitions (``{m,n}``), more advanced assertions
45- (``\b``, ``\B``), named groups (``(?P<name>...)``), non-capturing groups
46- (``(?:...)``), etc.
57+ ``\d``
58+ Matches a digit. Equivalent to ``[0-9]``.
4759
60+ ``\D``
61+ Matches a non-digit. Equivalent to ``[^0-9]``.
62+
63+ ``\s``
64+ Matches whitespace. Equivalent to ``[ \t-\r]``.
65+
66+ ``\S``
67+ Matches non-whitespace. Equivalent to ``[^ \t-\r]``.
68+
69+ ``\w``
70+ Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.
71+
72+ ``\W``
73+ Matches non-"word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.
74+
75+ ``\``
76+ Escape character. Any other character following the backslash, except
77+ for those listed above, is taken literally. For example, ``\*`` is
78+ equivalent to literal ``*`` (not treated as the ``*`` operator).
79+ Note that ``\r``, ``\n``, etc. are not handled specially, and will be
80+ equivalent to literal letters ``r``, ``n``, etc. Due to this, it's
81+ not recommended to use raw Python strings (``r""``) for regular
82+ expressions. For example, ``r"\r\n"`` when used as the regular
83+ expression is equivalent to ``"rn"``. To match a CR character followed
84+ by LF, use ``"\r\n"``.
85+
86+ **NOT SUPPORTED**:
87+
88+ * counted repetitions (``{m,n}``)
89+ * named groups (``(?P<name>...)``)
90+ * non-capturing groups (``(?:...)``)
91+ * more advanced assertions (``\b``, ``\B``)
92+ * special character escapes like ``\r``, ``\n`` - use Python's own escaping
93+ instead
94+ * etc.
95+
96+ Example::
97+
98+ import ure
99+
100+ # As ure doesn't support escapes itself, use of r"" strings is not
101+ # recommended.
102+ regex = ure.compile("[\r\n]")
103+
104+ regex.split("line1\rline2\nline3\r\n")
105+
106+ # Result:
107+ # ['line1', 'line2', 'line3', '', '']
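
As a rough illustration of the greedy and non-greedy operators and the ``\d`` class described above, the following sketch may help. The sample strings and variable names are made up for illustration, and the fallback to CPython's ``re`` is only an assumption so that the snippet can also be run off-device.

```python
try:
    import ure as re  # MicroPython name used in this document
except ImportError:
    import re  # fallback (assumption): lets the sketch run where ure is absent

text = "<a><b>"

# Greedy: + takes as much as it can, so the group runs up to the last '>'.
greedy = re.match("<(.+)>", text).group(1)

# Non-greedy: +? prefers the shortest match, so the group stops at the first '>'.
lazy = re.match("<(.+?)>", text).group(1)

# The backslash is doubled so that the regex engine receives the two
# characters \d; no raw string is needed.
digits = re.search("(\\d+)", "temp=42C").group(1)

print(greedy, lazy, digits)
```

With these inputs the greedy group should capture ``a><b`` while the non-greedy one captures only ``a``.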
48108
49109Functions
50110---------
@@ -64,6 +124,22 @@ Functions
64124 string for first position which matches regex (which still may be
65125 0 if regex is anchored).
66126
127+ .. function:: sub(regex_str, replace, string, count=0, flags=0)
128+
129+ Compile *regex_str* and search for it in *string*, replacing all matches
130+ with *replace*, and returning the new string.
131+
132+ *replace* can be a string or a function. If it is a string then escape
133+ sequences of the form ``\<number>`` and ``\g<number>`` can be used to
134+ expand to the corresponding group (or an empty string for unmatched groups).
135+ If *replace* is a function then it must take a single argument (the match)
136+ and should return a replacement string.
137+
138+ If *count* is specified and non-zero then substitution will stop after
139+ this many substitutions are made. The *flags* argument is ignored.
140+
141+ Note: availability of this function depends on MicroPython port.
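
A minimal sketch of the two forms of *replace* described above, assuming a port where ``sub`` is available. The input strings and the helper name ``double`` are hypothetical, and the CPython fallback import is only there so the snippet can be tried on a desktop.

```python
try:
    import ure as re  # MicroPython name used in this document
except ImportError:
    import re  # fallback (assumption): lets the sketch run where ure is absent

# String replacement: \1 and \2 expand to the corresponding captured groups.
# Backslashes are doubled so they reach the regex engine unchanged.
swapped = re.sub("(\\w+)=(\\w+)", "\\2=\\1", "key=value")

# Function replacement: called with the match object, returns the new text.
def double(m):  # hypothetical helper, not part of the module
    return str(int(m.group(0)) * 2)

doubled = re.sub("\\d+", double, "3 apples, 10 pears")

print(swapped)
print(doubled)
```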
142+
67143.. data:: DEBUG
68144
69145 Flag value, display debug information about compiled expression.
@@ -79,8 +155,10 @@ Compiled regular expression. Instances of this class are created using
79155
80156.. method:: regex.match(string)
81157 regex.search(string)
158+ regex.sub(replace, string, count=0, flags=0)
82159
83- Similar to the module-level functions :meth:`match` and :meth:`search`.
160+ Similar to the module-level functions :meth:`match`, :meth:`search`
161+ and :meth:`sub`.
84162 Using methods is (much) more efficient if the same regex is applied to
85163 multiple strings.
86164
@@ -93,9 +171,31 @@ Compiled regular expression. Instances of this class are created using
93171Match objects
94172-------------
95173
96- Match objects as returned by `match()` and `search()` methods.
174+ Match objects as returned by `match()` and `search()` methods, and passed
175+ to the replacement function in `sub()`.
97176
98177.. method:: match.group([index])
99178
100179 Return matching (sub)string. *index* is 0 for entire match,
101180 1 and above for each capturing group. Only numeric groups are supported.
181+
182+ .. method:: match.groups()
183+
184+ Return a tuple containing all the substrings of the groups of the match.
185+
186+ Note: availability of this method depends on MicroPython port.
187+
188+ .. method:: match.start([index])
189+ match.end([index])
190+
191+ Return the index in the original string of the start or end of the
192+ substring group that was matched. *index* defaults to the entire
193+ match (group 0); otherwise it selects a group.
194+
195+ Note: availability of these methods depends on MicroPython port.
196+
197+ .. method:: match.span([index])
198+
199+ Return the 2-tuple ``(match.start(index), match.end(index))``.
200+
201+ Note: availability of this method depends on MicroPython port.
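
The match-object methods above can be sketched together as follows. This assumes a port that provides ``groups()``, ``start()``, ``end()`` and ``span()``, which, as noted, is not guaranteed. The sample string is invented for illustration, and the CPython fallback import is an assumption made so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython name used in this document
except ImportError:
    import re  # fallback (assumption): lets the sketch run where ure is absent

m = re.search("(\\d+)-(\\d+)", "range 10-25 mm")

whole = m.group(0)       # entire match
first = m.group(1)       # first capturing group
all_groups = m.groups()  # tuple of all capturing groups
whole_span = m.span()    # (start, end) of the entire match
second_start = m.start(2)  # start index of the second group
second_end = m.end(2)      # end index of the second group

print(whole, first, all_groups, whole_span, second_start, second_end)
```

Indices refer to positions in the original string, so slicing it with ``second_start`` and ``second_end`` should give the same text as ``m.group(2)``.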