
Commit 96e9fb9

regular expression posting for the discussion forum
1 parent 701e8f4 commit 96e9fb9

2 files changed, 46 additions, 2 deletions

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
One topic we won't have time to dive deeply into is regular expressions.
This is a compact syntax for describing patterns to match in strings.
In Python, the "re" module provides support for regular expressions.
Here's an example:

```python
import re

strings = [r"<a>this is my string</a>",
           r"<b>this is a different string</b>"]

# this is the pattern that we will match -- it has 3 groups
re_test = r"<(\w*)>(.*)</(\w*)>"

for s in strings:
    a = re.search(re_test, s)
    if a is not None:
        # only accept the match if the opening and closing tags agree
        if a.group(1) == a.group(3):
            # we found a match
            print("string in '{}' tags is: {}".format(a.group(1), a.group(2)))
```

This will find XML-like tags, such as <tag>this is text in the tag</tag>,
and extract the text associated with each tag.
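
One possible refinement, offered as a sketch rather than as part of the original example: a backreference `\1` makes the regex itself require the closing tag to repeat the opening one, so the separate `group(1) == group(3)` check is no longer needed. `\w+` (one or more word characters) rejects an empty tag name, and the non-greedy `.*?` stops each match at the first matching closing tag, which lets `re.findall` pull several tagged spans out of one string. The helper name `extract_tagged_text` and the sample string are made up for illustration.

```python
import re

# \1 is a backreference: the closing tag must repeat the text captured by
# group 1. \w+ requires at least one word character in the tag name, and
# .*? is non-greedy, so each match ends at the first matching closing tag.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>")


def extract_tagged_text(s):
    """Hypothetical helper: return (tag, text) pairs for every <tag>...</tag> in s."""
    # findall returns one tuple per match, holding groups 1 and 2
    return TAG_RE.findall(s)


sample = "<a>first</a> and <b>second</b> but <c>mismatched</d>"
for tag, text in extract_tagged_text(sample):
    print("string in '{}' tags is: {}".format(tag, text))
```

For the sample string this prints the text of the `a` and `b` tags and skips `<c>...</d>`, whose tags disagree. Like the original, this is not a real XML parser: nested tags with the same name will not be handled correctly.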

If you remove the "*" after each "\w", the pattern will only match
single-character tags.
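
A quick way to see what the quantifier does is to run the single-character pattern, the `\w*` pattern from the example, and a `\w+` variant over the same inputs. The `\w+` variant, the helper `matched_tags`, and the test strings are assumptions added for illustration. Note that `*` means "zero or more", so `\w*` also accepts an empty tag name like `<>...</>`, while `+` means "one or more".

```python
import re

strings = [r"<a>this is my string</a>",
           r"<tag>multicharacter tag</tag>",
           r"<>empty tag name</>"]

# \w   : exactly one word character
# \w*  : zero or more word characters (an empty tag name is allowed)
# \w+  : one or more word characters
patterns = {
    "single": r"<(\w)>(.*)</(\w)>",
    "star": r"<(\w*)>(.*)</(\w*)>",
    "plus": r"<(\w+)>(.*)</(\w+)>",
}


def matched_tags(pattern, inputs):
    """Hypothetical helper: tag names (group 1) of inputs whose open and close tags match."""
    found = []
    for s in inputs:
        m = re.search(pattern, s)
        if m is not None and m.group(1) == m.group(3):
            found.append(m.group(1))
    return found


for name, pattern in patterns.items():
    print(name, matched_tags(pattern, strings))
```

With these inputs, the single-character pattern matches only the `<a>` string, the `\w*` version matches all three (including the empty tag name), and the `\w+` version matches `<a>` and `<tag>`.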

This webpage:

http://txt2re.com/

will help you design a regular expression for whatever type of
operation you want to do.
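
Whichever tool is used to design a pattern, Python's `re.VERBOSE` flag lets the finished regex be spread over several lines with a comment on each piece, which makes it easier to check. A minimal sketch, assuming the same three-group tag pattern as the example above (the name `TAG_PATTERN` is just illustrative):

```python
import re

# The same three-group tag pattern, written with re.VERBOSE so each piece
# can carry its own comment. Unescaped whitespace in the pattern is ignored.
TAG_PATTERN = re.compile(r"""
    <(\w*)>      # group 1: opening tag name
    (.*)         # group 2: the text between the tags (greedy)
    </(\w*)>     # group 3: closing tag name
""", re.VERBOSE)

m = TAG_PATTERN.search("<tag>multicharacter tag</tag>")
if m is not None:
    print(m.groups())
```

In verbose mode, unescaped whitespace in the pattern is ignored and `#` starts a comment, so a literal space or `#` has to be escaped or placed inside a character class.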

Games have even been devised around finding complex regexes:

http://xkcd.com/1313/

(Note that the hover text there contains a regular expression that
supposedly matches the names of all the winners of US presidential
elections, but none of the losers.) Does anyone want to try it?
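
The comic's game, "regex golf", is to find a short pattern that matches every word in one list and none in another. A small checker like the one below makes it easy to try candidates; the function `golf_score` and the word lists are made-up placeholders, not the comic's data. It uses `re.search`, so a pattern may match anywhere in a word unless it is anchored with `^` or `$`.

```python
import re


def golf_score(pattern, must_match, must_not_match):
    """Hypothetical checker for a regex-golf candidate.

    Returns (ok, misses, false_hits): ok is True only when the pattern
    matches every word in must_match and no word in must_not_match.
    """
    regex = re.compile(pattern)
    # words the pattern should match but does not
    misses = [w for w in must_match if not regex.search(w)]
    # words the pattern should reject but matches anyway
    false_hits = [w for w in must_not_match if regex.search(w)]
    return (not misses and not false_hits), misses, false_hits


# Placeholder word lists, for illustration only.
winners = ["apple", "apricot", "banana"]
losers = ["cherry", "grape"]

print(golf_score(r"^a|an", winners, losers))
```

Here `^a|an` matches "apple" and "apricot" at the start and "banana" through "an", while matching neither "cherry" nor "grape".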

examples/python-snippets/regex.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -1,10 +1,11 @@
 import re
 
 strings = [r"<a>this is my string</a>",
-           r"<b>this is a different string</b>"]
+           r"<b>this is a different string</b>",
+           r"<tag>multicharacter tag</tag>"]
 
 # this is the pattern that we will match -- it has 3 groups
-re_test = r"<(\w)>(.*)</(\w)>"
+re_test = r"<(\w*)>(.*)</(\w*)>"
 
 
 for s in strings:
```
