Commit 1586ff7

committed
runny babbit
1 parent b777385 commit 1586ff7

10 files changed

+621
-6
lines changed

bin/chapters.txt

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ war
3333
anagram
3434
hangman
3535
first_bank_of_change
36+
runny_babbit
3637
markov_chain
3738
hamming_chain
3839
morse

book.md

Lines changed: 277 additions & 6 deletions
@@ -6190,7 +6190,278 @@ The `plural` version of each name is made by adding `s` except for `penny`, so l
61906190
Finally lines 39-43 are left to formatting the report to the user, being sure to provide feedback that includes the original `value` ("If you give me ...") and an enumerated list of all the possible ways we could make change. The test suite does not bother to check the order in which you return the combinations, only that the correct number are present and they are in the correct format.
61916191
\newpage
61926192

6193-
# Chapter 36: Markov Chain
6193+
# Chapter 36: Runny Babbit
6194+
6195+
Are you familiar with Spoonerisms, where the initial consonant sounds of two words are switched? According to Wikipedia, they get their name from William Archibald Spooner, who did this often. The author Shel Silverstein wrote a wonderful book called _Runny Babbit_ ("bunny rabbit") based on this. So, let's write a Python program called `runny_babbit.py` that will read some text or an input file given as a single positional argument and find neighboring words with initial consonant sounds to swap. Because we'll process the words in pairs, it would be difficult to preserve the original formatting of the text, so let's also take a `-w|--width` option (default `70`) to wrap the output text to a maximum width.
6196+
6197+
As usual, the program should show usage with no arguments or for `-h|--help`:
6198+
6199+
````
6200+
$ ./runny_babbit.py
6201+
usage: runny_babbit.py [-h] [-w int] str
6202+
runny_babbit.py: error: the following arguments are required: str
6203+
$ ./runny_babbit.py -h
6204+
usage: runny_babbit.py [-h] [-w int] str
6205+
6206+
Introduce Spoonerisms
6207+
6208+
positional arguments:
6209+
str Input text or file
6210+
6211+
optional arguments:
6212+
-h, --help show this help message and exit
6213+
-w int, --width int Output text width (default: 70)
6214+
````
6215+
6216+
It should handle text from the command line:
6217+
6218+
````
6219+
$ ./runny_babbit.py 'the bunny rabbit'
6220+
the runny babbit
6221+
````
6222+
6223+
Or a named file:
6224+
6225+
````
6226+
$ cat input1.txt
6227+
The bunny rabbit is cute.
6228+
$ ./runny_babbit.py input1.txt
6229+
The runny babbit is cute.
6230+
````
6231+
6232+
We'll use a set of "stop" words to prevent the switching of sounds when one of the words is in the following list:
6233+
6234+
before behind between beyond but by concerning despite down
6235+
during following for from into like near plus since that the
6236+
through throughout to towards which with within without
6237+
6238+
Hints:
6239+
6240+
* You'll need to consider all the words in the input as neighboring pairs of positions, like `[(0, 1), (1, 2), ...]`, up to the last word. How can you create such a list where instead of `0` and `1` you have the actual words, e.g., `[('The', 'bunny'), ('bunny', 'rabbit')]`?
6241+
* There are several exercises where we try to break words into initial consonant sounds and whatever else that follows. Can you reuse code from elsewhere? I'd recommend using regular expressions!
6242+
* Be sure you don't use a word more than once in a swap. E.g., in the phrase "the brown, wooden box", we'd skip "the" and consider the other two pairs of words `('brown', 'wooden')` and `('wooden', 'box')`. If we swap the first pair to make `('wown', 'brooden')`, we would not want to consider the next pair because 'wooden' has already been used.
6243+
* Use the `textwrap` module to handle the formatting of the output text to a maximum `--width`.
6244+
\newpage
6245+
6246+
## Solution
6247+
6248+
````
6249+
1 #!/usr/bin/env python3
6250+
2 """Spoonerisms"""
6251+
3
6252+
4 import argparse
6253+
5 import os
6254+
6 import re
6255+
7 import string
6256+
8 import textwrap
6257+
9
6258+
10
6259+
11 # --------------------------------------------------
6260+
12 def get_args():
6261+
13 """Get command-line arguments"""
6262+
14
6263+
15 parser = argparse.ArgumentParser(
6264+
16 description='Introduce Spoonerisms',
6265+
17 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
6266+
18
6267+
19 parser.add_argument('text',
6268+
20 metavar='str',
6269+
21 help='Input text or file')
6270+
22
6271+
23 parser.add_argument('-w',
6272+
24 '--width',
6273+
25 help='Output text width',
6274+
26 metavar='int',
6275+
27 type=int,
6276+
28 default=70)
6277+
29
6278+
30 args = parser.parse_args()
6279+
31
6280+
32 if os.path.isfile(args.text):
6281+
33 args.text = open(args.text).read()
6282+
34
6283+
35 return args
6284+
36
6285+
37
6286+
38 # --------------------------------------------------
6287+
39 def main():
6288+
40 """Make a jazz noise here"""
6289+
41
6290+
42 args = get_args()
6291+
43 text = args.text
6292+
44 words = text.split()
6293+
45 pairs = []
6294+
46
6295+
47 for k in range(len(words) - 1):
6296+
48 pairs.append((words[k], words[k+1]))
6297+
49
6298+
50 vowels = 'aeiouAEIOU'
6299+
51 consonants = ''.join([c for c in string.ascii_letters if c not in vowels])
6300+
52 regex = re.compile('^([' + consonants + ']+)([' + vowels + '].*)')
6301+
53 stop = set('before behind between beyond but by concerning '
6302+
54 'despite down during following for from into like near '
6303+
55 'plus since that the through throughout to towards '
6304+
56 'which with within without'.split())
6305+
57 skip = set()
6306+
58
6307+
59 for i, pair in enumerate(pairs):
6308+
60 w1, w2 = pair
6309+
61 if set([w1.lower(), w2.lower()]).intersection(stop):
6310+
62 continue
6311+
63
6312+
64 i1, i2 = i, i + 1
6313+
65 if i1 in skip or i2 in skip:
6314+
66 continue
6315+
67
6316+
68 m1 = regex.search(w1)
6317+
69 m2 = regex.search(w2)
6318+
70 if m1 and m2:
6319+
71 prefix1, suffix1 = m1.groups()
6320+
72 prefix2, suffix2 = m2.groups()
6321+
73 words[i1] = prefix2 + suffix1
6322+
74 words[i2] = prefix1 + suffix2
6323+
75 skip.add(i1)
6324+
76 skip.add(i2)
6325+
77
6326+
78 print('\n'.join(textwrap.wrap(' '.join(words), width=args.width)))
6327+
79
6328+
80 # --------------------------------------------------
6329+
81 if __name__ == '__main__':
6330+
82 main()
6331+
````
6332+
6333+
\newpage
6334+
6335+
## Discussion
6336+
6337+
![Also definitely not copyright infringement.](images/runny_babbit.png)
6338+
6339+
For this exercise, I moved the logic for reading an optionally named input *file* into the `get_args` function so that by the time I call `args = get_args()`, the `args.text` really is just whatever "text" I need to consider, regardless of whether the source was the command line or a file. If I'm using `input1.txt`, then I essentially have this:
6340+
6341+
````
6342+
>>> text = open('input1.txt').read()
6343+
>>> text
6344+
'The bunny rabbit is cute.\n'
6345+
````
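The text-or-file decision can also be isolated in a small helper, which makes it easy to try outside of `argparse`. This is only a sketch: the function name `read_text` is hypothetical and not part of the solution, and it uses a `with` block so the file handle is closed after reading.

```python
import os


def read_text(arg: str) -> str:
    """Return the contents of `arg` if it names a file, else `arg` itself."""
    if os.path.isfile(arg):
        # The context manager closes the file handle when the block ends
        with open(arg) as fh:
            return fh.read()
    return arg


# Assuming no file with this name exists, the string comes back unchanged
print(read_text('the bunny rabbit'))
```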
6346+
6347+
I need all the pairs of words, so that means I first need all the "words," which I'll get by naively using `str.split` (that is, I won't worry about punctuation and such):
6348+
6349+
````
6350+
>>> words = text.split()
6351+
>>> words
6352+
['The', 'bunny', 'rabbit', 'is', 'cute.']
6353+
````
6354+
6355+
Now I need all *pairs* of words which I can get by going from the zeroth word to the second to last word:
6356+
6357+
````
6358+
>>> pairs = []
6359+
>>> for k in range(len(words) - 1):
6360+
... pairs.append((words[k], words[k+1]))
6361+
...
6362+
>>> pairs
6363+
[('The', 'bunny'), ('bunny', 'rabbit'), ('rabbit', 'is'), ('is', 'cute.')]
6364+
````
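The same list of neighbors can be built without indexing by zipping the list with a copy of itself shifted by one. This is an alternative sketch, not the approach used in the solution:

```python
words = ['The', 'bunny', 'rabbit', 'is', 'cute.']

# zip stops at the shorter argument, so the last word is never
# paired with a nonexistent neighbor
pairs = list(zip(words, words[1:]))
print(pairs)
```

Both versions produce one fewer pair than there are words, and the position of each pair in the list equals the position of its first word in `words`.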
6365+
6366+
I need to find all the pairs where both words start with some consonant sounds and where neither of them is in my stop list, which I'll create like so:
6367+
6368+
````
6369+
>>> stop = set('before behind between beyond but by concerning '
6370+
... 'despite down during following for from into like near '
6371+
... 'plus since that the through throughout to towards '
6372+
... 'which with within without'.split())
6373+
````
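When a long string is broken across lines this way, Python joins the adjacent literals with nothing in between, so each continued piece needs a trailing space or the last word of one line fuses with the first word of the next. A short illustration with a shortened, made-up word list:

```python
# Without trailing spaces, neighboring literals fuse into one word
fused = ('by concerning'
         'despite down').split()
print(fused)    # ['by', 'concerningdespite', 'down']

# With a trailing space on each continued piece, the words stay separate
spaced = ('by concerning '
          'despite down').split()
print(spaced)   # ['by', 'concerning', 'despite', 'down']
```

A quick sanity check is to compare `len(stop)` with the number of words you expect in the list.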
6374+
6375+
How will I find words that start with consonants? I can easily list all the vowels:
6376+
6377+
````
6378+
>>> vowels = 'aeiouAEIOU'
6379+
````
6380+
6381+
And then create the complement from `string.ascii_letters`:
6382+
6383+
````
6384+
>>> import string
6385+
>>> consonants = ''.join([c for c in string.ascii_letters if c not in vowels])
6386+
>>> consonants
6387+
'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
6388+
````
6389+
6390+
And then build a regular expression that looks for the start of a string `^` followed by one or more characters from the `consonants` class, then a character from the `vowels` class, maybe followed by something else. I'll use parentheses `()` to capture both parts:
6391+
6392+
````
6393+
>>> import re
6394+
>>> regex = re.compile('^([' + consonants + ']+)([' + vowels + '].*)')
6395+
>>> regex.search('chair')
6396+
<re.Match object; span=(0, 5), match='chair'>
6397+
>>> regex.search('chair').groups()
6398+
('ch', 'air')
6399+
````
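It can help to see which words the pattern rejects. The sketch below rebuilds the same `regex` and tries a few sample words chosen for illustration; a word that starts with a vowel, or that has no vowel after its leading consonants, yields `None`:

```python
import re
import string

vowels = 'aeiouAEIOU'
consonants = ''.join(c for c in string.ascii_letters if c not in vowels)
regex = re.compile('^([' + consonants + ']+)([' + vowels + '].*)')

for word in ['chair', 'is', 'apple', 'cute.', 'by']:
    match = regex.search(word)
    # Print the (prefix, rest) groups, or None when there is no match
    print(word, match.groups() if match else None)
```

Note that `y` counts as a consonant here, so `by` has no vowel to anchor the second group and does not match, while trailing punctuation as in `cute.` simply rides along in the second group.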
6400+
6401+
Now I can iterate over the `pairs`. First I check whether either of the words is in the `stop` set by using the `set.intersection` function. For the first pair `('The', 'bunny')` we see there is an intersection:
6402+
6403+
````
6404+
>>> w1 = 'The'
6405+
>>> w2 = 'bunny'
6406+
>>> set([w1.lower(), w2.lower()]).intersection(stop)
6407+
{'the'}
6408+
````
6409+
6410+
For the next pair, there is not:
6411+
6412+
````
6413+
>>> w1 = 'bunny'
6414+
>>> w2 = 'rabbit'
6415+
>>> set([w1.lower(), w2.lower()]).intersection(stop)
6416+
set()
6417+
````
6418+
6419+
The next check in my code is whether I've previously determined that I need to skip these words, so I have to know their positions in the original list. I decided to use `enumerate` over the `pairs` to get the number of each pair, which equals the position of the first word of that tuple in the original list of `words`.
6420+
6421+
Next I need to see if *both* words match my regular expression:
6422+
6423+
````
6424+
>>> m1 = regex.search(w1)
6425+
>>> m2 = regex.search(w2)
6426+
>>> m1
6427+
<re.Match object; span=(0, 5), match='bunny'>
6428+
>>> m2
6429+
<re.Match object; span=(0, 6), match='rabbit'>
6430+
````
6431+
6432+
They do! So I can use their `groups` to get the parts of each word to swap:
6433+
6434+
````
6435+
>>> m1.groups()
6436+
('b', 'unny')
6437+
>>> m2.groups()
6438+
('r', 'abbit')
6439+
>>> prefix1, suffix1 = m1.groups()
6440+
>>> prefix2, suffix2 = m2.groups()
6441+
````
6442+
6443+
This is the 2nd pair, so `i` would be equal to `1` in the actual code. I can use this to mutate the `words` at positions `i` and `i + 1`:
6444+
6445+
````
6446+
>>> i = 1
6447+
>>> words[i] = prefix2 + suffix1
6448+
>>> words[i + 1] = prefix1 + suffix2
6449+
>>> words
6450+
['The', 'runny', 'babbit', 'is', 'cute.']
6451+
````
6452+
6453+
I need to be sure to add those positions to the `skip` set I created for the check that I discussed just above.
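To see the stop-word check, the `skip` set, and the swap working together, here is a condensed sketch of the loop as a standalone function. The name `spoonerize` and its signature are assumptions made for illustration; the solution does this work inline in `main`.

```python
import re
import string

VOWELS = 'aeiouAEIOU'
CONSONANTS = ''.join(c for c in string.ascii_letters if c not in VOWELS)
REGEX = re.compile('^([' + CONSONANTS + ']+)([' + VOWELS + '].*)')


def spoonerize(words, stop):
    """Swap leading consonants of neighboring words, using each word once."""
    words = list(words)  # work on a copy so the caller's list is untouched
    skip = set()         # positions of words that were already swapped

    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i + 1]

        # Leave the pair alone if either word is a stop word
        if w1.lower() in stop or w2.lower() in stop:
            continue

        # Leave the pair alone if either position was used in an earlier swap
        if i in skip or i + 1 in skip:
            continue

        m1, m2 = REGEX.search(w1), REGEX.search(w2)
        if m1 and m2:
            words[i] = m2.group(1) + m1.group(2)
            words[i + 1] = m1.group(1) + m2.group(2)
            skip.update([i, i + 1])

    return words


print(spoonerize('the brown, wooden box'.split(), stop={'the'}))
```

With the phrase from the hints, `brown,` and `wooden` are swapped, and the following pair is left alone because position 2 is already in `skip`.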
6454+
6455+
Finally we need to `print` the `words` back out, joining them on a blank and using `textwrap.wrap` with the `--width` argument to make it pretty:
6456+
6457+
````
6458+
>>> import textwrap
6459+
>>> print('\n'.join(textwrap.wrap(' '.join(words), width=70)))
6460+
The runny babbit is cute.
6461+
````
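The standard library also offers `textwrap.fill`, which is documented as shorthand for joining the result of `textwrap.wrap` on newlines. A small sketch, with the narrower width of `12` chosen only to force wrapping on this short sentence:

```python
import textwrap

words = ['The', 'runny', 'babbit', 'is', 'cute.']
text = ' '.join(words)

# fill() returns a single string with the newlines already inserted
print(textwrap.fill(text, width=70))

# A narrower width forces the text onto several lines
print(textwrap.fill(text, width=12))
```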
6462+
\newpage
6463+
6464+
# Chapter 37: Markov Chain
61946465

61956466
Write a Python program called `markov.py` that takes one or more text files as positional arguments for training. Use the `-n|--num_words` argument (default `2`) to find clusters of words and the words that follow them, e.g., in "The Bustle" by Emily Dickinson:
61966467

@@ -6451,7 +6722,7 @@ But there will be spaces in between each word, so I account for them by adding o
64516722
At this point, the `words` list needs to be turned into text. It would be ugly to just `print` out one long string, so I use the `textwrap.wrap` to break up the long string into lines that are no longer than the given `text_width`. That function returns a list of lines that need to be joined on newlines to print.
64526723
\newpage
64536724

6454-
# Chapter 37: Hamming Chain
6725+
# Chapter 38: Hamming Chain
64556726

64566727
Write a Python program called `chain.py` that takes a `-s|--start` word and searches a `-w|--wordlist` argument (default `/usr/local/share/dict`) for words no more than `-d|--max_distance` Hamming distance for some number of `-i|--iteration` (default `20`). Be sure to accept a `-S|--seed` for `random.seed`.
64576728

@@ -6654,7 +6925,7 @@ Failed to find more words!
66546925

66556926
\newpage
66566927

6657-
# Chapter 38: Morse Encoder/Decoder
6928+
# Chapter 39: Morse Encoder/Decoder
66586929

66596930
Write a Python program called `morse.py` that will encrypt/decrypt text to/from Morse code. The program should expect a single positional argument which is either the name of a file to read for the input or the character `-` to indicate reading from STDIN. The program should also take a `-c|--coding` option to indicate use of the `itu` or standard `morse` tables, `-o|--outfile` for writing the output (default STDOUT), and a `-d|--decode` flag to indicate that the action is to decode the input (the default is to encode it).
66606931

@@ -6872,7 +7143,7 @@ THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
68727143

68737144
\newpage
68747145

6875-
# Chapter 39: ROT13 (Rotate 13)
7146+
# Chapter 40: ROT13 (Rotate 13)
68767147

68777148
Write a Python program called `rot13.py` that will encrypt/decrypt input text by shifting the text by a given `-s|--shift` argument or will move each character halfway through the alphabet, e.g., "a" becomes "n," "b" becomes "o," etc. The text to rotate should be provided as a single positional argument to your program and can either be a text file, text on the command line, or `-` to indicate STDIN so that you can round-trip data through your program to ensure you are encrypting and decrypting properly.
68787149

@@ -7037,7 +7308,7 @@ The quick brown fox jumps over the lazy dog.
70377308

70387309
\newpage
70397310

7040-
# Chapter 40: Transpose ABC Notation
7311+
# Chapter 41: Transpose ABC Notation
70417312

70427313
Write a Python program called `transpose.py` that will read a file in ABC notation (https://en.wikipedia.org/wiki/ABC_notation) and transpose the melody line up or down by a given `-s|--shift` argument. Like the `rot13` exercise, it might be helpful to think of the space of notes (`ABCDEFG`) as a list which you can roll through. For instance, if you have the note `c` and want to transpose up a (minor) third (`-s 3`), you would make the new note `e`; similarly if you have the note `F` and you go up a (major) third, you get `A`. You will not need to worry about the actual number of semitones that you are being asked to shift, as the previous example showed that we might be shifting by a major/minor/augmented/diminished/pure interval. The purpose of the exercise is simply to practice with lists.
70437314

@@ -7233,7 +7504,7 @@ aba agE | g2g gab | cba agE |1 gED DEg :|2 gED DBG |]
72337504

72347505
\newpage
72357506

7236-
# Chapter 41: Word Search
7507+
# Chapter 42: Word Search
72377508

72387509
Write a Python program called `search.py` that takes a file name as the single positional argument and finds the words hidden in the puzzle grid.
72397510

images/runny_babbit.png

13.5 KB

playful_python.pdf

25.5 KB
Binary file not shown.

runny_babbit/Makefile

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
.PHONY: test
2+
3+
test:
4+
pytest -xv test.py

runny_babbit/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# Runny Babbit
2+
3+
Are you familiar with Spoonerisms, where the initial consonant sounds of two words are switched? According to Wikipedia, they get their name from William Archibald Spooner, who did this often. The author Shel Silverstein wrote a wonderful book called _Runny Babbit_ ("bunny rabbit") based on this. So, let's write a Python program called `runny_babbit.py` that will read some text or an input file given as a single positional argument and find neighboring words with initial consonant sounds to swap. Because we'll process the words in pairs, it would be difficult to preserve the original formatting of the text, so let's also take a `-w|--width` option (default `70`) to wrap the output text to a maximum width.
4+
5+
As usual, the program should show usage with no arguments or for `-h|--help`:
6+
7+
````
8+
$ ./runny_babbit.py
9+
usage: runny_babbit.py [-h] [-w int] str
10+
runny_babbit.py: error: the following arguments are required: str
11+
$ ./runny_babbit.py -h
12+
usage: runny_babbit.py [-h] [-w int] str
13+
14+
Introduce Spoonerisms
15+
16+
positional arguments:
17+
str Input text or file
18+
19+
optional arguments:
20+
-h, --help show this help message and exit
21+
-w int, --width int Output text width (default: 70)
22+
````
23+
24+
It should handle text from the command line:
25+
26+
````
27+
$ ./runny_babbit.py 'the bunny rabbit'
28+
the runny babbit
29+
````
30+
31+
Or a named file:
32+
33+
````
34+
$ cat input1.txt
35+
The bunny rabbit is cute.
36+
$ ./runny_babbit.py input1.txt
37+
The runny babbit is cute.
38+
````
39+
40+
We'll use a set of "stop" words to prevent the switching of sounds when one of the words is in the following list:
41+
42+
before behind between beyond but by concerning despite down
43+
during following for from into like near plus since that the
44+
through throughout to towards which with within without
45+
46+
Hints:
47+
48+
* You'll need to consider all the words in the input as neighboring pairs of positions, like `[(0, 1), (1, 2), ...]`, up to the last word. How can you create such a list where instead of `0` and `1` you have the actual words, e.g., `[('The', 'bunny'), ('bunny', 'rabbit')]`?
49+
* There are several exercises where we try to break words into initial consonant sounds and whatever else that follows. Can you reuse code from elsewhere? I'd recommend using regular expressions!
50+
* Be sure you don't use a word more than once in a swap. E.g., in the phrase "the brown, wooden box", we'd skip "the" and consider the other two pairs of words `('brown', 'wooden')` and `('wooden', 'box')`. If we swap the first pair to make `('wown', 'brooden')`, we would not want to consider the next pair because 'wooden' has already been used.
51+
* Use the `textwrap` module to handle the formatting of the output text to a maximum `--width`.
