book.md: 277 additions & 6 deletions
@@ -6190,7 +6190,278 @@ The `plural` version of each name is made by adding `s` except for `penny`, so l
Finally lines 39-43 are left to formatting the report to the user, being sure to provide feedback that includes the original `value` ("If you give me ...") and an enumerated list of all the possible ways we could make change. The test suite does not bother to check the order in which you return the combinations, only that the correct number are present and they are in the correct format.
\newpage

-# Chapter 36: Markov Chain
+# Chapter 36: Runny Babbit

Are you familiar with Spoonerisms, where the initial consonant sounds of two words are switched? According to Wikipedia, they get their name from William Archibald Spooner, who did this often. The author Shel Silverstein wrote a wonderful book called _Runny Babbit_ ("bunny rabbit") based on this. So, let's write a Python program called `runny_babbit.py` that will read some text or an input file given as a single positional argument and find neighboring words with initial consonant sounds to swap. Because we'll need to look at pairs of words in such a way that it will be difficult to remember the original formatting of the text, let's also take a `-w|--width` option (default `70`) to format the output text to a maximum width.

As usual, the program should show usage with no arguments or for `-h|--help`:

````
$ ./runny_babbit.py
usage: runny_babbit.py [-h] [-w int] str
runny_babbit.py: error: the following arguments are required: str
$ ./runny_babbit.py -h
usage: runny_babbit.py [-h] [-w int] str

Introduce Spoonerisms

positional arguments:
  str                  Input text or file

optional arguments:
  -h, --help           show this help message and exit
  -w int, --width int  Output text width (default: 70)
````

It should handle text from the command line:

````
$ ./runny_babbit.py 'the bunny rabbit'
the runny babbit
````

Or a named file:

````
$ cat input1.txt
The bunny rabbit is cute.
$ ./runny_babbit.py input1.txt
The runny babbit is cute.
````

We'll use a set of "stop" words to prevent the switching of sounds when one of the words is in the following list:

before behind between beyond but by concerning despite down
during following for from into like near plus since that the
through throughout to towards which with within without

Hints:

* You'll need to consider all the words in the input as pairs of neighbors, like the positions `[(0, 1), (1, 2)]` and so on up to `n` (the number of words). How can you create such a list where, instead of `0` and `1`, you have the actual words, e.g., `[('The', 'bunny'), ('bunny', 'rabbit')]`?
* There are several exercises where we try to break words into initial consonant sounds and whatever else follows. Can you reuse code from elsewhere? I'd recommend using regular expressions!
* Be sure you don't use a word more than once in a swap. E.g., in the phrase "the brown, wooden box", we'd skip "the" and consider the other two pairs of words `('brown', 'wooden')` and `('wooden', 'box')`. If we swap the first pair to make `('wown', 'brooden')`, we would not want to consider the next pair because 'wooden' has already been used.
* Use the `textwrap` module to handle the formatting of the output text to a maximum `--width`.

For this exercise, I thought I might move the logic to read an optionally named input *file* into the `get_args` function so that, by the time I call `args = get_args()`, `args.text` really is just whatever "text" I need to consider, regardless of whether the source was the command line or a file. If I'm using `input1.txt`, then I essentially have this:

````
>>> text = open('input1.txt').read()
>>> text
'The bunny rabbit is cute.\n'
````

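
One way `get_args` might fold the file handling in is sketched below. This is not necessarily the book's actual solution: it assumes the positional argument is stored as `text` and that any argument naming an existing file should be read. The `argv` parameter is a hypothetical addition so the function can be tried without the real command line.

```python
import argparse
import os


def get_args(argv=None):
    """Parse arguments; replace a file name with that file's contents."""
    parser = argparse.ArgumentParser(
        description='Introduce Spoonerisms',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('text', metavar='str', help='Input text or file')

    parser.add_argument('-w', '--width', metavar='int', type=int,
                        default=70, help='Output text width')

    # argv=None makes argparse fall back to sys.argv[1:]
    args = parser.parse_args(argv)

    # If the argument names a file, use the file's contents as the text
    if os.path.isfile(args.text):
        with open(args.text) as fh:
            args.text = fh.read()

    return args
```

With this arrangement, the rest of the program only ever looks at `args.text` and `args.width` and never needs to know where the text came from.
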
I need all the pairs of words, so that means I first need all the "words", which I'll get by naively using `str.split` (that is, I won't worry about punctuation and such):

````
>>> words = text.split()
>>> words
['The', 'bunny', 'rabbit', 'is', 'cute.']
````

Now I need all *pairs* of words which I can get by going from the zeroth word to the second to last word:
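
A minimal sketch of that step, using the sample text from `input1.txt`. The list comprehension is one possible way to write it, not necessarily the one used in the solution:

```python
# Sample input, matching the contents of input1.txt shown above
text = 'The bunny rabbit is cute.\n'
words = text.split()

# Pair each word with its right-hand neighbor. The index stops at the
# second-to-last word because the last word has no neighbor.
pairs = [(words[i], words[i + 1]) for i in range(len(words) - 1)]

print(pairs)
# [('The', 'bunny'), ('bunny', 'rabbit'), ('rabbit', 'is'), ('is', 'cute.')]
```

The expression `list(zip(words, words[1:]))` builds the same list without explicit indexes.
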
I need to find all the pairs where both words start with some consonant sounds and where neither of them is in my stop list, which I'll create like so:

````
>>> stop = set('before behind between beyond but by concerning '
...            'despite down during following for from into like near '
...            'plus since that the through throughout to towards '
...            'which with within without'.split())
````

Note the trailing space inside each piece: adjacent string literals are concatenated directly, so without it `concerning` and `despite` would fuse into a single word.

How will I find words that start with consonants? I can easily list all the vowels:

````
>>> vowels = 'aeiouAEIOU'
````

And then create the complement from `string.ascii_letters`:

````
>>> import string
>>> consonants = ''.join([c for c in string.ascii_letters if c not in vowels])
>>> consonants
'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
````

And then build a regular expression that looks for the start of a string `^` followed by a character class of all the `consonants` followed by the character class of `vowels` maybe followed by something else. I'll use parentheses `()` to capture both parts:
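
A sketch of such a pattern follows. The exact expression is an assumption: it takes "initial consonant sounds" to mean one or more leading consonants, which agrees with the `('wown', 'brooden')` example in the hints.

```python
import re
import string

vowels = 'aeiouAEIOU'
consonants = ''.join(c for c in string.ascii_letters if c not in vowels)

# Group 1: one or more consonants at the start of the string.
# Group 2: the first vowel and everything after it.
regex = re.compile('^([' + consonants + ']+)([' + vowels + '].*)')

print(regex.search('bunny').groups())  # ('b', 'unny')
print(regex.search('brown').groups())  # ('br', 'own')
print(regex.search('is'))              # None, because "is" starts with a vowel
```
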

Now I can iterate over the `pairs`. First I check if either of the words is in the `stop` set by using the `set.intersection` function. For the first pair `('The', 'bunny')` we see there is an intersection:
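
The check could look something like the sketch below. Lowercasing the words before comparing is an assumption, made because the stop list holds `the` while the sample input has `The`, and the expected output leaves `The` alone. The helper name `has_stop_word` is hypothetical.

```python
stop = set('before behind between beyond but by concerning '
           'despite down during following for from into like near '
           'plus since that the through throughout to towards '
           'which with within without'.split())


def has_stop_word(w1, w2):
    """Return True if either word, lowercased, is in the stop set."""
    return bool(stop.intersection({w1.lower(), w2.lower()}))


print(has_stop_word('The', 'bunny'))     # True
print(has_stop_word('bunny', 'rabbit'))  # False
```
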

The next check in my code is whether I've previously determined that I need to skip these words, so I have to know their positions in the original list. I decided to use `enumerate` as I iterate to get the number of each pair, which equals the position of the first word of each tuple in the original list of `words`.

Next I need to see if *both* words match my regular expression:

````
>>> m1 = regex.search(w1)
>>> m2 = regex.search(w2)
>>> m1
<re.Match object; span=(0, 5), match='bunny'>
>>> m2
<re.Match object; span=(0, 6), match='rabbit'>
````

They do! So I can use their `groups` to get the parts of each word to swap:

````
>>> m1.groups()
('b', 'unny')
>>> m2.groups()
('r', 'abbit')
>>> prefix1, suffix1 = m1.groups()
>>> prefix2, suffix2 = m2.groups()
````

This is the second pair, so `i` would be equal to `1` in the actual code. I can use this to mutate the `words` at positions `i` and `i + 1`:

````
>>> i = 1
>>> words[i] = prefix2 + suffix1
>>> words[i + 1] = prefix1 + suffix2
>>> words
['The', 'runny', 'babbit', 'is', 'cute.']
````

I need to be sure to add those positions to the `skip` set I created for the check that I discussed just above.
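
Putting the steps together, the whole loop might look like the following sketch. The function name `spoonerize`, the lowercasing of words before the stop check, and the `+` in the regular expression are assumptions for illustration, not the book's actual solution.

```python
import re
import string

stop = set('before behind between beyond but by concerning '
           'despite down during following for from into like near '
           'plus since that the through throughout to towards '
           'which with within without'.split())

vowels = 'aeiouAEIOU'
consonants = ''.join(c for c in string.ascii_letters if c not in vowels)
regex = re.compile('^([' + consonants + ']+)([' + vowels + '].*)')


def spoonerize(words):
    """Return a copy of words with neighboring consonant sounds swapped."""
    words = list(words)                  # leave the caller's list alone
    pairs = list(zip(words, words[1:]))  # neighbors, taken before any swap
    skip = set()                         # positions already used in a swap

    for i, (w1, w2) in enumerate(pairs):
        # Pair number i covers positions i and i + 1 of words
        if i in skip or i + 1 in skip:
            continue

        if stop.intersection({w1.lower(), w2.lower()}):
            continue

        m1, m2 = regex.search(w1), regex.search(w2)
        if m1 and m2:
            prefix1, suffix1 = m1.groups()
            prefix2, suffix2 = m2.groups()
            words[i] = prefix2 + suffix1
            words[i + 1] = prefix1 + suffix2
            skip.update([i, i + 1])

    return words


print(spoonerize('The bunny rabbit is cute.'.split()))
# ['The', 'runny', 'babbit', 'is', 'cute.']
```

Because `rabbit` lands in `skip` after the first swap, the pair `('rabbit', 'is')` is never considered, which is the behavior the hints ask for.
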

Finally we need to `print` the `words` back out, joining them on a blank and using `textwrap.wrap` with the `--width` argument to make it pretty:
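
A short sketch of that last step. The `width` variable stands in for the value of the `--width` argument:

```python
import textwrap

words = ['The', 'runny', 'babbit', 'is', 'cute.']
width = 70  # stand-in for the --width argument

# textwrap.wrap returns a list of lines, so join them on newlines to print
print('\n'.join(textwrap.wrap(' '.join(words), width=width)))
# The runny babbit is cute.

# A narrow width shows the wrapping at work
narrow = textwrap.wrap(' '.join(words), width=10)
print(narrow)
# ['The runny', 'babbit is', 'cute.']
```
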
Write a Python program called `markov.py` that takes one or more text files as positional arguments for training. Use the `-n|--num_words` argument (default `2`) to find clusters of words and the words that follow them, e.g., in "The Bustle" by Emily Dickinson:
@@ -6451,7 +6722,7 @@ But there will be spaces in between each word, so I account for them by adding o
At this point, the `words` list needs to be turned into text. It would be ugly to just `print` out one long string, so I use the `textwrap.wrap` to break up the long string into lines that are no longer than the given `text_width`. That function returns a list of lines that need to be joined on newlines to print.
\newpage

-# Chapter 37: Hamming Chain
+# Chapter 38: Hamming Chain

Write a Python program called `chain.py` that takes a `-s|--start` word and searches a `-w|--wordlist` argument (default `/usr/local/share/dict`) for words no more than `-d|--max_distance` Hamming distance for some number of `-i|--iteration` (default `20`). Be sure to accept a `-S|--seed` for `random.seed`.

@@ -6654,7 +6925,7 @@ Failed to find more words!

\newpage

-# Chapter 38: Morse Encoder/Decoder
+# Chapter 39: Morse Encoder/Decoder

Write a Python program called `morse.py` that will encrypt/decrypt text to/from Morse code. The program should expect a single positional argument which is either the name of a file to read for the input or the character `-` to indicate reading from STDIN. The program should also take a `-c|--coding` option to indicate use of the `itu` or standard `morse` tables, `-o|--outfile` for writing the output (default STDOUT), and a `-d|--decode` flag to indicate that the action is to decode the input (the default is to encode it).

@@ -6872,7 +7143,7 @@ THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.

\newpage

-# Chapter 39: ROT13 (Rotate 13)
+# Chapter 40: ROT13 (Rotate 13)

Write a Python program called `rot13.py` that will encrypt/decrypt input text by shifting the text by a given `-s|--shift` argument or will move each character halfway through the alphabet, e.g., "a" becomes "n," "b" becomes "o," etc. The text to rotate should be provided as a single positional argument to your program and can either be a text file, text on the command line, or `-` to indicate STDIN so that you can round-trip data through your program to ensure you are encrypting and decrypting properly.

@@ -7037,7 +7308,7 @@ The quick brown fox jumps over the lazy dog.

\newpage

-# Chapter 40: Transpose ABC Notation
+# Chapter 41: Transpose ABC Notation

Write a Python program called `transpose.py` that will read a file in ABC notation (https://en.wikipedia.org/wiki/ABC_notation) and transpose the melody line up or down by a given `-s|--shift` argument. Like the `rot13` exercise, it might be helpful to think of the space of notes (`ABCDEFG`) as a list which you can roll through. For instance, if you have the note `c` and want to transpose up a (minor) third (`-s 3`), you would make the new note `e`; similarly if you have the note `F` and you go up a (major) third, you get `A`. You will not need to worry about the actual number of semitones that you are being asked to shift, as the previous example showed that we might be shifting by a major/minor/augmented/diminished/pure interval. The purpose of the exercise is simply to practice with lists.

@@ -7233,7 +7504,7 @@ aba agE | g2g gab | cba agE |1 gED DEg :|2 gED DBG |]

\newpage

-# Chapter 41: Word Search
+# Chapter 42: Word Search

Write a Python program called `search.py` that takes a file name as the single positional argument and finds the words hidden in the puzzle grid.