File: exercises/Exercises.md
Lines changed: 1 addition & 1 deletion (+1 -1)
@@ -1,6 +1,6 @@
 # awk introduction
 
-> Exercise related files are available from [exercises folder of learn_gnuawk repo](https://github.com/learnbyexample/learn_gnuawk/tree/master/exercises).
+> Exercise related files are available from [exercises folder of learn_gnuawk repo](https://github.com/learnbyexample/learn_gnuawk/tree/master/exercises). For solutions, see [Exercise_solutions.md](https://github.com/learnbyexample/learn_gnuawk/blob/master/exercises/Exercise_solutions.md).
 
 **a)** For the input file `addr.txt`, display all lines containing `is`.
File: gnu_awk.md
Lines changed: 42 additions & 27 deletions (+42 -27)
@@ -66,7 +66,7 @@ Resources mentioned in Acknowledgements section are available under original lic
 
 ## Book version
 
-1.1
+1.2
 
 See [Version_changes.md](https://github.com/learnbyexample/learn_gnuawk/blob/master/Version_changes.md) to track changes across book versions.
 
 # Installation and Documentation
@@ -386,7 +386,7 @@ Next chapter is dedicated solely for regular expressions. The features introduce
 
 ## Exercises
 
-> Exercise related files are available from [exercises folder of learn_gnuawk repo](https://github.com/learnbyexample/learn_gnuawk/tree/master/exercises).
+> Exercise related files are available from [exercises folder of learn_gnuawk repo](https://github.com/learnbyexample/learn_gnuawk/tree/master/exercises). All the exercises are also collated together in one place at [Exercises.md](https://github.com/learnbyexample/learn_gnuawk/blob/master/exercises/Exercises.md). For solutions, see [Exercise_solutions.md](https://github.com/learnbyexample/learn_gnuawk/blob/master/exercises/Exercise_solutions.md).
 
 **a)** For the input file `addr.txt`, display all lines containing `is`.
 
@@ -490,48 +490,50 @@ spared no one
 grasped
 ```
 
-## Line Anchors
+## String Anchors
 
-In the examples seen so far, the regexp was a simple string value without any special characters. Also, the regexp pattern evaluated to `true` if it was found anywhere in the string. Instead of matching anywhere in the line, restrictions can be specified. These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as **metacharacters** in regular expressions parlance. In case you need to match those characters literally, you need to escape them with a `\` (discussed in [Matching the metacharacters](#matching-the-metacharacters) section).
+In the examples seen so far, the regexp was a simple string value without any special characters. Also, the regexp pattern evaluated to `true` if it was found anywhere in the string. Instead of matching anywhere in the string, restrictions can be specified. These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as **metacharacters** in regular expressions parlance. In case you need to match those characters literally, you need to escape them with a `\` (discussed in [Matching the metacharacters](#matching-the-metacharacters) section).
 
-There are two line anchors:
+There are two string anchors:
 
-* `^` metacharacter restricts the matching to start of line
-* `$` metacharacter restricts the matching to end of line
+* `^` metacharacter restricts the matching to the start of string
+* `$` metacharacter restricts the matching to the end of string
 
 ```bash
-$ # lines starting with 'sp'
+$ # string starting with 'sp'
 $ printf 'spared no one\ngrasped\nspar\n' | awk '/^sp/'
 spared no one
 spar
 
-$ # lines ending with 'ar'
+$ # string ending with 'ar'
 $ printf 'spared no one\ngrasped\nspar\n' | awk '/ar$/'
 spar
 
-$ # change only whole line 'spar'
-$ # can also use: awk '/^spar$/{$0 = 123} 1'
+$ # change only whole string 'spar'
+$ # can also use: awk '/^spar$/{$0 = 123} 1' or awk '$0=="spar"{$0 = 123} 1'
 $ printf 'spared no one\ngrasped\nspar\n' | awk '{sub(/^spar$/, "123")} 1'
 spared no one
 grasped
 123
 ```
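The regexp and string comparison alternatives mentioned in the comment above agree for `spar`, but they can differ once the text contains a metacharacter. A minimal sketch of that difference follows; the sample input `a.b` and `axb` is a made-up assumption for illustration and does not come from the book.

```bash
# regexp: the unescaped . matches any character, so both lines are printed
printf 'a.b\naxb\n' | awk '/^a.b$/'

# string comparison: only the line that is literally a.b is printed
printf 'a.b\naxb\n' | awk '$0 == "a.b"'
```

The first command prints both input lines, while the second prints only `a.b`, so string comparison may be the simpler choice when the whole string has to be matched literally.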
 
-The anchors can be used by themselves as a pattern. Helps to insert text at start or end of line, emulating string concatenation operations. These might not feel like useful capability, but combined with other features they become quite a handy tool.
+The anchors can be used by themselves as a pattern. Helps to insert text at the start or end of string, emulating string concatenation operations. These might not feel like useful capability, but combined with other features they become quite a handy tool.
 
 ```bash
 $ printf 'spared no one\ngrasped\nspar\n' | awk '{gsub(/^/, "* ")} 1'
 * spared no one
 * grasped
 * spar
 
-$ # append only if line doesn't contain space characters
+$ # append only if string doesn't contain space characters
 $ printf 'spared no one\ngrasped\nspar\n' | awk '!/ /{gsub(/$/, ".")} 1'
 spared no one
 grasped.
 spar.
 ```
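For comparison, the concatenation that these anchor-only substitutions emulate can also be written directly by assigning to `$0`. This is only a sketch of an equivalent form under the same sample input, not a replacement suggested by the commit.

```bash
# prefix every record, same result as gsub(/^/, "* ")
printf 'spared no one\ngrasped\nspar\n' | awk '{$0 = "* " $0} 1'

# append only if the record has no space character, same result as gsub(/$/, ".")
printf 'spared no one\ngrasped\nspar\n' | awk '!/ /{$0 = $0 "."} 1'
```

The anchor form keeps everything inside a single `sub` or `gsub` call, which becomes useful when the same regexp also has to carry other conditions.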
 
+> See also [Behavior of ^ and $ when string contains newline](#behavior-of--and--when-string-contains-newline) section.
+
 ## Word Anchors
 
The second type of restriction is word anchors. A word character is any alphabet (irrespective of case), digit and the underscore character. You might wonder why there are digits and underscores as well, why not only alphabets? This comes from variable and function naming conventions — typically alphabets, digits and underscores are allowed. So, the definition is more programming oriented than natural language.
@@ -604,7 +606,7 @@ c:o:p:p:e:r
 Before seeing next regexp feature, it is good to note that sometimes using logical operators is easier to read and maintain compared to doing everything with regexp.
 
 ```bash
-$ # lines starting with 'b' but not containing 'at'
+$ # string starting with 'b' but not containing 'at'
 $ awk '/^b/ && !/at/' table.txt
 blue cake mug shirt -7
 
@@ -621,12 +623,11 @@ Many a times, you'd want to search for multiple terms. In a conditional expressi
 Alternation is similar to using `||` operator between two regexps. Having a single regexp helps to write terser code and `||` cannot be used when substitution is required.
 
 ```bash
-$ # lines with whole word 'par' or lines ending with 's'
+$ # match whole word 'par' or string ending with 's'
 $ # same as: awk '/\<par\>/ || /s$/'
 $ awk '/\<par\>|s$/' word_anchors.txt
 sub par
 two spare computers
-
 $ # replace 'cat' or 'dog' or 'fox' with '--'
 $ echo 'cats dog bee parrot foxed' | awk '{gsub(/cat|dog|fox/, "--")} 1'
 --s -- bee parrot --ed
@@ -691,7 +692,7 @@ part time
 
 You have seen a few metacharacters and escape sequences that help to compose a regular expression. To match the metacharacters literally, i.e. to remove their special meaning, prefix those characters with a `\` character. To indicate a literal `\` character, use `\\`.
 
-Unlike `grep` and `sed`, the line anchors have to be always escaped to match them literally as there is no BRE mode in `awk`. They do not lose their special meaning when not used in their customary positions.
+Unlike `grep` and `sed`, the string anchors have to be always escaped to match them literally as there is no BRE mode in `awk`. They do not lose their special meaning when not used in their customary positions.
 
 ```bash
 $ # awk '/b^2/' will not work even though ^ isn't being used as anchor
@@ -931,7 +932,7 @@
 $ echo 'no so in to do on' | awk '{gsub(/\<[sot][on]\>/, "X")} 1'
 no X in X do X
 
-$ # lines made up of letters 'o' and 'n', line length at least 2
+$ # strings made up of letters 'o' and 'n', string length at least 2
 $ # /usr/share/dict/words contains dictionary words, one word per line
 $ awk '/^[on]{2,}$/' /usr/share/dict/words
 no
@@ -1109,7 +1110,7 @@ universe: 42
 > If a metacharacter is specified by ASCII value, it will still act as the metacharacter. Undefined escape sequences will result in a warning and treated as the character it escapes.
 
 ```bash
-$ # \x5e is ^ character, acts as line anchor here
+$ # \x5e is ^ character, acts as string anchor here
 $ printf 'cute\ncot\ncat\ncoat\n' | awk '/\x5eco/'
 cot
 coat
@@ -1171,7 +1172,7 @@ $ # duplicate first column value as final column
+## Behavior of ^ and $ when string contains newline
+
+In some regular expression implementations, `^` matches the start of a line and `$` matches the end of a line (with newline as the line separator). In `awk`, these anchors always match the start of the entire string and end of the entire string respectively. This comes into play when `RS` is other than the newline character, or if you have a string value containing newline characters.
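The paragraph above can be checked with a small sketch. The variable `s` and its two-line value are hypothetical, chosen only to show that the anchors apply to the whole string and not to each line inside it.

```bash
# s holds two lines in one string: "hi" and "bye" separated by a newline
awk 'BEGIN{
    s = "hi\nbye"
    # ^ matches only at the very start of s, not after the newline
    print "starts with hi:",  (s ~ /^hi/  ? "yes" : "no")
    print "starts with bye:", (s ~ /^bye/ ? "yes" : "no")
    # $ matches only at the very end of s, not before the newline
    print "ends with hi:",    (s ~ /hi$/  ? "yes" : "no")
    print "ends with bye:",   (s ~ /bye$/ ? "yes" : "no")
}'
```

This prints `yes`, `no`, `no`, `yes` for the four checks, since `bye` is not at the start of the string and `hi` is not at the end.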
+
+```bash
+$ # 'apple\n' doesn't match as there's newline character
The word boundary `\y` matches both start and end of word locations. Whereas, `\<` and `\>` match exactly the start and end of word locations respectively. This leads to cases where you have to choose which of these word boundaries to use depending on results desired. Consider `I have 12, he has 2!` as sample text, shown below as an image with vertical bars marking the word boundaries. The last character `!` doesn't have end of word boundary as it is not a word character.