Exercise solutions - CLI text processing with GNU Coreutils

Exercise solutions


cat and tac

1) The given sample data has empty lines at the start and end of the input. Also, there are multiple empty lines between the paragraphs. How would you get the output shown below?

# note that there's an empty line at the end of the output
+$ printf '\n\n\ndragon\n\n\n\nunicorn\nbee\n\n\n' | cat -sb
+
+     1  dragon
+
+     2  unicorn
+     3  bee
+
+

2) Pass appropriate arguments to the cat command to get the output shown below.

$ cat greeting.txt
+Hi there
+Have a nice day
+
+$ echo '42 apples and 100 bananas' | cat - greeting.txt
+42 apples and 100 bananas
+Hi there
+Have a nice day
+

3) What does the -v option of the cat command do?

Displays nonprinting characters using the caret notation.

4) Which options of the cat command do the following stand in for?

  • -e option is equivalent to -vE
  • -t option is equivalent to -vT
  • -A option is equivalent to -vET
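For example, the combined -A option makes both tabs and line endings visible in one shot:

```shell
# the tab shows up as ^I and the line ending as $
printf 'a\tb\n' | cat -A
# a^Ib$
```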

5) Will the two commands shown below produce the same output? If not, why not?

$ cat fruits.txt ip.txt | tac
+
+$ tac fruits.txt ip.txt
+

No. The first command concatenates the input files before reversing the content linewise. With the second command, each file content will be reversed separately.

6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator.

$ cat blocks.txt
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+----
+sky blue
+dark green
+----
+hi hello
+
+$ tac -bs '----' blocks.txt
+----
+hi hello
+----
+sky blue
+dark green
+----
+3.14
+-42
+1000
+----
+apple--banana
+mango---fig
+

7) For the blocks.txt file, write solutions to display only the last such group and last two groups.

# can also use: tac -bs '----' blocks.txt | awk '/----/ && ++c==2{exit} 1'
+$ tac blocks.txt | sed '/----/q' | tac
+----
+hi hello
+
+$ tac -bs '----' blocks.txt | awk '/----/ && ++c==3{exit} 1' | tac -bs '----'
+----
+sky blue
+dark green
+----
+hi hello
+

8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as the separator.

$ cat items.txt
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ tac -brs '^[0-9]' items.txt
+3) magical beasts
+dragon 3
+unicorn 42
+2) colors
+green
+sky blue
+1) fruits
+apple 5
+banana 10
+

head and tail

1) Use appropriate commands and shell features to get the output shown below.

$ printf 'carpet\njeep\nbus\n'
+carpet
+jeep
+bus
+
+# use the above 'printf' command for input data
+$ c=$(printf 'carpet\njeep\nbus\n' | head -c3)
+$ echo "$c"
+car
+

2) How would you display all the input lines except the first one?

$ printf 'apple\nfig\ncarpet\njeep\nbus\n' | tail -n +2
+fig
+carpet
+jeep
+bus
+

3) Which command would you use to get the output shown below?

$ cat fruits.txt
+banana
+papaya
+mango
+$ cat blocks.txt
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+----
+sky blue
+dark green
+----
+hi hello
+
+$ head -n2 fruits.txt blocks.txt
+==> fruits.txt <==
+banana
+papaya
+
+==> blocks.txt <==
+----
+apple--banana
+

4) Use a combination of head and tail commands to get the 11th to 14th characters from the given input.

# can also use: tail -c +11 | head -c4
+$ printf 'apple\nfig\ncarpet\njeep\nbus\n' | head -c14 | tail -c +11
+carp
+

5) Extract the starting six bytes from the input files ip.txt and fruits.txt.

$ head -q -c6 ip.txt fruits.txt
+it is banana
+

6) Extract the last six bytes from the input files fruits.txt and ip.txt.

$ tail -q -c6 fruits.txt ip.txt
+mango
+erish
+

7) For the input file ip.txt, display except the last 5 lines.

$ head -n -5 ip.txt
+it is a warm and cozy day
+listen to what I say
+go play in the park
+come back before the sky turns dark
+

8) Display the third line from the given stdin data. Consider the NUL character as the line separator.

$ printf 'apple\0fig\0carpet\0jeep\0bus\0' | head -z -n3 | tail -z -n1
+carpet
+

tr

1) What's wrong with the following command?

$ echo 'apple#banana#cherry' | tr # :
+tr: missing operand
+Try 'tr --help' for more information.
+
+$ echo 'apple#banana#cherry' | tr '#' ':'
+apple:banana:cherry
+

As a good practice, always quote the arguments passed to the tr command to avoid conflict with shell metacharacters. Unless of course, you need the shell to interpret them.

2) Retain only alphabets, digits and whitespace characters.

$ printf 'Apple_42  cool,blue\tDragon:army\n' | tr -dc '[:alnum:][:space:]'
+Apple42  coolblue       Dragonarmy
+

3) Similar to rot13, figure out a way to shift digits such that the same logic can be used both ways.

$ echo '4780 89073' | tr '0-9' '5-90-4'
+9235 34528
+
+$ echo '9235 34528' | tr '0-9' '5-90-4'
+4780 89073
+

4) Figure out the logic based on the given input and output data. Hint: use two ranges for the first set and only 6 characters in the second set.

$ echo 'apple banana cherry damson etrog' | tr 'a-ep-z' '12345X'
+1XXl5 21n1n1 3h5XXX 41mXon 5XXog
+

5) Which option would you use to truncate the first set so that it matches the length of the second set?

The -t option is needed for this.

6) What does the * notation do in the second set?

The [c*n] notation repeats a character c by n times. You can specify n in decimal or octal formats. If n is omitted, the character c is repeated as many times as needed to equalize the length of the sets.
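A quick demo of the * notation with n omitted:

```shell
# 'x' is repeated as many times as needed to match the length of the first set
echo 'apple 123' | tr '0-9' '[x*]'
# apple xxx
```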

7) Change : to - and ; to the newline character.

$ echo 'tea:coffee;brown:teal;dragon:unicorn' | tr ':;' '-\n'
+tea-coffee
+brown-teal
+dragon-unicorn
+

8) Convert all characters to * except digit and newline characters.

$ echo 'ajsd45_sdg2Khnf4v_54as' | tr -c '0-9\n' '*'
+****45****2****4**54**
+

9) Change consecutive repeated punctuation characters to a single punctuation character.

$ echo '""hi..."", good morning!!!!' | tr -s '[:punct:]'
+"hi.", good morning!
+

10) Figure out the logic based on the given input and output data.

$ echo 'Aapple    noon     banana!!!!!' | tr -cs 'a-z\n' ':'
+:apple:noon:banana:
+

11) The books.txt file has items separated by one or more : characters. Change this separator to a single newline character as shown below.

$ cat books.txt
+Cradle:::Mage Errant::The Weirkey Chronicles
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+Mark of the Fool:Super Powereds:::Ends of Magic
+
+$ <books.txt tr -s ':' '\n'
+Cradle
+Mage Errant
+The Weirkey Chronicles
+Mother of Learning
+Eight
+Dear Spellbook
+Ascendant
+Mark of the Fool
+Super Powereds
+Ends of Magic
+

cut

1) Display only the third field.

$ printf 'tea\tcoffee\tchocolate\tfruit\n' | cut -f3
+chocolate
+

2) Display the second and fifth fields. Consider , as the field separator.

$ echo 'tea,coffee,chocolate,ice cream,fruit' | cut -d, -f2,5
+coffee,fruit
+

3) Why does the below command not work as expected? What other tools can you use in such cases?

cut ignores repeated field numbers and always presents the selected fields in ascending order of their position — you cannot use it to rearrange or duplicate fields.

# not working as expected
+$ echo 'apple,banana,cherry,fig' | cut -d, -f3,1,3
+apple,cherry
+
+# expected output
+$ echo 'apple,banana,cherry,fig' | awk -F, -v OFS=, '{print $3, $1, $3}'
+cherry,apple,cherry
+

4) Display except the second field in the format shown below. Can you construct two different solutions?

# solution 1
+$ echo 'apple,banana,cherry,fig' | cut -d, --output-delimiter=' ' -f1,3-
+apple cherry fig
+
+# solution 2
+$ echo 'apple,banana,cherry,fig' | cut -d, --output-delimiter=' ' --complement -f2
+apple cherry fig
+

5) Extract the first three characters from the input lines as shown below. Can you also use the head command for this purpose? If not, why not?

$ printf 'apple\nbanana\ncherry\nfig\n' | cut -c-3
+app
+ban
+che
+fig
+

head cannot be used because it acts on the input as a whole, whereas cut works line wise.
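To see the difference:

```shell
# head -c3 stops after the first three bytes of the entire input
printf 'apple\nbanana\n' | head -c3
# app
# cut -c-3 applies to every line
printf 'apple\nbanana\n' | cut -c-3
# app
# ban
```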

6) Display only the first and third fields of the scores.csv input file, with tab as the output field separator.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+$ cut -d, --output-delimiter=$'\t' -f1,3 scores.csv
+Name    Physics
+Ith     100
+Cy      98
+Lin     83
+

7) The given input data uses one or more : characters as the field separator. Assume that no field content will have the : character. Display except the second field, with : as the output field separator.

$ cat books.txt
+Cradle:::Mage Errant::The Weirkey Chronicles
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+Mark of the Fool:Super Powereds:::Ends of Magic
+
+$ <books.txt tr -s ':' | cut --complement -d':' --output-delimiter=' : ' -f2
+Cradle : The Weirkey Chronicles
+Mother of Learning : Dear Spellbook : Ascendant
+Mark of the Fool : Ends of Magic
+

8) Which option would you use to not display lines that do not contain the input delimiter character?

You can use the -s option to suppress such lines.
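For example:

```shell
# without -s, lines lacking the delimiter are printed as-is
printf 'a,b\nhello\n' | cut -d, -f2
# b
# hello
# with -s, such lines are suppressed
printf 'a,b\nhello\n' | cut -d, -f2 -s
# b
```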

9) Modify the command to get the expected output shown below.

$ printf 'apple\nbanana\ncherry\n' | cut -c-3 --output-delimiter=:
+app
+ban
+che
+
+$ printf 'apple\nbanana\ncherry\n' | cut -c1,2,3 --output-delimiter=:
+a:p:p
+b:a:n
+c:h:e
+

10) Figure out the logic based on the given input and output data.

$ printf 'apple\0fig\0carpet\0jeep\0' | cut -z --complement -c-2 | cat -v
+ple^@g^@rpet^@ep^@
+

seq

1) Generate numbers from 42 to 45 in ascending order.

$ seq 42 45
+42
+43
+44
+45
+

2) Why does the command shown below produce no output?

You have to explicitly provide a negative step value to generate numbers in descending order.

# no output
+$ seq 45 42
+
+# expected output
+$ seq 45 -1 42
+45
+44
+43
+42
+

3) Generate numbers from 25 to 10 in descending order, with a step value of 5.

$ seq 25 -5 10
+25
+20
+15
+10
+

4) Is the sequence shown below possible to generate with seq? If so, how?

$ seq -w -s, 01.5 6
+01.5,02.5,03.5,04.5,05.5
+

5) Modify the command shown below to customize the output numbering format.

$ seq 30.14 3.36 40.72
+30.14
+33.50
+36.86
+40.22
+
+$ seq -f'%.3e' 30.14 3.36 40.72
+3.014e+01
+3.350e+01
+3.686e+01
+4.022e+01
+

shuf

1) What's wrong with the given command?

shuf doesn't accept multiple input files. You can use cat to concatenate them first.

$ shuf --random-source=greeting.txt fruits.txt books.txt
+shuf: extra operand ‘books.txt’
+Try 'shuf --help' for more information.
+
+# expected output
+$ cat fruits.txt books.txt | shuf --random-source=greeting.txt
+banana
+Cradle:::Mage Errant::The Weirkey Chronicles
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+papaya
+Mark of the Fool:Super Powereds:::Ends of Magic
+mango
+

2) What do the -r and -n options do? Why are they often used together?

The -r option helps if you want to allow input lines to be repeated. This option is usually paired with -n to limit the number of lines in the output. Otherwise, shuf -r will produce output lines indefinitely.
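For example (the output varies between runs, but it will always be exactly three lines):

```shell
# three picks, repetition allowed, from just two input lines
shuf -r -n3 -e heads tails
```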

3) What does the following command do?

The -e option is useful to specify multiple input lines as arguments to the command.

$ shuf -e apple banana cherry fig mango
+cherry
+banana
+mango
+fig
+apple
+

4) Which option would you use to generate random numbers? Give an example.

The -i option helps generate random positive integers.

$ shuf -n3 -i 100-200
+128
+177
+193
+

5) How would you generate 5 random numbers between 0.125 and 0.789 with a step value of 0.023?

$ seq 0.125 0.023 0.789 | shuf -n5
+0.378
+0.631
+0.447
+0.746
+0.723
+

paste

1) What's the default delimiter character added by the paste command? Which option would you use to customize this separator?

Tab. You can use the -d option to change the delimiter between the columns.
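A small demo (process substitution assumes bash or a similar shell):

```shell
# columns are joined with the default tab delimiter
paste <(seq 2) <(seq 3 4)
```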

2) Will the following two commands produce equivalent output? If not, why not?

$ paste -d, <(seq 3) <(printf '%s\n' item_{1..3})
+1,item_1
+2,item_2
+3,item_3
+
+$ printf '%s\n' {1..3},item_{1..3}
+1,item_1
+1,item_2
+1,item_3
+2,item_1
+2,item_2
+2,item_3
+3,item_1
+3,item_2
+3,item_3
+

The outputs are not equivalent because brace expansion creates all combinations when multiple braces are used.
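Here's a smaller illustration of that combinatorial behavior (assuming bash):

```shell
# each element of the first brace pairs with each element of the second
echo {a..b}{1..2}
# a1 a2 b1 b2
```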

3) Combine the two data sources as shown below.

$ printf '1)\n2)\n3)'
+1)
+2)
+3)
+$ cat fruits.txt
+banana
+papaya
+mango
+
+$ paste -d '' <(printf '1)\n2)\n3)') fruits.txt
+1)banana
+2)papaya
+3)mango
+

4) Interleave the contents of fruits.txt and books.txt.

$ paste -d'\n' fruits.txt books.txt
+banana
+Cradle:::Mage Errant::The Weirkey Chronicles
+papaya
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+mango
+Mark of the Fool:Super Powereds:::Ends of Magic
+

5) Generate numbers 1 to 9 in two different formats as shown below.

$ seq 9 | paste -d: - - -
+1:2:3
+4:5:6
+7:8:9
+
+$ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9)
+1 : 4 : 7
+2 : 5 : 8
+3 : 6 : 9
+

6) Combine the contents of fruits.txt and colors.txt as shown below.

$ cat fruits.txt
+banana
+papaya
+mango
+$ cat colors.txt
+deep blue
+light orange
+blue delight
+
+$ paste -d'\n' fruits.txt colors.txt | paste -sd,
+banana,deep blue,papaya,light orange,mango,blue delight
+

pr

1) What does the -t option do?

The -t option turns off the pagination features like headers and trailers.

2) Generate numbers 1 to 16 in two different formats as shown below.

$ seq -w 16 | pr -4ats,
+01,02,03,04
+05,06,07,08
+09,10,11,12
+13,14,15,16
+
+$ seq -w 16 | pr -4ts,
+01,05,09,13
+02,06,10,14
+03,07,11,15
+04,08,12,16
+

3) How'd you solve the issue shown below?

$ seq 100 | pr -37ats,
+pr: page width too narrow
+
+$ seq 100 | pr -J -w73 -37ats,
+1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37
+38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74
+75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
+

(N-1)*length(separator) + N is the minimum page width you need, where N is the number of columns required. So, for 37 columns and a separator of length 1, you'll need a minimum width of 73. The -J option ensures that the input lines aren't truncated.
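That arithmetic can be checked in the shell itself:

```shell
n=37
sep_len=1
# minimum page width for 37 columns with a 1-character separator
echo $(( (n - 1) * sep_len + n ))
# 73
```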

4) Combine the contents of fruits.txt and colors.txt in two different formats as shown below.

$ cat fruits.txt
+banana
+papaya
+mango
+$ cat colors.txt
+deep blue
+light orange
+blue delight
+
+$ pr -mts' : ' fruits.txt colors.txt
+banana : deep blue
+papaya : light orange
+mango : blue delight
+
+$ pr -n:2 -mts, fruits.txt colors.txt
+ 1:banana,deep blue
+ 2:papaya,light orange
+ 3:mango,blue delight
+

5) What does the -d option do?

You can use the -d option to double space the input contents. That is, every newline character is doubled.


fold and fmt

1) What's the default wrap length of the fold and fmt commands?

80 bytes and 93% of 75 columns respectively.

2) Fold the given stdin data at 9 bytes.

$ echo 'hi hello, how are you?' | fold -w9
+hi hello,
+ how are 
+you?
+

3) Figure out the logic based on the given input and output data using the fold command.

$ cat ip.txt
+it is a warm and cozy day
+listen to what I say
+go play in the park
+come back before the sky turns dark
+
+There are so many delights to cherish
+Apple, Banana and Cherry
+Bread, Butter and Jelly
+Try them all before you perish
+
+$ head -n2 ip.txt | fold -sw10
+it is a 
+warm and 
+cozy day
+listen to 
+what I say
+

4) What does the fold -b option do?

The -b option makes fold count bytes instead of screen columns, so tab, backspace, and carriage return characters are each counted as a single byte instead of adjusting the current column position.

5) How'd you get the expected output shown below?

# wrong output
+$ echo 'fig appleseed mango pomegranate' | fold -sw7
+fig 
+applese
+ed 
+mango 
+pomegra
+nate
+
+# expected output
+$ echo 'fig appleseed mango pomegranate' | fmt -w7
+fig
+appleseed
+mango
+pomegranate
+

6) What do the options -s and -u of the fmt command do?

By default, the fmt command joins lines together that are shorter than the specified width. The -s option will disable this behavior.

The -u option changes multiple spaces to a single space. Excess spacing between sentences will be changed to two spaces.


sort

1) Default sort doesn't work for numbers. Which option would you use to get the expected output shown below?

$ printf '100\n10\n20\n3000\n2.45\n' | sort -n
+2.45
+10
+20
+100
+3000
+

2) Which sort option will help you ignore case? LC_ALL=C is used here to avoid differences due to locale.

$ printf 'Super\nover\nRUNE\ntea\n' | LC_ALL=C sort -f
+over
+RUNE
+Super
+tea
+

3) The -n option doesn't work for all sorts of numbers. Which sort option would you use to get the expected output shown below?

# wrong output
+$ printf '+120\n-1.53\n3.14e+4\n42.1e-2' | sort -n
+-1.53
++120
+3.14e+4
+42.1e-2
+
+# expected output
+$ printf '+120\n-1.53\n3.14e+4\n42.1e-2' | sort -g
+-1.53
+42.1e-2
++120
+3.14e+4
+

4) What do the -V and -h options do?

The -V option is useful when you have a mix of alphabets and digits. It also helps when you want to treat digits after a decimal point as whole numbers, for example 1.10 should be greater than 1.2.

Commands like du (disk usage) have the -h and --si options to display numbers with SI suffixes like k, K, M, G and so on. In such cases, you can use sort -h to order them.

5) Is there a difference between shuf and sort -R?

The sort -R option will display the output in random order. Unlike shuf, this option will always place identical lines next to each other due to the implementation.
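That grouping property can be observed directly — no matter how the shuffle turns out, identical lines stay adjacent, so uniq can still collapse them:

```shell
# the two 'x' lines always end up next to each other
printf 'x\ny\nx\n' | sort -R | uniq | wc -l
# always 2
```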

6) Sort the scores.csv file numerically in ascending order using the contents of the second field. Header line should be preserved as the first line as shown below.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+$ (sed -u '1q' ; sort -t, -k2,2n) < scores.csv
+Name,Maths,Physics,Chemistry
+Lin,78,83,80
+Cy,97,98,95
+Ith,100,100,100
+

7) Sort the contents of duplicates.csv by the fourth column numbers in descending order. Retain only the first copy of lines with the same number.

$ cat duplicates.csv
+brown,toy,bread,42
+dark red,ruby,rose,111
+blue,ruby,water,333
+dark red,sky,rose,555
+yellow,toy,flower,333
+white,sky,bread,111
+light red,purse,rose,333
+
+$ sort -t, -k4,4nr -u duplicates.csv
+dark red,sky,rose,555
+blue,ruby,water,333
+dark red,ruby,rose,111
+brown,toy,bread,42
+

8) Sort the contents of duplicates.csv by the third column item. Use the fourth column numbers as the tie-breaker.

$ sort -t, -k3,3 -k4,4n duplicates.csv
+brown,toy,bread,42
+white,sky,bread,111
+yellow,toy,flower,333
+dark red,ruby,rose,111
+light red,purse,rose,333
+dark red,sky,rose,555
+blue,ruby,water,333
+

9) What does the -s option provide?

The -s option is useful to retain the original order of input lines when two or more lines are deemed equal.
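For example, with a stable numeric sort on the first field, lines with the same key keep their input order:

```shell
# '2 b' stays before '2 a' because it appeared first in the input
printf '2 b\n1 x\n2 a\n' | sort -s -k1,1n
```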

10) Sort the given input based on the numbers inside the brackets.

$ printf '(-3.14)\n[45]\n(12.5)\n{14093}' | sort -k1.2n
+(-3.14)
+(12.5)
+[45]
+{14093}
+

11) What do the -c, -C and -m options do?

The -c option helps you spot the first unsorted entry in the given input. The uppercase -C option is similar but only affects the exit status. Note that these options will not work for multiple inputs.

The -m option is useful if you have one or more sorted input files and need a single sorted output file. This would be faster than normal sorting.


uniq

1) Will uniq throw an error if the input is not sorted? What do you think will be the output for the following input?

uniq doesn't necessarily require the input to be sorted. Adjacent lines are used for comparison purposes.

$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq
+red
+green
+red
+blue
+

2) Are there differences between sort -u file and sort file | uniq?

Yes. For example, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example:

# can't use sort -n -u here
+$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n | uniq
+2 balls
+2 pins
+13 pens
+

3) What are the differences between sort -u and uniq -u options, if any?

sort -u retains the first copy of duplicates that are deemed to be equal. uniq -u retains only the unique copies (i.e. not even a single copy of the duplicates will be part of the output).
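A quick demo of the difference:

```shell
# sort -u keeps one copy of each line
printf 'b\na\na\n' | sort -u
# a
# b
# uniq -u drops duplicated lines entirely
printf 'b\na\na\n' | sort | uniq -u
# b
```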

4) Filter the third column items from duplicates.csv. Construct three solutions to display only unique items, duplicate items and all duplicates.

$ cat duplicates.csv
+brown,toy,bread,42
+dark red,ruby,rose,111
+blue,ruby,water,333
+dark red,sky,rose,555
+yellow,toy,flower,333
+white,sky,bread,111
+light red,purse,rose,333
+
+# unique
+$ cut -d, -f3 duplicates.csv | sort | uniq -u
+flower
+water
+
+# duplicates
+$ cut -d, -f3 duplicates.csv | sort | uniq -d
+bread
+rose
+
+# all duplicates
+$ cut -d, -f3 duplicates.csv | sort | uniq -D
+bread
+bread
+rose
+rose
+rose
+

5) What does the --group option do? What customization features are available?

The --group option allows you to visually separate groups of similar lines with an empty line. This option can accept four values — separate, prepend, append and both. The default is separate, which adds a newline character between the groups. prepend will add a newline before the first group as well and append will add a newline after the last group. both combines the prepend and append behavior.

6) Count the number of times input lines are repeated and display the results in the format shown below.

$ s='brown\nbrown\nbrown\ngreen\nbrown\nblue\nblue'
+$ printf '%b' "$s" | sort | uniq -c | sort -n
+      1 green
+      2 blue
+      4 brown
+

7) For the input file f1.txt, retain only unique entries based on the first two characters of each line. For example, abcd and ab12 should be considered as duplicates and neither of them will be part of the output.

$ cat f1.txt
+3) cherry
+1) apple
+2) banana
+1) almond
+4) mango
+2) berry
+3) chocolate
+1) apple
+5) cherry
+
+$ sort f1.txt | uniq -u -w2
+4) mango
+5) cherry
+

8) For the input file f1.txt, display only the duplicate items without considering the first two characters of each line. For example, abcd and 12cd should be considered as duplicates. Assume that the third character of each line is always a space character.

$ sort -k2 f1.txt | uniq -d -f1
+1) apple
+3) cherry
+

9) What does the -s option do?

The -s option allows you to skip the first N characters of each line for comparison purposes (N is counted in bytes).
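For example, skipping the numbered prefixes makes the lines compare as equal:

```shell
# skip the first 3 bytes, so both lines compare as 'apple'
printf '1) apple\n2) apple\n' | uniq -s3
# 1) apple
```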

10) Filter only unique lines, but ignore differences due to case.

$ printf 'cat\nbat\nCAT\nCar\nBat\nmat\nMat' | sort -f | uniq -iu
+Car
+

comm

1) Get the common lines between the s1.txt and s2.txt files. Assume that their contents are already sorted.

$ paste s1.txt s2.txt
+apple   banana
+coffee  coffee
+fig     eclair
+honey   fig
+mango   honey
+pasta   milk
+sugar   tea
+tea     yeast
+
+$ comm -12 s1.txt s2.txt
+coffee
+fig
+honey
+tea
+

2) Display lines present in s1.txt but not s2.txt and vice versa.

# lines unique to the first file
+$ comm -23 s1.txt s2.txt
+apple
+mango
+pasta
+sugar
+
+# lines unique to the second file
+$ comm -13 s1.txt s2.txt
+banana
+eclair
+milk
+yeast
+

3) Display lines unique to the s1.txt file and the common lines when compared to the s2.txt file. Use ==> to separate the output columns.

$ comm -2 --output-delimiter='==>' s1.txt s2.txt
+apple
+==>coffee
+==>fig
+==>honey
+mango
+pasta
+sugar
+==>tea
+

4) What does the --total option do?

Gives you the count of lines for each of the three columns.
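A quick demo (--total requires coreutils 8.26 or newer; process substitution assumes bash):

```shell
# the final line carries the count for each of the three columns
comm --total <(printf 'a\nb\n') <(printf 'b\nc\n')
```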

5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below?

The number of duplicate lines in the common column will be the minimum of the duplicate occurrences between the two files. The rest of the duplicate lines, if any, will be considered as unique to the file having the excess copies.

$ cat s3.txt
+apple
+apple
+guava
+honey
+tea
+tea
+tea
+
+$ comm -23 s3.txt s1.txt
+apple
+guava
+tea
+tea
+

join

info Assume that the input files are already sorted for these exercises.

1) Use appropriate options to get the expected outputs shown below.

# no output
+$ join <(printf 'apple 2\nfig 5') <(printf 'Fig 10\nmango 4')
+
+# expected output 1
+$ join -i <(printf 'apple 2\nfig 5') <(printf 'Fig 10\nmango 4')
+fig 5 10
+
+# expected output 2
+$ join -i -a1 -a2 <(printf 'apple 2\nfig 5') <(printf 'Fig 10\nmango 4')
+apple 2
+fig 5 10
+mango 4
+

2) Use the join command to display only the non-matching lines based on the first field.

$ cat j1.txt
+apple   2
+fig     5
+lemon   10
+tomato  22
+$ cat j2.txt
+almond  33
+fig     115
+mango   20
+pista   42
+
+# first field items present in j1.txt but not j2.txt
+$ join -v1 j1.txt j2.txt
+apple 2
+lemon 10
+tomato 22
+
+# first field items present in j2.txt but not j1.txt
+$ join -v2 j1.txt j2.txt
+almond 33
+mango 20
+pista 42
+

3) Filter lines from j1.txt and j2.txt that match the items from s1.txt.

$ cat s1.txt
+apple
+coffee
+fig
+honey
+mango
+pasta
+sugar
+tea
+
+# note that sort -m is used since the input files are already sorted
+$ join s1.txt <(sort -m j1.txt j2.txt)
+apple 2
+fig 115
+fig 5
+mango 20
+

4) Join the marks_1.csv and marks_2.csv files to get the expected output shown below.

$ cat marks_1.csv
+Name,Biology,Programming
+Er,92,77
+Ith,100,100
+Lin,92,100
+Sil,86,98
+$ cat marks_2.csv
+Name,Maths,Physics,Chemistry
+Cy,97,98,95
+Ith,100,100,100
+Lin,78,83,80
+
+$ join -t, --header marks_1.csv marks_2.csv
+Name,Biology,Programming,Maths,Physics,Chemistry
+Ith,100,100,100,100,100
+Lin,92,100,78,83,80
+

5) By default, the first field is used to combine the lines. Which options are helpful if you want to change the key field to be used for joining?

You can use the -1 and -2 options followed by a field number to set the key field for the first and second file respectively. You can use the -j option if the field number is the same for both files.
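For example, joining on the 2nd field of the first input and the 1st field of the second (both inputs sorted on their key fields; process substitution assumes bash):

```shell
# the key is printed first, then the rest of each matching line
join -1 2 -2 1 <(printf '10 apple\n20 fig\n') <(printf 'apple red\nfig brown\n')
# apple 10 red
# fig 20 brown
```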

6) Join the marks_1.csv and marks_2.csv files to get the expected output with specific fields as shown below.

$ join -t, --header -o 1.1,1.3,2.2,1.2 marks_1.csv marks_2.csv
+Name,Programming,Maths,Biology
+Ith,100,100,100
+Lin,100,78,92
+

7) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. Use 50 as the filler data.

$ join -t, --header -o auto -a1 -a2 -e '50' marks_1.csv marks_2.csv
+Name,Biology,Programming,Maths,Physics,Chemistry
+Cy,50,50,97,98,95
+Er,92,77,50,50,50
+Ith,100,100,100,100,100
+Lin,92,100,78,83,80
+Sil,86,98,50,50,50
+

8) When you use the -o auto option, what'd happen to the extra fields compared to those in the first lines of the input data?

If you use auto as the argument for the -o option, first line of both the input files will be used to determine the number of output fields. If the other lines have extra fields, they will be discarded.

9) From the input files j3.txt and j4.txt, filter only the lines that are unique — i.e. lines that are not common to both files. Assume that the input files do not have duplicate entries.

$ cat j3.txt
+almond
+apple pie
+cold coffee
+honey
+mango shake
+pasta
+sugar
+tea
+$ cat j4.txt
+apple
+banana shake
+coffee
+fig
+honey
+mango shake
+milk
+tea
+yeast
+
+$ join -t '' -v1 -v2 j3.txt j4.txt
+almond
+apple
+apple pie
+banana shake
+coffee
+cold coffee
+fig
+milk
+pasta
+sugar
+yeast
+

10) From the input files j3.txt and j4.txt, filter only the lines that are common to both files.

$ join -t '' j3.txt j4.txt
+honey
+mango shake
+tea
+

nl

1) nl and cat -n are always equivalent for numbering lines. True or False?

False. cat -n numbers every line, while nl numbers only non-empty lines by default — so they are equivalent only if the input has no empty lines. cat -b and nl are always equivalent.

2) What does the -n option do?

You can use the -n option to customize the number formatting. The available styles are:

  • rn right justified with space fillers (default)
  • rz right justified with leading zeros
  • ln left justified with space fillers

3) Use nl to produce the two expected outputs shown below.

$ cat greeting.txt
+Hi there
+Have a nice day
+
+# expected output 1
+$ nl -w3 -n'rz' greeting.txt
+001     Hi there
+002     Have a nice day
+
+# expected output 2
+$ nl -w3 -n'rz' -s') ' greeting.txt
+001) Hi there
+002) Have a nice day
+

4) Figure out the logic based on the given input and output data.

$ cat s1.txt
+apple
+coffee
+fig
+honey
+mango
+pasta
+sugar
+tea
+
+$ nl -w2 -s'. ' -v15 -i-2 s1.txt
+15. apple
+13. coffee
+11. fig
+ 9. honey
+ 7. mango
+ 5. pasta
+ 3. sugar
+ 1. tea
+

5) What are the three types of sections supported by nl?

nl recognizes three types of sections with the following default patterns:

  • \:\:\: as header
  • \:\: as body
  • \: as footer

These special lines will be replaced with an empty line after numbering. The numbering will be reset at the start of every section unless the -p option is used.

6) Only number the lines that start with ---- in the format shown below.

$ cat blocks.txt
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+----
+sky blue
+dark green
+----
+hi hello
+
+$ nl -w2 -s') ' -bp'^----' blocks.txt
+ 1) ----
+    apple--banana
+    mango---fig
+ 2) ----
+    3.14
+    -42
+    1000
+ 3) ----
+    sky blue
+    dark green
+ 4) ----
+    hi hello
+

7) For the blocks.txt file, determine the logic to produce the expected output shown below.

$ nl -w1 -s'. ' -d'--' blocks.txt
+
+1. apple--banana
+2. mango---fig
+
+1. 3.14
+2. -42
+3. 1000
+
+1. sky blue
+2. dark green
+
+1. hi hello
+

8) What does the -l option do?

The -l option controls how many consecutive empty lines should be considered as a single entry. Only the last empty line of such groupings will be numbered.

9) Figure out the logic based on the given input and output data.

$ cat all_sections.txt
+\:\:\:
+Header
+teal
+\:\:
+Hi there
+How are you
+\:\:
+banana
+papaya
+mango
+\:
+Footer
+
+$ nl -p -w2 -s') ' -ha all_sections.txt
+
+ 1) Header
+ 2) teal
+
+ 3) Hi there
+ 4) How are you
+
+ 5) banana
+ 6) papaya
+ 7) mango
+
+    Footer
+

wc

1) Save the number of lines in the greeting.txt input file to the lines shell variable.

$ lines=$(wc -l <greeting.txt)
+$ echo "$lines"
+2
+

2) What do you think will be the output of the following command?

Word count is based on whitespace separation.

$ echo 'dragons:2 ; unicorns:10' | wc -w
+3
+

3) Use appropriate options and arguments to get the output as shown below. Also, why is the line count showing as 2 instead of 3 for the stdin data?

Line count is based on the number of newline characters. So, if the last line of the input doesn't end with the newline character, it won't be counted.

$ printf 'apple\nbanana\ncherry' | wc -lc greeting.txt -
+      2      25 greeting.txt
+      2      19 -
+      4      44 total
+

4) Use appropriate options and arguments to get the output shown below.

$ printf 'greeting.txt\0scores.csv' | wc --files0-from=-
+2 6 25 greeting.txt
+4 4 70 scores.csv
+6 10 95 total
+

5) What is the difference between wc -c and wc -m options? And which option would you use to get the longest line length?

The -c option is useful to get the byte count. Use the -m option instead of -c if the input has multibyte characters.

# byte count
+$ printf 'αλεπού' | wc -c
+12
+
+# character count
+$ printf 'αλεπού' | wc -m
+6
+

You can use the -L option to report the length of the longest line in the input (excluding the newline character of a line).

6) Calculate the number of comma separated words from the scores.csv file.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+$ <scores.csv tr ',' ' ' | wc -w
+16
+

split

info Remove the output files after every exercise.

1) Split the s1.txt file 3 lines at a time.

$ split -l3 s1.txt
+
+$ head xa?
+==> xaa <==
+apple
+coffee
+fig
+
+==> xab <==
+honey
+mango
+pasta
+
+==> xac <==
+sugar
+tea
+
+$ rm xa?
+

2) Use appropriate options to get the output shown below.

$ echo 'apple,banana,cherry,dates' | split -t, -l1
+
+$ head xa?
+==> xaa <==
+apple,
+==> xab <==
+banana,
+==> xac <==
+cherry,
+==> xad <==
+dates
+
+$ rm xa?
+

3) What do the -b and -C options do?

The -b option allows you to split the input by the number of bytes. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on.

The -C option is similar to the -b option, but it tries to break on line boundaries. Each output file gets as many complete lines as possible without exceeding the given byte limit. If a single line is longer than the limit, it will be broken into multiple parts.
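Here's a quick sketch contrasting the two options (the filename sample.txt and the bytes_/lines_ prefixes are chosen just for this illustration):

```shell
printf 'apple\nbanana\nfig\n' > sample.txt

# -b10: chunks of exactly 10 bytes, lines may be cut in the middle
split -b10 sample.txt bytes_

# -C10: at most 10 bytes per chunk, breaking at newlines when possible
split -C10 sample.txt lines_

rm sample.txt bytes_* lines_*
```

With -b10, the first chunk bytes_aa holds exactly the first 10 bytes (`apple` plus part of `banana`), whereas lines_aa holds only the `apple` line, since also including `banana` would exceed the 10 byte limit.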

4) Display the 2nd chunk of the ip.txt file after splitting it 4 times as shown below.

$ split -nl/2/4 ip.txt
+come back before the sky turns dark
+
+There are so many delights to cherish
+

5) What does the r prefix do when used with the -n option?

The r prefix changes -n to a round robin distribution, so the output files get interleaved lines (the first line goes to the first file, the second line to the second file, and so on).
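For example (the rr_ output prefix is illustrative):

```shell
# r/2: deal out input lines round robin between two output files
seq 6 | split -n r/2 - rr_

# rr_aa gets lines 1, 3, 5 and rr_ab gets lines 2, 4, 6
paste -sd, rr_aa
paste -sd, rr_ab

rm rr_aa rr_ab
```

Unlike the other -n forms, r/N works with non-seekable input such as a pipe, since it doesn't need to know the input size in advance.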

6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below.

$ split -l2 -a1 -d --additional-suffix='.txt' ip.txt ip_
+
+$ head ip_*
+==> ip_0.txt <==
+it is a warm and cozy day
+listen to what I say
+
+==> ip_1.txt <==
+go play in the park
+come back before the sky turns dark
+
+==> ip_2.txt <==
+
+There are so many delights to cherish
+
+==> ip_3.txt <==
+Apple, Banana and Cherry
+Bread, Butter and Jelly
+
+==> ip_4.txt <==
+Try them all before you perish
+
+$ rm ip_*
+

7) Which option would you use to prevent empty files in the output?

The -e option prevents empty files in the output.
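For instance, requesting more chunks than there are input lines (the part_ prefix is illustrative):

```shell
# without -e, part_ac would also be created as an empty file
seq 2 | split -e -n r/3 - part_

# only part_aa and part_ab are created
ls part_*

rm part_*
```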

8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below.

$ cat items.txt
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ split -l5 --filter='grep -v "^[0-9]" > $FILE' items.txt
+
+$ head xa?
+==> xaa <==
+apple 5
+banana 10
+green
+
+==> xab <==
+sky blue
+dragon 3
+unicorn 42
+
+$ rm xa?
+

csplit

info Remove the output files after every exercise.

1) Split the blocks.txt file such that the first 7 lines are in the first file and the rest are in the second file as shown below.

$ csplit -q blocks.txt 8
+
+$ head xx*
+==> xx00 <==
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+
+==> xx01 <==
+----
+sky blue
+dark green
+----
+hi hello
+
+$ rm xx*
+

2) Split the input file items.txt such that the text before a line containing colors is part of the first file and the rest are part of the second file as shown below.

$ csplit -q items.txt '/colors/'
+
+$ head xx*
+==> xx00 <==
+1) fruits
+apple 5
+banana 10
+
+==> xx01 <==
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx*
+

3) Split the input file items.txt such that the line containing magical and all the lines that come after are part of the single output file.

$ csplit -q items.txt '%magical%'
+
+$ cat xx00
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx00
+

4) Split the input file items.txt such that the line containing colors as well as the line that comes after are part of the first output file.

$ csplit -q items.txt '/colors/2'
+
+$ head xx*
+==> xx00 <==
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+
+==> xx01 <==
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx*
+

5) Split the input file items.txt on the line that comes before a line containing magical. Generate only a single output file as shown below.

$ csplit -q items.txt '%magical%-1'
+
+$ cat xx00
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx00
+

6) Split the input file blocks.txt on the 4th occurrence of a line starting with the - character. Generate only a single output file as shown below.

$ csplit -q blocks.txt '%^-%' '{3}'
+
+$ cat xx00
+----
+sky blue
+dark green
+----
+hi hello
+
+$ rm xx00
+

7) For the input file blocks.txt, determine the logic to produce the expected output shown below.

$ csplit -qz --suppress-matched blocks.txt '/----/' '{*}'
+
+$ head xx*
+==> xx00 <==
+apple--banana
+mango---fig
+
+==> xx01 <==
+3.14
+-42
+1000
+
+==> xx02 <==
+sky blue
+dark green
+
+==> xx03 <==
+hi hello
+
+$ rm xx*
+

8) What does the -k option do?

By default, csplit will remove the created output files if there's an error or a signal that causes the command to stop. You can use the -k option to keep such files. One use case is line number based splitting with the {*} modifier.

$ seq 7 | csplit -q - 4 '{*}'
+csplit: ‘4’: line number out of range on repetition 1
+$ ls xx*
+ls: cannot access 'xx*': No such file or directory
+
+# -k option will allow you to retain the created files
+$ seq 7 | csplit -qk - 4 '{*}'
+csplit: ‘4’: line number out of range on repetition 1
+$ head xx*
+==> xx00 <==
+1
+2
+3
+
+==> xx01 <==
+4
+5
+6
+7
+
+$ rm xx*
+

9) Split the books.txt file on every line as shown below.

# can also use: split -l1 -d -a1 books.txt row_
+$ csplit -qkz -f'row_' -n1 books.txt 1 '{*}'
+csplit: ‘1’: line number out of range on repetition 3
+
+$ head row_*
+==> row_0 <==
+Cradle:::Mage Errant::The Weirkey Chronicles
+
+==> row_1 <==
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+
+==> row_2 <==
+Mark of the Fool:Super Powereds:::Ends of Magic
+
+$ rm row_*
+

10) Split the items.txt file on lines starting with a digit character. Matching lines shouldn't be part of the output and the files should be named group_0.txt, group_1.txt and so on.

$ csplit -qz --suppress-matched -f'group_' -b'%d.txt' items.txt '/^[0-9]/' '{*}'
+
+$ head group_*
+==> group_0.txt <==
+apple 5
+banana 10
+
+==> group_1.txt <==
+green
+sky blue
+
+==> group_2.txt <==
+dragon 3
+unicorn 42
+
+$ rm group_*
+

expand and unexpand

1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below.

$ cat items.txt
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ <items.txt tr ' ' '\t' | expand -t 10
+1)        fruits
+apple     5
+banana    10
+2)        colors
+green
+sky       blue
+3)        magical   beasts
+dragon    3
+unicorn   42
+

2) What does the expand -i option do?

The -i option expands only the tab characters at the start of a line. The first character that is neither a tab nor a space stops the conversion for that line.
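A small demonstration (cat -T marks tab characters as ^I):

```shell
# only the leading tab is expanded;
# the tabs after the 'a' character are left as is
printf '\ta\tb\tc\n' | expand -i | cat -T
```

The leading tab becomes eight spaces (the default tab stop), while the later tabs survive as ^I in the cat -T output.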

3) Expand the first tab character to stop at the 10th column and the second one at the 16th column. Rest of the tabs should be converted to a single space character.

$ printf 'app\tfix\tjoy\tmap\ttap\n' | expand -t 10,16
+app       fix   joy map tap
+
+$ printf 'appleseed\tfig\tjoy\n' | expand -t 10,16
+appleseed fig   joy
+
+$ printf 'a\tb\tc\td\te\n' | expand -t 10,16
+a         b     c d e
+

4) Will the following code give back the original input? If not, is there an option that can help?

By default, the unexpand command converts only the initial blank characters (space or tab) to tabs. The -a option will allow you to convert all sequences of two or more blanks at tab boundaries.

$ printf 'a\tb\tc\n' | expand | unexpand
+a       b       c
+
+$ printf 'a\tb\tc\n' | expand | unexpand | cat -T
+a       b       c
+
+$ printf 'a\tb\tc\n' | expand | unexpand -a | cat -T
+a^Ib^Ic
+

5) How do the + and / prefix modifiers affect the -t option?

If you prefix the last width with the / character, tab stops beyond the last explicitly given position are placed at multiples of that value, instead of the single space default.

If you use + instead of / as the prefix for the last width, the extra tab stops are placed at that interval starting from the second last width (i.e. the second last width acts as an offset).
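A sketch of the difference, using an explicit stop at column 10 followed by a repeat value of 4 (the input is illustrative):

```shell
# /4: stops at column 10, then at multiples of 4 (12, 16, ...)
printf 'a\tb\tc\n' | expand -t 10,/4

# +4: stops at column 10, then at 10+4, 10+8, ... (14, 18, ...)
printf 'a\tb\tc\n' | expand -t 10,+4
```

In the first case c starts at column 13 (the stop at 12), while in the second case it starts at column 15 (the stop at 14).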


basename and dirname

1) Is the following command valid? If so, what would be the output?

Yes, it is valid. Runs of multiple slashes are treated as a single slash, and any trailing slashes are removed before the portion to be extracted is determined.

$ basename -s.txt ~///test.txt///
+test
+

2) Given the file path in the shell variable p, how'd you obtain the outputs shown below?

$ p='~/projects/square_tictactoe/python/game.py'
+$ dirname $(dirname "$p")
+~/projects/square_tictactoe
+
+$ p='/backups/jan_2021.tar.gz'
+$ dirname $(dirname "$p")
+/
+

3) What would be the output of the basename command if the input has no leading directory component or only has the / character?

If there's no leading directory component, the argument is returned as is after removing any trailing slashes. If the input consists only of slashes, the output is a single / character.

$ basename filename.txt
+filename.txt
+$ basename /
+/
+

4) For the paths stored in the shell variable p, how'd you obtain the outputs shown below?

$ p='/a/b/ip.txt /c/d/e/f/op.txt'
+
+# expected output 1
+$ basename -s'.txt' $p
+ip
+op
+
+# expected output 2
+$ dirname $p
+/a/b
+/c/d/e/f
+

5) Given the file path in the shell variable p, how'd you obtain the outputs shown below?

$ p='~/projects/python/square_tictactoe/game.py'
+$ basename $(dirname "$p")
+square_tictactoe
+
+$ p='/backups/aug_2024/ip.tar.gz'
+$ basename $(dirname "$p")
+aug_2024
+
lass-2:before,.fa-hourglass-half:before{content:"\f252"}.fa-hourglass-3:before,.fa-hourglass-end:before{content:"\f253"}.fa-hourglass:before{content:"\f254"}.fa-hand-grab-o:before,.fa-hand-rock-o:before{content:"\f255"}.fa-hand-stop-o:before,.fa-hand-paper-o:before{content:"\f256"}.fa-hand-scissors-o:before{content:"\f257"}.fa-hand-lizard-o:before{content:"\f258"}.fa-hand-spock-o:before{content:"\f259"}.fa-hand-pointer-o:before{content:"\f25a"}.fa-hand-peace-o:before{content:"\f25b"}.fa-trademark:before{content:"\f25c"}.fa-registered:before{content:"\f25d"}.fa-creative-commons:before{content:"\f25e"}.fa-gg:before{content:"\f260"}.fa-gg-circle:before{content:"\f261"}.fa-tripadvisor:before{content:"\f262"}.fa-odnoklassniki:before{content:"\f263"}.fa-odnoklassniki-square:before{content:"\f264"}.fa-get-pocket:before{content:"\f265"}.fa-wikipedia-w:before{content:"\f266"}.fa-safari:before{content:"\f267"}.fa-chrome:before{content:"\f268"}.fa-firefox:before{content:"\f269"}.fa-opera:before{content:"\f26a"}.fa-internet-explorer:before{content:"\f26b"}.fa-tv:before,.fa-television:before{content:"\f26c"}.fa-contao:before{content:"\f26d"}.fa-500px:before{content:"\f26e"}.fa-amazon:before{content:"\f270"}.fa-calendar-plus-o:before{content:"\f271"}.fa-calendar-minus-o:before{content:"\f272"}.fa-calendar-times-o:before{content:"\f273"}.fa-calendar-check-o:before{content:"\f274"}.fa-industry:before{content:"\f275"}.fa-map-pin:before{content:"\f276"}.fa-map-signs:before{content:"\f277"}.fa-map-o:before{content:"\f278"}.fa-map:before{content:"\f279"}.fa-commenting:before{content:"\f27a"}.fa-commenting-o:before{content:"\f27b"}.fa-houzz:before{content:"\f27c"}.fa-vimeo:before{content:"\f27d"}.fa-black-tie:before{content:"\f27e"}.fa-fonticons:before{content:"\f280"}.fa-reddit-alien:before{content:"\f281"}.fa-edge:before{content:"\f282"}.fa-credit-card-alt:before{content:"\f283"}.fa-codiepie:before{content:"\f284"}.fa-modx:before{content:"\f285"}.fa-fort-awesome:before{content:"\f286"
}.fa-usb:before{content:"\f287"}.fa-product-hunt:before{content:"\f288"}.fa-mixcloud:before{content:"\f289"}.fa-scribd:before{content:"\f28a"}.fa-pause-circle:before{content:"\f28b"}.fa-pause-circle-o:before{content:"\f28c"}.fa-stop-circle:before{content:"\f28d"}.fa-stop-circle-o:before{content:"\f28e"}.fa-shopping-bag:before{content:"\f290"}.fa-shopping-basket:before{content:"\f291"}.fa-hashtag:before{content:"\f292"}.fa-bluetooth:before{content:"\f293"}.fa-bluetooth-b:before{content:"\f294"}.fa-percent:before{content:"\f295"}.fa-gitlab:before{content:"\f296"}.fa-wpbeginner:before{content:"\f297"}.fa-wpforms:before{content:"\f298"}.fa-envira:before{content:"\f299"}.fa-universal-access:before{content:"\f29a"}.fa-wheelchair-alt:before{content:"\f29b"}.fa-question-circle-o:before{content:"\f29c"}.fa-blind:before{content:"\f29d"}.fa-audio-description:before{content:"\f29e"}.fa-volume-control-phone:before{content:"\f2a0"}.fa-braille:before{content:"\f2a1"}.fa-assistive-listening-systems:before{content:"\f2a2"}.fa-asl-interpreting:before,.fa-american-sign-language-interpreting:before{content:"\f2a3"}.fa-deafness:before,.fa-hard-of-hearing:before,.fa-deaf:before{content:"\f2a4"}.fa-glide:before{content:"\f2a5"}.fa-glide-g:before{content:"\f2a6"}.fa-signing:before,.fa-sign-language:before{content:"\f2a7"}.fa-low-vision:before{content:"\f2a8"}.fa-viadeo:before{content:"\f2a9"}.fa-viadeo-square:before{content:"\f2aa"}.fa-snapchat:before{content:"\f2ab"}.fa-snapchat-ghost:before{content:"\f2ac"}.fa-snapchat-square:before{content:"\f2ad"}.fa-pied-piper:before{content:"\f2ae"}.fa-first-order:before{content:"\f2b0"}.fa-yoast:before{content:"\f2b1"}.fa-themeisle:before{content:"\f2b2"}.fa-google-plus-circle:before,.fa-google-plus-official:before{content:"\f2b3"}.fa-fa:before,.fa-font-awesome:before{content:"\f2b4"}.fa-handshake-o:before{content:"\f2b5"}.fa-envelope-open:before{content:"\f2b6"}.fa-envelope-open-o:before{content:"\f2b7"}.fa-linode:before{content:"\f2b8"}.fa-address
-book:before{content:"\f2b9"}.fa-address-book-o:before{content:"\f2ba"}.fa-vcard:before,.fa-address-card:before{content:"\f2bb"}.fa-vcard-o:before,.fa-address-card-o:before{content:"\f2bc"}.fa-user-circle:before{content:"\f2bd"}.fa-user-circle-o:before{content:"\f2be"}.fa-user-o:before{content:"\f2c0"}.fa-id-badge:before{content:"\f2c1"}.fa-drivers-license:before,.fa-id-card:before{content:"\f2c2"}.fa-drivers-license-o:before,.fa-id-card-o:before{content:"\f2c3"}.fa-quora:before{content:"\f2c4"}.fa-free-code-camp:before{content:"\f2c5"}.fa-telegram:before{content:"\f2c6"}.fa-thermometer-4:before,.fa-thermometer:before,.fa-thermometer-full:before{content:"\f2c7"}.fa-thermometer-3:before,.fa-thermometer-three-quarters:before{content:"\f2c8"}.fa-thermometer-2:before,.fa-thermometer-half:before{content:"\f2c9"}.fa-thermometer-1:before,.fa-thermometer-quarter:before{content:"\f2ca"}.fa-thermometer-0:before,.fa-thermometer-empty:before{content:"\f2cb"}.fa-shower:before{content:"\f2cc"}.fa-bathtub:before,.fa-s15:before,.fa-bath:before{content:"\f2cd"}.fa-podcast:before{content:"\f2ce"}.fa-window-maximize:before{content:"\f2d0"}.fa-window-minimize:before{content:"\f2d1"}.fa-window-restore:before{content:"\f2d2"}.fa-times-rectangle:before,.fa-window-close:before{content:"\f2d3"}.fa-times-rectangle-o:before,.fa-window-close-o:before{content:"\f2d4"}.fa-bandcamp:before{content:"\f2d5"}.fa-grav:before{content:"\f2d6"}.fa-etsy:before{content:"\f2d7"}.fa-imdb:before{content:"\f2d8"}.fa-ravelry:before{content:"\f2d9"}.fa-eercast:before{content:"\f2da"}.fa-microchip:before{content:"\f2db"}.fa-snowflake-o:before{content:"\f2dc"}.fa-superpowers:before{content:"\f2dd"}.fa-wpexplorer:before{content:"\f2de"}.fa-meetup:before{content:"\f2e0"}.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0, 0, 0, 0);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;margin:0;overflow:visible;clip:auto} diff 
--git a/FontAwesome/fonts/FontAwesome.ttf b/FontAwesome/fonts/FontAwesome.ttf new file mode 100644 index 0000000..35acda2 Binary files /dev/null and b/FontAwesome/fonts/FontAwesome.ttf differ diff --git a/FontAwesome/fonts/fontawesome-webfont.eot b/FontAwesome/fonts/fontawesome-webfont.eot new file mode 100644 index 0000000..e9f60ca Binary files /dev/null and b/FontAwesome/fonts/fontawesome-webfont.eot differ diff --git a/FontAwesome/fonts/fontawesome-webfont.svg b/FontAwesome/fonts/fontawesome-webfont.svg new file mode 100644 index 0000000..855c845 --- /dev/null +++ b/FontAwesome/fonts/fontawesome-webfont.svg @@ -0,0 +1,2671 @@ + + + + +Created by FontForge 20120731 at Mon Oct 24 17:37:40 2016 + By ,,, +Copyright Dave Gandy 2016. All rights reserved. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/FontAwesome/fonts/fontawesome-webfont.ttf b/FontAwesome/fonts/fontawesome-webfont.ttf new file mode 100644 index 0000000..35acda2 Binary files /dev/null and b/FontAwesome/fonts/fontawesome-webfont.ttf differ diff --git a/FontAwesome/fonts/fontawesome-webfont.woff b/FontAwesome/fonts/fontawesome-webfont.woff new file mode 100644 index 0000000..400014a Binary files /dev/null and b/FontAwesome/fonts/fontawesome-webfont.woff differ diff --git a/FontAwesome/fonts/fontawesome-webfont.woff2 b/FontAwesome/fonts/fontawesome-webfont.woff2 new file mode 100644 index 0000000..4d13fc6 Binary files /dev/null and b/FontAwesome/fonts/fontawesome-webfont.woff2 differ diff --git a/LICENSE b/LICENSE deleted file mode 100644 index 36b1c65..0000000 --- a/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2021 Sundeep Agarwal - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/README.md b/README.md deleted file mode 100644 index ef9440d..0000000 --- a/README.md +++ /dev/null @@ -1,2 +0,0 @@ -# cli_text_processing_coreutils -Command line text processing with GNU Coreutils diff --git a/ayu-highlight.css b/ayu-highlight.css new file mode 100644 index 0000000..32c9432 --- /dev/null +++ b/ayu-highlight.css @@ -0,0 +1,78 @@ +/* +Based off of the Ayu theme +Original by Dempfi (https://github.com/dempfi/ayu) +*/ + +.hljs { + display: block; + overflow-x: auto; + background: #191f26; + color: #e6e1cf; +} + +.hljs-comment, +.hljs-quote { + color: #5c6773; + font-style: italic; +} + +.hljs-variable, +.hljs-template-variable, +.hljs-attribute, +.hljs-attr, +.hljs-regexp, +.hljs-link, +.hljs-selector-id, +.hljs-selector-class { + color: #ff7733; +} + +.hljs-number, +.hljs-meta, +.hljs-builtin-name, +.hljs-literal, +.hljs-type, +.hljs-params { + color: #ffee99; +} + +.hljs-string, +.hljs-bullet { + color: #b8cc52; +} + +.hljs-title, +.hljs-built_in, +.hljs-section { + color: #ffb454; +} + +.hljs-keyword, +.hljs-selector-tag, +.hljs-symbol { + color: #ff7733; +} + +.hljs-name { + color: #36a3d9; +} + +.hljs-tag { + color: #00568d; +} + +.hljs-emphasis { + font-style: italic; +} + +.hljs-strong { + font-weight: bold; +} + +.hljs-addition { + color: #91b362; +} + +.hljs-deletion { + color: #d96c75; +} diff --git a/basename-dirname.html b/basename-dirname.html new file mode 100644 index 0000000..37be55c --- /dev/null +++ b/basename-dirname.html @@ -0,0 +1,119 @@ +basename and dirname - CLI text processing with GNU Coreutils

basename and dirname

These handy commands allow you to extract the filename and directory portions of given paths. You could also use Parameter Expansion or cut, sed, awk, etc. for such purposes. The advantage of basename and dirname is that they handle corner cases like trailing slashes, and they provide handy features like removing file extensions.
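As a point of comparison, here's a minimal sketch of the Parameter Expansion equivalents mentioned above. Note that unlike basename and dirname, these expansions don't account for trailing slashes:

```shell
p='/home/learnbyexample/example_files/scores.csv'

# ${p##*/} deletes the longest prefix ending in '/' (like basename)
echo "${p##*/}"
# scores.csv

# ${p%/*} deletes the shortest suffix starting with '/' (like dirname)
echo "${p%/*}"
# /home/learnbyexample/example_files
```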

Extract filename from paths

By default, the basename command deletes any leading directory components from the given path argument. Any trailing slashes are removed before determining the portion to be extracted.

$ basename /home/learnbyexample/example_files/scores.csv
+scores.csv
+
+# quote the arguments when needed
+$ basename 'path with spaces/report.log'
+report.log
+
+# one or more trailing slashes will not affect the output
+$ basename /home/learnbyexample/example_files/
+example_files
+

If there's no leading directory component, the argument is returned as is after removing any trailing slashes. If the input is the / character alone, the output is / itself.

$ basename filename.txt
+filename.txt
+$ basename /
+/
+

Remove file extension

You can use the -s option to remove a suffix from the filename. This is usually used to remove the file extension.

$ basename -s'.csv' /home/learnbyexample/example_files/scores.csv
+scores
+
+$ basename -s'_2' final_report.txt_2
+final_report.txt
+
+$ basename -s'.tar.gz' /backups/jan_2021.tar.gz
+jan_2021
+
+$ basename -s'.txt' purchases.txt.txt
+purchases.txt
+
+# -s will be ignored if it would have resulted in an empty output
+$ basename -s'report' /backups/report
+report
+

You can also pass the suffix to be removed after the path argument, but the -s option is preferred as it makes the intention clearer and works for multiple path arguments.

$ basename example_files/scores.csv .csv
+scores
+

Remove filename from path

By default, the dirname command removes the trailing path component (after removing any trailing slashes).

$ dirname /home/learnbyexample/example_files/scores.csv
+/home/learnbyexample/example_files
+
+# one or more trailing slashes will not affect the output
+$ dirname /home/learnbyexample/example_files/
+/home/learnbyexample
+

Multiple arguments

The dirname command accepts multiple path arguments by default. The basename command requires the -a or -s option (the latter implies -a) to work with multiple arguments.

$ basename -a /backups/jan_2021.tar.gz /home/learnbyexample/report.log
+jan_2021.tar.gz
+report.log
+
+# -a is implied when the -s option is used
+$ basename -s'.txt' logs/purchases.txt logs/report.txt
+purchases
+report
+
+# dirname accepts multiple path arguments by default
+$ dirname /home/learnbyexample/example_files/scores.csv ../report/backups/
+/home/learnbyexample/example_files
+../report
+

Combining basename and dirname

You can use shell features like command substitution to combine the effects of the basename and dirname commands.

# extract the second last path component
+$ basename $(dirname /home/learnbyexample/example_files/scores.csv)
+example_files
+
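One thing to watch out for: quote both the command substitution and the inner variable so that paths containing spaces or other shell metacharacters are preserved intact. A small sketch, using a made-up path:

```shell
p='path with spaces/example_files/scores.csv'

# both the inner "$p" and the outer "$(...)" need quoting
basename "$(dirname "$p")"
# example_files
```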

NUL separator

Use the -z option if you want the NUL character as the output separator instead of the newline character.


+$ basename -zs'.txt' logs/purchases.txt logs/report.txt | cat -v
+purchases^@report^@
+
+$ basename -z logs/purchases.txt | cat -v
+purchases.txt^@
+
+$ dirname -z example_files/scores.csv ../report/backups/ | cat -v
+example_files^@../report^@
+
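NUL-separated output is designed to pair with tools that accept NUL-delimited input, such as xargs -0, so that even filenames containing newlines are handled safely. A minimal sketch (the paths here are made up; basename operates purely on strings, so they don't need to exist):

```shell
# pass each extracted filename to xargs as a separate, NUL-delimited item
basename -az logs/purchases.txt logs/report.txt | xargs -0 -n1
# purchases.txt
# report.txt
```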

Exercises

1) Is the following command valid? If so, what would be the output?

$ basename -s.txt ~///test.txt///
+

2) Given the file path in the shell variable p, how'd you obtain the outputs shown below?

$ p='~/projects/square_tictactoe/python/game.py'
+##### add your solution here
+~/projects/square_tictactoe
+
+$ p='/backups/jan_2021.tar.gz'
+##### add your solution here
+/
+

3) What would be the output of the basename command if the input has no leading directory component or only has the / character?

4) For the paths stored in the shell variable p, how'd you obtain the outputs shown below?

$ p='/a/b/ip.txt /c/d/e/f/op.txt'
+
+# expected output 1
+##### add your solution here
+ip
+op
+
+# expected output 2
+##### add your solution here
+/a/b
+/c/d/e/f
+

5) Given the file path in the shell variable p, how'd you obtain the outputs shown below?

$ p='~/projects/python/square_tictactoe/game.py'
+##### add your solution here
+square_tictactoe
+
+$ p='/backups/aug_2024/ip.tar.gz'
+##### add your solution here
+aug_2024
+
\ No newline at end of file diff --git a/book.js b/book.js new file mode 100644 index 0000000..d40440c --- /dev/null +++ b/book.js @@ -0,0 +1,679 @@ +"use strict"; + +// Fix back button cache problem +window.onunload = function () { }; + +// Global variable, shared between modules +function playground_text(playground) { + let code_block = playground.querySelector("code"); + + if (window.ace && code_block.classList.contains("editable")) { + let editor = window.ace.edit(code_block); + return editor.getValue(); + } else { + return code_block.textContent; + } +} + +(function codeSnippets() { + function fetch_with_timeout(url, options, timeout = 6000) { + return Promise.race([ + fetch(url, options), + new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), timeout)) + ]); + } + + var playgrounds = Array.from(document.querySelectorAll(".playground")); + if (playgrounds.length > 0) { + fetch_with_timeout("https://play.rust-lang.org/meta/crates", { + headers: { + 'Content-Type': "application/json", + }, + method: 'POST', + mode: 'cors', + }) + .then(response => response.json()) + .then(response => { + // get list of crates available in the rust playground + let playground_crates = response.crates.map(item => item["id"]); + playgrounds.forEach(block => handle_crate_list_update(block, playground_crates)); + }); + } + + function handle_crate_list_update(playground_block, playground_crates) { + // update the play buttons after receiving the response + update_play_button(playground_block, playground_crates); + + // and install on change listener to dynamically update ACE editors + if (window.ace) { + let code_block = playground_block.querySelector("code"); + if (code_block.classList.contains("editable")) { + let editor = window.ace.edit(code_block); + editor.addEventListener("change", function (e) { + update_play_button(playground_block, playground_crates); + }); + // add Ctrl-Enter command to execute rust code + editor.commands.addCommand({ + name: "run", + 
bindKey: { + win: "Ctrl-Enter", + mac: "Ctrl-Enter" + }, + exec: _editor => run_rust_code(playground_block) + }); + } + } + } + + // updates the visibility of play button based on `no_run` class and + // used crates vs ones available on http://play.rust-lang.org + function update_play_button(pre_block, playground_crates) { + var play_button = pre_block.querySelector(".play-button"); + + // skip if code is `no_run` + if (pre_block.querySelector('code').classList.contains("no_run")) { + play_button.classList.add("hidden"); + return; + } + + // get list of `extern crate`'s from snippet + var txt = playground_text(pre_block); + var re = /extern\s+crate\s+([a-zA-Z_0-9]+)\s*;/g; + var snippet_crates = []; + var item; + while (item = re.exec(txt)) { + snippet_crates.push(item[1]); + } + + // check if all used crates are available on play.rust-lang.org + var all_available = snippet_crates.every(function (elem) { + return playground_crates.indexOf(elem) > -1; + }); + + if (all_available) { + play_button.classList.remove("hidden"); + } else { + play_button.classList.add("hidden"); + } + } + + function run_rust_code(code_block) { + var result_block = code_block.querySelector(".result"); + if (!result_block) { + result_block = document.createElement('code'); + result_block.className = 'result hljs language-bash'; + + code_block.append(result_block); + } + + let text = playground_text(code_block); + let classes = code_block.querySelector('code').classList; + let edition = "2015"; + if(classes.contains("edition2018")) { + edition = "2018"; + } else if(classes.contains("edition2021")) { + edition = "2021"; + } + var params = { + version: "stable", + optimize: "0", + code: text, + edition: edition + }; + + if (text.indexOf("#![feature") !== -1) { + params.version = "nightly"; + } + + result_block.innerText = "Running..."; + + fetch_with_timeout("https://play.rust-lang.org/evaluate.json", { + headers: { + 'Content-Type': "application/json", + }, + method: 'POST', + mode: 'cors', + 
body: JSON.stringify(params) + }) + .then(response => response.json()) + .then(response => { + if (response.result.trim() === '') { + result_block.innerText = "No output"; + result_block.classList.add("result-no-output"); + } else { + result_block.innerText = response.result; + result_block.classList.remove("result-no-output"); + } + }) + .catch(error => result_block.innerText = "Playground Communication: " + error.message); + } + + // Syntax highlighting Configuration + hljs.configure({ + tabReplace: ' ', // 4 spaces + languages: [], // Languages used for auto-detection + }); + + let code_nodes = Array + .from(document.querySelectorAll('code')) + // Don't highlight `inline code` blocks in headers. + .filter(function (node) {return !node.parentElement.classList.contains("header"); }); + + if (window.ace) { + // language-rust class needs to be removed for editable + // blocks or highlightjs will capture events + code_nodes + .filter(function (node) {return node.classList.contains("editable"); }) + .forEach(function (block) { block.classList.remove('language-rust'); }); + + Array + code_nodes + .filter(function (node) {return !node.classList.contains("editable"); }) + .forEach(function (block) { hljs.highlightBlock(block); }); + } else { + code_nodes.forEach(function (block) { hljs.highlightBlock(block); }); + } + + // Adding the hljs class gives code blocks the color css + // even if highlighting doesn't apply + code_nodes.forEach(function (block) { block.classList.add('hljs'); }); + + Array.from(document.querySelectorAll("code.language-rust")).forEach(function (block) { + + var lines = Array.from(block.querySelectorAll('.boring')); + // If no lines were hidden, return + if (!lines.length) { return; } + block.classList.add("hide-boring"); + + var buttons = document.createElement('div'); + buttons.className = 'buttons'; + buttons.innerHTML = ""; + + // add expand button + var pre_block = block.parentNode; + pre_block.insertBefore(buttons, pre_block.firstChild); + + 
pre_block.querySelector('.buttons').addEventListener('click', function (e) { + if (e.target.classList.contains('fa-eye')) { + e.target.classList.remove('fa-eye'); + e.target.classList.add('fa-eye-slash'); + e.target.title = 'Hide lines'; + e.target.setAttribute('aria-label', e.target.title); + + block.classList.remove('hide-boring'); + } else if (e.target.classList.contains('fa-eye-slash')) { + e.target.classList.remove('fa-eye-slash'); + e.target.classList.add('fa-eye'); + e.target.title = 'Show hidden lines'; + e.target.setAttribute('aria-label', e.target.title); + + block.classList.add('hide-boring'); + } + }); + }); + + if (window.playground_copyable) { + Array.from(document.querySelectorAll('pre code')).forEach(function (block) { + var pre_block = block.parentNode; + if (!pre_block.classList.contains('playground')) { + var buttons = pre_block.querySelector(".buttons"); + if (!buttons) { + buttons = document.createElement('div'); + buttons.className = 'buttons'; + pre_block.insertBefore(buttons, pre_block.firstChild); + } + + var clipButton = document.createElement('button'); + clipButton.className = 'fa fa-copy clip-button'; + clipButton.title = 'Copy to clipboard'; + clipButton.setAttribute('aria-label', clipButton.title); + clipButton.innerHTML = ''; + + buttons.insertBefore(clipButton, buttons.firstChild); + } + }); + } + + // Process playground code blocks + Array.from(document.querySelectorAll(".playground")).forEach(function (pre_block) { + // Add play button + var buttons = pre_block.querySelector(".buttons"); + if (!buttons) { + buttons = document.createElement('div'); + buttons.className = 'buttons'; + pre_block.insertBefore(buttons, pre_block.firstChild); + } + + var runCodeButton = document.createElement('button'); + runCodeButton.className = 'fa fa-play play-button'; + runCodeButton.hidden = true; + runCodeButton.title = 'Run this code'; + runCodeButton.setAttribute('aria-label', runCodeButton.title); + + buttons.insertBefore(runCodeButton, 
buttons.firstChild); + runCodeButton.addEventListener('click', function (e) { + run_rust_code(pre_block); + }); + + if (window.playground_copyable) { + var copyCodeClipboardButton = document.createElement('button'); + copyCodeClipboardButton.className = 'fa fa-copy clip-button'; + copyCodeClipboardButton.innerHTML = ''; + copyCodeClipboardButton.title = 'Copy to clipboard'; + copyCodeClipboardButton.setAttribute('aria-label', copyCodeClipboardButton.title); + + buttons.insertBefore(copyCodeClipboardButton, buttons.firstChild); + } + + let code_block = pre_block.querySelector("code"); + if (window.ace && code_block.classList.contains("editable")) { + var undoChangesButton = document.createElement('button'); + undoChangesButton.className = 'fa fa-history reset-button'; + undoChangesButton.title = 'Undo changes'; + undoChangesButton.setAttribute('aria-label', undoChangesButton.title); + + buttons.insertBefore(undoChangesButton, buttons.firstChild); + + undoChangesButton.addEventListener('click', function () { + let editor = window.ace.edit(code_block); + editor.setValue(editor.originalCode); + editor.clearSelection(); + }); + } + }); +})(); + +(function themes() { + var html = document.querySelector('html'); + var themeToggleButton = document.getElementById('theme-toggle'); + var themePopup = document.getElementById('theme-list'); + var themeColorMetaTag = document.querySelector('meta[name="theme-color"]'); + var stylesheets = { + ayuHighlight: document.querySelector("[href$='ayu-highlight.css']"), + tomorrowNight: document.querySelector("[href$='tomorrow-night.css']"), + highlight: document.querySelector("[href$='highlight.css']"), + }; + + function showThemes() { + themePopup.style.display = 'block'; + themeToggleButton.setAttribute('aria-expanded', true); + themePopup.querySelector("button#" + get_theme()).focus(); + } + + function hideThemes() { + themePopup.style.display = 'none'; + themeToggleButton.setAttribute('aria-expanded', false); + 
themeToggleButton.focus(); + } + + function get_theme() { + var theme; + try { theme = localStorage.getItem('mdbook-theme'); } catch (e) { } + if (theme === null || theme === undefined) { + return default_theme; + } else { + return theme; + } + } + + function set_theme(theme, store = true) { + let ace_theme; + + if (theme == 'coal' || theme == 'navy') { + stylesheets.ayuHighlight.disabled = true; + stylesheets.tomorrowNight.disabled = false; + stylesheets.highlight.disabled = true; + + ace_theme = "ace/theme/tomorrow_night"; + } else if (theme == 'ayu') { + stylesheets.ayuHighlight.disabled = false; + stylesheets.tomorrowNight.disabled = true; + stylesheets.highlight.disabled = true; + ace_theme = "ace/theme/tomorrow_night"; + } else { + stylesheets.ayuHighlight.disabled = true; + stylesheets.tomorrowNight.disabled = true; + stylesheets.highlight.disabled = false; + ace_theme = "ace/theme/dawn"; + } + + setTimeout(function () { + themeColorMetaTag.content = getComputedStyle(document.body).backgroundColor; + }, 1); + + if (window.ace && window.editors) { + window.editors.forEach(function (editor) { + editor.setTheme(ace_theme); + }); + } + + var previousTheme = get_theme(); + + if (store) { + try { localStorage.setItem('mdbook-theme', theme); } catch (e) { } + } + + html.classList.remove(previousTheme); + html.classList.add(theme); + } + + // Set theme + var theme = get_theme(); + + set_theme(theme, false); + + themeToggleButton.addEventListener('click', function () { + if (themePopup.style.display === 'block') { + hideThemes(); + } else { + showThemes(); + } + }); + + themePopup.addEventListener('click', function (e) { + var theme; + if (e.target.className === "theme") { + theme = e.target.id; + } else if (e.target.parentElement.className === "theme") { + theme = e.target.parentElement.id; + } else { + return; + } + set_theme(theme); + }); + + themePopup.addEventListener('focusout', function(e) { + // e.relatedTarget is null in Safari and Firefox on macOS (see 
workaround below) + if (!!e.relatedTarget && !themeToggleButton.contains(e.relatedTarget) && !themePopup.contains(e.relatedTarget)) { + hideThemes(); + } + }); + + // Should not be needed, but it works around an issue on macOS & iOS: https://github.com/rust-lang/mdBook/issues/628 + document.addEventListener('click', function(e) { + if (themePopup.style.display === 'block' && !themeToggleButton.contains(e.target) && !themePopup.contains(e.target)) { + hideThemes(); + } + }); + + document.addEventListener('keydown', function (e) { + if (e.altKey || e.ctrlKey || e.metaKey || e.shiftKey) { return; } + if (!themePopup.contains(e.target)) { return; } + + switch (e.key) { + case 'Escape': + e.preventDefault(); + hideThemes(); + break; + case 'ArrowUp': + e.preventDefault(); + var li = document.activeElement.parentElement; + if (li && li.previousElementSibling) { + li.previousElementSibling.querySelector('button').focus(); + } + break; + case 'ArrowDown': + e.preventDefault(); + var li = document.activeElement.parentElement; + if (li && li.nextElementSibling) { + li.nextElementSibling.querySelector('button').focus(); + } + break; + case 'Home': + e.preventDefault(); + themePopup.querySelector('li:first-child button').focus(); + break; + case 'End': + e.preventDefault(); + themePopup.querySelector('li:last-child button').focus(); + break; + } + }); +})(); + +(function sidebar() { + var html = document.querySelector("html"); + var sidebar = document.getElementById("sidebar"); + var sidebarLinks = document.querySelectorAll('#sidebar a'); + var sidebarToggleButton = document.getElementById("sidebar-toggle"); + var sidebarResizeHandle = document.getElementById("sidebar-resize-handle"); + var firstContact = null; + + function showSidebar() { + html.classList.remove('sidebar-hidden') + html.classList.add('sidebar-visible'); + Array.from(sidebarLinks).forEach(function (link) { + link.setAttribute('tabIndex', 0); + }); + sidebarToggleButton.setAttribute('aria-expanded', true); + 
sidebar.setAttribute('aria-hidden', false); + try { localStorage.setItem('mdbook-sidebar', 'visible'); } catch (e) { } + } + + + var sidebarAnchorToggles = document.querySelectorAll('#sidebar a.toggle'); + + function toggleSection(ev) { + ev.currentTarget.parentElement.classList.toggle('expanded'); + } + + Array.from(sidebarAnchorToggles).forEach(function (el) { + el.addEventListener('click', toggleSection); + }); + + function hideSidebar() { + html.classList.remove('sidebar-visible') + html.classList.add('sidebar-hidden'); + Array.from(sidebarLinks).forEach(function (link) { + link.setAttribute('tabIndex', -1); + }); + sidebarToggleButton.setAttribute('aria-expanded', false); + sidebar.setAttribute('aria-hidden', true); + try { localStorage.setItem('mdbook-sidebar', 'hidden'); } catch (e) { } + } + + // Toggle sidebar + sidebarToggleButton.addEventListener('click', function sidebarToggle() { + if (html.classList.contains("sidebar-hidden")) { + var current_width = parseInt( + document.documentElement.style.getPropertyValue('--sidebar-width'), 10); + if (current_width < 150) { + document.documentElement.style.setProperty('--sidebar-width', '150px'); + } + showSidebar(); + } else if (html.classList.contains("sidebar-visible")) { + hideSidebar(); + } else { + if (getComputedStyle(sidebar)['transform'] === 'none') { + hideSidebar(); + } else { + showSidebar(); + } + } + }); + + sidebarResizeHandle.addEventListener('mousedown', initResize, false); + + function initResize(e) { + window.addEventListener('mousemove', resize, false); + window.addEventListener('mouseup', stopResize, false); + html.classList.add('sidebar-resizing'); + } + function resize(e) { + var pos = (e.clientX - sidebar.offsetLeft); + if (pos < 20) { + hideSidebar(); + } else { + if (html.classList.contains("sidebar-hidden")) { + showSidebar(); + } + pos = Math.min(pos, window.innerWidth - 100); + document.documentElement.style.setProperty('--sidebar-width', pos + 'px'); + } + } + //on mouseup remove 
windows functions mousemove & mouseup + function stopResize(e) { + html.classList.remove('sidebar-resizing'); + window.removeEventListener('mousemove', resize, false); + window.removeEventListener('mouseup', stopResize, false); + } + + document.addEventListener('touchstart', function (e) { + firstContact = { + x: e.touches[0].clientX, + time: Date.now() + }; + }, { passive: true }); + + document.addEventListener('touchmove', function (e) { + if (!firstContact) + return; + + var curX = e.touches[0].clientX; + var xDiff = curX - firstContact.x, + tDiff = Date.now() - firstContact.time; + + if (tDiff < 250 && Math.abs(xDiff) >= 150) { + if (xDiff >= 0 && firstContact.x < Math.min(document.body.clientWidth * 0.25, 300)) + showSidebar(); + else if (xDiff < 0 && curX < 300) + hideSidebar(); + + firstContact = null; + } + }, { passive: true }); + + // Scroll sidebar to current active section + var activeSection = document.getElementById("sidebar").querySelector(".active"); + if (activeSection) { + // https://developer.mozilla.org/en-US/docs/Web/API/Element/scrollIntoView + activeSection.scrollIntoView({ block: 'center' }); + } +})(); + +(function chapterNavigation() { + document.addEventListener('keydown', function (e) { + if (e.altKey || e.ctrlKey || e.metaKey || e.shiftKey) { return; } + if (window.search && window.search.hasFocus()) { return; } + + switch (e.key) { + case 'ArrowRight': + e.preventDefault(); + var nextButton = document.querySelector('.nav-chapters.next'); + if (nextButton) { + window.location.href = nextButton.href; + } + break; + case 'ArrowLeft': + e.preventDefault(); + var previousButton = document.querySelector('.nav-chapters.previous'); + if (previousButton) { + window.location.href = previousButton.href; + } + break; + } + }); +})(); + +(function clipboard() { + var clipButtons = document.querySelectorAll('.clip-button'); + + function hideTooltip(elem) { + elem.firstChild.innerText = ""; + elem.className = 'fa fa-copy clip-button'; + } + + 
function showTooltip(elem, msg) { + elem.firstChild.innerText = msg; + elem.className = 'fa fa-copy tooltipped'; + } + + var clipboardSnippets = new ClipboardJS('.clip-button', { + text: function (trigger) { + hideTooltip(trigger); + let playground = trigger.closest("pre"); + return playground_text(playground); + } + }); + + Array.from(clipButtons).forEach(function (clipButton) { + clipButton.addEventListener('mouseout', function (e) { + hideTooltip(e.currentTarget); + }); + }); + + clipboardSnippets.on('success', function (e) { + e.clearSelection(); + showTooltip(e.trigger, "Copied!"); + }); + + clipboardSnippets.on('error', function (e) { + showTooltip(e.trigger, "Clipboard error!"); + }); +})(); + +(function scrollToTop () { + var menuTitle = document.querySelector('.menu-title'); + + menuTitle.addEventListener('click', function () { + document.scrollingElement.scrollTo({ top: 0, behavior: 'smooth' }); + }); +})(); + +(function controllMenu() { + var menu = document.getElementById('menu-bar'); + + (function controllPosition() { + var scrollTop = document.scrollingElement.scrollTop; + var prevScrollTop = scrollTop; + var minMenuY = -menu.clientHeight - 50; + // When the script loads, the page can be at any scroll (e.g. if you reforesh it). 
+ menu.style.top = scrollTop + 'px'; + // Same as parseInt(menu.style.top.slice(0, -2), but faster + var topCache = menu.style.top.slice(0, -2); + menu.classList.remove('sticky'); + var stickyCache = false; // Same as menu.classList.contains('sticky'), but faster + document.addEventListener('scroll', function () { + scrollTop = Math.max(document.scrollingElement.scrollTop, 0); + // `null` means that it doesn't need to be updated + var nextSticky = null; + var nextTop = null; + var scrollDown = scrollTop > prevScrollTop; + var menuPosAbsoluteY = topCache - scrollTop; + if (scrollDown) { + nextSticky = false; + if (menuPosAbsoluteY > 0) { + nextTop = prevScrollTop; + } + } else { + if (menuPosAbsoluteY > 0) { + nextSticky = true; + } else if (menuPosAbsoluteY < minMenuY) { + nextTop = prevScrollTop + minMenuY; + } + } + if (nextSticky === true && stickyCache === false) { + menu.classList.add('sticky'); + stickyCache = true; + } else if (nextSticky === false && stickyCache === true) { + menu.classList.remove('sticky'); + stickyCache = false; + } + if (nextTop !== null) { + menu.style.top = nextTop + 'px'; + topCache = nextTop; + } + prevScrollTop = scrollTop; + }, { passive: true }); + })(); + (function controllBorder() { + menu.classList.remove('bordered'); + document.addEventListener('scroll', function () { + if (menu.offsetTop === 0) { + menu.classList.remove('bordered'); + } else { + menu.classList.add('bordered'); + } + }, { passive: true }); + })(); +})(); diff --git a/buy.html b/buy.html new file mode 100644 index 0000000..56000c8 --- /dev/null +++ b/buy.html @@ -0,0 +1,31 @@ +Buy PDF/EPUB versions - CLI text processing with GNU Coreutils

Buy PDF/EPUB versions

You can buy the pdf/epub versions of the book using these links:

Bundles

You can also get the book as part of these bundles:

Testimonials

In my opinion the book does a great job of quickly presenting examples of how commands can be used and then paired up to achieve new or interesting ways of manipulating data. Throughout the text there are little highlights offering tips on extra functionality or limitations of certain commands. For instance, when discussing the shuf command we're warned that shuf will not work with multiple files. However, we can merge multiple files together (using the cat command) and then pass them to shuf. These little gems of wisdom add a dimension to the book and will likely save the reader some time wondering why their scripts are not working as expected.

— book review by Jesse Smith on distrowatch.com

I discovered your books recently and they’re awesome, thank you! As a 20 year *nix user, they made me realize how much more there is to these rock solid and ancient tools, once you spend the time to actually learn the intricacies of them.

— feedback on reddit

Book list

Here's a list of programming books I've written:

\ No newline at end of file diff --git a/cat-tac.html b/cat-tac.html new file mode 100644 index 0000000..90a35da --- /dev/null +++ b/cat-tac.html @@ -0,0 +1,360 @@ +cat and tac - CLI text processing with GNU Coreutils

cat and tac

cat derives its name from concatenation and provides other nifty options too.

tac helps you reverse the input line wise; the result is usually passed on for further text processing.

Creating text files

Yeah, cat can be used to write contents to a file by typing them from the terminal itself. If you invoke cat without providing file arguments or stdin data from a pipe, it will wait for you to type the content. After you are done typing all the text you want to save, press Enter and then the Ctrl+d key combination. If you don't want the last line to have a newline character, press Ctrl+d twice instead of Enter and Ctrl+d. See also unix.stackexchange: difference between Ctrl+c and Ctrl+d.

# press Enter and Ctrl+d after typing all the required characters
+$ cat > greeting.txt
+Hi there
+Have a nice day
+

In the above example, the output of cat is redirected to a file named greeting.txt. If you don't redirect the stdout data, each line will be echoed as you type. You can check the contents of the file you just created by using cat again.

$ cat greeting.txt
+Hi there
+Have a nice day
+

Here Documents is another popular way to create such files. In this case, the termination condition is a line matching a predefined string which is specified after the << redirection operator. This is especially helpful for automation, since pressing Ctrl+d interactively isn't desirable. Here's an example:

# > and a space at the start of lines represents the secondary prompt PS2
+# don't type them in a shell script
+# EOF is typically used as the identifier
+$ cat << 'EOF' > fruits.txt
+> banana
+> papaya
+> mango
+> EOF
+
+$ cat fruits.txt
+banana
+papaya
+mango
+

The termination string is enclosed in single quotes to prevent parameter expansion, command substitution, etc. You can also use \string for this purpose. If you use <<- instead of <<, you can use leading tab characters for indentation purposes. See bash manual: Here Documents and stackoverflow: here-documents for more examples and details.
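Here's a small sketch of the <<- variant; the two indented lines below start with literal tab characters, which get stripped from the output (the temporary filename is made up for illustration):

```shell
# the two indented lines below start with a literal tab character;
# <<- strips leading tabs (the 'end' terminator may also be tab-indented)
cat <<- 'end' > ind_tmp.txt
	hello
	world
end

cat ind_tmp.txt
# hello
# world

rm ind_tmp.txt
```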

info Note that creating files as shown above isn't restricted to cat; the technique can be applied to any command waiting for stdin.

# 'tr' converts lowercase alphabets to uppercase in this example
+$ tr 'a-z' 'A-Z' << 'end' > op.txt
+> hi there
+> have a nice day
+> end
+
+$ cat op.txt
+HI THERE
+HAVE A NICE DAY
+

Concatenate files

Here are some examples to showcase cat's main utility. One or more files can be passed as arguments.

$ cat greeting.txt fruits.txt nums.txt
+Hi there
+Have a nice day
+banana
+papaya
+mango
+3.14
+42
+1000
+

info Visit the cli_text_processing_coreutils repo to get all the example files used in this book.

To save the output of concatenation, use the shell's redirection features.

$ cat greeting.txt fruits.txt nums.txt > op.txt
+
+$ cat op.txt
+Hi there
+Have a nice day
+banana
+papaya
+mango
+3.14
+42
+1000
+

Accepting stdin data

You can represent the stdin data using - as a file argument. If no file arguments are given, cat will read stdin data if available, or wait for interactive input as seen earlier.

# only stdin (- is optional in this case)
+$ echo 'apple banana cherry' | cat
+apple banana cherry
+
+# both stdin and file arguments
+$ echo 'apple banana cherry' | cat greeting.txt -
+Hi there
+Have a nice day
+apple banana cherry
+
+# here's an example without a newline character at the end of the first input
+$ printf 'Some\nNumbers' | cat - nums.txt
+Some
+Numbers3.14
+42
+1000
+

Squeeze consecutive empty lines

As mentioned before, cat provides many features beyond concatenation. Consider this sample stdin data:

$ printf 'hello\n\n\nworld\n\nhave a nice day\n\n\n\n\n\napple\n'
+hello
+
+
+world
+
+have a nice day
+
+
+
+
+
+apple
+

You can use the -s option to squeeze consecutive empty lines to a single empty line. If present, leading and trailing empty lines will also be squeezed (won't be completely removed). You can modify the below example to test it out.

$ printf 'hello\n\n\nworld\n\nhave a nice day\n\n\n\n\n\napple\n' | cat -s
+hello
+
+world
+
+have a nice day
+
+apple
+
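For instance, counting the output lines confirms that the leading and trailing runs of empty lines are squeezed to a single empty line each, not removed:

```shell
# three leading and three trailing empty lines each squeeze down to one,
# leaving 3 lines in total: empty, 'hi', empty
printf '\n\n\nhi\n\n\n' | cat -s | wc -l
# 3
```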

Prefix line numbers

The -n option will prefix line numbers and a tab character to each input line. The line numbers are right justified to occupy a minimum of 6 characters, with space as the filler.

$ cat -n greeting.txt fruits.txt nums.txt
+     1  Hi there
+     2  Have a nice day
+     3  banana
+     4  papaya
+     5  mango
+     6  3.14
+     7  42
+     8  1000
+

Use the -b option instead of -n if you don't want empty lines to be numbered.

# -n option numbers all the input lines
+$ printf 'apple\n\nbanana\n\ncherry\n' | cat -n
+     1  apple
+     2  
+     3  banana
+     4  
+     5  cherry
+
+# -b option numbers only the non-empty lines
+$ printf 'apple\n\nbanana\n\ncherry\n' | cat -b
+     1  apple
+
+     2  banana
+
+     3  cherry
+

info Use the nl command if you want more customization options like number formatting, separator string, regular expression based filtering and so on.
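As a quick taste of nl, the -w and -s options control the number width and the separator string (by default, nl numbers only the non-empty lines):

```shell
# -w sets the minimum width for numbers, -s sets the separator string
printf 'apple\nbanana\n' | nl -w 2 -s ') '
#  1) apple
#  2) banana
```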

Viewing special characters

Characters like backspace and carriage return will mangle the contents if viewed naively on the terminal. Characters like NUL won't even be visible. You can use the -v option to show such characters using the caret notation (see wikipedia: Control code chart for details). See this unix.stackexchange thread for non-ASCII examples.

# example for backspace and carriage return characters
+$ printf 'mar\bt\nbike\rp\n'
+mat
+pike
+$ printf 'mar\bt\nbike\rp\n' | cat -v
+mar^Ht
+bike^Mp
+
+# NUL character
+$ printf 'car\0jeep\0bus\0' | cat -v
+car^@jeep^@bus^@
+
+# form-feed and vertical-tab
+$ printf '1 2\t3\f4\v5\n' | cat -v
+1 2     3^L4^K5
+

The -v option doesn't cover the newline and tab characters. You can use the -T option to spot tab characters.

$ printf 'good food\tnice dice\napple\tbanana\tcherry\n' | cat -T
+good food^Inice dice
+apple^Ibanana^Icherry
+

The -E option adds a $ marker at the end of input lines. This is useful to spot trailing whitespace characters.

$ printf 'ice   \nwater\n cool  \n chill\n' | cat -E
+ice   $
+water$
+ cool  $
+ chill$
+

The following options combine two or more of the above options:

  • -e option is equivalent to -vE
  • -t option is equivalent to -vT
  • -A option is equivalent to -vET
$ printf 'mar\bt\nbike\rp\n' | cat -e
+mar^Ht$
+bike^Mp$
+
+$ printf '1 2\t3\f4\v5\n' | cat -t
+1 2^I3^L4^K5
+
+$ printf '1 2\t3\f4\v5\n' | cat -A
+1 2^I3^L4^K5$
+

Useless use of cat

Using cat to view the contents of a file, to concatenate files and so on is all well and good. But using cat when it is not needed is a bad habit that you should avoid. See wikipedia: UUOC and Useless Use of Cat Award for more details.

Most commands that you'll see in this book can directly work with file arguments, so you shouldn't use cat to pipe the contents for such cases. Here's a single file example:

# useless use of cat
+$ cat greeting.txt | sed -E 's/\w+/\L\u&/g'
+Hi There
+Have A Nice Day
+
+# sed can handle file arguments
+$ sed -E 's/\w+/\L\u&/g' greeting.txt
+Hi There
+Have A Nice Day
+

If you prefer having the file argument before the command, you can use the shell's redirection feature to supply input data instead of cat. This also applies to commands like tr that do not accept file arguments.

# useless use of cat
+$ cat greeting.txt | tr 'a-z' 'A-Z'
+HI THERE
+HAVE A NICE DAY
+
+# use shell redirection instead
+$ <greeting.txt tr 'a-z' 'A-Z'
+HI THERE
+HAVE A NICE DAY
+

Such useless use of cat might not have a noticeable negative impact in most cases, but it becomes important if you are dealing with large input files. This is especially true for commands like tac and tail, which have to wait for all the data to arrive via the pipe, whereas they can process directly from the end of the file when it is passed as an argument (or supplied via shell redirection).
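As a rough sketch of this difference, the two commands below produce identical results, but the piped version forces tail to consume the entire stream while the file-argument version can seek close to the end (the filename here is made up for illustration, and timings differ noticeably only for large inputs):

```shell
# generate a large-ish sample file (name is arbitrary)
seq 1000000 > big_tmp.txt

# tail must read the whole stream arriving through the pipe
cat big_tmp.txt | tail -n2
# 999999
# 1000000

# with a file argument, tail can seek close to the end instead
tail -n2 big_tmp.txt
# 999999
# 1000000

rm big_tmp.txt
```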

If you are dealing with multiple files, then the use of cat will depend upon the desired result. Here are some examples:

# match lines containing 'o' or '0'
+# -n option adds line number prefix
+$ cat greeting.txt fruits.txt nums.txt | grep -n '[o0]'
+5:mango
+8:1000
+$ grep -n '[o0]' greeting.txt fruits.txt nums.txt
+fruits.txt:3:mango
+nums.txt:3:1000
+
+# count the number of lines containing 'o' or '0'
+$ grep -c '[o0]' greeting.txt fruits.txt nums.txt
+greeting.txt:0
+fruits.txt:1
+nums.txt:1
+$ cat greeting.txt fruits.txt nums.txt | grep -c '[o0]'
+2
+

For some use cases like in-place editing with sed, you can't use cat or shell redirection at all. The files have to be passed as arguments only. To conclude, don't use cat just to pass the input as stdin to another command, unless necessary.
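As a minimal sketch of such a case (the temporary filename is made up for illustration), sed -i rewrites the named file in place, which is only possible when the file is passed as an argument; if the data arrived via a pipe, there would be no file for sed to rewrite:

```shell
# 'msg_tmp.txt' is a made-up filename for illustration
printf 'hi there\n' > msg_tmp.txt

# sed -i rewrites the named file; piping via cat would leave no file to edit
sed -i 's/hi/hello/' msg_tmp.txt

cat msg_tmp.txt
# hello there

rm msg_tmp.txt
```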

tac

tac will reverse the order of the input lines. If you pass multiple input files, each file content will be reversed separately. Here are some examples:

# won't be the same as: cat greeting.txt fruits.txt | tac
+$ tac greeting.txt fruits.txt
+Have a nice day
+Hi there
+mango
+papaya
+banana
+
+$ printf 'apple\nbanana\ncherry\n' | tac
+cherry
+banana
+apple
+

warning If the last input line doesn't end with a newline, that missing newline will show up in the middle of the reversed output, joining two lines together.

$ printf 'apple\nbanana\ncherry' | tac
+cherrybanana
+apple
+

Reversing input lines makes some of the text processing tasks easier. For example, if there are multiple matches but you want only the last one. See my ebooks on GNU sed and GNU awk for more such use cases.

$ cat log.txt
+--> warning 1
+a,b,c,d
+42
+--> warning 2
+x,y,z
+--> warning 3
+4,3,1
+
+$ tac log.txt | grep -m1 'warning'
+--> warning 3
+
+$ tac log.txt | sed '/warning/q' | tac
+--> warning 3
+4,3,1
+

In the above example, log.txt has multiple lines containing warning. The task is to fetch lines based on the last match, which isn't usually supported by CLI tools. Matching the first occurrence, on the other hand, is easy with tools like grep and sed. Hence, reversing the input with tac turns the last-match problem into a first-match problem. After processing with tools like sed, the result is reversed again to restore the original order of the input lines. Another benefit is that the first tac command stops reading the input once sed quits after the match is found, since tac with a file argument processes the contents from the end.

info Use the rev command if you want each input line to be reversed character wise.
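For example (note that rev comes from the util-linux package rather than coreutils):

```shell
# tac reverses the order of lines; rev reverses characters within each line
printf 'apple\nbanana\n' | rev
# elppa
# ananab
```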

Customize line separator for tac

By default, the newline character is used to split the input content into lines. You can use the -s option to specify a different string to be used as the separator.

# use NUL as the line separator
+# -s $'\0' can also be used instead of -s '' if ANSI-C quoting is supported
+$ printf 'car\0jeep\0bus\0' | tac -s '' | cat -v
+bus^@jeep^@car^@
+
+# as seen before, the last entry should also have the separator
+# otherwise it won't be present in the output
+$ printf 'apple banana cherry' | tac -s ' ' | cat -e
+cherrybanana apple $
+$ printf 'apple banana cherry ' | tac -s ' ' | cat -e
+cherry banana apple $
+

When the custom separator occurs before the content of interest, use the -b option to print those separators before the content in the output as well.

$ cat body_sep.txt
+%=%=
+apple
+banana
+%=%=
+teal
+green
+
+$ tac -b -s '%=%=' body_sep.txt
+%=%=
+teal
+green
+%=%=
+apple
+banana
+

The separator will be treated as a regular expression if you use the -r option as well.

$ cat shopping.txt
+apple   50
+toys    5
+Pizza   2
+mango   25
+Banana  10
+
+# separator character is 'a' or 'm' at the start of a line
+$ tac -b -rs '^[am]' shopping.txt
+mango   25
+Banana  10
+apple   50
+toys    5
+Pizza   2
+
+# alternate solution for: tac log.txt | sed '/warning/q' | tac
+# separator is zero or more characters from the start of a line till 'warning'
+$ tac -b -rs '^.*warning' log.txt | awk '/warning/ && ++c==2{exit} 1'
+--> warning 3
+4,3,1
+

info See Regular Expressions chapter from my GNU grep ebook if you want to learn about regexp syntax and features.

Exercises

info All the exercises are also collated together in one place at Exercises.md. For solutions, see Exercise_solutions.md.

info The exercises directory has all the files used in this section.

1) The given sample data has empty lines at the start and end of the input. Also, there are multiple empty lines between the paragraphs. How would you get the output shown below?

# note that there's an empty line at the end of the output
+$ printf '\n\n\ndragon\n\n\n\nunicorn\nbee\n\n\n' | ##### add your solution here
+
+     1  dragon
+
+     2  unicorn
+     3  bee
+
+

2) Pass appropriate arguments to the cat command to get the output shown below.

$ cat greeting.txt
+Hi there
+Have a nice day
+
+$ echo '42 apples and 100 bananas' | cat ##### add your solution here
+42 apples and 100 bananas
+Hi there
+Have a nice day
+

3) What does the -v option of the cat command do?

4) Which options of the cat command do the following stand in for?

  • -e option is equivalent to
  • -t option is equivalent to
  • -A option is equivalent to

5) Will the two commands shown below produce the same output? If not, why not?

$ cat fruits.txt ip.txt | tac
+
+$ tac fruits.txt ip.txt
+

6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator.

$ cat blocks.txt
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+----
+sky blue
+dark green
+----
+hi hello
+
+##### add your solution here
+----
+hi hello
+----
+sky blue
+dark green
+----
+3.14
+-42
+1000
+----
+apple--banana
+mango---fig
+

7) For the blocks.txt file, write solutions to display only the last such group and last two groups.

##### add your solution here
+----
+hi hello
+
+##### add your solution here
+----
+sky blue
+dark green
+----
+hi hello
+

8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as the separator.

$ cat items.txt
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+##### add your solution here
+3) magical beasts
+dragon 3
+unicorn 42
+2) colors
+green
+sky blue
+1) fruits
+apple 5
+banana 10
+
\ No newline at end of file diff --git a/clipboard.min.js b/clipboard.min.js new file mode 100644 index 0000000..02c549e --- /dev/null +++ b/clipboard.min.js @@ -0,0 +1,7 @@ +/*! + * clipboard.js v2.0.4 + * https://zenorocha.github.io/clipboard.js + * + * Licensed MIT © Zeno Rocha + */ +!function(t,e){"object"==typeof exports&&"object"==typeof module?module.exports=e():"function"==typeof define&&define.amd?define([],e):"object"==typeof exports?exports.ClipboardJS=e():t.ClipboardJS=e()}(this,function(){return function(n){var o={};function r(t){if(o[t])return o[t].exports;var e=o[t]={i:t,l:!1,exports:{}};return n[t].call(e.exports,e,e.exports,r),e.l=!0,e.exports}return r.m=n,r.c=o,r.d=function(t,e,n){r.o(t,e)||Object.defineProperty(t,e,{enumerable:!0,get:n})},r.r=function(t){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(t,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(t,"__esModule",{value:!0})},r.t=function(e,t){if(1&t&&(e=r(e)),8&t)return e;if(4&t&&"object"==typeof e&&e&&e.__esModule)return e;var n=Object.create(null);if(r.r(n),Object.defineProperty(n,"default",{enumerable:!0,value:e}),2&t&&"string"!=typeof e)for(var o in e)r.d(n,o,function(t){return e[t]}.bind(null,o));return n},r.n=function(t){var e=t&&t.__esModule?function(){return t.default}:function(){return t};return r.d(e,"a",e),e},r.o=function(t,e){return Object.prototype.hasOwnProperty.call(t,e)},r.p="",r(r.s=0)}([function(t,e,n){"use strict";var r="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(t){return typeof t}:function(t){return t&&"function"==typeof Symbol&&t.constructor===Symbol&&t!==Symbol.prototype?"symbol":typeof t},i=function(){function o(t,e){for(var n=0;ncomm - CLI text processing with GNU Coreutils

comm

The comm command finds common and unique lines between two sorted files. These results are formatted as a table with three columns and one or more of these columns can be suppressed as required.

Three column output

Consider the sample input files as shown below:

# side by side view of the sample files
+# note that these files are already sorted
+$ paste colors_1.txt colors_2.txt
+Blue    Black
+Brown   Blue
+Orange  Green
+Purple  Orange
+Red     Pink
+Teal    Red
+White   White
+

By default, comm gives a tabular output with three columns:

  • first column has lines unique to the first file
  • second column has lines unique to the second file
  • third column has lines common to both the files

The columns are separated by a tab character. Here's the output for the above sample files:

$ comm colors_1.txt colors_2.txt
+        Black
+                Blue
+Brown
+        Green
+                Orange
+        Pink
+Purple
+                Red
+Teal
+                White
+

You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example:

# note that the input files need not have the same number of lines
+$ comm <(seq 3) <(seq 2 5)
+1
+                2
+                3
+        4
+        5
+
+$ comm --output-delimiter=, <(seq 3) <(seq 2 5)
+1
+,,2
+,,3
+,4
+,5
+

info The collating order used by comm should be the same as the one used to sort the input files.

info --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option "is not guaranteed to produce any particular output."
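One way to keep the collating order consistent (a suggestion on my part, with made-up filenames for illustration) is to pin the locale for both sort and comm, for example with LC_ALL=C:

```shell
# f1_tmp.txt and f2_tmp.txt are made-up filenames for illustration
# in the C locale, uppercase letters sort before lowercase ones
printf 'Cherry\napple\n' | LC_ALL=C sort > f1_tmp.txt
printf 'banana\napple\n' | LC_ALL=C sort > f2_tmp.txt

# use the same locale for comm as was used for sorting
LC_ALL=C comm -12 f1_tmp.txt f2_tmp.txt
# apple

rm f1_tmp.txt f2_tmp.txt
```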

Suppressing columns

You can use one or more of the following options to suppress columns:

  • -1 to suppress the lines unique to the first file
  • -2 to suppress the lines unique to the second file
  • -3 to suppress the lines common to both the files

Here's how the output looks when you suppress one of the columns:

# suppress lines common to both the files
+$ comm -3 colors_1.txt colors_2.txt
+        Black
+Brown
+        Green
+        Pink
+Purple
+Teal
+

Combining two of these options gives three useful solutions. -12 will give you only the common lines.

$ comm -12 colors_1.txt colors_2.txt
+Blue
+Orange
+Red
+White
+

-23 will give you the lines unique to the first file.

$ comm -23 colors_1.txt colors_2.txt
+Brown
+Purple
+Teal
+

-13 will give you the lines unique to the second file.

$ comm -13 colors_1.txt colors_2.txt
+Black
+Green
+Pink
+

You can combine all three of these options as well. This is useful with the --total option if you want only the count of lines for each of the three columns.

$ comm --total -123 colors_1.txt colors_2.txt
+3       3       4       total
+

Duplicate lines

The number of duplicate lines in the common column will be the minimum of the duplicate occurrences between the two files. The rest of the duplicate lines, if any, will be considered unique to the file having the excess occurrences. Here's an example:

$ paste list_1.txt list_2.txt
+apple   cherry
+banana  cherry
+cherry  mango
+cherry  papaya
+cherry  
+cherry  
+
+# 'cherry' occurs only twice in the second file
+# rest of the 'cherry' lines will be unique to the first file
+$ comm list_1.txt list_2.txt
+apple
+banana
+                cherry
+                cherry
+cherry
+cherry
+        mango
+        papaya
+

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, comm will add a final NUL character to the output even if it is not present in the input.

$ comm -z -12 <(printf 'a\0b\0c') <(printf 'a\0c\0x') | cat -v
+a^@c^@
+

Alternatives

Here are some alternate commands you can explore if comm isn't enough to solve your task. These alternatives do not require the input files to be sorted.

Exercises

info The exercises directory has all the files used in this section.

1) Get the common lines between the s1.txt and s2.txt files. Assume that their contents are already sorted.

$ paste s1.txt s2.txt
+apple   banana
+coffee  coffee
+fig     eclair
+honey   fig
+mango   honey
+pasta   milk
+sugar   tea
+tea     yeast
+
+##### add your solution here
+coffee
+fig
+honey
+tea
+

2) Display lines present in s1.txt but not s2.txt and vice versa.

# lines unique to the first file
+##### add your solution here
+apple
+mango
+pasta
+sugar
+
+# lines unique to the second file
+##### add your solution here
+banana
+eclair
+milk
+yeast
+

3) Display lines unique to the s1.txt file and the common lines when compared to the s2.txt file. Use ==> to separate the output columns.

##### add your solution here
+apple
+==>coffee
+==>fig
+==>honey
+mango
+pasta
+sugar
+==>tea
+

4) What does the --total option do?

5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below?

$ cat s3.txt
+apple
+apple
+guava
+honey
+tea
+tea
+tea
+
+$ comm -23 s3.txt s1.txt
+
\ No newline at end of file diff --git a/cover.html b/cover.html new file mode 100644 index 0000000..6574265 --- /dev/null +++ b/cover.html @@ -0,0 +1,31 @@ +Cover - CLI text processing with GNU Coreutils
\ No newline at end of file diff --git a/csplit.html b/csplit.html new file mode 100644 index 0000000..002c2d1 --- /dev/null +++ b/csplit.html @@ -0,0 +1,440 @@ +csplit - CLI text processing with GNU Coreutils

csplit

The csplit command is useful to divide the input into smaller parts based on line numbers and regular expression patterns. Similar to split, this command also supports customizing output filenames.

info Since a lot of output files will be generated in this chapter (often with the same filenames), remove these files after every illustration.

Split on Nth line

You can split the input into two based on a particular line number. To do so, specify the line number after the input source (filename or stdin data). The first output file will have the input lines before the given line number and the second output file will have the rest of the contents.

By default, the output files will be named xx00, xx01, xx02, and so on (where xx is the prefix). The numerical suffix will automatically use more digits if needed. You'll see examples with more than two output files later.

# split input into two based on line number 4
+$ seq 10 | csplit - 4
+6
+15
+
+# first output file will have the first 3 lines
+# second output file will have the rest
+$ head xx*
+==> xx00 <==
+1
+2
+3
+
+==> xx01 <==
+4
+5
+6
+7
+8
+9
+10
+
+$ rm xx*
+

info As seen in the example above, csplit will also display the number of bytes written for each output file. You can use the -q option to suppress this message.

warning As mentioned earlier, remove the output files after every illustration.

Split on regexp

You can also split the input based on a line matching the given regular expression. The output produced will vary based on the // or %% delimiters being used to surround the regexp.

When /regexp/ is used, output is similar to the line number based splitting. The first output file will have the input lines before the first occurrence of a line matching the given regexp and the second output file will have the rest of the contents.

# match a line containing 't' followed by zero or more characters and then 'p'
+# 'toothpaste' is the only match for this input file
+$ csplit -q purchases.txt '/t.*p/'
+
+$ head xx*
+==> xx00 <==
+coffee
+tea
+washing powder
+coffee
+
+==> xx01 <==
+toothpaste
+tea
+soap
+tea
+

When %regexp% is used, the lines occurring before the matching line won't be part of the output. Only the line matching the given regexp and the rest of the contents will be part of the single output file.

$ csplit -q purchases.txt '%t.*p%'
+
+$ cat xx00
+toothpaste
+tea
+soap
+tea
+

warning You'll get an error if the given regexp isn't found in the input.

$ csplit -q purchases.txt '/xyz/'
+csplit: ‘/xyz/’: match not found
+

info See the Regular Expressions chapter from my GNU grep ebook if you want to learn more about regexp syntax and features.

Regexp offset

You can also provide offset numbers that'll affect where the matching line and its surrounding lines should be placed. When the offset is greater than zero, the split will happen that many lines after the matching line. The default offset is zero.

# when the offset is '1', the matching line will be part of the first file
+$ csplit -q purchases.txt '/t.*p/1'
+$ head xx*
+==> xx00 <==
+coffee
+tea
+washing powder
+coffee
+toothpaste
+
+==> xx01 <==
+tea
+soap
+tea
+
+# matching line and 1 line after won't be part of the output
+$ csplit -q purchases.txt '%t.*p%2'
+$ cat xx00
+soap
+tea
+

When the offset is less than zero, the split will happen that many lines before the matching line.

# 2 lines before the matching line will be part of the second file
+$ csplit -q purchases.txt '/t.*p/-2'
+$ head xx*
+==> xx00 <==
+coffee
+tea
+
+==> xx01 <==
+washing powder
+coffee
+toothpaste
+tea
+soap
+tea
+

warning You'll get an error if the offset goes beyond the number of lines available in the input.

$ csplit -q purchases.txt '/t.*p/5'
+csplit: ‘/t.*p/5’: line number out of range
+
+$ csplit -q purchases.txt '/t.*p/-5'
+csplit: ‘/t.*p/-5’: line number out of range
+

Repeat split

You can perform line number and regexp based splits more than once by adding the {N} argument after the pattern. The default behavior seen in the examples so far is the same as specifying {0}. Any number greater than zero will result in that many more splits.

# {1} means split one time more than the default split
+# so, two splits in total and three output files
+# in this example, split happens on the 4th and 8th line numbers
+$ seq 10 | csplit -q - 4 '{1}'
+
+$ head xx*
+==> xx00 <==
+1
+2
+3
+
+==> xx01 <==
+4
+5
+6
+7
+
+==> xx02 <==
+8
+9
+10
+

Here's an example with regexp:

$ cat log.txt
+--> warning 1
+a,b,c,d
+42
+--> warning 2
+x,y,z
+--> warning 3
+4,3,1
+
+# split on the third (2+1) occurrence of a line containing 'warning'
+$ csplit -q log.txt '%warning%' '{2}'
+$ cat xx00
+--> warning 3
+4,3,1
+

As a special case, you can use {*} to repeat the split until the input is exhausted. This is especially useful with the /regexp/ form of splitting. Here's an example:

# split on all lines matching 'paste' or 'powder'
+$ csplit -q purchases.txt '/paste\|powder/' '{*}'
+$ head xx*
+==> xx00 <==
+coffee
+tea
+
+==> xx01 <==
+washing powder
+coffee
+
+==> xx02 <==
+toothpaste
+tea
+soap
+tea
+

warning You'll get an error if the repeat count goes beyond the number of matches possible with the given input.

$ seq 10 | csplit -q - 4 '{2}'
+csplit: ‘4’: line number out of range on repetition 2
+
+$ csplit -q purchases.txt '/tea/' '{4}'
+csplit: ‘/tea/’: match not found on repetition 3
+

Keep files on error

By default, csplit will remove the created output files if there's an error or a signal that causes the command to stop. You can use the -k option to keep such files. One use case is line number based splitting with the {*} modifier.

$ seq 7 | csplit -q - 4 '{*}'
+csplit: ‘4’: line number out of range on repetition 1
+$ ls xx*
+ls: cannot access 'xx*': No such file or directory
+
+# -k option will allow you to retain the created files
+$ seq 7 | csplit -qk - 4 '{*}'
+csplit: ‘4’: line number out of range on repetition 1
+$ head xx*
+==> xx00 <==
+1
+2
+3
+
+==> xx01 <==
+4
+5
+6
+7
+

Suppress matched lines

The --suppress-matched option will suppress the lines matching the split condition.

$ seq 5 | csplit -q --suppress-matched - 3
+# 3rd line won't be part of the output
+$ head xx*
+==> xx00 <==
+1
+2
+
+==> xx01 <==
+4
+5
+
+$ rm xx*
+
+$ seq 10 | csplit -q --suppress-matched - 4 '{1}'
+# 4th and 8th lines won't be part of the output
+$ head xx*
+==> xx00 <==
+1
+2
+3
+
+==> xx01 <==
+5
+6
+7
+
+==> xx02 <==
+9
+10
+

Here's an example with regexp based split:

$ csplit -q --suppress-matched purchases.txt '/soap\|powder/' '{*}'
+# lines matching 'soap' or 'powder' won't be part of the output
+$ head xx*
+==> xx00 <==
+coffee
+tea
+
+==> xx01 <==
+coffee
+toothpaste
+tea
+
+==> xx02 <==
+tea
+

Here's another example:

$ seq 11 16 | csplit -q --suppress-matched - '/[35]/' '{1}'
+# lines matching '3' or '5' won't be part of the output
+$ head xx*
+==> xx00 <==
+11
+12
+
+==> xx01 <==
+14
+
+==> xx02 <==
+16
+
+$ rm xx*
+

Exclude empty files

There are various cases that can result in empty output files. For example, when the first or last line matches the given split condition. Another possibility is the --suppress-matched option combined with consecutive matching lines during multiple splits. Here's an example:

$ csplit -q --suppress-matched purchases.txt '/coffee\|tea/' '{*}'
+
+$ head xx*
+==> xx00 <==
+
+==> xx01 <==
+
+==> xx02 <==
+washing powder
+
+==> xx03 <==
+toothpaste
+
+==> xx04 <==
+soap
+
+==> xx05 <==
+

You can use the -z option to exclude empty files from the output. The suffix numbering will be automatically adjusted in such cases.

$ csplit -qz --suppress-matched purchases.txt '/coffee\|tea/' '{*}'
+
+$ head xx*
+==> xx00 <==
+washing powder
+
+==> xx01 <==
+toothpaste
+
+==> xx02 <==
+soap
+

Customize filenames

As seen earlier, xx is the default prefix for output filenames. Use the -f option to change this prefix.

$ seq 4 | csplit -q -f'num_' - 3
+
+$ head num_*
+==> num_00 <==
+1
+2
+
+==> num_01 <==
+3
+4
+

The -n option controls the length of the numeric suffix. The suffix length will automatically increment if filenames are exhausted.

$ seq 4 | csplit -q -n1 - 3
+$ ls xx*
+xx0  xx1
+$ rm xx*
+
+$ seq 4 | csplit -q -n3 - 3
+$ ls xx*
+xx000  xx001
+
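To see the automatic widening in action, here's a sketch where eleven output files are needed but -n1 provides single-digit suffixes only up to xx9:

```shell
# splits happen on line numbers 3, 6, 9, ..., 30
# the suffix grows to two digits once xx9 is used up
seq 30 | csplit -q -n1 - 3 '{9}'
$ ls xx*
# xx0  xx1  xx10  xx2  xx3  xx4  xx5  xx6  xx7  xx8  xx9
rm xx*
```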

The -b option allows you to control the suffix using the printf formatting. Quoting from the manual:

When this option is specified, the suffix string must include exactly one printf(3)-style conversion specification, possibly including format specification flags, a field width, a precision specification, or all of these kinds of modifiers. The format letter must convert a binary unsigned integer argument to readable form. The format letters d and i are aliases for u, and the u, o, x, and X conversions are allowed.

Here are some examples:

# hexadecimal numbering
+# minimum two digits, zero filled
+$ seq 100 | csplit -q -b'%02x' - 3 '{20}'
+$ ls xx*
+xx00  xx02  xx04  xx06  xx08  xx0a  xx0c  xx0e  xx10  xx12  xx14
+xx01  xx03  xx05  xx07  xx09  xx0b  xx0d  xx0f  xx11  xx13  xx15
+$ rm xx*
+
+# custom prefix and suffix around decimal numbering
+# default minimum is a single digit
+$ seq 20 | csplit -q -f'num_' -b'%d.txt' - 3 '{4}'
+$ ls num_*
+num_0.txt  num_1.txt  num_2.txt  num_3.txt  num_4.txt  num_5.txt
+

info Note that the -b option will override the -n option. See man 3 printf for more details about the formatting options.
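Here's a quick check of that override (a sketch; -b wins even though -n asks for five digits):

```shell
# -n5 would normally produce xx00000 and xx00001,
# but the -b format takes precedence
seq 4 | csplit -q -n5 -b'%x' - 3
$ ls xx*
# xx0  xx1
rm xx*
```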

Exercises

info The exercises directory has all the files used in this section.

info Remove the output files after every exercise.

1) Split the blocks.txt file such that the first 7 lines are in the first file and the rest are in the second file as shown below.

##### add your solution here
+
+$ head xx*
+==> xx00 <==
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+
+==> xx01 <==
+----
+sky blue
+dark green
+----
+hi hello
+
+$ rm xx*
+

2) Split the input file items.txt such that the text before a line containing colors is part of the first file and the rest are part of the second file as shown below.

##### add your solution here
+
+$ head xx*
+==> xx00 <==
+1) fruits
+apple 5
+banana 10
+
+==> xx01 <==
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx*
+

3) Split the input file items.txt such that the line containing magical and all the lines that come after are part of the single output file.

##### add your solution here
+
+$ cat xx00
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx00
+

4) Split the input file items.txt such that the line containing colors as well as the line that comes after are part of the first output file.

##### add your solution here
+
+$ head xx*
+==> xx00 <==
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+
+==> xx01 <==
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx*
+

5) Split the input file items.txt on the line that comes before a line containing magical. Generate only a single output file as shown below.

##### add your solution here
+
+$ cat xx00
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+$ rm xx00
+

6) Split the input file blocks.txt on the 4th occurrence of a line starting with the - character. Generate only a single output file as shown below.

##### add your solution here
+
+$ cat xx00
+----
+sky blue
+dark green
+----
+hi hello
+
+$ rm xx00
+

7) For the input file blocks.txt, determine the logic to produce the expected output shown below.

##### add your solution here
+
+$ head xx*
+==> xx00 <==
+apple--banana
+mango---fig
+
+==> xx01 <==
+3.14
+-42
+1000
+
+==> xx02 <==
+sky blue
+dark green
+
+==> xx03 <==
+hi hello
+
+$ rm xx*
+

8) What does the -k option do?

9) Split the books.txt file on every line as shown below.

##### add your solution here
+csplit: ‘1’: line number out of range on repetition 3
+
+$ head row_*
+==> row_0 <==
+Cradle:::Mage Errant::The Weirkey Chronicles
+
+==> row_1 <==
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+
+==> row_2 <==
+Mark of the Fool:Super Powereds:::Ends of Magic
+
+$ rm row_*
+

10) Split the items.txt file on lines starting with a digit character. Matching lines shouldn't be part of the output and the files should be named group_0.txt, group_1.txt and so on.

##### add your solution here
+
+$ head group_*
+==> group_0.txt <==
+apple 5
+banana 10
+
+==> group_1.txt <==
+green
+sky blue
+
+==> group_2.txt <==
+dragon 3
+unicorn 42
+
+$ rm group_*
+
\ No newline at end of file diff --git a/cut.html b/cut.html new file mode 100644 index 0000000..e15cddb --- /dev/null +++ b/cut.html @@ -0,0 +1,199 @@ +cut - CLI text processing with GNU Coreutils

cut

cut is a handy tool for many field processing use cases. Its features are limited compared to the awk and perl commands, but the reduced scope also leads to faster processing.

Individual field selections

By default, cut splits the input content into fields based on the tab character. You can use the -f option to select a desired field from each input line. To extract multiple fields, specify the selections separated by the comma character.

# only the second field
+$ printf 'apple\tbanana\tcherry\n' | cut -f2
+banana
+
+# first and third fields
+$ printf 'apple\tbanana\tcherry\n' | cut -f1,3
+apple   cherry
+

cut will always display the selected fields in ascending order. Also, you cannot display a field more than once.

# same as: cut -f1,3
+$ printf 'apple\tbanana\tcherry\n' | cut -f3,1
+apple   cherry
+
+# same as: cut -f1,2
+$ printf 'apple\tbanana\tcherry\n' | cut -f1,1,2,1,2,1,1,2
+apple   banana
+
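If you do need fields reordered or repeated, a field-aware tool like awk can step in (a sketch):

```shell
# awk can print fields in any order, any number of times
printf 'apple\tbanana\tcherry\n' | awk -F'\t' -v OFS='\t' '{print $3, $1, $1}'
# cherry  apple   apple
```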

By default, cut uses the newline character as the line separator. cut will add a newline character to the output even if the last input line doesn't end with a newline.

$ printf 'good\tfood\ntip\ttap' | cut -f2
+food
+tap
+

Field ranges

You can use the - character to specify field ranges. You can omit the starting or the ending field number, but not both.

# 2nd, 3rd and 4th fields
+$ printf 'apple\tbanana\tcherry\tfig\tmango\n' | cut -f2-4
+banana  cherry  fig
+
+# all fields from the start till the 3rd field
+$ printf 'apple\tbanana\tcherry\tfig\tmango\n' | cut -f-3
+apple   banana  cherry
+
+# all fields from the 3rd one till the end
+$ printf 'apple\tbanana\tcherry\tfig\tmango\n' | cut -f3-
+cherry  fig     mango
+

Input field delimiter

Use the -d option to change the input delimiter. Only a single byte character is allowed. By default, the output delimiter will be the same as the input delimiter.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+$ cut -d, -f2,4 scores.csv
+Maths,Chemistry
+100,100
+97,95
+78,80
+
+# use quotes if the delimiter is a shell metacharacter
+$ echo 'one;two;three;four' | cut -d; -f3
+cut: option requires an argument -- 'd'
+Try 'cut --help' for more information.
+-f3: command not found
+$ echo 'one;two;three;four' | cut -d';' -f3
+three
+
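Since -d accepts only a single byte, a multi-character delimiter has to be normalized before cut can split on it. Here's a minimal sketch using made-up data with ' :: ' as the delimiter; sed converts it to a single byte first:

```shell
# cut -d accepts only a single byte; convert the multi-character
# delimiter ' :: ' (made-up sample data) to ':' before cutting
echo 'one :: two :: three' | sed 's/ :: /:/g' | cut -d: -f2
```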

Output field delimiter

Use the --output-delimiter option to customize the output separator to any string of your choice. The string is treated literally. Depending on your shell, you can use ANSI-C quoting to allow escape sequences.

# same as: tr '\t' ','
+$ printf 'apple\tbanana\tcherry\n' | cut --output-delimiter=, -f1-
+apple,banana,cherry
+
+# example for multicharacter output separator
+$ echo 'one;two;three;four' | cut -d';' --output-delimiter=' : ' -f1,3-
+one : three : four
+
+# ANSI-C quoting example
+# depending on your environment, you can also press Ctrl+v and then the Tab key
+$ echo 'one;two;three;four' | cut -d';' --output-delimiter=$'\t' -f1,3-
+one     three   four
+
+# newline as the output field separator
+$ echo 'one;two;three;four' | cut -d';' --output-delimiter=$'\n' -f2,4
+two
+four
+

Complement

The --complement option allows you to invert the field selections.

# except the second field
+$ printf 'apple ball cat\n1 2 3 4 5' | cut --complement -d' ' -f2
+apple cat
+1 3 4 5
+
+# except the first and third fields
+$ printf 'apple ball cat\n1 2 3 4 5' | cut --complement -d' ' -f1,3
+ball
+2 4 5
+

Suppress lines without delimiters

By default, lines not containing the input delimiter will still be part of the output. You can use the -s option to suppress such lines.

$ cat mixed_fields.csv
+1,2,3,4
+hello
+a,b,c
+
+# second line doesn't have the comma separator
+# by default, such lines will be part of the output
+$ cut -d, -f2 mixed_fields.csv
+2
+hello
+b
+
+# use the -s option to suppress such lines
+$ cut -sd, -f2 mixed_fields.csv
+2
+b
+
+$ cut --complement -sd, -f2 mixed_fields.csv
+1,3,4
+a,c
+

info If a line contains the specified delimiter but doesn't have the requested field, you'll get a blank line. The -s option has no effect on such lines.

$ printf 'apple ball cat\n1 2 3 4 5' | cut -d' ' -f4
+
+4
+

Character selections

You can use the -b or -c options to select specific bytes or characters from each input line. The syntax is the same as the -f option. The -c option is intended for multibyte character selection, but for now it works exactly like the -b option. Character selection is useful for working with fixed-width fields.

$ printf 'apple\tbanana\tcherry\n' | cut -c2,8,11
+pan
+
+$ printf 'apple\tbanana\tcherry\n' | cut -c2,8,11 --output-delimiter=-
+p-a-n
+
+$ printf 'apple\tbanana\tcherry\n' | cut -c-5
+apple
+
+$ printf 'apple\tbanana\tcherry\n' | cut --complement -c13-
+apple   banana
+
+$ printf 'cat-bat\ndog:fog\nget;pet' | cut -c5-
+bat
+fog
+pet
+
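The fixed-width use case mentioned above can be sketched with some made-up ledger records where the date always occupies the first 10 columns:

```shell
# made-up fixed-width records: the date sits in columns 1-10,
# so -c1-10 extracts it regardless of what follows
printf '2023-01-15 apples  5\n2023-02-20 mangoes 10\n' | cut -c1-10
```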

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, cut will add a final NUL character even if it isn't present in the input.

$ printf 'good-food\0tip-tap\0' | cut -zd- -f2 | cat -v
+food^@tap^@
+
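NUL-separated output composes with other tools that understand the -z option, such as GNU sort. A small sketch with made-up filename-like records:

```shell
# made-up NUL-separated records: extract the field before '-',
# then sort the NUL-separated results (sort -z is GNU sort)
printf 'b.txt-2\0a.txt-1\0' | cut -zd- -f1 | sort -z | cat -v
```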

Alternatives

Here are some alternate commands you can explore if cut isn't enough to solve your task.

  • hck — supports regexp delimiters, field reordering, header based selection, etc
  • choose — negative indexing, regexp based delimiters, etc
  • xsv — fast CSV command line toolkit
  • rcut — my bash+awk script, supports regexp delimiters, field reordering, negative indexing, etc
  • awk — my ebook on GNU awk one-liners
  • perl — my ebook on Perl one-liners

Exercises

info The exercises directory has all the files used in this section.

1) Display only the third field.

$ printf 'tea\tcoffee\tchocolate\tfruit\n' | ##### add your solution here
+chocolate
+

2) Display the second and fifth fields. Consider , as the field separator.

$ echo 'tea,coffee,chocolate,ice cream,fruit' | ##### add your solution here
+coffee,fruit
+

3) Why does the below command not work as expected? What other tools can you use in such cases?

# not working as expected
+$ echo 'apple,banana,cherry,fig' | cut -d, -f3,1,3
+apple,cherry
+
+# expected output
+$ echo 'apple,banana,cherry,fig' | ##### add your solution here
+cherry,apple,cherry
+

4) Display all fields except the second field, in the format shown below. Can you construct two different solutions?

# solution 1
+$ echo 'apple,banana,cherry,fig' | ##### add your solution here
+apple cherry fig
+
+# solution 2
+$ echo '2,3,4,5,6,7,8' | ##### add your solution here
+2 4 5 6 7 8
+

5) Extract the first three characters from the input lines as shown below. Can you also use the head command for this purpose? If not, why not?

$ printf 'apple\nbanana\ncherry\nfig\n' | ##### add your solution here
+app
+ban
+che
+fig
+

6) Display only the first and third fields of the scores.csv input file, with tab as the output field separator.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+##### add your solution here
+Name    Physics
+Ith     100
+Cy      98
+Lin     83
+

7) The given input data uses one or more : characters as the field separator. Assume that no field content will have the : character. Display all fields except the second, with ' : ' as the output field separator.

$ cat books.txt
+Cradle:::Mage Errant::The Weirkey Chronicles
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+Mark of the Fool:Super Powereds:::Ends of Magic
+
+##### add your solution here
+Cradle : The Weirkey Chronicles
+Mother of Learning : Dear Spellbook : Ascendant
+Mark of the Fool : Ends of Magic
+

8) Which option would you use to not display lines that do not contain the input delimiter character?

9) Modify the command to get the expected output shown below.

$ printf 'apple\nbanana\ncherry\n' | cut -c-3 --output-delimiter=:
+app
+ban
+che
+
+$ printf 'apple\nbanana\ncherry\n' | ##### add your solution here
+a:p:p
+b:a:n
+c:h:e
+

10) Figure out the logic based on the given input and output data.

$ printf 'apple\0fig\0carpet\0jeep\0' | ##### add your solution here | cat -v
+ple^@g^@rpet^@ep^@
+
\ No newline at end of file diff --git a/elasticlunr.min.js b/elasticlunr.min.js new file mode 100644 index 0000000..94b20dd --- /dev/null +++ b/elasticlunr.min.js @@ -0,0 +1,10 @@ +/** + * elasticlunr - http://weixsong.github.io + * Lightweight full-text search engine in Javascript for browser search and offline search. - 0.9.5 + * + * Copyright (C) 2017 Oliver Nightingale + * Copyright (C) 2017 Wei Song + * MIT Licensed + * @license + */ +!function(){function e(e){if(null===e||"object"!=typeof e)return e;var t=e.constructor();for(var n in e)e.hasOwnProperty(n)&&(t[n]=e[n]);return t}var t=function(e){var n=new t.Index;return n.pipeline.add(t.trimmer,t.stopWordFilter,t.stemmer),e&&e.call(n,n),n};t.version="0.9.5",lunr=t,t.utils={},t.utils.warn=function(e){return function(t){e.console&&console.warn&&console.warn(t)}}(this),t.utils.toString=function(e){return void 0===e||null===e?"":e.toString()},t.EventEmitter=function(){this.events={}},t.EventEmitter.prototype.addListener=function(){var e=Array.prototype.slice.call(arguments),t=e.pop(),n=e;if("function"!=typeof t)throw new TypeError("last argument must be a function");n.forEach(function(e){this.hasHandler(e)||(this.events[e]=[]),this.events[e].push(t)},this)},t.EventEmitter.prototype.removeListener=function(e,t){if(this.hasHandler(e)){var n=this.events[e].indexOf(t);-1!==n&&(this.events[e].splice(n,1),0==this.events[e].length&&delete this.events[e])}},t.EventEmitter.prototype.emit=function(e){if(this.hasHandler(e)){var t=Array.prototype.slice.call(arguments,1);this.events[e].forEach(function(e){e.apply(void 0,t)},this)}},t.EventEmitter.prototype.hasHandler=function(e){return e in this.events},t.tokenizer=function(e){if(!arguments.length||null===e||void 0===e)return[];if(Array.isArray(e)){var n=e.filter(function(e){return null===e||void 0===e?!1:!0});n=n.map(function(e){return t.utils.toString(e).toLowerCase()});var i=[];return n.forEach(function(e){var n=e.split(t.tokenizer.seperator);i=i.concat(n)},this),i}return 
e.toString().trim().toLowerCase().split(t.tokenizer.seperator)},t.tokenizer.defaultSeperator=/[\s\-]+/,t.tokenizer.seperator=t.tokenizer.defaultSeperator,t.tokenizer.setSeperator=function(e){null!==e&&void 0!==e&&"object"==typeof e&&(t.tokenizer.seperator=e)},t.tokenizer.resetSeperator=function(){t.tokenizer.seperator=t.tokenizer.defaultSeperator},t.tokenizer.getSeperator=function(){return t.tokenizer.seperator},t.Pipeline=function(){this._queue=[]},t.Pipeline.registeredFunctions={},t.Pipeline.registerFunction=function(e,n){n in t.Pipeline.registeredFunctions&&t.utils.warn("Overwriting existing registered function: "+n),e.label=n,t.Pipeline.registeredFunctions[n]=e},t.Pipeline.getRegisteredFunction=function(e){return e in t.Pipeline.registeredFunctions!=!0?null:t.Pipeline.registeredFunctions[e]},t.Pipeline.warnIfFunctionNotRegistered=function(e){var n=e.label&&e.label in this.registeredFunctions;n||t.utils.warn("Function is not registered with pipeline. This may cause problems when serialising the index.\n",e)},t.Pipeline.load=function(e){var n=new t.Pipeline;return e.forEach(function(e){var i=t.Pipeline.getRegisteredFunction(e);if(!i)throw new Error("Cannot load un-registered function: "+e);n.add(i)}),n},t.Pipeline.prototype.add=function(){var e=Array.prototype.slice.call(arguments);e.forEach(function(e){t.Pipeline.warnIfFunctionNotRegistered(e),this._queue.push(e)},this)},t.Pipeline.prototype.after=function(e,n){t.Pipeline.warnIfFunctionNotRegistered(n);var i=this._queue.indexOf(e);if(-1===i)throw new Error("Cannot find existingFn");this._queue.splice(i+1,0,n)},t.Pipeline.prototype.before=function(e,n){t.Pipeline.warnIfFunctionNotRegistered(n);var i=this._queue.indexOf(e);if(-1===i)throw new Error("Cannot find existingFn");this._queue.splice(i,0,n)},t.Pipeline.prototype.remove=function(e){var t=this._queue.indexOf(e);-1!==t&&this._queue.splice(t,1)},t.Pipeline.prototype.run=function(e){for(var t=[],n=e.length,i=this._queue.length,o=0;n>o;o++){for(var 
r=e[o],s=0;i>s&&(r=this._queue[s](r,o,e),void 0!==r&&null!==r);s++);void 0!==r&&null!==r&&t.push(r)}return t},t.Pipeline.prototype.reset=function(){this._queue=[]},t.Pipeline.prototype.get=function(){return this._queue},t.Pipeline.prototype.toJSON=function(){return this._queue.map(function(e){return t.Pipeline.warnIfFunctionNotRegistered(e),e.label})},t.Index=function(){this._fields=[],this._ref="id",this.pipeline=new t.Pipeline,this.documentStore=new t.DocumentStore,this.index={},this.eventEmitter=new t.EventEmitter,this._idfCache={},this.on("add","remove","update",function(){this._idfCache={}}.bind(this))},t.Index.prototype.on=function(){var e=Array.prototype.slice.call(arguments);return this.eventEmitter.addListener.apply(this.eventEmitter,e)},t.Index.prototype.off=function(e,t){return this.eventEmitter.removeListener(e,t)},t.Index.load=function(e){e.version!==t.version&&t.utils.warn("version mismatch: current "+t.version+" importing "+e.version);var n=new this;n._fields=e.fields,n._ref=e.ref,n.documentStore=t.DocumentStore.load(e.documentStore),n.pipeline=t.Pipeline.load(e.pipeline),n.index={};for(var i in e.index)n.index[i]=t.InvertedIndex.load(e.index[i]);return n},t.Index.prototype.addField=function(e){return this._fields.push(e),this.index[e]=new t.InvertedIndex,this},t.Index.prototype.setRef=function(e){return this._ref=e,this},t.Index.prototype.saveDocument=function(e){return this.documentStore=new t.DocumentStore(e),this},t.Index.prototype.addDoc=function(e,n){if(e){var n=void 0===n?!0:n,i=e[this._ref];this.documentStore.addDoc(i,e),this._fields.forEach(function(n){var o=this.pipeline.run(t.tokenizer(e[n]));this.documentStore.addFieldLength(i,n,o.length);var r={};o.forEach(function(e){e in r?r[e]+=1:r[e]=1},this);for(var s in r){var 
u=r[s];u=Math.sqrt(u),this.index[n].addToken(s,{ref:i,tf:u})}},this),n&&this.eventEmitter.emit("add",e,this)}},t.Index.prototype.removeDocByRef=function(e){if(e&&this.documentStore.isDocStored()!==!1&&this.documentStore.hasDoc(e)){var t=this.documentStore.getDoc(e);this.removeDoc(t,!1)}},t.Index.prototype.removeDoc=function(e,n){if(e){var n=void 0===n?!0:n,i=e[this._ref];this.documentStore.hasDoc(i)&&(this.documentStore.removeDoc(i),this._fields.forEach(function(n){var o=this.pipeline.run(t.tokenizer(e[n]));o.forEach(function(e){this.index[n].removeToken(e,i)},this)},this),n&&this.eventEmitter.emit("remove",e,this))}},t.Index.prototype.updateDoc=function(e,t){var t=void 0===t?!0:t;this.removeDocByRef(e[this._ref],!1),this.addDoc(e,!1),t&&this.eventEmitter.emit("update",e,this)},t.Index.prototype.idf=function(e,t){var n="@"+t+"/"+e;if(Object.prototype.hasOwnProperty.call(this._idfCache,n))return this._idfCache[n];var i=this.index[t].getDocFreq(e),o=1+Math.log(this.documentStore.length/(i+1));return this._idfCache[n]=o,o},t.Index.prototype.getFields=function(){return this._fields.slice()},t.Index.prototype.search=function(e,n){if(!e)return[];e="string"==typeof e?{any:e}:JSON.parse(JSON.stringify(e));var i=null;null!=n&&(i=JSON.stringify(n));for(var o=new t.Configuration(i,this.getFields()).get(),r={},s=Object.keys(e),u=0;u0&&t.push(e);for(var i in n)"docs"!==i&&"df"!==i&&this.expandToken(e+i,t,n[i]);return t},t.InvertedIndex.prototype.toJSON=function(){return{root:this.root}},t.Configuration=function(e,n){var e=e||"";if(void 0==n||null==n)throw new Error("fields should not be null");this.config={};var i;try{i=JSON.parse(e),this.buildUserConfig(i,n)}catch(o){t.utils.warn("user configuration parse failed, will use default configuration"),this.buildDefaultConfig(n)}},t.Configuration.prototype.buildDefaultConfig=function(e){this.reset(),e.forEach(function(e){this.config[e]={boost:1,bool:"OR",expand:!1}},this)},t.Configuration.prototype.buildUserConfig=function(e,n){var 
i="OR",o=!1;if(this.reset(),"bool"in e&&(i=e.bool||i),"expand"in e&&(o=e.expand||o),"fields"in e)for(var r in e.fields)if(n.indexOf(r)>-1){var s=e.fields[r],u=o;void 0!=s.expand&&(u=s.expand),this.config[r]={boost:s.boost||0===s.boost?s.boost:1,bool:s.bool||i,expand:u}}else t.utils.warn("field name in user configuration not found in index instance fields");else this.addAllFields2UserConfig(i,o,n)},t.Configuration.prototype.addAllFields2UserConfig=function(e,t,n){n.forEach(function(n){this.config[n]={boost:1,bool:e,expand:t}},this)},t.Configuration.prototype.get=function(){return this.config},t.Configuration.prototype.reset=function(){this.config={}},lunr.SortedSet=function(){this.length=0,this.elements=[]},lunr.SortedSet.load=function(e){var t=new this;return t.elements=e,t.length=e.length,t},lunr.SortedSet.prototype.add=function(){var e,t;for(e=0;e1;){if(r===e)return o;e>r&&(t=o),r>e&&(n=o),i=n-t,o=t+Math.floor(i/2),r=this.elements[o]}return r===e?o:-1},lunr.SortedSet.prototype.locationFor=function(e){for(var t=0,n=this.elements.length,i=n-t,o=t+Math.floor(i/2),r=this.elements[o];i>1;)e>r&&(t=o),r>e&&(n=o),i=n-t,o=t+Math.floor(i/2),r=this.elements[o];return r>e?o:e>r?o+1:void 0},lunr.SortedSet.prototype.intersect=function(e){for(var t=new lunr.SortedSet,n=0,i=0,o=this.length,r=e.length,s=this.elements,u=e.elements;;){if(n>o-1||i>r-1)break;s[n]!==u[i]?s[n]u[i]&&i++:(t.add(s[n]),n++,i++)}return t},lunr.SortedSet.prototype.clone=function(){var e=new lunr.SortedSet;return e.elements=this.toArray(),e.length=e.elements.length,e},lunr.SortedSet.prototype.union=function(e){var t,n,i;this.length>=e.length?(t=this,n=e):(t=e,n=this),i=t.clone();for(var o=0,r=n.toArray();oexpand and unexpand - CLI text processing with GNU Coreutils

expand and unexpand

These two commands will help you convert tabs to spaces and vice versa. Both these commands support options to customize the width of tab stops and which occurrences should be converted.

Default expand

The expand command converts tab characters to space characters. The default expansion aligns at multiples of 8 columns (calculated in terms of bytes).

# sample stdin data
+$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | cat -T
+apple^Ibanana^Icherry
+a^Ib^Ic
+# 'apple' = 5 bytes, \t converts to 3 spaces
+# 'banana' = 6 bytes, \t converts to 2 spaces
+# 'a' and 'b' = 1 byte, \t converts to 7 spaces
+$ printf 'apple\tbanana\tcherry\na\tb\tc\n' | expand
+apple   banana  cherry
+a       b       c
+
+# 'αλε' = 6 bytes, \t converts to 2 spaces
+$ printf 'αλε\tπού\n' | expand
+αλε  πού
+

Here's an example with strings of size 7 and 8 bytes before the tab character:

$ printf 'deviate\treached\nbackdrop\toverhang\n' | expand
+deviate reached
+backdrop        overhang
+

The expand command also considers backspace characters to determine the number of spaces needed.

# sample input with a backspace character
+$ printf 'cart\bd\tbard\n' | cat -t
+cart^Hd^Ibard
+
+# 'card' = 4 bytes, \t converts to 4 spaces
+$ printf 'cart\bd\tbard\n' | expand
+card    bard
+$ printf 'cart\bd\tbard\n' | expand | cat -t
+cart^Hd    bard
+

info expand will concatenate multiple files passed as input source, so cat will not be needed for such cases.
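The multiple-file behavior noted above can be sketched as follows; f1.txt and f2.txt are throwaway files created in a temporary directory just for this demo:

```shell
# expand accepts multiple input files directly, so cat isn't needed
# (f1.txt and f2.txt are made-up files created for this demo)
tmp=$(mktemp -d)
printf 'a\tb\n'  > "$tmp/f1.txt"
printf 'cc\td\n' > "$tmp/f2.txt"
expand -t4 "$tmp/f1.txt" "$tmp/f2.txt"
rm -r "$tmp"
```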

Expand only the initial tabs

You can use the -i option to convert only the tab characters present at the start of a line. The first occurrence of a character that is neither a tab nor a space stops the expansion.

# 'a' present at the start of line is not a tab/space character
+# so no tabs are expanded for this input
+$ printf 'a\tb\tc\n' | expand -i | cat -T
+a^Ib^Ic
+
+# the first \t gets expanded here, 'a' stops further expansion
+$ printf '\ta\tb\tc\n' | expand -i | cat -T
+        a^Ib^Ic
+
+# first two \t gets expanded here, 'a' stops further expansion
+# presence of space characters will not stop the expansion
+$ printf '\t \ta\tb\tc\n' | expand -i | cat -T
+                a^Ib^Ic
+

Customize the tab stop width

You can use the -t option to control the expansion width. Default is 8 as seen in the previous examples.

This option provides various features. Here's an example where all the tab characters are converted equally to the given width:

$ cat -T code.py
+def compute(x, y):
+^Iif x > y:
+^I^Iprint('hello')
+^Ielse:
+^I^Iprint('bye')
+
+$ expand -t 2 code.py
+def compute(x, y):
+  if x > y:
+    print('hello')
+  else:
+    print('bye')
+

You can provide multiple widths separated by the comma character. In that case, the given widths determine the stop locations for that many tab characters. These stop values are absolute positions from the start of the line, not the number of spaces each tab expands to. The rest of the tab characters will be expanded to a single space character.

# first tab character can expand till the 3rd column
+# second tab character can expand till the 7th column
+# rest of the tab characters will be expanded to a single space
+$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
+a  b   c d e
+
+# here are two more examples with the same specification as above
+# second tab expands to two spaces to end at the 7th column
+$ printf 'a\tbb\tc\td\te\n' | expand -t 3,7
+a  bb  c d e
+# second tab expands to a single space since it goes beyond the 7th column
+$ printf 'a\tbbbbbbbb\tc\td\te\n' | expand -t 3,7
+a  bbbbbbbb c d e
+

If you prefix a / character to the last width, the remaining tab characters will stop at multiples of that position instead of defaulting to a single space.

# first tab character can expand till the 3rd column
+# remaining tab characters can expand till 7/14/21/etc
+$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,/7
+a  b   c      d      e      f      g
+
+# first tab character can expand till the 3rd column
+# second tab character can expand till the 7th column
+# remaining tab characters can expand till 10/15/20/etc
+$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,/5
+a  b   c  d    e    f    g
+

If you use + instead of / as the prefix for the last width, the multiples are calculated with the second-to-last width added as an offset.

# first tab character can expand till the 3rd column
+# 3+7=10, so remaining tab characters can expand till 10/17/24/etc
+$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,+7
+a  b      c      d      e      f      g
+
+# first tab character can expand till the 3rd column
+# second tab character can expand till the 7th column
+# 7+5=12, so remaining tab characters can expand till 12/17/22/etc
+$ printf 'a\tb\tc\td\te\tf\tg\n' | expand -t 3,7,+5
+a  b   c    d    e    f    g
+

Default unexpand

By default, the unexpand command converts initial blank characters (space or tab) to tabs. The first occurrence of a non-blank character will stop the conversion. By default, every 8 columns worth of blanks is converted to a tab.

# input is 8 spaces followed by 'a' and then more characters
+# the initial 8 spaces is converted to a tab character
+# 'a' stops any further conversion, since it is a non-blank character
+$ printf '        a       b       c\n' | unexpand | cat -T
+^Ia       b       c
+
+# input is 9 spaces followed by 'a' and then more characters
+# the initial 8 spaces are converted to a tab character
+# remaining space is left as is
+$ printf '         a       b       c\n' | unexpand | cat -T
+^I a       b       c
+
+# input has 16 initial spaces, gets converted to two tabs
+$ printf '\t\ta\tb\tc\n' | expand | unexpand | cat -T
+^I^Ia       b       c
+
+# input has 4 spaces and a tab character (that expands till the 8th column)
+# output will have a single tab character at the start
+$ printf '    \ta b\n' | unexpand | cat -T
+^Ia b
+

info The current locale determines which characters are considered as blanks. Also, unexpand will concatenate multiple files passed as input source, so cat will not be needed for such cases.

Unexpand all blanks

The -a option will allow you to convert all sequences of two or more blanks at tab boundaries. Here are some examples:

# default unexpand stops at the first non-blank character
+$ printf '        a       b       c\n' | unexpand | cat -T
+^Ia       b       c
+# -a option will convert all sequences of blanks at tab boundaries
+$ printf '        a       b       c\n' | unexpand -a | cat -T
+^Ia^Ib^Ic
+
+# only two or more consecutive blanks are considered for conversion
+$ printf 'riddled reached\n' | unexpand -a | cat -T
+riddled reached
+$ printf 'riddle  reached\n' | unexpand -a | cat -T
+riddle^Ireached
+
+# blanks at non-tab boundaries won't be converted
+$ printf 'oh  hi  hello\n' | unexpand -a | cat -T
+oh  hi^Ihello
+

The unexpand command also considers backspace characters to determine the tab boundary.

# 'card' = 4 bytes, so the 4 spaces gets converted to a tab
+$ printf 'cart\bd    bard\n' | unexpand -a | cat -T
+card^Ibard
+$ printf 'cart\bd    bard\n' | unexpand -a | cat -t
+cart^Hd^Ibard
+

Change the tab stop width

The -t option has the same features as seen with the expand command. The -a option is also implied when this option is used.

Here's an example of changing the tab stop width to 2:

$ printf '\ta\n\t\tb\n' | expand -t 2
+  a
+    b
+
+$ printf '\ta\n\t\tb\n' | expand -t 2 | unexpand -t 2 | cat -T
+^Ia
+^I^Ib
+

Here are some examples with multiple tab widths:

$ printf 'a\tb\tc\td\te\n' | expand -t 3,7
+a  b   c d e
+$ printf 'a  b   c d e\n' | unexpand -t 3,7 | cat -T
+a^Ib^Ic d e
+$ printf 'a\tb\tc\td\te\n' | expand -t 3,7 | unexpand -t 3,7 | cat -T
+a^Ib^Ic d e
+
+$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,/7
+a  b   c      d      e      f
+$ printf 'a  b   c      d      e      f\n' | unexpand -t 3,/7 | cat -T
+a^Ib^Ic^Id^Ie^If
+
+$ printf 'a\tb\tc\td\te\tf\n' | expand -t 3,+7
+a  b      c      d      e      f
+$ printf 'a  b      c      d      e      f\n' | unexpand -t 3,+7 | cat -T
+a^Ib^Ic^Id^Ie^If
+

Exercises

info The exercises directory has all the files used in this section.

1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below.

$ cat items.txt
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+##### add your solution here
+1)        fruits
+apple     5
+banana    10
+2)        colors
+green
+sky       blue
+3)        magical   beasts
+dragon    3
+unicorn   42
+

2) What does the expand -i option do?

3) Expand the first tab character to stop at the 10th column and the second one at the 16th column. The rest of the tabs should be converted to a single space character.

$ printf 'app\tfix\tjoy\tmap\ttap\n' | ##### add your solution here
+app       fix   joy map tap
+
+$ printf 'appleseed\tfig\tjoy\n' | ##### add your solution here
+appleseed fig   joy
+
+$ printf 'a\tb\tc\td\te\n' | ##### add your solution here
+a         b     c d e
+

4) Will the following code give back the original input? If not, is there an option that can help?

$ printf 'a\tb\tc\n' | expand | unexpand
+

5) How do the + and / prefix modifiers affect the -t option?

\ No newline at end of file diff --git a/favicon.png b/favicon.png new file mode 100644 index 0000000..be5433d Binary files /dev/null and b/favicon.png differ diff --git a/favicon.svg b/favicon.svg new file mode 100644 index 0000000..74f6cf8 --- /dev/null +++ b/favicon.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/fold-fmt.html b/fold-fmt.html new file mode 100644 index 0000000..c29ae27 --- /dev/null +++ b/fold-fmt.html @@ -0,0 +1,146 @@ +fold and fmt - CLI text processing with GNU Coreutils

fold and fmt

These two commands are useful to split and join lines to meet a specific line length requirement. fmt is smarter and usually the tool you want, but fold can be handy for some cases.

fold

By default, fold wraps lines that are longer than 80 bytes, which can be customized using the -w option. The newline character isn't part of this length calculation. You might wonder if there are tasks where wrapping without context could be useful. One use case is the FASTA format.

$ cat greeting.txt
+Hi there
+Have a nice day
+
+# splits the second line since it is greater than 10 bytes
+$ fold -w10 greeting.txt
+Hi there
+Have a nic
+e day
+
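The FASTA use case mentioned above can be sketched with a made-up sequence: a long sequence line is wrapped at a fixed width so each output line stays uniform.

```shell
# wrap a long (made-up) sequence line at 10 bytes, FASTA style
printf 'ACGTACGTACGTACGTACGTAC\n' | fold -w10
```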

The -s option looks for the presence of spaces to determine the line splitting. This check is performed within the limits of the wrap length.

$ fold -s -w10 greeting.txt
+Hi there
+Have a 
+nice day
+

However, the -s option can still split words if there's no blank space before the specified width. Use fmt if you don't want this behavior.

$ echo 'hi there' | fold -s -w4
+hi 
+ther
+e
+

The -b option makes fold count bytes rather than columns. As a result, tab, backspace, and carriage return characters are each counted as a single byte instead of receiving their usual column treatment.

# tab can occupy up to 8 columns
+$ printf 'a\tb\tc\t1\t2\t3\n' | fold -w6
+a
+        
+b
+        
+c
+        
+1
+        
+2
+        
+3
+
+# here, tab will be treated as if it occupies only a single column
+$ printf 'a\tb\tc\t1\t2\t3\n' | fold -b -w6
+a       b       c       
+1       2       3
+

fmt

The fmt command makes smarter decisions based on sentences, paragraphs and other details. Here's an example that splits a single line (taken from the documentation of the fmt command) into several lines. The -w option controls the maximum width (default 75 columns) and the -g option controls the goal width (default is 93% of the maximum width).

$ fmt info_fmt.txt
+fmt prefers breaking lines at the end of a sentence, and tries to
+avoid line breaks after the first word of a sentence or before the last
+word of a sentence. A sentence break is defined as either the end of a
+paragraph or a word ending in any of '.?!', followed by two spaces or
+end of line, ignoring any intervening parentheses or quotes. Like TeX,
+fmt reads entire "paragraphs" before choosing line breaks; the algorithm
+is a variant of that given by Donald E. Knuth and Michael F. Plass in
+"Breaking Paragraphs Into Lines", Software—Practice & Experience 11,
+11 (November 1981), 1119–1184.
+

Unlike the fold command, words are not split even if they exceed the maximum line width. Another difference is that fmt will add a final newline character even if it wasn't present in the input.

$ printf 'hi there' | fmt -w4
+hi
+there
+

The fmt command can also join lines that are shorter than the specified width. As mentioned earlier, paragraphs are taken into consideration, so empty lines will prevent merging. The -s option disables line joining.

$ cat sample.txt
+ 1) Hello World
+ 2) 
+ 3) Hi there
+ 4) How are you
+ 5) 
+ 6) Just do-it
+ 7) Believe it
+ 8) 
+ 9) banana
+10) papaya
+11) mango
+12) 
+13) Much ado about nothing
+14) He he he
+15) Adios amigo
+
+# 'cut' here helps to ignore the first 4 characters of sample.txt
+$ cut -c5- sample.txt | fmt -w30
+Hello World
+
+Hi there How are you
+
+Just do-it Believe it
+
+banana papaya mango
+
+Much ado about nothing He he
+he Adios amigo
+

The -u option will change multiple spaces to a single space. Excess spacing between sentences will be changed to two spaces.

$ printf 'Hi    there.    Have a nice   day\n' | fmt -u
+Hi there.  Have a nice day
+

There are options that control indentation, formatting only lines with a specific prefix and so on. See fmt documentation for more details.

Exercises

info The exercises directory has all the files used in this section.

1) What's the default wrap length of the fold and fmt commands?

2) Fold the given stdin data at 9 bytes.

$ echo 'hi hello, how are you?' | ##### add your solution here
+hi hello,
+ how are 
+you?
+

3) Figure out the logic based on the given input and output data using the fold command.

$ cat ip.txt
+it is a warm and cozy day
+listen to what I say
+go play in the park
+come back before the sky turns dark
+
+There are so many delights to cherish
+Apple, Banana and Cherry
+Bread, Butter and Jelly
+Try them all before you perish
+
+##### add your solution here
+it is a 
+warm and 
+cozy day
+listen to 
+what I say
+

4) What does the fold -b option do?

5) How'd you get the expected output shown below?

# wrong output
+$ echo 'fig appleseed mango pomegranate' | fold -sw7
+fig 
+applese
+ed 
+mango 
+pomegra
+nate
+
+# expected output
+$ echo 'fig appleseed mango pomegranate' | ##### add your solution here
+fig
+appleseed
+mango
+pomegranate
+

6) What do the options -s and -u of the fmt command do?

\ No newline at end of file diff --git a/fonts/OPEN-SANS-LICENSE.txt b/fonts/OPEN-SANS-LICENSE.txt new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/fonts/OPEN-SANS-LICENSE.txt @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/fonts/SOURCE-CODE-PRO-LICENSE.txt b/fonts/SOURCE-CODE-PRO-LICENSE.txt new file mode 100644 index 0000000..366206f --- /dev/null +++ b/fonts/SOURCE-CODE-PRO-LICENSE.txt @@ -0,0 +1,93 @@ +Copyright 2010, 2012 Adobe Systems Incorporated (http://www.adobe.com/), with Reserved Font Name 'Source'. All Rights Reserved. Source is a trademark of Adobe Systems Incorporated in the United States and/or other countries. + +This Font Software is licensed under the SIL Open Font License, Version 1.1. 
+This license is copied below, and is also available with a FAQ at: +http://scripts.sil.org/OFL + + +----------------------------------------------------------- +SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007 +----------------------------------------------------------- + +PREAMBLE +The goals of the Open Font License (OFL) are to stimulate worldwide +development of collaborative font projects, to support the font creation +efforts of academic and linguistic communities, and to provide a free and +open framework in which fonts may be shared and improved in partnership +with others. + +The OFL allows the licensed fonts to be used, studied, modified and +redistributed freely as long as they are not sold by themselves. The +fonts, including any derivative works, can be bundled, embedded, +redistributed and/or sold with any software provided that any reserved +names are not used by derivative works. The fonts and derivatives, +however, cannot be released under any other type of license. The +requirement for fonts to remain under this license does not apply +to any document created using the fonts or their derivatives. + +DEFINITIONS +"Font Software" refers to the set of files released by the Copyright +Holder(s) under this license and clearly marked as such. This may +include source files, build scripts and documentation. + +"Reserved Font Name" refers to any names specified as such after the +copyright statement(s). + +"Original Version" refers to the collection of Font Software components as +distributed by the Copyright Holder(s). + +"Modified Version" refers to any derivative made by adding to, deleting, +or substituting -- in part or in whole -- any of the components of the +Original Version, by changing formats or by porting the Font Software to a +new environment. + +"Author" refers to any designer, engineer, programmer, technical +writer or other person who contributed to the Font Software. 
+ +PERMISSION & CONDITIONS +Permission is hereby granted, free of charge, to any person obtaining +a copy of the Font Software, to use, study, copy, merge, embed, modify, +redistribute, and sell modified and unmodified copies of the Font +Software, subject to the following conditions: + +1) Neither the Font Software nor any of its individual components, +in Original or Modified Versions, may be sold by itself. + +2) Original or Modified Versions of the Font Software may be bundled, +redistributed and/or sold with any software, provided that each copy +contains the above copyright notice and this license. These can be +included either as stand-alone text files, human-readable headers or +in the appropriate machine-readable metadata fields within text or +binary files as long as those fields can be easily viewed by the user. + +3) No Modified Version of the Font Software may use the Reserved Font +Name(s) unless explicit written permission is granted by the corresponding +Copyright Holder. This restriction only applies to the primary font name as +presented to the users. + +4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font +Software shall not be used to promote, endorse or advertise any +Modified Version, except to acknowledge the contribution(s) of the +Copyright Holder(s) and the Author(s) or with their explicit written +permission. + +5) The Font Software, modified or unmodified, in part or in whole, +must be distributed entirely under this license, and must not be +distributed under any other license. The requirement for fonts to +remain under this license does not apply to any document created +using the Font Software. + +TERMINATION +This license becomes null and void if any of the above conditions are +not met. 
+ +DISCLAIMER +THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT +OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE +COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL +DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM +OTHER DEALINGS IN THE FONT SOFTWARE. diff --git a/fonts/fonts.css b/fonts/fonts.css new file mode 100644 index 0000000..858efa5 --- /dev/null +++ b/fonts/fonts.css @@ -0,0 +1,100 @@ +/* Open Sans is licensed under the Apache License, Version 2.0. See http://www.apache.org/licenses/LICENSE-2.0 */ +/* Source Code Pro is under the Open Font License. See https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL */ + +/* open-sans-300 - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: normal; + font-weight: 300; + src: local('Open Sans Light'), local('OpenSans-Light'), + url('open-sans-v17-all-charsets-300.woff2') format('woff2'); +} + +/* open-sans-300italic - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: italic; + font-weight: 300; + src: local('Open Sans Light Italic'), local('OpenSans-LightItalic'), + url('open-sans-v17-all-charsets-300italic.woff2') format('woff2'); +} + +/* open-sans-regular - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: normal; + font-weight: 400; + src: local('Open Sans Regular'), local('OpenSans-Regular'), + url('open-sans-v17-all-charsets-regular.woff2') format('woff2'); +} + +/* open-sans-italic - 
latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: italic; + font-weight: 400; + src: local('Open Sans Italic'), local('OpenSans-Italic'), + url('open-sans-v17-all-charsets-italic.woff2') format('woff2'); +} + +/* open-sans-600 - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: normal; + font-weight: 600; + src: local('Open Sans SemiBold'), local('OpenSans-SemiBold'), + url('open-sans-v17-all-charsets-600.woff2') format('woff2'); +} + +/* open-sans-600italic - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: italic; + font-weight: 600; + src: local('Open Sans SemiBold Italic'), local('OpenSans-SemiBoldItalic'), + url('open-sans-v17-all-charsets-600italic.woff2') format('woff2'); +} + +/* open-sans-700 - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: normal; + font-weight: 700; + src: local('Open Sans Bold'), local('OpenSans-Bold'), + url('open-sans-v17-all-charsets-700.woff2') format('woff2'); +} + +/* open-sans-700italic - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: italic; + font-weight: 700; + src: local('Open Sans Bold Italic'), local('OpenSans-BoldItalic'), + url('open-sans-v17-all-charsets-700italic.woff2') format('woff2'); +} + +/* open-sans-800 - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: normal; + font-weight: 800; + src: local('Open Sans ExtraBold'), local('OpenSans-ExtraBold'), + url('open-sans-v17-all-charsets-800.woff2') format('woff2'); +} + +/* open-sans-800italic - latin_vietnamese_latin-ext_greek-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Open Sans'; + font-style: italic; + 
font-weight: 800; + src: local('Open Sans ExtraBold Italic'), local('OpenSans-ExtraBoldItalic'), + url('open-sans-v17-all-charsets-800italic.woff2') format('woff2'); +} + +/* source-code-pro-500 - latin_vietnamese_latin-ext_greek_cyrillic-ext_cyrillic */ +@font-face { + font-family: 'Source Code Pro'; + font-style: normal; + font-weight: 500; + src: url('source-code-pro-v11-all-charsets-500.woff2') format('woff2'); +} diff --git a/fonts/open-sans-v17-all-charsets-300.woff2 b/fonts/open-sans-v17-all-charsets-300.woff2 new file mode 100644 index 0000000..9f51be3 Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-300.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-300italic.woff2 b/fonts/open-sans-v17-all-charsets-300italic.woff2 new file mode 100644 index 0000000..2f54544 Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-300italic.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-600.woff2 b/fonts/open-sans-v17-all-charsets-600.woff2 new file mode 100644 index 0000000..f503d55 Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-600.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-600italic.woff2 b/fonts/open-sans-v17-all-charsets-600italic.woff2 new file mode 100644 index 0000000..c99aabe Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-600italic.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-700.woff2 b/fonts/open-sans-v17-all-charsets-700.woff2 new file mode 100644 index 0000000..421a1ab Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-700.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-700italic.woff2 b/fonts/open-sans-v17-all-charsets-700italic.woff2 new file mode 100644 index 0000000..12ce3d2 Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-700italic.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-800.woff2 b/fonts/open-sans-v17-all-charsets-800.woff2 new file mode 100644 index 0000000..c94a223 Binary files /dev/null and 
b/fonts/open-sans-v17-all-charsets-800.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-800italic.woff2 b/fonts/open-sans-v17-all-charsets-800italic.woff2 new file mode 100644 index 0000000..eed7d3c Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-800italic.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-italic.woff2 b/fonts/open-sans-v17-all-charsets-italic.woff2 new file mode 100644 index 0000000..398b68a Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-italic.woff2 differ diff --git a/fonts/open-sans-v17-all-charsets-regular.woff2 b/fonts/open-sans-v17-all-charsets-regular.woff2 new file mode 100644 index 0000000..8383e94 Binary files /dev/null and b/fonts/open-sans-v17-all-charsets-regular.woff2 differ diff --git a/fonts/source-code-pro-v11-all-charsets-500.woff2 b/fonts/source-code-pro-v11-all-charsets-500.woff2 new file mode 100644 index 0000000..7222456 Binary files /dev/null and b/fonts/source-code-pro-v11-all-charsets-500.woff2 differ diff --git a/head-tail.html b/head-tail.html new file mode 100644 index 0000000..7585cb2 --- /dev/null +++ b/head-tail.html @@ -0,0 +1,208 @@ +head and tail - CLI text processing with GNU Coreutils

head and tail

cat is useful for viewing the entire contents of files. Pagers like less are better suited when you are working with large files (man pages, for example). Sometimes though, you just want a peek at the starting or ending lines of the input. Or, you might already know the line numbers of the content you are interested in. In such cases, you can use head, tail or a combination of these two commands to extract what you want.

Leading and trailing lines

Consider this sample file, with line numbers prefixed for convenience.

$ cat sample.txt
+ 1) Hello World
+ 2) 
+ 3) Hi there
+ 4) How are you
+ 5) 
+ 6) Just do-it
+ 7) Believe it
+ 8) 
+ 9) banana
+10) papaya
+11) mango
+12) 
+13) Much ado about nothing
+14) He he he
+15) Adios amigo
+

By default, head and tail will display the first and last 10 lines respectively.

$ head sample.txt
+ 1) Hello World
+ 2) 
+ 3) Hi there
+ 4) How are you
+ 5) 
+ 6) Just do-it
+ 7) Believe it
+ 8) 
+ 9) banana
+10) papaya
+
+$ tail sample.txt
+ 6) Just do-it
+ 7) Believe it
+ 8) 
+ 9) banana
+10) papaya
+11) mango
+12) 
+13) Much ado about nothing
+14) He he he
+15) Adios amigo
+

If there are fewer than 10 lines in the input, only those lines will be displayed.

# the seq command, discussed in detail later, generates 1 to 3 here
+# same as: seq 3 | tail
+$ seq 3 | head
+1
+2
+3
+

You can use the -nN option to customize the number of lines (N) needed.

# first three lines
+# space between -n and N is optional
+$ head -n3 sample.txt
+ 1) Hello World
+ 2) 
+ 3) Hi there
+
+# last two lines
+$ tail -n2 sample.txt
+14) He he he
+15) Adios amigo
+

Excluding the last N lines

By using head -n -N, you get all the input lines except the last N lines, i.e. the complement of what the tail -nN command would give you.

# except the last 11 lines
+# space between -n and -N is optional
+$ head -n -11 sample.txt
+ 1) Hello World
+ 2) 
+ 3) Hi there
+ 4) How are you
+
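Note that negative line counts for head -n are a GNU extension and may not work with other implementations (BSD head, for example). Here's a quick sketch using seq-generated input:

```shell
# drop the last two of five lines
$ seq 5 | head -n -2
1
2
3
```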

Starting from the Nth line

By using tail -n +N, you get all the lines starting from the Nth line, i.e. the complement of what the head -n(N-1) command would give you.

# all lines starting from the 11th line
+# space between -n and +N is optional
+$ tail -n +11 sample.txt
+11) mango
+12) 
+13) Much ado about nothing
+14) He he he
+15) Adios amigo
+
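To see this complement relationship concretely, here's a small sketch with seq-generated input:

```shell
# the 3rd line onwards ...
$ seq 5 | tail -n +3
3
4
5

# ... is exactly what head -n2 leaves out
$ seq 5 | head -n2
1
2
```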

Multiple input files

If you pass multiple input files to the head and tail commands, each file will be processed separately. By default, the output is nicely formatted with filename headers and empty line separators.

$ seq 2 | head -n1 greeting.txt -
+==> greeting.txt <==
+Hi there
+
+==> standard input <==
+1
+

You can use the -q option to avoid filename headers and empty line separators.

$ tail -q -n2 sample.txt nums.txt
+14) He he he
+15) Adios amigo
+42
+1000
+
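GNU head and tail also provide the opposite switch: -v forces the filename header even when there's only a single input. A small sketch:

```shell
# header is shown despite there being only one (stdin) input
$ seq 2 | head -v -n1
==> standard input <==
1
```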

Byte selection

The -c option works similarly to the -n option, but with bytes instead of lines. In the examples below, the shell prompt that would otherwise appear right after the output isn't shown, since these outputs don't end with a newline character.

# first three characters
+$ printf 'apple pie' | head -c3
+app
+
+# last three characters
+$ printf 'apple pie' | tail -c3
+pie
+
+# excluding the last four characters
+$ printf 'car\njeep\nbus\n' | head -c -4
+car
+jeep
+
+# all characters starting from the fifth character
+$ printf 'car\njeep\nbus\n' | tail -c +5
+jeep
+bus
+

Since -c works byte-wise, it may not be suitable for input containing multibyte characters:

# all input characters in this example occupy two bytes each
+$ printf 'αλεπού' | head -c2
+α
+
+# g̈ requires three bytes
+$ printf 'cag̈e' | tail -c4
+g̈e
+
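head -c and tail -c can also be chained to select a range of bytes, similar to selecting a range of lines; a minimal sketch:

```shell
# bytes 7 to 9 of the input
$ printf 'apple pie' | tail -c +7 | head -c3
pie
```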

Range of lines

You can select a range of lines by combining both the head and tail commands.

# 9th to 11th lines
+# same as: head -n11 sample.txt | tail -n +9
+$ tail -n +9 sample.txt | head -n3
+ 9) banana
+10) papaya
+11) mango
+
+# 6th to 7th lines
+# same as: tail -n +6 sample.txt | head -n2
+$ head -n7 sample.txt | tail -n +6
+ 6) Just do-it
+ 7) Believe it
+

info See unix.stackexchange: line X to line Y on a huge file for performance comparison with other commands like sed, awk, etc.

NUL separator

The -z option sets the NUL character as the line separator instead of the newline character.

$ printf 'car\0jeep\0bus\0' | head -z -n2 | cat -v
+car^@jeep^@
+
+$ printf 'car\0jeep\0bus\0' | tail -z -n2 | cat -v
+jeep^@bus^@
+
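NUL-separated output pairs well with other tools that understand the NUL separator (for example grep -z or xargs -0). For display purposes, you can convert the separators back to newlines with tr; a sketch:

```shell
$ printf 'car\0jeep\0bus\0' | tail -z -n2 | tr '\0' '\n'
jeep
bus
```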

Further Reading

Exercises

info The exercises directory has all the files used in this section.

1) Use appropriate commands and shell features to get the output shown below.

$ printf 'carpet\njeep\nbus\n'
+carpet
+jeep
+bus
+
+# use the above 'printf' command for input data
+$ c=##### add your solution here
+$ echo "$c"
+car
+

2) How would you display all the input lines except the first one?

$ printf 'apple\nfig\ncarpet\njeep\nbus\n' | ##### add your solution here
+fig
+carpet
+jeep
+bus
+

3) Which command would you use to get the output shown below?

$ cat fruits.txt
+banana
+papaya
+mango
+$ cat blocks.txt
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+----
+sky blue
+dark green
+----
+hi hello
+
+##### add your solution here
+==> fruits.txt <==
+banana
+papaya
+
+==> blocks.txt <==
+----
+apple--banana
+

4) Use a combination of head and tail commands to get the 11th to 14th characters from the given input.

$ printf 'apple\nfig\ncarpet\njeep\nbus\n' | ##### add your solution here
+carp
+

5) Extract the starting six bytes from the input files ip.txt and fruits.txt.

##### add your solution here
+it is banana
+

6) Extract the last six bytes from the input files fruits.txt and ip.txt.

##### add your solution here
+mango
+erish
+

7) For the input file ip.txt, display all lines except the last 5 lines.

##### add your solution here
+it is a warm and cozy day
+listen to what I say
+go play in the park
+come back before the sky turns dark
+

8) Display the third line from the given stdin data. Consider the NUL character as the line separator.

$ printf 'apple\0fig\0carpet\0jeep\0bus\0' | ##### add your solution here
+carpet
+
\ No newline at end of file diff --git a/highlight.css b/highlight.css new file mode 100644 index 0000000..ba57b82 --- /dev/null +++ b/highlight.css @@ -0,0 +1,82 @@ +/* + * An increased contrast highlighting scheme loosely based on the + * "Base16 Atelier Dune Light" theme by Bram de Haan + * (http://atelierbram.github.io/syntax-highlighting/atelier-schemes/dune) + * Original Base16 color scheme by Chris Kempson + * (https://github.com/chriskempson/base16) + */ + +/* Comment */ +.hljs-comment, +.hljs-quote { + color: #575757; +} + +/* Red */ +.hljs-variable, +.hljs-template-variable, +.hljs-attribute, +.hljs-tag, +.hljs-name, +.hljs-regexp, +.hljs-link, +.hljs-name, +.hljs-selector-id, +.hljs-selector-class { + color: #d70025; +} + +/* Orange */ +.hljs-number, +.hljs-meta, +.hljs-built_in, +.hljs-builtin-name, +.hljs-literal, +.hljs-type, +.hljs-params { + color: #b21e00; +} + +/* Green */ +.hljs-string, +.hljs-symbol, +.hljs-bullet { + color: #008200; +} + +/* Blue */ +.hljs-title, +.hljs-section { + color: #0030f2; +} + +/* Purple */ +.hljs-keyword, +.hljs-selector-tag { + color: #9d00ec; +} + +.hljs { + display: block; + overflow-x: auto; + background: #f6f7f6; + color: #000; +} + +.hljs-emphasis { + font-style: italic; +} + +.hljs-strong { + font-weight: bold; +} + +.hljs-addition { + color: #22863a; + background-color: #f0fff4; +} + +.hljs-deletion { + color: #b31d28; + background-color: #ffeef0; +} diff --git a/highlight.js b/highlight.js new file mode 100644 index 0000000..180385b --- /dev/null +++ b/highlight.js @@ -0,0 +1,6 @@ +/* + Highlight.js 10.1.1 (93fd0d73) + License: BSD-3-Clause + Copyright (c) 2006-2020, Ivan Sagalaev +*/ +var hljs=function(){"use strict";function e(n){Object.freeze(n);var t="function"==typeof n;return Object.getOwnPropertyNames(n).forEach((function(r){!Object.hasOwnProperty.call(n,r)||null===n[r]||"object"!=typeof n[r]&&"function"!=typeof 
n[r]||t&&("caller"===r||"callee"===r||"arguments"===r)||Object.isFrozen(n[r])||e(n[r])})),n}class n{constructor(e){void 0===e.data&&(e.data={}),this.data=e.data}ignoreMatch(){this.ignore=!0}}function t(e){return e.replace(/&/g,"&").replace(//g,">").replace(/"/g,""").replace(/'/g,"'")}function r(e,...n){var t={};for(const n in e)t[n]=e[n];return n.forEach((function(e){for(const n in e)t[n]=e[n]})),t}function a(e){return e.nodeName.toLowerCase()}var i=Object.freeze({__proto__:null,escapeHTML:t,inherit:r,nodeStream:function(e){var n=[];return function e(t,r){for(var i=t.firstChild;i;i=i.nextSibling)3===i.nodeType?r+=i.nodeValue.length:1===i.nodeType&&(n.push({event:"start",offset:r,node:i}),r=e(i,r),a(i).match(/br|hr|img|input/)||n.push({event:"stop",offset:r,node:i}));return r}(e,0),n},mergeStreams:function(e,n,r){var i=0,s="",o=[];function l(){return e.length&&n.length?e[0].offset!==n[0].offset?e[0].offset"}function u(e){s+=""}function d(e){("start"===e.event?c:u)(e.node)}for(;e.length||n.length;){var g=l();if(s+=t(r.substring(i,g[0].offset)),i=g[0].offset,g===e){o.reverse().forEach(u);do{d(g.splice(0,1)[0]),g=l()}while(g===e&&g.length&&g[0].offset===i);o.reverse().forEach(c)}else"start"===g[0].event?o.push(g[0].node):o.pop(),d(g.splice(0,1)[0])}return s+t(r.substr(i))}});const s="",o=e=>!!e.kind;class l{constructor(e,n){this.buffer="",this.classPrefix=n.classPrefix,e.walk(this)}addText(e){this.buffer+=t(e)}openNode(e){if(!o(e))return;let n=e.kind;e.sublanguage||(n=`${this.classPrefix}${n}`),this.span(n)}closeNode(e){o(e)&&(this.buffer+=s)}value(){return this.buffer}span(e){this.buffer+=``}}class c{constructor(){this.rootNode={children:[]},this.stack=[this.rootNode]}get top(){return this.stack[this.stack.length-1]}get root(){return this.rootNode}add(e){this.top.children.push(e)}openNode(e){const n={kind:e,children:[]};this.add(n),this.stack.push(n)}closeNode(){if(this.stack.length>1)return this.stack.pop()}closeAllNodes(){for(;this.closeNode(););}toJSON(){return 
JSON.stringify(this.rootNode,null,4)}walk(e){return this.constructor._walk(e,this.rootNode)}static _walk(e,n){return"string"==typeof n?e.addText(n):n.children&&(e.openNode(n),n.children.forEach(n=>this._walk(e,n)),e.closeNode(n)),e}static _collapse(e){"string"!=typeof e&&e.children&&(e.children.every(e=>"string"==typeof e)?e.children=[e.children.join("")]:e.children.forEach(e=>{c._collapse(e)}))}}class u extends c{constructor(e){super(),this.options=e}addKeyword(e,n){""!==e&&(this.openNode(n),this.addText(e),this.closeNode())}addText(e){""!==e&&this.add(e)}addSublanguage(e,n){const t=e.root;t.kind=n,t.sublanguage=!0,this.add(t)}toHTML(){return new l(this,this.options).value()}finalize(){return!0}}function d(e){return e?"string"==typeof e?e:e.source:null}const g="(-?)(\\b0[xX][a-fA-F0-9]+|(\\b\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?)",h={begin:"\\\\[\\s\\S]",relevance:0},f={className:"string",begin:"'",end:"'",illegal:"\\n",contains:[h]},p={className:"string",begin:'"',end:'"',illegal:"\\n",contains:[h]},b={begin:/\b(a|an|the|are|I'm|isn't|don't|doesn't|won't|but|just|should|pretty|simply|enough|gonna|going|wtf|so|such|will|you|your|they|like|more)\b/},m=function(e,n,t={}){var a=r({className:"comment",begin:e,end:n,contains:[]},t);return a.contains.push(b),a.contains.push({className:"doctag",begin:"(?:TODO|FIXME|NOTE|BUG|OPTIMIZE|HACK|XXX):",relevance:0}),a},v=m("//","$"),x=m("/\\*","\\*/"),E=m("#","$");var _=Object.freeze({__proto__:null,IDENT_RE:"[a-zA-Z]\\w*",UNDERSCORE_IDENT_RE:"[a-zA-Z_]\\w*",NUMBER_RE:"\\b\\d+(\\.\\d+)?",C_NUMBER_RE:g,BINARY_NUMBER_RE:"\\b(0b[01]+)",RE_STARTERS_RE:"!|!=|!==|%|%=|&|&&|&=|\\*|\\*=|\\+|\\+=|,|-|-=|/=|/|:|;|<<|<<=|<=|<|===|==|=|>>>=|>>=|>=|>>>|>>|>|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~",SHEBANG:(e={})=>{const n=/^#![ ]*\//;return e.binary&&(e.begin=function(...e){return 
e.map(e=>d(e)).join("")}(n,/.*\b/,e.binary,/\b.*/)),r({className:"meta",begin:n,end:/$/,relevance:0,"on:begin":(e,n)=>{0!==e.index&&n.ignoreMatch()}},e)},BACKSLASH_ESCAPE:h,APOS_STRING_MODE:f,QUOTE_STRING_MODE:p,PHRASAL_WORDS_MODE:b,COMMENT:m,C_LINE_COMMENT_MODE:v,C_BLOCK_COMMENT_MODE:x,HASH_COMMENT_MODE:E,NUMBER_MODE:{className:"number",begin:"\\b\\d+(\\.\\d+)?",relevance:0},C_NUMBER_MODE:{className:"number",begin:g,relevance:0},BINARY_NUMBER_MODE:{className:"number",begin:"\\b(0b[01]+)",relevance:0},CSS_NUMBER_MODE:{className:"number",begin:"\\b\\d+(\\.\\d+)?(%|em|ex|ch|rem|vw|vh|vmin|vmax|cm|mm|in|pt|pc|px|deg|grad|rad|turn|s|ms|Hz|kHz|dpi|dpcm|dppx)?",relevance:0},REGEXP_MODE:{begin:/(?=\/[^/\n]*\/)/,contains:[{className:"regexp",begin:/\//,end:/\/[gimuy]*/,illegal:/\n/,contains:[h,{begin:/\[/,end:/\]/,relevance:0,contains:[h]}]}]},TITLE_MODE:{className:"title",begin:"[a-zA-Z]\\w*",relevance:0},UNDERSCORE_TITLE_MODE:{className:"title",begin:"[a-zA-Z_]\\w*",relevance:0},METHOD_GUARD:{begin:"\\.\\s*[a-zA-Z_]\\w*",relevance:0},END_SAME_AS_BEGIN:function(e){return Object.assign(e,{"on:begin":(e,n)=>{n.data._beginMatch=e[1]},"on:end":(e,n)=>{n.data._beginMatch!==e[1]&&n.ignoreMatch()}})}}),N="of and for in not or if then".split(" ");function w(e,n){return n?+n:function(e){return N.includes(e.toLowerCase())}(e)?0:1}const R=t,y=r,{nodeStream:k,mergeStreams:O}=i,M=Symbol("nomatch");return function(t){var a=[],i={},s={},o=[],l=!0,c=/(^(<[^>]+>|\t|)+|\n)/gm,g="Could not find the language '{}', did you forget to load/include a language module?";const h={disableAutodetect:!0,name:"Plain text",contains:[]};var f={noHighlightRe:/^(no-?highlight)$/i,languageDetectRe:/\blang(?:uage)?-([\w-]+)\b/i,classPrefix:"hljs-",tabReplace:null,useBR:!1,languages:null,__emitter:u};function p(e){return f.noHighlightRe.test(e)}function b(e,n,t,r){var a={code:n,language:e};S("before:highlight",a);var i=a.result?a.result:m(a.language,a.code,t,r);return 
i.code=a.code,S("after:highlight",i),i}function m(e,t,a,s){var o=t;function c(e,n){var t=E.case_insensitive?n[0].toLowerCase():n[0];return Object.prototype.hasOwnProperty.call(e.keywords,t)&&e.keywords[t]}function u(){null!=y.subLanguage?function(){if(""!==A){var e=null;if("string"==typeof y.subLanguage){if(!i[y.subLanguage])return void O.addText(A);e=m(y.subLanguage,A,!0,k[y.subLanguage]),k[y.subLanguage]=e.top}else e=v(A,y.subLanguage.length?y.subLanguage:null);y.relevance>0&&(I+=e.relevance),O.addSublanguage(e.emitter,e.language)}}():function(){if(!y.keywords)return void O.addText(A);let e=0;y.keywordPatternRe.lastIndex=0;let n=y.keywordPatternRe.exec(A),t="";for(;n;){t+=A.substring(e,n.index);const r=c(y,n);if(r){const[e,a]=r;O.addText(t),t="",I+=a,O.addKeyword(n[0],e)}else t+=n[0];e=y.keywordPatternRe.lastIndex,n=y.keywordPatternRe.exec(A)}t+=A.substr(e),O.addText(t)}(),A=""}function h(e){return e.className&&O.openNode(e.className),y=Object.create(e,{parent:{value:y}})}function p(e){return 0===y.matcher.regexIndex?(A+=e[0],1):(L=!0,0)}var b={};function x(t,r){var i=r&&r[0];if(A+=t,null==i)return u(),0;if("begin"===b.type&&"end"===r.type&&b.index===r.index&&""===i){if(A+=o.slice(r.index,r.index+1),!l){const n=Error("0 width match regex");throw n.languageName=e,n.badRule=b.rule,n}return 1}if(b=r,"begin"===r.type)return function(e){var t=e[0],r=e.rule;const a=new n(r),i=[r.__beforeBegin,r["on:begin"]];for(const n of i)if(n&&(n(e,a),a.ignore))return p(t);return r&&r.endSameAsBegin&&(r.endRe=RegExp(t.replace(/[-/\\^$*+?.()|[\]{}]/g,"\\$&"),"m")),r.skip?A+=t:(r.excludeBegin&&(A+=t),u(),r.returnBegin||r.excludeBegin||(A=t)),h(r),r.returnBegin?0:t.length}(r);if("illegal"===r.type&&!a){const e=Error('Illegal lexeme "'+i+'" for mode "'+(y.className||"")+'"');throw e.mode=y,e}if("end"===r.type){var s=function(e){var t=e[0],r=o.substr(e.index),a=function e(t,r,a){let i=function(e,n){var t=e&&e.exec(n);return t&&0===t.index}(t.endRe,a);if(i){if(t["on:end"]){const e=new 
n(t);t["on:end"](r,e),e.ignore&&(i=!1)}if(i){for(;t.endsParent&&t.parent;)t=t.parent;return t}}if(t.endsWithParent)return e(t.parent,r,a)}(y,e,r);if(!a)return M;var i=y;i.skip?A+=t:(i.returnEnd||i.excludeEnd||(A+=t),u(),i.excludeEnd&&(A=t));do{y.className&&O.closeNode(),y.skip||y.subLanguage||(I+=y.relevance),y=y.parent}while(y!==a.parent);return a.starts&&(a.endSameAsBegin&&(a.starts.endRe=a.endRe),h(a.starts)),i.returnEnd?0:t.length}(r);if(s!==M)return s}if("illegal"===r.type&&""===i)return 1;if(B>1e5&&B>3*r.index)throw Error("potential infinite loop, way more iterations than matches");return A+=i,i.length}var E=T(e);if(!E)throw console.error(g.replace("{}",e)),Error('Unknown language: "'+e+'"');var _=function(e){function n(n,t){return RegExp(d(n),"m"+(e.case_insensitive?"i":"")+(t?"g":""))}class t{constructor(){this.matchIndexes={},this.regexes=[],this.matchAt=1,this.position=0}addRule(e,n){n.position=this.position++,this.matchIndexes[this.matchAt]=n,this.regexes.push([n,e]),this.matchAt+=function(e){return RegExp(e.toString()+"|").exec("").length-1}(e)+1}compile(){0===this.regexes.length&&(this.exec=()=>null);const e=this.regexes.map(e=>e[1]);this.matcherRe=n(function(e,n="|"){for(var t=/\[(?:[^\\\]]|\\.)*\]|\(\??|\\([1-9][0-9]*)|\\./,r=0,a="",i=0;i0&&(a+=n),a+="(";o.length>0;){var l=t.exec(o);if(null==l){a+=o;break}a+=o.substring(0,l.index),o=o.substring(l.index+l[0].length),"\\"===l[0][0]&&l[1]?a+="\\"+(+l[1]+s):(a+=l[0],"("===l[0]&&r++)}a+=")"}return a}(e),!0),this.lastIndex=0}exec(e){this.matcherRe.lastIndex=this.lastIndex;const n=this.matcherRe.exec(e);if(!n)return null;const t=n.findIndex((e,n)=>n>0&&void 0!==e),r=this.matchIndexes[t];return n.splice(0,t),Object.assign(n,r)}}class a{constructor(){this.rules=[],this.multiRegexes=[],this.count=0,this.lastIndex=0,this.regexIndex=0}getMatcher(e){if(this.multiRegexes[e])return this.multiRegexes[e];const n=new t;return 
this.rules.slice(e).forEach(([e,t])=>n.addRule(e,t)),n.compile(),this.multiRegexes[e]=n,n}considerAll(){this.regexIndex=0}addRule(e,n){this.rules.push([e,n]),"begin"===n.type&&this.count++}exec(e){const n=this.getMatcher(this.regexIndex);n.lastIndex=this.lastIndex;const t=n.exec(e);return t&&(this.regexIndex+=t.position+1,this.regexIndex===this.count&&(this.regexIndex=0)),t}}function i(e,n){const t=e.input[e.index-1],r=e.input[e.index+e[0].length];"."!==t&&"."!==r||n.ignoreMatch()}if(e.contains&&e.contains.includes("self"))throw Error("ERR: contains `self` is not supported at the top-level of a language. See documentation.");return function t(s,o){const l=s;if(s.compiled)return l;s.compiled=!0,s.__beforeBegin=null,s.keywords=s.keywords||s.beginKeywords;let c=null;if("object"==typeof s.keywords&&(c=s.keywords.$pattern,delete s.keywords.$pattern),s.keywords&&(s.keywords=function(e,n){var t={};return"string"==typeof e?r("keyword",e):Object.keys(e).forEach((function(n){r(n,e[n])})),t;function r(e,r){n&&(r=r.toLowerCase()),r.split(" ").forEach((function(n){var r=n.split("|");t[r[0]]=[e,w(r[0],r[1])]}))}}(s.keywords,e.case_insensitive)),s.lexemes&&c)throw Error("ERR: Prefer `keywords.$pattern` to `mode.lexemes`, BOTH are not allowed. 
(see mode reference) ");return l.keywordPatternRe=n(s.lexemes||c||/\w+/,!0),o&&(s.beginKeywords&&(s.begin="\\b("+s.beginKeywords.split(" ").join("|")+")(?=\\b|\\s)",s.__beforeBegin=i),s.begin||(s.begin=/\B|\b/),l.beginRe=n(s.begin),s.endSameAsBegin&&(s.end=s.begin),s.end||s.endsWithParent||(s.end=/\B|\b/),s.end&&(l.endRe=n(s.end)),l.terminator_end=d(s.end)||"",s.endsWithParent&&o.terminator_end&&(l.terminator_end+=(s.end?"|":"")+o.terminator_end)),s.illegal&&(l.illegalRe=n(s.illegal)),void 0===s.relevance&&(s.relevance=1),s.contains||(s.contains=[]),s.contains=[].concat(...s.contains.map((function(e){return function(e){return e.variants&&!e.cached_variants&&(e.cached_variants=e.variants.map((function(n){return r(e,{variants:null},n)}))),e.cached_variants?e.cached_variants:function e(n){return!!n&&(n.endsWithParent||e(n.starts))}(e)?r(e,{starts:e.starts?r(e.starts):null}):Object.isFrozen(e)?r(e):e}("self"===e?s:e)}))),s.contains.forEach((function(e){t(e,l)})),s.starts&&t(s.starts,o),l.matcher=function(e){const n=new a;return e.contains.forEach(e=>n.addRule(e.begin,{rule:e,type:"begin"})),e.terminator_end&&n.addRule(e.terminator_end,{type:"end"}),e.illegal&&n.addRule(e.illegal,{type:"illegal"}),n}(l),l}(e)}(E),N="",y=s||_,k={},O=new f.__emitter(f);!function(){for(var e=[],n=y;n!==E;n=n.parent)n.className&&e.unshift(n.className);e.forEach(e=>O.openNode(e))}();var A="",I=0,S=0,B=0,L=!1;try{for(y.matcher.considerAll();;){B++,L?L=!1:(y.matcher.lastIndex=S,y.matcher.considerAll());const e=y.matcher.exec(o);if(!e)break;const n=x(o.substring(S,e.index),e);S=e.index+n}return x(o.substr(S)),O.closeAllNodes(),O.finalize(),N=O.toHTML(),{relevance:I,value:N,language:e,illegal:!1,emitter:O,top:y}}catch(n){if(n.message&&n.message.includes("Illegal"))return{illegal:!0,illegalBy:{msg:n.message,context:o.slice(S-100,S+100),mode:n.mode},sofar:N,relevance:0,value:R(o),emitter:O};if(l)return{illegal:!1,relevance:0,value:R(o),emitter:O,language:e,top:y,errorRaised:n};throw n}}function 
v(e,n){n=n||f.languages||Object.keys(i);var t=function(e){const n={relevance:0,emitter:new f.__emitter(f),value:R(e),illegal:!1,top:h};return n.emitter.addText(e),n}(e),r=t;return n.filter(T).filter(I).forEach((function(n){var a=m(n,e,!1);a.language=n,a.relevance>r.relevance&&(r=a),a.relevance>t.relevance&&(r=t,t=a)})),r.language&&(t.second_best=r),t}function x(e){return f.tabReplace||f.useBR?e.replace(c,e=>"\n"===e?f.useBR?"
":e:f.tabReplace?e.replace(/\t/g,f.tabReplace):e):e}function E(e){let n=null;const t=function(e){var n=e.className+" ";n+=e.parentNode?e.parentNode.className:"";const t=f.languageDetectRe.exec(n);if(t){var r=T(t[1]);return r||(console.warn(g.replace("{}",t[1])),console.warn("Falling back to no-highlight mode for this block.",e)),r?t[1]:"no-highlight"}return n.split(/\s+/).find(e=>p(e)||T(e))}(e);if(p(t))return;S("before:highlightBlock",{block:e,language:t}),f.useBR?(n=document.createElement("div")).innerHTML=e.innerHTML.replace(/\n/g,"").replace(//g,"\n"):n=e;const r=n.textContent,a=t?b(t,r,!0):v(r),i=k(n);if(i.length){const e=document.createElement("div");e.innerHTML=a.value,a.value=O(i,k(e),r)}a.value=x(a.value),S("after:highlightBlock",{block:e,result:a}),e.innerHTML=a.value,e.className=function(e,n,t){var r=n?s[n]:t,a=[e.trim()];return e.match(/\bhljs\b/)||a.push("hljs"),e.includes(r)||a.push(r),a.join(" ").trim()}(e.className,t,a.language),e.result={language:a.language,re:a.relevance,relavance:a.relevance},a.second_best&&(e.second_best={language:a.second_best.language,re:a.second_best.relevance,relavance:a.second_best.relevance})}const N=()=>{if(!N.called){N.called=!0;var e=document.querySelectorAll("pre code");a.forEach.call(e,E)}};function T(e){return e=(e||"").toLowerCase(),i[e]||i[s[e]]}function A(e,{languageName:n}){"string"==typeof e&&(e=[e]),e.forEach(e=>{s[e]=n})}function I(e){var n=T(e);return n&&!n.disableAutodetect}function S(e,n){var t=e;o.forEach((function(e){e[t]&&e[t](n)}))}Object.assign(t,{highlight:b,highlightAuto:v,fixMarkup:x,highlightBlock:E,configure:function(e){f=y(f,e)},initHighlighting:N,initHighlightingOnLoad:function(){window.addEventListener("DOMContentLoaded",N,!1)},registerLanguage:function(e,n){var r=null;try{r=n(t)}catch(n){if(console.error("Language definition for '{}' could not be registered.".replace("{}",e)),!l)throw 
n;console.error(n),r=h}r.name||(r.name=e),i[e]=r,r.rawDefinition=n.bind(null,t),r.aliases&&A(r.aliases,{languageName:e})},listLanguages:function(){return Object.keys(i)},getLanguage:T,registerAliases:A,requireLanguage:function(e){var n=T(e);if(n)return n;throw Error("The '{}' language is required, but not loaded.".replace("{}",e))},autoDetection:I,inherit:y,addPlugin:function(e){o.push(e)}}),t.debugMode=function(){l=!1},t.safeMode=function(){l=!0},t.versionString="10.1.1";for(const n in _)"object"==typeof _[n]&&e(_[n]);return Object.assign(t,_),t}({})}();"object"==typeof exports&&"undefined"!=typeof module&&(module.exports=hljs);hljs.registerLanguage("php",function(){"use strict";return function(e){var r={begin:"\\$+[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*"},t={className:"meta",variants:[{begin:/<\?php/,relevance:10},{begin:/<\?[=]?/},{begin:/\?>/}]},a={className:"string",contains:[e.BACKSLASH_ESCAPE,t],variants:[{begin:'b"',end:'"'},{begin:"b'",end:"'"},e.inherit(e.APOS_STRING_MODE,{illegal:null}),e.inherit(e.QUOTE_STRING_MODE,{illegal:null})]},n={variants:[e.BINARY_NUMBER_MODE,e.C_NUMBER_MODE]},i={keyword:"__CLASS__ __DIR__ __FILE__ __FUNCTION__ __LINE__ __METHOD__ __NAMESPACE__ __TRAIT__ die echo exit include include_once print require require_once array abstract and as binary bool boolean break callable case catch class clone const continue declare default do double else elseif empty enddeclare endfor endforeach endif endswitch endwhile eval extends final finally float for foreach from global goto if implements instanceof insteadof int integer interface isset iterable list new object or private protected public real return string switch throw trait try unset use var void while xor yield",literal:"false null true",built_in:"Error|0 AppendIterator ArgumentCountError ArithmeticError ArrayIterator ArrayObject AssertionError BadFunctionCallException BadMethodCallException CachingIterator CallbackFilterIterator CompileError Countable DirectoryIterator DivisionByZeroError 
DomainException EmptyIterator ErrorException Exception FilesystemIterator FilterIterator GlobIterator InfiniteIterator InvalidArgumentException IteratorIterator LengthException LimitIterator LogicException MultipleIterator NoRewindIterator OutOfBoundsException OutOfRangeException OuterIterator OverflowException ParentIterator ParseError RangeException RecursiveArrayIterator RecursiveCachingIterator RecursiveCallbackFilterIterator RecursiveDirectoryIterator RecursiveFilterIterator RecursiveIterator RecursiveIteratorIterator RecursiveRegexIterator RecursiveTreeIterator RegexIterator RuntimeException SeekableIterator SplDoublyLinkedList SplFileInfo SplFileObject SplFixedArray SplHeap SplMaxHeap SplMinHeap SplObjectStorage SplObserver SplObserver SplPriorityQueue SplQueue SplStack SplSubject SplSubject SplTempFileObject TypeError UnderflowException UnexpectedValueException ArrayAccess Closure Generator Iterator IteratorAggregate Serializable Throwable Traversable WeakReference Directory __PHP_Incomplete_Class parent php_user_filter self static stdClass"};return{aliases:["php","php3","php4","php5","php6","php7"],case_insensitive:!0,keywords:i,contains:[e.HASH_COMMENT_MODE,e.COMMENT("//","$",{contains:[t]}),e.COMMENT("/\\*","\\*/",{contains:[{className:"doctag",begin:"@[A-Za-z]+"}]}),e.COMMENT("__halt_compiler.+?;",!1,{endsWithParent:!0,keywords:"__halt_compiler"}),{className:"string",begin:/<<<['"]?\w+['"]?$/,end:/^\w+;?$/,contains:[e.BACKSLASH_ESCAPE,{className:"subst",variants:[{begin:/\$\w+/},{begin:/\{\$/,end:/\}/}]}]},t,{className:"keyword",begin:/\$this\b/},r,{begin:/(::|->)+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/},{className:"function",beginKeywords:"fn function",end:/[;{]/,excludeEnd:!0,illegal:"[$%\\[]",contains:[e.UNDERSCORE_TITLE_MODE,{className:"params",begin:"\\(",end:"\\)",excludeBegin:!0,excludeEnd:!0,keywords:i,contains:["self",r,e.C_BLOCK_COMMENT_MODE,a,n]}]},{className:"class",beginKeywords:"class 
interface",end:"{",excludeEnd:!0,illegal:/[:\(\$"]/,contains:[{beginKeywords:"extends implements"},e.UNDERSCORE_TITLE_MODE]},{beginKeywords:"namespace",end:";",illegal:/[\.']/,contains:[e.UNDERSCORE_TITLE_MODE]},{beginKeywords:"use",end:";",contains:[e.UNDERSCORE_TITLE_MODE]},{begin:"=>"},a,n]}}}());hljs.registerLanguage("nginx",function(){"use strict";return function(e){var n={className:"variable",variants:[{begin:/\$\d+/},{begin:/\$\{/,end:/}/},{begin:"[\\$\\@]"+e.UNDERSCORE_IDENT_RE}]},a={endsWithParent:!0,keywords:{$pattern:"[a-z/_]+",literal:"on off yes no true false none blocked debug info notice warn error crit select break last permanent redirect kqueue rtsig epoll poll /dev/poll"},relevance:0,illegal:"=>",contains:[e.HASH_COMMENT_MODE,{className:"string",contains:[e.BACKSLASH_ESCAPE,n],variants:[{begin:/"/,end:/"/},{begin:/'/,end:/'/}]},{begin:"([a-z]+):/",end:"\\s",endsWithParent:!0,excludeEnd:!0,contains:[n]},{className:"regexp",contains:[e.BACKSLASH_ESCAPE,n],variants:[{begin:"\\s\\^",end:"\\s|{|;",returnEnd:!0},{begin:"~\\*?\\s+",end:"\\s|{|;",returnEnd:!0},{begin:"\\*(\\.[a-z\\-]+)+"},{begin:"([a-z\\-]+\\.)+\\*"}]},{className:"number",begin:"\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}(:\\d{1,5})?\\b"},{className:"number",begin:"\\b\\d+[kKmMgGdshdwy]*\\b",relevance:0},n]};return{name:"Nginx config",aliases:["nginxconf"],contains:[e.HASH_COMMENT_MODE,{begin:e.UNDERSCORE_IDENT_RE+"\\s+{",returnBegin:!0,end:"{",contains:[{className:"section",begin:e.UNDERSCORE_IDENT_RE}],relevance:0},{begin:e.UNDERSCORE_IDENT_RE+"\\s",end:";|{",returnBegin:!0,contains:[{className:"attribute",begin:e.UNDERSCORE_IDENT_RE,starts:a}],relevance:0}],illegal:"[^\\s\\}]"}}}());hljs.registerLanguage("csharp",function(){"use strict";return function(e){var n={keyword:"abstract as base bool break byte case catch char checked const continue decimal default delegate do double enum event explicit extern finally fixed float for foreach goto if implicit in int interface internal is lock 
long object operator out override params private protected public readonly ref sbyte sealed short sizeof stackalloc static string struct switch this try typeof uint ulong unchecked unsafe ushort using virtual void volatile while add alias ascending async await by descending dynamic equals from get global group into join let nameof on orderby partial remove select set value var when where yield",literal:"null false true"},i=e.inherit(e.TITLE_MODE,{begin:"[a-zA-Z](\\.?\\w)*"}),a={className:"number",variants:[{begin:"\\b(0b[01']+)"},{begin:"(-?)\\b([\\d']+(\\.[\\d']*)?|\\.[\\d']+)(u|U|l|L|ul|UL|f|F|b|B)"},{begin:"(-?)(\\b0[xX][a-fA-F0-9']+|(\\b[\\d']+(\\.[\\d']*)?|\\.[\\d']+)([eE][-+]?[\\d']+)?)"}],relevance:0},s={className:"string",begin:'@"',end:'"',contains:[{begin:'""'}]},t=e.inherit(s,{illegal:/\n/}),l={className:"subst",begin:"{",end:"}",keywords:n},r=e.inherit(l,{illegal:/\n/}),c={className:"string",begin:/\$"/,end:'"',illegal:/\n/,contains:[{begin:"{{"},{begin:"}}"},e.BACKSLASH_ESCAPE,r]},o={className:"string",begin:/\$@"/,end:'"',contains:[{begin:"{{"},{begin:"}}"},{begin:'""'},l]},g=e.inherit(o,{illegal:/\n/,contains:[{begin:"{{"},{begin:"}}"},{begin:'""'},r]});l.contains=[o,c,s,e.APOS_STRING_MODE,e.QUOTE_STRING_MODE,a,e.C_BLOCK_COMMENT_MODE],r.contains=[g,c,t,e.APOS_STRING_MODE,e.QUOTE_STRING_MODE,a,e.inherit(e.C_BLOCK_COMMENT_MODE,{illegal:/\n/})];var d={variants:[o,c,s,e.APOS_STRING_MODE,e.QUOTE_STRING_MODE]},E={begin:"<",end:">",contains:[{beginKeywords:"in out"},i]},_=e.IDENT_RE+"(<"+e.IDENT_RE+"(\\s*,\\s*"+e.IDENT_RE+")*>)?(\\[\\])?",b={begin:"@"+e.IDENT_RE,relevance:0};return{name:"C#",aliases:["cs","c#"],keywords:n,illegal:/::/,contains:[e.COMMENT("///","$",{returnBegin:!0,contains:[{className:"doctag",variants:[{begin:"///",relevance:0},{begin:"\x3c!--|--\x3e"},{begin:""}]}]}),e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,{className:"meta",begin:"#",end:"$",keywords:{"meta-keyword":"if else elif endif define undef warning error line region endregion 
pragma checksum"}},d,a,{beginKeywords:"class interface",end:/[{;=]/,illegal:/[^\s:,]/,contains:[{beginKeywords:"where class"},i,E,e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE]},{beginKeywords:"namespace",end:/[{;=]/,illegal:/[^\s:]/,contains:[i,e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE]},{className:"meta",begin:"^\\s*\\[",excludeBegin:!0,end:"\\]",excludeEnd:!0,contains:[{className:"meta-string",begin:/"/,end:/"/}]},{beginKeywords:"new return throw await else",relevance:0},{className:"function",begin:"("+_+"\\s+)+"+e.IDENT_RE+"\\s*(\\<.+\\>)?\\s*\\(",returnBegin:!0,end:/\s*[{;=]/,excludeEnd:!0,keywords:n,contains:[{begin:e.IDENT_RE+"\\s*(\\<.+\\>)?\\s*\\(",returnBegin:!0,contains:[e.TITLE_MODE,E],relevance:0},{className:"params",begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,keywords:n,relevance:0,contains:[d,a,e.C_BLOCK_COMMENT_MODE]},e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE]},b]}}}());hljs.registerLanguage("perl",function(){"use strict";return function(e){var n={$pattern:/[\w.]+/,keyword:"getpwent getservent quotemeta msgrcv scalar kill dbmclose undef lc ma syswrite tr send umask sysopen shmwrite vec qx utime local oct semctl localtime readpipe do return format read sprintf dbmopen pop getpgrp not getpwnam rewinddir qq fileno qw endprotoent wait sethostent bless s|0 opendir continue each sleep endgrent shutdown dump chomp connect getsockname die socketpair close flock exists index shmget sub for endpwent redo lstat msgctl setpgrp abs exit select print ref gethostbyaddr unshift fcntl syscall goto getnetbyaddr join gmtime symlink semget splice x|0 getpeername recv log setsockopt cos last reverse gethostbyname getgrnam study formline endhostent times chop length gethostent getnetent pack getprotoent getservbyname rand mkdir pos chmod y|0 substr endnetent printf next open msgsnd readdir use unlink getsockopt getpriority rindex wantarray hex system getservbyport endservent int chr untie rmdir prototype tell listen fork shmread ucfirst setprotoent else 
sysseek link getgrgid shmctl waitpid unpack getnetbyname reset chdir grep split require caller lcfirst until warn while values shift telldir getpwuid my getprotobynumber delete and sort uc defined srand accept package seekdir getprotobyname semop our rename seek if q|0 chroot sysread setpwent no crypt getc chown sqrt write setnetent setpriority foreach tie sin msgget map stat getlogin unless elsif truncate exec keys glob tied closedir ioctl socket readlink eval xor readline binmode setservent eof ord bind alarm pipe atan2 getgrent exp time push setgrent gt lt or ne m|0 break given say state when"},t={className:"subst",begin:"[$@]\\{",end:"\\}",keywords:n},s={begin:"->{",end:"}"},r={variants:[{begin:/\$\d/},{begin:/[\$%@](\^\w\b|#\w+(::\w+)*|{\w+}|\w+(::\w*)*)/},{begin:/[\$%@][^\s\w{]/,relevance:0}]},i=[e.BACKSLASH_ESCAPE,t,r],a=[r,e.HASH_COMMENT_MODE,e.COMMENT("^\\=\\w","\\=cut",{endsWithParent:!0}),s,{className:"string",contains:i,variants:[{begin:"q[qwxr]?\\s*\\(",end:"\\)",relevance:5},{begin:"q[qwxr]?\\s*\\[",end:"\\]",relevance:5},{begin:"q[qwxr]?\\s*\\{",end:"\\}",relevance:5},{begin:"q[qwxr]?\\s*\\|",end:"\\|",relevance:5},{begin:"q[qwxr]?\\s*\\<",end:"\\>",relevance:5},{begin:"qw\\s+q",end:"q",relevance:5},{begin:"'",end:"'",contains:[e.BACKSLASH_ESCAPE]},{begin:'"',end:'"'},{begin:"`",end:"`",contains:[e.BACKSLASH_ESCAPE]},{begin:"{\\w+}",contains:[],relevance:0},{begin:"-?\\w+\\s*\\=\\>",contains:[],relevance:0}]},{className:"number",begin:"(\\b0[0-7_]+)|(\\b0x[0-9a-fA-F_]+)|(\\b[1-9][0-9_]*(\\.[0-9_]+)?)|[0_]\\b",relevance:0},{begin:"(\\/\\/|"+e.RE_STARTERS_RE+"|\\b(split|return|print|reverse|grep)\\b)\\s*",keywords:"split return print reverse 
grep",relevance:0,contains:[e.HASH_COMMENT_MODE,{className:"regexp",begin:"(s|tr|y)/(\\\\.|[^/])*/(\\\\.|[^/])*/[a-z]*",relevance:10},{className:"regexp",begin:"(m|qr)?/",end:"/[a-z]*",contains:[e.BACKSLASH_ESCAPE],relevance:0}]},{className:"function",beginKeywords:"sub",end:"(\\s*\\(.*?\\))?[;{]",excludeEnd:!0,relevance:5,contains:[e.TITLE_MODE]},{begin:"-\\w\\b",relevance:0},{begin:"^__DATA__$",end:"^__END__$",subLanguage:"mojolicious",contains:[{begin:"^@@.*",end:"$",className:"comment"}]}];return t.contains=a,s.contains=a,{name:"Perl",aliases:["pl","pm"],keywords:n,contains:a}}}());hljs.registerLanguage("swift",function(){"use strict";return function(e){var i={keyword:"#available #colorLiteral #column #else #elseif #endif #file #fileLiteral #function #if #imageLiteral #line #selector #sourceLocation _ __COLUMN__ __FILE__ __FUNCTION__ __LINE__ Any as as! as? associatedtype associativity break case catch class continue convenience default defer deinit didSet do dynamic dynamicType else enum extension fallthrough false fileprivate final for func get guard if import in indirect infix init inout internal is lazy left let mutating nil none nonmutating open operator optional override postfix precedence prefix private protocol Protocol public repeat required rethrows return right self Self set static struct subscript super switch throw throws true try try! try? 
Type typealias unowned var weak where while willSet",literal:"true false nil",built_in:"abs advance alignof alignofValue anyGenerator assert assertionFailure bridgeFromObjectiveC bridgeFromObjectiveCUnconditional bridgeToObjectiveC bridgeToObjectiveCUnconditional c compactMap contains count countElements countLeadingZeros debugPrint debugPrintln distance dropFirst dropLast dump encodeBitsAsWords enumerate equal fatalError filter find getBridgedObjectiveCType getVaList indices insertionSort isBridgedToObjectiveC isBridgedVerbatimToObjectiveC isUniquelyReferenced isUniquelyReferencedNonObjC join lazy lexicographicalCompare map max maxElement min minElement numericCast overlaps partition posix precondition preconditionFailure print println quickSort readLine reduce reflect reinterpretCast reverse roundUpToAlignment sizeof sizeofValue sort split startsWith stride strideof strideofValue swap toString transcode underestimateCount unsafeAddressOf unsafeBitCast unsafeDowncast unsafeUnwrap unsafeReflect withExtendedLifetime withObjectAtPlusZero withUnsafePointer withUnsafePointerToObject withUnsafeMutablePointer withUnsafeMutablePointers withUnsafePointer withUnsafePointers withVaList zip"},n=e.COMMENT("/\\*","\\*/",{contains:["self"]}),t={className:"subst",begin:/\\\(/,end:"\\)",keywords:i,contains:[]},a={className:"string",contains:[e.BACKSLASH_ESCAPE,t],variants:[{begin:/"""/,end:/"""/},{begin:/"/,end:/"/}]},r={className:"number",begin:"\\b([\\d_]+(\\.[\\deE_]+)?|0x[a-fA-F0-9_]+(\\.[a-fA-F0-9p_]+)?|0b[01_]+|0o[0-7_]+)\\b",relevance:0};return 
t.contains=[r],{name:"Swift",keywords:i,contains:[a,e.C_LINE_COMMENT_MODE,n,{className:"type",begin:"\\b[A-Z][\\wÀ-ʸ']*[!?]"},{className:"type",begin:"\\b[A-Z][\\wÀ-ʸ']*",relevance:0},r,{className:"function",beginKeywords:"func",end:"{",excludeEnd:!0,contains:[e.inherit(e.TITLE_MODE,{begin:/[A-Za-z$_][0-9A-Za-z$_]*/}),{begin://},{className:"params",begin:/\(/,end:/\)/,endsParent:!0,keywords:i,contains:["self",r,a,e.C_BLOCK_COMMENT_MODE,{begin:":"}],illegal:/["']/}],illegal:/\[|%/},{className:"class",beginKeywords:"struct protocol class extension enum",keywords:i,end:"\\{",excludeEnd:!0,contains:[e.inherit(e.TITLE_MODE,{begin:/[A-Za-z$_][\u00C0-\u02B80-9A-Za-z$_]*/})]},{className:"meta",begin:"(@discardableResult|@warn_unused_result|@exported|@lazy|@noescape|@NSCopying|@NSManaged|@objc|@objcMembers|@convention|@required|@noreturn|@IBAction|@IBDesignable|@IBInspectable|@IBOutlet|@infix|@prefix|@postfix|@autoclosure|@testable|@available|@nonobjc|@NSApplicationMain|@UIApplicationMain|@dynamicMemberLookup|@propertyWrapper)\\b"},{beginKeywords:"import",end:/$/,contains:[e.C_LINE_COMMENT_MODE,n]}]}}}());hljs.registerLanguage("makefile",function(){"use strict";return function(e){var i={className:"variable",variants:[{begin:"\\$\\("+e.UNDERSCORE_IDENT_RE+"\\)",contains:[e.BACKSLASH_ESCAPE]},{begin:/\$[@%`]+/}]}]}]};return{name:"HTML, 
XML",aliases:["html","xhtml","rss","atom","xjb","xsd","xsl","plist","wsf","svg"],case_insensitive:!0,contains:[{className:"meta",begin:"",relevance:10,contains:[a,i,t,s,{begin:"\\[",end:"\\]",contains:[{className:"meta",begin:"",contains:[a,s,i,t]}]}]},e.COMMENT("\x3c!--","--\x3e",{relevance:10}),{begin:"<\\!\\[CDATA\\[",end:"\\]\\]>",relevance:10},n,{className:"meta",begin:/<\?xml/,end:/\?>/,relevance:10},{className:"tag",begin:")",end:">",keywords:{name:"style"},contains:[c],starts:{end:"",returnEnd:!0,subLanguage:["css","xml"]}},{className:"tag",begin:")",end:">",keywords:{name:"script"},contains:[c],starts:{end:"<\/script>",returnEnd:!0,subLanguage:["javascript","handlebars","xml"]}},{className:"tag",begin:"",contains:[{className:"name",begin:/[^\/><\s]+/,relevance:0},c]}]}}}());hljs.registerLanguage("bash",function(){"use strict";return function(e){const s={};Object.assign(s,{className:"variable",variants:[{begin:/\$[\w\d#@][\w\d_]*/},{begin:/\$\{/,end:/\}/,contains:[{begin:/:-/,contains:[s]}]}]});const t={className:"subst",begin:/\$\(/,end:/\)/,contains:[e.BACKSLASH_ESCAPE]},n={className:"string",begin:/"/,end:/"/,contains:[e.BACKSLASH_ESCAPE,s,t]};t.contains.push(n);const a={begin:/\$\(\(/,end:/\)\)/,contains:[{begin:/\d+#[0-9a-f]+/,className:"number"},e.NUMBER_MODE,s]},i=e.SHEBANG({binary:"(fish|bash|zsh|sh|csh|ksh|tcsh|dash|scsh)",relevance:10}),c={className:"function",begin:/\w[\w\d_]*\s*\(\s*\)\s*\{/,returnBegin:!0,contains:[e.inherit(e.TITLE_MODE,{begin:/\w[\w\d_]*/})],relevance:0};return{name:"Bash",aliases:["sh","zsh"],keywords:{$pattern:/\b-?[a-z\._]+\b/,keyword:"if then else elif fi for while in do done case esac function",literal:"true false",built_in:"break cd continue eval exec exit export getopts hash pwd readonly return shift test times trap umask unset alias bind builtin caller command declare echo enable help let local logout mapfile printf read readarray source type typeset ulimit unalias set shopt autoload bg bindkey bye cap chdir clone 
comparguments compcall compctl compdescribe compfiles compgroups compquote comptags comptry compvalues dirs disable disown echotc echoti emulate fc fg float functions getcap getln history integer jobs kill limit log noglob popd print pushd pushln rehash sched setcap setopt stat suspend ttyctl unfunction unhash unlimit unsetopt vared wait whence where which zcompile zformat zftp zle zmodload zparseopts zprof zpty zregexparse zsocket zstyle ztcp",_:"-ne -eq -lt -gt -f -d -e -s -l -a"},contains:[i,e.SHEBANG(),c,a,e.HASH_COMMENT_MODE,n,{className:"",begin:/\\"/},{className:"string",begin:/'/,end:/'/},s]}}}());hljs.registerLanguage("c-like",function(){"use strict";return function(e){function t(e){return"(?:"+e+")?"}var n="(decltype\\(auto\\)|"+t("[a-zA-Z_]\\w*::")+"[a-zA-Z_]\\w*"+t("<.*?>")+")",r={className:"keyword",begin:"\\b[a-z\\d_]*_t\\b"},a={className:"string",variants:[{begin:'(u8?|U|L)?"',end:'"',illegal:"\\n",contains:[e.BACKSLASH_ESCAPE]},{begin:"(u8?|U|L)?'(\\\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4,8}|[0-7]{3}|\\S)|.)",end:"'",illegal:"."},e.END_SAME_AS_BEGIN({begin:/(?:u8?|U|L)?R"([^()\\ ]{0,16})\(/,end:/\)([^()\\ ]{0,16})"/})]},i={className:"number",variants:[{begin:"\\b(0b[01']+)"},{begin:"(-?)\\b([\\d']+(\\.[\\d']*)?|\\.[\\d']+)(u|U|l|L|ul|UL|f|F|b|B)"},{begin:"(-?)(\\b0[xX][a-fA-F0-9']+|(\\b[\\d']+(\\.[\\d']*)?|\\.[\\d']+)([eE][-+]?[\\d']+)?)"}],relevance:0},s={className:"meta",begin:/#\s*[a-z]+\b/,end:/$/,keywords:{"meta-keyword":"if else elif endif define undef warning error line pragma _Pragma ifdef ifndef include"},contains:[{begin:/\\\n/,relevance:0},e.inherit(a,{className:"meta-string"}),{className:"meta-string",begin:/<.*?>/,end:/$/,illegal:"\\n"},e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE]},o={className:"title",begin:t("[a-zA-Z_]\\w*::")+e.IDENT_RE,relevance:0},c=t("[a-zA-Z_]\\w*::")+e.IDENT_RE+"\\s*\\(",l={keyword:"int float while private char char8_t char16_t char32_t catch import module export virtual operator sizeof dynamic_cast|10 typedef 
const_cast|10 const for static_cast|10 union namespace unsigned long volatile static protected bool template mutable if public friend do goto auto void enum else break extern using asm case typeid wchar_t short reinterpret_cast|10 default double register explicit signed typename try this switch continue inline delete alignas alignof constexpr consteval constinit decltype concept co_await co_return co_yield requires noexcept static_assert thread_local restrict final override atomic_bool atomic_char atomic_schar atomic_uchar atomic_short atomic_ushort atomic_int atomic_uint atomic_long atomic_ulong atomic_llong atomic_ullong new throw return and and_eq bitand bitor compl not not_eq or or_eq xor xor_eq",built_in:"std string wstring cin cout cerr clog stdin stdout stderr stringstream istringstream ostringstream auto_ptr deque list queue stack vector map set pair bitset multiset multimap unordered_set unordered_map unordered_multiset unordered_multimap priority_queue make_pair array shared_ptr abort terminate abs acos asin atan2 atan calloc ceil cosh cos exit exp fabs floor fmod fprintf fputs free frexp fscanf future isalnum isalpha iscntrl isdigit isgraph islower isprint ispunct isspace isupper isxdigit tolower toupper labs ldexp log10 log malloc realloc memchr memcmp memcpy memset modf pow printf putchar puts scanf sinh sin snprintf sprintf sqrt sscanf strcat strchr strcmp strcpy strcspn strlen strncat strncmp strncpy strpbrk strrchr strspn strstr tanh tan vfprintf vprintf vsprintf endl initializer_list unique_ptr _Bool complex _Complex imaginary _Imaginary",literal:"true false nullptr NULL"},d=[r,e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,i,a],_={variants:[{begin:/=/,end:/;/},{begin:/\(/,end:/\)/},{beginKeywords:"new throw return 
else",end:/;/}],keywords:l,contains:d.concat([{begin:/\(/,end:/\)/,keywords:l,contains:d.concat(["self"]),relevance:0}]),relevance:0},u={className:"function",begin:"("+n+"[\\*&\\s]+)+"+c,returnBegin:!0,end:/[{;=]/,excludeEnd:!0,keywords:l,illegal:/[^\w\s\*&:<>]/,contains:[{begin:"decltype\\(auto\\)",keywords:l,relevance:0},{begin:c,returnBegin:!0,contains:[o],relevance:0},{className:"params",begin:/\(/,end:/\)/,keywords:l,relevance:0,contains:[e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,a,i,r,{begin:/\(/,end:/\)/,keywords:l,relevance:0,contains:["self",e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,a,i,r]}]},r,e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,s]};return{aliases:["c","cc","h","c++","h++","hpp","hh","hxx","cxx"],keywords:l,disableAutodetect:!0,illegal:"",keywords:l,contains:["self",r]},{begin:e.IDENT_RE+"::",keywords:l},{className:"class",beginKeywords:"class struct",end:/[{;:]/,contains:[{begin://,contains:["self"]},e.TITLE_MODE]}]),exports:{preprocessor:s,strings:a,keywords:l}}}}());hljs.registerLanguage("coffeescript",function(){"use strict";const 
e=["as","in","of","if","for","while","finally","var","new","function","do","return","void","else","break","catch","instanceof","with","throw","case","default","try","switch","continue","typeof","delete","let","yield","const","class","debugger","async","await","static","import","from","export","extends"],n=["true","false","null","undefined","NaN","Infinity"],a=[].concat(["setInterval","setTimeout","clearInterval","clearTimeout","require","exports","eval","isFinite","isNaN","parseFloat","parseInt","decodeURI","decodeURIComponent","encodeURI","encodeURIComponent","escape","unescape"],["arguments","this","super","console","window","document","localStorage","module","global"],["Intl","DataView","Number","Math","Date","String","RegExp","Object","Function","Boolean","Error","Symbol","Set","Map","WeakSet","WeakMap","Proxy","Reflect","JSON","Promise","Float64Array","Int16Array","Int32Array","Int8Array","Uint16Array","Uint32Array","Float32Array","Array","Uint8Array","Uint8ClampedArray","ArrayBuffer"],["EvalError","InternalError","RangeError","ReferenceError","SyntaxError","TypeError","URIError"]);return function(r){var t={keyword:e.concat(["then","unless","until","loop","by","when","and","or","is","isnt","not"]).filter((e=>n=>!e.includes(n))(["var","const","let","function","static"])).join(" "),literal:n.concat(["yes","no","on","off"]).join(" "),built_in:a.concat(["npm","print"]).join(" ")},i="[A-Za-z$_][0-9A-Za-z$_]*",s={className:"subst",begin:/#\{/,end:/}/,keywords:t},o=[r.BINARY_NUMBER_MODE,r.inherit(r.C_NUMBER_MODE,{starts:{end:"(\\s*/)?",relevance:0}}),{className:"string",variants:[{begin:/'''/,end:/'''/,contains:[r.BACKSLASH_ESCAPE]},{begin:/'/,end:/'/,contains:[r.BACKSLASH_ESCAPE]},{begin:/"""/,end:/"""/,contains:[r.BACKSLASH_ESCAPE,s]},{begin:/"/,end:/"/,contains:[r.BACKSLASH_ESCAPE,s]}]},{className:"regexp",variants:[{begin:"///",end:"///",contains:[s,r.HASH_COMMENT_MODE]},{begin:"//[gim]{0,3}(?=\\W)",relevance:0},{begin:/\/(?![ 
*]).*?(?![\\]).\/[gim]{0,3}(?=\W)/}]},{begin:"@"+i},{subLanguage:"javascript",excludeBegin:!0,excludeEnd:!0,variants:[{begin:"```",end:"```"},{begin:"`",end:"`"}]}];s.contains=o;var c=r.inherit(r.TITLE_MODE,{begin:i}),l={className:"params",begin:"\\([^\\(]",returnBegin:!0,contains:[{begin:/\(/,end:/\)/,keywords:t,contains:["self"].concat(o)}]};return{name:"CoffeeScript",aliases:["coffee","cson","iced"],keywords:t,illegal:/\/\*/,contains:o.concat([r.COMMENT("###","###"),r.HASH_COMMENT_MODE,{className:"function",begin:"^\\s*"+i+"\\s*=\\s*(\\(.*\\))?\\s*\\B[-=]>",end:"[-=]>",returnBegin:!0,contains:[c,l]},{begin:/[:\(,=]\s*/,relevance:0,contains:[{className:"function",begin:"(\\(.*\\))?\\s*\\B[-=]>",end:"[-=]>",returnBegin:!0,contains:[l]}]},{className:"class",beginKeywords:"class",end:"$",illegal:/[:="\[\]]/,contains:[{beginKeywords:"extends",endsWithParent:!0,illegal:/[:="\[\]]/,contains:[c]},c]},{begin:i+":",end:":",returnBegin:!0,returnEnd:!0,relevance:0}])}}}());hljs.registerLanguage("ruby",function(){"use strict";return function(e){var n="[a-zA-Z_]\\w*[!?=]?|[-+~]\\@|<<|>>|=~|===?|<=>|[<>]=?|\\*\\*|[-/+%^&*~`|]|\\[\\]=?",a={keyword:"and then defined module in return redo if BEGIN retry end for self when next until do begin unless END rescue else break undef not super class case require yield alias while ensure elsif or include attr_reader attr_writer attr_accessor",literal:"true false 
nil"},s={className:"doctag",begin:"@[A-Za-z]+"},i={begin:"#<",end:">"},r=[e.COMMENT("#","$",{contains:[s]}),e.COMMENT("^\\=begin","^\\=end",{contains:[s],relevance:10}),e.COMMENT("^__END__","\\n$")],c={className:"subst",begin:"#\\{",end:"}",keywords:a},t={className:"string",contains:[e.BACKSLASH_ESCAPE,c],variants:[{begin:/'/,end:/'/},{begin:/"/,end:/"/},{begin:/`/,end:/`/},{begin:"%[qQwWx]?\\(",end:"\\)"},{begin:"%[qQwWx]?\\[",end:"\\]"},{begin:"%[qQwWx]?{",end:"}"},{begin:"%[qQwWx]?<",end:">"},{begin:"%[qQwWx]?/",end:"/"},{begin:"%[qQwWx]?%",end:"%"},{begin:"%[qQwWx]?-",end:"-"},{begin:"%[qQwWx]?\\|",end:"\\|"},{begin:/\B\?(\\\d{1,3}|\\x[A-Fa-f0-9]{1,2}|\\u[A-Fa-f0-9]{4}|\\?\S)\b/},{begin:/<<[-~]?'?(\w+)(?:.|\n)*?\n\s*\1\b/,returnBegin:!0,contains:[{begin:/<<[-~]?'?/},e.END_SAME_AS_BEGIN({begin:/(\w+)/,end:/(\w+)/,contains:[e.BACKSLASH_ESCAPE,c]})]}]},b={className:"params",begin:"\\(",end:"\\)",endsParent:!0,keywords:a},d=[t,i,{className:"class",beginKeywords:"class module",end:"$|;",illegal:/=/,contains:[e.inherit(e.TITLE_MODE,{begin:"[A-Za-z_]\\w*(::\\w+)*(\\?|\\!)?"}),{begin:"<\\s*",contains:[{begin:"("+e.IDENT_RE+"::)?"+e.IDENT_RE}]}].concat(r)},{className:"function",beginKeywords:"def",end:"$|;",contains:[e.inherit(e.TITLE_MODE,{begin:n}),b].concat(r)},{begin:e.IDENT_RE+"::"},{className:"symbol",begin:e.UNDERSCORE_IDENT_RE+"(\\!|\\?)?:",relevance:0},{className:"symbol",begin:":(?!\\s)",contains:[t,{begin:n}],relevance:0},{className:"number",begin:"(\\b0[0-7_]+)|(\\b0x[0-9a-fA-F_]+)|(\\b[1-9][0-9_]*(\\.[0-9_]+)?)|[0_]\\b",relevance:0},{begin:"(\\$\\W)|((\\$|\\@\\@?)(\\w+))"},{className:"params",begin:/\|/,end:/\|/,keywords:a},{begin:"("+e.RE_STARTERS_RE+"|unless)\\s*",keywords:"unless",contains:[i,{className:"regexp",contains:[e.BACKSLASH_ESCAPE,c],illegal:/\n/,variants:[{begin:"/",end:"/[a-z]*"},{begin:"%r{",end:"}[a-z]*"},{begin:"%r\\(",end:"\\)[a-z]*"},{begin:"%r!",end:"![a-z]*"},{begin:"%r\\[",end:"\\][a-z]*"}]}].concat(r),relevance:0}].concat(r);c.contain
s=d,b.contains=d;var g=[{begin:/^\s*=>/,starts:{end:"$",contains:d}},{className:"meta",begin:"^([>?]>|[\\w#]+\\(\\w+\\):\\d+:\\d+>|(\\w+-)?\\d+\\.\\d+\\.\\d(p\\d+)?[^>]+>)",starts:{end:"$",contains:d}}];return{name:"Ruby",aliases:["rb","gemspec","podspec","thor","irb"],keywords:a,illegal:/\/\*/,contains:r.concat(g).concat(d)}}}());hljs.registerLanguage("yaml",function(){"use strict";return function(e){var n="true false yes no null",a="[\\w#;/?:@&=+$,.~*\\'()[\\]]+",s={className:"string",relevance:0,variants:[{begin:/'/,end:/'/},{begin:/"/,end:/"/},{begin:/\S+/}],contains:[e.BACKSLASH_ESCAPE,{className:"template-variable",variants:[{begin:"{{",end:"}}"},{begin:"%{",end:"}"}]}]},i=e.inherit(s,{variants:[{begin:/'/,end:/'/},{begin:/"/,end:/"/},{begin:/[^\s,{}[\]]+/}]}),l={end:",",endsWithParent:!0,excludeEnd:!0,contains:[],keywords:n,relevance:0},t={begin:"{",end:"}",contains:[l],illegal:"\\n",relevance:0},g={begin:"\\[",end:"\\]",contains:[l],illegal:"\\n",relevance:0},b=[{className:"attr",variants:[{begin:"\\w[\\w :\\/.-]*:(?=[ \t]|$)"},{begin:'"\\w[\\w :\\/.-]*":(?=[ \t]|$)'},{begin:"'\\w[\\w :\\/.-]*':(?=[ \t]|$)"}]},{className:"meta",begin:"^---s*$",relevance:10},{className:"string",begin:"[\\|>]([0-9]?[+-])?[ ]*\\n( *)[\\S ]+\\n(\\2[\\S ]+\\n?)*"},{begin:"<%[%=-]?",end:"[%-]?%>",subLanguage:"ruby",excludeBegin:!0,excludeEnd:!0,relevance:0},{className:"type",begin:"!\\w+!"+a},{className:"type",begin:"!<"+a+">"},{className:"type",begin:"!"+a},{className:"type",begin:"!!"+a},{className:"meta",begin:"&"+e.UNDERSCORE_IDENT_RE+"$"},{className:"meta",begin:"\\*"+e.UNDERSCORE_IDENT_RE+"$"},{className:"bullet",begin:"\\-(?=[ ]|$)",relevance:0},e.HASH_COMMENT_MODE,{beginKeywords:n,keywords:{literal:n}},{className:"number",begin:"\\b[0-9]{4}(-[0-9][0-9]){0,2}([Tt \\t][0-9][0-9]?(:[0-9][0-9]){2})?(\\.[0-9]*)?([ \\t])*(Z|[-+][0-9][0-9]?(:[0-9][0-9])?)?\\b"},{className:"number",begin:e.C_NUMBER_RE+"\\b"},t,g,s],c=[...b];return 
c.pop(),c.push(i),l.contains=c,{name:"YAML",case_insensitive:!0,aliases:["yml","YAML"],contains:b}}}());hljs.registerLanguage("d",function(){"use strict";return function(e){var a={$pattern:e.UNDERSCORE_IDENT_RE,keyword:"abstract alias align asm assert auto body break byte case cast catch class const continue debug default delete deprecated do else enum export extern final finally for foreach foreach_reverse|10 goto if immutable import in inout int interface invariant is lazy macro mixin module new nothrow out override package pragma private protected public pure ref return scope shared static struct super switch synchronized template this throw try typedef typeid typeof union unittest version void volatile while with __FILE__ __LINE__ __gshared|10 __thread __traits __DATE__ __EOF__ __TIME__ __TIMESTAMP__ __VENDOR__ __VERSION__",built_in:"bool cdouble cent cfloat char creal dchar delegate double dstring float function idouble ifloat ireal long real short string ubyte ucent uint ulong ushort wchar wstring",literal:"false null 
true"},d="((0|[1-9][\\d_]*)|0[bB][01_]+|0[xX]([\\da-fA-F][\\da-fA-F_]*|_[\\da-fA-F][\\da-fA-F_]*))",n="\\\\(['\"\\?\\\\abfnrtv]|u[\\dA-Fa-f]{4}|[0-7]{1,3}|x[\\dA-Fa-f]{2}|U[\\dA-Fa-f]{8})|&[a-zA-Z\\d]{2,};",t={className:"number",begin:"\\b"+d+"(L|u|U|Lu|LU|uL|UL)?",relevance:0},_={className:"number",begin:"\\b(((0[xX](([\\da-fA-F][\\da-fA-F_]*|_[\\da-fA-F][\\da-fA-F_]*)\\.([\\da-fA-F][\\da-fA-F_]*|_[\\da-fA-F][\\da-fA-F_]*)|\\.?([\\da-fA-F][\\da-fA-F_]*|_[\\da-fA-F][\\da-fA-F_]*))[pP][+-]?(0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d))|((0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d)(\\.\\d*|([eE][+-]?(0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d)))|\\d+\\.(0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d)(0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d)|\\.(0|[1-9][\\d_]*)([eE][+-]?(0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d))?))([fF]|L|i|[fF]i|Li)?|"+d+"(i|[fF]i|Li))",relevance:0},r={className:"string",begin:"'("+n+"|.)",end:"'",illegal:"."},i={className:"string",begin:'"',contains:[{begin:n,relevance:0}],end:'"[cwd]?'},s=e.COMMENT("\\/\\+","\\+\\/",{contains:["self"],relevance:10});return{name:"D",keywords:a,contains:[e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,s,{className:"string",begin:'x"[\\da-fA-F\\s\\n\\r]*"[cwd]?',relevance:10},i,{className:"string",begin:'[rq]"',end:'"[cwd]?',relevance:5},{className:"string",begin:"`",end:"`[cwd]?"},{className:"string",begin:'q"\\{',end:'\\}"'},_,t,r,{className:"meta",begin:"^#!",end:"$",relevance:5},{className:"meta",begin:"#(line)",end:"$",relevance:5},{className:"keyword",begin:"@[a-zA-Z_][a-zA-Z_\\d]*"}]}}}());hljs.registerLanguage("properties",function(){"use strict";return function(e){var n="[ \\t\\f]*",t="("+n+"[:=]"+n+"|[ \\t\\f]+)",a="([^\\\\:= \\t\\f\\n]|\\\\.)+",s={end:t,relevance:0,starts:{className:"string",end:/$/,relevance:0,contains:[{begin:"\\\\\\n"}]}};return{name:".properties",case_insensitive:!0,illegal:/\S/,contains:[e.COMMENT("^\\s*[!#]","$"),{begin:"([^\\\\\\W:= 
\\t\\f\\n]|\\\\.)+"+t,returnBegin:!0,contains:[{className:"attr",begin:"([^\\\\\\W:= \\t\\f\\n]|\\\\.)+",endsParent:!0,relevance:0}],starts:s},{begin:a+t,returnBegin:!0,relevance:0,contains:[{className:"meta",begin:a,endsParent:!0,relevance:0}],starts:s},{className:"attr",relevance:0,begin:a+n+"$"}]}}}());hljs.registerLanguage("http",function(){"use strict";return function(e){var n="HTTP/[0-9\\.]+";return{name:"HTTP",aliases:["https"],illegal:"\\S",contains:[{begin:"^"+n,end:"$",contains:[{className:"number",begin:"\\b\\d{3}\\b"}]},{begin:"^[A-Z]+ (.*?) "+n+"$",returnBegin:!0,end:"$",contains:[{className:"string",begin:" ",end:" ",excludeBegin:!0,excludeEnd:!0},{begin:n},{className:"keyword",begin:"[A-Z]+"}]},{className:"attribute",begin:"^\\w",end:": ",excludeEnd:!0,illegal:"\\n|\\s|=",starts:{end:"$",relevance:0}},{begin:"\\n\\n",starts:{subLanguage:[],endsWithParent:!0}}]}}}());hljs.registerLanguage("haskell",function(){"use strict";return function(e){var n={variants:[e.COMMENT("--","$"),e.COMMENT("{-","-}",{contains:["self"]})]},i={className:"meta",begin:"{-#",end:"#-}"},a={className:"meta",begin:"^#",end:"$"},s={className:"type",begin:"\\b[A-Z][\\w']*",relevance:0},l={begin:"\\(",end:"\\)",illegal:'"',contains:[i,a,{className:"type",begin:"\\b[A-Z][\\w]*(\\((\\.\\.|,|\\w+)\\))?"},e.inherit(e.TITLE_MODE,{begin:"[_a-z][\\w']*"}),n]};return{name:"Haskell",aliases:["hs"],keywords:"let in if then else case of where do module import hiding qualified type data newtype deriving class instance as default infix infixl infixr foreign export ccall stdcall cplusplus jvm dotnet safe unsafe family forall mdo proc rec",contains:[{beginKeywords:"module",end:"where",keywords:"module where",contains:[l,n],illegal:"\\W\\.|;"},{begin:"\\bimport\\b",end:"$",keywords:"import qualified as hiding",contains:[l,n],illegal:"\\W\\.|;"},{className:"class",begin:"^(\\s*)?(class|instance)\\b",end:"where",keywords:"class family instance 
where",contains:[s,l,n]},{className:"class",begin:"\\b(data|(new)?type)\\b",end:"$",keywords:"data family type newtype deriving",contains:[i,s,l,{begin:"{",end:"}",contains:l.contains},n]},{beginKeywords:"default",end:"$",contains:[s,l,n]},{beginKeywords:"infix infixl infixr",end:"$",contains:[e.C_NUMBER_MODE,n]},{begin:"\\bforeign\\b",end:"$",keywords:"foreign import export ccall stdcall cplusplus jvm dotnet safe unsafe",contains:[s,e.QUOTE_STRING_MODE,n]},{className:"meta",begin:"#!\\/usr\\/bin\\/env runhaskell",end:"$"},i,a,e.QUOTE_STRING_MODE,e.C_NUMBER_MODE,s,e.inherit(e.TITLE_MODE,{begin:"^[_a-z][\\w']*"}),n,{begin:"->|<-"}]}}}());hljs.registerLanguage("handlebars",function(){"use strict";function e(...e){return e.map(e=>(function(e){return e?"string"==typeof e?e:e.source:null})(e)).join("")}return function(n){const a={"builtin-name":"action bindattr collection component concat debugger each each-in get hash if in input link-to loc log lookup mut outlet partial query-params render template textarea unbound unless view with yield"},t=/\[.*?\]/,s=/[^\s!"#%&'()*+,.\/;<=>@\[\\\]^`{|}~]+/,i=e("(",/'.*?'/,"|",/".*?"/,"|",t,"|",s,"|",/\.|\//,")+"),r=e("(",t,"|",s,")(?==)"),l={begin:i,lexemes:/[\w.\/]+/},c=n.inherit(l,{keywords:{literal:"true false undefined null"}}),o={begin:/\(/,end:/\)/},m={className:"attr",begin:r,relevance:0,starts:{begin:/=/,end:/=/,starts:{contains:[n.NUMBER_MODE,n.QUOTE_STRING_MODE,n.APOS_STRING_MODE,c,o]}}},d={contains:[n.NUMBER_MODE,n.QUOTE_STRING_MODE,n.APOS_STRING_MODE,{begin:/as\s+\|/,keywords:{keyword:"as"},end:/\|/,contains:[{begin:/\w+/}]},m,c,o],returnEnd:!0},g=n.inherit(l,{className:"name",keywords:a,starts:n.inherit(d,{end:/\)/})});o.contains=[g];const 
u=n.inherit(l,{keywords:a,className:"name",starts:n.inherit(d,{end:/}}/})}),b=n.inherit(l,{keywords:a,className:"name"}),h=n.inherit(l,{className:"name",keywords:a,starts:n.inherit(d,{end:/}}/})});return{name:"Handlebars",aliases:["hbs","html.hbs","html.handlebars","htmlbars"],case_insensitive:!0,subLanguage:"xml",contains:[{begin:/\\\{\{/,skip:!0},{begin:/\\\\(?=\{\{)/,skip:!0},n.COMMENT(/\{\{!--/,/--\}\}/),n.COMMENT(/\{\{!/,/\}\}/),{className:"template-tag",begin:/\{\{\{\{(?!\/)/,end:/\}\}\}\}/,contains:[u],starts:{end:/\{\{\{\{\//,returnEnd:!0,subLanguage:"xml"}},{className:"template-tag",begin:/\{\{\{\{\//,end:/\}\}\}\}/,contains:[b]},{className:"template-tag",begin:/\{\{#/,end:/\}\}/,contains:[u]},{className:"template-tag",begin:/\{\{(?=else\}\})/,end:/\}\}/,keywords:"else"},{className:"template-tag",begin:/\{\{\//,end:/\}\}/,contains:[b]},{className:"template-variable",begin:/\{\{\{/,end:/\}\}\}/,contains:[h]},{className:"template-variable",begin:/\{\{/,end:/\}\}/,contains:[h]}]}}}());hljs.registerLanguage("rust",function(){"use strict";return function(e){var n="([ui](8|16|32|64|128|size)|f(32|64))?",t="drop i8 i16 i32 i64 i128 isize u8 u16 u32 u64 u128 usize f32 f64 str char bool Box Option Result String Vec Copy Send Sized Sync Drop Fn FnMut FnOnce ToOwned Clone Debug PartialEq PartialOrd Eq Ord AsRef AsMut Into From Default Iterator Extend IntoIterator DoubleEndedIterator ExactSizeIterator SliceConcatExt ToString assert! assert_eq! bitflags! bytes! cfg! col! concat! concat_idents! debug_assert! debug_assert_eq! env! panic! file! format! format_args! include_bin! include_str! line! local_data_key! module_path! option_env! print! println! select! stringify! try! unimplemented! unreachable! vec! write! writeln! macro_rules! assert_ne! 
debug_assert_ne!";return{name:"Rust",aliases:["rs"],keywords:{$pattern:e.IDENT_RE+"!?",keyword:"abstract as async await become box break const continue crate do dyn else enum extern false final fn for if impl in let loop macro match mod move mut override priv pub ref return self Self static struct super trait true try type typeof unsafe unsized use virtual where while yield",literal:"true false Some None Ok Err",built_in:t},illegal:""}]}}}());hljs.registerLanguage("cpp",function(){"use strict";return function(e){var t=e.getLanguage("c-like").rawDefinition();return t.disableAutodetect=!1,t.name="C++",t.aliases=["cc","c++","h++","hpp","hh","hxx","cxx"],t}}());hljs.registerLanguage("ini",function(){"use strict";function e(e){return e?"string"==typeof e?e:e.source:null}function n(...n){return n.map(n=>e(n)).join("")}return function(a){var s={className:"number",relevance:0,variants:[{begin:/([\+\-]+)?[\d]+_[\d_]+/},{begin:a.NUMBER_RE}]},i=a.COMMENT();i.variants=[{begin:/;/,end:/$/},{begin:/#/,end:/$/}];var t={className:"variable",variants:[{begin:/\$[\w\d"][\w\d_]*/},{begin:/\$\{(.*?)}/}]},r={className:"literal",begin:/\bon|off|true|false|yes|no\b/},l={className:"string",contains:[a.BACKSLASH_ESCAPE],variants:[{begin:"'''",end:"'''",relevance:10},{begin:'"""',end:'"""',relevance:10},{begin:'"',end:'"'},{begin:"'",end:"'"}]},c={begin:/\[/,end:/\]/,contains:[i,r,t,l,s,"self"],relevance:0},g="("+[/[A-Za-z0-9_-]+/,/"(\\"|[^"])*"/,/'[^']*'/].map(n=>e(n)).join("|")+")";return{name:"TOML, also INI",aliases:["toml"],case_insensitive:!0,illegal:/\S/,contains:[i,{className:"section",begin:/\[+/,end:/\]+/},{begin:n(g,"(\\s*\\.\\s*",g,")*",n("(?=",/\s*=\s*[^#\s]/,")")),className:"attr",starts:{end:/$/,contains:[i,c,r,t,l,s]}}]}}}());hljs.registerLanguage("objectivec",function(){"use strict";return function(e){var n=/[a-zA-Z@][a-zA-Z0-9_]*/,_={$pattern:n,keyword:"@interface @class @protocol 
@implementation"};return{name:"Objective-C",aliases:["mm","objc","obj-c"],keywords:{$pattern:n,keyword:"int float while char export sizeof typedef const struct for union unsigned long volatile static bool mutable if do return goto void enum else break extern asm case short default double register explicit signed typename this switch continue wchar_t inline readonly assign readwrite self @synchronized id typeof nonatomic super unichar IBOutlet IBAction strong weak copy in out inout bycopy byref oneway __strong __weak __block __autoreleasing @private @protected @public @try @property @end @throw @catch @finally @autoreleasepool @synthesize @dynamic @selector @optional @required @encode @package @import @defs @compatibility_alias __bridge __bridge_transfer __bridge_retained __bridge_retain __covariant __contravariant __kindof _Nonnull _Nullable _Null_unspecified __FUNCTION__ __PRETTY_FUNCTION__ __attribute__ getter setter retain unsafe_unretained nonnull nullable null_unspecified null_resettable class instancetype NS_DESIGNATED_INITIALIZER NS_UNAVAILABLE NS_REQUIRES_SUPER NS_RETURNS_INNER_POINTER NS_INLINE NS_AVAILABLE NS_DEPRECATED NS_ENUM NS_OPTIONS NS_SWIFT_UNAVAILABLE NS_ASSUME_NONNULL_BEGIN NS_ASSUME_NONNULL_END NS_REFINED_FOR_SWIFT NS_SWIFT_NAME NS_SWIFT_NOTHROW NS_DURING NS_HANDLER NS_ENDHANDLER NS_VALUERETURN NS_VOIDRETURN",literal:"false true FALSE TRUE nil YES NO NULL",built_in:"BOOL dispatch_once_t dispatch_queue_t dispatch_sync dispatch_async dispatch_once"},illegal:"/,end:/$/,illegal:"\\n"},e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE]},{className:"class",begin:"("+_.keyword.split(" ").join("|")+")\\b",end:"({|$)",excludeEnd:!0,keywords:_,contains:[e.UNDERSCORE_TITLE_MODE]},{begin:"\\."+e.UNDERSCORE_IDENT_RE,relevance:0}]}}}());hljs.registerLanguage("apache",function(){"use strict";return function(e){var n={className:"number",begin:"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}(:\\d{1,5})?"};return{name:"Apache 
config",aliases:["apacheconf"],case_insensitive:!0,contains:[e.HASH_COMMENT_MODE,{className:"section",begin:"",contains:[n,{className:"number",begin:":\\d{1,5}"},e.inherit(e.QUOTE_STRING_MODE,{relevance:0})]},{className:"attribute",begin:/\w+/,relevance:0,keywords:{nomarkup:"order deny allow setenv rewriterule rewriteengine rewritecond documentroot sethandler errordocument loadmodule options header listen serverroot servername"},starts:{end:/$/,relevance:0,keywords:{literal:"on off all deny allow"},contains:[{className:"meta",begin:"\\s\\[",end:"\\]$"},{className:"variable",begin:"[\\$%]\\{",end:"\\}",contains:["self",{className:"number",begin:"[\\$%]\\d+"}]},n,{className:"number",begin:"\\d+"},e.QUOTE_STRING_MODE]}}],illegal:/\S/}}}());hljs.registerLanguage("java",function(){"use strict";function e(e){return e?"string"==typeof e?e:e.source:null}function n(e){return a("(",e,")?")}function a(...n){return n.map(n=>e(n)).join("")}function s(...n){return"("+n.map(n=>e(n)).join("|")+")"}return function(e){var t="false synchronized int abstract float private char boolean var static null if const for true while long strictfp finally protected import native final void enum else break transient catch instanceof byte super volatile case assert short package default double public try this switch continue throws protected public private module requires exports 
do",i={className:"meta",begin:"@[À-ʸa-zA-Z_$][À-ʸa-zA-Z_$0-9]*",contains:[{begin:/\(/,end:/\)/,contains:["self"]}]},r=e=>a("[",e,"]+([",e,"_]*[",e,"]+)?"),c={className:"number",variants:[{begin:`\\b(0[bB]${r("01")})[lL]?`},{begin:`\\b(0${r("0-7")})[dDfFlL]?`},{begin:a(/\b0[xX]/,s(a(r("a-fA-F0-9"),/\./,r("a-fA-F0-9")),a(r("a-fA-F0-9"),/\.?/),a(/\./,r("a-fA-F0-9"))),/([pP][+-]?(\d+))?/,/[fFdDlL]?/)},{begin:a(/\b/,s(a(/\d*\./,r("\\d")),r("\\d")),/[eE][+-]?[\d]+[dDfF]?/)},{begin:a(/\b/,r(/\d/),n(/\.?/),n(r(/\d/)),/[dDfFlL]?/)}],relevance:0};return{name:"Java",aliases:["jsp"],keywords:t,illegal:/<\/|#/,contains:[e.COMMENT("/\\*\\*","\\*/",{relevance:0,contains:[{begin:/\w+@/,relevance:0},{className:"doctag",begin:"@[A-Za-z]+"}]}),e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,e.APOS_STRING_MODE,e.QUOTE_STRING_MODE,{className:"class",beginKeywords:"class interface",end:/[{;=]/,excludeEnd:!0,keywords:"class interface",illegal:/[:"\[\]]/,contains:[{beginKeywords:"extends implements"},e.UNDERSCORE_TITLE_MODE]},{beginKeywords:"new throw return else",relevance:0},{className:"function",begin:"([À-ʸa-zA-Z_$][À-ʸa-zA-Z_$0-9]*(<[À-ʸa-zA-Z_$][À-ʸa-zA-Z_$0-9]*(\\s*,\\s*[À-ʸa-zA-Z_$][À-ʸa-zA-Z_$0-9]*)*>)?\\s+)+"+e.UNDERSCORE_IDENT_RE+"\\s*\\(",returnBegin:!0,end:/[{;=]/,excludeEnd:!0,keywords:t,contains:[{begin:e.UNDERSCORE_IDENT_RE+"\\s*\\(",returnBegin:!0,relevance:0,contains:[e.UNDERSCORE_TITLE_MODE]},{className:"params",begin:/\(/,end:/\)/,keywords:t,relevance:0,contains:[i,e.APOS_STRING_MODE,e.QUOTE_STRING_MODE,e.C_NUMBER_MODE,e.C_BLOCK_COMMENT_MODE]},e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE]},c,i]}}}());hljs.registerLanguage("x86asm",function(){"use strict";return function(s){return{name:"Intel x86 Assembly",case_insensitive:!0,keywords:{$pattern:"[.%]?"+s.IDENT_RE,keyword:"lock rep repe repz repne repnz xaquire xrelease bnd nobnd aaa aad aam aas adc add and arpl bb0_reset bb1_reset bound bsf bsr bswap bt btc btr bts call cbw cdq cdqe clc cld cli clts cmc cmp cmpsb cmpsd 
cmpsq cmpsw cmpxchg cmpxchg486 cmpxchg8b cmpxchg16b cpuid cpu_read cpu_write cqo cwd cwde daa das dec div dmint emms enter equ f2xm1 fabs fadd faddp fbld fbstp fchs fclex fcmovb fcmovbe fcmove fcmovnb fcmovnbe fcmovne fcmovnu fcmovu fcom fcomi fcomip fcomp fcompp fcos fdecstp fdisi fdiv fdivp fdivr fdivrp femms feni ffree ffreep fiadd ficom ficomp fidiv fidivr fild fimul fincstp finit fist fistp fisttp fisub fisubr fld fld1 fldcw fldenv fldl2e fldl2t fldlg2 fldln2 fldpi fldz fmul fmulp fnclex fndisi fneni fninit fnop fnsave fnstcw fnstenv fnstsw fpatan fprem fprem1 fptan frndint frstor fsave fscale fsetpm fsin fsincos fsqrt fst fstcw fstenv fstp fstsw fsub fsubp fsubr fsubrp ftst fucom fucomi fucomip fucomp fucompp fxam fxch fxtract fyl2x fyl2xp1 hlt ibts icebp idiv imul in inc incbin insb insd insw int int01 int1 int03 int3 into invd invpcid invlpg invlpga iret iretd iretq iretw jcxz jecxz jrcxz jmp jmpe lahf lar lds lea leave les lfence lfs lgdt lgs lidt lldt lmsw loadall loadall286 lodsb lodsd lodsq lodsw loop loope loopne loopnz loopz lsl lss ltr mfence monitor mov movd movq movsb movsd movsq movsw movsx movsxd movzx mul mwait neg nop not or out outsb outsd outsw packssdw packsswb packuswb paddb paddd paddsb paddsiw paddsw paddusb paddusw paddw pand pandn pause paveb pavgusb pcmpeqb pcmpeqd pcmpeqw pcmpgtb pcmpgtd pcmpgtw pdistib pf2id pfacc pfadd pfcmpeq pfcmpge pfcmpgt pfmax pfmin pfmul pfrcp pfrcpit1 pfrcpit2 pfrsqit1 pfrsqrt pfsub pfsubr pi2fd pmachriw pmaddwd pmagw pmulhriw pmulhrwa pmulhrwc pmulhw pmullw pmvgezb pmvlzb pmvnzb pmvzb pop popa popad popaw popf popfd popfq popfw por prefetch prefetchw pslld psllq psllw psrad psraw psrld psrlq psrlw psubb psubd psubsb psubsiw psubsw psubusb psubusw psubw punpckhbw punpckhdq punpckhwd punpcklbw punpckldq punpcklwd push pusha pushad pushaw pushf pushfd pushfq pushfw pxor rcl rcr rdshr rdmsr rdpmc rdtsc rdtscp ret retf retn rol ror rdm rsdc rsldt rsm rsts sahf sal salc sar sbb scasb scasd scasq scasw sfence sgdt 
shl shld shr shrd sidt sldt skinit smi smint smintold smsw stc std sti stosb stosd stosq stosw str sub svdc svldt svts swapgs syscall sysenter sysexit sysret test ud0 ud1 ud2b ud2 ud2a umov verr verw fwait wbinvd wrshr wrmsr xadd xbts xchg xlatb xlat xor cmove cmovz cmovne cmovnz cmova cmovnbe cmovae cmovnb cmovb cmovnae cmovbe cmovna cmovg cmovnle cmovge cmovnl cmovl cmovnge cmovle cmovng cmovc cmovnc cmovo cmovno cmovs cmovns cmovp cmovpe cmovnp cmovpo je jz jne jnz ja jnbe jae jnb jb jnae jbe jna jg jnle jge jnl jl jnge jle jng jc jnc jo jno js jns jpo jnp jpe jp sete setz setne setnz seta setnbe setae setnb setnc setb setnae setcset setbe setna setg setnle setge setnl setl setnge setle setng sets setns seto setno setpe setp setpo setnp addps addss andnps andps cmpeqps cmpeqss cmpleps cmpless cmpltps cmpltss cmpneqps cmpneqss cmpnleps cmpnless cmpnltps cmpnltss cmpordps cmpordss cmpunordps cmpunordss cmpps cmpss comiss cvtpi2ps cvtps2pi cvtsi2ss cvtss2si cvttps2pi cvttss2si divps divss ldmxcsr maxps maxss minps minss movaps movhps movlhps movlps movhlps movmskps movntps movss movups mulps mulss orps rcpps rcpss rsqrtps rsqrtss shufps sqrtps sqrtss stmxcsr subps subss ucomiss unpckhps unpcklps xorps fxrstor fxrstor64 fxsave fxsave64 xgetbv xsetbv xsave xsave64 xsaveopt xsaveopt64 xrstor xrstor64 prefetchnta prefetcht0 prefetcht1 prefetcht2 maskmovq movntq pavgb pavgw pextrw pinsrw pmaxsw pmaxub pminsw pminub pmovmskb pmulhuw psadbw pshufw pf2iw pfnacc pfpnacc pi2fw pswapd maskmovdqu clflush movntdq movnti movntpd movdqa movdqu movdq2q movq2dq paddq pmuludq pshufd pshufhw pshuflw pslldq psrldq psubq punpckhqdq punpcklqdq addpd addsd andnpd andpd cmpeqpd cmpeqsd cmplepd cmplesd cmpltpd cmpltsd cmpneqpd cmpneqsd cmpnlepd cmpnlesd cmpnltpd cmpnltsd cmpordpd cmpordsd cmpunordpd cmpunordsd cmppd comisd cvtdq2pd cvtdq2ps cvtpd2dq cvtpd2pi cvtpd2ps cvtpi2pd cvtps2dq cvtps2pd cvtsd2si cvtsd2ss cvtsi2sd cvtss2sd cvttpd2pi cvttpd2dq cvttps2dq cvttsd2si divpd divsd maxpd 
maxsd minpd minsd movapd movhpd movlpd movmskpd movupd mulpd mulsd orpd shufpd sqrtpd sqrtsd subpd subsd ucomisd unpckhpd unpcklpd xorpd addsubpd addsubps haddpd haddps hsubpd hsubps lddqu movddup movshdup movsldup clgi stgi vmcall vmclear vmfunc vmlaunch vmload vmmcall vmptrld vmptrst vmread vmresume vmrun vmsave vmwrite vmxoff vmxon invept invvpid pabsb pabsw pabsd palignr phaddw phaddd phaddsw phsubw phsubd phsubsw pmaddubsw pmulhrsw pshufb psignb psignw psignd extrq insertq movntsd movntss lzcnt blendpd blendps blendvpd blendvps dppd dpps extractps insertps movntdqa mpsadbw packusdw pblendvb pblendw pcmpeqq pextrb pextrd pextrq phminposuw pinsrb pinsrd pinsrq pmaxsb pmaxsd pmaxud pmaxuw pminsb pminsd pminud pminuw pmovsxbw pmovsxbd pmovsxbq pmovsxwd pmovsxwq pmovsxdq pmovzxbw pmovzxbd pmovzxbq pmovzxwd pmovzxwq pmovzxdq pmuldq pmulld ptest roundpd roundps roundsd roundss crc32 pcmpestri pcmpestrm pcmpistri pcmpistrm pcmpgtq popcnt getsec pfrcpv pfrsqrtv movbe aesenc aesenclast aesdec aesdeclast aesimc aeskeygenassist vaesenc vaesenclast vaesdec vaesdeclast vaesimc vaeskeygenassist vaddpd vaddps vaddsd vaddss vaddsubpd vaddsubps vandpd vandps vandnpd vandnps vblendpd vblendps vblendvpd vblendvps vbroadcastss vbroadcastsd vbroadcastf128 vcmpeq_ospd vcmpeqpd vcmplt_ospd vcmpltpd vcmple_ospd vcmplepd vcmpunord_qpd vcmpunordpd vcmpneq_uqpd vcmpneqpd vcmpnlt_uspd vcmpnltpd vcmpnle_uspd vcmpnlepd vcmpord_qpd vcmpordpd vcmpeq_uqpd vcmpnge_uspd vcmpngepd vcmpngt_uspd vcmpngtpd vcmpfalse_oqpd vcmpfalsepd vcmpneq_oqpd vcmpge_ospd vcmpgepd vcmpgt_ospd vcmpgtpd vcmptrue_uqpd vcmptruepd vcmplt_oqpd vcmple_oqpd vcmpunord_spd vcmpneq_uspd vcmpnlt_uqpd vcmpnle_uqpd vcmpord_spd vcmpeq_uspd vcmpnge_uqpd vcmpngt_uqpd vcmpfalse_ospd vcmpneq_ospd vcmpge_oqpd vcmpgt_oqpd vcmptrue_uspd vcmppd vcmpeq_osps vcmpeqps vcmplt_osps vcmpltps vcmple_osps vcmpleps vcmpunord_qps vcmpunordps vcmpneq_uqps vcmpneqps vcmpnlt_usps vcmpnltps vcmpnle_usps vcmpnleps vcmpord_qps vcmpordps vcmpeq_uqps 
vcmpnge_usps vcmpngeps vcmpngt_usps vcmpngtps vcmpfalse_oqps vcmpfalseps vcmpneq_oqps vcmpge_osps vcmpgeps vcmpgt_osps vcmpgtps vcmptrue_uqps vcmptrueps vcmplt_oqps vcmple_oqps vcmpunord_sps vcmpneq_usps vcmpnlt_uqps vcmpnle_uqps vcmpord_sps vcmpeq_usps vcmpnge_uqps vcmpngt_uqps vcmpfalse_osps vcmpneq_osps vcmpge_oqps vcmpgt_oqps vcmptrue_usps vcmpps vcmpeq_ossd vcmpeqsd vcmplt_ossd vcmpltsd vcmple_ossd vcmplesd vcmpunord_qsd vcmpunordsd vcmpneq_uqsd vcmpneqsd vcmpnlt_ussd vcmpnltsd vcmpnle_ussd vcmpnlesd vcmpord_qsd vcmpordsd vcmpeq_uqsd vcmpnge_ussd vcmpngesd vcmpngt_ussd vcmpngtsd vcmpfalse_oqsd vcmpfalsesd vcmpneq_oqsd vcmpge_ossd vcmpgesd vcmpgt_ossd vcmpgtsd vcmptrue_uqsd vcmptruesd vcmplt_oqsd vcmple_oqsd vcmpunord_ssd vcmpneq_ussd vcmpnlt_uqsd vcmpnle_uqsd vcmpord_ssd vcmpeq_ussd vcmpnge_uqsd vcmpngt_uqsd vcmpfalse_ossd vcmpneq_ossd vcmpge_oqsd vcmpgt_oqsd vcmptrue_ussd vcmpsd vcmpeq_osss vcmpeqss vcmplt_osss vcmpltss vcmple_osss vcmpless vcmpunord_qss vcmpunordss vcmpneq_uqss vcmpneqss vcmpnlt_usss vcmpnltss vcmpnle_usss vcmpnless vcmpord_qss vcmpordss vcmpeq_uqss vcmpnge_usss vcmpngess vcmpngt_usss vcmpngtss vcmpfalse_oqss vcmpfalsess vcmpneq_oqss vcmpge_osss vcmpgess vcmpgt_osss vcmpgtss vcmptrue_uqss vcmptruess vcmplt_oqss vcmple_oqss vcmpunord_sss vcmpneq_usss vcmpnlt_uqss vcmpnle_uqss vcmpord_sss vcmpeq_usss vcmpnge_uqss vcmpngt_uqss vcmpfalse_osss vcmpneq_osss vcmpge_oqss vcmpgt_oqss vcmptrue_usss vcmpss vcomisd vcomiss vcvtdq2pd vcvtdq2ps vcvtpd2dq vcvtpd2ps vcvtps2dq vcvtps2pd vcvtsd2si vcvtsd2ss vcvtsi2sd vcvtsi2ss vcvtss2sd vcvtss2si vcvttpd2dq vcvttps2dq vcvttsd2si vcvttss2si vdivpd vdivps vdivsd vdivss vdppd vdpps vextractf128 vextractps vhaddpd vhaddps vhsubpd vhsubps vinsertf128 vinsertps vlddqu vldqqu vldmxcsr vmaskmovdqu vmaskmovps vmaskmovpd vmaxpd vmaxps vmaxsd vmaxss vminpd vminps vminsd vminss vmovapd vmovaps vmovd vmovq vmovddup vmovdqa vmovqqa vmovdqu vmovqqu vmovhlps vmovhpd vmovhps vmovlhps vmovlpd vmovlps vmovmskpd vmovmskps 
vmovntdq vmovntqq vmovntdqa vmovntpd vmovntps vmovsd vmovshdup vmovsldup vmovss vmovupd vmovups vmpsadbw vmulpd vmulps vmulsd vmulss vorpd vorps vpabsb vpabsw vpabsd vpacksswb vpackssdw vpackuswb vpackusdw vpaddb vpaddw vpaddd vpaddq vpaddsb vpaddsw vpaddusb vpaddusw vpalignr vpand vpandn vpavgb vpavgw vpblendvb vpblendw vpcmpestri vpcmpestrm vpcmpistri vpcmpistrm vpcmpeqb vpcmpeqw vpcmpeqd vpcmpeqq vpcmpgtb vpcmpgtw vpcmpgtd vpcmpgtq vpermilpd vpermilps vperm2f128 vpextrb vpextrw vpextrd vpextrq vphaddw vphaddd vphaddsw vphminposuw vphsubw vphsubd vphsubsw vpinsrb vpinsrw vpinsrd vpinsrq vpmaddwd vpmaddubsw vpmaxsb vpmaxsw vpmaxsd vpmaxub vpmaxuw vpmaxud vpminsb vpminsw vpminsd vpminub vpminuw vpminud vpmovmskb vpmovsxbw vpmovsxbd vpmovsxbq vpmovsxwd vpmovsxwq vpmovsxdq vpmovzxbw vpmovzxbd vpmovzxbq vpmovzxwd vpmovzxwq vpmovzxdq vpmulhuw vpmulhrsw vpmulhw vpmullw vpmulld vpmuludq vpmuldq vpor vpsadbw vpshufb vpshufd vpshufhw vpshuflw vpsignb vpsignw vpsignd vpslldq vpsrldq vpsllw vpslld vpsllq vpsraw vpsrad vpsrlw vpsrld vpsrlq vptest vpsubb vpsubw vpsubd vpsubq vpsubsb vpsubsw vpsubusb vpsubusw vpunpckhbw vpunpckhwd vpunpckhdq vpunpckhqdq vpunpcklbw vpunpcklwd vpunpckldq vpunpcklqdq vpxor vrcpps vrcpss vrsqrtps vrsqrtss vroundpd vroundps vroundsd vroundss vshufpd vshufps vsqrtpd vsqrtps vsqrtsd vsqrtss vstmxcsr vsubpd vsubps vsubsd vsubss vtestps vtestpd vucomisd vucomiss vunpckhpd vunpckhps vunpcklpd vunpcklps vxorpd vxorps vzeroall vzeroupper pclmullqlqdq pclmulhqlqdq pclmullqhqdq pclmulhqhqdq pclmulqdq vpclmullqlqdq vpclmulhqlqdq vpclmullqhqdq vpclmulhqhqdq vpclmulqdq vfmadd132ps vfmadd132pd vfmadd312ps vfmadd312pd vfmadd213ps vfmadd213pd vfmadd123ps vfmadd123pd vfmadd231ps vfmadd231pd vfmadd321ps vfmadd321pd vfmaddsub132ps vfmaddsub132pd vfmaddsub312ps vfmaddsub312pd vfmaddsub213ps vfmaddsub213pd vfmaddsub123ps vfmaddsub123pd vfmaddsub231ps vfmaddsub231pd vfmaddsub321ps vfmaddsub321pd vfmsub132ps vfmsub132pd vfmsub312ps vfmsub312pd vfmsub213ps vfmsub213pd 
vfmsub123ps vfmsub123pd vfmsub231ps vfmsub231pd vfmsub321ps vfmsub321pd vfmsubadd132ps vfmsubadd132pd vfmsubadd312ps vfmsubadd312pd vfmsubadd213ps vfmsubadd213pd vfmsubadd123ps vfmsubadd123pd vfmsubadd231ps vfmsubadd231pd vfmsubadd321ps vfmsubadd321pd vfnmadd132ps vfnmadd132pd vfnmadd312ps vfnmadd312pd vfnmadd213ps vfnmadd213pd vfnmadd123ps vfnmadd123pd vfnmadd231ps vfnmadd231pd vfnmadd321ps vfnmadd321pd vfnmsub132ps vfnmsub132pd vfnmsub312ps vfnmsub312pd vfnmsub213ps vfnmsub213pd vfnmsub123ps vfnmsub123pd vfnmsub231ps vfnmsub231pd vfnmsub321ps vfnmsub321pd vfmadd132ss vfmadd132sd vfmadd312ss vfmadd312sd vfmadd213ss vfmadd213sd vfmadd123ss vfmadd123sd vfmadd231ss vfmadd231sd vfmadd321ss vfmadd321sd vfmsub132ss vfmsub132sd vfmsub312ss vfmsub312sd vfmsub213ss vfmsub213sd vfmsub123ss vfmsub123sd vfmsub231ss vfmsub231sd vfmsub321ss vfmsub321sd vfnmadd132ss vfnmadd132sd vfnmadd312ss vfnmadd312sd vfnmadd213ss vfnmadd213sd vfnmadd123ss vfnmadd123sd vfnmadd231ss vfnmadd231sd vfnmadd321ss vfnmadd321sd vfnmsub132ss vfnmsub132sd vfnmsub312ss vfnmsub312sd vfnmsub213ss vfnmsub213sd vfnmsub123ss vfnmsub123sd vfnmsub231ss vfnmsub231sd vfnmsub321ss vfnmsub321sd rdfsbase rdgsbase rdrand wrfsbase wrgsbase vcvtph2ps vcvtps2ph adcx adox rdseed clac stac xstore xcryptecb xcryptcbc xcryptctr xcryptcfb xcryptofb montmul xsha1 xsha256 llwpcb slwpcb lwpval lwpins vfmaddpd vfmaddps vfmaddsd vfmaddss vfmaddsubpd vfmaddsubps vfmsubaddpd vfmsubaddps vfmsubpd vfmsubps vfmsubsd vfmsubss vfnmaddpd vfnmaddps vfnmaddsd vfnmaddss vfnmsubpd vfnmsubps vfnmsubsd vfnmsubss vfrczpd vfrczps vfrczsd vfrczss vpcmov vpcomb vpcomd vpcomq vpcomub vpcomud vpcomuq vpcomuw vpcomw vphaddbd vphaddbq vphaddbw vphadddq vphaddubd vphaddubq vphaddubw vphaddudq vphadduwd vphadduwq vphaddwd vphaddwq vphsubbw vphsubdq vphsubwd vpmacsdd vpmacsdqh vpmacsdql vpmacssdd vpmacssdqh vpmacssdql vpmacsswd vpmacssww vpmacswd vpmacsww vpmadcsswd vpmadcswd vpperm vprotb vprotd vprotq vprotw vpshab vpshad vpshaq vpshaw vpshlb vpshld 
vpshlq vpshlw vbroadcasti128 vpblendd vpbroadcastb vpbroadcastw vpbroadcastd vpbroadcastq vpermd vpermpd vpermps vpermq vperm2i128 vextracti128 vinserti128 vpmaskmovd vpmaskmovq vpsllvd vpsllvq vpsravd vpsrlvd vpsrlvq vgatherdpd vgatherqpd vgatherdps vgatherqps vpgatherdd vpgatherqd vpgatherdq vpgatherqq xabort xbegin xend xtest andn bextr blci blcic blsi blsic blcfill blsfill blcmsk blsmsk blsr blcs bzhi mulx pdep pext rorx sarx shlx shrx tzcnt tzmsk t1mskc valignd valignq vblendmpd vblendmps vbroadcastf32x4 vbroadcastf64x4 vbroadcasti32x4 vbroadcasti64x4 vcompresspd vcompressps vcvtpd2udq vcvtps2udq vcvtsd2usi vcvtss2usi vcvttpd2udq vcvttps2udq vcvttsd2usi vcvttss2usi vcvtudq2pd vcvtudq2ps vcvtusi2sd vcvtusi2ss vexpandpd vexpandps vextractf32x4 vextractf64x4 vextracti32x4 vextracti64x4 vfixupimmpd vfixupimmps vfixupimmsd vfixupimmss vgetexppd vgetexpps vgetexpsd vgetexpss vgetmantpd vgetmantps vgetmantsd vgetmantss vinsertf32x4 vinsertf64x4 vinserti32x4 vinserti64x4 vmovdqa32 vmovdqa64 vmovdqu32 vmovdqu64 vpabsq vpandd vpandnd vpandnq vpandq vpblendmd vpblendmq vpcmpltd vpcmpled vpcmpneqd vpcmpnltd vpcmpnled vpcmpd vpcmpltq vpcmpleq vpcmpneqq vpcmpnltq vpcmpnleq vpcmpq vpcmpequd vpcmpltud vpcmpleud vpcmpnequd vpcmpnltud vpcmpnleud vpcmpud vpcmpequq vpcmpltuq vpcmpleuq vpcmpnequq vpcmpnltuq vpcmpnleuq vpcmpuq vpcompressd vpcompressq vpermi2d vpermi2pd vpermi2ps vpermi2q vpermt2d vpermt2pd vpermt2ps vpermt2q vpexpandd vpexpandq vpmaxsq vpmaxuq vpminsq vpminuq vpmovdb vpmovdw vpmovqb vpmovqd vpmovqw vpmovsdb vpmovsdw vpmovsqb vpmovsqd vpmovsqw vpmovusdb vpmovusdw vpmovusqb vpmovusqd vpmovusqw vpord vporq vprold vprolq vprolvd vprolvq vprord vprorq vprorvd vprorvq vpscatterdd vpscatterdq vpscatterqd vpscatterqq vpsraq vpsravq vpternlogd vpternlogq vptestmd vptestmq vptestnmd vptestnmq vpxord vpxorq vrcp14pd vrcp14ps vrcp14sd vrcp14ss vrndscalepd vrndscaleps vrndscalesd vrndscaless vrsqrt14pd vrsqrt14ps vrsqrt14sd vrsqrt14ss vscalefpd vscalefps vscalefsd vscalefss 
vscatterdpd vscatterdps vscatterqpd vscatterqps vshuff32x4 vshuff64x2 vshufi32x4 vshufi64x2 kandnw kandw kmovw knotw kortestw korw kshiftlw kshiftrw kunpckbw kxnorw kxorw vpbroadcastmb2q vpbroadcastmw2d vpconflictd vpconflictq vplzcntd vplzcntq vexp2pd vexp2ps vrcp28pd vrcp28ps vrcp28sd vrcp28ss vrsqrt28pd vrsqrt28ps vrsqrt28sd vrsqrt28ss vgatherpf0dpd vgatherpf0dps vgatherpf0qpd vgatherpf0qps vgatherpf1dpd vgatherpf1dps vgatherpf1qpd vgatherpf1qps vscatterpf0dpd vscatterpf0dps vscatterpf0qpd vscatterpf0qps vscatterpf1dpd vscatterpf1dps vscatterpf1qpd vscatterpf1qps prefetchwt1 bndmk bndcl bndcu bndcn bndmov bndldx bndstx sha1rnds4 sha1nexte sha1msg1 sha1msg2 sha256rnds2 sha256msg1 sha256msg2 hint_nop0 hint_nop1 hint_nop2 hint_nop3 hint_nop4 hint_nop5 hint_nop6 hint_nop7 hint_nop8 hint_nop9 hint_nop10 hint_nop11 hint_nop12 hint_nop13 hint_nop14 hint_nop15 hint_nop16 hint_nop17 hint_nop18 hint_nop19 hint_nop20 hint_nop21 hint_nop22 hint_nop23 hint_nop24 hint_nop25 hint_nop26 hint_nop27 hint_nop28 hint_nop29 hint_nop30 hint_nop31 hint_nop32 hint_nop33 hint_nop34 hint_nop35 hint_nop36 hint_nop37 hint_nop38 hint_nop39 hint_nop40 hint_nop41 hint_nop42 hint_nop43 hint_nop44 hint_nop45 hint_nop46 hint_nop47 hint_nop48 hint_nop49 hint_nop50 hint_nop51 hint_nop52 hint_nop53 hint_nop54 hint_nop55 hint_nop56 hint_nop57 hint_nop58 hint_nop59 hint_nop60 hint_nop61 hint_nop62 hint_nop63",built_in:"ip eip rip al ah bl bh cl ch dl dh sil dil bpl spl r8b r9b r10b r11b r12b r13b r14b r15b ax bx cx dx si di bp sp r8w r9w r10w r11w r12w r13w r14w r15w eax ebx ecx edx esi edi ebp esp eip r8d r9d r10d r11d r12d r13d r14d r15d rax rbx rcx rdx rsi rdi rbp rsp r8 r9 r10 r11 r12 r13 r14 r15 cs ds es fs gs ss st st0 st1 st2 st3 st4 st5 st6 st7 mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7 xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15 xmm16 xmm17 xmm18 xmm19 xmm20 xmm21 xmm22 xmm23 xmm24 xmm25 xmm26 xmm27 xmm28 xmm29 xmm30 xmm31 ymm0 ymm1 ymm2 ymm3 ymm4 ymm5 ymm6 
ymm7 ymm8 ymm9 ymm10 ymm11 ymm12 ymm13 ymm14 ymm15 ymm16 ymm17 ymm18 ymm19 ymm20 ymm21 ymm22 ymm23 ymm24 ymm25 ymm26 ymm27 ymm28 ymm29 ymm30 ymm31 zmm0 zmm1 zmm2 zmm3 zmm4 zmm5 zmm6 zmm7 zmm8 zmm9 zmm10 zmm11 zmm12 zmm13 zmm14 zmm15 zmm16 zmm17 zmm18 zmm19 zmm20 zmm21 zmm22 zmm23 zmm24 zmm25 zmm26 zmm27 zmm28 zmm29 zmm30 zmm31 k0 k1 k2 k3 k4 k5 k6 k7 bnd0 bnd1 bnd2 bnd3 cr0 cr1 cr2 cr3 cr4 cr8 dr0 dr1 dr2 dr3 dr8 tr3 tr4 tr5 tr6 tr7 r0 r1 r2 r3 r4 r5 r6 r7 r0b r1b r2b r3b r4b r5b r6b r7b r0w r1w r2w r3w r4w r5w r6w r7w r0d r1d r2d r3d r4d r5d r6d r7d r0h r1h r2h r3h r0l r1l r2l r3l r4l r5l r6l r7l r8l r9l r10l r11l r12l r13l r14l r15l db dw dd dq dt ddq do dy dz resb resw resd resq rest resdq reso resy resz incbin equ times byte word dword qword nosplit rel abs seg wrt strict near far a32 ptr",meta:"%define %xdefine %+ %undef %defstr %deftok %assign %strcat %strlen %substr %rotate %elif %else %endif %if %ifmacro %ifctx %ifidn %ifidni %ifid %ifnum %ifstr %iftoken %ifempty %ifenv %error %warning %fatal %rep %endrep %include %push %pop %repl %pathsearch %depend %use %arg %stacksize %local %line %comment %endcomment .nolist __FILE__ __LINE__ __SECT__ __BITS__ __OUTPUT_FORMAT__ __DATE__ __TIME__ __DATE_NUM__ __TIME_NUM__ __UTC_DATE__ __UTC_TIME__ __UTC_DATE_NUM__ __UTC_TIME_NUM__ __PASS__ struc endstruc istruc at iend align alignb sectalign daz nodaz up down zero default option assume public bits use16 use32 use64 default section segment absolute extern global common cpu float __utf16__ __utf16le__ __utf16be__ __utf32__ __utf32le__ __utf32be__ __float8__ __float16__ __float32__ __float64__ __float80m__ __float80e__ __float128l__ __float128h__ __Infinity__ __QNaN__ __SNaN__ Inf NaN QNaN SNaN float8 float16 float32 float64 float80m float80e float128l float128h __FLOAT_DAZ__ __FLOAT_ROUND__ 
__FLOAT__"},contains:[s.COMMENT(";","$",{relevance:0}),{className:"number",variants:[{begin:"\\b(?:([0-9][0-9_]*)?\\.[0-9_]*(?:[eE][+-]?[0-9_]+)?|(0[Xx])?[0-9][0-9_]*\\.?[0-9_]*(?:[pP](?:[+-]?[0-9_]+)?)?)\\b",relevance:0},{begin:"\\$[0-9][0-9A-Fa-f]*",relevance:0},{begin:"\\b(?:[0-9A-Fa-f][0-9A-Fa-f_]*[Hh]|[0-9][0-9_]*[DdTt]?|[0-7][0-7_]*[QqOo]|[0-1][0-1_]*[BbYy])\\b"},{begin:"\\b(?:0[Xx][0-9A-Fa-f_]+|0[DdTt][0-9_]+|0[QqOo][0-7_]+|0[BbYy][0-1_]+)\\b"}]},s.QUOTE_STRING_MODE,{className:"string",variants:[{begin:"'",end:"[^\\\\]'"},{begin:"`",end:"[^\\\\]`"}],relevance:0},{className:"symbol",variants:[{begin:"^\\s*[A-Za-z._?][A-Za-z0-9_$#@~.?]*(:|\\s+label)"},{begin:"^\\s*%%[A-Za-z0-9_$#@~.?]*:"}],relevance:0},{className:"subst",begin:"%[0-9]+",relevance:0},{className:"subst",begin:"%!S+",relevance:0},{className:"meta",begin:/^\s*\.[\w_-]+/}]}}}());hljs.registerLanguage("kotlin",function(){"use strict";return function(e){var n={keyword:"abstract as val var vararg get set class object open private protected public noinline crossinline dynamic final enum if else do while for when throw try catch finally import package is in fun override companion reified inline lateinit init interface annotation data sealed internal infix operator out by constructor super tailrec where const inner suspend typealias external expect actual trait volatile transient native default",built_in:"Byte Short Char Int Long Boolean Float Double Void Unit Nothing",literal:"true false null"},a={className:"symbol",begin:e.UNDERSCORE_IDENT_RE+"@"},i={className:"subst",begin:"\\${",end:"}",contains:[e.C_NUMBER_MODE]},s={className:"variable",begin:"\\$"+e.UNDERSCORE_IDENT_RE},t={className:"string",variants:[{begin:'"""',end:'"""(?=[^"])',contains:[s,i]},{begin:"'",end:"'",illegal:/\n/,contains:[e.BACKSLASH_ESCAPE]},{begin:'"',end:'"',illegal:/\n/,contains:[e.BACKSLASH_ESCAPE,s,i]}]};i.contains.push(t);var 
r={className:"meta",begin:"@(?:file|property|field|get|set|receiver|param|setparam|delegate)\\s*:(?:\\s*"+e.UNDERSCORE_IDENT_RE+")?"},l={className:"meta",begin:"@"+e.UNDERSCORE_IDENT_RE,contains:[{begin:/\(/,end:/\)/,contains:[e.inherit(t,{className:"meta-string"})]}]},c=e.COMMENT("/\\*","\\*/",{contains:[e.C_BLOCK_COMMENT_MODE]}),o={variants:[{className:"type",begin:e.UNDERSCORE_IDENT_RE},{begin:/\(/,end:/\)/,contains:[]}]},d=o;return d.variants[1].contains=[o],o.variants[1].contains=[d],{name:"Kotlin",aliases:["kt"],keywords:n,contains:[e.COMMENT("/\\*\\*","\\*/",{relevance:0,contains:[{className:"doctag",begin:"@[A-Za-z]+"}]}),e.C_LINE_COMMENT_MODE,c,{className:"keyword",begin:/\b(break|continue|return|this)\b/,starts:{contains:[{className:"symbol",begin:/@\w+/}]}},a,r,l,{className:"function",beginKeywords:"fun",end:"[(]|$",returnBegin:!0,excludeEnd:!0,keywords:n,illegal:/fun\s+(<.*>)?[^\s\(]+(\s+[^\s\(]+)\s*=/,relevance:5,contains:[{begin:e.UNDERSCORE_IDENT_RE+"\\s*\\(",returnBegin:!0,relevance:0,contains:[e.UNDERSCORE_TITLE_MODE]},{className:"type",begin://,keywords:"reified",relevance:0},{className:"params",begin:/\(/,end:/\)/,endsParent:!0,keywords:n,relevance:0,contains:[{begin:/:/,end:/[=,\/]/,endsWithParent:!0,contains:[o,e.C_LINE_COMMENT_MODE,c],relevance:0},e.C_LINE_COMMENT_MODE,c,r,l,t,e.C_NUMBER_MODE]},c]},{className:"class",beginKeywords:"class interface trait",end:/[:\{(]|$/,excludeEnd:!0,illegal:"extends implements",contains:[{beginKeywords:"public protected internal private 
constructor"},e.UNDERSCORE_TITLE_MODE,{className:"type",begin://,excludeBegin:!0,excludeEnd:!0,relevance:0},{className:"type",begin:/[,:]\s*/,end:/[<\(,]|$/,excludeBegin:!0,returnEnd:!0},r,l]},t,{className:"meta",begin:"^#!/usr/bin/env",end:"$",illegal:"\n"},{className:"number",begin:"\\b(0[bB]([01]+[01_]+[01]+|[01]+)|0[xX]([a-fA-F0-9]+[a-fA-F0-9_]+[a-fA-F0-9]+|[a-fA-F0-9]+)|(([\\d]+[\\d_]+[\\d]+|[\\d]+)(\\.([\\d]+[\\d_]+[\\d]+|[\\d]+))?|\\.([\\d]+[\\d_]+[\\d]+|[\\d]+))([eE][-+]?\\d+)?)[lLfF]?",relevance:0}]}}}());hljs.registerLanguage("armasm",function(){"use strict";return function(s){const e={variants:[s.COMMENT("^[ \\t]*(?=#)","$",{relevance:0,excludeBegin:!0}),s.COMMENT("[;@]","$",{relevance:0}),s.C_LINE_COMMENT_MODE,s.C_BLOCK_COMMENT_MODE]};return{name:"ARM Assembly",case_insensitive:!0,aliases:["arm"],keywords:{$pattern:"\\.?"+s.IDENT_RE,meta:".2byte .4byte .align .ascii .asciz .balign .byte .code .data .else .end .endif .endm .endr .equ .err .exitm .extern .global .hword .if .ifdef .ifndef .include .irp .long .macro .rept .req .section .set .skip .space .text .word .arm .thumb .code16 .code32 .force_thumb .thumb_func .ltorg ALIAS ALIGN ARM AREA ASSERT ATTR CN CODE CODE16 CODE32 COMMON CP DATA DCB DCD DCDU DCDO DCFD DCFDU DCI DCQ DCQU DCW DCWU DN ELIF ELSE END ENDFUNC ENDIF ENDP ENTRY EQU EXPORT EXPORTAS EXTERN FIELD FILL FUNCTION GBLA GBLL GBLS GET GLOBAL IF IMPORT INCBIN INCLUDE INFO KEEP LCLA LCLL LCLS LTORG MACRO MAP MEND MEXIT NOFP OPT PRESERVE8 PROC QN READONLY RELOC REQUIRE REQUIRE8 RLIST FN ROUT SETA SETL SETS SN SPACE SUBT THUMB THUMBX TTL WHILE WEND ",built_in:"r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 pc lr sp ip sl sb fp a1 a2 a3 a4 v1 v2 v3 v4 v5 v6 v7 v8 f0 f1 f2 f3 f4 f5 f6 f7 p0 p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 q0 q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 cpsr_c cpsr_x cpsr_s cpsr_f cpsr_cx cpsr_cxs cpsr_xs cpsr_xsf cpsr_sf cpsr_cxsf spsr_c spsr_x 
spsr_s spsr_f spsr_cx spsr_cxs spsr_xs spsr_xsf spsr_sf spsr_cxsf s0 s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16 s17 s18 s19 s20 s21 s22 s23 s24 s25 s26 s27 s28 s29 s30 s31 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15 d16 d17 d18 d19 d20 d21 d22 d23 d24 d25 d26 d27 d28 d29 d30 d31 {PC} {VAR} {TRUE} {FALSE} {OPT} {CONFIG} {ENDIAN} {CODESIZE} {CPU} {FPU} {ARCHITECTURE} {PCSTOREOFFSET} {ARMASM_VERSION} {INTER} {ROPI} {RWPI} {SWST} {NOSWST} . @"},contains:[{className:"keyword",begin:"\\b(adc|(qd?|sh?|u[qh]?)?add(8|16)?|usada?8|(q|sh?|u[qh]?)?(as|sa)x|and|adrl?|sbc|rs[bc]|asr|b[lx]?|blx|bxj|cbn?z|tb[bh]|bic|bfc|bfi|[su]bfx|bkpt|cdp2?|clz|clrex|cmp|cmn|cpsi[ed]|cps|setend|dbg|dmb|dsb|eor|isb|it[te]{0,3}|lsl|lsr|ror|rrx|ldm(([id][ab])|f[ds])?|ldr((s|ex)?[bhd])?|movt?|mvn|mra|mar|mul|[us]mull|smul[bwt][bt]|smu[as]d|smmul|smmla|mla|umlaal|smlal?([wbt][bt]|d)|mls|smlsl?[ds]|smc|svc|sev|mia([bt]{2}|ph)?|mrr?c2?|mcrr2?|mrs|msr|orr|orn|pkh(tb|bt)|rbit|rev(16|sh)?|sel|[su]sat(16)?|nop|pop|push|rfe([id][ab])?|stm([id][ab])?|str(ex)?[bhd]?|(qd?)?sub|(sh?|q|u[qh]?)?sub(8|16)|[su]xt(a?h|a?b(16)?)|srs([id][ab])?|swpb?|swi|smi|tst|teq|wfe|wfi|yield)(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al|hs|lo)?[sptrx]?(?=\\s)"},e,s.QUOTE_STRING_MODE,{className:"string",begin:"'",end:"[^\\\\]'",relevance:0},{className:"title",begin:"\\|",end:"\\|",illegal:"\\n",relevance:0},{className:"number",variants:[{begin:"[#$=]?0x[0-9a-f]+"},{begin:"[#$=]?0b[01]+"},{begin:"[#$=]\\d+"},{begin:"\\b\\d+"}],relevance:0},{className:"symbol",variants:[{begin:"^[ \\t]*[a-z_\\.\\$][a-z0-9_\\.\\$]+:"},{begin:"^[a-z_\\.\\$][a-z0-9_\\.\\$]+"},{begin:"[=#]\\w+"}],relevance:0}]}}}());hljs.registerLanguage("go",function(){"use strict";return function(e){var n={keyword:"break default func interface select case map struct chan else goto package switch const fallthrough if range type continue for import return var go defer bool byte complex64 complex128 float32 float64 int8 int16 int32 int64 string 
uint8 uint16 uint32 uint64 int uint uintptr rune",literal:"true false iota nil",built_in:"append cap close complex copy imag len make new panic print println real recover delete"};return{name:"Go",aliases:["golang"],keywords:n,illegal:">>|\.\.\.) /},i={className:"subst",begin:/\{/,end:/\}/,keywords:n,illegal:/#/},s={begin:/\{\{/,relevance:0},r={className:"string",contains:[e.BACKSLASH_ESCAPE],variants:[{begin:/(u|b)?r?'''/,end:/'''/,contains:[e.BACKSLASH_ESCAPE,a],relevance:10},{begin:/(u|b)?r?"""/,end:/"""/,contains:[e.BACKSLASH_ESCAPE,a],relevance:10},{begin:/(fr|rf|f)'''/,end:/'''/,contains:[e.BACKSLASH_ESCAPE,a,s,i]},{begin:/(fr|rf|f)"""/,end:/"""/,contains:[e.BACKSLASH_ESCAPE,a,s,i]},{begin:/(u|r|ur)'/,end:/'/,relevance:10},{begin:/(u|r|ur)"/,end:/"/,relevance:10},{begin:/(b|br)'/,end:/'/},{begin:/(b|br)"/,end:/"/},{begin:/(fr|rf|f)'/,end:/'/,contains:[e.BACKSLASH_ESCAPE,s,i]},{begin:/(fr|rf|f)"/,end:/"/,contains:[e.BACKSLASH_ESCAPE,s,i]},e.APOS_STRING_MODE,e.QUOTE_STRING_MODE]},l={className:"number",relevance:0,variants:[{begin:e.BINARY_NUMBER_RE+"[lLjJ]?"},{begin:"\\b(0o[0-7]+)[lLjJ]?"},{begin:e.C_NUMBER_RE+"[lLjJ]?"}]},t={className:"params",variants:[{begin:/\(\s*\)/,skip:!0,className:null},{begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,contains:["self",a,l,r,e.HASH_COMMENT_MODE]}]};return i.contains=[r,l,a],{name:"Python",aliases:["py","gyp","ipython"],keywords:n,illegal:/(<\/|->|\?)|=>/,contains:[a,l,{beginKeywords:"if",relevance:0},r,e.HASH_COMMENT_MODE,{variants:[{className:"function",beginKeywords:"def"},{className:"class",beginKeywords:"class"}],end:/:/,illegal:/[${=;\n,]/,contains:[e.UNDERSCORE_TITLE_MODE,t,{begin:/->/,endsWithParent:!0,keywords:"None"}]},{className:"meta",begin:/^[\t ]*@/,end:/$/},{begin:/\b(print|exec)\(/}]}}}());hljs.registerLanguage("shell",function(){"use strict";return function(s){return{name:"Shell 
Session",aliases:["console"],contains:[{className:"meta",begin:"^\\s{0,3}[/\\w\\d\\[\\]()@-]*[>%$#]",starts:{end:"$",subLanguage:"bash"}}]}}}());hljs.registerLanguage("scala",function(){"use strict";return function(e){var n={className:"subst",variants:[{begin:"\\$[A-Za-z0-9_]+"},{begin:"\\${",end:"}"}]},a={className:"string",variants:[{begin:'"',end:'"',illegal:"\\n",contains:[e.BACKSLASH_ESCAPE]},{begin:'"""',end:'"""',relevance:10},{begin:'[a-z]+"',end:'"',illegal:"\\n",contains:[e.BACKSLASH_ESCAPE,n]},{className:"string",begin:'[a-z]+"""',end:'"""',contains:[n],relevance:10}]},s={className:"type",begin:"\\b[A-Z][A-Za-z0-9_]*",relevance:0},t={className:"title",begin:/[^0-9\n\t "'(),.`{}\[\]:;][^\n\t "'(),.`{}\[\]:;]+|[^0-9\n\t "'(),.`{}\[\]:;=]/,relevance:0},i={className:"class",beginKeywords:"class object trait type",end:/[:={\[\n;]/,excludeEnd:!0,contains:[{beginKeywords:"extends with",relevance:10},{begin:/\[/,end:/\]/,excludeBegin:!0,excludeEnd:!0,relevance:0,contains:[s]},{className:"params",begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,relevance:0,contains:[s]},t]},l={className:"function",beginKeywords:"def",end:/[:={\[(\n;]/,excludeEnd:!0,contains:[t]};return{name:"Scala",keywords:{literal:"true false null",keyword:"type yield lazy override def with val var sealed abstract private trait object if forSome for while throw finally protected extends import final return else break new catch super class case package default try this match continue throws implicit"},contains:[e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,a,{className:"symbol",begin:"'\\w[\\w\\d_]*(?!')"},s,l,i,e.C_NUMBER_MODE,{className:"meta",begin:"@[A-Za-z]+"}]}}}());hljs.registerLanguage("julia",function(){"use strict";return function(e){var r="[A-Za-z_\\u00A1-\\uFFFF][A-Za-z_0-9\\u00A1-\\uFFFF]*",t={$pattern:r,keyword:"in isa where baremodule begin break catch ccall const continue do else elseif end export false finally for function global if import importall let local macro module quote 
return true try using while type immutable abstract bitstype typealias ",literal:"true false ARGS C_NULL DevNull ENDIAN_BOM ENV I Inf Inf16 Inf32 Inf64 InsertionSort JULIA_HOME LOAD_PATH MergeSort NaN NaN16 NaN32 NaN64 PROGRAM_FILE QuickSort RoundDown RoundFromZero RoundNearest RoundNearestTiesAway RoundNearestTiesUp RoundToZero RoundUp STDERR STDIN STDOUT VERSION catalan e|0 eu|0 eulergamma golden im nothing pi γ π φ ",built_in:"ANY AbstractArray AbstractChannel AbstractFloat AbstractMatrix AbstractRNG AbstractSerializer AbstractSet AbstractSparseArray AbstractSparseMatrix AbstractSparseVector AbstractString AbstractUnitRange AbstractVecOrMat AbstractVector Any ArgumentError Array AssertionError Associative Base64DecodePipe Base64EncodePipe Bidiagonal BigFloat BigInt BitArray BitMatrix BitVector Bool BoundsError BufferStream CachingPool CapturedException CartesianIndex CartesianRange Cchar Cdouble Cfloat Channel Char Cint Cintmax_t Clong Clonglong ClusterManager Cmd CodeInfo Colon Complex Complex128 Complex32 Complex64 CompositeException Condition ConjArray ConjMatrix ConjVector Cptrdiff_t Cshort Csize_t Cssize_t Cstring Cuchar Cuint Cuintmax_t Culong Culonglong Cushort Cwchar_t Cwstring DataType Date DateFormat DateTime DenseArray DenseMatrix DenseVecOrMat DenseVector Diagonal Dict DimensionMismatch Dims DirectIndexString Display DivideError DomainError EOFError EachLine Enum Enumerate ErrorException Exception ExponentialBackOff Expr Factorization FileMonitor Float16 Float32 Float64 Function Future GlobalRef GotoNode HTML Hermitian IO IOBuffer IOContext IOStream IPAddr IPv4 IPv6 IndexCartesian IndexLinear IndexStyle InexactError InitError Int Int128 Int16 Int32 Int64 Int8 IntSet Integer InterruptException InvalidStateException Irrational KeyError LabelNode LinSpace LineNumberNode LoadError LowerTriangular MIME Matrix MersenneTwister Method MethodError MethodTable Module NTuple NewvarNode NullException Nullable Number ObjectIdDict OrdinalRange OutOfMemoryError 
OverflowError Pair ParseError PartialQuickSort PermutedDimsArray Pipe PollingFileWatcher ProcessExitedException Ptr QuoteNode RandomDevice Range RangeIndex Rational RawFD ReadOnlyMemoryError Real ReentrantLock Ref Regex RegexMatch RemoteChannel RemoteException RevString RoundingMode RowVector SSAValue SegmentationFault SerializationState Set SharedArray SharedMatrix SharedVector Signed SimpleVector Slot SlotNumber SparseMatrixCSC SparseVector StackFrame StackOverflowError StackTrace StepRange StepRangeLen StridedArray StridedMatrix StridedVecOrMat StridedVector String SubArray SubString SymTridiagonal Symbol Symmetric SystemError TCPSocket Task Text TextDisplay Timer Tridiagonal Tuple Type TypeError TypeMapEntry TypeMapLevel TypeName TypeVar TypedSlot UDPSocket UInt UInt128 UInt16 UInt32 UInt64 UInt8 UndefRefError UndefVarError UnicodeError UniformScaling Union UnionAll UnitRange Unsigned UpperTriangular Val Vararg VecElement VecOrMat Vector VersionNumber Void WeakKeyDict WeakRef WorkerConfig WorkerPool "},a={keywords:t,illegal:/<\//},n={className:"subst",begin:/\$\(/,end:/\)/,keywords:t},o={className:"variable",begin:"\\$"+r},i={className:"string",contains:[e.BACKSLASH_ESCAPE,n,o],variants:[{begin:/\w*"""/,end:/"""\w*/,relevance:10},{begin:/\w*"/,end:/"\w*/}]},l={className:"string",contains:[e.BACKSLASH_ESCAPE,n,o],begin:"`",end:"`"},s={className:"meta",begin:"@"+r};return a.name="Julia",a.contains=[{className:"number",begin:/(\b0x[\d_]*(\.[\d_]*)?|0x\.\d[\d_]*)p[-+]?\d+|\b0[box][a-fA-F0-9][a-fA-F0-9_]*|(\b\d[\d_]*(\.[\d_]*)?|\.\d[\d_]*)([eEfF][-+]?\d+)?/,relevance:0},{className:"string",begin:/'(.|\\[xXuU][a-zA-Z0-9]+)'/},i,l,s,{className:"comment",variants:[{begin:"#=",end:"=#",relevance:10},{begin:"#",end:"$"}]},e.HASH_COMMENT_MODE,{className:"keyword",begin:"\\b(((abstract|primitive)\\s+)type|(mutable\\s+)?struct)\\b"},{begin:/<:/}],n.contains=a.contains,a}}());hljs.registerLanguage("php-template",function(){"use strict";return function(n){return{name:"PHP 
template",subLanguage:"xml",contains:[{begin:/<\?(php|=)?/,end:/\?>/,subLanguage:"php",contains:[{begin:"/\\*",end:"\\*/",skip:!0},{begin:'b"',end:'"',skip:!0},{begin:"b'",end:"'",skip:!0},n.inherit(n.APOS_STRING_MODE,{illegal:null,className:null,contains:null,skip:!0}),n.inherit(n.QUOTE_STRING_MODE,{illegal:null,className:null,contains:null,skip:!0})]}]}}}());hljs.registerLanguage("scss",function(){"use strict";return function(e){var t={className:"variable",begin:"(\\$[a-zA-Z-][a-zA-Z0-9_-]*)\\b"},i={className:"number",begin:"#[0-9A-Fa-f]+"};return e.CSS_NUMBER_MODE,e.QUOTE_STRING_MODE,e.APOS_STRING_MODE,e.C_BLOCK_COMMENT_MODE,{name:"SCSS",case_insensitive:!0,illegal:"[=/|']",contains:[e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,{className:"selector-id",begin:"\\#[A-Za-z0-9_-]+",relevance:0},{className:"selector-class",begin:"\\.[A-Za-z0-9_-]+",relevance:0},{className:"selector-attr",begin:"\\[",end:"\\]",illegal:"$"},{className:"selector-tag",begin:"\\b(a|abbr|acronym|address|area|article|aside|audio|b|base|big|blockquote|body|br|button|canvas|caption|cite|code|col|colgroup|command|datalist|dd|del|details|dfn|div|dl|dt|em|embed|fieldset|figcaption|figure|footer|form|frame|frameset|(h[1-6])|head|header|hgroup|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|map|mark|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q|rp|rt|ruby|samp|script|section|select|small|span|strike|strong|style|sub|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|tt|ul|var|video)\\b",relevance:0},{className:"selector-pseudo",begin:":(visited|valid|root|right|required|read-write|read-only|out-range|optional|only-of-type|only-child|nth-of-type|nth-last-of-type|nth-last-child|nth-child|not|link|left|last-of-type|last-child|lang|invalid|indeterminate|in-range|hover|focus|first-of-type|first-line|first-letter|first-child|first|enabled|empty|disabled|default|checked|before|after|active)"},{className:"selector-pseudo",begin:"::(after|befor
e|choices|first-letter|first-line|repeat-index|repeat-item|selection|value)"},t,{className:"attribute",begin:"\\b(src|z-index|word-wrap|word-spacing|word-break|width|widows|white-space|visibility|vertical-align|unicode-bidi|transition-timing-function|transition-property|transition-duration|transition-delay|transition|transform-style|transform-origin|transform|top|text-underline-position|text-transform|text-shadow|text-rendering|text-overflow|text-indent|text-decoration-style|text-decoration-line|text-decoration-color|text-decoration|text-align-last|text-align|tab-size|table-layout|right|resize|quotes|position|pointer-events|perspective-origin|perspective|page-break-inside|page-break-before|page-break-after|padding-top|padding-right|padding-left|padding-bottom|padding|overflow-y|overflow-x|overflow-wrap|overflow|outline-width|outline-style|outline-offset|outline-color|outline|orphans|order|opacity|object-position|object-fit|normal|none|nav-up|nav-right|nav-left|nav-index|nav-down|min-width|min-height|max-width|max-height|mask|marks|margin-top|margin-right|margin-left|margin-bottom|margin|list-style-type|list-style-position|list-style-image|list-style|line-height|letter-spacing|left|justify-content|initial|inherit|ime-mode|image-orientation|image-resolution|image-rendering|icon|hyphens|height|font-weight|font-variant-ligatures|font-variant|font-style|font-stretch|font-size-adjust|font-size|font-language-override|font-kerning|font-feature-settings|font-family|font|float|flex-wrap|flex-shrink|flex-grow|flex-flow|flex-direction|flex-basis|flex|filter|empty-cells|display|direction|cursor|counter-reset|counter-increment|content|column-width|column-span|column-rule-width|column-rule-style|column-rule-color|column-rule|column-gap|column-fill|column-count|columns|color|clip-path|clip|clear|caption-side|break-inside|break-before|break-after|box-sizing|box-shadow|box-decoration-break|bottom|border-width|border-top-width|border-top-style|border-top-right-radius|border-top-left-r
adius|border-top-color|border-top|border-style|border-spacing|border-right-width|border-right-style|border-right-color|border-right|border-radius|border-left-width|border-left-style|border-left-color|border-left|border-image-width|border-image-source|border-image-slice|border-image-repeat|border-image-outset|border-image|border-color|border-collapse|border-bottom-width|border-bottom-style|border-bottom-right-radius|border-bottom-left-radius|border-bottom-color|border-bottom|border|background-size|background-repeat|background-position|background-origin|background-image|background-color|background-clip|background-attachment|background-blend-mode|background|backface-visibility|auto|animation-timing-function|animation-play-state|animation-name|animation-iteration-count|animation-fill-mode|animation-duration|animation-direction|animation-delay|animation|align-self|align-items|align-content)\\b",illegal:"[^\\s]"},{begin:"\\b(whitespace|wait|w-resize|visible|vertical-text|vertical-ideographic|uppercase|upper-roman|upper-alpha|underline|transparent|top|thin|thick|text|text-top|text-bottom|tb-rl|table-header-group|table-footer-group|sw-resize|super|strict|static|square|solid|small-caps|separate|se-resize|scroll|s-resize|rtl|row-resize|ridge|right|repeat|repeat-y|repeat-x|relative|progress|pointer|overline|outside|outset|oblique|nowrap|not-allowed|normal|none|nw-resize|no-repeat|no-drop|newspaper|ne-resize|n-resize|move|middle|medium|ltr|lr-tb|lowercase|lower-roman|lower-alpha|loose|list-item|line|line-through|line-edge|lighter|left|keep-all|justify|italic|inter-word|inter-ideograph|inside|inset|inline|inline-block|inherit|inactive|ideograph-space|ideograph-parenthesis|ideograph-numeric|ideograph-alpha|horizontal|hidden|help|hand|groove|fixed|ellipsis|e-resize|double|dotted|distribute|distribute-space|distribute-letter|distribute-all-lines|disc|disabled|default|decimal|dashed|crosshair|collapse|col-resize|circle|char|center|capitalize|break-word|break-all|bottom|both|bolder|b
old|block|bidi-override|below|baseline|auto|always|all-scroll|absolute|table|table-cell)\\b"},{begin:":",end:";",contains:[t,i,e.CSS_NUMBER_MODE,e.QUOTE_STRING_MODE,e.APOS_STRING_MODE,{className:"meta",begin:"!important"}]},{begin:"@(page|font-face)",lexemes:"@[a-z-]+",keywords:"@page @font-face"},{begin:"@",end:"[{;]",returnBegin:!0,keywords:"and or not only",contains:[{begin:"@[a-z-]+",className:"keyword"},t,e.QUOTE_STRING_MODE,e.APOS_STRING_MODE,i,e.CSS_NUMBER_MODE]}]}}}());hljs.registerLanguage("r",function(){"use strict";return function(e){var n="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{name:"R",contains:[e.HASH_COMMENT_MODE,{begin:n,keywords:{$pattern:n,keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},relevance:0},{className:"number",begin:"0[xX][0-9a-fA-F]+[Li]?\\b",relevance:0},{className:"number",begin:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",relevance:0},{className:"number",begin:"\\d+\\.(?!\\d)(?:i\\b)?",relevance:0},{className:"number",begin:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",relevance:0},{className:"number",begin:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",relevance:0},{begin:"`",end:"`",relevance:0},{className:"string",contains:[e.BACKSLASH_ESCAPE],variants:[{begin:'"',end:'"'},{begin:"'",end:"'"}]}]}}}());hljs.registerLanguage("sql",function(){"use strict";return function(e){var t=e.COMMENT("--","$");return{name:"SQL",case_insensitive:!0,illegal:/[<>{}*]/,contains:[{beginKeywords:"begin end start commit rollback savepoint lock alter create drop rename call delete do handler insert load replace select truncate update set show pragma grant merge describe use explain help declare prepare execute deallocate release unlock purge reset change stop analyze cache flush optimize repair kill install uninstall checksum restore check 
backup revoke comment values with",end:/;/,endsWithParent:!0,keywords:{$pattern:/[\w\.]+/,keyword:"as abort abs absolute acc acce accep accept access accessed accessible account acos action activate add addtime admin administer advanced advise aes_decrypt aes_encrypt after agent aggregate ali alia alias all allocate allow alter always analyze ancillary and anti any anydata anydataset anyschema anytype apply archive archived archivelog are as asc ascii asin assembly assertion associate asynchronous at atan atn2 attr attri attrib attribu attribut attribute attributes audit authenticated authentication authid authors auto autoallocate autodblink autoextend automatic availability avg backup badfile basicfile before begin beginning benchmark between bfile bfile_base big bigfile bin binary_double binary_float binlog bit_and bit_count bit_length bit_or bit_xor bitmap blob_base block blocksize body both bound bucket buffer_cache buffer_pool build bulk by byte byteordermark bytes cache caching call calling cancel capacity cascade cascaded case cast catalog category ceil ceiling chain change changed char_base char_length character_length characters characterset charindex charset charsetform charsetid check checksum checksum_agg child choose chr chunk class cleanup clear client clob clob_base clone close cluster_id cluster_probability cluster_set clustering coalesce coercibility col collate collation collect colu colum column column_value columns columns_updated comment commit compact compatibility compiled complete composite_limit compound compress compute concat concat_ws concurrent confirm conn connec connect connect_by_iscycle connect_by_isleaf connect_by_root connect_time connection consider consistent constant constraint constraints constructor container content contents context contributors controlfile conv convert convert_tz corr corr_k corr_s corresponding corruption cos cost count count_big counted covar_pop covar_samp cpu_per_call cpu_per_session crc32 create 
creation critical cross cube cume_dist curdate current current_date current_time current_timestamp current_user cursor curtime customdatum cycle data database databases datafile datafiles datalength date_add date_cache date_format date_sub dateadd datediff datefromparts datename datepart datetime2fromparts day day_to_second dayname dayofmonth dayofweek dayofyear days db_role_change dbtimezone ddl deallocate declare decode decompose decrement decrypt deduplicate def defa defau defaul default defaults deferred defi defin define degrees delayed delegate delete delete_all delimited demand dense_rank depth dequeue des_decrypt des_encrypt des_key_file desc descr descri describ describe descriptor deterministic diagnostics difference dimension direct_load directory disable disable_all disallow disassociate discardfile disconnect diskgroup distinct distinctrow distribute distributed div do document domain dotnet double downgrade drop dumpfile duplicate duration each edition editionable editions element ellipsis else elsif elt empty enable enable_all enclosed encode encoding encrypt end end-exec endian enforced engine engines enqueue enterprise entityescaping eomonth error errors escaped evalname evaluate event eventdata events except exception exceptions exchange exclude excluding execu execut execute exempt exists exit exp expire explain explode export export_set extended extent external external_1 external_2 externally extract failed failed_login_attempts failover failure far fast feature_set feature_value fetch field fields file file_name_convert filesystem_like_logging final finish first first_value fixed flash_cache flashback floor flush following follows for forall force foreign form forma format found found_rows freelist freelists freepools fresh from from_base64 from_days ftp full function general generated get get_format get_lock getdate getutcdate global global_name globally go goto grant grants greatest group group_concat group_id grouping grouping_id groups 
gtid_subtract guarantee guard handler hash hashkeys having hea head headi headin heading heap help hex hierarchy high high_priority hosts hour hours http id ident_current ident_incr ident_seed identified identity idle_time if ifnull ignore iif ilike ilm immediate import in include including increment index indexes indexing indextype indicator indices inet6_aton inet6_ntoa inet_aton inet_ntoa infile initial initialized initially initrans inmemory inner innodb input insert install instance instantiable instr interface interleaved intersect into invalidate invisible is is_free_lock is_ipv4 is_ipv4_compat is_not is_not_null is_used_lock isdate isnull isolation iterate java join json json_exists keep keep_duplicates key keys kill language large last last_day last_insert_id last_value lateral lax lcase lead leading least leaves left len lenght length less level levels library like like2 like4 likec limit lines link list listagg little ln load load_file lob lobs local localtime localtimestamp locate locator lock locked log log10 log2 logfile logfiles logging logical logical_reads_per_call logoff logon logs long loop low low_priority lower lpad lrtrim ltrim main make_set makedate maketime managed management manual map mapping mask master master_pos_wait match matched materialized max maxextents maximize maxinstances maxlen maxlogfiles maxloghistory maxlogmembers maxsize maxtrans md5 measures median medium member memcompress memory merge microsecond mid migration min minextents minimum mining minus minute minutes minvalue missing mod mode model modification modify module monitoring month months mount move movement multiset mutex name name_const names nan national native natural nav nchar nclob nested never new newline next nextval no no_write_to_binlog noarchivelog noaudit nobadfile nocheck nocompress nocopy nocycle nodelay nodiscardfile noentityescaping noguarantee nokeep nologfile nomapping nomaxvalue nominimize nominvalue nomonitoring none noneditionable nonschema 
noorder nopr nopro noprom nopromp noprompt norely noresetlogs noreverse normal norowdependencies noschemacheck noswitch not nothing notice notnull notrim novalidate now nowait nth_value nullif nulls num numb numbe nvarchar nvarchar2 object ocicoll ocidate ocidatetime ociduration ociinterval ociloblocator ocinumber ociref ocirefcursor ocirowid ocistring ocitype oct octet_length of off offline offset oid oidindex old on online only opaque open operations operator optimal optimize option optionally or oracle oracle_date oradata ord ordaudio orddicom orddoc order ordimage ordinality ordvideo organization orlany orlvary out outer outfile outline output over overflow overriding package pad parallel parallel_enable parameters parent parse partial partition partitions pascal passing password password_grace_time password_lock_time password_reuse_max password_reuse_time password_verify_function patch path patindex pctincrease pctthreshold pctused pctversion percent percent_rank percentile_cont percentile_disc performance period period_add period_diff permanent physical pi pipe pipelined pivot pluggable plugin policy position post_transaction pow power pragma prebuilt precedes preceding precision prediction prediction_cost prediction_details prediction_probability prediction_set prepare present preserve prior priority private private_sga privileges procedural procedure procedure_analyze processlist profiles project prompt protection public publishingservername purge quarter query quick quiesce quota quotename radians raise rand range rank raw read reads readsize rebuild record records recover recovery recursive recycle redo reduced ref reference referenced references referencing refresh regexp_like register regr_avgx regr_avgy regr_count regr_intercept regr_r2 regr_slope regr_sxx regr_sxy reject rekey relational relative relaylog release release_lock relies_on relocate rely rem remainder rename repair repeat replace replicate replication required reset resetlogs resize 
resource respect restore restricted result result_cache resumable resume retention return returning returns reuse reverse revoke right rlike role roles rollback rolling rollup round row row_count rowdependencies rowid rownum rows rtrim rules safe salt sample save savepoint sb1 sb2 sb4 scan schema schemacheck scn scope scroll sdo_georaster sdo_topo_geometry search sec_to_time second seconds section securefile security seed segment select self semi sequence sequential serializable server servererror session session_user sessions_per_user set sets settings sha sha1 sha2 share shared shared_pool short show shrink shutdown si_averagecolor si_colorhistogram si_featurelist si_positionalcolor si_stillimage si_texture siblings sid sign sin size size_t sizes skip slave sleep smalldatetimefromparts smallfile snapshot some soname sort soundex source space sparse spfile split sql sql_big_result sql_buffer_result sql_cache sql_calc_found_rows sql_small_result sql_variant_property sqlcode sqldata sqlerror sqlname sqlstate sqrt square standalone standby start starting startup statement static statistics stats_binomial_test stats_crosstab stats_ks_test stats_mode stats_mw_test stats_one_way_anova stats_t_test_ stats_t_test_indep stats_t_test_one stats_t_test_paired stats_wsr_test status std stddev stddev_pop stddev_samp stdev stop storage store stored str str_to_date straight_join strcmp strict string struct stuff style subdate subpartition subpartitions substitutable substr substring subtime subtring_index subtype success sum suspend switch switchoffset switchover sync synchronous synonym sys sys_xmlagg sysasm sysaux sysdate sysdatetimeoffset sysdba sysoper system system_user sysutcdatetime table tables tablespace tablesample tan tdo template temporary terminated tertiary_weights test than then thread through tier ties time time_format time_zone timediff timefromparts timeout timestamp timestampadd timestampdiff timezone_abbr timezone_minute timezone_region to to_base64 to_date 
to_days to_seconds todatetimeoffset trace tracking transaction transactional translate translation treat trigger trigger_nestlevel triggers trim truncate try_cast try_convert try_parse type ub1 ub2 ub4 ucase unarchived unbounded uncompress under undo unhex unicode uniform uninstall union unique unix_timestamp unknown unlimited unlock unnest unpivot unrecoverable unsafe unsigned until untrusted unusable unused update updated upgrade upped upper upsert url urowid usable usage use use_stored_outlines user user_data user_resources users using utc_date utc_timestamp uuid uuid_short validate validate_password_strength validation valist value values var var_samp varcharc vari varia variab variabl variable variables variance varp varraw varrawc varray verify version versions view virtual visible void wait wallet warning warnings week weekday weekofyear wellformed when whene whenev wheneve whenever where while whitespace window with within without work wrapped xdb xml xmlagg xmlattributes xmlcast xmlcolattval xmlelement xmlexists xmlforest xmlindex xmlnamespaces xmlpi xmlquery xmlroot xmlschema xmlserialize xmltable xmltype xor year year_to_month years yearweek",literal:"true false null unknown",built_in:"array bigint binary bit blob bool boolean char character date dec decimal float int int8 integer interval number numeric real record serial serial8 smallint text time timestamp tinyint varchar varchar2 varying void"},contains:[{className:"string",begin:"'",end:"'",contains:[{begin:"''"}]},{className:"string",begin:'"',end:'"',contains:[{begin:'""'}]},{className:"string",begin:"`",end:"`"},e.C_NUMBER_MODE,e.C_BLOCK_COMMENT_MODE,t,e.HASH_COMMENT_MODE]},e.C_BLOCK_COMMENT_MODE,t,e.HASH_COMMENT_MODE]}}}());hljs.registerLanguage("c",function(){"use strict";return function(e){var n=e.getLanguage("c-like").rawDefinition();return n.name="C",n.aliases=["c","h"],n}}());hljs.registerLanguage("json",function(){"use strict";return function(n){var e={literal:"true false 
null"},i=[n.C_LINE_COMMENT_MODE,n.C_BLOCK_COMMENT_MODE],t=[n.QUOTE_STRING_MODE,n.C_NUMBER_MODE],a={end:",",endsWithParent:!0,excludeEnd:!0,contains:t,keywords:e},l={begin:"{",end:"}",contains:[{className:"attr",begin:/"/,end:/"/,contains:[n.BACKSLASH_ESCAPE],illegal:"\\n"},n.inherit(a,{begin:/:/})].concat(i),illegal:"\\S"},s={begin:"\\[",end:"\\]",contains:[n.inherit(a)],illegal:"\\S"};return t.push(l,s),i.forEach((function(n){t.push(n)})),{name:"JSON",contains:t,keywords:e,illegal:"\\S"}}}());hljs.registerLanguage("python-repl",function(){"use strict";return function(n){return{aliases:["pycon"],contains:[{className:"meta",starts:{end:/ |$/,starts:{end:"$",subLanguage:"python"}},variants:[{begin:/^>>>(?=[ ]|$)/},{begin:/^\.\.\.(?=[ ]|$)/}]}]}}}());hljs.registerLanguage("markdown",function(){"use strict";return function(n){const e={begin:"<",end:">",subLanguage:"xml",relevance:0},a={begin:"\\[.+?\\][\\(\\[].*?[\\)\\]]",returnBegin:!0,contains:[{className:"string",begin:"\\[",end:"\\]",excludeBegin:!0,returnEnd:!0,relevance:0},{className:"link",begin:"\\]\\(",end:"\\)",excludeBegin:!0,excludeEnd:!0},{className:"symbol",begin:"\\]\\[",end:"\\]",excludeBegin:!0,excludeEnd:!0}],relevance:10},i={className:"strong",contains:[],variants:[{begin:/_{2}/,end:/_{2}/},{begin:/\*{2}/,end:/\*{2}/}]},s={className:"emphasis",contains:[],variants:[{begin:/\*(?!\*)/,end:/\*/},{begin:/_(?!_)/,end:/_/,relevance:0}]};i.contains.push(s),s.contains.push(i);var c=[e,a];return i.contains=i.contains.concat(c),s.contains=s.contains.concat(c),{name:"Markdown",aliases:["md","mkdown","mkd"],contains:[{className:"section",variants:[{begin:"^#{1,6}",end:"$",contains:c=c.concat(i,s)},{begin:"(?=^.+?\\n[=-]{2,}$)",contains:[{begin:"^[=-]*$"},{begin:"^",end:"\\n",contains:c}]}]},e,{className:"bullet",begin:"^[ \t]*([*+-]|(\\d+\\.))(?=\\s+)",end:"\\s+",excludeEnd:!0},i,s,{className:"quote",begin:"^>\\s+",contains:c,end:"$"},{className:"code",variants:[{begin:"(`{3,})(.|\\n)*?\\1`*[ 
]*"},{begin:"(~{3,})(.|\\n)*?\\1~*[ ]*"},{begin:"```",end:"```+[ ]*$"},{begin:"~~~",end:"~~~+[ ]*$"},{begin:"`.+?`"},{begin:"(?=^( {4}|\\t))",contains:[{begin:"^( {4}|\\t)",end:"(\\n)$"}],relevance:0}]},{begin:"^[-\\*]{3,}",end:"$"},a,{begin:/^\[[^\n]+\]:/,returnBegin:!0,contains:[{className:"symbol",begin:/\[/,end:/\]/,excludeBegin:!0,excludeEnd:!0},{className:"link",begin:/:\s*/,end:/$/,excludeBegin:!0}]}]}}}());hljs.registerLanguage("javascript",function(){"use strict";const e=["as","in","of","if","for","while","finally","var","new","function","do","return","void","else","break","catch","instanceof","with","throw","case","default","try","switch","continue","typeof","delete","let","yield","const","class","debugger","async","await","static","import","from","export","extends"],n=["true","false","null","undefined","NaN","Infinity"],a=[].concat(["setInterval","setTimeout","clearInterval","clearTimeout","require","exports","eval","isFinite","isNaN","parseFloat","parseInt","decodeURI","decodeURIComponent","encodeURI","encodeURIComponent","escape","unescape"],["arguments","this","super","console","window","document","localStorage","module","global"],["Intl","DataView","Number","Math","Date","String","RegExp","Object","Function","Boolean","Error","Symbol","Set","Map","WeakSet","WeakMap","Proxy","Reflect","JSON","Promise","Float64Array","Int16Array","Int32Array","Int8Array","Uint16Array","Uint32Array","Float32Array","Array","Uint8Array","Uint8ClampedArray","ArrayBuffer"],["EvalError","InternalError","RangeError","ReferenceError","SyntaxError","TypeError","URIError"]);function s(e){return r("(?=",e,")")}function r(...e){return e.map(e=>(function(e){return e?"string"==typeof e?e:e.source:null})(e)).join("")}return function(t){var i="[A-Za-z$_][0-9A-Za-z$_]*",c={begin:/<[A-Za-z0-9\\._:-]+/,end:/\/[A-Za-z0-9\\._:-]+>|\/>/},o={$pattern:"[A-Za-z$_][0-9A-Za-z$_]*",keyword:e.join(" "),literal:n.join(" "),built_in:a.join(" 
")},l={className:"number",variants:[{begin:"\\b(0[bB][01]+)n?"},{begin:"\\b(0[oO][0-7]+)n?"},{begin:t.C_NUMBER_RE+"n?"}],relevance:0},E={className:"subst",begin:"\\$\\{",end:"\\}",keywords:o,contains:[]},d={begin:"html`",end:"",starts:{end:"`",returnEnd:!1,contains:[t.BACKSLASH_ESCAPE,E],subLanguage:"xml"}},g={begin:"css`",end:"",starts:{end:"`",returnEnd:!1,contains:[t.BACKSLASH_ESCAPE,E],subLanguage:"css"}},u={className:"string",begin:"`",end:"`",contains:[t.BACKSLASH_ESCAPE,E]};E.contains=[t.APOS_STRING_MODE,t.QUOTE_STRING_MODE,d,g,u,l,t.REGEXP_MODE];var b=E.contains.concat([{begin:/\(/,end:/\)/,contains:["self"].concat(E.contains,[t.C_BLOCK_COMMENT_MODE,t.C_LINE_COMMENT_MODE])},t.C_BLOCK_COMMENT_MODE,t.C_LINE_COMMENT_MODE]),_={className:"params",begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,contains:b};return{name:"JavaScript",aliases:["js","jsx","mjs","cjs"],keywords:o,contains:[t.SHEBANG({binary:"node",relevance:5}),{className:"meta",relevance:10,begin:/^\s*['"]use (strict|asm)['"]/},t.APOS_STRING_MODE,t.QUOTE_STRING_MODE,d,g,u,t.C_LINE_COMMENT_MODE,t.COMMENT("/\\*\\*","\\*/",{relevance:0,contains:[{className:"doctag",begin:"@[A-Za-z]+",contains:[{className:"type",begin:"\\{",end:"\\}",relevance:0},{className:"variable",begin:i+"(?=\\s*(-)|$)",endsParent:!0,relevance:0},{begin:/(?=[^\n])\s/,relevance:0}]}]}),t.C_BLOCK_COMMENT_MODE,l,{begin:r(/[{,\n]\s*/,s(r(/(((\/\/.*)|(\/\*(.|\n)*\*\/))\s*)*/,i+"\\s*:"))),relevance:0,contains:[{className:"attr",begin:i+s("\\s*:"),relevance:0}]},{begin:"("+t.RE_STARTERS_RE+"|\\b(case|return|throw)\\b)\\s*",keywords:"return throw 
case",contains:[t.C_LINE_COMMENT_MODE,t.C_BLOCK_COMMENT_MODE,t.REGEXP_MODE,{className:"function",begin:"(\\([^(]*(\\([^(]*(\\([^(]*\\))?\\))?\\)|"+t.UNDERSCORE_IDENT_RE+")\\s*=>",returnBegin:!0,end:"\\s*=>",contains:[{className:"params",variants:[{begin:t.UNDERSCORE_IDENT_RE},{className:null,begin:/\(\s*\)/,skip:!0},{begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,keywords:o,contains:b}]}]},{begin:/,/,relevance:0},{className:"",begin:/\s/,end:/\s*/,skip:!0},{variants:[{begin:"<>",end:""},{begin:c.begin,end:c.end}],subLanguage:"xml",contains:[{begin:c.begin,end:c.end,skip:!0,contains:["self"]}]}],relevance:0},{className:"function",beginKeywords:"function",end:/\{/,excludeEnd:!0,contains:[t.inherit(t.TITLE_MODE,{begin:i}),_],illegal:/\[|%/},{begin:/\$[(.]/},t.METHOD_GUARD,{className:"class",beginKeywords:"class",end:/[{;=]/,excludeEnd:!0,illegal:/[:"\[\]]/,contains:[{beginKeywords:"extends"},t.UNDERSCORE_TITLE_MODE]},{beginKeywords:"constructor",end:/\{/,excludeEnd:!0},{begin:"(get|set)\\s+(?="+i+"\\()",end:/{/,keywords:"get set",contains:[t.inherit(t.TITLE_MODE,{begin:i}),{begin:/\(\)/},_]}],illegal:/#(?!!)/}}}());hljs.registerLanguage("typescript",function(){"use strict";const 
e=["as","in","of","if","for","while","finally","var","new","function","do","return","void","else","break","catch","instanceof","with","throw","case","default","try","switch","continue","typeof","delete","let","yield","const","class","debugger","async","await","static","import","from","export","extends"],n=["true","false","null","undefined","NaN","Infinity"],a=[].concat(["setInterval","setTimeout","clearInterval","clearTimeout","require","exports","eval","isFinite","isNaN","parseFloat","parseInt","decodeURI","decodeURIComponent","encodeURI","encodeURIComponent","escape","unescape"],["arguments","this","super","console","window","document","localStorage","module","global"],["Intl","DataView","Number","Math","Date","String","RegExp","Object","Function","Boolean","Error","Symbol","Set","Map","WeakSet","WeakMap","Proxy","Reflect","JSON","Promise","Float64Array","Int16Array","Int32Array","Int8Array","Uint16Array","Uint32Array","Float32Array","Array","Uint8Array","Uint8ClampedArray","ArrayBuffer"],["EvalError","InternalError","RangeError","ReferenceError","SyntaxError","TypeError","URIError"]);return function(r){var t={$pattern:"[A-Za-z$_][0-9A-Za-z$_]*",keyword:e.concat(["type","namespace","typedef","interface","public","private","protected","implements","declare","abstract","readonly"]).join(" "),literal:n.join(" "),built_in:a.concat(["any","void","number","boolean","string","object","never","enum"]).join(" 
")},s={className:"meta",begin:"@[A-Za-z$_][0-9A-Za-z$_]*"},i={className:"number",variants:[{begin:"\\b(0[bB][01]+)n?"},{begin:"\\b(0[oO][0-7]+)n?"},{begin:r.C_NUMBER_RE+"n?"}],relevance:0},o={className:"subst",begin:"\\$\\{",end:"\\}",keywords:t,contains:[]},c={begin:"html`",end:"",starts:{end:"`",returnEnd:!1,contains:[r.BACKSLASH_ESCAPE,o],subLanguage:"xml"}},l={begin:"css`",end:"",starts:{end:"`",returnEnd:!1,contains:[r.BACKSLASH_ESCAPE,o],subLanguage:"css"}},E={className:"string",begin:"`",end:"`",contains:[r.BACKSLASH_ESCAPE,o]};o.contains=[r.APOS_STRING_MODE,r.QUOTE_STRING_MODE,c,l,E,i,r.REGEXP_MODE];var d={begin:"\\(",end:/\)/,keywords:t,contains:["self",r.QUOTE_STRING_MODE,r.APOS_STRING_MODE,r.NUMBER_MODE]},u={className:"params",begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,keywords:t,contains:[r.C_LINE_COMMENT_MODE,r.C_BLOCK_COMMENT_MODE,s,d]};return{name:"TypeScript",aliases:["ts"],keywords:t,contains:[r.SHEBANG(),{className:"meta",begin:/^\s*['"]use strict['"]/},r.APOS_STRING_MODE,r.QUOTE_STRING_MODE,c,l,E,r.C_LINE_COMMENT_MODE,r.C_BLOCK_COMMENT_MODE,i,{begin:"("+r.RE_STARTERS_RE+"|\\b(case|return|throw)\\b)\\s*",keywords:"return throw 
case",contains:[r.C_LINE_COMMENT_MODE,r.C_BLOCK_COMMENT_MODE,r.REGEXP_MODE,{className:"function",begin:"(\\([^(]*(\\([^(]*(\\([^(]*\\))?\\))?\\)|"+r.UNDERSCORE_IDENT_RE+")\\s*=>",returnBegin:!0,end:"\\s*=>",contains:[{className:"params",variants:[{begin:r.UNDERSCORE_IDENT_RE},{className:null,begin:/\(\s*\)/,skip:!0},{begin:/\(/,end:/\)/,excludeBegin:!0,excludeEnd:!0,keywords:t,contains:d.contains}]}]}],relevance:0},{className:"function",beginKeywords:"function",end:/[\{;]/,excludeEnd:!0,keywords:t,contains:["self",r.inherit(r.TITLE_MODE,{begin:"[A-Za-z$_][0-9A-Za-z$_]*"}),u],illegal:/%/,relevance:0},{beginKeywords:"constructor",end:/[\{;]/,excludeEnd:!0,contains:["self",u]},{begin:/module\./,keywords:{built_in:"module"},relevance:0},{beginKeywords:"module",end:/\{/,excludeEnd:!0},{beginKeywords:"interface",end:/\{/,excludeEnd:!0,keywords:"interface extends"},{begin:/\$[(.]/},{begin:"\\."+r.IDENT_RE,relevance:0},s,d]}}}());hljs.registerLanguage("plaintext",function(){"use strict";return function(t){return{name:"Plain text",aliases:["text","txt"],disableAutodetect:!0}}}());hljs.registerLanguage("less",function(){"use strict";return function(e){var n="([\\w-]+|@{[\\w-]+})",a=[],s=[],t=function(e){return{className:"string",begin:"~?"+e+".*?"+e}},r=function(e,n,a){return{className:e,begin:n,relevance:a}},i={begin:"\\(",end:"\\)",contains:s,relevance:0};s.push(e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,t("'"),t('"'),e.CSS_NUMBER_MODE,{begin:"(url|data-uri)\\(",starts:{className:"string",end:"[\\)\\n]",excludeEnd:!0}},r("number","#[0-9A-Fa-f]+\\b"),i,r("variable","@@?[\\w-]+",10),r("variable","@{[\\w-]+}"),r("built_in","~?`[^`]*?`"),{className:"attribute",begin:"[\\w-]+\\s*:",end:":",returnBegin:!0,excludeEnd:!0},{className:"meta",begin:"!important"});var c=s.concat({begin:"{",end:"}",contains:a}),l={beginKeywords:"when",endsWithParent:!0,contains:[{beginKeywords:"and 
not"}].concat(s)},o={begin:n+"\\s*:",returnBegin:!0,end:"[;}]",relevance:0,contains:[{className:"attribute",begin:n,end:":",excludeEnd:!0,starts:{endsWithParent:!0,illegal:"[<=$]",relevance:0,contains:s}}]},g={className:"keyword",begin:"@(import|media|charset|font-face|(-[a-z]+-)?keyframes|supports|document|namespace|page|viewport|host)\\b",starts:{end:"[;{}]",returnEnd:!0,contains:s,relevance:0}},d={className:"variable",variants:[{begin:"@[\\w-]+\\s*:",relevance:15},{begin:"@[\\w-]+"}],starts:{end:"[;}]",returnEnd:!0,contains:c}},b={variants:[{begin:"[\\.#:&\\[>]",end:"[;{}]"},{begin:n,end:"{"}],returnBegin:!0,returnEnd:!0,illegal:"[<='$\"]",relevance:0,contains:[e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,l,r("keyword","all\\b"),r("variable","@{[\\w-]+}"),r("selector-tag",n+"%?",0),r("selector-id","#"+n),r("selector-class","\\."+n,0),r("selector-tag","&",0),{className:"selector-attr",begin:"\\[",end:"\\]"},{className:"selector-pseudo",begin:/:(:)?[a-zA-Z0-9\_\-\+\(\)"'.]+/},{begin:"\\(",end:"\\)",contains:c},{begin:"!important"}]};return a.push(e.C_LINE_COMMENT_MODE,e.C_BLOCK_COMMENT_MODE,g,d,o,b),{name:"Less",case_insensitive:!0,illegal:"[=>'/<($\"]",contains:a}}}());hljs.registerLanguage("lua",function(){"use strict";return function(e){var t={begin:"\\[=*\\[",end:"\\]=*\\]",contains:["self"]},a=[e.COMMENT("--(?!\\[=*\\[)","$"),e.COMMENT("--\\[=*\\[","\\]=*\\]",{contains:[t],relevance:10})];return{name:"Lua",keywords:{$pattern:e.UNDERSCORE_IDENT_RE,literal:"true false nil",keyword:"and break do else elseif end for goto if in local not or repeat return then until while",built_in:"_G _ENV _VERSION __index __newindex __mode __call __metatable __tostring __len __gc __add __sub __mul __div __mod __pow __concat __unm __eq __lt __le assert collectgarbage dofile error getfenv getmetatable ipairs load loadfile loadstring module next pairs pcall print rawequal rawget rawset require select setfenv setmetatable tonumber tostring type unpack xpcall arg self coroutine resume 
yield status wrap create running debug getupvalue debug sethook getmetatable gethook setmetatable setlocal traceback setfenv getinfo setupvalue getlocal getregistry getfenv io lines write close flush open output type read stderr stdin input stdout popen tmpfile math log max acos huge ldexp pi cos tanh pow deg tan cosh sinh random randomseed frexp ceil floor rad abs sqrt modf asin min mod fmod log10 atan2 exp sin atan os exit setlocale date getenv difftime remove time clock tmpname rename execute package preload loadlib loaded loaders cpath config path seeall string sub upper len gfind rep find match char dump gmatch reverse byte format gsub lower table setn insert getn foreachi maxn foreach concat sort remove"},contains:a.concat([{className:"function",beginKeywords:"function",end:"\\)",contains:[e.inherit(e.TITLE_MODE,{begin:"([_a-zA-Z]\\w*\\.)*([_a-zA-Z]\\w*:)?[_a-zA-Z]\\w*"}),{className:"params",begin:"\\(",endsWithParent:!0,contains:a}].concat(a)},e.C_NUMBER_MODE,e.APOS_STRING_MODE,e.QUOTE_STRING_MODE,{className:"string",begin:"\\[=*\\[",end:"\\]=*\\]",contains:[t],relevance:5}])}}}()); diff --git a/images/cli_coreutils_pt.png b/images/cli_coreutils_pt.png new file mode 100644 index 0000000..09aefeb Binary files /dev/null and b/images/cli_coreutils_pt.png differ diff --git a/images/info.svg b/images/info.svg new file mode 100644 index 0000000..c5e4cd4 --- /dev/null +++ b/images/info.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/images/warning.svg b/images/warning.svg new file mode 100644 index 0000000..b3a38c0 --- /dev/null +++ b/images/warning.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 0000000..6574265 --- /dev/null +++ b/index.html @@ -0,0 +1,31 @@ +Cover - CLI text processing with GNU Coreutils
\ No newline at end of file diff --git a/introduction.html b/introduction.html new file mode 100644 index 0000000..ee51a51 --- /dev/null +++ b/introduction.html @@ -0,0 +1,31 @@ +Introduction - CLI text processing with GNU Coreutils

Introduction

I've been using Linux since 2007, but it took me ten more years to really explore coreutils when I wrote tutorials for the Command Line Text Processing repository.

Any beginner learning Linux command line tools would come across the cat command within the first week. Sooner or later, they'll come to know popular text processing tools like grep, head, tail, tr, sort, etc. If you were like me, you'd come across sed and awk, shudder at their complexity and prefer to use a scripting language like Perl and text editors like Vim instead (don't worry, I've already corrected that mistake).

Knowing power tools like grep, sed and awk can help solve most of your text processing needs. So, why would you want to learn text processing tools from the coreutils package? The biggest motivation would be faster execution since these tools are optimized for the use cases they solve. And there's always the advantage of not having to write code (and test that solution) if there's an existing tool to solve the problem.

This book will teach you more than twenty such specialized text processing tools provided by the GNU coreutils package. Plenty of examples and exercises are provided to make it easier to understand a particular tool and its various features.

Writing a book always has a few pleasant surprises for me. For this one, it was discovering a sort option for calendar months, regular expressions in the tac and nl commands, etc.

Installation

On a GNU/Linux based OS, you are most likely to already have GNU coreutils installed. This book covers version 9.1 of the coreutils package. To install a newer/particular version, see the coreutils download section for details.
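To check which version is installed on your system, you can query any coreutils command directly (the exact version string will vary):

```shell
# every coreutils command reports the package version
# output varies, for example: join (GNU coreutils) 9.1
$ join --version | head -n1
```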

If you are not using a Linux distribution, you may be able to access coreutils using these options:

Documentation

It is always a good idea to know where to find the documentation. From the command line, you can use the man and info commands for brief manuals and full documentation respectively. I prefer using the online GNU coreutils manual which feels much easier to use and navigate.
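For example, you can get a quick usage reminder without leaving the terminal (man join and info coreutils give progressively more detail):

```shell
# one-line usage summary from the command itself
$ join --help | head -n1
Usage: join [OPTION]... FILE1 FILE2
```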

See also:

\ No newline at end of file diff --git a/join.html b/join.html new file mode 100644 index 0000000..9bb0776 --- /dev/null +++ b/join.html @@ -0,0 +1,361 @@ +join - CLI text processing with GNU Coreutils

join

The join command helps you to combine lines from two files based on a common field. This works best when the input is already sorted by that field.

Default join

By default, join combines two files based on the first field content (also referred to as the key). Only the lines with common keys will be part of the output.

The key field will be displayed first in the output (this distinction will come into play if the first field isn't the key). The rest of the line will have the remaining fields from the first and second files, in that order. One or more blanks (space or tab) will be treated as the input field separator and a single space will be used as the output field separator. Blank characters at the start of the input lines, if present, will be ignored.

# sample sorted input files
+$ cat shopping_jan.txt
+apple   10
+banana  20
+soap    3
+tshirt  3
+$ cat shopping_feb.txt
+banana  15
+fig     100
+pen     2
+soap    1
+
+# combine common lines based on the first field
+$ join shopping_jan.txt shopping_feb.txt
+banana 20 15
+soap 3 1
+

If a field value is present multiple times in the same input file, all possible combinations will be present in the output. As shown below, join will also add a final newline character, even if one wasn't present in the input.

$ join <(printf 'a f1_x\na f1_y') <(printf 'a f2_x\na f2_y')
+a f1_x f2_x
+a f1_x f2_y
+a f1_y f2_x
+a f1_y f2_y
+

info Note that the collating order used for join should be the same as the one used to sort the input files. Use join -i to ignore case, similar to sort -f usage.
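Here's a contrived sketch of case-insensitive joining. The inputs are sorted with case ignored, and the key is displayed as spelled in the first file:

```shell
# keys APPLE/apple and banana/BANANA match when case is ignored
$ join -i <(printf 'APPLE 10\nbanana 20\n') <(printf 'apple 5\nBANANA 15\n')
APPLE 10 5
banana 20 15
```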

info If the input files are not sorted, join will produce an error if there are unpairable lines. You can use the --nocheck-order option to ignore this error. However, as per the documentation, this option "is not guaranteed to produce any particular output."
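A contrived illustration — the first file below isn't sorted, and the out-of-order unpairable line is silently dropped (as the documentation warns, don't rely on any particular output here):

```shell
# 'a 1' is out of order; without --nocheck-order this would be an error
$ join --nocheck-order <(printf 'b 2\na 1\n') <(printf 'b 3\n')
b 2 3
```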

Non-matching lines

By default, only the lines having common keys are part of the output. You can use the -a option to also include the non-matching lines from the input files. Use 1 and 2 as the argument for the first and second file respectively. You'll later see how to fill missing fields with a custom string.

# includes non-matching lines from the first file
+$ join -a1 shopping_jan.txt shopping_feb.txt
+apple 10
+banana 20 15
+soap 3 1
+tshirt 3
+
+# includes non-matching lines from both the files
+$ join -a1 -a2 shopping_jan.txt shopping_feb.txt
+apple 10
+banana 20 15
+fig 100
+pen 2
+soap 3 1
+tshirt 3
+

If you use -v instead of -a, the output will have only the non-matching lines.

$ join -v2 shopping_jan.txt shopping_feb.txt
+fig 100
+pen 2
+
+$ join -v1 -v2 shopping_jan.txt shopping_feb.txt
+apple 10
+fig 100
+pen 2
+tshirt 3
+

Change field separator

You can use the -t option to specify a single byte character as the field separator. The output field separator will be the same as the value used for the -t option. Use \0 to specify NUL as the separator. An empty string will cause the entire input line to be treated as the key. Depending on your shell, you can use ANSI-C quoting to write escapes like \t instead of inserting a literal tab character.

$ cat marks.csv
+ECE,Raj,53
+ECE,Joel,72
+EEE,Moi,68
+CSE,Surya,81
+EEE,Raj,88
+CSE,Moi,62
+EEE,Tia,72
+ECE,Om,92
+CSE,Amy,67
+$ cat dept.txt
+CSE
+ECE
+
+# get all lines from marks.csv based on the first field keys in dept.txt
+$ join -t, <(sort marks.csv) dept.txt
+CSE,Amy,67
+CSE,Moi,62
+CSE,Surya,81
+ECE,Joel,72
+ECE,Om,92
+ECE,Raj,53
+
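For instance, in bash you can pass a literal tab to -t via ANSI-C quoting:

```shell
# $'\t' is bash's ANSI-C quoting for a literal tab character
$ join -t$'\t' <(printf 'a\t1\nb\t2\n') <(printf 'a\t3\nc\t4\n')
a	1	3
```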

Files with headers

Use the --header option to treat the first line of each input file as a header. The header lines are joined and printed first, and excluded from the sort order check. Without this option, the join command might still work correctly if unpairable lines aren't found, but it is preferable to use --header when applicable. This option also helps when the --check-order option is active.

$ cat report_1.csv
+Name,Maths,Physics
+Amy,78,95
+Moi,88,75
+Raj,67,76
+$ cat report_2.csv
+Name,Chemistry
+Amy,85
+Joel,78
+Raj,72
+
+$ join --check-order -t, report_1.csv report_2.csv
+join: report_1.csv:2: is not sorted: Amy,78,95
+$ join --check-order --header -t, report_1.csv report_2.csv
+Name,Maths,Physics,Chemistry
+Amy,78,95,85
+Raj,67,76,72
+

Change key field

By default, the first field of each input file is used to combine the lines. You can use the -1 and -2 options followed by a field number to choose a different key field for the first and second file respectively. Use the -j option if the field number is the same for both files.

Recall that the key field is the first field in the output. You'll later see how to customize the output field order.

$ cat names.txt
+Amy
+Raj
+Tia
+
+# combine based on the second field of the first file
+# and the first field of the second file (default)
+$ join -t, -1 2 <(sort -t, -k2,2 marks.csv) names.txt
+Amy,CSE,67
+Raj,ECE,53
+Raj,EEE,88
+Tia,EEE,72
+
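The -j option mentioned above isn't used in the example here, so here's a small sketch with made-up inline data, joining on the second field of both inputs:

```shell
# -j2 is shorthand for -1 2 -2 2
# both hypothetical inputs are already sorted on their second field
join -t, -j2 <(printf '1,a\n2,b\n') <(printf 'x,a\ny,b\n')
# output: a,1,x and b,2,y (the key field comes first)
```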

Customize output field list

Use the -o option to customize which fields appear in the output and in what order. This is especially useful when the first field isn't the key. Each output field is specified as a file number, followed by a . character and then the field number. You can specify multiple fields separated by a , character. As a special case, you can use 0 to indicate the key field.

# output field order is 1st, 2nd and 3rd fields from the first file
+$ join -t, -1 2 -o 1.1,1.2,1.3 <(sort -t, -k2,2 marks.csv) names.txt
+CSE,Amy,67
+ECE,Raj,53
+EEE,Raj,88
+EEE,Tia,72
+
+# 1st field from the first file, 2nd field from the second file
+# and then 2nd and 3rd fields from the first file
+$ join --header -t, -o 1.1,2.2,1.2,1.3 report_1.csv report_2.csv
+Name,Chemistry,Maths,Physics
+Amy,85,78,95
+Raj,72,67,76
+
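The special 0 field mentioned above isn't shown in the examples here; a minimal sketch with hypothetical inline data:

```shell
# 0 stands for the join key field, here field 2 of the first file
# (the second input consists of keys only)
join -t, -1 2 -o 0,1.1 <(printf '1,a\n2,b\n') <(printf 'a\nb\n')
# output: a,1 and b,2
```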

Same number of output fields

If you use auto as the argument for the -o option, the first line of each input file is used to determine the number of output fields. If the other lines have extra fields, those will be discarded.

$ join <(printf 'a 1 2\nb p q r') <(printf 'a 3 4\nb x y z')
+a 1 2 3 4
+b p q r x y z
+
+$ join -o auto <(printf 'a 1 2\nb p q r') <(printf 'a 3 4\nb x y z')
+a 1 2 3 4
+b p q x y
+

If the other lines have fewer fields, the -e option determines the string to be used as a filler (empty string is the default).

# the second line has two empty fields
+$ join -o auto <(printf 'a 1 2\nb p') <(printf 'a 3 4\nb x')
+a 1 2 3 4
+b p  x 
+
+$ join -o auto -e '-' <(printf 'a 1 2\nb p') <(printf 'a 3 4\nb x')
+a 1 2 3 4
+b p - x -
+

As promised earlier, here are some examples of filling fields for non-matching lines:

$ join -o auto -a1 -e 'NA' shopping_jan.txt shopping_feb.txt
+apple 10 NA
+banana 20 15
+soap 3 1
+tshirt 3 NA
+
+$ join -o auto -a1 -a2 -e 'NA' shopping_jan.txt shopping_feb.txt
+apple 10 NA
+banana 20 15
+fig NA 100
+pen NA 2
+soap 3 1
+tshirt 3 NA
+

Set operations

This section covers whole line set operations you can perform on already sorted input files. Equivalent sort and uniq solutions will also be mentioned as comments (useful for unsorted inputs). Assume that there are no duplicate lines within an input file.

These two sorted input files will be used for the examples to follow:

$ paste colors_1.txt colors_2.txt
+Blue    Black
+Brown   Blue
+Orange  Green
+Purple  Orange
+Red     Pink
+Teal    Red
+White   White
+

Here's how you can get union and symmetric difference results. Recall that -t '' will cause the entire input line content to be considered as keys.

# union
+# unsorted input: sort -u colors_1.txt colors_2.txt
+$ join -t '' -a1 -a2 colors_1.txt colors_2.txt
+Black
+Blue
+Brown
+Green
+Orange
+Pink
+Purple
+Red
+Teal
+White
+
+# symmetric difference
+# unsorted input: sort colors_1.txt colors_2.txt | uniq -u
+$ join -t '' -v1 -v2 colors_1.txt colors_2.txt
+Black
+Brown
+Green
+Pink
+Purple
+Teal
+

Here's how you can get intersection and difference results. The equivalent comm solutions for sorted input are also mentioned in the comments.

# intersection, same as: comm -12 colors_1.txt colors_2.txt
+# unsorted input: sort colors_1.txt colors_2.txt | uniq -d
+$ join -t '' colors_1.txt colors_2.txt
+Blue
+Orange
+Red
+White
+
+# difference, same as: comm -13 colors_1.txt colors_2.txt
+# unsorted input: sort colors_1.txt colors_1.txt colors_2.txt | uniq -u
+$ join -t '' -v2 colors_1.txt colors_2.txt
+Black
+Green
+Pink
+
+# difference, same as: comm -23 colors_1.txt colors_2.txt
+# unsorted input: sort colors_1.txt colors_2.txt colors_2.txt | uniq -u
+$ join -t '' -v1 colors_1.txt colors_2.txt
+Brown
+Purple
+Teal
+

As mentioned before, join will display all the combinations if there are duplicate entries. Here's an example to show the differences between sort, comm and join solutions for displaying common lines:

$ paste list_1.txt list_2.txt
+apple   cherry
+banana  cherry
+cherry  mango
+cherry  papaya
+cherry  
+cherry  
+
+# only one entry per common line
+$ sort list_1.txt list_2.txt | uniq -d
+cherry
+
+# minimum of 'no. of entries in file1' and 'no. of entries in file2'
+$ comm -12 list_1.txt list_2.txt
+cherry
+cherry
+
+# 'no. of entries in file1' multiplied by 'no. of entries in file2'
+$ join -t '' list_1.txt list_2.txt
+cherry
+cherry
+cherry
+cherry
+cherry
+cherry
+cherry
+cherry
+

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, join will add a final NUL character even if it is not present in the input.

$ join -z <(printf 'a 1\0b x') <(printf 'a 2\0b y') | cat -v
+a 1 2^@b x y^@
+

Alternatives

Here are some alternate commands you can explore if join isn't enough to solve your task. These alternatives do not require input to be sorted.
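For instance, awk is a common choice. Here's a minimal hash-join sketch over hypothetical inline data (an illustration, not an example from this book), matching on the first field without requiring sorted input:

```shell
# NR==FNR is true only while reading the first input: store its second
# field keyed by the first field, then look up keys in the second input
awk 'NR==FNR{a[$1]=$2; next} $1 in a{print $1, a[$1], $2}' \
    <(printf 'b 2\na 1\n') <(printf 'a 3\nb 4\n')
# output: a 1 3 and b 2 4 (in the order of the second input)
```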

Exercises

info The exercises directory has all the files used in this section.

info Assume that the input files are already sorted for these exercises.

1) Use appropriate options to get the expected outputs shown below.

# no output
+$ join <(printf 'apple 2\nfig 5') <(printf 'Fig 10\nmango 4')
+
+# expected output 1
+##### add your solution here
+fig 5 10
+
+# expected output 2
+##### add your solution here
+apple 2
+fig 5 10
+mango 4
+

2) Use the join command to display only the non-matching lines based on the first field.

$ cat j1.txt
+apple   2
+fig     5
+lemon   10
+tomato  22
+$ cat j2.txt
+almond  33
+fig     115
+mango   20
+pista   42
+
+# first field items present in j1.txt but not j2.txt
+##### add your solution here
+apple 2
+lemon 10
+tomato 22
+
+# first field items present in j2.txt but not j1.txt
+##### add your solution here
+almond 33
+mango 20
+pista 42
+

3) Filter lines from j1.txt and j2.txt that match the items from s1.txt.

$ cat s1.txt
+apple
+coffee
+fig
+honey
+mango
+pasta
+sugar
+tea
+
+##### add your solution here
+apple 2
+fig 115
+fig 5
+mango 20
+

4) Join the marks_1.csv and marks_2.csv files to get the expected output shown below.

$ cat marks_1.csv
+Name,Biology,Programming
+Er,92,77
+Ith,100,100
+Lin,92,100
+Sil,86,98
+$ cat marks_2.csv
+Name,Maths,Physics,Chemistry
+Cy,97,98,95
+Ith,100,100,100
+Lin,78,83,80
+
+##### add your solution here
+Name,Biology,Programming,Maths,Physics,Chemistry
+Ith,100,100,100,100,100
+Lin,92,100,78,83,80
+

5) By default, the first field is used to combine the lines. Which options are helpful if you want to change the key field to be used for joining?

6) Join the marks_1.csv and marks_2.csv files to get the expected output with specific fields as shown below.

##### add your solution here
+Name,Programming,Maths,Biology
+Ith,100,100,100
+Lin,100,78,92
+

7) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. Use 50 as the filler data.

##### add your solution here
+Name,Biology,Programming,Maths,Physics,Chemistry
+Cy,50,50,97,98,95
+Er,92,77,50,50,50
+Ith,100,100,100,100,100
+Lin,92,100,78,83,80
+Sil,86,98,50,50,50
+

8) When you use the -o auto option, what happens to the extra fields compared to those in the first lines of the input data?

9) From the input files j3.txt and j4.txt, filter only the lines that are unique, i.e. lines that are not common to both files. Assume that the input files do not have duplicate entries.

$ cat j3.txt
+almond
+apple pie
+cold coffee
+honey
+mango shake
+pasta
+sugar
+tea
+$ cat j4.txt
+apple
+banana shake
+coffee
+fig
+honey
+mango shake
+milk
+tea
+yeast
+
+##### add your solution here
+almond
+apple
+apple pie
+banana shake
+coffee
+cold coffee
+fig
+milk
+pasta
+sugar
+yeast
+

10) From the input files j3.txt and j4.txt, filter only the lines that are common to both files.

##### add your solution here
+honey
+mango shake
+tea
+

nl

If the numbering options provided by cat aren't enough, nl might suit you better. Apart from options to customize the number formatting and the separator, you can also filter which lines should be numbered. Additionally, you can divide your input into sections and number them separately.

Default numbering

By default, nl will prefix a line number and a tab character to every non-empty input line. The default number formatting is 6 characters wide, right justified with spaces. Similar to cat, the nl command will concatenate multiple inputs.

# same as: cat -n greeting.txt fruits.txt nums.txt
+$ nl greeting.txt fruits.txt nums.txt
+     1  Hi there
+     2  Have a nice day
+     3  banana
+     4  papaya
+     5  mango
+     6  3.14
+     7  42
+     8  1000
+
+# example for input with empty lines, same as: cat -b
+$ printf 'apple\n\nbanana\n\ncherry\n' | nl
+     1  apple
+
+     2  banana
+
+     3  cherry
+

Number formatting

You can use the -n option to customize the number formatting. The available styles are:

  • rn right justified with space fillers (default)
  • rz right justified with leading zeros
  • ln left justified with space fillers

# right justified with space fillers
+$ nl -n'rn' greeting.txt
+     1  Hi there
+     2  Have a nice day
+
+# right justified with leading zeros
+$ nl -n'rz' greeting.txt
+000001  Hi there
+000002  Have a nice day
+
+# left justified with space fillers
+$ nl -n'ln' greeting.txt
+1       Hi there
+2       Have a nice day
+

Customize width

You can use the -w option to specify the width to be used for the numbers (default is 6).

$ nl greeting.txt
+     1  Hi there
+     2  Have a nice day
+
+$ nl -w2 greeting.txt
+ 1      Hi there
+ 2      Have a nice day
+

Customize separator

By default, a tab character is used to separate the line number and the line content. You can use the -s option to specify your own custom string separator.

$ nl -w2 -s' ' greeting.txt
+ 1 Hi there
+ 2 Have a nice day
+
+$ nl -w1 -s' --> ' greeting.txt
+1 --> Hi there
+2 --> Have a nice day
+

Starting number and step value

The -v option allows you to specify a different starting integer. Negative integers are also allowed.

$ nl -v10 greeting.txt
+    10  Hi there
+    11  Have a nice day
+
+$ nl -v-1 fruits.txt
+    -1  banana
+     0  papaya
+     1  mango
+

The -i option allows you to specify an integer as the step value (default is 1).

$ nl -w2 -s') ' -i2 greeting.txt fruits.txt nums.txt
+ 1) Hi there
+ 3) Have a nice day
+ 5) banana
+ 7) papaya
+ 9) mango
+11) 3.14
+13) 42
+15) 1000
+
+$ nl -w1 -s'. ' -v8 -i-1 greeting.txt fruits.txt
+8. Hi there
+7. Have a nice day
+6. banana
+5. papaya
+4. mango
+

Section wise numbering

If you organize your input with lines conforming to specific patterns, you can control their numbering separately. nl recognizes three types of sections with the following default patterns:

  • \:\:\: as header
  • \:\: as body
  • \: as footer

These special lines will be replaced with an empty line after numbering. The numbering will be reset at the start of every section. Here's an example with multiple body sections:

$ cat body.txt
+\:\:
+Hi there
+How are you
+\:\:
+banana
+papaya
+mango
+
+$ nl -w1 -s' ' body.txt
+
+1 Hi there
+2 How are you
+
+1 banana
+2 papaya
+3 mango
+

Here's an example with both header and body sections. By default, header and footer section lines are not numbered (you'll see options to enable them later).

$ cat header_body.txt
+\:\:\:
+Header
+teal
+\:\:
+Hi there
+How are you
+\:\:
+banana
+papaya
+mango
+\:\:\:
+Header
+green
+
+$ nl -w1 -s' ' header_body.txt
+
+  Header
+  teal
+
+1 Hi there
+2 How are you
+
+1 banana
+2 papaya
+3 mango
+
+  Header
+  green
+

And here's an example with all the three types of sections:

$ cat all_sections.txt
+\:\:\:
+Header
+teal
+\:\:
+Hi there
+How are you
+\:\:
+banana
+papaya
+mango
+\:
+Footer
+
+$ nl -w1 -s' ' all_sections.txt
+
+  Header
+  teal
+
+1 Hi there
+2 How are you
+
+1 banana
+2 papaya
+3 mango
+
+  Footer
+

The -b, -h and -f options control which lines should be numbered for the three types of sections. Use a to number all lines of a particular section (other arguments will be discussed later).

$ nl -w1 -s' ' -ha -fa all_sections.txt
+
+1 Header
+2 teal
+
+1 Hi there
+2 How are you
+
+1 banana
+2 papaya
+3 mango
+
+1 Footer
+

If you use the -p option, the numbering will not be reset on encountering a new section.

$ nl -p -w1 -s' ' all_sections.txt
+
+  Header
+  teal
+
+1 Hi there
+2 How are you
+
+3 banana
+4 papaya
+5 mango
+
+  Footer
+
+$ nl -p -w1 -s' ' -ha -fa all_sections.txt
+
+1 Header
+2 teal
+
+3 Hi there
+4 How are you
+
+5 banana
+6 papaya
+7 mango
+
+8 Footer
+

The -d option allows you to customize the two character pattern used for sections.

# pattern changed from \: to %=
+$ cat body_sep.txt
+%=%=
+apple
+banana
+%=%=
+teal
+green
+
+$ nl -w1 -s' ' -d'%=' body_sep.txt
+
+1 apple
+2 banana
+
+1 teal
+2 green
+

Section numbering criteria

As mentioned earlier, the -b, -h and -f options control which lines should be numbered for the three types of sections. These options accept the following arguments:

  • a number all lines, including empty lines
  • t number lines except empty ones (default for body sections)
  • n do not number lines (default for header and footer sections)
  • pBRE use basic regular expressions (BRE) to filter lines for numbering

If the input doesn't have special patterns to identify the different sections, it will be treated as if it has a single body section. Here's an example to include empty lines for numbering:

$ printf 'apple\n\nbanana\n\ncherry\n' | nl -w1 -s' ' -ba
+1 apple
+2 
+3 banana
+4 
+5 cherry
+

The -l option controls how many consecutive empty lines should be treated as a single entry. Only the last empty line of each such group will be numbered.

# only the 2nd consecutive empty line will be considered for numbering
+$ printf 'a\n\n\n\n\nb\n\nc' | nl -w1 -s' ' -ba -l2
+1 a
+  
+2 
+  
+3 
+4 b
+  
+5 c
+

Here's an example which uses regular expressions to identify the lines to be numbered:

# number lines starting with 'c' or 't'
+$ nl -w1 -s' ' -bp'^[ct]' purchases.txt
+1 coffee
+2 tea
+  washing powder
+3 coffee
+4 toothpaste
+5 tea
+  soap
+6 tea
+

info See the Regular Expressions chapter from my GNU grep ebook if you want to learn more about regexp syntax and features.

Exercises

info The exercises directory has all the files used in this section.

1) nl and cat -n are always equivalent for numbering lines. True or False?

2) What does the -n option do?

3) Use nl to produce the two expected outputs shown below.

$ cat greeting.txt
+Hi there
+Have a nice day
+
+# expected output 1
+##### add your solution here
+001     Hi there
+002     Have a nice day
+
+# expected output 2
+##### add your solution here
+001) Hi there
+002) Have a nice day
+

4) Figure out the logic based on the given input and output data.

$ cat s1.txt
+apple
+coffee
+fig
+honey
+mango
+pasta
+sugar
+tea
+
+##### add your solution here
+15. apple
+13. coffee
+11. fig
+ 9. honey
+ 7. mango
+ 5. pasta
+ 3. sugar
+ 1. tea
+

5) What are the three types of sections supported by nl?

6) Only number the lines that start with ---- in the format shown below.

$ cat blocks.txt
+----
+apple--banana
+mango---fig
+----
+3.14
+-42
+1000
+----
+sky blue
+dark green
+----
+hi hello
+
+##### add your solution here
+ 1) ----
+    apple--banana
+    mango---fig
+ 2) ----
+    3.14
+    -42
+    1000
+ 3) ----
+    sky blue
+    dark green
+ 4) ----
+    hi hello
+

7) For the blocks.txt file, determine the logic to produce the expected output shown below.

##### add your solution here
+
+1. apple--banana
+2. mango---fig
+
+1. 3.14
+2. -42
+3. 1000
+
+1. sky blue
+2. dark green
+
+1. hi hello
+

8) What does the -l option do?

9) Figure out the logic based on the given input and output data.

$ cat all_sections.txt
+\:\:\:
+Header
+teal
+\:\:
+Hi there
+How are you
+\:\:
+banana
+papaya
+mango
+\:
+Footer
+
+##### add your solution here
+
+ 1) Header
+ 2) teal
+
+ 3) Hi there
+ 4) How are you
+
+ 5) banana
+ 6) papaya
+ 7) mango
+
+    Footer
+

paste

paste is typically used to merge two or more files column wise. It also has a handy feature for serializing data.

Concatenating files column wise

Consider these two input files:

$ cat colors_1.txt
+Blue
+Brown
+Orange
+Purple
+Red
+Teal
+White
+
+$ cat colors_2.txt
+Black
+Blue
+Green
+Orange
+Pink
+Red
+White
+

By default, paste adds a tab character between corresponding lines of the input files.

$ paste colors_1.txt colors_2.txt
+Blue    Black
+Brown   Blue
+Orange  Green
+Purple  Orange
+Red     Pink
+Teal    Red
+White   White
+

You can use the -d option to change the delimiter between the columns. The separator is added even if the data has been exhausted for some of the input files. Here are some examples with single character delimiters. Multicharacter separation will be discussed later.

$ seq 4 | paste -d, - <(seq 6 9)
+1,6
+2,7
+3,8
+4,9
+
+# quote the delimiter if it is a shell metacharacter
+$ paste -d'|' <(seq 3) <(seq 4 5) <(seq 6 8)
+1|4|6
+2|5|7
+3||8
+

Use an empty string if you don't want any delimiter between the columns. You can also use \0 for this case, but that'd be confusing since it is typically used to mean the NUL character.

# note that the space between -d and empty string is necessary here
+$ paste -d '' <(seq 3) <(seq 6 8)
+16
+27
+38
+

info You can pass the same filename multiple times too — they will be treated as if they are separate inputs. This doesn't apply for stdin data though, which is a special case as discussed in a later section.

Interleaving lines

By setting the newline character as the delimiter, you'll get interleaved lines.

$ paste -d'\n' <(seq 11 13) <(seq 101 103)
+11
+101
+12
+102
+13
+103
+

Multiple columns from single input

If you use - multiple times, paste will consume a line from stdin data every time - is encountered. This is different from using the same filename multiple times, in which case they are treated as separate inputs.

This special case for stdin data is useful to combine consecutive lines using the given delimiter. Here are some examples to help you understand this feature better:

# two columns
+$ seq 10 | paste -d, - -
+1,2
+3,4
+5,6
+7,8
+9,10
+# five columns
+$ seq 10 | paste -d: - - - - -
+1:2:3:4:5
+6:7:8:9:10
+
+# use shell redirection for file input
+$ <colors_1.txt paste -d: - - -
+Blue:Brown:Orange
+Purple:Red:Teal
+White::
+

Here's an example with both stdin and file arguments:

$ seq 6 | paste - nums.txt -
+1       3.14    2
+3       42      4
+5       1000    6
+

If you don't want to manually type the number of - required, you can use this printf trick:

# the string before %.s is repeated based on the number of arguments
+$ printf 'x %.s' a b c
+x x x 
+$ printf -- '- %.s' {1..5}
+- - - - - 
+
+$ seq 10 | paste -d, $(printf -- '- %.s' {1..5})
+1,2,3,4,5
+6,7,8,9,10
+

info See this stackoverflow thread for more details about the printf solution and other alternatives.

Multicharacter delimiters

The -d option accepts a list of characters (bytes to be precise) to be used one by one between the different columns. If the number of characters is less than the number of separators required, the characters are reused from the beginning and this cycle repeats until all the columns are done. If the number of characters is greater than the number of separators required, the extra characters are simply discarded.

# , is used between the 1st and 2nd columns
+# - is used between the 2nd and 3rd columns
+$ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9)
+1,4-7
+2,5-8
+3,6-9
+
+# only 3 separators are needed, the rest are discarded
+$ paste -d',-:;.[]' <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12)
+1,4-7:10
+2,5-8:11
+3,6-9:12
+
+# 2 characters given, 4 separators needed
+# paste will reuse from the start of the list
+$ seq 10 | paste -d':,' - - - - -
+1:2,3:4,5
+6:7,8:9,10
+

You can use empty files to get multicharacter separation between the columns.

$ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6)
+1 : 4
+2 : 5
+3 : 6
+
+# create an empty file to avoid typing /dev/null too many times
+$ > e
+$ paste -d' :  - ' <(seq 3) e e <(seq 4 6) e e <(seq 7 9)
+1 : 4 - 7
+2 : 5 - 8
+3 : 6 - 9
+

Serialize

The -s option allows you to combine all the input lines from a file into a single line using the given delimiter. paste ensures that the output ends with a newline character even if the input didn't.

# this will give you a trailing comma
+# and there won't be a newline character at the end
+$ <colors_1.txt tr '\n' ','
+Blue,Brown,Orange,Purple,Red,Teal,White,
+# paste changes the separator between the lines only
+# and there will be a newline character at the end
+$ paste -sd, colors_1.txt
+Blue,Brown,Orange,Purple,Red,Teal,White
+
+# newline gets added at the end even if not present in the input
+$ printf 'apple\nbanana\ncherry' | paste -sd-
+apple-banana-cherry
+

If multiple files are passed, the serialization of each file is displayed on a separate line.

$ paste -sd: colors_1.txt colors_2.txt
+Blue:Brown:Orange:Purple:Red:Teal:White
+Black:Blue:Green:Orange:Pink:Red:White
+
+$ paste -sd, <(seq 3) <(seq 5 9)
+1,2,3
+5,6,7,8,9
+

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, paste ensures that the output ends with a NUL character even if the input didn't.

$ printf 'a\0b\0c\0d\0e\0f\0g\0h' | paste -z -d: - - - - | cat -v
+a:b:c:d^@e:f:g:h^@
+
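The -z option can be combined with -s as well, which is handy for NUL-separated data such as the output of find -print0. Here's a sketch, with cat -v showing the final NUL as ^@:

```shell
# serialize NUL-separated items using : as the delimiter
$ printf 'a.log\0b c.log\0d.log' | paste -zsd: | cat -v
a.log:b c.log:d.log^@
```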

Exercises

info The exercises directory has all the files used in this section.

1) What's the default delimiter character added by the paste command? Which option would you use to customize this separator?

2) Will the following two commands produce equivalent output? If not, why not?

$ paste -d, <(seq 3) <(printf '%s\n' item_{1..3})
+
+$ printf '%s\n' {1..3},item_{1..3}
+

3) Combine the two data sources as shown below.

$ printf '1)\n2)\n3)'
+1)
+2)
+3)
+$ cat fruits.txt
+banana
+papaya
+mango
+
+##### add your solution here
+1)banana
+2)papaya
+3)mango
+

4) Interleave the contents of fruits.txt and books.txt.

##### add your solution here
+banana
+Cradle:::Mage Errant::The Weirkey Chronicles
+papaya
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+mango
+Mark of the Fool:Super Powereds:::Ends of Magic
+

5) Generate numbers 1 to 9 in two different formats as shown below.

##### add your solution here
+1:2:3
+4:5:6
+7:8:9
+
+##### add your solution here
+1 : 4 : 7
+2 : 5 : 8
+3 : 6 : 9
+

6) Combine the contents of fruits.txt and colors.txt as shown below.

$ cat fruits.txt
+banana
+papaya
+mango
+$ cat colors.txt
+deep blue
+light orange
+blue delight
+
+##### add your solution here
+banana,deep blue,papaya,light orange,mango,blue delight
+
\ No newline at end of file diff --git a/pr.html b/pr.html new file mode 100644 index 0000000..c86bf16 --- /dev/null +++ b/pr.html @@ -0,0 +1,182 @@ +pr - CLI text processing with GNU Coreutils

pr

Paginate or columnate FILE(s) for printing.

As stated in the above quote from the manual, the pr command is mainly used for those two tasks. This book will discuss only the columnate features and some miscellaneous tasks.

Here's a pagination example if you are interested in exploring further. The pr command will add blank lines, a header and so on to make it suitable for printing.

$ pr greeting.txt | head
+
+
+2024-02-26 15:07                   greeting.txt                   Page 1
+
+
+Hi there
+Have a nice day
+
+
+
+

Columnate

The --columns and -a options can be used to merge the input lines in two different ways:

  • split the input file into parts and then merge those parts as columns
  • merge consecutive lines, similar to the paste command

Here's an example to get started. Note that -N is the same as using --columns=N, where N is the number of columns you want in the output. The default page width is 72, which means each column can have a maximum of 72/N characters (including the separator). Tab and space characters will be used to fill the columns as needed. You can use the -J option to prevent pr from truncating longer columns. The -t option is used here to turn off the pagination features.

# split input into three parts
+# each column width is 72/3 = 24 characters max
+$ seq 9 | pr -3t
+1                       4                       7
+2                       5                       8
+3                       6                       9
+

You can customize the separator using the -s option. The default is a tab character, which you can change to any other string value. The -s option also turns off line truncation, so the -J option isn't needed. However, the default page width of 72 can still cause issues, which will be discussed later.

# tab separator
+$ seq 9 | pr -3ts
+1       4       7
+2       5       8
+3       6       9
+
+# comma separator
+$ seq 9 | pr -3ts,
+1,4,7
+2,5,8
+3,6,9
+
+# multicharacter separator
+$ seq 9 | pr -3ts' : '
+1 : 4 : 7
+2 : 5 : 8
+3 : 6 : 9
+

Use the -a option to merge consecutive lines, similar to the paste command. One advantage is that the -s option supports a string value, whereas with paste you'd need to use workarounds to get multicharacter separation.

# four consecutive lines are merged
+# same as: paste -d: - - - -
+$ seq 8 | pr -4ats:
+1:2:3:4
+5:6:7:8
+

There are other differences between the pr and paste commands as well. Unlike paste, the pr command doesn't add the separator if the last row doesn't have enough columns. Another difference is that pr doesn't support an option to use the NUL character as the line separator.

$ seq 10 | pr -4ats,
+1,2,3,4
+5,6,7,8
+9,10
+
+$ seq 10 | paste -d, - - - -
+1,2,3,4
+5,6,7,8
+9,10,,
+

Customizing page width

As mentioned before, the default page width is 72. This can cause lines to be truncated, unless the -s or -J options are used. There's another issue you might run into, for example:

$ seq 100 | pr -50ats,
+pr: page width too narrow
+

(N-1)*length(separator) + N is the minimum page width you need, where N is the number of columns required (each column needs at least one character, plus N-1 separators in between). So, for 50 columns and a separator of length 1, you'll need a minimum width of 99. This calculation doesn't make any assumptions about the size of the input lines, so you may need -J to ensure longer lines aren't truncated.

You can use the -w option to change the page width. The -w option overrides the effect of the -s option on line truncation, so use the -J option as well unless you really need truncation. If truncation is active, the maximum column width is (PageWidth - (N-1)*length(separator)) / N rounded down to an integer value. Here are some examples:

# minimum width needed is 3 for N=2 and length=1
+# maximum column width: (6 - 1) / 2 = 2
+$ pr -w6 -2ts, greeting.txt
+Hi,Ha
+# use -J to avoid truncation
+$ pr -J -w6 -2ts, greeting.txt
+Hi there,Have a nice day
+
+# N=3 and length=4, so minimum width needed is (3-1)*4 + 3 = 11
+$ seq 6 | pr -J -w10 -3ats'::::'
+pr: page width too narrow
+$ seq 6 | pr -J -w11 -3ats'::::'
+1::::2::::3
+4::::5::::6
+
+# you can also just use a large number to avoid having to calculate the width
+$ seq 6 | pr -J -w500 -3ats'::::'
+1::::2::::3
+4::::5::::6
+

Concatenating files column wise

Two or more input files can be merged column wise using the -m option. As seen before, -t is needed to ignore pagination features and -s can be used to customize the separator.

# same as: paste colors_1.txt colors_2.txt
+$ pr -mts colors_1.txt colors_2.txt
+Blue    Black
+Brown   Blue
+Orange  Green
+Purple  Orange
+Red     Pink
+Teal    Red
+White   White
+
+# same as: paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6)
+$ pr -mts' : ' <(seq 3) <(seq 4 6)
+1 : 4
+2 : 5
+3 : 6
+

You can prefix the output with line numbers using the -n option. By default, this option supports up to 5-digit numbers and uses the tab character to separate the numbering from the line contents. You can optionally pass two arguments to this option — the separator character and the maximum number of digits; if both are given, the separator should be specified first. If you want to customize the starting line number, use the -N option as well.

# maximum of 1 digit for numbering
+# use : as the separator between the line number and line contents
+$ pr -n:1 -mts, colors_1.txt colors_2.txt
+1:Blue,Black
+2:Brown,Blue
+3:Orange,Green
+4:Purple,Orange
+5:Red,Pink
+6:Teal,Red
+7:White,White
+
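The -N option mentioned above can be sketched as follows, with numbering starting from 4 instead of 1 (assuming GNU pr; -t single-column output is used here for brevity):

```shell
# -n:1 uses : as the separator and a width of 1 digit
# -N4 makes the numbering start from 4
$ seq 3 | pr -t -n:1 -N4
4:1
5:2
6:3
```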

The string passed to -s is treated literally. Depending on your shell, you can use ANSI-C quoting to allow escape sequences. Unlike the columnate examples seen earlier, the separator is added even if the data has run out for some of the files.

# greeting.txt has 2 lines
+# fruits.txt has 3 lines
+# same as: paste -d$'\n' greeting.txt fruits.txt
+$ pr -mts$'\n' greeting.txt fruits.txt
+Hi there
+banana
+Have a nice day
+papaya
+
+mango
+

Miscellaneous

You can use the -d option to double space the input contents. That is, every newline character is doubled.

$ pr -dt fruits.txt
+banana
+
+papaya
+
+mango
+
+

The -v option will convert non-printing characters like carriage return, backspace, etc. to their octal representations (\NNN).

$ printf 'car\bt\r\nbike\0p\r\n' | pr -vt
+car\010t\015
+bike\000p\015
+

pr -t is a roundabout way to concatenate input files. One advantage is that it adds a newline character at the end if the input doesn't already end with one.

# 'cat' will not add a newline character
+# so, use 'pr' if newline is needed at the end
+$ printf 'a\nb\nc' | pr -t
+a
+b
+c
+

Exercises

info The exercises directory has all the files used in this section.

1) What does the -t option do?

2) Generate numbers 1 to 16 in two different formats as shown below.

$ seq -w 16 | ##### add your solution here
+01,02,03,04
+05,06,07,08
+09,10,11,12
+13,14,15,16
+
+$ seq -w 16 | ##### add your solution here
+01,05,09,13
+02,06,10,14
+03,07,11,15
+04,08,12,16
+

3) How'd you solve the issue shown below?

$ seq 100 | pr -37ats,
+pr: page width too narrow
+

4) Combine the contents of fruits.txt and colors.txt in two different formats as shown below.

$ cat fruits.txt
+banana
+papaya
+mango
+$ cat colors.txt
+deep blue
+light orange
+blue delight
+
+##### add your solution here
+banana : deep blue
+papaya : light orange
+mango : blue delight
+
+##### add your solution here
+ 1:banana,deep blue
+ 2:papaya,light orange
+ 3:mango,blue delight
+

5) What does the -d option do?

\ No newline at end of file diff --git a/preface.html b/preface.html new file mode 100644 index 0000000..286bb67 --- /dev/null +++ b/preface.html @@ -0,0 +1,31 @@ +Preface - CLI text processing with GNU Coreutils

Preface

You might already be aware of popular coreutils commands like head, tail, tr, sort and so on. This book will teach you more than twenty such specialized text processing tools provided by the GNU coreutils package.

My Command Line Text Processing repo includes chapters on some of these coreutils commands. Those chapters have been significantly edited for this book and new chapters have been added to cover more commands.

Prerequisites

You should be familiar with command line usage in a Unix-like environment. You should be comfortable with concepts like file redirection and command pipelines.

If you are new to the world of command line, check out my Computing from the Command Line ebook and curated resources on Linux CLI and Shell scripting before starting this book.

Conventions

  • The examples presented here have been tested on the GNU bash shell and version 9.1 of the GNU coreutils package.
  • Code snippets shown are copy-pasted from the bash shell and modified for presentation purposes. Some commands are preceded by comments to provide context and explanations. Blank lines have been added to improve readability, only real time is shown for speed comparisons, and so on.
  • Unless otherwise noted, all examples and explanations are meant for ASCII characters.
  • External links are provided throughout the book for you to explore certain topics in more depth.
  • The cli_text_processing_coreutils repo has all the code snippets and files used in examples, exercises and other details related to the book. If you are not familiar with the git command, click the Code button on the webpage to get the files.

Acknowledgements

Feedback and Errata

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, pointing out a typo or mistakes in code snippets, to which aspects of the book worked for you (or didn't!). Reader feedback is essential, especially for self-published authors.

You can reach me via:

Author info

Sundeep Agarwal is a lazy being who prefers to work just enough to support his modest lifestyle. He accumulated vast wealth working as a Design Engineer at Analog Devices and retired from the corporate world at the ripe age of twenty-eight. Unfortunately, he squandered his savings within a few years and had to scramble trying to earn a living. Against all odds, selling programming ebooks saved his lazy self from having to look for a job again. He can now afford all the fantasy ebooks he wants to read and spends an unhealthy amount of time browsing the internet.

When the creative muse strikes, he can be found working on yet another programming ebook (which invariably ends up having at least one example with regular expressions). Researching materials for his ebooks and everyday social media usage drowned his bookmarks, so he maintains curated resource lists for sanity's sake. He is thankful for free learning resources and open source tools. His own contributions can be found at https://github.com/learnbyexample.

List of books: https://learnbyexample.github.io/books/

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Code snippets are available under MIT License.

Resources mentioned in the Acknowledgements section above are available under their original licenses.

Book version

2.0

See Version_changes.md to track changes across book versions.

\ No newline at end of file
#exercise-solutions","Exercise_solutions.html#cat-and-tac","Exercise_solutions.html#head-and-tail","Exercise_solutions.html#tr","Exercise_solutions.html#cut","Exercise_solutions.html#seq","Exercise_solutions.html#shuf","Exercise_solutions.html#paste","Exercise_solutions.html#pr","Exercise_solutions.html#fold-and-fmt","Exercise_solutions.html#sort","Exercise_solutions.html#uniq","Exercise_solutions.html#comm","Exercise_solutions.html#join","Exercise_solutions.html#nl","Exercise_solutions.html#wc","Exercise_solutions.html#split","Exercise_solutions.html#csplit","Exercise_solutions.html#expand-and-unexpand","Exercise_solutions.html#basename-and-dirname"],"index":{"documentStore":{"docInfo":{"0":{"body":2,"breadcrumbs":1,"title":1},"1":{"body":0,"breadcrumbs":6,"title":3},"10":{"body":36,"breadcrumbs":3,"title":2},"100":{"body":37,"breadcrumbs":3,"title":2},"101":{"body":151,"breadcrumbs":3,"title":2},"102":{"body":346,"breadcrumbs":3,"title":2},"103":{"body":166,"breadcrumbs":5,"title":4},"104":{"body":126,"breadcrumbs":2,"title":1},"105":{"body":48,"breadcrumbs":3,"title":2},"106":{"body":71,"breadcrumbs":4,"title":3},"107":{"body":132,"breadcrumbs":3,"title":2},"108":{"body":26,"breadcrumbs":3,"title":2},"109":{"body":33,"breadcrumbs":3,"title":2},"11":{"body":102,"breadcrumbs":3,"title":2},"110":{"body":224,"breadcrumbs":2,"title":1},"111":{"body":17,"breadcrumbs":2,"title":1},"112":{"body":122,"breadcrumbs":5,"title":4},"113":{"body":41,"breadcrumbs":2,"title":1},"114":{"body":30,"breadcrumbs":2,"title":1},"115":{"body":84,"breadcrumbs":4,"title":3},"116":{"body":77,"breadcrumbs":3,"title":2},"117":{"body":24,"breadcrumbs":3,"title":2},"118":{"body":208,"breadcrumbs":3,"title":2},"119":{"body":26,"breadcrumbs":4,"title":3},"12":{"body":26,"breadcrumbs":2,"title":1},"120":{"body":35,"breadcrumbs":3,"title":2},"121":{"body":35,"breadcrumbs":2,"title":1},"122":{"body":213,"breadcrumbs":2,"title":1},"123":{"body":20,"breadcrumbs":2,"title":1},"124":{"body":145,"breadcr
umbs":4,"title":3},"125":{"body":112,"breadcrumbs":3,"title":2},"126":{"body":57,"breadcrumbs":3,"title":2},"127":{"body":29,"breadcrumbs":3,"title":2},"128":{"body":56,"breadcrumbs":2,"title":1},"129":{"body":129,"breadcrumbs":2,"title":1},"13":{"body":7,"breadcrumbs":3,"title":2},"130":{"body":16,"breadcrumbs":2,"title":1},"131":{"body":178,"breadcrumbs":3,"title":2},"132":{"body":104,"breadcrumbs":4,"title":3},"133":{"body":77,"breadcrumbs":4,"title":3},"134":{"body":64,"breadcrumbs":3,"title":2},"135":{"body":70,"breadcrumbs":4,"title":3},"136":{"body":83,"breadcrumbs":5,"title":4},"137":{"body":178,"breadcrumbs":5,"title":4},"138":{"body":255,"breadcrumbs":3,"title":2},"139":{"body":33,"breadcrumbs":3,"title":2},"14":{"body":148,"breadcrumbs":2,"title":1},"140":{"body":55,"breadcrumbs":2,"title":1},"141":{"body":316,"breadcrumbs":2,"title":1},"142":{"body":24,"breadcrumbs":2,"title":1},"143":{"body":70,"breadcrumbs":3,"title":2},"144":{"body":60,"breadcrumbs":3,"title":2},"145":{"body":24,"breadcrumbs":3,"title":2},"146":{"body":34,"breadcrumbs":3,"title":2},"147":{"body":78,"breadcrumbs":5,"title":4},"148":{"body":244,"breadcrumbs":4,"title":3},"149":{"body":160,"breadcrumbs":4,"title":3},"15":{"body":44,"breadcrumbs":2,"title":1},"150":{"body":213,"breadcrumbs":2,"title":1},"151":{"body":10,"breadcrumbs":2,"title":1},"152":{"body":43,"breadcrumbs":5,"title":4},"153":{"body":69,"breadcrumbs":3,"title":2},"154":{"body":101,"breadcrumbs":3,"title":2},"155":{"body":33,"breadcrumbs":3,"title":2},"156":{"body":61,"breadcrumbs":4,"title":3},"157":{"body":176,"breadcrumbs":3,"title":2},"158":{"body":118,"breadcrumbs":2,"title":1},"159":{"body":46,"breadcrumbs":2,"title":1},"16":{"body":43,"breadcrumbs":2,"title":1},"160":{"body":104,"breadcrumbs":3,"title":2},"161":{"body":31,"breadcrumbs":4,"title":3},"162":{"body":137,"breadcrumbs":4,"title":3},"163":{"body":215,"breadcrumbs":5,"title":4},"164":{"body":61,"breadcrumbs":3,"title":2},"165":{"body":40,"breadcrumbs":4,
"title":3},"166":{"body":248,"breadcrumbs":3,"title":2},"167":{"body":61,"breadcrumbs":4,"title":3},"168":{"body":67,"breadcrumbs":6,"title":5},"169":{"body":204,"breadcrumbs":2,"title":1},"17":{"body":20,"breadcrumbs":4,"title":2},"170":{"body":31,"breadcrumbs":2,"title":1},"171":{"body":117,"breadcrumbs":4,"title":3},"172":{"body":132,"breadcrumbs":3,"title":2},"173":{"body":129,"breadcrumbs":3,"title":2},"174":{"body":176,"breadcrumbs":3,"title":2},"175":{"body":77,"breadcrumbs":4,"title":3},"176":{"body":116,"breadcrumbs":4,"title":3},"177":{"body":74,"breadcrumbs":4,"title":3},"178":{"body":195,"breadcrumbs":3,"title":2},"179":{"body":341,"breadcrumbs":2,"title":1},"18":{"body":209,"breadcrumbs":5,"title":3},"180":{"body":18,"breadcrumbs":4,"title":2},"181":{"body":129,"breadcrumbs":4,"title":2},"182":{"body":65,"breadcrumbs":5,"title":3},"183":{"body":289,"breadcrumbs":6,"title":4},"184":{"body":131,"breadcrumbs":4,"title":2},"185":{"body":106,"breadcrumbs":4,"title":2},"186":{"body":120,"breadcrumbs":6,"title":4},"187":{"body":128,"breadcrumbs":3,"title":1},"188":{"body":30,"breadcrumbs":4,"title":2},"189":{"body":52,"breadcrumbs":5,"title":3},"19":{"body":55,"breadcrumbs":4,"title":2},"190":{"body":55,"breadcrumbs":5,"title":3},"191":{"body":22,"breadcrumbs":5,"title":3},"192":{"body":41,"breadcrumbs":4,"title":2},"193":{"body":19,"breadcrumbs":5,"title":3},"194":{"body":30,"breadcrumbs":4,"title":2},"195":{"body":89,"breadcrumbs":3,"title":1},"196":{"body":56,"breadcrumbs":2,"title":1},"197":{"body":0,"breadcrumbs":4,"title":2},"198":{"body":254,"breadcrumbs":4,"title":2},"199":{"body":182,"breadcrumbs":4,"title":2},"2":{"body":8,"breadcrumbs":5,"title":2},"20":{"body":61,"breadcrumbs":5,"title":3},"200":{"body":274,"breadcrumbs":3,"title":1},"201":{"body":281,"breadcrumbs":3,"title":1},"202":{"body":100,"breadcrumbs":3,"title":1},"203":{"body":147,"breadcrumbs":3,"title":1},"204":{"body":176,"breadcrumbs":3,"title":1},"205":{"body":147,"breadcrumbs":3,"tit
le":1},"206":{"body":160,"breadcrumbs":4,"title":2},"207":{"body":326,"breadcrumbs":3,"title":1},"208":{"body":349,"breadcrumbs":3,"title":1},"209":{"body":159,"breadcrumbs":3,"title":1},"21":{"body":56,"breadcrumbs":6,"title":4},"210":{"body":393,"breadcrumbs":3,"title":1},"211":{"body":295,"breadcrumbs":3,"title":1},"212":{"body":169,"breadcrumbs":3,"title":1},"213":{"body":264,"breadcrumbs":3,"title":1},"214":{"body":440,"breadcrumbs":3,"title":1},"215":{"body":202,"breadcrumbs":4,"title":2},"216":{"body":119,"breadcrumbs":4,"title":2},"22":{"body":101,"breadcrumbs":5,"title":3},"23":{"body":153,"breadcrumbs":5,"title":3},"24":{"body":225,"breadcrumbs":5,"title":3},"25":{"body":170,"breadcrumbs":3,"title":1},"26":{"body":175,"breadcrumbs":6,"title":4},"27":{"body":232,"breadcrumbs":3,"title":1},"28":{"body":40,"breadcrumbs":4,"title":2},"29":{"body":145,"breadcrumbs":5,"title":3},"3":{"body":21,"breadcrumbs":4,"title":1},"30":{"body":33,"breadcrumbs":6,"title":4},"31":{"body":38,"breadcrumbs":5,"title":3},"32":{"body":50,"breadcrumbs":5,"title":3},"33":{"body":85,"breadcrumbs":4,"title":2},"34":{"body":64,"breadcrumbs":4,"title":2},"35":{"body":26,"breadcrumbs":4,"title":2},"36":{"body":24,"breadcrumbs":4,"title":2},"37":{"body":171,"breadcrumbs":3,"title":1},"38":{"body":31,"breadcrumbs":2,"title":1},"39":{"body":128,"breadcrumbs":2,"title":1},"4":{"body":90,"breadcrumbs":4,"title":1},"40":{"body":131,"breadcrumbs":4,"title":3},"41":{"body":107,"breadcrumbs":5,"title":4},"42":{"body":28,"breadcrumbs":3,"title":2},"43":{"body":78,"breadcrumbs":2,"title":1},"44":{"body":53,"breadcrumbs":2,"title":1},"45":{"body":217,"breadcrumbs":2,"title":1},"46":{"body":19,"breadcrumbs":2,"title":1},"47":{"body":96,"breadcrumbs":4,"title":3},"48":{"body":47,"breadcrumbs":3,"title":2},"49":{"body":59,"breadcrumbs":4,"title":3},"5":{"body":58,"breadcrumbs":5,"title":2},"50":{"body":78,"breadcrumbs":4,"title":3},"51":{"body":47,"breadcrumbs":2,"title":1},"52":{"body":84,"breadcrumb
s":5,"title":4},"53":{"body":66,"breadcrumbs":3,"title":2},"54":{"body":29,"breadcrumbs":3,"title":2},"55":{"body":54,"breadcrumbs":2,"title":1},"56":{"body":241,"breadcrumbs":2,"title":1},"57":{"body":21,"breadcrumbs":2,"title":1},"58":{"body":108,"breadcrumbs":3,"title":2},"59":{"body":41,"breadcrumbs":4,"title":3},"6":{"body":42,"breadcrumbs":2,"title":1},"60":{"body":57,"breadcrumbs":3,"title":2},"61":{"body":43,"breadcrumbs":3,"title":2},"62":{"body":39,"breadcrumbs":4,"title":3},"63":{"body":60,"breadcrumbs":2,"title":1},"64":{"body":91,"breadcrumbs":2,"title":1},"65":{"body":18,"breadcrumbs":2,"title":1},"66":{"body":60,"breadcrumbs":4,"title":3},"67":{"body":36,"breadcrumbs":4,"title":3},"68":{"body":48,"breadcrumbs":3,"title":2},"69":{"body":74,"breadcrumbs":5,"title":4},"7":{"body":31,"breadcrumbs":2,"title":1},"70":{"body":114,"breadcrumbs":4,"title":3},"71":{"body":51,"breadcrumbs":4,"title":3},"72":{"body":27,"breadcrumbs":3,"title":2},"73":{"body":92,"breadcrumbs":2,"title":1},"74":{"body":13,"breadcrumbs":2,"title":1},"75":{"body":155,"breadcrumbs":5,"title":4},"76":{"body":21,"breadcrumbs":3,"title":2},"77":{"body":131,"breadcrumbs":5,"title":4},"78":{"body":157,"breadcrumbs":3,"title":2},"79":{"body":83,"breadcrumbs":2,"title":1},"8":{"body":75,"breadcrumbs":2,"title":1},"80":{"body":27,"breadcrumbs":3,"title":2},"81":{"body":135,"breadcrumbs":2,"title":1},"82":{"body":49,"breadcrumbs":2,"title":1},"83":{"body":238,"breadcrumbs":2,"title":1},"84":{"body":172,"breadcrumbs":4,"title":3},"85":{"body":178,"breadcrumbs":5,"title":4},"86":{"body":65,"breadcrumbs":2,"title":1},"87":{"body":100,"breadcrumbs":2,"title":1},"88":{"body":19,"breadcrumbs":4,"title":2},"89":{"body":137,"breadcrumbs":3,"title":1},"9":{"body":62,"breadcrumbs":2,"title":1},"90":{"body":236,"breadcrumbs":3,"title":1},"91":{"body":120,"breadcrumbs":3,"title":1},"92":{"body":29,"breadcrumbs":2,"title":1},"93":{"body":178,"breadcrumbs":5,"title":4},"94":{"body":58,"breadcrumbs":3,"title
":2},"95":{"body":30,"breadcrumbs":3,"title":2},"96":{"body":41,"breadcrumbs":3,"title":2},"97":{"body":176,"breadcrumbs":3,"title":2},"98":{"body":46,"breadcrumbs":4,"title":3},"99":{"body":87,"breadcrumbs":3,"title":2}},"docs":{"0":{"body":"book cover","breadcrumbs":"Cover","id":"0","title":"Cover"},"1":{"body":"","breadcrumbs":"Buy PDF/EPUB versions » Buy PDF/EPUB versions","id":"1","title":"Buy PDF/EPUB versions"},"10":{"body":"I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors. You can reach me via: Issue Manager: https://github.com/learnbyexample/cli_text_processing_coreutils/issues E-mail: learnbyexample.net@gmail.com Twitter: https://twitter.com/learn_byexample","breadcrumbs":"Preface » Feedback and Errata","id":"10","title":"Feedback and Errata"},"100":{"body":"The -R option will display the output in random order. Unlike shuf, this option will always place identical lines next to each other due to the implementation. # the two lines with '42' will always be next to each other\n# use 'shuf' if you don't want this behavior\n$ sort -R mixed_numbers.txt\n31.24\n5678\n42\n42\n12,345\n-100","breadcrumbs":"sort » Random sort","id":"100","title":"Random sort"},"101":{"body":"The -u option will keep only the first copy of lines that are deemed equal. # (10) and [10] are deemed equal with dictionary sorting\n$ printf '(10)\\n[20]\\n[10]' | sort -du\n(10)\n[20] $ cat purchases.txt\ncoffee\ntea\nwashing powder\ncoffee\ntoothpaste\ntea\nsoap\ntea\n$ sort -u purchases.txt\ncoffee\nsoap\ntea\ntoothpaste\nwashing powder As seen earlier, the -n option will work even if there are extra characters after the number. When the -u option is also used, only the first such copy will be retained. 
Use the uniq command if you want to remove duplicates based on the whole line. $ printf '2 balls\\n13 pens\\n2 pins\\n13 pens\\n' | sort -nu\n2 balls\n13 pens # note that only the output order is reversed\n# use tac if you want the last duplicate to be preserved instead of the first\n$ printf '2 balls\\n13 pens\\n2 pins\\n13 pens\\n' | sort -r -nu\n13 pens\n2 balls # use uniq when the entire line contents should be compared\n$ printf '2 balls\\n13 pens\\n2 pins\\n13 pens\\n' | sort -n | uniq\n2 balls\n2 pins\n13 pens You can use the -f option to ignore case while determining duplicates. $ printf 'mat\\nbat\\nMAT\\ncar\\nbat\\n' | sort -u\nbat\ncar\nmat\nMAT # the first copy between 'mat' and 'MAT' is retained\n$ printf 'mat\\nbat\\nMAT\\ncar\\nbat\\n' | sort -fu\nbat\ncar\nmat","breadcrumbs":"sort » Unique sort","id":"101","title":"Unique sort"},"102":{"body":"The -k option allows you to sort based on specific columns instead of the entire input line. By default, the empty string between non-blank and blank characters is considered as the separator and thus the blanks are also part of the field contents. The effect of blanks and mitigation will be discussed later. The -k option accepts arguments in various ways. You can specify the starting and ending column numbers separated by a comma. If you specify only the starting column, the last column will be used as the ending column. Usually you just want to sort by a single column, in which case the same number is specified as both the starting and ending columns. Here's an example: $ cat shopping.txt\napple 50\ntoys 5\nPizza 2\nmango 25\nBanana 10 # sort based on the 2nd column numbers\n$ sort -k2,2n shopping.txt\nPizza 2\ntoys 5\nBanana 10\nmango 25\napple 50 info Note that in the above example, the -n option was also appended to the -k option. This makes it specific to that column and overrides global options, if any. Also, remember that the entire line will be used to break ties, unless otherwise specified. 
You can use the -t option to specify a single byte character as the field separator. Use \\0 to specify NUL as the separator. Depending on your shell you can use ANSI-C quoting to use escapes like \\t instead of a literal tab character. When the -t option is used, the field separator won't be part of the field contents. # department,name,marks\n$ cat marks.csv\nECE,Raj,53\nECE,Joel,72\nEEE,Moi,68\nCSE,Surya,81\nEEE,Raj,88\nCSE,Moi,62\nEEE,Tia,72\nECE,Om,92\nCSE,Amy,67 # name column is the primary sort key\n# entire line content will be used for breaking ties\n$ sort -t, -k2,2 marks.csv\nCSE,Amy,67\nECE,Joel,72\nCSE,Moi,62\nEEE,Moi,68\nECE,Om,92\nECE,Raj,53\nEEE,Raj,88\nCSE,Surya,81\nEEE,Tia,72 You can use the -k option multiple times to specify your own order of tie breakers. Entire line will still be used to break ties if needed. # second column is the primary key\n# reversed numeric sort on the third column is the secondary key\n# entire line will be used only if there are still tied entries\n$ sort -t, -k2,2 -k3,3nr marks.csv\nCSE,Amy,67\nECE,Joel,72\nEEE,Moi,68\nCSE,Moi,62\nECE,Om,92\nEEE,Raj,88\nECE,Raj,53\nCSE,Surya,81\nEEE,Tia,72 # sort by month first and then the day\n# -M option sorts based on abbreviated month names\n$ printf 'Aug-20\\nMay-5\\nAug-3' | sort -t- -k1,1M -k2,2n\nMay-5\nAug-3\nAug-20 Use the -s option to retain the original order of input lines when two or more lines are deemed equal. You can still use multiple keys to specify your own tie breakers, -s only prevents the last resort comparison. # -s prevents last resort comparison\n# so, lines having the same value in the 2nd column will retain input order\n$ sort -t, -s -k2,2 marks.csv\nCSE,Amy,67\nECE,Joel,72\nEEE,Moi,68\nCSE,Moi,62\nECE,Om,92\nECE,Raj,53\nEEE,Raj,88\nCSE,Surya,81\nEEE,Tia,72 The -u option, as discussed earlier, will retain only the first copy of lines that are deemed equal. 
# only the first copy of duplicates in the 2nd column will be retained\n$ sort -t, -u -k2,2 marks.csv\nCSE,Amy,67\nECE,Joel,72\nEEE,Moi,68\nECE,Om,92\nECE,Raj,53\nCSE,Surya,81\nEEE,Tia,72","breadcrumbs":"sort » Column sort","id":"102","title":"Column sort"},"103":{"body":"The -k option also accepts starting and ending character positions within the columns. These are specified after the column number, separated by a . character. If the character position is not specified for the ending column, the last character of that column is assumed. The character positions start with 1 for the first character. Recall that when the -t option is used, the field separator is not part of the field contents. # based on the second column number\n# 2.2 helps to ignore first character, otherwise -n won't have any effect here\n$ printf 'car,(20)\\njeep,[10]\\ntruck,(5)\\nbus,[3]' | sort -t, -k2.2,2n\nbus,[3]\ntruck,(5)\njeep,[10]\ncar,(20) # first character of the second column is the primary key\n# entire line acts as the last resort tie breaker\n$ printf 'car,(20)\\njeep,[10]\\ntruck,(5)\\nbus,[3]' | sort -t, -k2.1,2.1\ncar,(20)\ntruck,(5)\nbus,[3]\njeep,[10] The default separation based on blank characters works differently. The empty string between non-blank and blank characters is considered as the separator and thus the blanks are also part of the field contents. You can use the -b option to ignore such leading blanks of field contents. 
# the second column here starts with blank characters\n# adjusting the character position isn't feasible due to varying blanks\n$ printf 'car (20)\\njeep [10]\\ntruck (5)\\nbus [3]' | sort -k2.2,2n\nbus [3]\ncar (20)\njeep [10]\ntruck (5) # use -b in such cases to ignore the leading blanks\n$ printf 'car (20)\\njeep [10]\\ntruck (5)\\nbus [3]' | sort -k2.2b,2n\nbus [3]\ntruck (5)\njeep [10]\ncar (20)","breadcrumbs":"sort » Character positions within columns","id":"103","title":"Character positions within columns"},"104":{"body":"The --debug option can help you identify issues if the output isn't what you expected. Here's the previously seen -b example, now with --debug enabled. The underscores in the debug output shows which portions of the input are used as primary key, secondary key and so on. The collating order being used is also shown in the output. $ printf 'car (20)\\njeep [10]\\ntruck (5)\\nbus [3]' | sort -k2.2,2n --debug\nsort: text ordering performed using ‘en_IN’ sorting rules\nsort: leading blanks are significant in key 1; consider also specifying 'b'\nsort: note numbers use ‘.’ as a decimal point in this locale\nbus [3] ^ no match for key\n_______\ncar (20) ^ no match for key\n________\njeep [10] ^ no match for key\n_________\ntruck (5) ^ no match for key\n_________ $ printf 'car (20)\\njeep [10]\\ntruck (5)\\nbus [3]' | sort -k2.2b,2n --debug\nsort: text ordering performed using ‘en_IN’ sorting rules\nsort: note numbers use ‘.’ as a decimal point in this locale\nbus [3] _\n_______\ntruck (5) _\n_________\njeep [10] __\n_________\ncar (20) __\n________","breadcrumbs":"sort » Debugging","id":"104","title":"Debugging"},"105":{"body":"The -c option helps you spot the first unsorted entry in the given input. The uppercase -C option is similar but only affects the exit status. Note that these options will not work for multiple inputs. 
$ cat shopping.txt\napple 50\ntoys 5\nPizza 2\nmango 25\nBanana 10 $ sort -c shopping.txt\nsort: shopping.txt:3: disorder: Pizza 2\n$ echo $?\n1 $ sort -C shopping.txt\n$ echo $?\n1","breadcrumbs":"sort » Check if sorted","id":"105","title":"Check if sorted"},"106":{"body":"The -o option can be used to specify the output file to be used for saving the results. $ sort -R nums.txt -o rand_nums.txt $ cat rand_nums.txt\n1000\n3.14\n42 You can use -o for in-place editing as well, but the documentation gives this warning: However, it is often safer to output to an otherwise-unused file, as data may be lost if the system crashes or sort encounters an I/O or other serious error while a file is being sorted in place. Also, sort with --merge (-m) can open the output file before reading all input, so a command like cat F | sort -m -o F - G is not safe as sort might start writing F before cat is done reading it.","breadcrumbs":"sort » Specifying output file","id":"106","title":"Specifying output file"},"107":{"body":"The -m option is useful if you have one or more sorted input files and need a single sorted output file. Typically the use case is that you want to add newly obtained data to existing sorted data. In such cases, you can sort only the new data separately and then combine all the sorted inputs using the -m option. Here's a sample timing comparison between different combinations of sorted/unsorted inputs. 
$ shuf -n1000000 -i1-999999999999 > n1.txt\n$ shuf -n1000000 -i1-999999999999 > n2.txt\n$ sort -n n1.txt > n1_sorted.txt\n$ sort -n n2.txt > n2_sorted.txt $ time sort -n n1.txt n2.txt > op1.txt\nreal 0m1.010s\n$ time sort -mn n1_sorted.txt <(sort -n n2.txt) > op2.txt\nreal 0m0.535s\n$ time sort -mn n1_sorted.txt n2_sorted.txt > op3.txt\nreal 0m0.218s $ diff -sq op1.txt op2.txt\nFiles op1.txt and op2.txt are identical\n$ diff -sq op1.txt op3.txt\nFiles op1.txt and op3.txt are identical $ rm n{1,2}{,_sorted}.txt op{1..3}.txt info You might wonder if you can improve the performance of a single large file using the -m option. By default, sort already uses the available processors to split the input and merge. You can use the --parallel option to customize this behavior.","breadcrumbs":"sort » Merge sort","id":"107","title":"Merge sort"},"108":{"body":"Use the -z option if you want to use NUL character as the line separator. In this scenario, sort will ensure to add a final NUL character even if not present in the input. $ printf 'cherry\\0apple\\0banana' | sort -z | cat -v\napple^@banana^@cherry^@","breadcrumbs":"sort » NUL separator","id":"108","title":"NUL separator"},"109":{"body":"A few options like --compress-program and --files0-from aren't covered in this book. See the sort manual for details and examples. See also: unix.stackexchange: Scalability of sort for gigantic files stackoverflow: Sort by last field when the number of fields varies Arch wiki: locale ShellHacks: locale and language settings","breadcrumbs":"sort » Further Reading","id":"109","title":"Further Reading"},"11":{"body":"Sundeep Agarwal is a lazy being who prefers to work just enough to support his modest lifestyle. He accumulated vast wealth working as a Design Engineer at Analog Devices and retired from the corporate world at the ripe age of twenty-eight. Unfortunately, he squandered his savings within a few years and had to scramble trying to earn a living. 
Against all odds, selling programming ebooks saved his lazy self from having to look for a job again. He can now afford all the fantasy ebooks he wants to read and spends unhealthy amount of time browsing the internet. When the creative muse strikes, he can be found working on yet another programming ebook (which invariably ends up having at least one example with regular expressions). Researching materials for his ebooks and everyday social media usage drowned his bookmarks, so he maintains curated resource lists for sanity sake. He is thankful for free learning resources and open source tools. His own contributions can be found at https://github.com/learnbyexample . List of books: https://learnbyexample.github.io/books/","breadcrumbs":"Preface » Author info","id":"11","title":"Author info"},"110":{"body":"info The exercises directory has all the files used in this section. 1) Default sort doesn't work for numbers. Which option would you use to get the expected output shown below? $ printf '100\\n10\\n20\\n3000\\n2.45\\n' | sort ##### add your solution here\n2.45\n10\n20\n100\n3000 2) Which sort option will help you ignore case? LC_ALL=C is used here to avoid differences due to locale. $ printf 'Super\\nover\\nRUNE\\ntea\\n' | LC_ALL=C sort ##### add your solution here\nover\nRUNE\nSuper\ntea 3) The -n option doesn't work for all sorts of numbers. Which sort option would you use to get the expected output shown below? # wrong output\n$ printf '+120\\n-1.53\\n3.14e+4\\n42.1e-2' | sort -n\n-1.53\n+120\n3.14e+4\n42.1e-2 # expected output\n$ printf '+120\\n-1.53\\n3.14e+4\\n42.1e-2' | sort ##### add your solution here\n-1.53\n42.1e-2\n+120\n3.14e+4 4) What do the -V and -h options do? 5) Is there a difference between shuf and sort -R? 6) Sort the scores.csv file numerically in ascending order using the contents of the second field. Header line should be preserved as the first line as shown below. 
$ cat scores.csv\nName,Maths,Physics,Chemistry\nIth,100,100,100\nCy,97,98,95\nLin,78,83,80 ##### add your solution here\nName,Maths,Physics,Chemistry\nLin,78,83,80\nCy,97,98,95\nIth,100,100,100 7) Sort the contents of duplicates.csv by the fourth column numbers in descending order. Retain only the first copy of lines with the same number. $ cat duplicates.csv\nbrown,toy,bread,42\ndark red,ruby,rose,111\nblue,ruby,water,333\ndark red,sky,rose,555\nyellow,toy,flower,333\nwhite,sky,bread,111\nlight red,purse,rose,333 ##### add your solution here\ndark red,sky,rose,555\nblue,ruby,water,333\ndark red,ruby,rose,111\nbrown,toy,bread,42 8) Sort the contents of duplicates.csv by the third column item. Use the fourth column numbers as the tie-breaker. ##### add your solution here\nbrown,toy,bread,42\nwhite,sky,bread,111\nyellow,toy,flower,333\ndark red,ruby,rose,111\nlight red,purse,rose,333\ndark red,sky,rose,555\nblue,ruby,water,333 9) What does the -s option provide? 10) Sort the given input based on the numbers inside the brackets. $ printf '(-3.14)\\n[45]\\n(12.5)\\n{14093}' | ##### add your solution here\n(-3.14)\n(12.5)\n[45]\n{14093} 11) What do the -c, -C and -m options do?","breadcrumbs":"sort » Exercises","id":"110","title":"Exercises"},"111":{"body":"The uniq command identifies similar lines that are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, etc.","breadcrumbs":"uniq » uniq","id":"111","title":"uniq"},"112":{"body":"This is the default behavior of the uniq command. If adjacent lines are the same, only the first copy will be displayed in the output. # only the adjacent lines are compared to determine duplicates\n# which is why you get 'red' twice in the output for this input\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | uniq\nred\ngreen\nred\nblue You'll need sorted input to make sure all the input lines are considered to determine duplicates. 
For some cases, sort -u is enough, like the example shown below: # same as sort -u for this case\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | sort | uniq\nblue\ngreen\nred Sometimes though, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example: # can't use sort -n -u here\n$ printf '2 balls\\n13 pens\\n2 pins\\n13 pens\\n' | sort -n | uniq\n2 balls\n2 pins\n13 pens info sort+uniq won't be suitable if you need to preserve the input order as well. You can use alternatives like awk, perl and huniq for such cases. # retain only the first copy of duplicates, maintain input order\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | awk '!seen[$0]++'\nred\ngreen\nblue","breadcrumbs":"uniq » Retain single copy of duplicates","id":"112","title":"Retain single copy of duplicates"},"113":{"body":"The -d option will display only the duplicate entries. That is, only if a line is seen more than once. $ cat purchases.txt\ncoffee\ntea\nwashing powder\ncoffee\ntoothpaste\ntea\nsoap\ntea $ sort purchases.txt | uniq -d\ncoffee\ntea To display all the copies of duplicates, use the -D option. $ sort purchases.txt | uniq -D\ncoffee\ncoffee\ntea\ntea\ntea","breadcrumbs":"uniq » Duplicates only","id":"113","title":"Duplicates only"},"114":{"body":"The -u option will display only the unique entries. That is, only if a line doesn't occur more than once. $ sort purchases.txt | uniq -u\nsoap\ntoothpaste\nwashing powder # reminder that uniq works based on adjacent lines only\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | uniq -u\ngreen\nred","breadcrumbs":"uniq » Unique only","id":"114","title":"Unique only"},"115":{"body":"The --group options allows you to visually separate groups of similar lines with an empty line. This option can accept four values — separate, prepend, append and both. The default is separate, which adds a newline character between the groups. 
prepend will add a newline before the first group as well and append will add a newline after the last group. both combines the prepend and append behavior. $ sort purchases.txt | uniq --group\ncoffee\ncoffee soap tea\ntea\ntea toothpaste washing powder The --group option cannot be used with the -c, -d, -D or -u options. The --all-repeated alias for the -D option uses none as the default grouping. You can change that to separate or prepend values. $ sort purchases.txt | uniq --all-repeated=prepend coffee\ncoffee tea\ntea\ntea","breadcrumbs":"uniq » Grouping similar lines","id":"115","title":"Grouping similar lines"},"116":{"body":"If you want to know how many times a line has been repeated, use the -c option. This will be added as a prefix. $ sort purchases.txt | uniq -c 2 coffee 1 soap 3 tea 1 toothpaste 1 washing powder $ sort purchases.txt | uniq -dc 2 coffee 3 tea The output of this option is usually piped to sort for ordering the output based on the count. $ sort purchases.txt | uniq -c | sort -n 1 soap 1 toothpaste 1 washing powder 2 coffee 3 tea $ sort purchases.txt | uniq -c | sort -nr 3 tea 2 coffee 1 washing powder 1 toothpaste 1 soap","breadcrumbs":"uniq » Prefix count","id":"116","title":"Prefix count"},"117":{"body":"Use the -i option to ignore case while determining duplicates. # depending on your locale, sort and sort -f can give the same results\n$ printf 'hat\\nbat\\nHAT\\ncar\\nbat\\nmat\\nmoat' | sort -f | uniq -iD\nbat\nbat\nhat\nHAT","breadcrumbs":"uniq » Ignoring case","id":"117","title":"Ignoring case"},"118":{"body":"uniq has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as the sort -k option, but they do come in handy for some use cases. The -f option allows you to skip the first N fields. Field separation is based on one or more space/tab characters only. Note that these separators will still be part of the field contents, so this will not work with variable number of blanks. 
# skip the first field, works as expected since the no. of blanks is consistent\n$ printf '2 cars\\n5 cars\\n10 jeeps\\n5 jeeps\\n3 trucks\\n' | uniq -f1 --group\n2 cars\n5 cars 10 jeeps\n5 jeeps 3 trucks # example with variable number of blanks\n# 'cars' entries were identified as duplicates, but not 'jeeps'\n$ printf '2 cars\\n5 cars\\n1 jeeps\\n5 jeeps\\n3 trucks\\n' | uniq -f1\n2 cars\n1 jeeps\n5 jeeps\n3 trucks The -s option allows you to skip the first N characters (calculated as bytes). # skip the first character\n$ printf '* red\\n- green\\n* green\\n* blue\\n= blue' | uniq -s1\n* red\n- green\n* blue The -w option restricts the comparison to the first N characters (calculated as bytes). # compare only the first 2 characters\n$ printf '1) apple\\n1) almond\\n2) banana\\n3) cherry' | uniq -w2\n1) apple\n2) banana\n3) cherry When these options are used simultaneously, the priority is -f first, then -s and finally the -w option. Remember that blanks are part of the field content. # skip the first field\n# then skip the first two characters (including the blank character)\n# use the next two characters for comparison ('bl' and 'ch' in this example)\n$ printf '2 @blue\\n10 :black\\n5 :cherry\\n3 @chalk' | uniq -f1 -s2 -w2\n2 @blue\n5 :cherry info If a line doesn't have enough fields or characters to satisfy the -f and -s options respectively, a null string is used for comparison.","breadcrumbs":"uniq » Partial match","id":"118","title":"Partial match"},"119":{"body":"uniq can accept filename as the source of input contents, but only a maximum of one file. If you specify another file, it will be used as the output file. $ printf 'apple\\napple\\nbanana\\ncherry\\ncherry\\ncherry' > ip.txt\n$ uniq ip.txt op.txt $ cat op.txt\napple\nbanana\ncherry","breadcrumbs":"uniq » Specifying output file","id":"119","title":"Specifying output file"},"12":{"body":"This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License . 
Code snippets are available under MIT License . Resources mentioned in Acknowledgements section above are available under original licenses.","breadcrumbs":"Preface » License","id":"12","title":"License"},"120":{"body":"Use the -z option if you want to use the NUL character as the line separator. In this scenario, uniq will add a final NUL character if not already present in the input. $ printf 'cherry\\0cherry\\0cherry\\0apple\\0banana' | uniq -z | cat -v\ncherry^@apple^@banana^@ info If grouping is specified, NUL will be used as the separator instead of the newline character.","breadcrumbs":"uniq » NUL separator","id":"120","title":"NUL separator"},"121":{"body":"Here are some alternate commands you can explore if uniq isn't enough to solve your task. Dealing with duplicates chapter from my GNU awk ebook Dealing with duplicates chapter from my Perl one-liners ebook huniq — remove duplicates from entire input contents, input order is maintained, supports count option as well","breadcrumbs":"uniq » Alternatives","id":"121","title":"Alternatives"},"122":{"body":"info The exercises directory has all the files used in this section. 1) Will uniq throw an error if the input is not sorted? What do you think will be the output for the following input? $ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | uniq 2) Are there differences between sort -u file and sort file | uniq? 3) What are the differences between sort -u and uniq -u options, if any? 4) Filter the third column items from duplicates.csv. Construct three solutions to display only unique items, duplicate items and all duplicates. 
$ cat duplicates.csv\nbrown,toy,bread,42\ndark red,ruby,rose,111\nblue,ruby,water,333\ndark red,sky,rose,555\nyellow,toy,flower,333\nwhite,sky,bread,111\nlight red,purse,rose,333 # unique\n##### add your solution here\nflower\nwater # duplicates\n##### add your solution here\nbread\nrose # all duplicates\n##### add your solution here\nbread\nbread\nrose\nrose\nrose 5) What does the --group option do? What customization features are available? 6) Count the number of times input lines are repeated and display the results in the format shown below. $ s='brown\\nbrown\\nbrown\\ngreen\\nbrown\\nblue\\nblue'\n$ printf '%b' \"$s\" | ##### add your solution here 1 green 2 blue 4 brown 7) For the input file f1.txt, retain only unique entries based on the first two characters of each line. For example, abcd and ab12 should be considered as duplicates and neither of them will be part of the output. $ cat f1.txt\n3) cherry\n1) apple\n2) banana\n1) almond\n4) mango\n2) berry\n3) chocolate\n1) apple\n5) cherry ##### add your solution here\n4) mango\n5) cherry 8) For the input file f1.txt, display only the duplicate items without considering the first two characters of each line. For example, abcd and 12cd should be considered as duplicates. Assume that the third character of each line is always a space character. ##### add your solution here\n1) apple\n3) cherry 9) What does the -s option do? 10) Filter only unique lines, but ignore differences due to case. $ printf 'cat\\nbat\\nCAT\\nCar\\nBat\\nmat\\nMat' | ##### add your solution here\nCar","breadcrumbs":"uniq » Exercises","id":"122","title":"Exercises"},"123":{"body":"The comm command finds common and unique lines between two sorted files. 
These results are formatted as a table with three columns and one or more of these columns can be suppressed as required.","breadcrumbs":"comm » comm","id":"123","title":"comm"},"124":{"body":"Consider the sample input files as shown below: # side by side view of the sample files\n# note that these files are already sorted\n$ paste colors_1.txt colors_2.txt\nBlue Black\nBrown Blue\nOrange Green\nPurple Orange\nRed Pink\nTeal Red\nWhite White By default, comm gives a tabular output with three columns: first column has lines unique to the first file second column has lines unique to the second file third column has lines common to both the files The columns are separated by a tab character. Here's the output for the above sample files: $ comm colors_1.txt colors_2.txt Black Blue\nBrown Green Orange Pink\nPurple Red\nTeal White You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example: # note that the input files need not have the same number of lines\n$ comm <(seq 3) <(seq 2 5)\n1 2 3 4 5 $ comm --output-delimiter=, <(seq 3) <(seq 2 5)\n1\n,,2\n,,3\n,4\n,5 info Collating order for comm should be the same as the one used to sort the input files. info The --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option \"is not guaranteed to produce any particular output.\"","breadcrumbs":"comm » Three column output","id":"124","title":"Three column output"},"125":{"body":"You can use one or more of the following options to suppress columns: -1 to suppress the lines unique to the first file -2 to suppress the lines unique to the second file -3 to suppress the lines common to both the files Here's how the output looks when you suppress one of the columns: # suppress lines common to both the files\n$ comm -3 colors_1.txt colors_2.txt Black\nBrown Green Pink\nPurple\nTeal Combining two of these options gives three useful solutions. -12 will give you only the common lines. 
$ comm -12 colors_1.txt colors_2.txt\nBlue\nOrange\nRed\nWhite -23 will give you the lines unique to the first file. $ comm -23 colors_1.txt colors_2.txt\nBrown\nPurple\nTeal -13 will give you the lines unique to the second file. $ comm -13 colors_1.txt colors_2.txt\nBlack\nGreen\nPink You can combine all the three options as well. Useful with the --total option to get only the count of lines for each of the three columns. $ comm --total -123 colors_1.txt colors_2.txt\n3 3 4 total","breadcrumbs":"comm » Suppressing columns","id":"125","title":"Suppressing columns"},"126":{"body":"The number of duplicate lines in the common column will be the minimum of the duplicate occurrences between the two files. The rest of the duplicate lines, if any, will be considered as unique to the file having the excess lines. Here's an example: $ paste list_1.txt list_2.txt\napple cherry\nbanana cherry\ncherry mango\ncherry papaya\ncherry cherry # 'cherry' occurs only twice in the second file\n# rest of the 'cherry' lines will be unique to the first file\n$ comm list_1.txt list_2.txt\napple\nbanana cherry cherry\ncherry\ncherry mango papaya","breadcrumbs":"comm » Duplicate lines","id":"126","title":"Duplicate lines"},"127":{"body":"Use the -z option if you want to use the NUL character as the line separator. In this scenario, comm will add a final NUL character if not already present in the input. $ comm -z -12 <(printf 'a\\0b\\0c') <(printf 'a\\0c\\0x') | cat -v\na^@c^@","breadcrumbs":"comm » NUL separator","id":"127","title":"NUL separator"},"128":{"body":"Here are some alternate commands you can explore if comm isn't enough to solve your task. These alternatives do not require the input files to be sorted. 
zet — set operations on one or more input files Comparing lines between files section from my GNU grep ebook Two file processing chapter from my GNU awk ebook, has examples for both line and field based comparisons Two file processing chapter from my Perl one-liners ebook, has examples for both line and field based comparisons","breadcrumbs":"comm » Alternatives","id":"128","title":"Alternatives"},"129":{"body":"info The exercises directory has all the files used in this section. 1) Get the common lines between the s1.txt and s2.txt files. Assume that their contents are already sorted. $ paste s1.txt s2.txt\napple banana\ncoffee coffee\nfig eclair\nhoney fig\nmango honey\npasta milk\nsugar tea\ntea yeast ##### add your solution here\ncoffee\nfig\nhoney\ntea 2) Display lines present in s1.txt but not s2.txt and vice versa. # lines unique to the first file\n##### add your solution here\napple\nmango\npasta\nsugar # lines unique to the second file\n##### add your solution here\nbanana\neclair\nmilk\nyeast 3) Display lines unique to the s1.txt file and the common lines when compared to the s2.txt file. Use ==> to separate the output columns. ##### add your solution here\napple\n==>coffee\n==>fig\n==>honey\nmango\npasta\nsugar\n==>tea 4) What does the --total option do? 5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below? $ cat s3.txt\napple\napple\nguava\nhoney\ntea\ntea\ntea $ comm -23 s3.txt s1.txt","breadcrumbs":"comm » Exercises","id":"129","title":"Exercises"},"13":{"body":"2.0 See Version_changes.md to track changes across book versions.","breadcrumbs":"Preface » Book version","id":"13","title":"Book version"},"130":{"body":"The join command helps you to combine lines from two files based on a common field. 
This works best when the input is already sorted by that field.","breadcrumbs":"join » join","id":"130","title":"join"},"131":{"body":"By default, join combines two files based on the first field content (also referred to as the key ). Only the lines with common keys will be part of the output. The key field will be displayed first in the output (this distinction will come into play if the first field isn't the key). Rest of the line will have the remaining fields from the first and second files, in that order. One or more blanks (space or tab) will be considered as the input field separator and a single space will be used as the output field separator. If present, blank characters at the start of the input lines will be ignored. # sample sorted input files\n$ cat shopping_jan.txt\napple 10\nbanana 20\nsoap 3\ntshirt 3\n$ cat shopping_feb.txt\nbanana 15\nfig 100\npen 2\nsoap 1 # combine common lines based on the first field\n$ join shopping_jan.txt shopping_feb.txt\nbanana 20 15\nsoap 3 1 If a field value is present multiple times in the same input file, all possible combinations will be present in the output. As shown below, join will also add a final newline character even if it wasn't present in the input. $ join <(printf 'a f1_x\\na f1_y') <(printf 'a f2_x\\na f2_y')\na f1_x f2_x\na f1_x f2_y\na f1_y f2_x\na f1_y f2_y info Note that the collating order used for join should be the same as the one used to sort the input files. Use join -i to ignore case, similar to sort -f usage. info If the input files are not sorted, join will produce an error if there are unpairable lines. You can use the --nocheck-order option to ignore this error. However, as per the documentation, this option \"is not guaranteed to produce any particular output.\"","breadcrumbs":"join » Default join","id":"131","title":"Default join"},"132":{"body":"By default, only the lines having common keys are part of the output. 
You can use the -a option to also include the non-matching lines from the input files. Use 1 and 2 as the argument for the first and second file respectively. You'll later see how to fill missing fields with a custom string. # includes non-matching lines from the first file\n$ join -a1 shopping_jan.txt shopping_feb.txt\napple 10\nbanana 20 15\nsoap 3 1\ntshirt 3 # includes non-matching lines from both the files\n$ join -a1 -a2 shopping_jan.txt shopping_feb.txt\napple 10\nbanana 20 15\nfig 100\npen 2\nsoap 3 1\ntshirt 3 If you use -v instead of -a, the output will have only the non-matching lines. $ join -v2 shopping_jan.txt shopping_feb.txt\nfig 100\npen 2 $ join -v1 -v2 shopping_jan.txt shopping_feb.txt\napple 10\nfig 100\npen 2\ntshirt 3","breadcrumbs":"join » Non-matching lines","id":"132","title":"Non-matching lines"},"133":{"body":"You can use the -t option to specify a single byte character as the field separator. The output field separator will be the same as the value used for the -t option. Use \\0 to specify NUL as the separator. An empty string will cause the entire input line content to be considered as keys. Depending on your shell, you can use ANSI-C quoting to use escapes like \\t instead of a literal tab character. $ cat marks.csv\nECE,Raj,53\nECE,Joel,72\nEEE,Moi,68\nCSE,Surya,81\nEEE,Raj,88\nCSE,Moi,62\nEEE,Tia,72\nECE,Om,92\nCSE,Amy,67\n$ cat dept.txt\nCSE\nECE # get all lines from marks.csv based on the first field keys in dept.txt\n$ join -t, <(sort marks.csv) dept.txt\nCSE,Amy,67\nCSE,Moi,62\nCSE,Surya,81\nECE,Joel,72\nECE,Om,92\nECE,Raj,53","breadcrumbs":"join » Change field separator","id":"133","title":"Change field separator"},"134":{"body":"Use the --header option to ignore the first lines of both the input files from sorting consideration. Without this option, the join command might still work correctly if unpairable lines aren't found, but it is preferable to use --header when applicable. This option will also help when the --check-order option is active. 
$ cat report_1.csv\nName,Maths,Physics\nAmy,78,95\nMoi,88,75\nRaj,67,76\n$ cat report_2.csv\nName,Chemistry\nAmy,85\nJoel,78\nRaj,72 $ join --check-order -t, report_1.csv report_2.csv\njoin: report_1.csv:2: is not sorted: Amy,78,95\n$ join --check-order --header -t, report_1.csv report_2.csv\nName,Maths,Physics,Chemistry\nAmy,78,95,85\nRaj,67,76,72","breadcrumbs":"join » Files with headers","id":"134","title":"Files with headers"},"135":{"body":"By default, the first field of both the input files is used to combine the lines. You can use the -1 and -2 options followed by a field number to specify a different field number. You can use the -j option if the field number is the same for both the files. Recall that the key field is the first field in the output. You'll later see how to customize the output field order. $ cat names.txt\nAmy\nRaj\nTia # combine based on the second field of the first file\n# and the first field of the second file (default)\n$ join -t, -1 2 <(sort -t, -k2,2 marks.csv) names.txt\nAmy,CSE,67\nRaj,ECE,53\nRaj,EEE,88\nTia,EEE,72","breadcrumbs":"join » Change key field","id":"135","title":"Change key field"},"136":{"body":"Use the -o option to customize the fields required in the output and their order. Especially useful when the first field isn't the key. Each output field is specified as file number followed by a . character and then the field number. You can specify multiple fields separated by a , character. As a special case, you can use 0 to indicate the key field. 
# output field order is 1st, 2nd and 3rd fields from the first file\n$ join -t, -1 2 -o 1.1,1.2,1.3 <(sort -t, -k2,2 marks.csv) names.txt\nCSE,Amy,67\nECE,Raj,53\nEEE,Raj,88\nEEE,Tia,72 # 1st field from the first file, 2nd field from the second file\n# and then 2nd and 3rd fields from the first file\n$ join --header -t, -o 1.1,2.2,1.2,1.3 report_1.csv report_2.csv\nName,Chemistry,Maths,Physics\nAmy,85,78,95\nRaj,72,67,76","breadcrumbs":"join » Customize output field list","id":"136","title":"Customize output field list"},"137":{"body":"If you use auto as the argument for the -o option, the first line of both the input files will be used to determine the number of output fields. If the other lines have extra fields, they will be discarded. $ join <(printf 'a 1 2\\nb p q r') <(printf 'a 3 4\\nb x y z')\na 1 2 3 4\nb p q r x y z $ join -o auto <(printf 'a 1 2\\nb p q r') <(printf 'a 3 4\\nb x y z')\na 1 2 3 4\nb p q x y If the other lines have fewer fields, the -e option will determine the string to be used as a filler (empty string is the default). # the second line has two empty fields\n$ join -o auto <(printf 'a 1 2\\nb p') <(printf 'a 3 4\\nb x')\na 1 2 3 4\nb p x $ join -o auto -e '-' <(printf 'a 1 2\\nb p') <(printf 'a 3 4\\nb x')\na 1 2 3 4\nb p - x - As promised earlier, here are some examples of filling fields for non-matching lines: $ join -o auto -a1 -e 'NA' shopping_jan.txt shopping_feb.txt\napple 10 NA\nbanana 20 15\nsoap 3 1\ntshirt 3 NA $ join -o auto -a1 -a2 -e 'NA' shopping_jan.txt shopping_feb.txt\napple 10 NA\nbanana 20 15\nfig NA 100\npen NA 2\nsoap 3 1\ntshirt 3 NA","breadcrumbs":"join » Same number of output fields","id":"137","title":"Same number of output fields"},"138":{"body":"This section covers whole line set operations you can perform on already sorted input files. Equivalent sort and uniq solutions will also be mentioned as comments (useful for unsorted inputs). Assume that there are no duplicate lines within an input file. 
These two sorted input files will be used for the examples to follow: $ paste colors_1.txt colors_2.txt\nBlue Black\nBrown Blue\nOrange Green\nPurple Orange\nRed Pink\nTeal Red\nWhite White Here's how you can get union and symmetric difference results. Recall that -t '' will cause the entire input line content to be considered as keys. # union\n# unsorted input: sort -u colors_1.txt colors_2.txt\n$ join -t '' -a1 -a2 colors_1.txt colors_2.txt\nBlack\nBlue\nBrown\nGreen\nOrange\nPink\nPurple\nRed\nTeal\nWhite # symmetric difference\n# unsorted input: sort colors_1.txt colors_2.txt | uniq -u\n$ join -t '' -v1 -v2 colors_1.txt colors_2.txt\nBlack\nBrown\nGreen\nPink\nPurple\nTeal Here's how you can get intersection and difference results. The equivalent comm solutions for sorted input are also mentioned in the comments. # intersection, same as: comm -12 colors_1.txt colors_2.txt\n# unsorted input: sort colors_1.txt colors_2.txt | uniq -d\n$ join -t '' colors_1.txt colors_2.txt\nBlue\nOrange\nRed\nWhite # difference, same as: comm -13 colors_1.txt colors_2.txt\n# unsorted input: sort colors_1.txt colors_1.txt colors_2.txt | uniq -u\n$ join -t '' -v2 colors_1.txt colors_2.txt\nBlack\nGreen\nPink # difference, same as: comm -23 colors_1.txt colors_2.txt\n# unsorted input: sort colors_1.txt colors_2.txt colors_2.txt | uniq -u\n$ join -t '' -v1 colors_1.txt colors_2.txt\nBrown\nPurple\nTeal As mentioned before, join will display all the combinations if there are duplicate entries. Here's an example to show the differences between sort, comm and join solutions for displaying common lines: $ paste list_1.txt list_2.txt\napple cherry\nbanana cherry\ncherry mango\ncherry papaya\ncherry cherry # only one entry per common line\n$ sort list_1.txt list_2.txt | uniq -d\ncherry # minimum of 'no. of entries in file1' and 'no. of entries in file2'\n$ comm -12 list_1.txt list_2.txt\ncherry\ncherry # 'no. of entries in file1' multiplied by 'no. 
of entries in file2'\n$ join -t '' list_1.txt list_2.txt\ncherry\ncherry\ncherry\ncherry\ncherry\ncherry\ncherry\ncherry","breadcrumbs":"join » Set operations","id":"138","title":"Set operations"},"139":{"body":"Use the -z option if you want to use the NUL character as the line separator. In this scenario, join will add a final NUL character if not already present in the input. $ join -z <(printf 'a 1\\0b x') <(printf 'a 2\\0b y') | cat -v\na 1 2^@b x y^@","breadcrumbs":"join » NUL separator","id":"139","title":"NUL separator"},"14":{"body":"I've been using Linux since 2007, but it took me ten more years to really explore coreutils when I wrote tutorials for the Command Line Text Processing repository. Any beginner learning Linux command line tools would come across the cat command within the first week. Sooner or later, they'll come to know popular text processing tools like grep, head, tail, tr, sort, etc. If you were like me, you'd come across sed and awk, shudder at their complexity and prefer to use a scripting language like Perl and text editors like Vim instead (don't worry, I've already corrected that mistake). Knowing power tools like grep, sed and awk can help solve most of your text processing needs. So, why would you want to learn text processing tools from the coreutils package? The biggest motivation would be faster execution since these tools are optimized for the use cases they solve. And there's always the advantage of not having to write code (and test that solution) if there's an existing tool to solve the problem. This book will teach you more than twenty of such specialized text processing tools provided by the GNU coreutils package. Plenty of examples and exercises are provided to make it easier to understand a particular tool and its various features. Writing a book always has a few pleasant surprises for me. 
For this one, it was discovering a sort option for calendar months, regular expressions in the tac and nl commands, etc.","breadcrumbs":"Introduction » Introduction","id":"14","title":"Introduction"},"140":{"body":"Here are some alternate commands you can explore if join isn't enough to solve your task. These alternatives do not require input to be sorted. zet — set operations on one or more input files Comparing lines between files section from my GNU grep ebook Two file processing chapter from my GNU awk ebook, has examples for both line and field based comparisons Two file processing chapter from my Perl one-liners ebook, has examples for both line and field based comparisons","breadcrumbs":"join » Alternatives","id":"140","title":"Alternatives"},"141":{"body":"info The exercises directory has all the files used in this section. info Assume that the input files are already sorted for these exercises. 1) Use appropriate options to get the expected outputs shown below. # no output\n$ join <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4') # expected output 1\n##### add your solution here\nfig 5 10 # expected output 2\n##### add your solution here\napple 2\nfig 5 10\nmango 4 2) Use the join command to display only the non-matching lines based on the first field. $ cat j1.txt\napple 2\nfig 5\nlemon 10\ntomato 22\n$ cat j2.txt\nalmond 33\nfig 115\nmango 20\npista 42 # first field items present in j1.txt but not j2.txt\n##### add your solution here\napple 2\nlemon 10\ntomato 22 # first field items present in j2.txt but not j1.txt\n##### add your solution here\nalmond 33\nmango 20\npista 42 3) Filter lines from j1.txt and j2.txt that match the items from s1.txt. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea ##### add your solution here\napple 2\nfig 115\nfig 5\nmango 20 4) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. 
$ cat marks_1.csv\nName,Biology,Programming\nEr,92,77\nIth,100,100\nLin,92,100\nSil,86,98\n$ cat marks_2.csv\nName,Maths,Physics,Chemistry\nCy,97,98,95\nIth,100,100,100\nLin,78,83,80 ##### add your solution here\nName,Biology,Programming,Maths,Physics,Chemistry\nIth,100,100,100,100,100\nLin,92,100,78,83,80 5) By default, the first field is used to combine the lines. Which options are helpful if you want to change the key field to be used for joining? 6) Join the marks_1.csv and marks_2.csv files to get the expected output with specific fields as shown below. ##### add your solution here\nName,Programming,Maths,Biology\nIth,100,100,100\nLin,100,78,92 7) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. Use 50 as the filler data. ##### add your solution here\nName,Biology,Programming,Maths,Physics,Chemistry\nCy,50,50,97,98,95\nEr,92,77,50,50,50\nIth,100,100,100,100,100\nLin,92,100,78,83,80\nSil,86,98,50,50,50 8) When you use the -o auto option, what'd happen to the extra fields compared to those in the first lines of the input data? 9) From the input files j3.txt and j4.txt, filter only the lines that are unique — i.e. lines that are not common to these files. Assume that the input files do not have duplicate entries. $ cat j3.txt\nalmond\napple pie\ncold coffee\nhoney\nmango shake\npasta\nsugar\ntea\n$ cat j4.txt\napple\nbanana shake\ncoffee\nfig\nhoney\nmango shake\nmilk\ntea\nyeast ##### add your solution here\nalmond\napple\napple pie\nbanana shake\ncoffee\ncold coffee\nfig\nmilk\npasta\nsugar\nyeast 10) From the input files j3.txt and j4.txt, filter only the lines that are common to these files. ##### add your solution here\nhoney\nmango shake\ntea","breadcrumbs":"join » Exercises","id":"141","title":"Exercises"},"142":{"body":"If the numbering options provided by cat isn't enough, nl might suit you better. Apart from options to customize the number formatting and the separator, you can also filter which lines should be numbered. 
Additionally, you can divide your input into sections and number them separately.","breadcrumbs":"nl » nl","id":"142","title":"nl"},"143":{"body":"By default, nl will prefix line numbers and a tab character to every non-empty input line. The default number formatting is 6 characters wide and right justified with spaces. Similar to cat, the nl command will concatenate multiple inputs. # same as: cat -n greeting.txt fruits.txt nums.txt\n$ nl greeting.txt fruits.txt nums.txt 1 Hi there 2 Have a nice day 3 banana 4 papaya 5 mango 6 3.14 7 42 8 1000 # example for input with empty lines, same as: cat -b\n$ printf 'apple\\n\\nbanana\\n\\ncherry\\n' | nl 1 apple 2 banana 3 cherry","breadcrumbs":"nl » Default numbering","id":"143","title":"Default numbering"},"144":{"body":"You can use the -n option to customize the number formatting. The available styles are: rn right justified with space fillers (default) rz right justified with leading zeros ln left justified with space fillers # right justified with space fillers\n$ nl -n'rn' greeting.txt 1 Hi there 2 Have a nice day # right justified with leading zeros\n$ nl -n'rz' greeting.txt\n000001 Hi there\n000002 Have a nice day # left justified with space fillers\n$ nl -n'ln' greeting.txt\n1 Hi there\n2 Have a nice day","breadcrumbs":"nl » Number formatting","id":"144","title":"Number formatting"},"145":{"body":"You can use the -w option to specify the width to be used for the numbers (default is 6). $ nl greeting.txt 1 Hi there 2 Have a nice day $ nl -w2 greeting.txt 1 Hi there 2 Have a nice day","breadcrumbs":"nl » Customize width","id":"145","title":"Customize width"},"146":{"body":"By default, a tab character is used to separate the line number and the line content. You can use the -s option to specify your own custom string separator. 
$ nl -w2 -s' ' greeting.txt 1 Hi there 2 Have a nice day $ nl -w1 -s' --> ' greeting.txt\n1 --> Hi there\n2 --> Have a nice day","breadcrumbs":"nl » Customize separator","id":"146","title":"Customize separator"},"147":{"body":"The -v option allows you to specify a different starting integer. Negative integers are also allowed. $ nl -v10 greeting.txt 10 Hi there 11 Have a nice day $ nl -v-1 fruits.txt -1 banana 0 papaya 1 mango The -i option allows you to specify an integer as the step value (default is 1). $ nl -w2 -s') ' -i2 greeting.txt fruits.txt nums.txt 1) Hi there 3) Have a nice day 5) banana 7) papaya 9) mango\n11) 3.14\n13) 42\n15) 1000 $ nl -w1 -s'. ' -v8 -i-1 greeting.txt fruits.txt\n8. Hi there\n7. Have a nice day\n6. banana\n5. papaya\n4. mango","breadcrumbs":"nl » Starting number and step value","id":"147","title":"Starting number and step value"},"148":{"body":"If you organize your input with lines conforming to specific patterns, you can control their numbering separately. nl recognizes three types of sections with the following default patterns: \\:\\:\\: as header \\:\\: as body \\: as footer These special lines will be replaced with an empty line after numbering. The numbering will be reset at the start of every section. Here's an example with multiple body sections: $ cat body.txt\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango $ nl -w1 -s' ' body.txt 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango Here's an example with both header and body sections. By default, header and footer section lines are not numbered (you'll see options to enable them later). 
$ cat header_body.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\\:\\:\nHeader\ngreen $ nl -w1 -s' ' header_body.txt Header teal 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango Header green And here's an example with all the three types of sections: $ cat all_sections.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\nFooter $ nl -w1 -s' ' all_sections.txt Header teal 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango Footer The -b, -h and -f options control which lines should be numbered for the three types of sections. Use a to number all lines of a particular section (other features will be discussed later). $ nl -w1 -s' ' -ha -fa all_sections.txt 1 Header\n2 teal 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango 1 Footer If you use the -p option, the numbering will not be reset on encountering a new section. $ nl -p -w1 -s' ' all_sections.txt Header teal 1 Hi there\n2 How are you 3 banana\n4 papaya\n5 mango Footer $ nl -p -w1 -s' ' -ha -fa all_sections.txt 1 Header\n2 teal 3 Hi there\n4 How are you 5 banana\n6 papaya\n7 mango 8 Footer The -d option allows you to customize the two-character pattern used for sections. # pattern changed from \\: to %=\n$ cat body_sep.txt\n%=%=\napple\nbanana\n%=%=\nteal\ngreen $ nl -w1 -s' ' -d'%=' body_sep.txt 1 apple\n2 banana 1 teal\n2 green","breadcrumbs":"nl » Section wise numbering","id":"148","title":"Section wise numbering"},"149":{"body":"As mentioned earlier, the -b, -h and -f options control which lines should be numbered for the three types of sections. 
These options accept the following arguments: a number all lines, including empty lines t number lines except empty ones (default for body sections) n do not number lines (default for header and footer sections) pBRE use basic regular expressions (BRE) to filter lines for numbering If the input doesn't have special patterns to identify the different sections, it will be treated as if it has a single body section. Here's an example to include empty lines for numbering: $ printf 'apple\\n\\nbanana\\n\\ncherry\\n' | nl -w1 -s' ' -ba\n1 apple\n2 3 banana\n4 5 cherry The -l option controls how many consecutive empty lines should be considered as a single entry. Only the last empty line of such groupings will be numbered. # only the 2nd consecutive empty line will be considered for numbering\n$ printf 'a\\n\\n\\n\\n\\nb\\n\\nc' | nl -w1 -s' ' -ba -l2\n1 a 2 3 4 b 5 c Here's an example which uses regular expressions to identify the lines to be numbered: # number lines starting with 'c' or 't'\n$ nl -w1 -s' ' -bp'^[ct]' purchases.txt\n1 coffee\n2 tea washing powder\n3 coffee\n4 toothpaste\n5 tea soap\n6 tea info See the Regular Expressions chapter from my GNU grep ebook if you want to learn more about regexp syntax and features.","breadcrumbs":"nl » Section numbering criteria","id":"149","title":"Section numbering criteria"},"15":{"body":"On a GNU/Linux based OS, you are most likely to already have GNU coreutils installed. This book covers the version 9.1 of the coreutils package. To install a newer/particular version, see the coreutils download section for details. If you are not using a Linux distribution, you may be able to access coreutils using these options: Windows Subsystem for Linux — compatibility layer for running Linux binary executables natively on Windows brew — Package Manager for macOS (or Linux)","breadcrumbs":"Introduction » Installation","id":"15","title":"Installation"},"150":{"body":"info The exercises directory has all the files used in this section. 
1) nl and cat -n are always equivalent for numbering lines. True or False? 2) What does the -n option do? 3) Use nl to produce the two expected outputs shown below. $ cat greeting.txt\nHi there\nHave a nice day # expected output 1\n##### add your solution here\n001 Hi there\n002 Have a nice day # expected output 2\n##### add your solution here\n001) Hi there\n002) Have a nice day 4) Figure out the logic based on the given input and output data. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea ##### add your solution here\n15. apple\n13. coffee\n11. fig 9. honey 7. mango 5. pasta 3. sugar 1. tea 5) What are the three types of sections supported by nl? 6) Only number the lines that start with ---- in the format shown below. $ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello ##### add your solution here 1) ---- apple--banana mango---fig 2) ---- 3.14 -42 1000 3) ---- sky blue dark green 4) ---- hi hello 7) For the blocks.txt file, determine the logic to produce the expected output shown below. ##### add your solution here 1. apple--banana\n2. mango---fig 1. 3.14\n2. -42\n3. 1000 1. sky blue\n2. dark green 1. hi hello 8) What does the -l option do? 9) Figure out the logic based on the given input and output data. $ cat all_sections.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\nFooter ##### add your solution here 1) Header 2) teal 3) Hi there 4) How are you 5) banana 6) papaya 7) mango Footer","breadcrumbs":"nl » Exercises","id":"150","title":"Exercises"},"151":{"body":"The wc command is useful to count the number of lines, words and characters for the given inputs.","breadcrumbs":"wc » wc","id":"151","title":"wc"},"152":{"body":"By default, the wc command reports the number of lines, words and bytes (in that order). The byte count includes the newline characters, so you can use that as a measure of file size as well. 
Here's an example: $ cat greeting.txt\nHi there\nHave a nice day $ wc greeting.txt 2 6 25 greeting.txt Wondering why there are leading spaces in the output? They help in aligning results for multiple files (discussed later).","breadcrumbs":"wc » Line, word and byte counts","id":"152","title":"Line, word and byte counts"},"153":{"body":"Instead of the three default values, you can use options to get only the particular counts you are interested in. These options are: -l for line count -w for word count -c for byte count $ wc -l greeting.txt\n2 greeting.txt $ wc -w greeting.txt\n6 greeting.txt $ wc -c greeting.txt\n25 greeting.txt $ wc -wc greeting.txt 6 25 greeting.txt With stdin data, you'll get only the count value (unless you use - for stdin). Useful for assigning the output to shell variables. $ printf 'hello' | wc -c\n5\n$ printf 'hello' | wc -c -\n5 - $ lines=$(wc -l xaa <==\n1 ==> xab <==\n1001 ==> xae <==\n4001 ==> xaj <==\n9001 $ rm x* info warning As mentioned earlier, remove the output files after every illustration.","breadcrumbs":"split » Default split","id":"160","title":"Default split"},"161":{"body":"You can use the -l option to change the number of lines to be saved in each output file. # maximum of 3 lines at a time\n$ split -l3 purchases.txt $ head x*\n==> xaa <==\ncoffee\ntea\nwashing powder ==> xab <==\ncoffee\ntoothpaste\ntea ==> xac <==\nsoap\ntea","breadcrumbs":"split » Change number of lines","id":"161","title":"Change number of lines"},"162":{"body":"The -b option allows you to split the input by the number of bytes. Similar to line based splitting, you can always reconstruct the input by concatenating the output files. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on. 
# maximum of 15 bytes at a time\n$ split -b15 greeting.txt $ head x*\n==> xaa <==\nHi there\nHave a\n==> xab <== nice day # when you concatenate the output files, you'll get the original input\n$ cat x*\nHi there\nHave a nice day The -C option is similar to the -b option, but it will try to break on line boundaries if possible. The break will happen before the given byte limit. Here's an example where input lines do not exceed the given byte limit: $ split -C20 purchases.txt $ head x*\n==> xaa <==\ncoffee\ntea ==> xab <==\nwashing powder ==> xac <==\ncoffee\ntoothpaste ==> xad <==\ntea\nsoap\ntea $ wc -c x*\n11 xaa\n15 xab\n18 xac\n13 xad\n57 total If a line exceeds the given limit, it will be broken down into multiple parts: $ printf 'apple\\nbanana\\n' | split -C4 $ head x*\n==> xaa <==\nappl\n==> xab <==\ne ==> xac <==\nbana\n==> xad <==\nna $ cat x*\napple\nbanana","breadcrumbs":"split » Split by byte count","id":"162","title":"Split by byte count"},"163":{"body":"The -n option has several features. If you pass only a numeric argument N, the given input file will be divided into N chunks. The output files will be roughly the same size. # divide the file into 2 parts\n$ split -n2 purchases.txt\n$ head x*\n==> xaa <==\ncoffee\ntea\nwashing powder\nco\n==> xab <==\nffee\ntoothpaste\ntea\nsoap\ntea # the two output files are roughly the same size\n$ wc x* 3 5 28 xaa 5 5 29 xab 8 10 57 total warning Since the division is based on file size, stdin data cannot be used. Newer versions of the coreutils package support this use case by creating a temporary file before splitting. $ seq 6 | split -n2\nsplit: -: cannot determine file size By using K/N as the argument, you can view the Kth chunk of N parts on stdout. No output file will be created in this scenario. # divide the input into 2 parts\n# view only the 1st chunk on stdout\n$ split -n1/2 greeting.txt\nHi there\nHav To avoid splitting a line, use l/ as a prefix. 
Quoting from the manual : For l mode, chunks are approximately input size / N. The input is partitioned into N equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition. # divide input into 2 parts, but don't split lines\n$ split -nl/2 purchases.txt\n$ head x*\n==> xaa <==\ncoffee\ntea\nwashing powder\ncoffee ==> xab <==\ntoothpaste\ntea\nsoap\ntea Here's an example to view the Kth chunk without splitting lines: # 2nd chunk of 3 parts without splitting lines\n$ split -nl/2/3 sample.txt 7) Believe it 8) 9) banana\n10) papaya\n11) mango","breadcrumbs":"split » Divide based on file size","id":"163","title":"Divide based on file size"},"164":{"body":"The -n option will also help you create output files with interleaved lines. Since this is based on the line separator and not file size, stdin data can also be used. Use the r/ prefix to enable this feature. # two parts, lines distributed in round robin fashion\n$ seq 5 | split -nr/2 $ head x*\n==> xaa <==\n1\n3\n5 ==> xab <==\n2\n4 Here's an example to view the Kth chunk: $ split -nr/1/3 sample.txt 1) Hello World 4) How are you 7) Believe it\n10) papaya\n13) Much ado about nothing","breadcrumbs":"split » Interleaved lines","id":"164","title":"Interleaved lines"},"165":{"body":"You can use the -t option to specify a single byte character as the line separator. Use \\0 to specify NUL as the separator. Depending on your shell you can use ANSI-C quoting to use escapes like \\t instead of a literal tab character. 
$ printf 'apple\\nbanana\\n;mango\\npapaya\\n' | split -t';' -l1 $ head x*\n==> xaa <==\napple\nbanana\n;\n==> xab <==\nmango\npapaya","breadcrumbs":"split » Custom line separator","id":"165","title":"Custom line separator"},"166":{"body":"As seen earlier, x is the default prefix for output filenames. To change this prefix, pass an argument after the input source. # choose prefix as 'op_' instead of 'x'\n$ split -l1 greeting.txt op_ $ head op_*\n==> op_aa <==\nHi there ==> op_ab <==\nHave a nice day The -a option controls the length of the suffix. You'll get an error if this length isn't enough to cover all the output files. In such a case, you'll still get output files that can fit within the given length. $ seq 10 | split -l1 -a1\n$ ls x*\nxa xb xc xd xe xf xg xh xi xj\n$ rm x* $ seq 10 | split -l1 -a3\n$ ls x*\nxaaa xaab xaac xaad xaae xaaf xaag xaah xaai xaaj\n$ rm x* $ seq 100 | split -l1 -a1\nsplit: output file suffixes exhausted\n$ ls x*\nxa xc xe xg xi xk xm xo xq xs xu xw xy\nxb xd xf xh xj xl xn xp xr xt xv xx xz\n$ rm x* You can use the -d option to use numeric suffixes, starting from 00 (length can be changed using the -a option). You can use the long option --numeric-suffixes to specify a different starting number. $ seq 10 | split -l1 -d\n$ ls x*\nx00 x01 x02 x03 x04 x05 x06 x07 x08 x09\n$ rm x* $ seq 10 | split -l2 --numeric-suffixes=10\n$ ls x*\nx10 x11 x12 x13 x14 Use -x and --hex-suffixes options for hexadecimal numbering. $ seq 10 | split -l1 --hex-suffixes=8\n$ ls x*\nx08 x09 x0a x0b x0c x0d x0e x0f x10 x11 You can use the --additional-suffix option to add a constant string at the end of filenames. 
$ seq 10 | split -l2 -a1 --additional-suffix='.log'\n$ ls x*\nxa.log xb.log xc.log xd.log xe.log\n$ rm x* $ seq 10 | split -l2 -a1 -d --additional-suffix='.txt' - num_\n$ ls num_*\nnum_0.txt num_1.txt num_2.txt num_3.txt num_4.txt","breadcrumbs":"split » Customize filenames","id":"166","title":"Customize filenames"},"167":{"body":"You can sometimes end up with empty files. For example, trying to split into more parts than possible with the given criteria. In such cases, you can use the -e option to prevent empty files in the output. The split command will ensure that the filenames are sequential even if files in the middle are empty. # 'xac' is empty in this example\n$ split -nl/3 greeting.txt\n$ head x*\n==> xaa <==\nHi there ==> xab <==\nHave a nice day ==> xac <== $ rm x* # prevent empty files\n$ split -e -nl/3 greeting.txt\n$ head x*\n==> xaa <==\nHi there ==> xab <==\nHave a nice day","breadcrumbs":"split » Exclude empty files","id":"167","title":"Exclude empty files"},"168":{"body":"The --filter option will allow you to apply another command on the intermediate split results before saving the output files. Use $FILE to refer to the output filename of the intermediate parts. Here's an example of compressing the results: $ split -l1 --filter='gzip > $FILE.gz' greeting.txt $ ls x*\nxaa.gz xab.gz $ zcat xaa.gz\nHi there\n$ zcat xab.gz\nHave a nice day Here's an example of ignoring the first line of the results: $ cat body_sep.txt\n%=%=\napple\nbanana\n%=%=\nred\ngreen $ split -l3 --filter='tail -n +2 > $FILE' body_sep.txt $ head x*\n==> xaa <==\napple\nbanana ==> xab <==\nred\ngreen","breadcrumbs":"split » Process parts through another command","id":"168","title":"Process parts through another command"},"169":{"body":"info The exercises directory has all the files used in this section. info Remove the output files after every exercise. 1) Split the s1.txt file 3 lines at a time. 
##### add your solution here $ head xa?\n==> xaa <==\napple\ncoffee\nfig ==> xab <==\nhoney\nmango\npasta ==> xac <==\nsugar\ntea $ rm xa? 2) Use appropriate options to get the output shown below. $ echo 'apple,banana,cherry,dates' | ##### add your solution here $ head xa?\n==> xaa <==\napple,\n==> xab <==\nbanana,\n==> xac <==\ncherry,\n==> xad <==\ndates $ rm xa? 3) What do the -b and -C options do? 4) Display the 2nd chunk of the ip.txt file after splitting it 4 times as shown below. ##### add your solution here\ncome back before the sky turns dark There are so many delights to cherish 5) What does the r prefix do when used with the -n option? 6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below. ##### add your solution here $ head ip_*\n==> ip_0.txt <==\nit is a warm and cozy day\nlisten to what I say ==> ip_1.txt <==\ngo play in the park\ncome back before the sky turns dark ==> ip_2.txt <== There are so many delights to cherish ==> ip_3.txt <==\nApple, Banana and Cherry\nBread, Butter and Jelly ==> ip_4.txt <==\nTry them all before you perish $ rm ip_* 7) Which option would you use to prevent empty files in the output? 8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 ##### add your solution here $ head xa?\n==> xaa <==\napple 5\nbanana 10\ngreen ==> xab <==\nsky blue\ndragon 3\nunicorn 42 $ rm xa?","breadcrumbs":"split » Exercises","id":"169","title":"Exercises"},"17":{"body":"cat derives its name from con cat enation and provides other nifty options too. 
tac helps you to reverse the input line wise, usually used for further text processing.","breadcrumbs":"cat and tac » cat and tac","id":"17","title":"cat and tac"},"170":{"body":"The csplit command is useful to divide the input into smaller parts based on line numbers and regular expression patterns. Similar to split, this command also supports customizing output filenames. info Since a lot of output files will be generated in this chapter (often with same filenames), remove these files after every illustration.","breadcrumbs":"csplit » csplit","id":"170","title":"csplit"},"171":{"body":"You can split the input into two based on a particular line number. To do so, specify the line number after the input source (filename or stdin data). The first output file will have the input lines before the given line number and the second output file will have the rest of the contents. By default, the output files will be named xx00, xx01, xx02, and so on (where xx is the prefix). The numerical suffix will automatically use more digits if needed. You'll see examples with more than two output files later. # split input into two based on line number 4\n$ seq 10 | csplit - 4\n6\n15 # first output file will have the first 3 lines\n# second output file will have the rest\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7\n8\n9\n10 $ rm xx* info As seen in the example above, csplit will also display the number of bytes written for each output file. You can use the -q option to suppress this message. info warning As mentioned earlier, remove the output files after every illustration.","breadcrumbs":"csplit » Split on Nth line","id":"171","title":"Split on Nth line"},"172":{"body":"You can also split the input based on a line matching the given regular expression. The output produced will vary based on the // or %% delimiters being used to surround the regexp. When /regexp/ is used, output is similar to the line number based splitting. 
The first output file will have the input lines before the first occurrence of a line matching the given regexp and the second output file will have the rest of the contents. # match a line containing 't' followed by zero or more characters and then 'p'\n# 'toothpaste' is the only match for this input file\n$ csplit -q purchases.txt '/t.*p/' $ head xx*\n==> xx00 <==\ncoffee\ntea\nwashing powder\ncoffee ==> xx01 <==\ntoothpaste\ntea\nsoap\ntea When %regexp% is used, the lines occurring before the matching line won't be part of the output. Only the line matching the given regexp and the rest of the contents will be part of the single output file. $ csplit -q purchases.txt '%t.*p%' $ cat xx00\ntoothpaste\ntea\nsoap\ntea warning You'll get an error if the given regexp isn't found in the input. $ csplit -q purchases.txt '/xyz/'\ncsplit: ‘/xyz/’: match not found info See the Regular Expressions chapter from my GNU grep ebook if you want to learn more about regexp syntax and features.","breadcrumbs":"csplit » Split on regexp","id":"172","title":"Split on regexp"},"173":{"body":"You can also provide offset numbers that'll affect where the matching line and its surrounding lines should be placed. When the offset is greater than zero, the split will happen that many lines after the matching line. The default offset is zero. # when the offset is '1', the matching line will be part of the first file\n$ csplit -q purchases.txt '/t.*p/1'\n$ head xx*\n==> xx00 <==\ncoffee\ntea\nwashing powder\ncoffee\ntoothpaste ==> xx01 <==\ntea\nsoap\ntea # matching line and 1 line after won't be part of the output\n$ csplit -q purchases.txt '%t.*p%2'\n$ cat xx00\nsoap\ntea When the offset is less than zero, the split will happen that many lines before the matching line. 
# 2 lines before the matching line will be part of the second file\n$ csplit -q purchases.txt '/t.*p/-2'\n$ head xx*\n==> xx00 <==\ncoffee\ntea ==> xx01 <==\nwashing powder\ncoffee\ntoothpaste\ntea\nsoap\ntea warning You'll get an error if the offset goes beyond the number of lines available in the input. $ csplit -q purchases.txt '/t.*p/5'\ncsplit: ‘/t.*p/5’: line number out of range $ csplit -q purchases.txt '/t.*p/-5'\ncsplit: ‘/t.*p/-5’: line number out of range","breadcrumbs":"csplit » Regexp offset","id":"173","title":"Regexp offset"},"174":{"body":"You can perform line number and regexp based split more than once by adding the {N} argument after the pattern. The default behavior seen in the examples so far is the same as specifying {0}. Any number greater than zero will result in that many more splits. # {1} means split one time more than the default split\n# so, two splits in total and three output files\n# in this example, split happens on the 4th and 8th line numbers\n$ seq 10 | csplit -q - 4 '{1}' $ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7 ==> xx02 <==\n8\n9\n10 Here's an example with regexp: $ cat log.txt\n--> warning 1\na,b,c,d\n42\n--> warning 2\nx,y,z\n--> warning 3\n4,3,1 # split on the third (2+1) occurrence of a line containing 'warning'\n$ csplit -q log.txt '%warning%' '{2}'\n$ cat xx00\n--> warning 3\n4,3,1 As a special case, you can use {*} to repeat the split until the input is exhausted. This is especially useful with the /regexp/ form of splitting. Here's an example: # split on all lines matching 'paste' or 'powder'\n$ csplit -q purchases.txt '/paste\\|powder/' '{*}'\n$ head xx*\n==> xx00 <==\ncoffee\ntea ==> xx01 <==\nwashing powder\ncoffee ==> xx02 <==\ntoothpaste\ntea\nsoap\ntea warning You'll get an error if the repeat count goes beyond the number of matches possible with the given input. 
$ seq 10 | csplit -q - 4 '{2}'\ncsplit: ‘4’: line number out of range on repetition 2 $ csplit -q purchases.txt '/tea/' '{4}'\ncsplit: ‘/tea/’: match not found on repetition 3","breadcrumbs":"csplit » Repeat split","id":"174","title":"Repeat split"},"175":{"body":"By default, csplit will remove the created output files if there's an error or a signal that causes the command to stop. You can use the -k option to keep such files. One use case is line number based splitting with the {*} modifier. $ seq 7 | csplit -q - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ ls xx*\nls: cannot access 'xx*': No such file or directory # -k option will allow you to retain the created files\n$ seq 7 | csplit -qk - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7","breadcrumbs":"csplit » Keep files on error","id":"175","title":"Keep files on error"},"176":{"body":"The --suppress-matched option will suppress the lines matching the split condition. 
$ seq 5 | csplit -q --suppress-matched - 3\n# 3rd line won't be part of the output\n$ head xx*\n==> xx00 <==\n1\n2 ==> xx01 <==\n4\n5 $ rm xx* $ seq 10 | csplit -q --suppress-matched - 4 '{1}'\n# 4th and 8th lines won't be part of the output\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n5\n6\n7 ==> xx02 <==\n9\n10 Here's an example with regexp based split: $ csplit -q --suppress-matched purchases.txt '/soap\\|powder/' '{*}'\n# lines matching 'soap' or 'powder' won't be part of the output\n$ head xx*\n==> xx00 <==\ncoffee\ntea ==> xx01 <==\ncoffee\ntoothpaste\ntea ==> xx02 <==\ntea Here's another example: $ seq 11 16 | csplit -q --suppress-matched - '/[35]/' '{1}'\n# lines matching '3' or '5' won't be part of the output\n$ head xx*\n==> xx00 <==\n11\n12 ==> xx01 <==\n14 ==> xx02 <==\n16 $ rm xx*","breadcrumbs":"csplit » Suppress matched lines","id":"176","title":"Suppress matched lines"},"177":{"body":"There are various cases that can result in empty output files. For example, first or last line matching the given split condition. Another possibility is the --suppress-matched option combined with consecutive lines matching during multiple splits. Here's an example: $ csplit -q --suppress-matched purchases.txt '/coffee\\|tea/' '{*}' $ head xx*\n==> xx00 <== ==> xx01 <== ==> xx02 <==\nwashing powder ==> xx03 <==\ntoothpaste ==> xx04 <==\nsoap ==> xx05 <== You can use the -z option to exclude empty files from the output. The suffix numbering will be automatically adjusted in such cases. $ csplit -qz --suppress-matched purchases.txt '/coffee\\|tea/' '{*}' $ head xx*\n==> xx00 <==\nwashing powder ==> xx01 <==\ntoothpaste ==> xx02 <==\nsoap","breadcrumbs":"csplit » Exclude empty files","id":"177","title":"Exclude empty files"},"178":{"body":"As seen earlier, xx is the default prefix for output filenames. Use the -f option to change this prefix. 
$ seq 4 | csplit -q -f'num_' - 3 $ head num_*\n==> num_00 <==\n1\n2 ==> num_01 <==\n3\n4 The -n option controls the length of the numeric suffix. The suffix length will automatically increment if filenames are exhausted. $ seq 4 | csplit -q -n1 - 3\n$ ls xx*\nxx0 xx1\n$ rm xx* $ seq 4 | csplit -q -n3 - 3\n$ ls xx*\nxx000 xx001 The -b option allows you to control the suffix using the printf formatting. Quoting from the manual : When this option is specified, the suffix string must include exactly one printf(3)-style conversion specification, possibly including format specification flags, a field width, a precision specifications, or all of these kinds of modifiers. The format letter must convert a binary unsigned integer argument to readable form. The format letters d and i are aliases for u, and the u, o, x, and X conversions are allowed. Here are some examples: # hexadecimal numbering\n# minimum two digits, zero filled\n$ seq 100 | csplit -q -b'%02x' - 3 '{20}'\n$ ls xx*\nxx00 xx02 xx04 xx06 xx08 xx0a xx0c xx0e xx10 xx12 xx14\nxx01 xx03 xx05 xx07 xx09 xx0b xx0d xx0f xx11 xx13 xx15\n$ rm xx* # custom prefix and suffix around decimal numbering\n# default minimum is a single digit\n$ seq 20 | csplit -q -f'num_' -b'%d.txt' - 3 '{4}'\n$ ls num_*\nnum_0.txt num_1.txt num_2.txt num_3.txt num_4.txt num_5.txt info Note that the -b option will override the -n option. See man 3 printf for more details about the formatting options.","breadcrumbs":"csplit » Customize filenames","id":"178","title":"Customize filenames"},"179":{"body":"info The exercises directory has all the files used in this section. info Remove the output files after every exercise. 1) Split the blocks.txt file such that the first 7 lines are in the first file and the rest are in the second file as shown below. 
##### add your solution here $ head xx*\n==> xx00 <==\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000 ==> xx01 <==\n----\nsky blue\ndark green\n----\nhi hello $ rm xx* 2) Split the input file items.txt such that the text before a line containing colors is part of the first file and the rest are part of the second file as shown below. ##### add your solution here $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10 ==> xx01 <==\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 3) Split the input file items.txt such that the line containing magical and all the lines that come after are part of the single output file. ##### add your solution here $ cat xx00\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 4) Split the input file items.txt such that the line containing colors as well as the line that comes after are part of the first output file. ##### add your solution here $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen ==> xx01 <==\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 5) Split the input file items.txt on the line that comes before a line containing magical. Generate only a single output file as shown below. ##### add your solution here $ cat xx00\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 6) Split the input file blocks.txt on the 4th occurrence of a line starting with the - character. Generate only a single output file as shown below. ##### add your solution here $ cat xx00\n----\nsky blue\ndark green\n----\nhi hello $ rm xx00 7) For the input file blocks.txt, determine the logic to produce the expected output shown below. ##### add your solution here $ head xx*\n==> xx00 <==\napple--banana\nmango---fig ==> xx01 <==\n3.14\n-42\n1000 ==> xx02 <==\nsky blue\ndark green ==> xx03 <==\nhi hello $ rm xx* 8) What does the -k option do? 9) Split the books.txt file on every line as shown below. 
##### add your solution here\ncsplit: ‘1’: line number out of range on repetition 3 $ head row_*\n==> row_0 <==\nCradle:::Mage Errant::The Weirkey Chronicles ==> row_1 <==\nMother of Learning::Eight:::::Dear Spellbook:Ascendant ==> row_2 <==\nMark of the Fool:Super Powereds:::Ends of Magic $ rm row_* 10) Split the items.txt file on lines starting with a digit character. Matching lines shouldn't be part of the output and the files should be named group_0.txt, group_1.txt and so on. ##### add your solution here $ head group_*\n==> group_0.txt <==\napple 5\nbanana 10 ==> group_1.txt <==\ngreen\nsky blue ==> group_2.txt <==\ndragon 3\nunicorn 42 $ rm group_*","breadcrumbs":"csplit » Exercises","id":"179","title":"Exercises"},"18":{"body":"Yeah, cat can be used to write contents to a file by typing them from the terminal itself. If you invoke cat without providing file arguments or stdin data from a pipe, it will wait for you to type the content. After you are done typing all the text you want to save, press Enter and then the Ctrl+d key combinations. If you don't want the last line to have a newline character, press Ctrl+d twice instead of Enter and Ctrl+d. See also unix.stackexchange: difference between Ctrl+c and Ctrl+d . # press Enter and Ctrl+d after typing all the required characters\n$ cat > greeting.txt\nHi there\nHave a nice day In the above example, the output of cat is redirected to a file named greeting.txt. If you don't redirect the stdout data, each line will be echoed as you type. You can check the contents of the file you just created by using cat again. $ cat greeting.txt\nHi there\nHave a nice day Here Documents is another popular way to create such files. In this case, the termination condition is a line matching a predefined string which is specified after the << redirection operator. This is especially helpful for automation, since pressing Ctrl+d interactively isn't desirable. 
Here's an example: # > and a space at the start of lines represents the secondary prompt PS2\n# don't type them in a shell script\n# EOF is typically used as the identifier\n$ cat << 'EOF' > fruits.txt\n> banana\n> papaya\n> mango\n> EOF $ cat fruits.txt\nbanana\npapaya\nmango The termination string is enclosed in single quotes to prevent parameter expansion, command substitution, etc. You can also use \\string for this purpose. If you use <<- instead of <<, you can use leading tab characters for indentation purposes. See bash manual: Here Documents and stackoverflow: here-documents for more examples and details. info Note that creating files as shown above isn't restricted to cat, it can be applied to any command waiting for stdin. # 'tr' converts lowercase alphabets to uppercase in this example\n$ tr 'a-z' 'A-Z' << 'end' > op.txt\n> hi there\n> have a nice day\n> end $ cat op.txt\nHI THERE\nHAVE A NICE DAY","breadcrumbs":"cat and tac » Creating text files","id":"18","title":"Creating text files"},"180":{"body":"These two commands will help you convert tabs to spaces and vice versa. Both these commands support options to customize the width of tab stops and which occurrences should be converted.","breadcrumbs":"expand and unexpand » expand and unexpand","id":"180","title":"expand and unexpand"},"181":{"body":"The expand command converts tab characters to space characters. The default expansion aligns at multiples of 8 columns (calculated in terms of bytes). 
# sample stdin data\n$ printf 'apple\\tbanana\\tcherry\\na\\tb\\tc\\n' | cat -T\napple^Ibanana^Icherry\na^Ib^Ic\n# 'apple' = 5 bytes, \\t converts to 3 spaces\n# 'banana' = 6 bytes, \\t converts to 2 spaces\n# 'a' and 'b' = 1 byte, \\t converts to 7 spaces\n$ printf 'apple\\tbanana\\tcherry\\na\\tb\\tc\\n' | expand\napple banana cherry\na b c # 'αλε' = 6 bytes, \\t converts to 2 spaces\n$ printf 'αλε\\tπού\\n' | expand\nαλε πού Here's an example with strings of size 7 and 8 bytes before the tab character: $ printf 'deviate\\treached\\nbackdrop\\toverhang\\n' | expand\ndeviate reached\nbackdrop overhang The expand command also considers backspace characters to determine the number of spaces needed. # sample input with a backspace character\n$ printf 'cart\\bd\\tbard\\n' | cat -t\ncart^Hd^Ibard # 'card' = 4 bytes, \\t converts to 4 spaces\n$ printf 'cart\\bd\\tbard\\n' | expand\ncard bard\n$ printf 'cart\\bd\\tbard\\n' | expand | cat -t\ncart^Hd bard info expand will concatenate multiple files passed as input source, so cat will not be needed for such cases.","breadcrumbs":"expand and unexpand » Default expand","id":"181","title":"Default expand"},"182":{"body":"You can use the -i option to convert only the tab characters present at the start of a line. The first occurrence of a character that is not tab or space characters will stop the expansion. 
# 'a' present at the start of line is not a tab/space character\n# so no tabs are expanded for this input\n$ printf 'a\\tb\\tc\\n' | expand -i | cat -T\na^Ib^Ic # the first \\t gets expanded here, 'a' stops further expansion\n$ printf '\\ta\\tb\\tc\\n' | expand -i | cat -T a^Ib^Ic # first two \\t gets expanded here, 'a' stops further expansion\n# presence of space characters will not stop the expansion\n$ printf '\\t \\ta\\tb\\tc\\n' | expand -i | cat -T a^Ib^Ic","breadcrumbs":"expand and unexpand » Expand only the initial tabs","id":"182","title":"Expand only the initial tabs"},"183":{"body":"You can use the -t option to control the expansion width. Default is 8 as seen in the previous examples. This option provides various features. Here's an example where all the tab characters are converted equally to the given width: $ cat -T code.py\ndef compute(x, y):\n^Iif x > y:\n^I^Iprint('hello')\n^Ielse:\n^I^Iprint('bye') $ expand -t 2 code.py\ndef compute(x, y): if x > y: print('hello') else: print('bye') You can provide multiple widths separated by a comma character. In such a case, the given widths determine the stop locations for those many tab characters. These stop values refer to absolute positions from the start of the line, not the number of spaces they can expand to. Rest of the tab characters will be expanded to a single space character. 
# first tab character can expand till the 3rd column\n# second tab character can expand till the 7th column\n# rest of the tab characters will be expanded to a single space\n$ printf 'a\\tb\\tc\\td\\te\\n' | expand -t 3,7\na b c d e # here are two more examples with the same specification as above\n# second tab expands to two spaces to end at the 7th column\n$ printf 'a\\tbb\\tc\\td\\te\\n' | expand -t 3,7\na bb c d e\n# second tab expands to a single space since it goes beyond the 7th column\n$ printf 'a\\tbbbbbbbb\\tc\\td\\te\\n' | expand -t 3,7\na bbbbbbbb c d e If you prefix a / character to the last width, the remaining tab characters will use multiple of this position instead of a single space default. # first tab character can expand till the 3rd column\n# remaining tab characters can expand till 7/14/21/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,/7\na b c d e f g # first tab character can expand till the 3rd column\n# second tab character can expand till the 7th column\n# remaining tab characters can expand till 10/15/20/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,7,/5\na b c d e f g If you use + instead of / as the prefix for the last width, the multiple calculation will use the second last width as an offset. # first tab character can expand till the 3rd column\n# 3+7=10, so remaining tab characters can expand till 10/17/24/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,+7\na b c d e f g # first tab character can expand till the 3rd column\n# second tab character can expand till the 7th column\n# 7+5=12, so remaining tab characters can expand till 12/17/22/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,7,+5\na b c d e f g","breadcrumbs":"expand and unexpand » Customize the tab stop width","id":"183","title":"Customize the tab stop width"},"184":{"body":"By default, the unexpand command converts initial blank characters (space or tab) to tabs. 
The first occurrence of a non-blank character will stop the conversion. By default, every 8 columns worth of blanks is converted to a tab. # input is 8 spaces followed by 'a' and then more characters\n# the initial 8 spaces are converted to a tab character\n# 'a' stops any further conversion, since it is a non-blank character\n$ printf ' a b c\\n' | unexpand | cat -T\n^Ia b c # input is 9 spaces followed by 'a' and then more characters\n# the initial 8 spaces are converted to a tab character\n# remaining space is left as is\n$ printf ' a b c\\n' | unexpand | cat -T\n^I a b c # input has 16 initial spaces, which get converted to two tabs\n$ printf '\\t\\ta\\tb\\tc\\n' | expand | unexpand | cat -T\n^I^Ia b c # input has 4 spaces and a tab character (that expands till the 8th column)\n# output will have a single tab character at the start\n$ printf ' \\ta b\\n' | unexpand | cat -T\n^Ia b info The current locale determines which characters are considered as blanks. Also, unexpand will concatenate multiple files passed as input source, so cat will not be needed for such cases.
# 'card' = 4 bytes, so the 4 spaces get converted to a tab\n$ printf 'cart\\bd bard\\n' | unexpand -a | cat -T\ncard^Ibard\n$ printf 'cart\\bd bard\\n' | unexpand -a | cat -t\ncart^Hd^Ibard","breadcrumbs":"expand and unexpand » Unexpand all blanks","id":"185","title":"Unexpand all blanks"},"186":{"body":"The -t option has the same features as seen with the expand command. The -a option is also implied when this option is used. Here's an example of changing the tab stop width to 2: $ printf '\\ta\\n\\t\\tb\\n' | expand -t 2 a b $ printf '\\ta\\n\\t\\tb\\n' | expand -t 2 | unexpand -t 2 | cat -T\n^Ia\n^I^Ib Here are some examples with multiple tab widths: $ printf 'a\\tb\\tc\\td\\te\\n' | expand -t 3,7\na b c d e\n$ printf 'a b c d e\\n' | unexpand -t 3,7 | cat -T\na^Ib^Ic d e\n$ printf 'a\\tb\\tc\\td\\te\\n' | expand -t 3,7 | unexpand -t 3,7 | cat -T\na^Ib^Ic d e $ printf 'a\\tb\\tc\\td\\te\\tf\\n' | expand -t 3,/7\na b c d e f\n$ printf 'a b c d e f\\n' | unexpand -t 3,/7 | cat -T\na^Ib^Ic^Id^Ie^If $ printf 'a\\tb\\tc\\td\\te\\tf\\n' | expand -t 3,+7\na b c d e f\n$ printf 'a b c d e f\\n' | unexpand -t 3,+7 | cat -T\na^Ib^Ic^Id^Ie^If","breadcrumbs":"expand and unexpand » Change the tab stop width","id":"186","title":"Change the tab stop width"},"187":{"body":"info The exercises directory has all the files used in this section. 1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 ##### add your solution here\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 2) What does the expand -i option do? 3) Expand the first tab character to stop at the 10th column and the second one at the 16th column. Rest of the tabs should be converted to a single space character. 
$ printf 'app\\tfix\\tjoy\\tmap\\ttap\\n' | ##### add your solution here\napp fix joy map tap $ printf 'appleseed\\tfig\\tjoy\\n' | ##### add your solution here\nappleseed fig joy $ printf 'a\\tb\\tc\\td\\te\\n' | ##### add your solution here\na b c d e 4) Will the following code give back the original input? If not, is there an option that can help? $ printf 'a\\tb\\tc\\n' | expand | unexpand 5) How do the + and / prefix modifiers affect the -t option?","breadcrumbs":"expand and unexpand » Exercises","id":"187","title":"Exercises"},"188":{"body":"These handy commands allow you to extract filenames and directory portions of the given paths. You could also use Parameter Expansion or cut, sed, awk, etc for such purposes. The advantage is that these commands will also handle corner cases like trailing slashes and there are handy features like removing file extensions.","breadcrumbs":"basename and dirname » basename and dirname","id":"188","title":"basename and dirname"},"189":{"body":"By default, the basename command will remove the leading directory component from the given path argument. Any trailing slashes will be removed before determining the portion to be extracted. $ basename /home/learnbyexample/example_files/scores.csv\nscores.csv # quote the arguments when needed\n$ basename 'path with spaces/report.log'\nreport.log # one or more trailing slashes will not affect the output\n$ basename /home/learnbyexample/example_files/\nexample_files If there's no leading directory component or if slash alone is the input, the argument will be returned as is after removing any trailing slashes. $ basename filename.txt\nfilename.txt\n$ basename /\n/","breadcrumbs":"basename and dirname » Extract filename from paths","id":"189","title":"Extract filename from paths"},"19":{"body":"Here are some examples to showcase cat's main utility. One or more files can be passed as arguments. 
$ cat greeting.txt fruits.txt nums.txt\nHi there\nHave a nice day\nbanana\npapaya\nmango\n3.14\n42\n1000 info Visit the cli_text_processing_coreutils repo to get all the example files used in this book. To save the output of concatenation, use the shell's redirection features. $ cat greeting.txt fruits.txt nums.txt > op.txt $ cat op.txt\nHi there\nHave a nice day\nbanana\npapaya\nmango\n3.14\n42\n1000","breadcrumbs":"cat and tac » Concatenate files","id":"19","title":"Concatenate files"},"190":{"body":"You can use the -s option to remove a suffix from the filename. Usually used to remove the file extension. $ basename -s'.csv' /home/learnbyexample/example_files/scores.csv\nscores $ basename -s'_2' final_report.txt_2\nfinal_report.txt $ basename -s'.tar.gz' /backups/jan_2021.tar.gz\njan_2021 $ basename -s'.txt' purchases.txt.txt\npurchases.txt # -s will be ignored if it would have resulted in an empty output\n$ basename -s'report' /backups/report\nreport You can also pass the suffix to be removed after the path argument, but the -s option is preferred as it makes the intention clearer and works for multiple path arguments. $ basename example_files/scores.csv .csv\nscores","breadcrumbs":"basename and dirname » Remove file extension","id":"190","title":"Remove file extension"},"191":{"body":"By default, the dirname command removes the trailing path component (after removing any trailing slashes). $ dirname /home/learnbyexample/example_files/scores.csv\n/home/learnbyexample/example_files # one or more trailing slashes will not affect the output\n$ dirname /home/learnbyexample/example_files/\n/home/learnbyexample","breadcrumbs":"basename and dirname » Remove filename from path","id":"191","title":"Remove filename from path"},"192":{"body":"The dirname command accepts multiple path arguments by default. The basename command requires -a or -s (which implies -a) to work with multiple arguments. 
$ basename -a /backups/jan_2021.tar.gz /home/learnbyexample/report.log\njan_2021.tar.gz\nreport.log # -a is implied when the -s option is used\n$ basename -s'.txt' logs/purchases.txt logs/report.txt\npurchases\nreport # dirname accepts multiple path arguments by default\n$ dirname /home/learnbyexample/example_files/scores.csv ../report/backups/\n/home/learnbyexample/example_files\n../report","breadcrumbs":"basename and dirname » Multiple arguments","id":"192","title":"Multiple arguments"},"193":{"body":"You can use shell features like command substitution to combine the effects of the basename and dirname commands. # extract the second last path component\n$ basename $(dirname /home/learnbyexample/example_files/scores.csv)\nexample_files","breadcrumbs":"basename and dirname » Combining basename and dirname","id":"193","title":"Combining basename and dirname"},"194":{"body":"Use the -z option if you want to use NUL character as the output path separator. $ basename -zs'.txt' logs/purchases.txt logs/report.txt | cat -v\npurchases^@report^@ $ basename -z logs/purchases.txt | cat -v\npurchases.txt^@ $ dirname -z example_files/scores.csv ../report/backups/ | cat -v\nexample_files^@../report^@","breadcrumbs":"basename and dirname » NUL separator","id":"194","title":"NUL separator"},"195":{"body":"1) Is the following command valid? If so, what would be the output? $ basename -s.txt ~///test.txt/// 2) Given the file path in the shell variable p, how'd you obtain the outputs shown below? $ p='~/projects/square_tictactoe/python/game.py'\n##### add your solution here\n~/projects/square_tictactoe $ p='/backups/jan_2021.tar.gz'\n##### add your solution here\n/ 3) What would be the output of the basename command if the input has no leading directory component or only has the / character? 4) For the paths stored in the shell variable p, how'd you obtain the outputs shown below? 
$ p='/a/b/ip.txt /c/d/e/f/op.txt' # expected output 1\n##### add your solution here\nip\nop # expected output 2\n##### add your solution here\n/a/b\n/c/d/e/f 5) Given the file path in the shell variable p, how'd you obtain the outputs shown below? $ p='~/projects/python/square_tictactoe/game.py'\n##### add your solution here\nsquare_tictactoe $ p='/backups/aug_2024/ip.tar.gz'\n##### add your solution here\naug_2024","breadcrumbs":"basename and dirname » Exercises","id":"195","title":"Exercises"},"196":{"body":"Hope you've found this book interesting and useful. There are plenty of general purpose and specialized text processing tools. Here's a list of books I've written: CLI text processing with GNU grep and ripgrep CLI text processing with GNU sed CLI text processing with GNU awk Ruby One-Liners Guide Perl One-Liners Guide Command line text processing with Rust tools See also my curated list of resources on Linux CLI and Shell scripting .","breadcrumbs":"What next? » What next?","id":"196","title":"What next?"},"197":{"body":"","breadcrumbs":"Exercise solutions » Exercise solutions","id":"197","title":"Exercise solutions"},"198":{"body":"1) The given sample data has empty lines at the start and end of the input. Also, there are multiple empty lines between the paragraphs. How would you get the output shown below? # note that there's an empty line at the end of the output\n$ printf '\\n\\n\\ndragon\\n\\n\\n\\nunicorn\\nbee\\n\\n\\n' | cat -sb 1 dragon 2 unicorn 3 bee 2) Pass appropriate arguments to the cat command to get the output shown below. $ cat greeting.txt\nHi there\nHave a nice day $ echo '42 apples and 100 bananas' | cat - greeting.txt\n42 apples and 100 bananas\nHi there\nHave a nice day 3) What does the -v option of the cat command do? Displays nonprinting characters using the caret notation. 4) Which options of the cat command do the following stand in for? 
-e option is equivalent to -vE -t option is equivalent to -vT -A option is equivalent to -vET 5) Will the two commands shown below produce the same output? If not, why not? $ cat fruits.txt ip.txt | tac $ tac fruits.txt ip.txt No. The first command concatenates the input files before reversing the content linewise. With the second command, each file content will be reversed separately. 6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator. $ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello $ tac -bs '----' blocks.txt\n----\nhi hello\n----\nsky blue\ndark green\n----\n3.14\n-42\n1000\n----\napple--banana\nmango---fig 7) For the blocks.txt file, write solutions to display only the last such group and last two groups. # can also use: tac -bs '----' blocks.txt | awk '/----/ && ++c==2{exit} 1'\n$ tac blocks.txt | sed '/----/q' | tac\n----\nhi hello $ tac -bs '----' blocks.txt | awk '/----/ && ++c==3{exit} 1' | tac -bs '----'\n----\nsky blue\ndark green\n----\nhi hello 8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as the separator. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ tac -brs '^[0-9]' items.txt\n3) magical beasts\ndragon 3\nunicorn 42\n2) colors\ngreen\nsky blue\n1) fruits\napple 5\nbanana 10","breadcrumbs":"Exercise solutions » cat and tac","id":"198","title":"cat and tac"},"199":{"body":"1) Use appropriate commands and shell features to get the output shown below. $ printf 'carpet\\njeep\\nbus\\n'\ncarpet\njeep\nbus # use the above 'printf' command for input data\n$ c=$(printf 'carpet\\njeep\\nbus\\n' | head -c3)\n$ echo \"$c\"\ncar 2) How would you display all the input lines except the first one? 
$ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | tail -n +2\nfig\ncarpet\njeep\nbus 3) Which command would you use to get the output shown below? $ cat fruits.txt\nbanana\npapaya\nmango\n$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello $ head -n2 fruits.txt blocks.txt\n==> fruits.txt <==\nbanana\npapaya ==> blocks.txt <==\n----\napple--banana 4) Use a combination of head and tail commands to get the 11th to 14th characters from the given input. # can also use: tail -c +11 | head -c4\n$ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | head -c14 | tail -c +11\ncarp 5) Extract the starting six bytes from the input files ip.txt and fruits.txt. $ head -q -c6 ip.txt fruits.txt\nit is banana 6) Extract the last six bytes from the input files fruits.txt and ip.txt. $ tail -q -c6 fruits.txt ip.txt\nmango\nerish 7) For the input file ip.txt, display except the last 5 lines. $ head -n -5 ip.txt\nit is a warm and cozy day\nlisten to what I say\ngo play in the park\ncome back before the sky turns dark 8) Display the third line from the given stdin data. Consider the NUL character as the line separator. $ printf 'apple\\0fig\\0carpet\\0jeep\\0bus\\0' | head -z -n3 | tail -z -n1\ncarpet","breadcrumbs":"Exercise solutions » head and tail","id":"199","title":"head and tail"},"2":{"body":"You can buy the pdf/epub versions of the book using these links: https://learnbyexample.gumroad.com/l/cli_coreutils https://leanpub.com/cli_coreutils","breadcrumbs":"Buy PDF/EPUB versions » Purchase links","id":"2","title":"Purchase links"},"20":{"body":"You can represent the stdin data using - as a file argument. If the file arguments are not present, cat will read the stdin data if present or wait for interactive input as seen earlier. 
# only stdin (- is optional in this case)\n$ echo 'apple banana cherry' | cat\napple banana cherry # both stdin and file arguments\n$ echo 'apple banana cherry' | cat greeting.txt -\nHi there\nHave a nice day\napple banana cherry # here's an example without a newline character at the end of the first input\n$ printf 'Some\\nNumbers' | cat - nums.txt\nSome\nNumbers3.14\n42\n1000","breadcrumbs":"cat and tac » Accepting stdin data","id":"20","title":"Accepting stdin data"},"200":{"body":"1) What's wrong with the following command? $ echo 'apple#banana#cherry' | tr # :\ntr: missing operand\nTry 'tr --help' for more information. $ echo 'apple#banana#cherry' | tr '#' ':'\napple:banana:cherry As a good practice, always quote the arguments passed to the tr command to avoid conflict with shell metacharacters. Unless of course, you need the shell to interpret them. 2) Retain only alphabets, digits and whitespace characters. $ printf 'Apple_42 cool,blue\\tDragon:army\\n' | tr -dc '[:alnum:][:space:]'\nApple42 coolblue Dragonarmy 3) Similar to rot13, figure out a way to shift digits such that the same logic can be used both ways. $ echo '4780 89073' | tr '0-9' '5-90-4'\n9235 34528 $ echo '9235 34528' | tr '0-9' '5-90-4'\n4780 89073 4) Figure out the logic based on the given input and output data. Hint: use two ranges for the first set and only 6 characters in the second set. $ echo 'apple banana cherry damson etrog' | tr 'a-ep-z' '12345X'\n1XXl5 21n1n1 3h5XXX 41mXon 5XXog 5) Which option would you use to truncate the first set so that it matches the length of the second set? The -t option is needed for this. 6) What does the * notation do in the second set? The [c*n] notation repeats a character c by n times. You can specify n in decimal or octal formats. If n is omitted, the character c is repeated as many times as needed to equalize the length of the sets. 7) Change : to - and ; to the newline character. 
$ echo 'tea:coffee;brown:teal;dragon:unicorn' | tr ':;' '-\\n'\ntea-coffee\nbrown-teal\ndragon-unicorn 8) Convert all characters to * except digit and newline characters. $ echo 'ajsd45_sdg2Khnf4v_54as' | tr -c '0-9\\n' '*'\n****45****2****4**54** 9) Change consecutive repeated punctuation characters to a single punctuation character. $ echo '\"\"hi...\"\", good morning!!!!' | tr -s '[:punct:]'\n\"hi.\", good morning! 10) Figure out the logic based on the given input and output data. $ echo 'Aapple noon banana!!!!!' | tr -cs 'a-z\\n' ':'\n:apple:noon:banana: 11) The books.txt file has items separated by one or more : characters. Change this separator to a single newline character as shown below. $ cat books.txt\nCradle:::Mage Errant::The Weirkey Chronicles\nMother of Learning::Eight:::::Dear Spellbook:Ascendant\nMark of the Fool:Super Powereds:::Ends of Magic $ to separate the output columns. $ comm -2 --output-delimiter='==>' s1.txt s2.txt\napple\n==>coffee\n==>fig\n==>honey\nmango\npasta\nsugar\n==>tea 4) What does the --total option do? Gives you the count of lines for each of the three columns. 5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below? The number of duplicate lines in the common column will be minimum of the duplicate occurrences between the two files. Rest of the duplicate lines, if any, will be considered as unique to the file having the excess lines. $ cat s3.txt\napple\napple\nguava\nhoney\ntea\ntea\ntea $ comm -23 s3.txt s1.txt\napple\nguava\ntea\ntea","breadcrumbs":"Exercise solutions » comm","id":"209","title":"comm"},"21":{"body":"As mentioned before, cat provides many features beyond concatenation. Consider this sample stdin data: $ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n\\n\\n\\n\\n\\napple\\n'\nhello world have a nice day apple You can use the -s option to squeeze consecutive empty lines to a single empty line. 
If present, leading and trailing empty lines will also be squeezed (won't be completely removed). You can modify the below example to test it out. $ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n\\n\\n\\n\\n\\napple\\n' | cat -s\nhello world have a nice day apple","breadcrumbs":"cat and tac » Squeeze consecutive empty lines","id":"21","title":"Squeeze consecutive empty lines"},"210":{"body":"info Assume that the input files are already sorted for these exercises. 1) Use appropriate options to get the expected outputs shown below. # no output\n$ join <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4') # expected output 1\n$ join -i <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4')\nfig 5 10 # expected output 2\n$ join -i -a1 -a2 <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4')\napple 2\nfig 5 10\nmango 4 2) Use the join command to display only the non-matching lines based on the first field. $ cat j1.txt\napple 2\nfig 5\nlemon 10\ntomato 22\n$ cat j2.txt\nalmond 33\nfig 115\nmango 20\npista 42 # first field items present in j1.txt but not j2.txt\n$ join -v1 j1.txt j2.txt\napple 2\nlemon 10\ntomato 22 # first field items present in j2.txt but not j1.txt\n$ join -v2 j1.txt j2.txt\nalmond 33\nmango 20\npista 42 3) Filter lines from j1.txt and j2.txt that match the items from s1.txt. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea # note that sort -m is used since the input files are already sorted\n$ join s1.txt <(sort -m j1.txt j2.txt)\napple 2\nfig 115\nfig 5\nmango 20 4) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. 
$ cat marks_1.csv\nName,Biology,Programming\nEr,92,77\nIth,100,100\nLin,92,100\nSil,86,98\n$ cat marks_2.csv\nName,Maths,Physics,Chemistry\nCy,97,98,95\nIth,100,100,100\nLin,78,83,80 $ join -t, --header marks_1.csv marks_2.csv\nName,Biology,Programming,Maths,Physics,Chemistry\nIth,100,100,100,100,100\nLin,92,100,78,83,80 5) By default, the first field is used to combine the lines. Which options are helpful if you want to change the key field to be used for joining? You can use -1 and -2 options followed by a field number to specify a different field number. You can use the -j option if the field number is the same for both the files. 6) Join the marks_1.csv and marks_2.csv files to get the expected output with specific fields as shown below. $ join -t, --header -o 1.1,1.3,2.2,1.2 marks_1.csv marks_2.csv\nName,Programming,Maths,Biology\nIth,100,100,100\nLin,100,78,92 7) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. Use 50 as the filler data. $ join -t, --header -o auto -a1 -a2 -e '50' marks_1.csv marks_2.csv\nName,Biology,Programming,Maths,Physics,Chemistry\nCy,50,50,97,98,95\nEr,92,77,50,50,50\nIth,100,100,100,100,100\nLin,92,100,78,83,80\nSil,86,98,50,50,50 8) When you use the -o auto option, what'd happen to the extra fields compared to those in the first lines of the input data? If you use auto as the argument for the -o option, first line of both the input files will be used to determine the number of output fields. If the other lines have extra fields, they will be discarded. 9) From the input files j3.txt and j4.txt, filter only the lines that are unique — i.e. lines that are not common to these files. Assume that the input files do not have duplicate entries. 
$ cat j3.txt\nalmond\napple pie\ncold coffee\nhoney\nmango shake\npasta\nsugar\ntea\n$ cat j4.txt\napple\nbanana shake\ncoffee\nfig\nhoney\nmango shake\nmilk\ntea\nyeast $ join -t '' -v1 -v2 j3.txt j4.txt\nalmond\napple\napple pie\nbanana shake\ncoffee\ncold coffee\nfig\nmilk\npasta\nsugar\nyeast 10) From the input files j3.txt and j4.txt, filter only the lines that are common to these files. $ join -t '' j3.txt j4.txt\nhoney\nmango shake\ntea","breadcrumbs":"Exercise solutions » join","id":"210","title":"join"},"211":{"body":"1) nl and cat -n are always equivalent for numbering lines. True or False? True if there are no empty lines in the input data. cat -b and nl are always equivalent. 2) What does the -n option do? You can use the -n option to customize the number formatting. The available styles are: rn right justified with space fillers (default) rz right justified with leading zeros ln left justified with space fillers 3) Use nl to produce the two expected outputs shown below. $ cat greeting.txt\nHi there\nHave a nice day # expected output 1\n$ nl -w3 -n'rz' greeting.txt\n001 Hi there\n002 Have a nice day # expected output 2\n$ nl -w3 -n'rz' -s') ' greeting.txt\n001) Hi there\n002) Have a nice day 4) Figure out the logic based on the given input and output data. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea $ nl -w2 -s'. ' -v15 -i-2 s1.txt\n15. apple\n13. coffee\n11. fig 9. honey 7. mango 5. pasta 3. sugar 1. tea 5) What are the three types of sections supported by nl? nl recognizes three types of sections with the following default patterns: \:\:\: as header \:\: as body \: as footer These special lines will be replaced with an empty line after numbering. The numbering will be reset at the start of every section unless the -p option is used. 6) Only number the lines that start with ---- in the format shown below. 
$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello $ nl -w2 -s') ' -bp'^----' blocks.txt 1) ---- apple--banana mango---fig 2) ---- 3.14 -42 1000 3) ---- sky blue dark green 4) ---- hi hello 7) For the blocks.txt file, determine the logic to produce the expected output shown below. $ nl -w1 -s'. ' -d'--' blocks.txt 1. apple--banana\n2. mango---fig 1. 3.14\n2. -42\n3. 1000 1. sky blue\n2. dark green 1. hi hello 8) What does the -l option do? The -l option controls how many consecutive empty lines should be considered as a single entry. Only the last empty line of such groupings will be numbered. 9) Figure out the logic based on the given input and output data. $ cat all_sections.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\nFooter $ nl -p -w2 -s') ' -ha all_sections.txt 1) Header 2) teal 3) Hi there 4) How are you 5) banana 6) papaya 7) mango Footer","breadcrumbs":"Exercise solutions » nl","id":"211","title":"nl"},"212":{"body":"1) Save the number of lines in the greeting.txt input file to the lines shell variable. $ lines=$(wc -l xaa <==\napple\ncoffee\nfig ==> xab <==\nhoney\nmango\npasta ==> xac <==\nsugar\ntea $ rm xa? 2) Use appropriate options to get the output shown below. $ echo 'apple,banana,cherry,dates' | split -t, -l1 $ head xa?\n==> xaa <==\napple,\n==> xab <==\nbanana,\n==> xac <==\ncherry,\n==> xad <==\ndates $ rm xa? 3) What do the -b and -C options do? The -b option allows you to split the input by the number of bytes. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on. The -C option is similar to the -b option, but it will try to break on line boundaries if possible. The break will happen before the given byte limit. If a line exceeds the given limit, it will be broken down into multiple parts. 
4) Display the 2nd chunk of the ip.txt file after splitting it 4 times as shown below. $ split -nl/2/4 ip.txt\ncome back before the sky turns dark There are so many delights to cherish 5) What does the r prefix do when used with the -n option? This creates output files with interleaved lines. 6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below. $ split -l2 -a1 -d --additional-suffix='.txt' ip.txt ip_ $ head ip_*\n==> ip_0.txt <==\nit is a warm and cozy day\nlisten to what I say ==> ip_1.txt <==\ngo play in the park\ncome back before the sky turns dark ==> ip_2.txt <== There are so many delights to cherish ==> ip_3.txt <==\nApple, Banana and Cherry\nBread, Butter and Jelly ==> ip_4.txt <==\nTry them all before you perish $ rm ip_* 7) Which option would you use to prevent empty files in the output? The -e option prevents empty files in the output. 8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ split -l5 --filter='grep -v \"^[0-9]\" > $FILE' items.txt $ head xa?\n==> xaa <==\napple 5\nbanana 10\ngreen ==> xab <==\nsky blue\ndragon 3\nunicorn 42 $ rm xa?","breadcrumbs":"Exercise solutions » split","id":"213","title":"split"},"214":{"body":"info Remove the output files after every exercise. 1) Split the blocks.txt file such that the first 7 lines are in the first file and the rest are in the second file as shown below. $ csplit -q blocks.txt 8 $ head xx*\n==> xx00 <==\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000 ==> xx01 <==\n----\nsky blue\ndark green\n----\nhi hello $ rm xx* 2) Split the input file items.txt such that the text before a line containing colors is part of the first file and the rest are part of the second file as shown below. 
$ csplit -q items.txt '/colors/' $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10 ==> xx01 <==\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 3) Split the input file items.txt such that the line containing magical and all the lines that come after are part of the single output file. $ csplit -q items.txt '%magical%' $ cat xx00\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 4) Split the input file items.txt such that the line containing colors as well the line that comes after are part of the first output file. $ csplit -q items.txt '/colors/2' $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen ==> xx01 <==\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 5) Split the input file items.txt on the line that comes before a line containing magical. Generate only a single output file as shown below. $ csplit -q items.txt '%magical%-1' $ cat xx00\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 6) Split the input file blocks.txt on the 4th occurrence of a line starting with the - character. Generate only a single output file as shown below. $ csplit -q blocks.txt '%^-%' '{3}' $ cat xx00\n----\nsky blue\ndark green\n----\nhi hello $ rm xx00 7) For the input file blocks.txt, determine the logic to produce the expected output shown below. $ csplit -qz --suppress-matched blocks.txt '/----/' '{*}' $ head xx*\n==> xx00 <==\napple--banana\nmango---fig ==> xx01 <==\n3.14\n-42\n1000 ==> xx02 <==\nsky blue\ndark green ==> xx03 <==\nhi hello $ rm xx* 8) What does the -k option do? By default, csplit will remove the created output files if there's an error or a signal that causes the command to stop. You can use the -k option to keep such files. One use case is line number based splitting with the {*} modifier. 
$ seq 7 | csplit -q - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ ls xx*\nls: cannot access 'xx*': No such file or directory # -k option will allow you to retain the created files\n$ seq 7 | csplit -qk - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7 $ rm xx* 9) Split the books.txt file on every line as shown below. # can also use: split -l1 -d -a1 books.txt row_\n$ csplit -qkz -f'row_' -n1 books.txt 1 '{*}'\ncsplit: ‘1’: line number out of range on repetition 3 $ head row_*\n==> row_0 <==\nCradle:::Mage Errant::The Weirkey Chronicles ==> row_1 <==\nMother of Learning::Eight:::::Dear Spellbook:Ascendant ==> row_2 <==\nMark of the Fool:Super Powereds:::Ends of Magic $ rm row_* 10) Split the items.txt file on lines starting with a digit character. Matching lines shouldn't be part of the output and the files should be named group_0.txt, group_1.txt and so on. $ csplit -qz --suppress-matched -q -f'group_' -b'%d.txt' items.txt '/^[0-9]/' '{*}' $ head group_*\n==> group_0.txt <==\napple 5\nbanana 10 ==> group_1.txt <==\ngreen\nsky blue ==> group_2.txt <==\ndragon 3\nunicorn 42 $ rm group_*","breadcrumbs":"Exercise solutions » csplit","id":"214","title":"csplit"},"215":{"body":"1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ warning 1\na,b,c,d\n42\n--> warning 2\nx,y,z\n--> warning 3\n4,3,1 $ tac log.txt | grep -m1 'warning'\n--> warning 3 $ tac log.txt | sed '/warning/q' | tac\n--> warning 3\n4,3,1 In the above example, log.txt has multiple lines containing warning. The task is to fetch lines based on the last match, which isn't usually supported by CLI tools. Matching the first occurrence is easy with tools like grep and sed. 
Hence, tac is helpful to reverse the condition from the last match to the first match. After processing with tools like sed, the result is then reversed again to get back the original order of input lines. Another benefit is that the first tac command will stop reading the input contents after the match is found. info Use the rev command if you want each input line to be reversed character wise.","breadcrumbs":"cat and tac » tac","id":"25","title":"tac"},"26":{"body":"By default, the newline character is used to split the input content into lines . You can use the -s option to specify a different string to be used as the separator. # use NUL as the line separator\n# -s $'\\0' can also be used instead of -s '' if ANSI-C quoting is supported\n$ printf 'car\\0jeep\\0bus\\0' | tac -s '' | cat -v\nbus^@jeep^@car^@ # as seen before, the last entry should also have the separator\n# otherwise it won't be present in the output\n$ printf 'apple banana cherry' | tac -s ' ' | cat -e\ncherrybanana apple $\n$ printf 'apple banana cherry ' | tac -s ' ' | cat -e\ncherry banana apple $ When the custom separator occurs before the content of interest, use the -b option to print those separators before the content in the output as well. $ cat body_sep.txt\n%=%=\napple\nbanana\n%=%=\nteal\ngreen $ tac -b -s '%=%=' body_sep.txt\n%=%=\nteal\ngreen\n%=%=\napple\nbanana The separator will be treated as a regular expression if you use the -r option as well. 
$ cat shopping.txt\napple 50\ntoys 5\nPizza 2\nmango 25\nBanana 10 # separator character is 'a' or 'm' at the start of a line\n$ tac -b -rs '^[am]' shopping.txt\nmango 25\nBanana 10\napple 50\ntoys 5\nPizza 2 # alternate solution for: tac log.txt | sed '/warning/q' | tac\n# separator is zero or more characters from the start of a line till 'warning'\n$ tac -b -rs '^.*warning' log.txt | awk '/warning/ && ++c==2{exit} 1'\n--> warning 3\n4,3,1 info See Regular Expressions chapter from my GNU grep ebook if you want to learn about regexp syntax and features.","breadcrumbs":"cat and tac » Customize line separator for tac","id":"26","title":"Customize line separator for tac"},"27":{"body":"info All the exercises are also collated together in one place at Exercises.md . For solutions, see Exercise_solutions.md . info The exercises directory has all the files used in this section. 1) The given sample data has empty lines at the start and end of the input. Also, there are multiple empty lines between the paragraphs. How would you get the output shown below? # note that there's an empty line at the end of the output\n$ printf '\\n\\n\\ndragon\\n\\n\\n\\nunicorn\\nbee\\n\\n\\n' | ##### add your solution here 1 dragon 2 unicorn 3 bee 2) Pass appropriate arguments to the cat command to get the output shown below. $ cat greeting.txt\nHi there\nHave a nice day $ echo '42 apples and 100 bananas' | cat ##### add your solution here\n42 apples and 100 bananas\nHi there\nHave a nice day 3) What does the -v option of the cat command do? 4) Which options of the cat command do the following stand in for? -e option is equivalent to -t option is equivalent to -A option is equivalent to 5) Will the two commands shown below produce the same output? If not, why not? $ cat fruits.txt ip.txt | tac $ tac fruits.txt ip.txt 6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator. 
$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello ##### add your solution here\n----\nhi hello\n----\nsky blue\ndark green\n----\n3.14\n-42\n1000\n----\napple--banana\nmango---fig 7) For the blocks.txt file, write solutions to display only the last such group and last two groups. ##### add your solution here\n----\nhi hello ##### add your solution here\n----\nsky blue\ndark green\n----\nhi hello 8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as the separator. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 ##### add your solution here\n3) magical beasts\ndragon 3\nunicorn 42\n2) colors\ngreen\nsky blue\n1) fruits\napple 5\nbanana 10","breadcrumbs":"cat and tac » Exercises","id":"27","title":"Exercises"},"28":{"body":"cat is useful to view entire contents of files. Pagers like less can be used if you are working with large files (man pages for example). Sometimes though, you just want a peek at the starting or ending lines of input files. Or, you know the line numbers for the information you are looking for. In such cases, you can use head or tail or a combination of both these commands to extract the content you want.","breadcrumbs":"head and tail » head and tail","id":"28","title":"head and tail"},"29":{"body":"Consider this sample file, with line numbers prefixed for convenience. $ cat sample.txt 1) Hello World 2) 3) Hi there 4) How are you 5) 6) Just do-it 7) Believe it 8) 9) banana\n10) papaya\n11) mango\n12) 13) Much ado about nothing\n14) He he he\n15) Adios amigo By default, head and tail will display the first and last 10 lines respectively. 
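A quick runnable check of these defaults, using seq to generate the input instead of a file:

```shell
# head prints the first 10 lines, tail the last 10
seq 25 | head    # 1 to 10
seq 25 | tail    # 16 to 25
```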
$ head sample.txt 1) Hello World 2) 3) Hi there 4) How are you 5) 6) Just do-it 7) Believe it 8) 9) banana\n10) papaya $ tail sample.txt 6) Just do-it 7) Believe it 8) 9) banana\n10) papaya\n11) mango\n12) 13) Much ado about nothing\n14) He he he\n15) Adios amigo If there are less than 10 lines in the input, only those lines will be displayed. # seq command will be discussed in detail later, generates 1 to 3 here\n# same as: seq 3 | tail\n$ seq 3 | head\n1\n2\n3 You can use the -nN option to customize the number of lines (N) needed. # first three lines\n# space between -n and N is optional\n$ head -n3 sample.txt 1) Hello World 2) 3) Hi there # last two lines\n$ tail -n2 sample.txt\n14) He he he\n15) Adios amigo","breadcrumbs":"head and tail » Leading and trailing lines","id":"29","title":"Leading and trailing lines"},"3":{"body":"You can also get the book as part of these bundles: All books bundle https://leanpub.com/b/learnbyexample-all-books https://learnbyexample.gumroad.com/l/all-books Linux CLI Text Processing https://leanpub.com/b/linux-cli-text-processing https://learnbyexample.gumroad.com/l/linux-cli-text-processing","breadcrumbs":"Buy PDF/EPUB versions » Bundles","id":"3","title":"Bundles"},"30":{"body":"By using head -n -N, you can get all the input lines except the ones you'll get when you use the tail -nN command. # except the last 11 lines\n# space between -n and -N is optional\n$ head -n -11 sample.txt 1) Hello World 2) 3) Hi there 4) How are you","breadcrumbs":"head and tail » Excluding the last N lines","id":"30","title":"Excluding the last N lines"},"31":{"body":"By using tail -n +N, you can get all the input lines except the ones you'll get when you use the head -n(N-1) command. 
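For example, a minimal sketch with seq input:

```shell
# everything from the 3rd line onwards, i.e. skip the first two lines
seq 5 | tail -n +3    # 3 4 5
```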
# all lines starting from the 11th line\n# space between -n and +N is optional\n$ tail -n +11 sample.txt\n11) mango\n12) 13) Much ado about nothing\n14) He he he\n15) Adios amigo","breadcrumbs":"head and tail » Starting from the Nth line","id":"31","title":"Starting from the Nth line"},"32":{"body":"If you pass multiple input files to the head and tail commands, each file will be processed separately. By default, the output is nicely formatted with filename headers and empty line separators. $ seq 2 | head -n1 greeting.txt -\n==> greeting.txt <==\nHi there ==> standard input <==\n1 You can use the -q option to avoid filename headers and empty line separators. $ tail -q -n2 sample.txt nums.txt\n14) He he he\n15) Adios amigo\n42\n1000","breadcrumbs":"head and tail » Multiple input files","id":"32","title":"Multiple input files"},"33":{"body":"The -c option works similar to the -n option, but with bytes instead of lines. In the below examples, the shell prompt at the end of the output aren't shown for illustration purposes. # first three characters\n$ printf 'apple pie' | head -c3\napp # last three characters\n$ printf 'apple pie' | tail -c3\npie # excluding the last four characters\n$ printf 'car\\njeep\\nbus\\n' | head -c -4\ncar\njeep # all characters starting from the fifth character\n$ printf 'car\\njeep\\nbus\\n' | tail -c +5\njeep\nbus Since -c works byte wise, it may not be suitable for multibyte characters: # all input characters in this example occupy two bytes each\n$ printf 'αλεπού' | head -c2\nα # g̈ requires three bytes\n$ printf 'cag̈e' | tail -c4\ng̈e","breadcrumbs":"head and tail » Byte selection","id":"33","title":"Byte selection"},"34":{"body":"You can select a range of lines by combining both the head and tail commands. 
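As a quick sketch, here's how a middle slice (lines 4 to 6) can be extracted from seq output:

```shell
# start from the 4th line, then keep the first three of those
seq 10 | tail -n +4 | head -n3    # 4 5 6
```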
# 9th to 11th lines\n# same as: head -n11 sample.txt | tail -n +9\n$ tail -n +9 sample.txt | head -n3 9) banana\n10) papaya\n11) mango # 6th to 7th lines\n# same as: tail -n +6 sample.txt | head -n2\n$ head -n7 sample.txt | tail -n +6 6) Just do-it 7) Believe it info See unix.stackexchange: line X to line Y on a huge file for performance comparison with other commands like sed, awk, etc.","breadcrumbs":"head and tail » Range of lines","id":"34","title":"Range of lines"},"35":{"body":"The -z option sets the NUL character as the line separator instead of the newline character. $ printf 'car\\0jeep\\0bus\\0' | head -z -n2 | cat -v\ncar^@jeep^@ $ printf 'car\\0jeep\\0bus\\0' | tail -z -n2 | cat -v\njeep^@bus^@","breadcrumbs":"head and tail » NUL separator","id":"35","title":"NUL separator"},"36":{"body":"wikipedia: File monitoring with tail -f and -F options toolong — terminal application to view, tail, merge, and search log files unix.stackexchange: How does the tail -f option work? How to deal with output buffering?","breadcrumbs":"head and tail » Further Reading","id":"36","title":"Further Reading"},"37":{"body":"info The exercises directory has all the files used in this section. 1) Use appropriate commands and shell features to get the output shown below. $ printf 'carpet\\njeep\\nbus\\n'\ncarpet\njeep\nbus # use the above 'printf' command for input data\n$ c=##### add your solution here\n$ echo \"$c\"\ncar 2) How would you display all the input lines except the first one? $ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | ##### add your solution here\nfig\ncarpet\njeep\nbus 3) Which command would you use to get the output shown below? 
$ cat fruits.txt\nbanana\npapaya\nmango\n$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello ##### add your solution here\n==> fruits.txt <==\nbanana\npapaya ==> blocks.txt <==\n----\napple--banana 4) Use a combination of head and tail commands to get the 11th to 14th characters from the given input. $ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | ##### add your solution here\ncarp 5) Extract the starting six bytes from the input files ip.txt and fruits.txt. ##### add your solution here\nit is banana 6) Extract the last six bytes from the input files fruits.txt and ip.txt. ##### add your solution here\nmango\nerish 7) For the input file ip.txt, display except the last 5 lines. ##### add your solution here\nit is a warm and cozy day\nlisten to what I say\ngo play in the park\ncome back before the sky turns dark 8) Display the third line from the given stdin data. Consider the NUL character as the line separator. $ printf 'apple\\0fig\\0carpet\\0jeep\\0bus\\0' | ##### add your solution here\ncarpet","breadcrumbs":"head and tail » Exercises","id":"37","title":"Exercises"},"38":{"body":"tr helps you to map one set of characters to another set of characters. Features like range, repeats, character sets, squeeze, complement, etc makes it a must know text processing tool. To be precise, tr can handle only bytes. Multibyte character processing isn't supported yet.","breadcrumbs":"tr » tr","id":"38","title":"tr"},"39":{"body":"Here are some examples that map one set of characters to another. As a good practice, always enclose the sets in single quotes to avoid issues due to shell metacharacters. 
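A small sketch of why the quoting matters (';' is a shell metacharacter; the sample data is made up):

```shell
# quoted sets reach tr intact; an unquoted ';' would be consumed by the shell
echo 'a;b;c' | tr ';' ':'    # a:b:c
```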
# 'l' maps to '1', 'e' to '3', 't' to '7' and 's' to '5'\n$ echo 'leet speak' | tr 'lets' '1375'\n1337 5p3ak # example with shell metacharacters\n$ echo 'apple;banana;cherry' | tr ; :\ntr: missing operand\nTry 'tr --help' for more information.\n$ echo 'apple;banana;cherry' | tr ';' ':'\napple:banana:cherry You can use - between two characters to construct a range (ascending order only). # uppercase to lowercase\n$ echo 'HELLO WORLD' | tr 'A-Z' 'a-z'\nhello world # swap case\n$ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z'\nhELLO wORLD # rot13\n$ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M'\nUryyb Jbeyq\n$ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M'\nHello World tr works only on stdin data, so use shell input redirection for file inputs. $ tr 'a-z' 'A-Z' e\n$ paste -d' : - ' <(seq 3) e e <(seq 4 6) e e <(seq 7 9)\n1 : 4 - 7\n2 : 5 - 8\n3 : 6 - 9","breadcrumbs":"paste » Multicharacter delimiters","id":"78","title":"Multicharacter delimiters"},"79":{"body":"The -s option allows you to combine all the input lines from a file into a single line using the given delimiter. paste will ensure to add a final newline character even if it wasn't present in the input. # this will give you a trailing comma\n# and there won't be a newline character at the end\n$ n1.txt\n$ shuf -n1000000 -i1-999999999999 > n2.txt\n$ sort -n n1.txt > n1_sorted.txt\n$ sort -n n2.txt > n2_sorted.txt $ time sort -n n1.txt n2.txt > op1.txt\nreal 0m1.010s\n$ time sort -mn n1_sorted.txt <(sort -n n2.txt) > op2.txt\nreal 0m0.535s\n$ time sort -mn n1_sorted.txt n2_sorted.txt > op3.txt\nreal 0m0.218s $ diff -sq op1.txt op2.txt\nFiles op1.txt and op2.txt are identical\n$ diff -sq op1.txt op3.txt\nFiles op1.txt and op3.txt are identical $ rm n{1,2}{,_sorted}.txt op{1..3}.txt info You might wonder if you can improve the performance of a single large file using the -m option. By default, sort already uses the available processors to split the input and merge. 
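A minimal runnable sketch of merging pre-sorted inputs with -m (the /tmp file names here are illustrative temporaries, not from the book):

```shell
# -m merges already-sorted inputs without a full re-sort
printf '1\n4\n9\n' > /tmp/ms_a.txt
printf '2\n3\n8\n' > /tmp/ms_b.txt
sort -mn /tmp/ms_a.txt /tmp/ms_b.txt
rm /tmp/ms_a.txt /tmp/ms_b.txt
```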
You can use the --parallel option to customize this behavior.","breadcrumbs":"sort » Merge sort","id":"107","title":"Merge sort"},"108":{"body":"Use the -z option if you want to use NUL character as the line separator. In this scenario, sort will ensure to add a final NUL character even if not present in the input. $ printf 'cherry\\0apple\\0banana' | sort -z | cat -v\napple^@banana^@cherry^@","breadcrumbs":"sort » NUL separator","id":"108","title":"NUL separator"},"109":{"body":"A few options like --compress-program and --files0-from aren't covered in this book. See the sort manual for details and examples. See also: unix.stackexchange: Scalability of sort for gigantic files stackoverflow: Sort by last field when the number of fields varies Arch wiki: locale ShellHacks: locale and language settings","breadcrumbs":"sort » Further Reading","id":"109","title":"Further Reading"},"11":{"body":"Sundeep Agarwal is a lazy being who prefers to work just enough to support his modest lifestyle. He accumulated vast wealth working as a Design Engineer at Analog Devices and retired from the corporate world at the ripe age of twenty-eight. Unfortunately, he squandered his savings within a few years and had to scramble trying to earn a living. Against all odds, selling programming ebooks saved his lazy self from having to look for a job again. He can now afford all the fantasy ebooks he wants to read and spends unhealthy amount of time browsing the internet. When the creative muse strikes, he can be found working on yet another programming ebook (which invariably ends up having at least one example with regular expressions). Researching materials for his ebooks and everyday social media usage drowned his bookmarks, so he maintains curated resource lists for sanity sake. He is thankful for free learning resources and open source tools. His own contributions can be found at https://github.com/learnbyexample . 
List of books: https://learnbyexample.github.io/books/","breadcrumbs":"Preface » Author info","id":"11","title":"Author info"},"110":{"body":"info The exercises directory has all the files used in this section. 1) Default sort doesn't work for numbers. Which option would you use to get the expected output shown below? $ printf '100\\n10\\n20\\n3000\\n2.45\\n' | sort ##### add your solution here\n2.45\n10\n20\n100\n3000 2) Which sort option will help you ignore case? LC_ALL=C is used here to avoid differences due to locale. $ printf 'Super\\nover\\nRUNE\\ntea\\n' | LC_ALL=C sort ##### add your solution here\nover\nRUNE\nSuper\ntea 3) The -n option doesn't work for all sorts of numbers. Which sort option would you use to get the expected output shown below? # wrong output\n$ printf '+120\\n-1.53\\n3.14e+4\\n42.1e-2' | sort -n\n-1.53\n+120\n3.14e+4\n42.1e-2 # expected output\n$ printf '+120\\n-1.53\\n3.14e+4\\n42.1e-2' | sort ##### add your solution here\n-1.53\n42.1e-2\n+120\n3.14e+4 4) What do the -V and -h options do? 5) Is there a difference between shuf and sort -R? 6) Sort the scores.csv file numerically in ascending order using the contents of the second field. Header line should be preserved as the first line as shown below. $ cat scores.csv\nName,Maths,Physics,Chemistry\nIth,100,100,100\nCy,97,98,95\nLin,78,83,80 ##### add your solution here\nName,Maths,Physics,Chemistry\nLin,78,83,80\nCy,97,98,95\nIth,100,100,100 7) Sort the contents of duplicates.csv by the fourth column numbers in descending order. Retain only the first copy of lines with the same number. $ cat duplicates.csv\nbrown,toy,bread,42\ndark red,ruby,rose,111\nblue,ruby,water,333\ndark red,sky,rose,555\nyellow,toy,flower,333\nwhite,sky,bread,111\nlight red,purse,rose,333 ##### add your solution here\ndark red,sky,rose,555\nblue,ruby,water,333\ndark red,ruby,rose,111\nbrown,toy,bread,42 8) Sort the contents of duplicates.csv by the third column item. 
Use the fourth column numbers as the tie-breaker. ##### add your solution here\nbrown,toy,bread,42\nwhite,sky,bread,111\nyellow,toy,flower,333\ndark red,ruby,rose,111\nlight red,purse,rose,333\ndark red,sky,rose,555\nblue,ruby,water,333 9) What does the -s option provide? 10) Sort the given input based on the numbers inside the brackets. $ printf '(-3.14)\\n[45]\\n(12.5)\\n{14093}' | ##### add your solution here\n(-3.14)\n(12.5)\n[45]\n{14093} 11) What do the -c, -C and -m options do?","breadcrumbs":"sort » Exercises","id":"110","title":"Exercises"},"111":{"body":"The uniq command identifies similar lines that are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, etc.","breadcrumbs":"uniq » uniq","id":"111","title":"uniq"},"112":{"body":"This is the default behavior of the uniq command. If adjacent lines are the same, only the first copy will be displayed in the output. # only the adjacent lines are compared to determine duplicates\n# which is why you get 'red' twice in the output for this input\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | uniq\nred\ngreen\nred\nblue You'll need sorted input to make sure all the input lines are considered to determine duplicates. For some cases, sort -u is enough, like the example shown below: # same as sort -u for this case\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | sort | uniq\nblue\ngreen\nred Sometimes though, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example: # can't use sort -n -u here\n$ printf '2 balls\\n13 pens\\n2 pins\\n13 pens\\n' | sort -n | uniq\n2 balls\n2 pins\n13 pens info sort+uniq won't be suitable if you need to preserve the input order as well. You can use alternatives like awk, perl and huniq for such cases. 
# retain only the first copy of duplicates, maintain input order\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | awk '!seen[$0]++'\nred\ngreen\nblue","breadcrumbs":"uniq » Retain single copy of duplicates","id":"112","title":"Retain single copy of duplicates"},"113":{"body":"The -d option will display only the duplicate entries. That is, only if a line is seen more than once. $ cat purchases.txt\ncoffee\ntea\nwashing powder\ncoffee\ntoothpaste\ntea\nsoap\ntea $ sort purchases.txt | uniq -d\ncoffee\ntea To display all the copies of duplicates, use the -D option. $ sort purchases.txt | uniq -D\ncoffee\ncoffee\ntea\ntea\ntea","breadcrumbs":"uniq » Duplicates only","id":"113","title":"Duplicates only"},"114":{"body":"The -u option will display only the unique entries. That is, only if a line doesn't occur more than once. $ sort purchases.txt | uniq -u\nsoap\ntoothpaste\nwashing powder # reminder that uniq works based on adjacent lines only\n$ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | uniq -u\ngreen\nred","breadcrumbs":"uniq » Unique only","id":"114","title":"Unique only"},"115":{"body":"The --group option allows you to visually separate groups of similar lines with an empty line. This option can accept four values — separate, prepend, append and both. The default is separate, which adds a newline character between the groups. prepend will add a newline before the first group as well and append will add a newline after the last group. both combines the prepend and append behavior. $ sort purchases.txt | uniq --group\ncoffee\ncoffee soap tea\ntea\ntea toothpaste washing powder The --group option cannot be used with the -c, -d, -D or -u options. The --all-repeated alias for the -D option uses none as the default grouping. You can change that to separate or prepend values.
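As a quick sketch of the separate value with inline data (instead of purchases.txt):

```shell
# show all copies of duplicates, with groups split by an empty line
printf 'a\na\nb\nc\nc\n' | uniq --all-repeated=separate
```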
$ sort purchases.txt | uniq --all-repeated=prepend coffee\ncoffee tea\ntea\ntea","breadcrumbs":"uniq » Grouping similar lines","id":"115","title":"Grouping similar lines"},"116":{"body":"If you want to know how many times a line has been repeated, use the -c option. This will be added as a prefix. $ sort purchases.txt | uniq -c 2 coffee 1 soap 3 tea 1 toothpaste 1 washing powder $ sort purchases.txt | uniq -dc 2 coffee 3 tea The output of this option is usually piped to sort for ordering the output based on the count. $ sort purchases.txt | uniq -c | sort -n 1 soap 1 toothpaste 1 washing powder 2 coffee 3 tea $ sort purchases.txt | uniq -c | sort -nr 3 tea 2 coffee 1 washing powder 1 toothpaste 1 soap","breadcrumbs":"uniq » Prefix count","id":"116","title":"Prefix count"},"117":{"body":"Use the -i option to ignore case while determining duplicates. # depending on your locale, sort and sort -f can give the same results\n$ printf 'hat\\nbat\\nHAT\\ncar\\nbat\\nmat\\nmoat' | sort -f | uniq -iD\nbat\nbat\nhat\nHAT","breadcrumbs":"uniq » Ignoring case","id":"117","title":"Ignoring case"},"118":{"body":"uniq has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as the sort -k option, but they do come in handy for some use cases. The -f option allows you to skip the first N fields. Field separation is based on one or more space/tab characters only. Note that these separators will still be part of the field contents, so this will not work with variable number of blanks. # skip the first field, works as expected since the no. 
of blanks is consistent\n$ printf '2 cars\\n5 cars\\n10 jeeps\\n5 jeeps\\n3 trucks\\n' | uniq -f1 --group\n2 cars\n5 cars 10 jeeps\n5 jeeps 3 trucks # example with variable number of blanks\n# 'cars' entries were identified as duplicates, but not 'jeeps'\n$ printf '2 cars\\n5 cars\\n1 jeeps\\n5 jeeps\\n3 trucks\\n' | uniq -f1\n2 cars\n1 jeeps\n5 jeeps\n3 trucks The -s option allows you to skip the first N characters (calculated as bytes). # skip the first character\n$ printf '* red\\n- green\\n* green\\n* blue\\n= blue' | uniq -s1\n* red\n- green\n* blue The -w option restricts the comparison to the first N characters (calculated as bytes). # compare only the first 2 characters\n$ printf '1) apple\\n1) almond\\n2) banana\\n3) cherry' | uniq -w2\n1) apple\n2) banana\n3) cherry When these options are used simultaneously, the priority is -f first, then -s and finally the -w option. Remember that blanks are part of the field content. # skip the first field\n# then skip the first two characters (including the blank character)\n# use the next two characters for comparison ('bl' and 'ch' in this example)\n$ printf '2 @blue\\n10 :black\\n5 :cherry\\n3 @chalk' | uniq -f1 -s2 -w2\n2 @blue\n5 :cherry info If a line doesn't have enough fields or characters to satisfy the -f and -s options respectively, a null string is used for comparison.","breadcrumbs":"uniq » Partial match","id":"118","title":"Partial match"},"119":{"body":"uniq can accept filename as the source of input contents, but only a maximum of one file. If you specify another file, it will be used as the output file. $ printf 'apple\\napple\\nbanana\\ncherry\\ncherry\\ncherry' > ip.txt\n$ uniq ip.txt op.txt $ cat op.txt\napple\nbanana\ncherry","breadcrumbs":"uniq » Specifying output file","id":"119","title":"Specifying output file"},"12":{"body":"This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License . Code snippets are available under MIT License . 
Resources mentioned in Acknowledgements section above are available under original licenses.","breadcrumbs":"Preface » License","id":"12","title":"License"},"120":{"body":"Use the -z option if you want to use NUL character as the line separator. In this scenario, uniq will ensure to add a final NUL character even if not present in the input. $ printf 'cherry\\0cherry\\0cherry\\0apple\\0banana' | uniq -z | cat -v\ncherry^@apple^@banana^@ info If grouping is specified, NUL will be used as the separator instead of the newline character.","breadcrumbs":"uniq » NUL separator","id":"120","title":"NUL separator"},"121":{"body":"Here are some alternate commands you can explore if uniq isn't enough to solve your task. Dealing with duplicates chapter from my GNU awk ebook Dealing with duplicates chapter from my Perl one-liners ebook huniq — remove duplicates from entire input contents, input order is maintained, supports count option as well","breadcrumbs":"uniq » Alternatives","id":"121","title":"Alternatives"},"122":{"body":"info The exercises directory has all the files used in this section. 1) Will uniq throw an error if the input is not sorted? What do you think will be the output for the following input? $ printf 'red\\nred\\nred\\ngreen\\nred\\nblue\\nblue' | uniq 2) Are there differences between sort -u file and sort file | uniq? 3) What are the differences between sort -u and uniq -u options, if any? 4) Filter the third column items from duplicates.csv. Construct three solutions to display only unique items, duplicate items and all duplicates. $ cat duplicates.csv\nbrown,toy,bread,42\ndark red,ruby,rose,111\nblue,ruby,water,333\ndark red,sky,rose,555\nyellow,toy,flower,333\nwhite,sky,bread,111\nlight red,purse,rose,333 # unique\n##### add your solution here\nflower\nwater # duplicates\n##### add your solution here\nbread\nrose # all duplicates\n##### add your solution here\nbread\nbread\nrose\nrose\nrose 5) What does the --group option do? 
What customization features are available? 6) Count the number of times input lines are repeated and display the results in the format shown below. $ s='brown\\nbrown\\nbrown\\ngreen\\nbrown\\nblue\\nblue'\n$ printf '%b' \"$s\" | ##### add your solution here 1 green 2 blue 4 brown 7) For the input file f1.txt, retain only unique entries based on the first two characters of each line. For example, abcd and ab12 should be considered as duplicates and neither of them will be part of the output. $ cat f1.txt\n3) cherry\n1) apple\n2) banana\n1) almond\n4) mango\n2) berry\n3) chocolate\n1) apple\n5) cherry ##### add your solution here\n4) mango\n5) cherry 8) For the input file f1.txt, display only the duplicate items without considering the first two characters of each line. For example, abcd and 12cd should be considered as duplicates. Assume that the third character of each line is always a space character. ##### add your solution here\n1) apple\n3) cherry 9) What does the -s option do? 10) Filter only unique lines, but ignore differences due to case. $ printf 'cat\\nbat\\nCAT\\nCar\\nBat\\nmat\\nMat' | ##### add your solution here\nCar","breadcrumbs":"uniq » Exercises","id":"122","title":"Exercises"},"123":{"body":"The comm command finds common and unique lines between two sorted files. 
These results are formatted as a table with three columns and one or more of these columns can be suppressed as required.","breadcrumbs":"comm » comm","id":"123","title":"comm"},"124":{"body":"Consider the sample input files as shown below: # side by side view of the sample files\n# note that these files are already sorted\n$ paste colors_1.txt colors_2.txt\nBlue Black\nBrown Blue\nOrange Green\nPurple Orange\nRed Pink\nTeal Red\nWhite White By default, comm gives a tabular output with three columns: first column has lines unique to the first file second column has lines unique to the second file third column has lines common to both the files The columns are separated by a tab character. Here's the output for the above sample files: $ comm colors_1.txt colors_2.txt Black Blue\nBrown Green Orange Pink\nPurple Red\nTeal White You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example: # note that the input files need not have the same number of lines\n$ comm <(seq 3) <(seq 2 5)\n1 2 3 4 5 $ comm --output-delimiter=, <(seq 3) <(seq 2 5)\n1\n,,2\n,,3\n,4\n,5 info Collating order for comm should be the same as the one used to sort the input files. info --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option \"is not guaranteed to produce any particular output.\"","breadcrumbs":"comm » Three column output","id":"124","title":"Three column output"},"125":{"body":"You can use one or more of the following options to suppress columns: -1 to suppress the lines unique to the first file -2 to suppress the lines unique to the second file -3 to suppress the lines common to both the files Here's how the output looks when you suppress one of the columns: # suppress lines common to both the files\n$ comm -3 colors_1.txt colors_2.txt Black\nBrown Green Pink\nPurple\nTeal Combining two of these options gives three useful solutions. -12 will give you only the common lines.
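A runnable sketch with inline sorted inputs (process substitution, as used elsewhere in this chapter):

```shell
# only the lines common to both sorted inputs
comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')    # b and c
```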
$ comm -12 colors_1.txt colors_2.txt\nBlue\nOrange\nRed\nWhite -23 will give you the lines unique to the first file. $ comm -23 colors_1.txt colors_2.txt\nBrown\nPurple\nTeal -13 will give you the lines unique to the second file. $ comm -13 colors_1.txt colors_2.txt\nBlack\nGreen\nPink You can combine all the three options as well. Useful with the --total option to get only the count of lines for each of the three columns. $ comm --total -123 colors_1.txt colors_2.txt\n3 3 4 total","breadcrumbs":"comm » Suppressing columns","id":"125","title":"Suppressing columns"},"126":{"body":"The number of duplicate lines in the common column will be minimum of the duplicate occurrences between the two files. Rest of the duplicate lines, if any, will be considered as unique to the file having the excess lines. Here's an example: $ paste list_1.txt list_2.txt\napple cherry\nbanana cherry\ncherry mango\ncherry papaya\ncherry cherry # 'cherry' occurs only twice in the second file\n# rest of the 'cherry' lines will be unique to the first file\n$ comm list_1.txt list_2.txt\napple\nbanana cherry cherry\ncherry\ncherry mango papaya","breadcrumbs":"comm » Duplicate lines","id":"126","title":"Duplicate lines"},"127":{"body":"Use the -z option if you want to use NUL character as the line separator. In this scenario, comm will ensure to add a final NUL character even if not present in the input. $ comm -z -12 <(printf 'a\\0b\\0c') <(printf 'a\\0c\\0x') | cat -v\na^@c^@","breadcrumbs":"comm » NUL separator","id":"127","title":"NUL separator"},"128":{"body":"Here are some alternate commands you can explore if comm isn't enough to solve your task. These alternatives do not require the input files to be sorted. 
zet — set operations on one or more input files Comparing lines between files section from my GNU grep ebook Two file processing chapter from my GNU awk ebook, has examples for both line and field based comparisons Two file processing chapter from my Perl one-liners ebook, has examples for both line and field based comparisons","breadcrumbs":"comm » Alternatives","id":"128","title":"Alternatives"},"129":{"body":"info The exercises directory has all the files used in this section. 1) Get the common lines between the s1.txt and s2.txt files. Assume that their contents are already sorted. $ paste s1.txt s2.txt\napple banana\ncoffee coffee\nfig eclair\nhoney fig\nmango honey\npasta milk\nsugar tea\ntea yeast ##### add your solution here\ncoffee\nfig\nhoney\ntea 2) Display lines present in s1.txt but not s2.txt and vice versa. # lines unique to the first file\n##### add your solution here\napple\nmango\npasta\nsugar # lines unique to the second file\n##### add your solution here\nbanana\neclair\nmilk\nyeast 3) Display lines unique to the s1.txt file and the common lines when compared to the s2.txt file. Use ==> to separate the output columns. ##### add your solution here\napple\n==>coffee\n==>fig\n==>honey\nmango\npasta\nsugar\n==>tea 4) What does the --total option do? 5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below? $ cat s3.txt\napple\napple\nguava\nhoney\ntea\ntea\ntea $ comm -23 s3.txt s1.txt","breadcrumbs":"comm » Exercises","id":"129","title":"Exercises"},"13":{"body":"2.0 See Version_changes.md to track changes across book versions.","breadcrumbs":"Preface » Book version","id":"13","title":"Book version"},"130":{"body":"The join command helps you to combine lines from two files based on a common field. 
This works best when the input is already sorted by that field.","breadcrumbs":"join » join","id":"130","title":"join"},"131":{"body":"By default, join combines two files based on the first field content (also referred as key ). Only the lines with common keys will be part of the output. The key field will be displayed first in the output (this distinction will come into play if the first field isn't the key). Rest of the line will have the remaining fields from the first and second files, in that order. One or more blanks (space or tab) will be considered as the input field separator and a single space will be used as the output field separator. If present, blank characters at the start of the input lines will be ignored. # sample sorted input files\n$ cat shopping_jan.txt\napple 10\nbanana 20\nsoap 3\ntshirt 3\n$ cat shopping_feb.txt\nbanana 15\nfig 100\npen 2\nsoap 1 # combine common lines based on the first field\n$ join shopping_jan.txt shopping_feb.txt\nbanana 20 15\nsoap 3 1 If a field value is present multiple times in the same input file, all possible combinations will be present in the output. As shown below, join will also ensure to add a final newline character even if it wasn't present in the input. $ join <(printf 'a f1_x\\na f1_y') <(printf 'a f2_x\\na f2_y')\na f1_x f2_x\na f1_x f2_y\na f1_y f2_x\na f1_y f2_y info Note that the collating order used for join should be same as the one used to sort the input files. Use join -i to ignore case, similar to sort -f usage. info If the input files are not sorted, join will produce an error if there are unpairable lines. You can use the --nocheck-order option to ignore this error. However, as per the documentation, this option \"is not guaranteed to produce any particular output.\"","breadcrumbs":"join » Default join","id":"131","title":"Default join"},"132":{"body":"By default, only the lines having common keys are part of the output. 
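The note above mentions join -i for ignoring case without showing it in action. Here is a minimal sketch; the file names and fruit data are made up for illustration, not taken from the book's sample files:

```shell
# hypothetical one-line inputs with keys differing only in case
printf 'APPLE 10\n' > jf1.txt
printf 'apple red\n' > jf2.txt

# -i compares the key fields case-insensitively,
# so 'APPLE' in the first file pairs with 'apple' in the second
join -i jf1.txt jf2.txt
```

Without -i, the keys compare unequal and nothing would be printed for this input.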
You can use the -a option to also include the non-matching lines from the input files. Use 1 and 2 as the argument for the first and second file respectively. You'll later see how to fill missing fields with a custom string. # includes non-matching lines from the first file\n$ join -a1 shopping_jan.txt shopping_feb.txt\napple 10\nbanana 20 15\nsoap 3 1\ntshirt 3 # includes non-matching lines from both the files\n$ join -a1 -a2 shopping_jan.txt shopping_feb.txt\napple 10\nbanana 20 15\nfig 100\npen 2\nsoap 3 1\ntshirt 3 If you use -v instead of -a, the output will have only the non-matching lines. $ join -v2 shopping_jan.txt shopping_feb.txt\nfig 100\npen 2 $ join -v1 -v2 shopping_jan.txt shopping_feb.txt\napple 10\nfig 100\npen 2\ntshirt 3","breadcrumbs":"join » Non-matching lines","id":"132","title":"Non-matching lines"},"133":{"body":"You can use the -t option to specify a single byte character as the field separator. The output field separator will be same as the value used for the -t option. Use \\0 to specify NUL as the separator. Empty string will cause entire input line content to be considered as keys. Depending on your shell you can use ANSI-C quoting to use escapes like \\t instead of a literal tab character. $ cat marks.csv\nECE,Raj,53\nECE,Joel,72\nEEE,Moi,68\nCSE,Surya,81\nEEE,Raj,88\nCSE,Moi,62\nEEE,Tia,72\nECE,Om,92\nCSE,Amy,67\n$ cat dept.txt\nCSE\nECE # get all lines from marks.csv based on the first field keys in dept.txt\n$ join -t, <(sort marks.csv) dept.txt\nCSE,Amy,67\nCSE,Moi,62\nCSE,Surya,81\nECE,Joel,72\nECE,Om,92\nECE,Raj,53","breadcrumbs":"join » Change field separator","id":"133","title":"Change field separator"},"134":{"body":"Use the --header option to ignore first lines of both the input files from sorting consideration. Without this option, the join command might still work correctly if unpairable lines aren't found, but it is preferable to use --header when applicable. This option will also help when --check-order option is active. 
$ cat report_1.csv\nName,Maths,Physics\nAmy,78,95\nMoi,88,75\nRaj,67,76\n$ cat report_2.csv\nName,Chemistry\nAmy,85\nJoel,78\nRaj,72 $ join --check-order -t, report_1.csv report_2.csv\njoin: report_1.csv:2: is not sorted: Amy,78,95\n$ join --check-order --header -t, report_1.csv report_2.csv\nName,Maths,Physics,Chemistry\nAmy,78,95,85\nRaj,67,76,72","breadcrumbs":"join » Files with headers","id":"134","title":"Files with headers"},"135":{"body":"By default, the first field of both the input files are used to combine the lines. You can use -1 and -2 options followed by a field number to specify a different field number. You can use the -j option if the field number is the same for both the files. Recall that the key field is the first field in the output. You'll later see how to customize the output field order. $ cat names.txt\nAmy\nRaj\nTia # combine based on the second field of the first file\n# and the first field of the second file (default)\n$ join -t, -1 2 <(sort -t, -k2,2 marks.csv) names.txt\nAmy,CSE,67\nRaj,ECE,53\nRaj,EEE,88\nTia,EEE,72","breadcrumbs":"join » Change key field","id":"135","title":"Change key field"},"136":{"body":"Use the -o option to customize the fields required in the output and their order. Especially useful when the first field isn't the key. Each output field is specified as file number followed by a . character and then the field number. You can specify multiple fields separated by a , character. As a special case, you can use 0 to indicate the key field. 
# output field order is 1st, 2nd and 3rd fields from the first file\n$ join -t, -1 2 -o 1.1,1.2,1.3 <(sort -t, -k2,2 marks.csv) names.txt\nCSE,Amy,67\nECE,Raj,53\nEEE,Raj,88\nEEE,Tia,72 # 1st field from the first file, 2nd field from the second file\n# and then 2nd and 3rd fields from the first file\n$ join --header -t, -o 1.1,2.2,1.2,1.3 report_1.csv report_2.csv\nName,Chemistry,Maths,Physics\nAmy,85,78,95\nRaj,72,67,76","breadcrumbs":"join » Customize output field list","id":"136","title":"Customize output field list"},"137":{"body":"If you use auto as the argument for the -o option, first line of both the input files will be used to determine the number of output fields. If the other lines have extra fields, they will be discarded. $ join <(printf 'a 1 2\\nb p q r') <(printf 'a 3 4\\nb x y z')\na 1 2 3 4\nb p q r x y z $ join -o auto <(printf 'a 1 2\\nb p q r') <(printf 'a 3 4\\nb x y z')\na 1 2 3 4\nb p q x y If the other lines have lesser number of fields, the -e option will determine the string to be used as a filler (empty string is the default). # the second line has two empty fields\n$ join -o auto <(printf 'a 1 2\\nb p') <(printf 'a 3 4\\nb x')\na 1 2 3 4\nb p x $ join -o auto -e '-' <(printf 'a 1 2\\nb p') <(printf 'a 3 4\\nb x')\na 1 2 3 4\nb p - x - As promised earlier, here are some examples of filling fields for non-matching lines: $ join -o auto -a1 -e 'NA' shopping_jan.txt shopping_feb.txt\napple 10 NA\nbanana 20 15\nsoap 3 1\ntshirt 3 NA $ join -o auto -a1 -a2 -e 'NA' shopping_jan.txt shopping_feb.txt\napple 10 NA\nbanana 20 15\nfig NA 100\npen NA 2\nsoap 3 1\ntshirt 3 NA","breadcrumbs":"join » Same number of output fields","id":"137","title":"Same number of output fields"},"138":{"body":"This section covers whole line set operations you can perform on already sorted input files. Equivalent sort and uniq solutions will also be mentioned as comments (useful for unsorted inputs). Assume that there are no duplicate lines within an input file. 
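An earlier section mentions the -j option as a shorthand when both files use the same key field number, but without a standalone example. A minimal sketch with hypothetical two-column files:

```shell
# made-up inputs, sorted on the 2nd field (the intended key)
printf 'x apple\ny fig\n' > f1.txt
printf 'p apple\nq mango\n' > f2.txt

# -j2 is shorthand for '-1 2 -2 2'; the key is printed first in the output
join -j2 f1.txt f2.txt
# apple x p
```

As with -1 and -2, the -o option can then be used if you need a different output field order.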
These two sorted input files will be used for the examples to follow: $ paste colors_1.txt colors_2.txt\nBlue Black\nBrown Blue\nOrange Green\nPurple Orange\nRed Pink\nTeal Red\nWhite White Here's how you can get union and symmetric difference results. Recall that -t '' will cause the entire input line content to be considered as keys. # union\n# unsorted input: sort -u colors_1.txt colors_2.txt\n$ join -t '' -a1 -a2 colors_1.txt colors_2.txt\nBlack\nBlue\nBrown\nGreen\nOrange\nPink\nPurple\nRed\nTeal\nWhite # symmetric difference\n# unsorted input: sort colors_1.txt colors_2.txt | uniq -u\n$ join -t '' -v1 -v2 colors_1.txt colors_2.txt\nBlack\nBrown\nGreen\nPink\nPurple\nTeal Here's how you can get intersection and difference results. The equivalent comm solutions for sorted input are also mentioned in the comments. # intersection, same as: comm -12 colors_1.txt colors_2.txt\n# unsorted input: sort colors_1.txt colors_2.txt | uniq -d\n$ join -t '' colors_1.txt colors_2.txt\nBlue\nOrange\nRed\nWhite # difference, same as: comm -13 colors_1.txt colors_2.txt\n# unsorted input: sort colors_1.txt colors_1.txt colors_2.txt | uniq -u\n$ join -t '' -v2 colors_1.txt colors_2.txt\nBlack\nGreen\nPink # difference, same as: comm -23 colors_1.txt colors_2.txt\n# unsorted input: sort colors_1.txt colors_2.txt colors_2.txt | uniq -u\n$ join -t '' -v1 colors_1.txt colors_2.txt\nBrown\nPurple\nTeal As mentioned before, join will display all the combinations if there are duplicate entries. Here's an example to show the differences between sort, comm and join solutions for displaying common lines: $ paste list_1.txt list_2.txt\napple cherry\nbanana cherry\ncherry mango\ncherry papaya\ncherry cherry # only one entry per common line\n$ sort list_1.txt list_2.txt | uniq -d\ncherry # minimum of 'no. of entries in file1' and 'no. of entries in file2'\n$ comm -12 list_1.txt list_2.txt\ncherry\ncherry # 'no. of entries in file1' multiplied by 'no. 
of entries in file2'\n$ join -t '' list_1.txt list_2.txt\ncherry\ncherry\ncherry\ncherry\ncherry\ncherry\ncherry\ncherry","breadcrumbs":"join » Set operations","id":"138","title":"Set operations"},"139":{"body":"Use the -z option if you want to use NUL character as the line separator. In this scenario, join will ensure to add a final NUL character even if not present in the input. $ join -z <(printf 'a 1\\0b x') <(printf 'a 2\\0b y') | cat -v\na 1 2^@b x y^@","breadcrumbs":"join » NUL separator","id":"139","title":"NUL separator"},"14":{"body":"I've been using Linux since 2007, but it took me ten more years to really explore coreutils when I wrote tutorials for the Command Line Text Processing repository. Any beginner learning Linux command line tools would come across the cat command within the first week. Sooner or later, they'll come to know popular text processing tools like grep, head, tail, tr, sort, etc. If you were like me, you'd come across sed and awk, shudder at their complexity and prefer to use a scripting language like Perl and text editors like Vim instead (don't worry, I've already corrected that mistake). Knowing power tools like grep, sed and awk can help solve most of your text processing needs. So, why would you want to learn text processing tools from the coreutils package? The biggest motivation would be faster execution since these tools are optimized for the use cases they solve. And there's always the advantage of not having to write code (and test that solution) if there's an existing tool to solve the problem. This book will teach you more than twenty of such specialized text processing tools provided by the GNU coreutils package. Plenty of examples and exercises are provided to make it easier to understand a particular tool and its various features. Writing a book always has a few pleasant surprises for me. 
For this one, it was discovering a sort option for calendar months, regular expressions in the tac and nl commands, etc.","breadcrumbs":"Introduction » Introduction","id":"14","title":"Introduction"},"140":{"body":"Here are some alternate commands you can explore if join isn't enough to solve your task. These alternatives do not require input to be sorted. zet — set operations on one or more input files Comparing lines between files section from my GNU grep ebook Two file processing chapter from my GNU awk ebook, has examples for both line and field based comparisons Two file processing chapter from my Perl one-liners ebook, has examples for both line and field based comparisons","breadcrumbs":"join » Alternatives","id":"140","title":"Alternatives"},"141":{"body":"info The exercises directory has all the files used in this section. info Assume that the input files are already sorted for these exercises. 1) Use appropriate options to get the expected outputs shown below. # no output\n$ join <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4') # expected output 1\n##### add your solution here\nfig 5 10 # expected output 2\n##### add your solution here\napple 2\nfig 5 10\nmango 4 2) Use the join command to display only the non-matching lines based on the first field. $ cat j1.txt\napple 2\nfig 5\nlemon 10\ntomato 22\n$ cat j2.txt\nalmond 33\nfig 115\nmango 20\npista 42 # first field items present in j1.txt but not j2.txt\n##### add your solution here\napple 2\nlemon 10\ntomato 22 # first field items present in j2.txt but not j1.txt\n##### add your solution here\nalmond 33\nmango 20\npista 42 3) Filter lines from j1.txt and j2.txt that match the items from s1.txt. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea ##### add your solution here\napple 2\nfig 115\nfig 5\nmango 20 4) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. 
$ cat marks_1.csv\nName,Biology,Programming\nEr,92,77\nIth,100,100\nLin,92,100\nSil,86,98\n$ cat marks_2.csv\nName,Maths,Physics,Chemistry\nCy,97,98,95\nIth,100,100,100\nLin,78,83,80 ##### add your solution here\nName,Biology,Programming,Maths,Physics,Chemistry\nIth,100,100,100,100,100\nLin,92,100,78,83,80 5) By default, the first field is used to combine the lines. Which options are helpful if you want to change the key field to be used for joining? 6) Join the marks_1.csv and marks_2.csv files to get the expected output with specific fields as shown below. ##### add your solution here\nName,Programming,Maths,Biology\nIth,100,100,100\nLin,100,78,92 7) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. Use 50 as the filler data. ##### add your solution here\nName,Biology,Programming,Maths,Physics,Chemistry\nCy,50,50,97,98,95\nEr,92,77,50,50,50\nIth,100,100,100,100,100\nLin,92,100,78,83,80\nSil,86,98,50,50,50 8) When you use the -o auto option, what'd happen to the extra fields compared to those in the first lines of the input data? 9) From the input files j3.txt and j4.txt, filter only the lines that are unique — i.e. lines that are not common to these files. Assume that the input files do not have duplicate entries. $ cat j3.txt\nalmond\napple pie\ncold coffee\nhoney\nmango shake\npasta\nsugar\ntea\n$ cat j4.txt\napple\nbanana shake\ncoffee\nfig\nhoney\nmango shake\nmilk\ntea\nyeast ##### add your solution here\nalmond\napple\napple pie\nbanana shake\ncoffee\ncold coffee\nfig\nmilk\npasta\nsugar\nyeast 10) From the input files j3.txt and j4.txt, filter only the lines that are common to these files. ##### add your solution here\nhoney\nmango shake\ntea","breadcrumbs":"join » Exercises","id":"141","title":"Exercises"},"142":{"body":"If the numbering options provided by cat aren't enough, nl might suit you better. Apart from options to customize the number formatting and the separator, you can also filter which lines should be numbered. 
Additionally, you can divide your input into sections and number them separately.","breadcrumbs":"nl » nl","id":"142","title":"nl"},"143":{"body":"By default, nl will prefix line numbers and a tab character to every non-empty input line. The default number formatting is 6 characters wide and right justified with spaces. Similar to cat, the nl command will concatenate multiple inputs. # same as: cat -n greeting.txt fruits.txt nums.txt\n$ nl greeting.txt fruits.txt nums.txt 1 Hi there 2 Have a nice day 3 banana 4 papaya 5 mango 6 3.14 7 42 8 1000 # example for input with empty lines, same as: cat -b\n$ printf 'apple\\n\\nbanana\\n\\ncherry\\n' | nl 1 apple 2 banana 3 cherry","breadcrumbs":"nl » Default numbering","id":"143","title":"Default numbering"},"144":{"body":"You can use the -n option to customize the number formatting. The available styles are: rn right justified with space fillers (default) rz right justified with leading zeros ln left justified with space fillers # right justified with space fillers\n$ nl -n'rn' greeting.txt 1 Hi there 2 Have a nice day # right justified with leading zeros\n$ nl -n'rz' greeting.txt\n000001 Hi there\n000002 Have a nice day # left justified with space fillers\n$ nl -n'ln' greeting.txt\n1 Hi there\n2 Have a nice day","breadcrumbs":"nl » Number formatting","id":"144","title":"Number formatting"},"145":{"body":"You can use the -w option to specify the width to be used for the numbers (default is 6). $ nl greeting.txt 1 Hi there 2 Have a nice day $ nl -w2 greeting.txt 1 Hi there 2 Have a nice day","breadcrumbs":"nl » Customize width","id":"145","title":"Customize width"},"146":{"body":"By default, a tab character is used to separate the line number and the line content. You can use the -s option to specify your own custom string separator. 
$ nl -w2 -s' ' greeting.txt 1 Hi there 2 Have a nice day $ nl -w1 -s' --> ' greeting.txt\n1 --> Hi there\n2 --> Have a nice day","breadcrumbs":"nl » Customize separator","id":"146","title":"Customize separator"},"147":{"body":"The -v option allows you to specify a different starting integer. Negative integer is also allowed. $ nl -v10 greeting.txt 10 Hi there 11 Have a nice day $ nl -v-1 fruits.txt -1 banana 0 papaya 1 mango The -i option allows you to specify an integer as the step value (default is 1). $ nl -w2 -s') ' -i2 greeting.txt fruits.txt nums.txt 1) Hi there 3) Have a nice day 5) banana 7) papaya 9) mango\n11) 3.14\n13) 42\n15) 1000 $ nl -w1 -s'. ' -v8 -i-1 greeting.txt fruits.txt\n8. Hi there\n7. Have a nice day\n6. banana\n5. papaya\n4. mango","breadcrumbs":"nl » Starting number and step value","id":"147","title":"Starting number and step value"},"148":{"body":"If you organize your input with lines conforming to specific patterns, you can control their numbering separately. nl recognizes three types of sections with the following default patterns: \\:\\:\\: as header \\:\\: as body \\: as footer These special lines will be replaced with an empty line after numbering. The numbering will be reset at the start of every section. Here's an example with multiple body sections: $ cat body.txt\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango $ nl -w1 -s' ' body.txt 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango Here's an example with both header and body sections. By default, header and footer section lines are not numbered (you'll see options to enable them later). 
$ cat header_body.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\\:\\:\nHeader\ngreen $ nl -w1 -s' ' header_body.txt Header teal 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango Header green And here's an example with all the three types of sections: $ cat all_sections.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\nFooter $ nl -w1 -s' ' all_sections.txt Header teal 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango Footer The -b, -h and -f options control which lines should be numbered for the three types of sections. Use a to number all lines of a particular section (other features will be discussed later). $ nl -w1 -s' ' -ha -fa all_sections.txt 1 Header\n2 teal 1 Hi there\n2 How are you 1 banana\n2 papaya\n3 mango 1 Footer If you use the -p option, the numbering will not be reset on encountering a new section. $ nl -p -w1 -s' ' all_sections.txt Header teal 1 Hi there\n2 How are you 3 banana\n4 papaya\n5 mango Footer $ nl -p -w1 -s' ' -ha -fa all_sections.txt 1 Header\n2 teal 3 Hi there\n4 How are you 5 banana\n6 papaya\n7 mango 8 Footer The -d option allows you to customize the two character pattern used for sections. # pattern changed from \: to %=\n$ cat body_sep.txt\n%=%=\napple\nbanana\n%=%=\nteal\ngreen $ nl -w1 -s' ' -d'%=' body_sep.txt 1 apple\n2 banana 1 teal\n2 green","breadcrumbs":"nl » Section wise numbering","id":"148","title":"Section wise numbering"},"149":{"body":"As mentioned earlier, the -b, -h and -f options control which lines should be numbered for the three types of sections. 
These options accept the following arguments: a number all lines, including empty lines t number lines except empty ones (default for body sections) n do not number lines (default for header and footer sections) pBRE use basic regular expressions (BRE) to filter lines for numbering If the input doesn't have special patterns to identify the different sections, it will be treated as if it has a single body section. Here's an example to include empty lines for numbering: $ printf 'apple\\n\\nbanana\\n\\ncherry\\n' | nl -w1 -s' ' -ba\n1 apple\n2 3 banana\n4 5 cherry The -l option controls how many consecutive empty lines should be considered as a single entry. Only the last empty line of such groupings will be numbered. # only the 2nd consecutive empty line will be considered for numbering\n$ printf 'a\\n\\n\\n\\n\\nb\\n\\nc' | nl -w1 -s' ' -ba -l2\n1 a 2 3 4 b 5 c Here's an example which uses regular expressions to identify the lines to be numbered: # number lines starting with 'c' or 't'\n$ nl -w1 -s' ' -bp'^[ct]' purchases.txt\n1 coffee\n2 tea washing powder\n3 coffee\n4 toothpaste\n5 tea soap\n6 tea info See the Regular Expressions chapter from my GNU grep ebook if you want to learn more about regexp syntax and features.","breadcrumbs":"nl » Section numbering criteria","id":"149","title":"Section numbering criteria"},"15":{"body":"On a GNU/Linux based OS, you are most likely to already have GNU coreutils installed. This book covers the version 9.1 of the coreutils package. To install a newer/particular version, see the coreutils download section for details. If you are not using a Linux distribution, you may be able to access coreutils using these options: Windows Subsystem for Linux — compatibility layer for running Linux binary executables natively on Windows brew — Package Manager for macOS (or Linux)","breadcrumbs":"Introduction » Installation","id":"15","title":"Installation"},"150":{"body":"info The exercises directory has all the files used in this section. 
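To tie the pBRE argument back to a quick run you can reproduce anywhere, here is a sketch with inline data (the items are made up, not the book's purchases.txt file):

```shell
# number only the body lines matching the basic regular expression '^c';
# non-matching lines are printed unnumbered with leading padding
printf 'coffee\ntea\ncoffee\ntoothpaste\n' | nl -w1 -s' ' -bp'^c'
```

Only the two coffee lines get numbers here; tea and toothpaste are passed through without numbering.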
1) nl and cat -n are always equivalent for numbering lines. True or False? 2) What does the -n option do? 3) Use nl to produce the two expected outputs shown below. $ cat greeting.txt\nHi there\nHave a nice day # expected output 1\n##### add your solution here\n001 Hi there\n002 Have a nice day # expected output 2\n##### add your solution here\n001) Hi there\n002) Have a nice day 4) Figure out the logic based on the given input and output data. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea ##### add your solution here\n15. apple\n13. coffee\n11. fig 9. honey 7. mango 5. pasta 3. sugar 1. tea 5) What are the three types of sections supported by nl? 6) Only number the lines that start with ---- in the format shown below. $ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello ##### add your solution here 1) ---- apple--banana mango---fig 2) ---- 3.14 -42 1000 3) ---- sky blue dark green 4) ---- hi hello 7) For the blocks.txt file, determine the logic to produce the expected output shown below. ##### add your solution here 1. apple--banana\n2. mango---fig 1. 3.14\n2. -42\n3. 1000 1. sky blue\n2. dark green 1. hi hello 8) What does the -l option do? 9) Figure out the logic based on the given input and output data. $ cat all_sections.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\nFooter ##### add your solution here 1) Header 2) teal 3) Hi there 4) How are you 5) banana 6) papaya 7) mango Footer","breadcrumbs":"nl » Exercises","id":"150","title":"Exercises"},"151":{"body":"The wc command is useful to count the number of lines, words and characters for the given inputs.","breadcrumbs":"wc » wc","id":"151","title":"wc"},"152":{"body":"By default, the wc command reports the number of lines, words and bytes (in that order). The byte count includes the newline characters, so you can use that as a measure of file size as well. 
Here's an example: $ cat greeting.txt\nHi there\nHave a nice day $ wc greeting.txt 2 6 25 greeting.txt Wondering why there are leading spaces in the output? They help in aligning results for multiple files (discussed later).","breadcrumbs":"wc » Line, word and byte counts","id":"152","title":"Line, word and byte counts"},"153":{"body":"Instead of the three default values, you can use options to get only the particular counts you are interested in. These options are: -l for line count -w for word count -c for byte count $ wc -l greeting.txt\n2 greeting.txt $ wc -w greeting.txt\n6 greeting.txt $ wc -c greeting.txt\n25 greeting.txt $ wc -wc greeting.txt 6 25 greeting.txt With stdin data, you'll get only the count value (unless you use - for stdin). Useful for assigning the output to shell variables. $ printf 'hello' | wc -c\n5\n$ printf 'hello' | wc -c -\n5 - $ lines=$(wc -l xaa <==\n1 ==> xab <==\n1001 ==> xae <==\n4001 ==> xaj <==\n9001 $ rm x* info warning As mentioned earlier, remove the output files after every illustration.","breadcrumbs":"split » Default split","id":"160","title":"Default split"},"161":{"body":"You can use the -l option to change the number of lines to be saved in each output file. # maximum of 3 lines at a time\n$ split -l3 purchases.txt $ head x*\n==> xaa <==\ncoffee\ntea\nwashing powder ==> xab <==\ncoffee\ntoothpaste\ntea ==> xac <==\nsoap\ntea","breadcrumbs":"split » Change number of lines","id":"161","title":"Change number of lines"},"162":{"body":"The -b option allows you to split the input by the number of bytes. Similar to line based splitting, you can always reconstruct the input by concatenating the output files. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on. 
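The wc discussion above notes that stdin data yields just the bare count, which is handy for shell variables. A minimal sketch (greeting.txt is recreated inline from the book's sample data):

```shell
printf 'Hi there\nHave a nice day\n' > greeting.txt

# redirection hides the filename from wc, so only the number is printed;
# that bare value is convenient for command substitution
lines=$(wc -l < greeting.txt)
echo "$lines"
```

Compare with `wc -l greeting.txt`, which would print the filename alongside the count.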
# maximum of 15 bytes at a time\n$ split -b15 greeting.txt $ head x*\n==> xaa <==\nHi there\nHave a\n==> xab <== nice day # when you concatenate the output files, you'll get the original input\n$ cat x*\nHi there\nHave a nice day The -C option is similar to the -b option, but it will try to break on line boundaries if possible. The break will happen before the given byte limit. Here's an example where input lines do not exceed the given byte limit: $ split -C20 purchases.txt $ head x*\n==> xaa <==\ncoffee\ntea ==> xab <==\nwashing powder ==> xac <==\ncoffee\ntoothpaste ==> xad <==\ntea\nsoap\ntea $ wc -c x*\n11 xaa\n15 xab\n18 xac\n13 xad\n57 total If a line exceeds the given limit, it will be broken down into multiple parts: $ printf 'apple\\nbanana\\n' | split -C4 $ head x*\n==> xaa <==\nappl\n==> xab <==\ne ==> xac <==\nbana\n==> xad <==\nna $ cat x*\napple\nbanana","breadcrumbs":"split » Split by byte count","id":"162","title":"Split by byte count"},"163":{"body":"The -n option has several features. If you pass only a numeric argument N, the given input file will be divided into N chunks. The output files will be roughly the same size. # divide the file into 2 parts\n$ split -n2 purchases.txt\n$ head x*\n==> xaa <==\ncoffee\ntea\nwashing powder\nco\n==> xab <==\nffee\ntoothpaste\ntea\nsoap\ntea # the two output files are roughly the same size\n$ wc x* 3 5 28 xaa 5 5 29 xab 8 10 57 total warning Since the division is based on file size, stdin data cannot be used. Newer versions of the coreutils package support this use case by creating a temporary file before splitting. $ seq 6 | split -n2\nsplit: -: cannot determine file size By using K/N as the argument, you can view the Kth chunk of N parts on stdout. No output file will be created in this scenario. # divide the input into 2 parts\n# view only the 1st chunk on stdout\n$ split -n1/2 greeting.txt\nHi there\nHav To avoid splitting a line, use l/ as a prefix. 
Quoting from the manual : For l mode, chunks are approximately input size / N. The input is partitioned into N equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition. # divide input into 2 parts, but don't split lines\n$ split -nl/2 purchases.txt\n$ head x*\n==> xaa <==\ncoffee\ntea\nwashing powder\ncoffee ==> xab <==\ntoothpaste\ntea\nsoap\ntea Here's an example to view the Kth chunk without splitting lines: # 2nd chunk of 3 parts without splitting lines\n$ split -nl/2/3 sample.txt 7) Believe it 8) 9) banana\n10) papaya\n11) mango","breadcrumbs":"split » Divide based on file size","id":"163","title":"Divide based on file size"},"164":{"body":"The -n option will also help you create output files with interleaved lines. Since this is based on the line separator and not file size, stdin data can also be used. Use the r/ prefix to enable this feature. # two parts, lines distributed in round robin fashion\n$ seq 5 | split -nr/2 $ head x*\n==> xaa <==\n1\n3\n5 ==> xab <==\n2\n4 Here's an example to view the Kth chunk: $ split -nr/1/3 sample.txt 1) Hello World 4) How are you 7) Believe it\n10) papaya\n13) Much ado about nothing","breadcrumbs":"split » Interleaved lines","id":"164","title":"Interleaved lines"},"165":{"body":"You can use the -t option to specify a single byte character as the line separator. Use \\0 to specify NUL as the separator. Depending on your shell you can use ANSI-C quoting to use escapes like \\t instead of a literal tab character. 
$ printf 'apple\\nbanana\\n;mango\\npapaya\\n' | split -t';' -l1 $ head x*\n==> xaa <==\napple\nbanana\n;\n==> xab <==\nmango\npapaya","breadcrumbs":"split » Custom line separator","id":"165","title":"Custom line separator"},"166":{"body":"As seen earlier, x is the default prefix for output filenames. To change this prefix, pass an argument after the input source. # choose prefix as 'op_' instead of 'x'\n$ split -l1 greeting.txt op_ $ head op_*\n==> op_aa <==\nHi there ==> op_ab <==\nHave a nice day The -a option controls the length of the suffix. You'll get an error if this length isn't enough to cover all the output files. In such a case, you'll still get output files that can fit within the given length. $ seq 10 | split -l1 -a1\n$ ls x*\nxa xb xc xd xe xf xg xh xi xj\n$ rm x* $ seq 10 | split -l1 -a3\n$ ls x*\nxaaa xaab xaac xaad xaae xaaf xaag xaah xaai xaaj\n$ rm x* $ seq 100 | split -l1 -a1\nsplit: output file suffixes exhausted\n$ ls x*\nxa xc xe xg xi xk xm xo xq xs xu xw xy\nxb xd xf xh xj xl xn xp xr xt xv xx xz\n$ rm x* You can use the -d option to use numeric suffixes, starting from 00 (length can be changed using the -a option). You can use the long option --numeric-suffixes to specify a different starting number. $ seq 10 | split -l1 -d\n$ ls x*\nx00 x01 x02 x03 x04 x05 x06 x07 x08 x09\n$ rm x* $ seq 10 | split -l2 --numeric-suffixes=10\n$ ls x*\nx10 x11 x12 x13 x14 Use -x and --hex-suffixes options for hexadecimal numbering. $ seq 10 | split -l1 --hex-suffixes=8\n$ ls x*\nx08 x09 x0a x0b x0c x0d x0e x0f x10 x11 You can use the --additional-suffix option to add a constant string at the end of filenames. 
$ seq 10 | split -l2 -a1 --additional-suffix='.log'\n$ ls x*\nxa.log xb.log xc.log xd.log xe.log\n$ rm x* $ seq 10 | split -l2 -a1 -d --additional-suffix='.txt' - num_\n$ ls num_*\nnum_0.txt num_1.txt num_2.txt num_3.txt num_4.txt","breadcrumbs":"split » Customize filenames","id":"166","title":"Customize filenames"},"167":{"body":"You can sometimes end up with empty files. For example, trying to split into more parts than possible with the given criteria. In such cases, you can use the -e option to prevent empty files in the output. The split command will ensure that the filenames are sequential even if files in the middle are empty. # 'xac' is empty in this example\n$ split -nl/3 greeting.txt\n$ head x*\n==> xaa <==\nHi there ==> xab <==\nHave a nice day ==> xac <== $ rm x* # prevent empty files\n$ split -e -nl/3 greeting.txt\n$ head x*\n==> xaa <==\nHi there ==> xab <==\nHave a nice day","breadcrumbs":"split » Exclude empty files","id":"167","title":"Exclude empty files"},"168":{"body":"The --filter option will allow you to apply another command on the intermediate split results before saving the output files. Use $FILE to refer to the output filename of the intermediate parts. Here's an example of compressing the results: $ split -l1 --filter='gzip > $FILE.gz' greeting.txt $ ls x*\nxaa.gz xab.gz $ zcat xaa.gz\nHi there\n$ zcat xab.gz\nHave a nice day Here's an example of ignoring the first line of the results: $ cat body_sep.txt\n%=%=\napple\nbanana\n%=%=\nred\ngreen $ split -l3 --filter='tail -n +2 > $FILE' body_sep.txt $ head x*\n==> xaa <==\napple\nbanana ==> xab <==\nred\ngreen","breadcrumbs":"split » Process parts through another command","id":"168","title":"Process parts through another command"},"169":{"body":"info The exercises directory has all the files used in this section. info Remove the output files after every exercise. 1) Split the s1.txt file 3 lines at a time. 
##### add your solution here $ head xa?\n==> xaa <==\napple\ncoffee\nfig ==> xab <==\nhoney\nmango\npasta ==> xac <==\nsugar\ntea $ rm xa? 2) Use appropriate options to get the output shown below. $ echo 'apple,banana,cherry,dates' | ##### add your solution here $ head xa?\n==> xaa <==\napple,\n==> xab <==\nbanana,\n==> xac <==\ncherry,\n==> xad <==\ndates $ rm xa? 3) What do the -b and -C options do? 4) Display the 2nd chunk of the ip.txt file after splitting it 4 times as shown below. ##### add your solution here\ncome back before the sky turns dark There are so many delights to cherish 5) What does the r prefix do when used with the -n option? 6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below. ##### add your solution here $ head ip_*\n==> ip_0.txt <==\nit is a warm and cozy day\nlisten to what I say ==> ip_1.txt <==\ngo play in the park\ncome back before the sky turns dark ==> ip_2.txt <== There are so many delights to cherish ==> ip_3.txt <==\nApple, Banana and Cherry\nBread, Butter and Jelly ==> ip_4.txt <==\nTry them all before you perish $ rm ip_* 7) Which option would you use to prevent empty files in the output? 8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 ##### add your solution here $ head xa?\n==> xaa <==\napple 5\nbanana 10\ngreen ==> xab <==\nsky blue\ndragon 3\nunicorn 42 $ rm xa?","breadcrumbs":"split » Exercises","id":"169","title":"Exercises"},"17":{"body":"cat derives its name from con cat enation and provides other nifty options too. 
tac helps you to reverse the input linewise, usually used for further text processing.","breadcrumbs":"cat and tac » cat and tac","id":"17","title":"cat and tac"},"170":{"body":"The csplit command is useful to divide the input into smaller parts based on line numbers and regular expression patterns. Similar to split, this command also supports customizing output filenames. info Since a lot of output files will be generated in this chapter (often with the same filenames), remove these files after every illustration.","breadcrumbs":"csplit » csplit","id":"170","title":"csplit"},"171":{"body":"You can split the input into two based on a particular line number. To do so, specify the line number after the input source (filename or stdin data). The first output file will have the input lines before the given line number and the second output file will have the rest of the contents. By default, the output files will be named xx00, xx01, xx02, and so on (where xx is the prefix). The numerical suffix will automatically use more digits if needed. You'll see examples with more than two output files later. # split input into two based on line number 4\n$ seq 10 | csplit - 4\n6\n15 # first output file will have the first 3 lines\n# second output file will have the rest\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7\n8\n9\n10 $ rm xx* info As seen in the example above, csplit will also display the number of bytes written for each output file. You can use the -q option to suppress this message. warning As mentioned earlier, remove the output files after every illustration.","breadcrumbs":"csplit » Split on Nth line","id":"171","title":"Split on Nth line"},"172":{"body":"You can also split the input based on a line matching the given regular expression. The output produced will vary based on the // or %% delimiters being used to surround the regexp.
The first output file will have the input lines before the first occurrence of a line matching the given regexp and the second output file will have the rest of the contents. # match a line containing 't' followed by zero or more characters and then 'p'\n# 'toothpaste' is the only match for this input file\n$ csplit -q purchases.txt '/t.*p/' $ head xx*\n==> xx00 <==\ncoffee\ntea\nwashing powder\ncoffee ==> xx01 <==\ntoothpaste\ntea\nsoap\ntea When %regexp% is used, the lines occurring before the matching line won't be part of the output. Only the line matching the given regexp and the rest of the contents will be part of the single output file. $ csplit -q purchases.txt '%t.*p%' $ cat xx00\ntoothpaste\ntea\nsoap\ntea warning You'll get an error if the given regexp isn't found in the input. $ csplit -q purchases.txt '/xyz/'\ncsplit: ‘/xyz/’: match not found info See the Regular Expressions chapter from my GNU grep ebook if you want to learn more about regexp syntax and features.","breadcrumbs":"csplit » Split on regexp","id":"172","title":"Split on regexp"},"173":{"body":"You can also provide offset numbers that'll affect where the matching line and its surrounding lines should be placed. When the offset is greater than zero, the split will happen that many lines after the matching line. The default offset is zero. # when the offset is '1', the matching line will be part of the first file\n$ csplit -q purchases.txt '/t.*p/1'\n$ head xx*\n==> xx00 <==\ncoffee\ntea\nwashing powder\ncoffee\ntoothpaste ==> xx01 <==\ntea\nsoap\ntea # matching line and 1 line after won't be part of the output\n$ csplit -q purchases.txt '%t.*p%2'\n$ cat xx00\nsoap\ntea When the offset is less than zero, the split will happen that many lines before the matching line. 
# 2 lines before the matching line will be part of the second file\n$ csplit -q purchases.txt '/t.*p/-2'\n$ head xx*\n==> xx00 <==\ncoffee\ntea ==> xx01 <==\nwashing powder\ncoffee\ntoothpaste\ntea\nsoap\ntea warning You'll get an error if the offset goes beyond the number of lines available in the input. $ csplit -q purchases.txt '/t.*p/5'\ncsplit: ‘/t.*p/5’: line number out of range $ csplit -q purchases.txt '/t.*p/-5'\ncsplit: ‘/t.*p/-5’: line number out of range","breadcrumbs":"csplit » Regexp offset","id":"173","title":"Regexp offset"},"174":{"body":"You can perform line number and regexp based split more than once by adding the {N} argument after the pattern. The default behavior seen in the examples so far is the same as specifying {0}. Any number greater than zero will result in that many more splits. # {1} means split one time more than the default split\n# so, two splits in total and three output files\n# in this example, split happens on the 4th and 8th line numbers\n$ seq 10 | csplit -q - 4 '{1}' $ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7 ==> xx02 <==\n8\n9\n10 Here's an example with regexp: $ cat log.txt\n--> warning 1\na,b,c,d\n42\n--> warning 2\nx,y,z\n--> warning 3\n4,3,1 # split on the third (2+1) occurrence of a line containing 'warning'\n$ csplit -q log.txt '%warning%' '{2}'\n$ cat xx00\n--> warning 3\n4,3,1 As a special case, you can use {*} to repeat the split until the input is exhausted. This is especially useful with the /regexp/ form of splitting. Here's an example: # split on all lines matching 'paste' or 'powder'\n$ csplit -q purchases.txt '/paste\\|powder/' '{*}'\n$ head xx*\n==> xx00 <==\ncoffee\ntea ==> xx01 <==\nwashing powder\ncoffee ==> xx02 <==\ntoothpaste\ntea\nsoap\ntea warning You'll get an error if the repeat count goes beyond the number of matches possible with the given input.
$ seq 10 | csplit -q - 4 '{2}'\ncsplit: ‘4’: line number out of range on repetition 2 $ csplit -q purchases.txt '/tea/' '{4}'\ncsplit: ‘/tea/’: match not found on repetition 3","breadcrumbs":"csplit » Repeat split","id":"174","title":"Repeat split"},"175":{"body":"By default, csplit will remove the created output files if there's an error or a signal that causes the command to stop. You can use the -k option to keep such files. One use case is line number based splitting with the {*} modifier. $ seq 7 | csplit -q - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ ls xx*\nls: cannot access 'xx*': No such file or directory # -k option will allow you to retain the created files\n$ seq 7 | csplit -qk - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7","breadcrumbs":"csplit » Keep files on error","id":"175","title":"Keep files on error"},"176":{"body":"The --suppress-matched option will suppress the lines matching the split condition. 
$ seq 5 | csplit -q --suppress-matched - 3\n# 3rd line won't be part of the output\n$ head xx*\n==> xx00 <==\n1\n2 ==> xx01 <==\n4\n5 $ rm xx* $ seq 10 | csplit -q --suppress-matched - 4 '{1}'\n# 4th and 8th lines won't be part of the output\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n5\n6\n7 ==> xx02 <==\n9\n10 Here's an example with regexp based split: $ csplit -q --suppress-matched purchases.txt '/soap\\|powder/' '{*}'\n# lines matching 'soap' or 'powder' won't be part of the output\n$ head xx*\n==> xx00 <==\ncoffee\ntea ==> xx01 <==\ncoffee\ntoothpaste\ntea ==> xx02 <==\ntea Here's another example: $ seq 11 16 | csplit -q --suppress-matched - '/[35]/' '{1}'\n# lines matching '3' or '5' won't be part of the output\n$ head xx*\n==> xx00 <==\n11\n12 ==> xx01 <==\n14 ==> xx02 <==\n16 $ rm xx*","breadcrumbs":"csplit » Suppress matched lines","id":"176","title":"Suppress matched lines"},"177":{"body":"There are various cases that can result in empty output files. For example, first or last line matching the given split condition. Another possibility is the --suppress-matched option combined with consecutive lines matching during multiple splits. Here's an example: $ csplit -q --suppress-matched purchases.txt '/coffee\\|tea/' '{*}' $ head xx*\n==> xx00 <== ==> xx01 <== ==> xx02 <==\nwashing powder ==> xx03 <==\ntoothpaste ==> xx04 <==\nsoap ==> xx05 <== You can use the -z option to exclude empty files from the output. The suffix numbering will be automatically adjusted in such cases. $ csplit -qz --suppress-matched purchases.txt '/coffee\\|tea/' '{*}' $ head xx*\n==> xx00 <==\nwashing powder ==> xx01 <==\ntoothpaste ==> xx02 <==\nsoap","breadcrumbs":"csplit » Exclude empty files","id":"177","title":"Exclude empty files"},"178":{"body":"As seen earlier, xx is the default prefix for output filenames. Use the -f option to change this prefix. 
$ seq 4 | csplit -q -f'num_' - 3 $ head num_*\n==> num_00 <==\n1\n2 ==> num_01 <==\n3\n4 The -n option controls the length of the numeric suffix. The suffix length will automatically increment if filenames are exhausted. $ seq 4 | csplit -q -n1 - 3\n$ ls xx*\nxx0 xx1\n$ rm xx* $ seq 4 | csplit -q -n3 - 3\n$ ls xx*\nxx000 xx001 The -b option allows you to control the suffix using the printf formatting. Quoting from the manual : When this option is specified, the suffix string must include exactly one printf(3)-style conversion specification, possibly including format specification flags, a field width, a precision specifications, or all of these kinds of modifiers. The format letter must convert a binary unsigned integer argument to readable form. The format letters d and i are aliases for u, and the u, o, x, and X conversions are allowed. Here are some examples: # hexadecimal numbering\n# minimum two digits, zero filled\n$ seq 100 | csplit -q -b'%02x' - 3 '{20}'\n$ ls xx*\nxx00 xx02 xx04 xx06 xx08 xx0a xx0c xx0e xx10 xx12 xx14\nxx01 xx03 xx05 xx07 xx09 xx0b xx0d xx0f xx11 xx13 xx15\n$ rm xx* # custom prefix and suffix around decimal numbering\n# default minimum is a single digit\n$ seq 20 | csplit -q -f'num_' -b'%d.txt' - 3 '{4}'\n$ ls num_*\nnum_0.txt num_1.txt num_2.txt num_3.txt num_4.txt num_5.txt info Note that the -b option will override the -n option. See man 3 printf for more details about the formatting options.","breadcrumbs":"csplit » Customize filenames","id":"178","title":"Customize filenames"},"179":{"body":"info The exercises directory has all the files used in this section. info Remove the output files after every exercise. 1) Split the blocks.txt file such that the first 7 lines are in the first file and the rest are in the second file as shown below. 
##### add your solution here $ head xx*\n==> xx00 <==\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000 ==> xx01 <==\n----\nsky blue\ndark green\n----\nhi hello $ rm xx* 2) Split the input file items.txt such that the text before a line containing colors is part of the first file and the rest is part of the second file as shown below. ##### add your solution here $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10 ==> xx01 <==\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 3) Split the input file items.txt such that the line containing magical and all the lines that come after are part of the single output file. ##### add your solution here $ cat xx00\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 4) Split the input file items.txt such that the line containing colors as well as the line that comes after are part of the first output file. ##### add your solution here $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen ==> xx01 <==\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 5) Split the input file items.txt on the line that comes before a line containing magical. Generate only a single output file as shown below. ##### add your solution here $ cat xx00\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 6) Split the input file blocks.txt on the 4th occurrence of a line starting with the - character. Generate only a single output file as shown below. ##### add your solution here $ cat xx00\n----\nsky blue\ndark green\n----\nhi hello $ rm xx00 7) For the input file blocks.txt, determine the logic to produce the expected output shown below. ##### add your solution here $ head xx*\n==> xx00 <==\napple--banana\nmango---fig ==> xx01 <==\n3.14\n-42\n1000 ==> xx02 <==\nsky blue\ndark green ==> xx03 <==\nhi hello $ rm xx* 8) What does the -k option do? 9) Split the books.txt file on every line as shown below.
##### add your solution here\ncsplit: ‘1’: line number out of range on repetition 3 $ head row_*\n==> row_0 <==\nCradle:::Mage Errant::The Weirkey Chronicles ==> row_1 <==\nMother of Learning::Eight:::::Dear Spellbook:Ascendant ==> row_2 <==\nMark of the Fool:Super Powereds:::Ends of Magic $ rm row_* 10) Split the items.txt file on lines starting with a digit character. Matching lines shouldn't be part of the output and the files should be named group_0.txt, group_1.txt and so on. ##### add your solution here $ head group_*\n==> group_0.txt <==\napple 5\nbanana 10 ==> group_1.txt <==\ngreen\nsky blue ==> group_2.txt <==\ndragon 3\nunicorn 42 $ rm group_*","breadcrumbs":"csplit » Exercises","id":"179","title":"Exercises"},"18":{"body":"Yeah, cat can be used to write contents to a file by typing them from the terminal itself. If you invoke cat without providing file arguments or stdin data from a pipe, it will wait for you to type the content. After you are done typing all the text you want to save, press Enter and then the Ctrl+d key combinations. If you don't want the last line to have a newline character, press Ctrl+d twice instead of Enter and Ctrl+d. See also unix.stackexchange: difference between Ctrl+c and Ctrl+d . # press Enter and Ctrl+d after typing all the required characters\n$ cat > greeting.txt\nHi there\nHave a nice day In the above example, the output of cat is redirected to a file named greeting.txt. If you don't redirect the stdout data, each line will be echoed as you type. You can check the contents of the file you just created by using cat again. $ cat greeting.txt\nHi there\nHave a nice day Here Documents is another popular way to create such files. In this case, the termination condition is a line matching a predefined string which is specified after the << redirection operator. This is especially helpful for automation, since pressing Ctrl+d interactively isn't desirable. 
Here's an example: # > and a space at the start of lines represents the secondary prompt PS2\n# don't type them in a shell script\n# EOF is typically used as the identifier\n$ cat << 'EOF' > fruits.txt\n> banana\n> papaya\n> mango\n> EOF $ cat fruits.txt\nbanana\npapaya\nmango The termination string is enclosed in single quotes to prevent parameter expansion, command substitution, etc. You can also use \\string for this purpose. If you use <<- instead of <<, you can use leading tab characters for indentation purposes. See bash manual: Here Documents and stackoverflow: here-documents for more examples and details. info Note that creating files as shown above isn't restricted to cat, it can be applied to any command waiting for stdin. # 'tr' converts lowercase alphabets to uppercase in this example\n$ tr 'a-z' 'A-Z' << 'end' > op.txt\n> hi there\n> have a nice day\n> end $ cat op.txt\nHI THERE\nHAVE A NICE DAY","breadcrumbs":"cat and tac » Creating text files","id":"18","title":"Creating text files"},"180":{"body":"These two commands will help you convert tabs to spaces and vice versa. Both these commands support options to customize the width of tab stops and which occurrences should be converted.","breadcrumbs":"expand and unexpand » expand and unexpand","id":"180","title":"expand and unexpand"},"181":{"body":"The expand command converts tab characters to space characters. The default expansion aligns at multiples of 8 columns (calculated in terms of bytes). 
# sample stdin data\n$ printf 'apple\\tbanana\\tcherry\\na\\tb\\tc\\n' | cat -T\napple^Ibanana^Icherry\na^Ib^Ic\n# 'apple' = 5 bytes, \\t converts to 3 spaces\n# 'banana' = 6 bytes, \\t converts to 2 spaces\n# 'a' and 'b' = 1 byte, \\t converts to 7 spaces\n$ printf 'apple\\tbanana\\tcherry\\na\\tb\\tc\\n' | expand\napple banana cherry\na b c # 'αλε' = 6 bytes, \\t converts to 2 spaces\n$ printf 'αλε\\tπού\\n' | expand\nαλε πού Here's an example with strings of size 7 and 8 bytes before the tab character: $ printf 'deviate\\treached\\nbackdrop\\toverhang\\n' | expand\ndeviate reached\nbackdrop overhang The expand command also considers backspace characters to determine the number of spaces needed. # sample input with a backspace character\n$ printf 'cart\\bd\\tbard\\n' | cat -t\ncart^Hd^Ibard # 'card' = 4 bytes, \\t converts to 4 spaces\n$ printf 'cart\\bd\\tbard\\n' | expand\ncard bard\n$ printf 'cart\\bd\\tbard\\n' | expand | cat -t\ncart^Hd bard info expand will concatenate multiple files passed as input source, so cat will not be needed for such cases.","breadcrumbs":"expand and unexpand » Default expand","id":"181","title":"Default expand"},"182":{"body":"You can use the -i option to convert only the tab characters present at the start of a line. The first occurrence of a character that is not tab or space characters will stop the expansion. 
# 'a' present at the start of line is not a tab/space character\n# so no tabs are expanded for this input\n$ printf 'a\\tb\\tc\\n' | expand -i | cat -T\na^Ib^Ic # the first \\t gets expanded here, 'a' stops further expansion\n$ printf '\\ta\\tb\\tc\\n' | expand -i | cat -T a^Ib^Ic # first two \\t gets expanded here, 'a' stops further expansion\n# presence of space characters will not stop the expansion\n$ printf '\\t \\ta\\tb\\tc\\n' | expand -i | cat -T a^Ib^Ic","breadcrumbs":"expand and unexpand » Expand only the initial tabs","id":"182","title":"Expand only the initial tabs"},"183":{"body":"You can use the -t option to control the expansion width. Default is 8 as seen in the previous examples. This option provides various features. Here's an example where all the tab characters are converted equally to the given width: $ cat -T code.py\ndef compute(x, y):\n^Iif x > y:\n^I^Iprint('hello')\n^Ielse:\n^I^Iprint('bye') $ expand -t 2 code.py\ndef compute(x, y): if x > y: print('hello') else: print('bye') You can provide multiple widths separated by a comma character. In such a case, the given widths determine the stop locations for those many tab characters. These stop values refer to absolute positions from the start of the line, not the number of spaces they can expand to. Rest of the tab characters will be expanded to a single space character. 
# first tab character can expand till the 3rd column\n# second tab character can expand till the 7th column\n# rest of the tab characters will be expanded to a single space\n$ printf 'a\\tb\\tc\\td\\te\\n' | expand -t 3,7\na b c d e # here are two more examples with the same specification as above\n# second tab expands to two spaces to end at the 7th column\n$ printf 'a\\tbb\\tc\\td\\te\\n' | expand -t 3,7\na bb c d e\n# second tab expands to a single space since it goes beyond the 7th column\n$ printf 'a\\tbbbbbbbb\\tc\\td\\te\\n' | expand -t 3,7\na bbbbbbbb c d e If you prefix a / character to the last width, the remaining tab characters will use multiples of this position instead of the single space default. # first tab character can expand till the 3rd column\n# remaining tab characters can expand till 7/14/21/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,/7\na b c d e f g # first tab character can expand till the 3rd column\n# second tab character can expand till the 7th column\n# remaining tab characters can expand till 10/15/20/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,7,/5\na b c d e f g If you use + instead of / as the prefix for the last width, the multiple calculation will use the second last width as an offset. # first tab character can expand till the 3rd column\n# 3+7=10, so remaining tab characters can expand till 10/17/24/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,+7\na b c d e f g # first tab character can expand till the 3rd column\n# second tab character can expand till the 7th column\n# 7+5=12, so remaining tab characters can expand till 12/17/22/etc\n$ printf 'a\\tb\\tc\\td\\te\\tf\\tg\\n' | expand -t 3,7,+5\na b c d e f g","breadcrumbs":"expand and unexpand » Customize the tab stop width","id":"183","title":"Customize the tab stop width"},"184":{"body":"By default, the unexpand command converts initial blank characters (space or tab) to tabs.
The first occurrence of a non-blank character will stop the conversion. By default, every 8 columns worth of blanks is converted to a tab. # input is 8 spaces followed by 'a' and then more characters\n# the initial 8 spaces is converted to a tab character\n# 'a' stops any further conversion, since it is a non-blank character\n$ printf ' a b c\\n' | unexpand | cat -T\n^Ia b c # input is 9 spaces followed by 'a' and then more characters\n# the initial 8 spaces are converted to a tab character\n# remaining space is left as is\n$ printf ' a b c\\n' | unexpand | cat -T\n^I a b c # input has 16 initial spaces, gets converted to two tabs\n$ printf '\\t\\ta\\tb\\tc\\n' | expand | unexpand | cat -T\n^I^Ia b c # input has 4 spaces and a tab character (that expands till the 8th column)\n# output will have a single tab character at the start\n$ printf ' \\ta b\\n' | unexpand | cat -T\n^Ia b info The current locale determines which characters are considered as blanks. Also, unexpand will concatenate multiple files passed as input source, so cat will not be needed for such cases.","breadcrumbs":"expand and unexpand » Default unexpand","id":"184","title":"Default unexpand"},"185":{"body":"The -a option will allow you to convert all sequences of two or more blanks at tab boundaries. Here are some examples: # default unexpand stops at the first non-blank character\n$ printf ' a b c\\n' | unexpand | cat -T\n^Ia b c\n# -a option will convert all sequences of blanks at tab boundaries\n$ printf ' a b c\\n' | unexpand -a | cat -T\n^Ia^Ib^Ic # only two or more consecutive blanks are considered for conversion\n$ printf 'riddled reached\\n' | unexpand -a | cat -T\nriddled reached\n$ printf 'riddle reached\\n' | unexpand -a | cat -T\nriddle^Ireached # blanks at non-tab boundaries won't be converted\n$ printf 'oh hi hello\\n' | unexpand -a | cat -T\noh hi^Ihello The unexpand command also considers backspace characters to determine the tab boundary. 
# 'card' = 4 bytes, so the 4 spaces gets converted to a tab\n$ printf 'cart\\bd bard\\n' | unexpand -a | cat -T\ncard^Ibard\n$ printf 'cart\\bd bard\\n' | unexpand -a | cat -t\ncart^Hd^Ibard","breadcrumbs":"expand and unexpand » Unexpand all blanks","id":"185","title":"Unexpand all blanks"},"186":{"body":"The -t option has the same features as seen with the expand command. The -a option is also implied when this option is used. Here's an example of changing the tab stop width to 2: $ printf '\\ta\\n\\t\\tb\\n' | expand -t 2 a b $ printf '\\ta\\n\\t\\tb\\n' | expand -t 2 | unexpand -t 2 | cat -T\n^Ia\n^I^Ib Here are some examples with multiple tab widths: $ printf 'a\\tb\\tc\\td\\te\\n' | expand -t 3,7\na b c d e\n$ printf 'a b c d e\\n' | unexpand -t 3,7 | cat -T\na^Ib^Ic d e\n$ printf 'a\\tb\\tc\\td\\te\\n' | expand -t 3,7 | unexpand -t 3,7 | cat -T\na^Ib^Ic d e $ printf 'a\\tb\\tc\\td\\te\\tf\\n' | expand -t 3,/7\na b c d e f\n$ printf 'a b c d e f\\n' | unexpand -t 3,/7 | cat -T\na^Ib^Ic^Id^Ie^If $ printf 'a\\tb\\tc\\td\\te\\tf\\n' | expand -t 3,+7\na b c d e f\n$ printf 'a b c d e f\\n' | unexpand -t 3,+7 | cat -T\na^Ib^Ic^Id^Ie^If","breadcrumbs":"expand and unexpand » Change the tab stop width","id":"186","title":"Change the tab stop width"},"187":{"body":"info The exercises directory has all the files used in this section. 1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 ##### add your solution here\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 2) What does the expand -i option do? 3) Expand the first tab character to stop at the 10th column and the second one at the 16th column. Rest of the tabs should be converted to a single space character. 
$ printf 'app\\tfix\\tjoy\\tmap\\ttap\\n' | ##### add your solution here\napp fix joy map tap $ printf 'appleseed\\tfig\\tjoy\\n' | ##### add your solution here\nappleseed fig joy $ printf 'a\\tb\\tc\\td\\te\\n' | ##### add your solution here\na b c d e 4) Will the following code give back the original input? If not, is there an option that can help? $ printf 'a\\tb\\tc\\n' | expand | unexpand 5) How do the + and / prefix modifiers affect the -t option?","breadcrumbs":"expand and unexpand » Exercises","id":"187","title":"Exercises"},"188":{"body":"These handy commands allow you to extract filenames and directory portions of the given paths. You could also use Parameter Expansion or cut, sed, awk, etc for such purposes. The advantage is that these commands will also handle corner cases like trailing slashes and there are handy features like removing file extensions.","breadcrumbs":"basename and dirname » basename and dirname","id":"188","title":"basename and dirname"},"189":{"body":"By default, the basename command will remove the leading directory component from the given path argument. Any trailing slashes will be removed before determining the portion to be extracted. $ basename /home/learnbyexample/example_files/scores.csv\nscores.csv # quote the arguments when needed\n$ basename 'path with spaces/report.log'\nreport.log # one or more trailing slashes will not affect the output\n$ basename /home/learnbyexample/example_files/\nexample_files If there's no leading directory component or if slash alone is the input, the argument will be returned as is after removing any trailing slashes. $ basename filename.txt\nfilename.txt\n$ basename /\n/","breadcrumbs":"basename and dirname » Extract filename from paths","id":"189","title":"Extract filename from paths"},"19":{"body":"Here are some examples to showcase cat's main utility. One or more files can be passed as arguments. 
$ cat greeting.txt fruits.txt nums.txt\nHi there\nHave a nice day\nbanana\npapaya\nmango\n3.14\n42\n1000 info Visit the cli_text_processing_coreutils repo to get all the example files used in this book. To save the output of concatenation, use the shell's redirection features. $ cat greeting.txt fruits.txt nums.txt > op.txt $ cat op.txt\nHi there\nHave a nice day\nbanana\npapaya\nmango\n3.14\n42\n1000","breadcrumbs":"cat and tac » Concatenate files","id":"19","title":"Concatenate files"},"190":{"body":"You can use the -s option to remove a suffix from the filename. Usually used to remove the file extension. $ basename -s'.csv' /home/learnbyexample/example_files/scores.csv\nscores $ basename -s'_2' final_report.txt_2\nfinal_report.txt $ basename -s'.tar.gz' /backups/jan_2021.tar.gz\njan_2021 $ basename -s'.txt' purchases.txt.txt\npurchases.txt # -s will be ignored if it would have resulted in an empty output\n$ basename -s'report' /backups/report\nreport You can also pass the suffix to be removed after the path argument, but the -s option is preferred as it makes the intention clearer and works for multiple path arguments. $ basename example_files/scores.csv .csv\nscores","breadcrumbs":"basename and dirname » Remove file extension","id":"190","title":"Remove file extension"},"191":{"body":"By default, the dirname command removes the trailing path component (after removing any trailing slashes). $ dirname /home/learnbyexample/example_files/scores.csv\n/home/learnbyexample/example_files # one or more trailing slashes will not affect the output\n$ dirname /home/learnbyexample/example_files/\n/home/learnbyexample","breadcrumbs":"basename and dirname » Remove filename from path","id":"191","title":"Remove filename from path"},"192":{"body":"The dirname command accepts multiple path arguments by default. The basename command requires -a or -s (which implies -a) to work with multiple arguments. 
$ basename -a /backups/jan_2021.tar.gz /home/learnbyexample/report.log\njan_2021.tar.gz\nreport.log # -a is implied when the -s option is used\n$ basename -s'.txt' logs/purchases.txt logs/report.txt\npurchases\nreport # dirname accepts multiple path arguments by default\n$ dirname /home/learnbyexample/example_files/scores.csv ../report/backups/\n/home/learnbyexample/example_files\n../report","breadcrumbs":"basename and dirname » Multiple arguments","id":"192","title":"Multiple arguments"},"193":{"body":"You can use shell features like command substitution to combine the effects of the basename and dirname commands. # extract the second last path component\n$ basename $(dirname /home/learnbyexample/example_files/scores.csv)\nexample_files","breadcrumbs":"basename and dirname » Combining basename and dirname","id":"193","title":"Combining basename and dirname"},"194":{"body":"Use the -z option if you want to use NUL character as the output path separator. $ basename -zs'.txt' logs/purchases.txt logs/report.txt | cat -v\npurchases^@report^@ $ basename -z logs/purchases.txt | cat -v\npurchases.txt^@ $ dirname -z example_files/scores.csv ../report/backups/ | cat -v\nexample_files^@../report^@","breadcrumbs":"basename and dirname » NUL separator","id":"194","title":"NUL separator"},"195":{"body":"1) Is the following command valid? If so, what would be the output? $ basename -s.txt ~///test.txt/// 2) Given the file path in the shell variable p, how'd you obtain the outputs shown below? $ p='~/projects/square_tictactoe/python/game.py'\n##### add your solution here\n~/projects/square_tictactoe $ p='/backups/jan_2021.tar.gz'\n##### add your solution here\n/ 3) What would be the output of the basename command if the input has no leading directory component or only has the / character? 4) For the paths stored in the shell variable p, how'd you obtain the outputs shown below? 
$ p='/a/b/ip.txt /c/d/e/f/op.txt' # expected output 1\n##### add your solution here\nip\nop # expected output 2\n##### add your solution here\n/a/b\n/c/d/e/f 5) Given the file path in the shell variable p, how'd you obtain the outputs shown below? $ p='~/projects/python/square_tictactoe/game.py'\n##### add your solution here\nsquare_tictactoe $ p='/backups/aug_2024/ip.tar.gz'\n##### add your solution here\naug_2024","breadcrumbs":"basename and dirname » Exercises","id":"195","title":"Exercises"},"196":{"body":"Hope you've found this book interesting and useful. There are plenty of general purpose and specialized text processing tools. Here's a list of books I've written: CLI text processing with GNU grep and ripgrep CLI text processing with GNU sed CLI text processing with GNU awk Ruby One-Liners Guide Perl One-Liners Guide Command line text processing with Rust tools See also my curated list of resources on Linux CLI and Shell scripting .","breadcrumbs":"What next? » What next?","id":"196","title":"What next?"},"197":{"body":"","breadcrumbs":"Exercise solutions » Exercise solutions","id":"197","title":"Exercise solutions"},"198":{"body":"1) The given sample data has empty lines at the start and end of the input. Also, there are multiple empty lines between the paragraphs. How would you get the output shown below? # note that there's an empty line at the end of the output\n$ printf '\\n\\n\\ndragon\\n\\n\\n\\nunicorn\\nbee\\n\\n\\n' | cat -sb 1 dragon 2 unicorn 3 bee 2) Pass appropriate arguments to the cat command to get the output shown below. $ cat greeting.txt\nHi there\nHave a nice day $ echo '42 apples and 100 bananas' | cat - greeting.txt\n42 apples and 100 bananas\nHi there\nHave a nice day 3) What does the -v option of the cat command do? Displays nonprinting characters using the caret notation. 4) Which options of the cat command do the following stand in for? 
-e option is equivalent to -vE -t option is equivalent to -vT -A option is equivalent to -vET 5) Will the two commands shown below produce the same output? If not, why not? $ cat fruits.txt ip.txt | tac $ tac fruits.txt ip.txt No. The first command concatenates the input files before reversing the content linewise. With the second command, each file content will be reversed separately. 6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator. $ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello $ tac -bs '----' blocks.txt\n----\nhi hello\n----\nsky blue\ndark green\n----\n3.14\n-42\n1000\n----\napple--banana\nmango---fig 7) For the blocks.txt file, write solutions to display only the last such group and last two groups. # can also use: tac -bs '----' blocks.txt | awk '/----/ && ++c==2{exit} 1'\n$ tac blocks.txt | sed '/----/q' | tac\n----\nhi hello $ tac -bs '----' blocks.txt | awk '/----/ && ++c==3{exit} 1' | tac -bs '----'\n----\nsky blue\ndark green\n----\nhi hello 8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as the separator. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ tac -brs '^[0-9]' items.txt\n3) magical beasts\ndragon 3\nunicorn 42\n2) colors\ngreen\nsky blue\n1) fruits\napple 5\nbanana 10","breadcrumbs":"Exercise solutions » cat and tac","id":"198","title":"cat and tac"},"199":{"body":"1) Use appropriate commands and shell features to get the output shown below. $ printf 'carpet\\njeep\\nbus\\n'\ncarpet\njeep\nbus # use the above 'printf' command for input data\n$ c=$(printf 'carpet\\njeep\\nbus\\n' | head -c3)\n$ echo \"$c\"\ncar 2) How would you display all the input lines except the first one? 
$ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | tail -n +2\nfig\ncarpet\njeep\nbus 3) Which command would you use to get the output shown below? $ cat fruits.txt\nbanana\npapaya\nmango\n$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello $ head -n2 fruits.txt blocks.txt\n==> fruits.txt <==\nbanana\npapaya ==> blocks.txt <==\n----\napple--banana 4) Use a combination of head and tail commands to get the 11th to 14th characters from the given input. # can also use: tail -c +11 | head -c4\n$ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | head -c14 | tail -c +11\ncarp 5) Extract the starting six bytes from the input files ip.txt and fruits.txt. $ head -q -c6 ip.txt fruits.txt\nit is banana 6) Extract the last six bytes from the input files fruits.txt and ip.txt. $ tail -q -c6 fruits.txt ip.txt\nmango\nerish 7) For the input file ip.txt, display except the last 5 lines. $ head -n -5 ip.txt\nit is a warm and cozy day\nlisten to what I say\ngo play in the park\ncome back before the sky turns dark 8) Display the third line from the given stdin data. Consider the NUL character as the line separator. $ printf 'apple\\0fig\\0carpet\\0jeep\\0bus\\0' | head -z -n3 | tail -z -n1\ncarpet","breadcrumbs":"Exercise solutions » head and tail","id":"199","title":"head and tail"},"2":{"body":"You can buy the pdf/epub versions of the book using these links: https://learnbyexample.gumroad.com/l/cli_coreutils https://leanpub.com/cli_coreutils","breadcrumbs":"Buy PDF/EPUB versions » Purchase links","id":"2","title":"Purchase links"},"20":{"body":"You can represent the stdin data using - as a file argument. If the file arguments are not present, cat will read the stdin data if present or wait for interactive input as seen earlier. 
# only stdin (- is optional in this case)\n$ echo 'apple banana cherry' | cat\napple banana cherry # both stdin and file arguments\n$ echo 'apple banana cherry' | cat greeting.txt -\nHi there\nHave a nice day\napple banana cherry # here's an example without a newline character at the end of the first input\n$ printf 'Some\\nNumbers' | cat - nums.txt\nSome\nNumbers3.14\n42\n1000","breadcrumbs":"cat and tac » Accepting stdin data","id":"20","title":"Accepting stdin data"},"200":{"body":"1) What's wrong with the following command? $ echo 'apple#banana#cherry' | tr # :\ntr: missing operand\nTry 'tr --help' for more information. $ echo 'apple#banana#cherry' | tr '#' ':'\napple:banana:cherry As a good practice, always quote the arguments passed to the tr command to avoid conflict with shell metacharacters. Unless of course, you need the shell to interpret them. 2) Retain only alphabets, digits and whitespace characters. $ printf 'Apple_42 cool,blue\\tDragon:army\\n' | tr -dc '[:alnum:][:space:]'\nApple42 coolblue Dragonarmy 3) Similar to rot13, figure out a way to shift digits such that the same logic can be used both ways. $ echo '4780 89073' | tr '0-9' '5-90-4'\n9235 34528 $ echo '9235 34528' | tr '0-9' '5-90-4'\n4780 89073 4) Figure out the logic based on the given input and output data. Hint: use two ranges for the first set and only 6 characters in the second set. $ echo 'apple banana cherry damson etrog' | tr 'a-ep-z' '12345X'\n1XXl5 21n1n1 3h5XXX 41mXon 5XXog 5) Which option would you use to truncate the first set so that it matches the length of the second set? The -t option is needed for this. 6) What does the * notation do in the second set? The [c*n] notation repeats a character c by n times. You can specify n in decimal or octal formats. If n is omitted, the character c is repeated as many times as needed to equalize the length of the sets. 7) Change : to - and ; to the newline character. 
$ echo 'tea:coffee;brown:teal;dragon:unicorn' | tr ':;' '-\\n'\ntea-coffee\nbrown-teal\ndragon-unicorn 8) Convert all characters to * except digit and newline characters. $ echo 'ajsd45_sdg2Khnf4v_54as' | tr -c '0-9\\n' '*'\n****45****2****4**54** 9) Change consecutive repeated punctuation characters to a single punctuation character. $ echo '\"\"hi...\"\", good morning!!!!' | tr -s '[:punct:]'\n\"hi.\", good morning! 10) Figure out the logic based on the given input and output data. $ echo 'Aapple noon banana!!!!!' | tr -cs 'a-z\\n' ':'\n:apple:noon:banana: 11) The books.txt file has items separated by one or more : characters. Change this separator to a single newline character as shown below. $ cat books.txt\nCradle:::Mage Errant::The Weirkey Chronicles\nMother of Learning::Eight:::::Dear Spellbook:Ascendant\nMark of the Fool:Super Powereds:::Ends of Magic $ to separate the output columns. $ comm -2 --output-delimiter='==>' s1.txt s2.txt\napple\n==>coffee\n==>fig\n==>honey\nmango\npasta\nsugar\n==>tea 4) What does the --total option do? Gives you the count of lines for each of the three columns. 5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below? The number of duplicate lines in the common column will be minimum of the duplicate occurrences between the two files. Rest of the duplicate lines, if any, will be considered as unique to the file having the excess lines. $ cat s3.txt\napple\napple\nguava\nhoney\ntea\ntea\ntea $ comm -23 s3.txt s1.txt\napple\nguava\ntea\ntea","breadcrumbs":"Exercise solutions » comm","id":"209","title":"comm"},"21":{"body":"As mentioned before, cat provides many features beyond concatenation. Consider this sample stdin data: $ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n\\n\\n\\n\\n\\napple\\n'\nhello world have a nice day apple You can use the -s option to squeeze consecutive empty lines to a single empty line. 
If present, leading and trailing empty lines will also be squeezed (won't be completely removed). You can modify the below example to test it out. $ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n\\n\\n\\n\\n\\napple\\n' | cat -s\nhello world have a nice day apple","breadcrumbs":"cat and tac » Squeeze consecutive empty lines","id":"21","title":"Squeeze consecutive empty lines"},"210":{"body":"info Assume that the input files are already sorted for these exercises. 1) Use appropriate options to get the expected outputs shown below. # no output\n$ join <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4') # expected output 1\n$ join -i <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4')\nfig 5 10 # expected output 2\n$ join -i -a1 -a2 <(printf 'apple 2\\nfig 5') <(printf 'Fig 10\\nmango 4')\napple 2\nfig 5 10\nmango 4 2) Use the join command to display only the non-matching lines based on the first field. $ cat j1.txt\napple 2\nfig 5\nlemon 10\ntomato 22\n$ cat j2.txt\nalmond 33\nfig 115\nmango 20\npista 42 # first field items present in j1.txt but not j2.txt\n$ join -v1 j1.txt j2.txt\napple 2\nlemon 10\ntomato 22 # first field items present in j2.txt but not j1.txt\n$ join -v2 j1.txt j2.txt\nalmond 33\nmango 20\npista 42 3) Filter lines from j1.txt and j2.txt that match the items from s1.txt. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea # note that sort -m is used since the input files are already sorted\n$ join s1.txt <(sort -m j1.txt j2.txt)\napple 2\nfig 115\nfig 5\nmango 20 4) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. 
$ cat marks_1.csv\nName,Biology,Programming\nEr,92,77\nIth,100,100\nLin,92,100\nSil,86,98\n$ cat marks_2.csv\nName,Maths,Physics,Chemistry\nCy,97,98,95\nIth,100,100,100\nLin,78,83,80 $ join -t, --header marks_1.csv marks_2.csv\nName,Biology,Programming,Maths,Physics,Chemistry\nIth,100,100,100,100,100\nLin,92,100,78,83,80 5) By default, the first field is used to combine the lines. Which options are helpful if you want to change the key field to be used for joining? You can use -1 and -2 options followed by a field number to specify a different field number. You can use the -j option if the field number is the same for both the files. 6) Join the marks_1.csv and marks_2.csv files to get the expected output with specific fields as shown below. $ join -t, --header -o 1.1,1.3,2.2,1.2 marks_1.csv marks_2.csv\nName,Programming,Maths,Biology\nIth,100,100,100\nLin,100,78,92 7) Join the marks_1.csv and marks_2.csv files to get the expected output shown below. Use 50 as the filler data. $ join -t, --header -o auto -a1 -a2 -e '50' marks_1.csv marks_2.csv\nName,Biology,Programming,Maths,Physics,Chemistry\nCy,50,50,97,98,95\nEr,92,77,50,50,50\nIth,100,100,100,100,100\nLin,92,100,78,83,80\nSil,86,98,50,50,50 8) When you use the -o auto option, what'd happen to the extra fields compared to those in the first lines of the input data? If you use auto as the argument for the -o option, first line of both the input files will be used to determine the number of output fields. If the other lines have extra fields, they will be discarded. 9) From the input files j3.txt and j4.txt, filter only the lines are unique — i.e. lines that are not common to these files. Assume that the input files do not have duplicate entries. 
$ cat j3.txt\nalmond\napple pie\ncold coffee\nhoney\nmango shake\npasta\nsugar\ntea\n$ cat j4.txt\napple\nbanana shake\ncoffee\nfig\nhoney\nmango shake\nmilk\ntea\nyeast $ join -t '' -v1 -v2 j3.txt j4.txt\nalmond\napple\napple pie\nbanana shake\ncoffee\ncold coffee\nfig\nmilk\npasta\nsugar\nyeast 10) From the input files j3.txt and j4.txt, filter only the lines are common to these files. $ join -t '' j3.txt j4.txt\nhoney\nmango shake\ntea","breadcrumbs":"Exercise solutions » join","id":"210","title":"join"},"211":{"body":"1) nl and cat -n are always equivalent for numbering lines. True or False? True if there are no empty lines in the input data. cat -b and nl are always equivalent. 2) What does the -n option do? You can use the -n option to customize the number formatting. The available styles are: rn right justified with space fillers (default) rz right justified with leading zeros ln left justified with space fillers 3) Use nl to produce the two expected outputs shown below. $ cat greeting.txt\nHi there\nHave a nice day # expected output 1\n$ nl -w3 -n'rz' greeting.txt\n001 Hi there\n002 Have a nice day # expected output 2\n$ nl -w3 -n'rz' -s') ' greeting.txt\n001) Hi there\n002) Have a nice day 4) Figure out the logic based on the given input and output data. $ cat s1.txt\napple\ncoffee\nfig\nhoney\nmango\npasta\nsugar\ntea $ nl -w2 -s'. ' -v15 -i-2 s1.txt\n15. apple\n13. coffee\n11. fig 9. honey 7. mango 5. pasta 3. sugar 1. tea 5) What are the three types of sections supported by nl? nl recognizes three types of sections with the following default patterns: \\:\\:\\: as header \\:\\: as body \\: as footer These special lines will be replaced with an empty line after numbering. The numbering will be reset at the start of every section unless the -p option is used. 6) Only number the lines that start with ---- in the format shown below. 
$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello $ nl -w2 -s') ' -bp'^----' blocks.txt 1) ---- apple--banana mango---fig 2) ---- 3.14 -42 1000 3) ---- sky blue dark green 4) ---- hi hello 7) For the blocks.txt file, determine the logic to produce the expected output shown below. $ nl -w1 -s'. ' -d'--' blocks.txt 1. apple--banana\n2. mango---fig 1. 3.14\n2. -42\n3. 1000 1. sky blue\n2. dark green 1. hi hello 8) What does the -l option do? The -l option controls how many consecutive empty lines should be considered as a single entry. Only the last empty line of such groupings will be numbered. 9) Figure out the logic based on the given input and output data. $ cat all_sections.txt\n\\:\\:\\:\nHeader\nteal\n\\:\\:\nHi there\nHow are you\n\\:\\:\nbanana\npapaya\nmango\n\\:\nFooter $ nl -p -w2 -s') ' -ha all_sections.txt 1) Header 2) teal 3) Hi there 4) How are you 5) banana 6) papaya 7) mango Footer","breadcrumbs":"Exercise solutions » nl","id":"211","title":"nl"},"212":{"body":"1) Save the number of lines in the greeting.txt input file to the lines shell variable. $ lines=$(wc -l xaa <==\napple\ncoffee\nfig ==> xab <==\nhoney\nmango\npasta ==> xac <==\nsugar\ntea $ rm xa? 2) Use appropriate options to get the output shown below. $ echo 'apple,banana,cherry,dates' | split -t, -l1 $ head xa?\n==> xaa <==\napple,\n==> xab <==\nbanana,\n==> xac <==\ncherry,\n==> xad <==\ndates $ rm xa? 3) What do the -b and -C options do? The -b option allows you to split the input by the number of bytes. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on. The -C option is similar to the -b option, but it will try to break on line boundaries if possible. The break will happen before the given byte limit. If a line exceeds the given limit, it will be broken down into multiple parts. 
4) Display the 2nd chunk of the ip.txt file after splitting it 4 times as shown below. $ split -nl/2/4 ip.txt\ncome back before the sky turns dark There are so many delights to cherish 5) What does the r prefix do when used with the -n option? This creates output files with interleaved lines. 6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below. $ split -l2 -a1 -d --additional-suffix='.txt' ip.txt ip_ $ head ip_*\n==> ip_0.txt <==\nit is a warm and cozy day\nlisten to what I say ==> ip_1.txt <==\ngo play in the park\ncome back before the sky turns dark ==> ip_2.txt <== There are so many delights to cherish ==> ip_3.txt <==\nApple, Banana and Cherry\nBread, Butter and Jelly ==> ip_4.txt <==\nTry them all before you perish $ rm ip_* 7) Which option would you use to prevent empty files in the output? The -e option prevents empty files in the output. 8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ split -l5 --filter='grep -v \"^[0-9]\" > $FILE' items.txt $ head xa?\n==> xaa <==\napple 5\nbanana 10\ngreen ==> xab <==\nsky blue\ndragon 3\nunicorn 42 $ rm xa?","breadcrumbs":"Exercise solutions » split","id":"213","title":"split"},"214":{"body":"info Remove the output files after every exercise. 1) Split the blocks.txt file such that the first 7 lines are in the first file and the rest are in the second file as shown below. $ csplit -q blocks.txt 8 $ head xx*\n==> xx00 <==\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000 ==> xx01 <==\n----\nsky blue\ndark green\n----\nhi hello $ rm xx* 2) Split the input file items.txt such that the text before a line containing colors is part of the first file and the rest are part of the second file as shown below. 
$ csplit -q items.txt '/colors/' $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10 ==> xx01 <==\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 3) Split the input file items.txt such that the line containing magical and all the lines that come after are part of the single output file. $ csplit -q items.txt '%magical%' $ cat xx00\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 4) Split the input file items.txt such that the line containing colors as well the line that comes after are part of the first output file. $ csplit -q items.txt '/colors/2' $ head xx*\n==> xx00 <==\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen ==> xx01 <==\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx* 5) Split the input file items.txt on the line that comes before a line containing magical. Generate only a single output file as shown below. $ csplit -q items.txt '%magical%-1' $ cat xx00\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ rm xx00 6) Split the input file blocks.txt on the 4th occurrence of a line starting with the - character. Generate only a single output file as shown below. $ csplit -q blocks.txt '%^-%' '{3}' $ cat xx00\n----\nsky blue\ndark green\n----\nhi hello $ rm xx00 7) For the input file blocks.txt, determine the logic to produce the expected output shown below. $ csplit -qz --suppress-matched blocks.txt '/----/' '{*}' $ head xx*\n==> xx00 <==\napple--banana\nmango---fig ==> xx01 <==\n3.14\n-42\n1000 ==> xx02 <==\nsky blue\ndark green ==> xx03 <==\nhi hello $ rm xx* 8) What does the -k option do? By default, csplit will remove the created output files if there's an error or a signal that causes the command to stop. You can use the -k option to keep such files. One use case is line number based splitting with the {*} modifier. 
$ seq 7 | csplit -q - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ ls xx*\nls: cannot access 'xx*': No such file or directory # -k option will allow you to retain the created files\n$ seq 7 | csplit -qk - 4 '{*}'\ncsplit: ‘4’: line number out of range on repetition 1\n$ head xx*\n==> xx00 <==\n1\n2\n3 ==> xx01 <==\n4\n5\n6\n7 $ rm xx* 9) Split the books.txt file on every line as shown below. # can also use: split -l1 -d -a1 books.txt row_\n$ csplit -qkz -f'row_' -n1 books.txt 1 '{*}'\ncsplit: ‘1’: line number out of range on repetition 3 $ head row_*\n==> row_0 <==\nCradle:::Mage Errant::The Weirkey Chronicles ==> row_1 <==\nMother of Learning::Eight:::::Dear Spellbook:Ascendant ==> row_2 <==\nMark of the Fool:Super Powereds:::Ends of Magic $ rm row_* 10) Split the items.txt file on lines starting with a digit character. Matching lines shouldn't be part of the output and the files should be named group_0.txt, group_1.txt and so on. $ csplit -qz --suppress-matched -q -f'group_' -b'%d.txt' items.txt '/^[0-9]/' '{*}' $ head group_*\n==> group_0.txt <==\napple 5\nbanana 10 ==> group_1.txt <==\ngreen\nsky blue ==> group_2.txt <==\ndragon 3\nunicorn 42 $ rm group_*","breadcrumbs":"Exercise solutions » csplit","id":"214","title":"csplit"},"215":{"body":"1) The items.txt file has space separated words. Convert the spaces to be aligned at 10 column widths as shown below. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 $ warning 1\na,b,c,d\n42\n--> warning 2\nx,y,z\n--> warning 3\n4,3,1 $ tac log.txt | grep -m1 'warning'\n--> warning 3 $ tac log.txt | sed '/warning/q' | tac\n--> warning 3\n4,3,1 In the above example, log.txt has multiple lines containing warning. The task is to fetch lines based on the last match, which isn't usually supported by CLI tools. Matching the first occurrence is easy with tools like grep and sed. 
Hence, tac is helpful to reverse the condition from the last match to the first match. After processing with tools like sed, the result is then reversed again to get back the original order of input lines. Another benefit is that the first tac command will stop reading the input contents after the match is found. info Use the rev command if you want each input line to be reversed character wise.","breadcrumbs":"cat and tac » tac","id":"25","title":"tac"},"26":{"body":"By default, the newline character is used to split the input content into lines . You can use the -s option to specify a different string to be used as the separator. # use NUL as the line separator\n# -s $'\\0' can also be used instead of -s '' if ANSI-C quoting is supported\n$ printf 'car\\0jeep\\0bus\\0' | tac -s '' | cat -v\nbus^@jeep^@car^@ # as seen before, the last entry should also have the separator\n# otherwise it won't be present in the output\n$ printf 'apple banana cherry' | tac -s ' ' | cat -e\ncherrybanana apple $\n$ printf 'apple banana cherry ' | tac -s ' ' | cat -e\ncherry banana apple $ When the custom separator occurs before the content of interest, use the -b option to print those separators before the content in the output as well. $ cat body_sep.txt\n%=%=\napple\nbanana\n%=%=\nteal\ngreen $ tac -b -s '%=%=' body_sep.txt\n%=%=\nteal\ngreen\n%=%=\napple\nbanana The separator will be treated as a regular expression if you use the -r option as well. 
$ cat shopping.txt\napple 50\ntoys 5\nPizza 2\nmango 25\nBanana 10 # separator character is 'a' or 'm' at the start of a line\n$ tac -b -rs '^[am]' shopping.txt\nmango 25\nBanana 10\napple 50\ntoys 5\nPizza 2 # alternate solution for: tac log.txt | sed '/warning/q' | tac\n# separator is zero or more characters from the start of a line till 'warning'\n$ tac -b -rs '^.*warning' log.txt | awk '/warning/ && ++c==2{exit} 1'\n--> warning 3\n4,3,1 info See Regular Expressions chapter from my GNU grep ebook if you want to learn about regexp syntax and features.","breadcrumbs":"cat and tac » Customize line separator for tac","id":"26","title":"Customize line separator for tac"},"27":{"body":"info All the exercises are also collated together in one place at Exercises.md . For solutions, see Exercise_solutions.md . info The exercises directory has all the files used in this section. 1) The given sample data has empty lines at the start and end of the input. Also, there are multiple empty lines between the paragraphs. How would you get the output shown below? # note that there's an empty line at the end of the output\n$ printf '\\n\\n\\ndragon\\n\\n\\n\\nunicorn\\nbee\\n\\n\\n' | ##### add your solution here 1 dragon 2 unicorn 3 bee 2) Pass appropriate arguments to the cat command to get the output shown below. $ cat greeting.txt\nHi there\nHave a nice day $ echo '42 apples and 100 bananas' | cat ##### add your solution here\n42 apples and 100 bananas\nHi there\nHave a nice day 3) What does the -v option of the cat command do? 4) Which options of the cat command do the following stand in for? -e option is equivalent to -t option is equivalent to -A option is equivalent to 5) Will the two commands shown below produce the same output? If not, why not? $ cat fruits.txt ip.txt | tac $ tac fruits.txt ip.txt 6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator. 
$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello ##### add your solution here\n----\nhi hello\n----\nsky blue\ndark green\n----\n3.14\n-42\n1000\n----\napple--banana\nmango---fig 7) For the blocks.txt file, write solutions to display only the last such group and last two groups. ##### add your solution here\n----\nhi hello ##### add your solution here\n----\nsky blue\ndark green\n----\nhi hello 8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as the separator. $ cat items.txt\n1) fruits\napple 5\nbanana 10\n2) colors\ngreen\nsky blue\n3) magical beasts\ndragon 3\nunicorn 42 ##### add your solution here\n3) magical beasts\ndragon 3\nunicorn 42\n2) colors\ngreen\nsky blue\n1) fruits\napple 5\nbanana 10","breadcrumbs":"cat and tac » Exercises","id":"27","title":"Exercises"},"28":{"body":"cat is useful to view entire contents of files. Pagers like less can be used if you are working with large files (man pages for example). Sometimes though, you just want a peek at the starting or ending lines of input files. Or, you know the line numbers for the information you are looking for. In such cases, you can use head or tail or a combination of both these commands to extract the content you want.","breadcrumbs":"head and tail » head and tail","id":"28","title":"head and tail"},"29":{"body":"Consider this sample file, with line numbers prefixed for convenience. $ cat sample.txt 1) Hello World 2) 3) Hi there 4) How are you 5) 6) Just do-it 7) Believe it 8) 9) banana\n10) papaya\n11) mango\n12) 13) Much ado about nothing\n14) He he he\n15) Adios amigo By default, head and tail will display the first and last 10 lines respectively. 
$ head sample.txt 1) Hello World 2) 3) Hi there 4) How are you 5) 6) Just do-it 7) Believe it 8) 9) banana\n10) papaya $ tail sample.txt 6) Just do-it 7) Believe it 8) 9) banana\n10) papaya\n11) mango\n12) 13) Much ado about nothing\n14) He he he\n15) Adios amigo If there are less than 10 lines in the input, only those lines will be displayed. # seq command will be discussed in detail later, generates 1 to 3 here\n# same as: seq 3 | tail\n$ seq 3 | head\n1\n2\n3 You can use the -nN option to customize the number of lines (N) needed. # first three lines\n# space between -n and N is optional\n$ head -n3 sample.txt 1) Hello World 2) 3) Hi there # last two lines\n$ tail -n2 sample.txt\n14) He he he\n15) Adios amigo","breadcrumbs":"head and tail » Leading and trailing lines","id":"29","title":"Leading and trailing lines"},"3":{"body":"You can also get the book as part of these bundles: All books bundle https://leanpub.com/b/learnbyexample-all-books https://learnbyexample.gumroad.com/l/all-books Linux CLI Text Processing https://leanpub.com/b/linux-cli-text-processing https://learnbyexample.gumroad.com/l/linux-cli-text-processing","breadcrumbs":"Buy PDF/EPUB versions » Bundles","id":"3","title":"Bundles"},"30":{"body":"By using head -n -N, you can get all the input lines except the ones you'll get when you use the tail -nN command. # except the last 11 lines\n# space between -n and -N is optional\n$ head -n -11 sample.txt 1) Hello World 2) 3) Hi there 4) How are you","breadcrumbs":"head and tail » Excluding the last N lines","id":"30","title":"Excluding the last N lines"},"31":{"body":"By using tail -n +N, you can get all the input lines except the ones you'll get when you use the head -n(N-1) command. 
# all lines starting from the 11th line\n# space between -n and +N is optional\n$ tail -n +11 sample.txt\n11) mango\n12) 13) Much ado about nothing\n14) He he he\n15) Adios amigo","breadcrumbs":"head and tail » Starting from the Nth line","id":"31","title":"Starting from the Nth line"},"32":{"body":"If you pass multiple input files to the head and tail commands, each file will be processed separately. By default, the output is nicely formatted with filename headers and empty line separators. $ seq 2 | head -n1 greeting.txt -\n==> greeting.txt <==\nHi there ==> standard input <==\n1 You can use the -q option to avoid filename headers and empty line separators. $ tail -q -n2 sample.txt nums.txt\n14) He he he\n15) Adios amigo\n42\n1000","breadcrumbs":"head and tail » Multiple input files","id":"32","title":"Multiple input files"},"33":{"body":"The -c option works similar to the -n option, but with bytes instead of lines. In the below examples, the shell prompt at the end of the output aren't shown for illustration purposes. # first three characters\n$ printf 'apple pie' | head -c3\napp # last three characters\n$ printf 'apple pie' | tail -c3\npie # excluding the last four characters\n$ printf 'car\\njeep\\nbus\\n' | head -c -4\ncar\njeep # all characters starting from the fifth character\n$ printf 'car\\njeep\\nbus\\n' | tail -c +5\njeep\nbus Since -c works byte wise, it may not be suitable for multibyte characters: # all input characters in this example occupy two bytes each\n$ printf 'αλεπού' | head -c2\nα # g̈ requires three bytes\n$ printf 'cag̈e' | tail -c4\ng̈e","breadcrumbs":"head and tail » Byte selection","id":"33","title":"Byte selection"},"34":{"body":"You can select a range of lines by combining both the head and tail commands. 
# 9th to 11th lines\n# same as: head -n11 sample.txt | tail -n +9\n$ tail -n +9 sample.txt | head -n3 9) banana\n10) papaya\n11) mango # 6th to 7th lines\n# same as: tail -n +6 sample.txt | head -n2\n$ head -n7 sample.txt | tail -n +6 6) Just do-it 7) Believe it info See unix.stackexchange: line X to line Y on a huge file for performance comparison with other commands like sed, awk, etc.","breadcrumbs":"head and tail » Range of lines","id":"34","title":"Range of lines"},"35":{"body":"The -z option sets the NUL character as the line separator instead of the newline character. $ printf 'car\\0jeep\\0bus\\0' | head -z -n2 | cat -v\ncar^@jeep^@ $ printf 'car\\0jeep\\0bus\\0' | tail -z -n2 | cat -v\njeep^@bus^@","breadcrumbs":"head and tail » NUL separator","id":"35","title":"NUL separator"},"36":{"body":"wikipedia: File monitoring with tail -f and -F options toolong — terminal application to view, tail, merge, and search log files unix.stackexchange: How does the tail -f option work? How to deal with output buffering?","breadcrumbs":"head and tail » Further Reading","id":"36","title":"Further Reading"},"37":{"body":"info The exercises directory has all the files used in this section. 1) Use appropriate commands and shell features to get the output shown below. $ printf 'carpet\\njeep\\nbus\\n'\ncarpet\njeep\nbus # use the above 'printf' command for input data\n$ c=##### add your solution here\n$ echo \"$c\"\ncar 2) How would you display all the input lines except the first one? $ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | ##### add your solution here\nfig\ncarpet\njeep\nbus 3) Which command would you use to get the output shown below? 
$ cat fruits.txt\nbanana\npapaya\nmango\n$ cat blocks.txt\n----\napple--banana\nmango---fig\n----\n3.14\n-42\n1000\n----\nsky blue\ndark green\n----\nhi hello ##### add your solution here\n==> fruits.txt <==\nbanana\npapaya ==> blocks.txt <==\n----\napple--banana 4) Use a combination of head and tail commands to get the 11th to 14th characters from the given input. $ printf 'apple\\nfig\\ncarpet\\njeep\\nbus\\n' | ##### add your solution here\ncarp 5) Extract the starting six bytes from the input files ip.txt and fruits.txt. ##### add your solution here\nit is banana 6) Extract the last six bytes from the input files fruits.txt and ip.txt. ##### add your solution here\nmango\nerish 7) For the input file ip.txt, display except the last 5 lines. ##### add your solution here\nit is a warm and cozy day\nlisten to what I say\ngo play in the park\ncome back before the sky turns dark 8) Display the third line from the given stdin data. Consider the NUL character as the line separator. $ printf 'apple\\0fig\\0carpet\\0jeep\\0bus\\0' | ##### add your solution here\ncarpet","breadcrumbs":"head and tail » Exercises","id":"37","title":"Exercises"},"38":{"body":"tr helps you to map one set of characters to another set of characters. Features like range, repeats, character sets, squeeze, complement, etc makes it a must know text processing tool. To be precise, tr can handle only bytes. Multibyte character processing isn't supported yet.","breadcrumbs":"tr » tr","id":"38","title":"tr"},"39":{"body":"Here are some examples that map one set of characters to another. As a good practice, always enclose the sets in single quotes to avoid issues due to shell metacharacters. 
# 'l' maps to '1', 'e' to '3', 't' to '7' and 's' to '5'\n$ echo 'leet speak' | tr 'lets' '1375'\n1337 5p3ak # example with shell metacharacters\n$ echo 'apple;banana;cherry' | tr ; :\ntr: missing operand\nTry 'tr --help' for more information.\n$ echo 'apple;banana;cherry' | tr ';' ':'\napple:banana:cherry You can use - between two characters to construct a range (ascending order only). # uppercase to lowercase\n$ echo 'HELLO WORLD' | tr 'A-Z' 'a-z'\nhello world # swap case\n$ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z'\nhELLO wORLD # rot13\n$ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M'\nUryyb Jbeyq\n$ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M'\nHello World tr works only on stdin data, so use shell input redirection for file inputs. $ tr 'a-z' 'A-Z' e\n$ paste -d' : - ' <(seq 3) e e <(seq 4 6) e e <(seq 7 9)\n1 : 4 - 7\n2 : 5 - 8\n3 : 6 - 9","breadcrumbs":"paste » Multicharacter delimiters","id":"78","title":"Multicharacter delimiters"},"79":{"body":"The -s option allows you to combine all the input lines from a file into a single line using the given delimiter. paste will ensure to add a final newline character even if it wasn't present in the input. # this will give you a trailing comma\n# and there won't be a newline character at the end\n$ seq - CLI text processing with GNU Coreutils

seq

The seq command is a handy tool to generate a sequence of numbers in ascending or descending order. Both integer and floating-point numbers are supported. You can also customize the formatting for numbers and the separator between them.

Integer sequences

You need three numbers to generate an arithmetic progression — start, step and stop. When you pass only a single number as the stop value, the default start and step values are assumed to be 1.

# start=1, step=1 and stop=3
+$ seq 3
+1
+2
+3
+

When you pass two numbers, they are considered as the start and stop values (in that order).

# start=25434, step=1 and stop=25437
+$ seq 25434 25437
+25434
+25435
+25436
+25437
+
+# start=-5, step=1 and stop=-3
+$ seq -5 -3
+-5
+-4
+-3
+

When you want to specify all the three numbers, the order is start, step and stop.

# start=1000, step=5 and stop=1010
+$ seq 1000 5 1010
+1000
+1005
+1010
+

By using a negative step value, you can generate sequences in descending order.

# no output
+$ seq 3 1
+
+# need to explicitly use a negative step value
+$ seq 3 -1 1
+3
+2
+1
+
+$ seq 5 -5 -10
+5
+0
+-5
+-10
+

Floating-point sequences

Since the default start and step values are both 1, you need to change at least one of them to get floating-point sequences.

$ seq 0.5 3
+0.5
+1.5
+2.5
+
+$ seq 0.25 0.33 1.12
+0.25
+0.58
+0.91
+

E-scientific notation is also supported.

$ seq 1.2e2 1.22e2
+120
+121
+122
+
+$ seq 1.2e2 0.752 1.22e2
+120.000
+120.752
+121.504
+

Customizing separator

You can use the -s option to change the separator between the numbers of a sequence. Multiple characters are allowed. Depending on your shell you can use ANSI-C quoting to use escapes like \t instead of a literal tab character. A newline is always added at the end of the output.

$ seq -s' ' 4
+1 2 3 4
+
+$ seq -s: -2 0.75 3
+-2.00:-1.25:-0.50:0.25:1.00:1.75:2.50
+
+$ seq -s' - ' 4
+1 - 2 - 3 - 4
+
+$ seq -s$'\n\n' 3
+1
+
+2
+
+3
+

Leading zeros

By default, the output will not have leading zeros, even if they are part of the numbers passed to the command.

$ seq 008 010
+8
+9
+10
+

The -w option will equalize the width of the output numbers using leading zeros. The largest width between the start and stop values will be used.

$ seq -w 8 10
+08
+09
+10
+
+$ seq -w 0002
+0001
+0002
+

printf style formatting

You can use the -f option for printf style floating-point number formatting. See bash manual: printf for more details on formatting options.

$ seq -f'%g' -s: 1 0.75 3
+1:1.75:2.5
+
+$ seq -f'%.4f' -s: 1 0.75 3
+1.0000:1.7500:2.5000
+
+$ seq -f'%.3e' 1.2e2 0.752 1.22e2
+1.200e+02
+1.208e+02
+1.215e+02
+

Limitations

As per the manual:

On most systems, seq can produce whole-number output for values up to at least 2^53. Larger integers are approximated. The details differ depending on your floating-point implementation.

# example with approximate values
+$ seq 100000000000000000000 3333 100000000000000010000
+100000000000000000000
+100000000000000003336
+100000000000000006664
+100000000000000010000
+

However, when limited to non-negative whole numbers, an increment of less than 200, and no format-specifying option, seq can print arbitrarily large numbers.

# no approximation for smaller step values
+$ seq 100000000000000000000000000000 100000000000000000000000000005
+100000000000000000000000000000
+100000000000000000000000000001
+100000000000000000000000000002
+100000000000000000000000000003
+100000000000000000000000000004
+100000000000000000000000000005
+

Exercises

info The exercises directory has all the files used in this section.

1) Generate numbers from 42 to 45 in ascending order.

##### add your solution here
+42
+43
+44
+45
+

2) Why does the command shown below produce no output?

# no output
+$ seq 45 42
+
+# expected output
+##### add your solution here
+45
+44
+43
+42
+

3) Generate numbers from 25 to 10 in descending order, with a step value of 5.

##### add your solution here
+25
+20
+15
+10
+

4) Is it possible to generate the sequence shown below using seq? If so, how?

##### add your solution here
+01.5,02.5,03.5,04.5,05.5
+

5) Modify the command shown below to customize the output numbering format.

$ seq 30.14 3.36 40.72
+30.14
+33.50
+36.86
+40.22
+
+##### add your solution here
+3.014e+01
+3.350e+01
+3.686e+01
+4.022e+01
+
\ No newline at end of file diff --git a/shuf.html b/shuf.html new file mode 100644 index 0000000..2645b02 --- /dev/null +++ b/shuf.html @@ -0,0 +1,169 @@ +shuf - CLI text processing with GNU Coreutils

shuf

The shuf command helps you randomize input lines. It also provides features to limit the number of output lines, repeat lines and even generate random positive integers.

Randomize input lines

By default, shuf will randomize the order of input lines. Here's an example:

$ cat purchases.txt
+coffee
+tea
+washing powder
+coffee
+toothpaste
+tea
+soap
+tea
+
+$ shuf purchases.txt
+tea
+coffee
+tea
+toothpaste
+soap
+coffee
+washing powder
+tea
+

info You can use the --random-source=FILE option to provide your own source for randomness. With this option, the output will be the same across multiple runs. See Sources of random data for more details.
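For example, any file with enough bytes can act as a fixed randomness source (seed.txt below is just an illustrative name); repeated runs then produce identical output:

```shell
# create a small file to act as a fixed source of "random" bytes
yes | head -c 1024 > seed.txt

# both runs read the same bytes, so the shuffled order is identical
shuf --random-source=seed.txt -e apple banana cherry
shuf --random-source=seed.txt -e apple banana cherry
```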

warning shuf doesn't accept multiple input files. Use cat for such cases.

Limit output lines

Use the -n option to limit the number of lines you want in the output. If the value is greater than the number of lines in the input, it would be similar to not using the -n option.

$ printf 'apple\nbanana\ncherry\nfig\nmango' | shuf -n2
+mango
+cherry
+

info As seen in the example above, shuf will add a newline character even if it is not present for the last input line.
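As a quick check of the earlier claim about -n values exceeding the input size, all lines are simply shuffled once (the order below is random, but every line appears exactly once):

```shell
# input has only 3 lines, so -n5 behaves like plain shuf
printf 'apple\nbanana\ncherry\n' | shuf -n5
```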

Repeated lines

The -r option helps if you want to allow input lines to be repeated. This option is usually paired with -n to limit the number of lines in the output.

$ cat fruits.txt
+banana
+papaya
+mango
+
+$ shuf -n3 -r fruits.txt
+banana
+mango
+banana
+
+$ shuf -n5 -r fruits.txt
+papaya
+banana
+mango
+papaya
+papaya
+

info If a limit using -n is not specified, shuf -r will produce output lines indefinitely.
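If you do want an endless -r stream, you could also cap it with another tool such as head (an alternative to -n, not from the text above):

```shell
# simulate 5 coin flips by truncating the endless stream
shuf -r -e heads tails | head -n5
```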

Specify input lines as arguments

You can use the -e option to specify multiple input lines as arguments to the command.

# quote the arguments as necessary
+$ shuf -e hi there 'hello world' good
+hello world
+good
+hi
+there
+
+$ shuf -n1 -e brown green blue
+blue
+
+$ shuf -n4 -r -e brown green blue
+blue
+green
+brown
+blue
+

The shell will expand unquoted glob patterns (provided there are files that match the given expression). You can thus easily construct a solution to get a random selection of files matching the given glob pattern.

$ echo *.csv
+marks.csv mixed_fields.csv report_1.csv report_2.csv scores.csv
+
+$ shuf -n2 -e *.csv
+scores.csv
+marks.csv
+

Generate random numbers

The -i option will help generate random positive integers. The argument has to be a range, with - as the separator between the two numbers.

$ shuf -i 5-8
+5
+8
+7
+6
+
+$ shuf -n3 -i 100-200
+170
+112
+148
+
+$ shuf -n5 -r -i 0-1
+1
+0
+0
+1
+1
+

info 2^64 - 1 is the maximum allowed integer when I tested it on my machine.

$ shuf -i 18446744073709551612-18446744073709551615
+18446744073709551615
+18446744073709551614
+18446744073709551612
+18446744073709551613
+
+$ shuf -i 18446744073709551612-18446744073709551616
+shuf: invalid input range: ‘18446744073709551616’:
+Value too large for defined data type
+
+# seq can help in such cases
+# but remember that shuf needs to read the entire input
+$ seq 100000000000000000000000000000 100000000000000000000000000105 | shuf -n2
+100000000000000000000000000039
+100000000000000000000000000018
+

seq can also help when you need negative and floating-point numbers.

$ seq -10 -8 | shuf
+-9
+-10
+-8
+
+$ seq -f'%.4f' 100 0.25 3000 | shuf -n3
+1627.7500
+1303.5000
+2466.2500
+

info See unix.stackexchange: generate random strings if numbers aren't enough for you.

Specifying output file

The -o option can be used to specify the output file to be used for saving the results. This is mostly useful for in-place editing, since you could otherwise simply use shell redirection to save the output to a different file.

$ cat book_list.txt
+Cradle
+Mage Errant
+Mother of Learning
+Super Powereds
+The Umbral Storm
+The Weirkey Chronicles
+
+$ shuf book_list.txt -o book_list.txt
+$ cat book_list.txt
+Super Powereds
+Cradle
+Mage Errant
+The Weirkey Chronicles
+Mother of Learning
+The Umbral Storm
+

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, shuf will add a final NUL character even if it is not present in the input.

$ printf 'apple\0banana\0cherry\0fig\0mango' | shuf -z -n3 | cat -v
+banana^@mango^@cherry^@
+

Exercises

info The exercises directory has all the files used in this section.

1) What's wrong with the given command?

$ shuf --random-source=greeting.txt fruits.txt books.txt
+shuf: extra operand ‘books.txt’
+Try 'shuf --help' for more information.
+
+# expected output
+##### add your solution here
+banana
+Cradle:::Mage Errant::The Weirkey Chronicles
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+papaya
+Mark of the Fool:Super Powereds:::Ends of Magic
+mango
+

2) What do the -r and -n options do? Why are they often used together?

3) What does the following command do?

$ shuf -e apple banana cherry fig mango
+

4) Which option would you use to generate random numbers? Give an example.

5) How would you generate 5 random numbers between 0.125 and 0.789 with a step value of 0.023?

# output shown below is a sample, might differ for you
+##### add your solution here
+0.378
+0.631
+0.447
+0.746
+0.723
+
\ No newline at end of file diff --git a/sidebar.js b/sidebar.js new file mode 100644 index 0000000..7a8f9d3 --- /dev/null +++ b/sidebar.js @@ -0,0 +1,66 @@ +// Un-active everything when you click it +Array.prototype.forEach.call(document.getElementsByClassName("pagetoc")[0].children, function(el) { + el.addEventHandler("click", function() { + Array.prototype.forEach.call(document.getElementsByClassName("pagetoc")[0].children, function(el) { + el.classList.remove("active"); + }); + el.classList.add("active"); + }); +}); + +var updateFunction = function() { + + var id; + var elements = document.getElementsByClassName("header"); + Array.prototype.forEach.call(elements, function(el) { + if (window.pageYOffset >= el.offsetTop) { + id = el; + } + }); + + Array.prototype.forEach.call(document.getElementsByClassName("pagetoc")[0].children, function(el) { + el.classList.remove("active"); + }); + + Array.prototype.forEach.call(document.getElementsByClassName("pagetoc")[0].children, function(el) { + if (id.href.localeCompare(el.href) == 0) { + el.classList.add("active"); + } + }); +}; + +// Populate sidebar on load +window.addEventListener('load', function() { + var pagetoc = document.getElementsByClassName("pagetoc")[0]; + var elements = document.getElementsByClassName("header"); + Array.prototype.forEach.call(elements, function(el) { + var link = document.createElement("a"); + + // Indent shows hierarchy + var indent = ""; + switch (el.parentElement.tagName) { + case "H2": + indent = "20px"; + break; + case "H3": + indent = "40px"; + break; + case "H4": + indent = "60px"; + break; + default: + break; + } + + link.appendChild(document.createTextNode(el.text)); + link.style.paddingLeft = indent; + link.href = el.href; + pagetoc.appendChild(link); + }); + updateFunction.call(); +}); + + + +// Handle active elements on scroll +window.addEventListener("scroll", updateFunction); diff --git a/sort.html b/sort.html new file mode 100644 index 0000000..2c44db5 --- /dev/null +++ 
b/sort.html @@ -0,0 +1,482 @@ +sort - CLI text processing with GNU Coreutils

sort

The sort command provides a wide variety of features. In addition to lexicographic ordering, it supports various numerical formats. You can also sort based on particular columns. And there are nifty features like merging already sorted input, debugging, determining whether the input is already sorted and so on.

Default sort and Collating order

By default, sort orders the input in ascending order. If you know about ASCII codepoints, do you agree that the following two examples are showing the correct expected output?

$ cat greeting.txt
+Hi there
+Have a nice day
+# extract and sort space separated words
+$ <greeting.txt tr ' ' '\n' | sort
+a
+day
+Have
+Hi
+nice
+there
+
+$ printf '(banana)\n{cherry}\n[apple]' | sort
+[apple]
+(banana)
+{cherry}
+

From the sort manual:

Unless otherwise specified, all comparisons use the character collating sequence specified by the LC_COLLATE locale.

If you use a non-POSIX locale (e.g., by setting LC_ALL to en_US), then sort may produce output that is sorted differently than you're accustomed to. In that case, set the LC_ALL environment variable to C. Note that setting only LC_COLLATE has two problems. First, it is ineffective if LC_ALL is also set. Second, it has undefined behavior if LC_CTYPE (or LANG, if LC_CTYPE is unset) is set to an incompatible value. For example, you get undefined behavior if LC_CTYPE is ja_JP.PCK but LC_COLLATE is en_US.UTF-8.

My locale settings are based on en_IN, which is different from the POSIX sorting order. So, the fact to remember is that sort obeys the rules of the current locale. If you want POSIX sorting, one option is to use LC_ALL=C as shown below.

$ <greeting.txt tr ' ' '\n' | LC_ALL=C sort
+Have
+Hi
+a
+day
+nice
+there
+
+$ printf '(banana)\n{cherry}\n[apple]' | LC_ALL=C sort
+(banana)
+[apple]
+{cherry}
+

info Another benefit of the C locale is that it is significantly faster compared to applying Unicode parsing and sorting rules.

info Use the -f option if you want to explicitly ignore case. See also GNU Core Utilities FAQ: Sort does not sort in normal order!.

info See this unix.stackexchange thread if you want to create your own custom sort order.

Ignoring headers

You can use sed -u to consume only the header lines and leave the rest of the input for the sort command. Note that this unbuffered option is supported by GNU sed, might not be available with other implementations.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+# 1q is used to quit after the first line
+$ ( sed -u '1q' ; sort ) <scores.csv
+Name,Maths,Physics,Chemistry
+Cy,97,98,95
+Ith,100,100,100
+Lin,78,83,80
+

info See this unix.stackexchange thread for more ways of ignoring headers. See bash manual: Grouping Commands for more details about the () grouping used in the above example.

Dictionary sort

The -d option will consider only letters, digits and blanks for sorting. Space and tab characters are considered blanks, but this too depends on the locale.

$ printf '(banana)\n{cherry}\n[apple]' | LC_ALL=C sort -d
+[apple]
+(banana)
+{cherry}
+

info Use the -i option if you want to ignore only the non-printing characters.
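A small sketch of the -i difference (assuming the C locale so the comparison is byte-based); \001 is a non-printing control character:

```shell
# without -i, the control character participates in the comparison
printf 'x\001z\nxy\n' | LC_ALL=C sort | cat -v
# x^Az
# xy

# with -i, non-printing characters are ignored, so 'xy' sorts before 'xz'
printf 'x\001z\nxy\n' | LC_ALL=C sort -i | cat -v
# xy
# x^Az
```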

Reversed order

The -r option will reverse the output order. Note that this doesn't change how sort performs comparisons, only the output is reversed. You'll see an example later where this distinction becomes clearer.

$ printf 'peace\nrest\nquiet' | sort -r
+rest
+quiet
+peace
+

info In case you haven't noticed yet, sort adds a newline character to the final line even if it wasn't present in the input.

Numeric sort

The sort command provides various options to work with numeric formats. For most cases, the -n option is enough. Here's an example:

# lexicographic ordering isn't suited for numbers
+$ printf '20\n2\n3\n111\n314' | sort
+111
+2
+20
+3
+314
+
+# -n helps in this case
+$ printf '20\n2\n3\n111\n314' | sort -n
+2
+3
+20
+111
+314
+

The -n option can handle negative and floating-point numbers as well. The decimal point and the thousands separator characters will depend on the locale settings.

$ cat mixed_numbers.txt
+12,345
+42
+31.24
+-100
+42
+5678
+
+# , is the thousands separator in en_IN
+# . is the decimal point in en_IN
+$ sort -n mixed_numbers.txt
+-100
+31.24
+42
+42
+5678
+12,345
+

Use the -g option if your input can have the + prefix for positive numbers or follows the E-scientific notation.

$ cat e_notation.txt
++120
+-1.53
+3.14e+4
+42.1e-2
+
+$ sort -g e_notation.txt
+-1.53
+42.1e-2
++120
+3.14e+4
+

info Unless otherwise specified, sort will break ties by using the entire input line content. In the case of -n, sorting will work even if there are extra characters after the number. Those extra characters will affect the output order if the numbers are equal. If a line doesn't start with a number (excluding blanks), it will be treated as 0.

# 'b' comes before 'p'
+$ printf '2 pins\n13 pens\n2 balls' | sort -n
+2 balls
+2 pins
+13 pens
+
+# 'z' and 'a2p' will be treated as '0'
+# 'a' comes before 'z'
+$ printf 'z\na2p\n13p\n2b\n-1\n    10' | sort -n
+-1
+a2p
+z
+2b
+    10
+13p
+

Human numeric sort

Commands like du (disk usage) have the -h and --si options to display numbers with SI suffixes like k, K, M, G and so on. In such cases, you can use sort -h to order them.

$ cat file_size.txt
+104K    power.log
+316M    projects
+746K    report.log
+20K     sample.txt
+1.4G    games
+
+$ sort -hr file_size.txt
+1.4G    games
+316M    projects
+746K    report.log
+104K    power.log
+20K     sample.txt
+

Version sort

The -V option is useful when you have a mix of alphabets and digits. It also helps when you want to treat digits after a decimal point as whole numbers, for example 1.10 should be greater than 1.2.

$ printf '1.10\n1.2' | sort -n
+1.10
+1.2
+$ printf '1.10\n1.2' | sort -V
+1.2
+1.10
+
+$ cat versions.txt
+file2
+cmd5.2
+file10
+cmd1.6
+file5
+cmd5.10
+$ sort -V versions.txt
+cmd1.6
+cmd5.2
+cmd5.10
+file2
+file5
+file10
+

Here's an example of dealing with numbers reported by the time command (assuming all the entries have the same format).

$ cat timings.txt
+5m35.363s
+3m20.058s
+4m11.130s
+3m42.833s
+4m3.083s
+
+$ sort -V timings.txt
+3m20.058s
+3m42.833s
+4m3.083s
+4m11.130s
+5m35.363s
+

info See Version sort ordering for more details. Note that the ls command uses lowercase -v for this task.

Random sort

The -R option will display the output in random order. Unlike shuf, this option will always place identical lines next to each other due to the implementation.

# the two lines with '42' will always be next to each other
+# use 'shuf' if you don't want this behavior
+$ sort -R mixed_numbers.txt
+31.24
+5678
+42
+42
+12,345
+-100
+

Unique sort

The -u option will keep only the first copy of lines that are deemed equal.

# (10) and [10] are deemed equal with dictionary sorting
+$ printf '(10)\n[20]\n[10]' | sort -du
+(10)
+[20]
+
+$ cat purchases.txt
+coffee
+tea
+washing powder
+coffee
+toothpaste
+tea
+soap
+tea
+$ sort -u purchases.txt
+coffee
+soap
+tea
+toothpaste
+washing powder
+

As seen earlier, the -n option will work even if there are extra characters after the number. When the -u option is also used, only the first such copy will be retained. Use the uniq command if you want to remove duplicates based on the whole line.

$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -nu
+2 balls
+13 pens
+
+# note that only the output order is reversed
+# use tac if you want the last duplicate to be preserved instead of the first
+$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -r -nu
+13 pens
+2 balls
+
+# use uniq when the entire line contents should be compared
+$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n | uniq
+2 balls
+2 pins
+13 pens
+

You can use the -f option to ignore case while determining duplicates.

$ printf 'mat\nbat\nMAT\ncar\nbat\n' | sort -u
+bat
+car
+mat
+MAT
+
+# the first copy between 'mat' and 'MAT' is retained
+$ printf 'mat\nbat\nMAT\ncar\nbat\n' | sort -fu
+bat
+car
+mat
+

Column sort

The -k option allows you to sort based on specific columns instead of the entire input line. By default, the empty string between non-blank and blank characters is considered as the separator and thus the blanks are also part of the field contents. The effect of blanks and mitigation will be discussed later.

The -k option accepts arguments in various ways. You can specify the starting and ending column numbers separated by a comma. If you specify only the starting column, the last column will be used as the ending column. Usually you just want to sort by a single column, in which case the same number is specified as both the starting and ending columns. Here's an example:

$ cat shopping.txt
+apple   50
+toys    5
+Pizza   2
+mango   25
+Banana  10
+
+# sort based on the 2nd column numbers
+$ sort -k2,2n shopping.txt
+Pizza   2
+toys    5
+Banana  10
+mango   25
+apple   50
+

info Note that in the above example, the -n option was also appended to the -k option. This makes it specific to that column and overrides global options, if any. Also, remember that the entire line will be used to break ties, unless otherwise specified.

You can use the -t option to specify a single byte character as the field separator. Use \0 to specify NUL as the separator. Depending on your shell you can use ANSI-C quoting to use escapes like \t instead of a literal tab character. When the -t option is used, the field separator won't be part of the field contents.

# department,name,marks
+$ cat marks.csv
+ECE,Raj,53
+ECE,Joel,72
+EEE,Moi,68
+CSE,Surya,81
+EEE,Raj,88
+CSE,Moi,62
+EEE,Tia,72
+ECE,Om,92
+CSE,Amy,67
+
+# name column is the primary sort key
+# entire line content will be used for breaking ties
+$ sort -t, -k2,2 marks.csv
+CSE,Amy,67
+ECE,Joel,72
+CSE,Moi,62
+EEE,Moi,68
+ECE,Om,92
+ECE,Raj,53
+EEE,Raj,88
+CSE,Surya,81
+EEE,Tia,72
+

You can use the -k option multiple times to specify your own order of tie breakers. Entire line will still be used to break ties if needed.

# second column is the primary key
+# reversed numeric sort on the third column is the secondary key
+# entire line will be used only if there are still tied entries
+$ sort -t, -k2,2 -k3,3nr marks.csv
+CSE,Amy,67
+ECE,Joel,72
+EEE,Moi,68
+CSE,Moi,62
+ECE,Om,92
+EEE,Raj,88
+ECE,Raj,53
+CSE,Surya,81
+EEE,Tia,72
+
+# sort by month first and then the day
+# -M option sorts based on abbreviated month names
+$ printf 'Aug-20\nMay-5\nAug-3' | sort -t- -k1,1M -k2,2n
+May-5
+Aug-3
+Aug-20
+

Use the -s option to retain the original order of input lines when two or more lines are deemed equal. You can still use multiple keys to specify your own tie breakers, -s only prevents the last resort comparison.

# -s prevents last resort comparison
+# so, lines having the same value in the 2nd column will retain input order
+$ sort -t, -s -k2,2 marks.csv
+CSE,Amy,67
+ECE,Joel,72
+EEE,Moi,68
+CSE,Moi,62
+ECE,Om,92
+ECE,Raj,53
+EEE,Raj,88
+CSE,Surya,81
+EEE,Tia,72
+

The -u option, as discussed earlier, will retain only the first copy of lines that are deemed equal.

# only the first copy of duplicates in the 2nd column will be retained
+$ sort -t, -u -k2,2 marks.csv
+CSE,Amy,67
+ECE,Joel,72
+EEE,Moi,68
+ECE,Om,92
+ECE,Raj,53
+CSE,Surya,81
+EEE,Tia,72
+

Character positions within columns

The -k option also accepts starting and ending character positions within the columns. These are specified after the column number, separated by a . character. If the character position is not specified for the ending column, the last character of that column is assumed.

The character positions start with 1 for the first character. Recall that when the -t option is used, the field separator is not part of the field contents.

# based on the second column number
+# 2.2 helps to ignore first character, otherwise -n won't have any effect here
+$ printf 'car,(20)\njeep,[10]\ntruck,(5)\nbus,[3]' | sort -t, -k2.2,2n
+bus,[3]
+truck,(5)
+jeep,[10]
+car,(20)
+
+# first character of the second column is the primary key
+# entire line acts as the last resort tie breaker
+$ printf 'car,(20)\njeep,[10]\ntruck,(5)\nbus,[3]' | sort -t, -k2.1,2.1
+car,(20)
+truck,(5)
+bus,[3]
+jeep,[10]
+

The default separation based on blank characters works differently. The empty string between non-blank and blank characters is considered as the separator and thus the blanks are also part of the field contents. You can use the -b option to ignore such leading blanks of field contents.

# the second column here starts with blank characters
+# adjusting the character position isn't feasible due to varying blanks
+$ printf 'car   (20)\njeep  [10]\ntruck (5)\nbus   [3]' | sort -k2.2,2n
+bus   [3]
+car   (20)
+jeep  [10]
+truck (5)
+
+# use -b in such cases to ignore the leading blanks
+$ printf 'car   (20)\njeep  [10]\ntruck (5)\nbus   [3]' | sort -k2.2b,2n
+bus   [3]
+truck (5)
+jeep  [10]
+car   (20)
+

Debugging

The --debug option can help you identify issues if the output isn't what you expected. Here's the previously seen -b example, now with --debug enabled. The underscores in the debug output shows which portions of the input are used as primary key, secondary key and so on. The collating order being used is also shown in the output.

$ printf 'car (20)\njeep [10]\ntruck (5)\nbus [3]' | sort -k2.2,2n --debug
+sort: text ordering performed using ‘en_IN’ sorting rules
+sort: leading blanks are significant in key 1; consider also specifying 'b'
+sort: note numbers use ‘.’ as a decimal point in this locale
+bus [3]
+    ^ no match for key
+_______
+car (20)
+    ^ no match for key
+________
+jeep [10]
+     ^ no match for key
+_________
+truck (5)
+      ^ no match for key
+_________
+
+$ printf 'car (20)\njeep [10]\ntruck (5)\nbus [3]' | sort -k2.2b,2n --debug
+sort: text ordering performed using ‘en_IN’ sorting rules
+sort: note numbers use ‘.’ as a decimal point in this locale
+bus [3]
+     _
+_______
+truck (5)
+       _
+_________
+jeep [10]
+      __
+_________
+car (20)
+     __
+________
+

Check if sorted

The -c option helps you spot the first unsorted entry in the given input. The uppercase -C option is similar but only affects the exit status. Note that these options will not work for multiple inputs.

$ cat shopping.txt
+apple   50
+toys    5
+Pizza   2
+mango   25
+Banana  10
+
+$ sort -c shopping.txt
+sort: shopping.txt:3: disorder: Pizza   2
+$ echo $?
+1
+
+$ sort -C shopping.txt
+$ echo $?
+1
+
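To see the multiple-input restriction in action (filenames here are just placeholders), giving -c two operands makes sort exit with an error:

```shell
seq 3 > f1.txt
seq 3 > f2.txt

# fails: -c and -C accept at most one input file
sort -c f1.txt f2.txt || echo 'sort rejected the extra operand'
# sort rejected the extra operand

rm f1.txt f2.txt
```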

Specifying output file

The -o option can be used to specify the output file to be used for saving the results.

$ sort -R nums.txt -o rand_nums.txt
+
+$ cat rand_nums.txt
+1000
+3.14
+42
+

You can use -o for in-place editing as well, but the documentation gives this warning:

However, it is often safer to output to an otherwise-unused file, as data may be lost if the system crashes or sort encounters an I/O or other serious error while a file is being sorted in place. Also, sort with --merge (-m) can open the output file before reading all input, so a command like cat F | sort -m -o F - G is not safe as sort might start writing F before cat is done reading it.
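That said, for small files an in-place sort with -o works as expected; the filename below is just an illustration:

```shell
printf '42\n3.14\n1000\n' > nums.txt

# input and output file are the same
sort -n nums.txt -o nums.txt

cat nums.txt
# 3.14
# 42
# 1000
```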

Merge sort

The -m option is useful if you have one or more sorted input files and need a single sorted output file. Typically the use case is that you want to add newly obtained data to existing sorted data. In such cases, you can sort only the new data separately and then combine all the sorted inputs using the -m option. Here's a sample timing comparison between different combinations of sorted/unsorted inputs.

$ shuf -n1000000 -i1-999999999999 > n1.txt
+$ shuf -n1000000 -i1-999999999999 > n2.txt
+$ sort -n n1.txt > n1_sorted.txt
+$ sort -n n2.txt > n2_sorted.txt
+
+$ time sort -n n1.txt n2.txt > op1.txt
+real    0m1.010s
+$ time sort -mn n1_sorted.txt <(sort -n n2.txt) > op2.txt
+real    0m0.535s
+$ time sort -mn n1_sorted.txt n2_sorted.txt > op3.txt
+real    0m0.218s
+
+$ diff -sq op1.txt op2.txt
+Files op1.txt and op2.txt are identical
+$ diff -sq op1.txt op3.txt
+Files op1.txt and op3.txt are identical
+
+$ rm n{1,2}{,_sorted}.txt op{1..3}.txt
+

info You might wonder if you can improve the performance of a single large file using the -m option. By default, sort already uses the available processors to split the input and merge. You can use the --parallel option to customize this behavior.
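If you still want to experiment, --parallel only changes how the work is distributed internally; the output is identical to a regular sort. A minimal sketch (the thread count here is chosen arbitrarily):

```shell
# limit sort to at most 2 threads
seq 10 | shuf | sort -n --parallel=2
# 1 through 10 in ascending order
```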

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, sort will add a final NUL character even if it is not present in the input.

$ printf 'cherry\0apple\0banana' | sort -z | cat -v
+apple^@banana^@cherry^@
+

Further Reading

A few options like --compress-program and --files0-from aren't covered in this book. See the sort manual for details and examples.

Exercises

info The exercises directory has all the files used in this section.

1) Default sort doesn't work for numbers. Which option would you use to get the expected output shown below?

$ printf '100\n10\n20\n3000\n2.45\n' | sort ##### add your solution here
+2.45
+10
+20
+100
+3000
+

2) Which sort option will help you ignore case? LC_ALL=C is used here to avoid differences due to locale.

$ printf 'Super\nover\nRUNE\ntea\n' | LC_ALL=C sort ##### add your solution here
+over
+RUNE
+Super
+tea
+

3) The -n option doesn't work for all sorts of numbers. Which sort option would you use to get the expected output shown below?

# wrong output
+$ printf '+120\n-1.53\n3.14e+4\n42.1e-2' | sort -n
+-1.53
++120
+3.14e+4
+42.1e-2
+
+# expected output
+$ printf '+120\n-1.53\n3.14e+4\n42.1e-2' | sort ##### add your solution here
+-1.53
+42.1e-2
++120
+3.14e+4
+

4) What do the -V and -h options do?

5) Is there a difference between shuf and sort -R?

6) Sort the scores.csv file numerically in ascending order using the contents of the second field. Header line should be preserved as the first line as shown below.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+##### add your solution here
+Name,Maths,Physics,Chemistry
+Lin,78,83,80
+Cy,97,98,95
+Ith,100,100,100
+

7) Sort the contents of duplicates.csv by the fourth column numbers in descending order. Retain only the first copy of lines with the same number.

$ cat duplicates.csv
+brown,toy,bread,42
+dark red,ruby,rose,111
+blue,ruby,water,333
+dark red,sky,rose,555
+yellow,toy,flower,333
+white,sky,bread,111
+light red,purse,rose,333
+
+##### add your solution here
+dark red,sky,rose,555
+blue,ruby,water,333
+dark red,ruby,rose,111
+brown,toy,bread,42
+

8) Sort the contents of duplicates.csv by the third column item. Use the fourth column numbers as the tie-breaker.

##### add your solution here
+brown,toy,bread,42
+white,sky,bread,111
+yellow,toy,flower,333
+dark red,ruby,rose,111
+light red,purse,rose,333
+dark red,sky,rose,555
+blue,ruby,water,333
+

9) What does the -s option provide?

10) Sort the given input based on the numbers inside the brackets.

$ printf '(-3.14)\n[45]\n(12.5)\n{14093}' | ##### add your solution here
+(-3.14)
+(12.5)
+[45]
+{14093}
+

11) What do the -c, -C and -m options do?


split

The split command is useful to divide the input into smaller parts based on the number of lines, bytes, file size, etc. You can also execute another command on the divided parts before saving the results. An example use case is sending a large file as multiple parts as a workaround for online transfer size limits.

info Since a lot of output files will be generated in this chapter (often with the same filenames), remove these files after every illustration.

Default split

By default, the split command divides the input 1000 lines at a time. Newline character is the default line separator. You can pass a single file or stdin data as the input. Use cat if you need to concatenate multiple input sources.

By default, the output files will be named xaa, xab, xac and so on (where x is the prefix). If the filenames are exhausted, two more letters will be appended and the pattern will continue as needed. If the number of input lines is not an exact multiple of 1000, the last file will contain fewer than 1000 lines.

# divide input 1000 lines at a time
+$ seq 10000 | split
+
+# output filenames
+$ ls x*
+xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj
+
+# preview of some of the output files
+$ head -n1 xaa xab xae xaj
+==> xaa <==
+1
+
+==> xab <==
+1001
+
+==> xae <==
+4001
+
+==> xaj <==
+9001
+
+$ rm x*
+

warning As mentioned earlier, remove the output files after every illustration.

Change number of lines

You can use the -l option to change the number of lines to be saved in each output file.

# maximum of 3 lines at a time
+$ split -l3 purchases.txt
+
+$ head x*
+==> xaa <==
+coffee
+tea
+washing powder
+
+==> xab <==
+coffee
+toothpaste
+tea
+
+==> xac <==
+soap
+tea
+

Split by byte count

The -b option allows you to split the input by the number of bytes. Similar to line based splitting, you can always reconstruct the input by concatenating the output files. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on.

# maximum of 15 bytes at a time
+$ split -b15 greeting.txt
+
+$ head x*
+==> xaa <==
+Hi there
+Have a
+==> xab <==
+ nice day
+
+# when you concatenate the output files, you'll get the original input
+$ cat x*
+Hi there
+Have a nice day
+
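The reconstruction claim can be checked mechanically. Here's a minimal sketch, assuming GNU coreutils and a fresh temporary directory so that the x* glob matches only the newly generated parts:

```shell
# work in a fresh directory so 'x*' matches only the new parts
tmpdir=$(mktemp -d) && cd "$tmpdir"
printf 'Hi there\nHave a nice day\n' > greeting.txt

split -b15 greeting.txt

# byte-wise concatenation of the parts reproduces the input exactly
cat x* > recombined.txt
cmp -s greeting.txt recombined.txt && echo 'files match'
```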

The -C option is similar to the -b option, but it will try to break on line boundaries if possible. The break will happen before the given byte limit. Here's an example where input lines do not exceed the given byte limit:

$ split -C20 purchases.txt
+
+$ head x*
+==> xaa <==
+coffee
+tea
+
+==> xab <==
+washing powder
+
+==> xac <==
+coffee
+toothpaste
+
+==> xad <==
+tea
+soap
+tea
+
+$ wc -c x*
+11 xaa
+15 xab
+18 xac
+13 xad
+57 total
+

If a line exceeds the given limit, it will be broken down into multiple parts:

$ printf 'apple\nbanana\n' | split -C4
+
+$ head x*
+==> xaa <==
+appl
+==> xab <==
+e
+
+==> xac <==
+bana
+==> xad <==
+na
+
+$ cat x*
+apple
+banana
+

Divide based on file size

The -n option has several features. If you pass only a numeric argument N, the given input file will be divided into N chunks. The output files will be roughly the same size.

# divide the file into 2 parts
+$ split -n2 purchases.txt
+$ head x*
+==> xaa <==
+coffee
+tea
+washing powder
+co
+==> xab <==
+ffee
+toothpaste
+tea
+soap
+tea
+
+# the two output files are roughly the same size
+$ wc x*
+ 3  5 28 xaa
+ 5  5 29 xab
+ 8 10 57 total
+

warning Since the division is based on file size, stdin data cannot be used. Newer versions of the coreutils package support this use case by creating a temporary file before splitting.

$ seq 6 | split -n2
+split: -: cannot determine file size
+
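If your version of split rejects stdin for the -n option, one workaround is to save the stream to a regular file first so that the size is known. A minimal sketch:

```shell
tmpdir=$(mktemp -d) && cd "$tmpdir"

# materialize stdin into a regular file so split can determine its size
seq 6 > input.txt
split -n2 input.txt

# concatenating the parts gives back the original content
cat x*
```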

By using K/N as the argument, you can view the Kth chunk of N parts on stdout. No output file will be created in this scenario.

# divide the input into 2 parts
+# view only the 1st chunk on stdout
+$ split -n1/2 greeting.txt
+Hi there
+Hav
+

To avoid splitting a line, use l/ as a prefix. Quoting from the manual:

For l mode, chunks are approximately input size / N. The input is partitioned into N equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition.

# divide input into 2 parts, but don't split lines
+$ split -nl/2 purchases.txt
+$ head x*
+==> xaa <==
+coffee
+tea
+washing powder
+coffee
+
+==> xab <==
+toothpaste
+tea
+soap
+tea
+

Here's an example to view the Kth chunk without splitting lines:

# 2nd chunk of 3 parts without splitting lines
+$ split -nl/2/3 sample.txt
+ 7) Believe it
+ 8) 
+ 9) banana
+10) papaya
+11) mango
+

Interleaved lines

The -n option will also help you create output files with interleaved lines. Since this is based on the line separator and not file size, stdin data can also be used. Use the r/ prefix to enable this feature.

# two parts, lines distributed in round robin fashion
+$ seq 5 | split -nr/2
+
+$ head x*
+==> xaa <==
+1
+3
+5
+
+==> xab <==
+2
+4
+
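Round robin parts can be interleaved back into the original order with paste. Here's a sketch for the two-part case, assuming the default output filenames xaa and xab:

```shell
tmpdir=$(mktemp -d) && cd "$tmpdir"

seq 4 | split -nr/2

# pick lines alternately from the two parts to restore the order
paste -d'\n' xaa xab
```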

Here's an example to view the Kth chunk:

$ split -nr/1/3 sample.txt
+ 1) Hello World
+ 4) How are you
+ 7) Believe it
+10) papaya
+13) Much ado about nothing
+

Custom line separator

You can use the -t option to specify a single byte character as the line separator. Use \0 to specify NUL as the separator. Depending on your shell, you can use ANSI-C quoting for escapes like \t instead of a literal tab character.

$ printf 'apple\nbanana\n;mango\npapaya\n' | split -t';' -l1
+
+$ head x*
+==> xaa <==
+apple
+banana
+;
+==> xab <==
+mango
+papaya
+
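Here's a sketch of the \0 escape mentioned above, with tr used afterwards to make the NUL bytes visible:

```shell
tmpdir=$(mktemp -d) && cd "$tmpdir"

# one NUL separated record per output file
printf 'apple\0banana\0cherry' | split -t '\0' -l1

# xaa contains 'apple' followed by the NUL separator
cat xaa | tr '\0' '\n'
```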

Customize filenames

As seen earlier, x is the default prefix for output filenames. To change this prefix, pass an argument after the input source.

# choose prefix as 'op_' instead of 'x'
+$ split -l1 greeting.txt op_
+
+$ head op_*
+==> op_aa <==
+Hi there
+
+==> op_ab <==
+Have a nice day
+

The -a option controls the length of the suffix. You'll get an error if this length isn't enough to cover all the output files. In such a case, you'll still get output files that can fit within the given length.

$ seq 10 | split -l1 -a1
+$ ls x*
+xa  xb  xc  xd  xe  xf  xg  xh  xi  xj
+$ rm x*
+
+$ seq 10 | split -l1 -a3
+$ ls x*
+xaaa  xaab  xaac  xaad  xaae  xaaf  xaag  xaah  xaai  xaaj
+$ rm x*
+
+$ seq 100 | split -l1 -a1
+split: output file suffixes exhausted
+$ ls x*
+xa  xc  xe  xg  xi  xk  xm  xo  xq  xs  xu  xw  xy
+xb  xd  xf  xh  xj  xl  xn  xp  xr  xt  xv  xx  xz
+$ rm x*
+

You can use the -d option to use numeric suffixes, starting from 00 (length can be changed using the -a option). You can use the long option --numeric-suffixes to specify a different starting number.

$ seq 10 | split -l1 -d
+$ ls x*
+x00  x01  x02  x03  x04  x05  x06  x07  x08  x09
+$ rm x*
+
+$ seq 10 | split -l2 --numeric-suffixes=10
+$ ls x*
+x10  x11  x12  x13  x14
+

Use the -x and --hex-suffixes options for hexadecimal numbering.

$ seq 10 | split -l1 --hex-suffixes=8
+$ ls x*
+x08  x09  x0a  x0b  x0c  x0d  x0e  x0f  x10  x11
+

You can use the --additional-suffix option to add a constant string at the end of filenames.

$ seq 10 | split -l2 -a1 --additional-suffix='.log'
+$ ls x*
+xa.log  xb.log  xc.log  xd.log  xe.log
+$ rm x*
+
+$ seq 10 | split -l2 -a1 -d --additional-suffix='.txt' - num_
+$ ls num_*
+num_0.txt  num_1.txt  num_2.txt  num_3.txt  num_4.txt
+

Exclude empty files

You can sometimes end up with empty files, for example when you try to split the input into more parts than the given criteria allows. In such cases, you can use the -e option to prevent empty files in the output. The split command will ensure that the filenames are sequential even if files in the middle would have been empty.

# 'xac' is empty in this example
+$ split -nl/3 greeting.txt
+$ head x*
+==> xaa <==
+Hi there
+
+==> xab <==
+Have a nice day
+
+==> xac <==
+
+$ rm x*
+
+# prevent empty files
+$ split -e -nl/3 greeting.txt
+$ head x*
+==> xaa <==
+Hi there
+
+==> xab <==
+Have a nice day
+

Process parts through another command

The --filter option will allow you to apply another command on the intermediate split results before saving the output files. Use $FILE to refer to the output filename of the intermediate parts. Here's an example of compressing the results:

$ split -l1 --filter='gzip > $FILE.gz' greeting.txt
+
+$ ls x*
+xaa.gz  xab.gz
+
+$ zcat xaa.gz
+Hi there
+$ zcat xab.gz
+Have a nice day
+

Here's an example of ignoring the first line of the results:

$ cat body_sep.txt
+%=%=
+apple
+banana
+%=%=
+red
+green
+
+$ split -l3 --filter='tail -n +2 > $FILE' body_sep.txt
+
+$ head x*
+==> xaa <==
+apple
+banana
+
+==> xab <==
+red
+green
+

Exercises

info The exercises directory has all the files used in this section.

info Remove the output files after every exercise.

1) Split the s1.txt file 3 lines at a time.

##### add your solution here
+
+$ head xa?
+==> xaa <==
+apple
+coffee
+fig
+
+==> xab <==
+honey
+mango
+pasta
+
+==> xac <==
+sugar
+tea
+
+$ rm xa?
+

2) Use appropriate options to get the output shown below.

$ echo 'apple,banana,cherry,dates' | ##### add your solution here
+
+$ head xa?
+==> xaa <==
+apple,
+==> xab <==
+banana,
+==> xac <==
+cherry,
+==> xad <==
+dates
+
+$ rm xa?
+

3) What do the -b and -C options do?

4) Display the 2nd chunk of the ip.txt file after splitting it 4 times as shown below.

##### add your solution here
+come back before the sky turns dark
+
+There are so many delights to cherish
+

5) What does the r prefix do when used with the -n option?

6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below.

##### add your solution here
+
+$ head ip_*
+==> ip_0.txt <==
+it is a warm and cozy day
+listen to what I say
+
+==> ip_1.txt <==
+go play in the park
+come back before the sky turns dark
+
+==> ip_2.txt <==
+
+There are so many delights to cherish
+
+==> ip_3.txt <==
+Apple, Banana and Cherry
+Bread, Butter and Jelly
+
+==> ip_4.txt <==
+Try them all before you perish
+
+$ rm ip_*
+

7) Which option would you use to prevent empty files in the output?

8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below.

$ cat items.txt
+1) fruits
+apple 5
+banana 10
+2) colors
+green
+sky blue
+3) magical beasts
+dragon 3
+unicorn 42
+
+##### add your solution here
+
+$ head xa?
+==> xaa <==
+apple 5
+banana 10
+green
+
+==> xab <==
+sky blue
+dragon 3
+unicorn 42
+
+$ rm xa?
+

tr

tr helps you to map one set of characters to another set of characters. Features like ranges, repeats, character sets, squeeze and complement make it a must-know text processing tool.

To be precise, tr can handle only bytes. Multibyte character processing isn't supported yet.

Transliteration

Here are some examples that map one set of characters to another. As a good practice, always enclose the sets in single quotes to avoid issues due to shell metacharacters.

# 'l' maps to '1', 'e' to '3', 't' to '7' and 's' to '5'
+$ echo 'leet speak' | tr 'lets' '1375'
+1337 5p3ak
+
+# example with shell metacharacters
+$ echo 'apple;banana;cherry' | tr ; :
+tr: missing operand
+Try 'tr --help' for more information.
+$ echo 'apple;banana;cherry' | tr ';' ':'
+apple:banana:cherry
+

You can use - between two characters to construct a range (ascending order only).

# uppercase to lowercase
+$ echo 'HELLO WORLD' | tr 'A-Z' 'a-z'
+hello world
+
+# swap case
+$ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z'
+hELLO wORLD
+
+# rot13
+$ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
+Uryyb Jbeyq
+$ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
+Hello World
+
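Since the rot13 mapping is its own inverse, applying it twice in a single pipeline gives back the original text:

```shell
# rot13 applied twice is a no-op
echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
```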

tr works only on stdin data, so use shell input redirection for file inputs.

$ tr 'a-z' 'A-Z' <greeting.txt
+HI THERE
+HAVE A NICE DAY
+

Different length sets

If the second set is longer, the extra characters are simply ignored. If the first set is longer, the last character of the second set is reused for the missing mappings.

# only abc gets converted to uppercase
+$ echo 'apple banana cherry' | tr 'abc' 'A-Z'
+Apple BAnAnA Cherry
+
+# c-z will be converted to C
+$ echo 'apple banana cherry' | tr 'a-z' 'ABC'
+ACCCC BACACA CCCCCC
+

You can use the -t option to truncate the first set so that it matches the length of the second set.

# d-z won't be converted
+$ echo 'apple banana cherry' | tr -t 'a-z' 'ABC'
+Apple BAnAnA Cherry
+

You can also use the [c*n] notation to repeat a character c n times. You can specify n in decimal format or octal format (starts with 0). If n is omitted, the character c is repeated as many times as needed to equalize the length of the sets.

# a-e will be translated to A
+# f-z will be uppercased
+$ echo 'apple banana cherry' | tr 'a-z' '[A*5]F-Z'
+APPLA AANANA AHARRY
+
+# a-c and x-z will be uppercased
+# rest of the characters will be translated to -
+$ echo 'apple banana cherry' | tr 'a-z' 'ABC[-*]XYZ'
+A---- BA-A-A C----Y
+

Escape sequences and character sets

Certain characters like newline, tab, etc can be represented using escape sequences. You can also specify characters using the \NNN octal representation.

# same as: tr '\011' '\072'
+$ printf 'apple\tbanana\tcherry\n' | tr '\t' ':'
+apple:banana:cherry
+
+$ echo 'apple:banana:cherry' | tr ':' '\n'
+apple
+banana
+cherry
+

Commonly useful groups of characters like alphabets, digits, punctuation, etc. have named character classes that you can use instead of constructing the sets manually. When translating, only [:lower:] and [:upper:] are allowed in the second set; the other classes can be used in the first set, for example with the -d or -s options.

# same as: tr 'a-z' 'A-Z' <greeting.txt
+$ tr '[:lower:]' '[:upper:]' <greeting.txt
+HI THERE
+HAVE A NICE DAY
+

To override the special meaning for - and \ characters, you can escape them using the \ character. You can also place the - character at the end of a set to represent it literally. Can you reason out why placing the - character at the start of a set can cause issues?

$ echo '/python-projects/programs' | tr '/-' '\\_'
+\python_projects\programs
+
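Here's a sketch of the other workaround mentioned above, placing the - character at the end of a set so that it is treated literally:

```shell
# '-' at the end of each set is literal, not a range operator
echo '2024-08-12' | tr '0-9-' 'a-j_'
```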

info See the tr manual for more details and a list of all the escape sequences and character sets.

Deleting characters

Use the -d option to specify a set of characters to be deleted.

$ echo '2024-08-12' | tr -d '-'
+20240812
+
+# delete all punctuation characters
+$ s='"Hi", there! How *are* you? All fine here.'
+$ echo "$s" | tr -d '[:punct:]'
+Hi there How are you All fine here
+

Complement

The -c option will invert the first set of characters. This is often used in combination with the -d option.

$ s='"Hi", there! How *are* you? All fine here.'
+
+# retain alphabets, whitespaces, period, exclamation and question mark
+$ echo "$s" | tr -cd 'a-zA-Z.!?[:space:]'
+Hi there! How are you? All fine here.
+

If you use -c for transliteration, you can only provide a single character for the second set. In other words, all the characters except those provided by the first set will be mapped to the character specified by the second set.

$ s='"Hi", there! How *are* you? All fine here.'
+
+$ echo "$s" | tr -c 'a-zA-Z.!?[:space:]' '1%'
+tr: when translating with complemented character classes,
+string2 must map all characters in the domain to one
+
+$ echo "$s" | tr -c 'a-zA-Z.!?[:space:]' '%'
+%Hi%% there! How %are% you? All fine here.
+

Squeeze

The -s option changes consecutive repeated characters to a single copy of that character.

# squeeze lowercase alphabets
+$ echo 'HELLO... hhoowwww aaaaaareeeeee yyouuuu!!' | tr -s 'a-z'
+HELLO... how are you!!
+
+# translate and squeeze
+$ echo 'hhoowwww aaaaaareeeeee yyouuuu!!' | tr -s 'a-z' 'A-Z'
+HOW ARE YOU!!
+
+# delete and squeeze
+$ echo 'hhoowwww aaaaaareeeeee yyouuuu!!' | tr -sd '!' 'a-z'
+how are you
+
+# squeeze other than lowercase alphabets
+$ echo 'apple    noon     banana!!!!!' | tr -cs 'a-z'
+apple noon banana!
+

Exercises

info The exercises directory has all the files used in this section.

1) What's wrong with the following command?

$ echo 'apple#banana#cherry' | tr # :
+

2) Retain only alphabets, digits and whitespace characters.

$ printf 'Apple_42  cool,blue\tDragon:army\n' | ##### add your solution here
+Apple42  coolblue       Dragonarmy
+

3) Similar to rot13, figure out a way to shift digits such that the same logic can be used both ways.

$ echo '4780 89073' | ##### add your solution here
+9235 34528
+
+$ echo '9235 34528' | ##### add your solution here
+4780 89073
+

4) Figure out the logic based on the given input and output data. Hint: use two ranges for the first set and only 6 characters in the second set.

$ echo 'apple banana cherry damson etrog' | ##### add your solution here
+1XXl5 21n1n1 3h5XXX 41mXon 5XXog
+

5) Which option would you use to truncate the first set so that it matches the length of the second set?

6) What does the * notation do in the second set?

7) Change : to - and ; to the newline character.

$ echo 'tea:coffee;brown:teal;dragon:unicorn' | ##### add your solution here
+tea-coffee
+brown-teal
+dragon-unicorn
+

8) Convert all characters to * except digit and newline characters.

$ echo 'ajsd45_sdg2Khnf4v_54as' | ##### add your solution here
+****45****2****4**54**
+

9) Change consecutive repeated punctuation characters to a single punctuation character.

$ echo '""hi..."", good morning!!!!' | ##### add your solution here
+"hi.", good morning!
+

10) Figure out the logic based on the given input and output data.

$ echo 'Aapple    noon     banana!!!!!' | ##### add your solution here
+:apple:noon:banana:
+

11) The books.txt file has items separated by one or more : characters. Change this separator to a single newline character as shown below.

$ cat books.txt
+Cradle:::Mage Errant::The Weirkey Chronicles
+Mother of Learning::Eight:::::Dear Spellbook:Ascendant
+Mark of the Fool:Super Powereds:::Ends of Magic
+
+##### add your solution here
+Cradle
+Mage Errant
+The Weirkey Chronicles
+Mother of Learning
+Eight
+Dear Spellbook
+Ascendant
+Mark of the Fool
+Super Powereds
+Ends of Magic
+

uniq

The uniq command identifies similar lines that are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, etc.

Retain single copy of duplicates

This is the default behavior of the uniq command. If adjacent lines are the same, only the first copy will be displayed in the output.

# only the adjacent lines are compared to determine duplicates
+# which is why you get 'red' twice in the output for this input
+$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq
+red
+green
+red
+blue
+

You'll need sorted input to make sure all the input lines are considered to determine duplicates. For some cases, sort -u is enough, like the example shown below:

# same as sort -u for this case
+$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | sort | uniq
+blue
+green
+red
+

Sometimes though, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example:

# can't use sort -n -u here
+$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n | uniq
+2 balls
+2 pins
+13 pens
+

info sort+uniq won't be suitable if you need to preserve the input order as well. You can use alternatives like awk, perl and huniq for such cases.

# retain only the first copy of duplicates, maintain input order
+$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | awk '!seen[$0]++'
+red
+green
+blue
+

Duplicates only

The -d option will display only the duplicate entries. That is, only if a line is seen more than once.

$ cat purchases.txt
+coffee
+tea
+washing powder
+coffee
+toothpaste
+tea
+soap
+tea
+
+$ sort purchases.txt | uniq -d
+coffee
+tea
+

To display all the copies of duplicates, use the -D option.

$ sort purchases.txt | uniq -D
+coffee
+coffee
+tea
+tea
+tea
+

Unique only

The -u option will display only the unique entries. That is, only if a line doesn't occur more than once.

$ sort purchases.txt | uniq -u
+soap
+toothpaste
+washing powder
+
+# reminder that uniq works based on adjacent lines only
+$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq -u
+green
+red
+

Grouping similar lines

The --group option allows you to visually separate groups of similar lines with an empty line. This option can accept four values — separate, prepend, append and both. The default is separate, which adds a newline character between the groups. prepend will add a newline before the first group as well and append will add a newline after the last group. both combines the prepend and append behavior.

$ sort purchases.txt | uniq --group
+coffee
+coffee
+
+soap
+
+tea
+tea
+tea
+
+toothpaste
+
+washing powder
+

The --group option cannot be used with the -c, -d, -D or -u options. The --all-repeated alias for the -D option uses none as the default grouping, which you can change to separate or prepend.

$ sort purchases.txt | uniq --all-repeated=prepend
+
+coffee
+coffee
+
+tea
+tea
+tea
+

Prefix count

If you want to know how many times a line has been repeated, use the -c option. This will be added as a prefix.

$ sort purchases.txt | uniq -c
+      2 coffee
+      1 soap
+      3 tea
+      1 toothpaste
+      1 washing powder
+
+$ sort purchases.txt | uniq -dc
+      2 coffee
+      3 tea
+

The output of this option is usually piped to sort for ordering the output based on the count.

$ sort purchases.txt | uniq -c | sort -n
+      1 soap
+      1 toothpaste
+      1 washing powder
+      2 coffee
+      3 tea
+
+$ sort purchases.txt | uniq -c | sort -nr
+      3 tea
+      2 coffee
+      1 washing powder
+      1 toothpaste
+      1 soap
+
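This sort | uniq -c | sort -n combination is a classic frequency-count idiom. For example, to get just the most frequent entry (awk reorders the columns here, which also sidesteps the leading-space padding from uniq -c):

```shell
# most frequent entry, displayed as 'item count'
printf 'tea\ncoffee\ntea\nsoap\ntea\n' |
    sort | uniq -c | sort -nr | awk '{print $2, $1}' | head -n1
```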

Ignoring case

Use the -i option to ignore case while determining duplicates.

# depending on your locale, sort and sort -f can give the same results
+$ printf 'hat\nbat\nHAT\ncar\nbat\nmat\nmoat' | sort -f | uniq -iD
+bat
+bat
+hat
+HAT
+

Partial match

uniq has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as the sort -k option, but they do come in handy for some use cases.

The -f option allows you to skip the first N fields. Field separation is based on one or more space/tab characters only. Note that these separators will still be part of the field contents, so this will not work with a variable number of blanks.

# skip the first field, works as expected since the no. of blanks is consistent
+$ printf '2 cars\n5 cars\n10 jeeps\n5 jeeps\n3 trucks\n' | uniq -f1 --group
+2 cars
+5 cars
+
+10 jeeps
+5 jeeps
+
+3 trucks
+
+# example with variable number of blanks
+# 'cars' entries were identified as duplicates, but not 'jeeps'
+$ printf '2 cars\n5 cars\n1 jeeps\n5  jeeps\n3 trucks\n' | uniq -f1
+2 cars
+1 jeeps
+5  jeeps
+3 trucks
+

The -s option allows you to skip the first N characters (calculated as bytes).

# skip the first character
+$ printf '* red\n- green\n* green\n* blue\n= blue' | uniq -s1
+* red
+- green
+* blue
+

The -w option restricts the comparison to the first N characters (calculated as bytes).

# compare only the first 2 characters
+$ printf '1) apple\n1) almond\n2) banana\n3) cherry' | uniq -w2
+1) apple
+2) banana
+3) cherry
+

When these options are used simultaneously, the priority is -f first, then -s and finally the -w option. Remember that blanks are part of the field content.

# skip the first field
+# then skip the first two characters (including the blank character)
+# use the next two characters for comparison ('bl' and 'ch' in this example)
+$ printf '2 @blue\n10 :black\n5 :cherry\n3 @chalk' | uniq -f1 -s2 -w2
+2 @blue
+5 :cherry
+

info If a line doesn't have enough fields or characters to satisfy the -f and -s options respectively, a null string is used for comparison.

Specifying output file

uniq can accept a filename as the source of input contents, but only one at most. If you specify another file, it will be used as the output file.

$ printf 'apple\napple\nbanana\ncherry\ncherry\ncherry' > ip.txt
+$ uniq ip.txt op.txt
+
+$ cat op.txt
+apple
+banana
+cherry
+

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, uniq will add a final NUL character even if it is not present in the input.

$ printf 'cherry\0cherry\0cherry\0apple\0banana' | uniq -z | cat -v
+cherry^@apple^@banana^@
+
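sort also supports NUL separated records via its own -z option, so the usual sort+uniq combination works for data such as filenames with embedded newlines. A sketch, with tr used at the end to make the result visible:

```shell
# sort and uniq both operating on NUL separated records
printf 'bat\0apple\0apple\0cat' | sort -z | uniq -z | tr '\0' '\n'
```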

info If grouping is specified, NUL will be used as the separator instead of the newline character.

Alternatives

Here are some alternate commands you can explore if uniq isn't enough to solve your task.

Exercises

info The exercises directory has all the files used in this section.

1) Will uniq throw an error if the input is not sorted? What do you think will be the output for the following input?

$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq
+

2) Are there differences between sort -u file and sort file | uniq?

3) What are the differences between sort -u and uniq -u options, if any?

4) Filter the third column items from duplicates.csv. Construct three solutions to display only unique items, duplicate items and all duplicates.

$ cat duplicates.csv
+brown,toy,bread,42
+dark red,ruby,rose,111
+blue,ruby,water,333
+dark red,sky,rose,555
+yellow,toy,flower,333
+white,sky,bread,111
+light red,purse,rose,333
+
+# unique
+##### add your solution here
+flower
+water
+
+# duplicates
+##### add your solution here
+bread
+rose
+
+# all duplicates
+##### add your solution here
+bread
+bread
+rose
+rose
+rose
+

5) What does the --group option do? What customization features are available?

6) Count the number of times input lines are repeated and display the results in the format shown below.

$ s='brown\nbrown\nbrown\ngreen\nbrown\nblue\nblue'
+$ printf '%b' "$s" | ##### add your solution here
+      1 green
+      2 blue
+      4 brown
+

7) For the input file f1.txt, retain only unique entries based on the first two characters of each line. For example, abcd and ab12 should be considered as duplicates and neither of them will be part of the output.

$ cat f1.txt
+3) cherry
+1) apple
+2) banana
+1) almond
+4) mango
+2) berry
+3) chocolate
+1) apple
+5) cherry
+
+##### add your solution here
+4) mango
+5) cherry
+

8) For the input file f1.txt, display only the duplicate items without considering the first two characters of each line. For example, abcd and 12cd should be considered as duplicates. Assume that the third character of each line is always a space character.

##### add your solution here
+1) apple
+3) cherry
+

9) What does the -s option do?

10) Filter only unique lines, but ignore differences due to case.

$ printf 'cat\nbat\nCAT\nCar\nBat\nmat\nMat' | ##### add your solution here
+Car
+

wc

The wc command is useful to count the number of lines, words and characters for the given inputs.

Line, word and byte counts

By default, the wc command reports the number of lines, words and bytes (in that order). The byte count includes the newline characters, so you can use that as a measure of file size as well. Here's an example:

$ cat greeting.txt
+Hi there
+Have a nice day
+
+$ wc greeting.txt
+ 2  6 25 greeting.txt
+

Wondering why there are leading spaces in the output? They help in aligning results for multiple files (discussed later).

Individual counts

Instead of the three default values, you can use options to get only the particular counts you are interested in. These options are:

  • -l for line count
  • -w for word count
  • -c for byte count
$ wc -l greeting.txt
+2 greeting.txt
+
+$ wc -w greeting.txt
+6 greeting.txt
+
+$ wc -c greeting.txt
+25 greeting.txt
+
+$ wc -wc greeting.txt
+ 6 25 greeting.txt
+

With stdin data, you'll get only the count value (unless you use - for stdin). Useful for assigning the output to shell variables.

$ printf 'hello' | wc -c
+5
+$ printf 'hello' | wc -c -
+5 -
+
+$ lines=$(wc -l <greeting.txt)
+$ echo "$lines"
+2
+

Multiple files

If you pass multiple files to the wc command, the count values will be displayed separately for each file. You'll also get a summary at the end, which sums the respective count of all the input files.

$ wc greeting.txt nums.txt purchases.txt
+ 2  6 25 greeting.txt
+ 3  3 13 nums.txt
+ 8  9 57 purchases.txt
+13 18 95 total
+$ wc greeting.txt nums.txt purchases.txt | tail -n1
+13 18 95 total
+
+$ wc *[ck]*.csv
+  9   9 101 marks.csv
+  4   4  70 scores.csv
+ 13  13 171 total
+

If you have NUL separated filenames (for example, output from find -print0, grep -lZ, etc), you can use the --files0-from option. This option accepts a file containing the NUL separated data (use - for stdin).

$ printf 'greeting.txt\0nums.txt' | wc --files0-from=-
+2 6 25 greeting.txt
+3 3 13 nums.txt
+5 9 38 total
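For example, the NUL separated output of find can be piped straight in. A sketch using made-up filenames (the /tmp/wc_demo directory and its files are created just for this illustration):

```shell
# set up a throwaway directory with two sample files
mkdir -p /tmp/wc_demo && cd /tmp/wc_demo
printf 'a\nb\n' > one.txt
printf 'c\n' > two.txt

# find emits NUL separated names, wc reads them via --files0-from=-
find . -maxdepth 1 -name '*.txt' -print0 | wc -l --files0-from=-
```

The per-file lines can appear in any order (find doesn't sort its output), but the summary line will read 3 total here.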
+

Character count

Use the -m option instead of -c if the input has multibyte characters.

# byte count
+$ printf 'αλεπού' | wc -c
+12
+
+# character count
+$ printf 'αλεπού' | wc -m
+6
+

info Note that the current locale will affect the behavior of the -m option.

$ printf 'αλεπού' | LC_ALL=C wc -m
+12
+

Longest line length

You can use the -L option to report the length of the longest line in the input (excluding the newline character of a line).

$ echo 'apple' | wc -L
+5
+# last line not ending with newline won't be a problem
+$ printf 'apple\nbanana' | wc -L
+6
+
+$ wc -L sample.txt
+26 sample.txt
+$ wc -L <sample.txt
+26
+
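Note that wc -L reports only the length, not the line itself. If you also want to see the longest line, one way is to feed that length back to grep. A sketch, assuming single-byte characters and a made-up demo file:

```shell
# demo file created just for this illustration
printf 'apple\nbanana\nkiwi\n' > /tmp/lines_demo.txt

# match lines whose length equals the wc -L result
longest=$(wc -L < /tmp/lines_demo.txt)
grep -xE ".{$longest}" /tmp/lines_demo.txt
# displays: banana
```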

If multiple files are passed, the summary line will show the maximum length among the given inputs, not their sum.

$ wc -L greeting.txt nums.txt purchases.txt
+15 greeting.txt
+ 4 nums.txt
+14 purchases.txt
+15 total
+

Corner cases

Line count is based on the number of newline characters. So, if the last line of the input doesn't end with the newline character, it won't be counted.

$ printf 'good\nmorning\n' | wc -l
+2
+
+$ printf 'good\nmorning' | wc -l
+1
+
+$ printf '\n\n\n' | wc -l
+3
+
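If you do want the unterminated last line to be counted as well, one workaround (outside of wc itself) is grep -c '', since every line, even an incomplete one, matches the empty pattern:

```shell
# wc -l ignores the final line without a newline
printf 'good\nmorning' | wc -l
# displays: 1

# grep -c '' counts it as well
printf 'good\nmorning' | grep -c ''
# displays: 2
```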

Word count is based on whitespace separation. You'll have to pre-process the input if you do not want certain non-whitespace characters to influence the results.

$ echo 'apple ; banana ; cherry' | wc -w
+5
+
+# remove characters other than alphabets and whitespaces
+$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]'
+apple  banana  cherry
+$ echo 'apple ; banana ; cherry' | tr -cd 'a-zA-Z[:space:]' | wc -w
+3
+
+# allow numbers as well
+$ echo '2 : apples ;' | tr -cd '[:alnum:][:space:]' | wc -w
+2
+

The -L option doesn't count non-printable characters, and tabs are expanded to the equivalent number of spaces (up to 8 columns). Multibyte characters are each counted as 1 (and depending on the locale, they might be treated as non-printable too).

# tab characters can occupy up to 8 columns
+$ printf '\t' | wc -L
+8
+$ printf 'a\tb' | wc -L
+9
+
+# example for non-printable character
+$ printf 'a\34b' | wc -L
+2
+
+# multibyte characters are counted as 1 each in supported locales
+$ printf 'αλεπού' | wc -L
+6
+# non-supported locales can cause them to be treated as non-printable
+$ printf 'αλεπού' | LC_ALL=C wc -L
+0
+

The -m and -L options treat grapheme clusters differently: -m counts the individual code points, whereas -L reports the display width (a combining character occupies zero columns).

$ printf 'cag̈e' | wc -m
+5
+
+$ printf 'cag̈e' | wc -L
+4
+

Exercises

info The exercises directory has all the files used in this section.

1) Save the number of lines in the greeting.txt input file to the lines shell variable.

$ lines=##### add your solution here
+$ echo "$lines"
+2
+

2) What do you think will be the output of the following command?

$ echo 'dragons:2 ; unicorns:10' | wc -w
+

3) Use appropriate options and arguments to get the output as shown below. Also, why is the line count showing as 2 instead of 3 for the stdin data?

$ printf 'apple\nbanana\ncherry' | ##### add your solution here
+      2      25 greeting.txt
+      2      19 -
+      4      44 total
+

4) Use appropriate options and arguments to get the output shown below.

$ printf 'greeting.txt\0scores.csv' | ##### add your solution here
+2 6 25 greeting.txt
+4 4 70 scores.csv
+6 10 95 total
+

5) What is the difference between wc -c and wc -m options? And which option would you use to get the longest line length?

6) Calculate the number of comma separated words from the scores.csv file.

$ cat scores.csv
+Name,Maths,Physics,Chemistry
+Ith,100,100,100
+Cy,97,98,95
+Lin,78,83,80
+
+##### add your solution here
+16
+
What next? - CLI text processing with GNU Coreutils