Skip to content

Commit 0bdb22d

Browse files
committed
started on markov chains
1 parent 8c5fae1 commit 0bdb22d

File tree

8 files changed

+114
-0
lines changed

8 files changed

+114
-0
lines changed

appendix/markov/1-out.gv.pdf

12.5 KB
Binary file not shown.

appendix/markov/2-out.gv.pdf

12.8 KB
Binary file not shown.

appendix/markov/3-out.gv.pdf

12.5 KB
Binary file not shown.

appendix/markov/4-out.gv.pdf

12.2 KB
Binary file not shown.

appendix/markov/README.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# Markov Chains
2+
3+
Read about Markov chains:
4+
5+
* Claude Shannon's 1948 MS thesis, "A Mathematical Theory of Communication" (https://onlinelibrary.wiley.com/doi/abs/10.1002/j.1538-7305.1948.tb01338.x)
6+
* https://en.wikipedia.org/wiki/Markov_chain
7+
* Chapter 3 of _The Practice of Programming_ by Brian Kernighan and Rob Pike where they discuss implementations in C, C++, Java, awk, and Perl
8+
* "Computer Recreations", A. K. Dewdney, Scientific American, 1989 (https://archive.org/details/ComputerRecreationsMarkovChainer)
9+
10+
I'd like you to consider how a Markov chain creates a graph structure. Consult the three PDFs (generated by the `mk-graphs.sh` program) that visualize the graphs created by k-mer sizes of 1, 2, 3, and 4 when given this input:
11+
12+
````
13+
$ cat words.txt
14+
maamselle
15+
mabi
16+
mabolo
17+
mac
18+
macaasim
19+
macabre
20+
````
21+
22+
Notice that sometimes the branches terminate and sometimes you can find multiple paths through the graphs. As `k` grows, there are fewer options.

appendix/markov/graph.py

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
#!/usr/bin/env python3
2+
"""
3+
Author : Ken Youens-Clark <kyclark@gmail.com>
4+
Date : 2019-05-24
5+
Purpose: Markov chain for characters/words
6+
"""
7+
8+
import argparse
9+
import os
10+
import re
11+
import sys
12+
from collections import defaultdict
13+
from graphviz import Digraph
14+
15+
16+
# --------------------------------------------------
17+
def get_args():
18+
"""Get command-line arguments"""
19+
20+
parser = argparse.ArgumentParser(
21+
description='Markov chain for characters/words',
22+
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
23+
24+
parser.add_argument('file',
25+
metavar='FILE',
26+
nargs='+',
27+
help='Training file(s)')
28+
29+
parser.add_argument('-k',
30+
'--kmer_size',
31+
help='Kmer size',
32+
metavar='int',
33+
type=int,
34+
default=2)
35+
36+
parser.add_argument('-o',
37+
'--outfile',
38+
help='Output file',
39+
metavar='str',
40+
type=str,
41+
default='out.gv')
42+
43+
return parser.parse_args()
44+
45+
46+
# --------------------------------------------------
47+
def main():
48+
"""Make a jazz noise here"""
49+
50+
args = get_args()
51+
k = args.kmer_size
52+
53+
chains = defaultdict(list)
54+
for file in args.file:
55+
for line in open(file):
56+
for word in line.lower().split():
57+
word = re.sub('[^a-z]', '', word)
58+
for i in range(0, len(word) - k):
59+
kmer = word[i:i + k + 1]
60+
chains[kmer[:-1]].append(kmer[-1])
61+
62+
63+
dot = Digraph(comment='Paths for K={}'.format(k))
64+
edges = set()
65+
for kmer in chains:
66+
dot.node(kmer)
67+
for char in chains[kmer]:
68+
edges.add((kmer, kmer[1:] + char))
69+
70+
dot.edges(edges)
71+
dot.render(args.outfile, view=True)
72+
73+
74+
# --------------------------------------------------
75+
if __name__ == '__main__':
76+
main()

appendix/markov/mk-graphs.sh

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
#!/usr/bin/env bash
2+
3+
PRG="./graph.py"
4+
INPUT="words.txt"
5+
6+
for K in 1 2 3 4; do
7+
$PRG -k $K -o "$K-out.gv" "$INPUT"
8+
done
9+
10+
echo "Done."

appendix/markov/words.txt

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
maamselle
2+
mabi
3+
mabolo
4+
mac
5+
macaasim
6+
macabre

0 commit comments

Comments
 (0)