@@ -40,7 +40,76 @@ three edits:
40402 . sitt** e** n → sitt** i** n (substitution of "i" for "e")
41413 . sittin → sittin** g** (insertion of "g" at the end).
4242
43+ ## Applications
44+
45+ This has a wide range of applications, for instance, spell checkers, correction
46+ systems for optical character recognition, fuzzy string searching, and software
47+ to assist natural language translation based on translation memory.
48+
49+ ## Dynamic Programming Approach Explanation
50+
51+ Let’s take a simple example of finding minimum edit distance between
52+ strings ` ME ` and ` MY ` . Intuitively you already know that minimum edit distance
53+ here is ` 1 ` operation and this operation. And it is a replacing ` E ` with ` Y ` . But
54+ let’s try to formalize it in a form of the algorithm in order to be able to
55+ do more complex examples like transforming ` Saturday ` into ` Sunday ` .
56+
57+ To apply the mathematical formula mentioned above to ` ME → MY ` transformation
58+ we need to know minimum edit distances of ` ME → M ` , ` M → MY ` and ` M → M ` transformations
59+ in prior. Then we will need to pick the minimum one and add _ one_ operation to
60+ transform last letters ` E → Y ` . So minimum edit distance of ` ME → MY ` transformation
61+ is being calculated based on three previously possible transformations.
62+
63+ To explain this further let’s draw the following matrix:
64+
65+ ![ Levenshtein Matrix] ( https://cdn-images-1.medium.com/max/1600/1*2d46ug_PL5LfeqztckoYGw.jpeg )
66+
67+ - Cell ` (0:1) ` contains red number 1. It means that we need 1 operation to
68+ transform ` M ` to an empty string. And it is by deleting ` M ` . This is why this number is red.
69+ - Cell ` (0:2) ` contains red number 2. It means that we need 2 operations
70+ to transform ` ME ` to an empty string. And it is by deleting ` E ` and ` M ` .
71+ - Cell ` (1:0) ` contains green number 1. It means that we need 1 operation
72+ to transform an empty string to ` M ` . And it is by inserting ` M ` . This is why this number is green.
73+ - Cell ` (2:0) ` contains green number 2. It means that we need 2 operations
74+ to transform an empty string to ` MY ` . And it is by inserting ` Y ` and ` M ` .
75+ - Cell ` (1:1) ` contains number 0. It means that it costs nothing
76+ to transform ` M ` into ` M ` .
77+ - Cell ` (1:2) ` contains red number 1. It means that we need 1 operation
78+ to transform ` ME ` to ` M ` . And it is be deleting ` E ` .
79+ - And so on...
80+
81+ This looks easy for such small matrix as ours (it is only ` 3x3 ` ). But here you
82+ may find basic concepts that may be applied to calculate all those numbers for
83+ bigger matrices (let’s say ` 9x7 ` one, for ` Saturday → Sunday ` transformation).
84+
85+ According to the formula you only need three adjacent cells ` (i-1:j) ` , ` (i-1:j-1) ` , and ` (i:j-1) ` to
86+ calculate the number for current cell ` (i:j) ` . All we need to do is to find the
87+ minimum of those three cells and then add ` 1 ` in case if we have different
88+ letters in ` i ` 's row and ` j ` 's column.
89+
90+ You may clearly see the recursive nature of the problem.
91+
92+ ![ Levenshtein Matrix] ( https://cdn-images-1.medium.com/max/2000/1*JdHQ5TeKiDlE-iKK1s_2vw.jpeg )
93+
94+ Let's draw a decision graph for this problem.
95+
96+ ![ Minimum Edit Distance Decision Graph] ( https://cdn-images-1.medium.com/max/1600/1*SGwYUpXH9H1xUeTvJk0e7Q.jpeg )
97+
98+ You may see a number of overlapping sub-problems on the picture that are marked
99+ with red. Also there is no way to reduce the number of operations and make it
100+ less then a minimum of those three adjacent cells from the formula.
101+
102+ Also you may notice that each cell number in the matrix is being calculated
103+ based on previous ones. Thus the tabulation technique (filling the cache in
104+ bottom-up direction) is being applied here.
105+
106+ Applying this principles further we may solve more complicated cases like
107+ with ` Saturday → Sunday ` transformation.
108+
109+ ![ Levenshtein distance] ( https://cdn-images-1.medium.com/max/1600/1*fPEHiImYLKxSTUhrGbYq3g.jpeg )
110+
43111## References
44112
45113- [ Wikipedia] ( https://en.wikipedia.org/wiki/Levenshtein_distance )
46114- [ YouTube] ( https://www.youtube.com/watch?v=We3YDTzNXEk&list=PLLXdhg_r2hKA7DPDsunoDZ-Z769jWn4R8 )
115+ - [ ITNext] ( https://itnext.io/dynamic-programming-vs-divide-and-conquer-2fea680becbe )
0 commit comments