kitten → sitten (substitution of 's' for 'k')
sitten → sittin (substitution of 'i' for 'e')
sittin → sitting (insertion of 'g' at the end).
Data Structure Used: 2 D Array
Algorithm:
A straightforward implementation, as pseudocode for a function LevenshteinDistance that takes two strings, s of length m, and t of length n, and returns the Levenshtein distance between them:
int LevenshteinDistance(char s[1..m], char t[1..n])
{
// for all i and j, d[i,j] will hold the Levenshtein distance between
// the first i characters of s and the first j characters of t;
// note that d has (m+1)x(n+1) values
declare int d[0..m, 0..n]
for i from 0 to m
d[i, 0] := i // the distance of any first string to an empty second string
for j from 0 to n
d[0, j] := j // the distance of any second string to an empty first string
for j from 1 to n
{
for i from 1 to m
{
if s[i] = t[j] then
d[i, j] := d[i-1, j-1] // no operation required
else
d[i, j] := minimum
(
d[i-1, j] + 1, // a deletion
d[i, j-1] + 1, // an insertion
d[i-1, j-1] + 1 // a substitution
)
}
}
return d[m,n]
}
Two examples of the resulting matrix (hovering over a number reveals the operation performed to get that number):
k i t t e n
0 1 2 3 4 5 6
s 1 1 2 3 4 5 6
i 2 2 1 2 3 4 5
t 3 3 2 1 2 3 4
t 4 4 3 2 1 2 3
i 5 5 4 3 2 2 3
n 6 6 5 4 3 3 2
g 7 7 6 5 4 4 3
S a t u r d a y
0 1 2 3 4 5 6 7 8
S 1 0 1 2 3 4 5 6 7
u 2 1 1 2 2 3 4 5 6
n 3 2 2 2 3 3 4 5 6
d 4 3 3 3 3 4 3 4 5
a 5 4 3 4 4 4 4 3 4
y 6 5 4 4 5 5 5 4 3
Proof of correctness
As mentioned earlier, the invariant is that we can transform the initial segment s[1..i] into t[1..j] using a minimum of d[i,j] operations. This invariant holds since:
It is initially true on row and column 0 because s[1..i] can be transformed into the empty string t[1..0] by simply dropping all i characters. Similarly, we can transform s[1..0] to t[1..j] by simply adding all j characters.
If s[i] = t[j], and we can transform s[1..i-1] to t[1..j-1] in k operations, then we can do the same to s[1..i] and just leave the last character alone, giving k operations.
Otherwise, the distance is the minimum of the three possible ways to do the transformation:
If we can transform s[1..i] to t[1..j-1] in k operations, then we can simply add t[j] afterwards to get t[1..j] in k+1 operations (insertion).
If we can transform s[1..i-1] to t[1..j] in k operations, then we can remove s[i] and then do the same transformation, for a total of k+1 operations (deletion).
If we can transform s[1..i-1] to t[1..j-1] in k operations, then we can do the same to s[1..i], and exchange the original s[i] for t[j] afterwards, for a total of k+1 operations
The operations required to transform s[1..n] into t[1..m] is of course the number required to transform all of s into all of t, and so d[n,m] holds our result.
This proof fails to validate that the number placed in d[i,j] is in fact minimal; this is more difficult to show, and involves an argument by contradiction in which we assume d[i,j] is smaller than the minimum of the three, and use this to show one of the three is not minimal.
Working Code
#include
#include
#include
#define MAX 2000
using namespace std;
int minimum(int a,int b,int c)
{
if(a if(b
}
int LevenshteinDistance(char *a, char *b)
{
int **d;
int m=0,n=0,i,j;
char s[MAX]="0";
char t[MAX]="0";
d = (int **)malloc(MAX*sizeof(int));
for (i=0;i
strcat(s,a);
strcat(t,b);
m=strlen(s);
n=strlen(t);
printf("%s%s",s,t);
for(i=0;i<=m;i++) d[i][0]=i ;
for(j=0;j<=n;j++) d[0][j]=j;
for(j=1;j<=n;j++)
{
for(i=1;i<=m;i++)
{
if (s[i] == t[j])
d[i][j]=d[i-1][j-1];
else
d[i][j]=minimum(d[i-1][j] + (s[i]==t[i]?0:1),d[i][j-1] +
1,d[i-1][j-1] + 1 );
}
}
//TODO: Free "d"
return d[m][n];
}
int main()
{
int t,ed;
char s1[MAX],t1[MAX];
scanf("%d",&t);
while( t--)
{
scanf("%s%s",s1,t1);
//cout<
printf("%d\n",ed);
}
return 0;
}
Time Complexity O(N^2)
Space Complexity O(1)
Auxiliary Space O(N^2)
Run Here http://www.ideone.com/0BHid
No comments :
Post a Comment