Monday, July 4, 2011

WAP to Find The Minimum Number of Character Required from 1 String to Other String

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
kitten → sitten (substitution of 's' for 'k')
sitten → sittin (substitution of 'i' for 'e')
sittin → sitting (insertion of 'g' at the end).

Data Structure Used: 2 D Array

Algorithm:

A straightforward implementation, as pseudocode for a function LevenshteinDistance that takes two strings, s of length m, and t of length n, and returns the Levenshtein distance between them:
int LevenshteinDistance(char s[1..m], char t[1..n])
{
// for all i and j, d[i,j] will hold the Levenshtein distance between
// the first i characters of s and the first j characters of t;
// note that d has (m+1)x(n+1) values
declare int d[0..m, 0..n]

for i from 0 to m
d[i, 0] := i // the distance of any first string to an empty second string
for j from 0 to n
d[0, j] := j // the distance of any second string to an empty first string

for j from 1 to n
{
for i from 1 to m
{
if s[i] = t[j] then
d[i, j] := d[i-1, j-1] // no operation required
else
d[i, j] := minimum
(
d[i-1, j] + 1, // a deletion
d[i, j-1] + 1, // an insertion
d[i-1, j-1] + 1 // a substitution
)
}
}

return d[m,n]
}

Two examples of the resulting matrix (hovering over a number reveals the operation performed to get that number):
k i t t e n
0 1 2 3 4 5 6
s 1 1 2 3 4 5 6
i 2 2 1 2 3 4 5
t 3 3 2 1 2 3 4
t 4 4 3 2 1 2 3
i 5 5 4 3 2 2 3
n 6 6 5 4 3 3 2
g 7 7 6 5 4 4 3

S a t u r d a y
0 1 2 3 4 5 6 7 8
S 1 0 1 2 3 4 5 6 7
u 2 1 1 2 2 3 4 5 6
n 3 2 2 2 3 3 4 5 6
d 4 3 3 3 3 4 3 4 5
a 5 4 3 4 4 4 4 3 4
y 6 5 4 4 5 5 5 4 3


Proof of correctness
As mentioned earlier, the invariant is that we can transform the initial segment s[1..i] into t[1..j] using a minimum of d[i,j] operations. This invariant holds since:
It is initially true on row and column 0 because s[1..i] can be transformed into the empty string t[1..0] by simply dropping all i characters. Similarly, we can transform s[1..0] to t[1..j] by simply adding all j characters.

If s[i] = t[j], and we can transform s[1..i-1] to t[1..j-1] in k operations, then we can do the same to s[1..i] and just leave the last character alone, giving k operations.
Otherwise, the distance is the minimum of the three possible ways to do the transformation:
If we can transform s[1..i] to t[1..j-1] in k operations, then we can simply add t[j] afterwards to get t[1..j] in k+1 operations (insertion).

If we can transform s[1..i-1] to t[1..j] in k operations, then we can remove s[i] and then do the same transformation, for a total of k+1 operations (deletion).
If we can transform s[1..i-1] to t[1..j-1] in k operations, then we can do the same to s[1..i], and exchange the original s[i] for t[j] afterwards, for a total of k+1 operations

The operations required to transform s[1..n] into t[1..m] is of course the number required to transform all of s into all of t, and so d[n,m] holds our result.
This proof fails to validate that the number placed in d[i,j] is in fact minimal; this is more difficult to show, and involves an argument by contradiction in which we assume d[i,j] is smaller than the minimum of the three, and use this to show one of the three is not minimal.


Working Code

#include
#include
#include
#define MAX 2000
using namespace std;

int minimum(int a,int b,int c)
{
if(a if(b return c;
}

int LevenshteinDistance(char *a, char *b)
{
int **d;
int m=0,n=0,i,j;
char s[MAX]="0";
char t[MAX]="0";
d = (int **)malloc(MAX*sizeof(int));
for (i=0;i d[i] = (int *)malloc(MAX*sizeof(int));
strcat(s,a);
strcat(t,b);
m=strlen(s);
n=strlen(t);
printf("%s%s",s,t);
for(i=0;i<=m;i++) d[i][0]=i ;
for(j=0;j<=n;j++) d[0][j]=j;
for(j=1;j<=n;j++)
{
for(i=1;i<=m;i++)
{
if (s[i] == t[j])
d[i][j]=d[i-1][j-1];
else
d[i][j]=minimum(d[i-1][j] + (s[i]==t[i]?0:1),d[i][j-1] +
1,d[i-1][j-1] + 1 );
}
}
//TODO: Free "d"
return d[m][n];
}

int main()
{
int t,ed;
char s1[MAX],t1[MAX];
scanf("%d",&t);
while( t--)
{
scanf("%s%s",s1,t1);
//cout< ed=LevenshteinDistance(s1,t1);
printf("%d\n",ed);
}
return 0;
}

Time Complexity O(N^2)
Space Complexity O(1)
Auxiliary Space O(N^2)
Run Here http://www.ideone.com/0BHid

No comments :