Monday, July 4, 2011

WAP to Find The Minimum Number of Character Required from 1 String to Other String

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
kitten → sitten (substitution of 's' for 'k')
sitten → sittin (substitution of 'i' for 'e')
sittin → sitting (insertion of 'g' at the end).

Data Structure Used: 2 D Array

Algorithm:

A straightforward implementation, as pseudocode for a function LevenshteinDistance that takes two strings, s of length m, and t of length n, and returns the Levenshtein distance between them:
int LevenshteinDistance(char s[1..m], char t[1..n])
{
// for all i and j, d[i,j] will hold the Levenshtein distance between
// the first i characters of s and the first j characters of t;
// note that d has (m+1)x(n+1) values
declare int d[0..m, 0..n]

for i from 0 to m
d[i, 0] := i // the distance of any first string to an empty second string
for j from 0 to n
d[0, j] := j // the distance of any second string to an empty first string

for j from 1 to n
{
for i from 1 to m
{
if s[i] = t[j] then
d[i, j] := d[i-1, j-1] // no operation required
else
d[i, j] := minimum
(
d[i-1, j] + 1, // a deletion
d[i, j-1] + 1, // an insertion
d[i-1, j-1] + 1 // a substitution
)
}
}

return d[m,n]
}

Two examples of the resulting matrix (hovering over a number reveals the operation performed to get that number):
k i t t e n
0 1 2 3 4 5 6
s 1 1 2 3 4 5 6
i 2 2 1 2 3 4 5
t 3 3 2 1 2 3 4
t 4 4 3 2 1 2 3
i 5 5 4 3 2 2 3
n 6 6 5 4 3 3 2
g 7 7 6 5 4 4 3

S a t u r d a y
0 1 2 3 4 5 6 7 8
S 1 0 1 2 3 4 5 6 7
u 2 1 1 2 2 3 4 5 6
n 3 2 2 2 3 3 4 5 6
d 4 3 3 3 3 4 3 4 5
a 5 4 3 4 4 4 4 3 4
y 6 5 4 4 5 5 5 4 3


Proof of correctness
As mentioned earlier, the invariant is that we can transform the initial segment s[1..i] into t[1..j] using a minimum of d[i,j] operations. This invariant holds since:
It is initially true on row and column 0 because s[1..i] can be transformed into the empty string t[1..0] by simply dropping all i characters. Similarly, we can transform s[1..0] to t[1..j] by simply adding all j characters.

If s[i] = t[j], and we can transform s[1..i-1] to t[1..j-1] in k operations, then we can do the same to s[1..i] and just leave the last character alone, giving k operations.
Otherwise, the distance is the minimum of the three possible ways to do the transformation:
If we can transform s[1..i] to t[1..j-1] in k operations, then we can simply add t[j] afterwards to get t[1..j] in k+1 operations (insertion).

If we can transform s[1..i-1] to t[1..j] in k operations, then we can remove s[i] and then do the same transformation, for a total of k+1 operations (deletion).
If we can transform s[1..i-1] to t[1..j-1] in k operations, then we can do the same to s[1..i], and exchange the original s[i] for t[j] afterwards, for a total of k+1 operations

The operations required to transform s[1..n] into t[1..m] is of course the number required to transform all of s into all of t, and so d[n,m] holds our result.
This proof fails to validate that the number placed in d[i,j] is in fact minimal; this is more difficult to show, and involves an argument by contradiction in which we assume d[i,j] is smaller than the minimum of the three, and use this to show one of the three is not minimal.


Working Code

#include
#include
#include
#define MAX 2000
using namespace std;

int minimum(int a,int b,int c)
{
if(a if(b return c;
}

int LevenshteinDistance(char *a, char *b)
{
int **d;
int m=0,n=0,i,j;
char s[MAX]="0";
char t[MAX]="0";
d = (int **)malloc(MAX*sizeof(int));
for (i=0;i d[i] = (int *)malloc(MAX*sizeof(int));
strcat(s,a);
strcat(t,b);
m=strlen(s);
n=strlen(t);
printf("%s%s",s,t);
for(i=0;i<=m;i++) d[i][0]=i ;
for(j=0;j<=n;j++) d[0][j]=j;
for(j=1;j<=n;j++)
{
for(i=1;i<=m;i++)
{
if (s[i] == t[j])
d[i][j]=d[i-1][j-1];
else
d[i][j]=minimum(d[i-1][j] + (s[i]==t[i]?0:1),d[i][j-1] +
1,d[i-1][j-1] + 1 );
}
}
//TODO: Free "d"
return d[m][n];
}

int main()
{
int t,ed;
char s1[MAX],t1[MAX];
scanf("%d",&t);
while( t--)
{
scanf("%s%s",s1,t1);
//cout< ed=LevenshteinDistance(s1,t1);
printf("%d\n",ed);
}
return 0;
}

Time Complexity O(N^2)
Space Complexity O(1)
Auxiliary Space O(N^2)
Run Here http://www.ideone.com/0BHid

Reorder 1st string based on 2nd string. eg: (tractor,car) output: carrtto or carrott or carrtot.

You have a stream of random numbers that are inserted into an expanding array. How can you maintain the current median and what is the complexity.

Given a linked list structure where every node represents a linked list and contains two pointers of its type: (i) pointer to next node in the main list. (ii) pointer to a linked list where this node is head. Write a C function to flatten the list into a single linked list. (from here)

Given an Sorted Array of 0s and 1s of infinite length fin the position of transition in array

e.g. 0000000....01......11111.....1.. infinite
ith position this wher 0 is converted in to 1


Data Structure Used: Array

Algorithm:(Done Will Post Soon)


Working Code:


Time Complexity O(logn) yes its logn ??
Space Complexity O(1)
Run Here

WAP to Find The Difference the between even-level-nodes sum & odd-level-nodes sum in binary tree consider root at 0th level which even level starting form 0th level

Example

Data Structure Used: Binary Tree

Algorithm:(Done Will Post Soon)


Working Code:


Time Complexity O(n)
Space Complexity O(1)
Run Here

Given An Unsorted with lot of duplicates & having Special property with it that every concecutive pair has maximum difference of 1 & also given the 1st last element in array , WAP to Search An Element X Efficiently?

Example
1 2 1 2 3 2 3 4 5 4 6 7 6 7 8 ....10000
Given for all in i we have to search an element x(say 3) & return any index of it.


Data Structure Used: Array

Algorithm:(Done Will Post Soon)


Working Code:


Time Complexity O(logn) yes its logn ??
Space Complexity O(1)
Run Here

Thursday, June 30, 2011

Knapsack Problem (Unbounded & 0/1)

The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.

In the following, we have n kinds of items, 1 through n. Each kind of item i has a value vi and a weight wi. We usually assume that all values and weights are nonnegative. To simplify the representation, we can also assume that the items are listed in increasing order of weight. The maximum weight that we can carry in the bag is W.
The most common formulation of the problem is the 0-1 knapsack problem, which restricts the number xi of copies of each kind of item to zero or one. Mathematically the 0-1-knapsack problem can be formulated as:

maximize sum(vi,xi) for i=0 to n

such that sum(wi*xi)<=W & x(0,1) The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item. Of particular interest is the special case of the problem with these properties: it is a decision problem, it is a 0-1 problem, for each kind of item, the weight equals the value: wi = vi. Notice that in this special case, the problem is equivalent to this: given a set of nonnegative integers, does any subset of it add up to exactly W? Or, if negative weights are allowed and W is chosen to be zero, the problem is: given a set of integers, does any nonempty subset add up to exactly 0? This special case is called the subset sum problem. In the field of cryptography the term knapsack problem is often used to refer specifically to the subset sum problem.\ Dynamic Programming Solution: Unbounded knapsack problem If all weights () are nonnegative integers, the knapsack problem can be solved in pseudo-polynomial time using dynamic programming. The following describes a dynamic programming solution for the unbounded knapsack problem. To simplify things, assume all weights are strictly positive (wi > 0). We wish to maximize total value subject to the constraint that total weight is less than or equal to W. Then for each w ≤ W, define m[w] to be the maximum value that can be attained with total weight less than or equal to w. m[W] then is the solution to the problem.
Observe that m[w] has the following properties:

m[0]=0 (the sum of zero items, i.e., the summation of the empty set)
m[i]=max(m[w-1],vi+m[w-wi]) where wi<=w where vi is the value of the i-th kind of item. Here the maximum of the empty set is taken to be zero. Tabulating the results from m[0] up through m[W] gives the solution. Since the calculation of each m[w] involves examining n items, and there are W values of m[w] to calculate, the running time of the dynamic programming solution is O(nW). Dividing by their greatest common divisor is an obvious way to improve the running time. Working Code: import java.io.*; class knapsack10 { public static void main(String args[]) throws IOException { int capacity,n; BufferedReader br=new BufferedReader(new InputStreamReader(System.in)); System.out.println ("\n Enter the number of items u want to enter:"); n= Integer.parseInt(br.readLine()); float p[]=new float[n+1]; float x[]=new float[n+1]; int i,j,k,w; int WEIGHT[]=new int[n+1],PROFIT[]=new int[n+1]; WEIGHT[0]=0; PROFIT[0]=0; for(i=1;i<=n;i++) { System.out.println ("\n Enter the weight and profit of "+i+" : item "); WEIGHT[i]= Integer.parseInt(br.readLine()); PROFIT[i]= Integer.parseInt(br.readLine()); p[i]=0; x[i]=0; } System.out.println ("\n Enter the capacity of the knapsack : "); capacity = Integer.parseInt(br.readLine()); float c[][]=new float [n+1][capacity+1]; for(i=0;i<=n;i++) for(j=0;j<=capacity;j++) c[i][j] = 0; for(i=1;i<=n;i++) for(w=1;w<=capacity;w++) if(WEIGHT[i]<=w){ if ((PROFIT[i]+c[i-1][w-WEIGHT[i]])>c[i-1][w])
{
c[i][w] = PROFIT[i] + c[i-1][w-WEIGHT[i]];
p[i] = 1;
}
else
{
c[i][w]=c[i-1][w];
p[i] = 0;
}}
else
{
c[i][w]=c[i-1][w];
p[i] = 0;
}

float temp=0;
int t=0;
for(j=1;j<=capacity;j++)
{
temp = c[n-1][j]-c[n-1][j-1];
for(i=1;i if(temp==PROFIT[i])
t=i;
x[t] = 1;


}
for(j=0;j<=n;j++)
System.out.println (j+" "+x[j] );
System.out.println ("The profit obtained is "+c[n][capacity]);
}
}

The O(nW) complexity does not contradict the fact that the knapsack problem is NP-complete, since W, unlike n, is not polynomial in the length of the input to the problem. The length of the W input to the problem is proportional to the number of bits in W, logW, not to W itself.

Time Complexity O(NlogN) W<=logW
Space Complexity (N^2)

More Info.
www.es.ele.tue.nl/education/5MC10/Solutions/knapsack.pdf
www.cse.unl.edu/~goddard/Courses/CSCE310J/Lectures/Lecture8-DynamicProgramming.pdf

2nd Knapsack 0/1 Problem(In Progress)

Constructing a Binary Search Tree from Post Order traversal Efficiently.

Data Structure: Array,Binary Search Tree


Algorithm & Approach
1.As we have given Postorder Traversal we know last value will be root insert it in BST as Root.
2.While Iterating Over Array for each element (from as last position Obvious) compare root value
from next value in array so if
a. it is greater then root then insert it into right subtree of root Avg (logn),Worst O(N)
b. if next value is less then root then insert it into left subtree of root
3.End


Working Code:

#include
#include

typedef struct Tnode {
int data;
struct Tnode *left;
struct Tnode *right;
} Tnode;

#include
#include


/* Helper function that allocates a new node with the
given data and NULL left and right pointers. */
Tnode * newNode(int data)
{
Tnode * node = (Tnode *)
malloc(sizeof(Tnode ));
node->data = data;
node->left = NULL;
node->right = NULL;

return(node);
}


Tnode *constructBSTfrompostorder(Tnode *root,int a[],int n)
{
int i=0;
Tnode *t=NULL;
for(i=n-1;i>=0;i--)
{
t=root;
if(root==NULL)
{
root=newNode(a[i]);
continue;
}
while(root!=NULL)
{
if(a[i]>root->data)
{
if(root->right==NULL)
{
root->right=newNode(a[i]);
break;
}
root=root->right;
}
else
{
if(root->left==NULL)
{
root->left=newNode(a[i]);
break;
}
root=root->left;
}
}
root=t;
}
return root;
}

void postorder(Tnode *root)
{
if(root==NULL)
return;
postorder(root->left);
postorder(root->right);
printf("%d,",root->data);
}

int main(int argc, char** argv) {

Tnode *root=NULL;

/*root = createtreenode(15);

root->left = createtreenode(7);
root->right = createtreenode(25);

root->left->left = createtreenode(3);
root->left->right = createtreenode(10);

root->right->left = NULL;
root->right->right = createtreenode(50);

root->left->right->left = createtreenode(8);
root->left->right->right = createtreenode(12);

root->left->right->left->left = NULL;
root->left->right->left->right = createtreenode(9);

root->left->right->right->left = createtreenode(11);
root->left->right->right->right = createtreenode(14);

root->right->right->left = createtreenode(30);
root->right->right->right = createtreenode(55);

root->right->right->right->left = createtreenode(52);
root->right->right->right->right = NULL;*/

//postorder(root);

int a[]={3,9,8,11,14,12,10,7,30,52,55,50,25,15};
root=constructBSTfrompostorder(root,a,14);
postorder(root);
return (EXIT_SUCCESS);
}


Time Complexity O(NlogN) but in worst case it will take o(N^2) Time if we have given postorder
of right skewed tree
Space Complexity O(1)
Run Here http://ideone.com/3KMy7

Monday, June 27, 2011

Longest Repeated SubString e.g. Maximum Length SubString

For instance, the longest repeated string in ``Ask not what your
country can do for you, but what you can do for your country'' is `` can do for you'', with `` your country'' a close
second place.

Suppose, though, that you had a chance to preprocess the body of text before performing searches. You could make a hash table (or search tree) to index every distinct word of the document, and store a list of every occurrence of
each word. Such an ``inverted index'' allows a program to look up a given word quickly. One can look up phrases by intersecting the lists of the words they contain, but this is subtle to implement and potentially slow. (Some web
search engines do, however, take exactly this approach.) We'll turn now to a powerful data structure and apply it to a small problem: given an input file of text, find the longest duplicated substring of characters in it. For instance, the longest repeated string in ``Ask not what your country can do for you, but what you can do for your country'' is `` can do for you'', with `` your country'' a close
second place. How would you write a program to solve this problem?

Data Structure :Suffix Array

Algorithm

This problem is reminiscent of the anagram problem that we saw in Section 2.4. If the input string is stored in c[0..n-1], then we could start by comparing every pair of substrings using pseudocode like this

maxlen = -1
for i = [0, n)
for j = (i, n)
if (thislen = comlen(&c[i], &c[j])) > maxlen
maxlen = thislen
maxi = i
maxj = j

The comlen function returns the length that its two parameter strings have in common, starting with their first
characters:

int comlen(char *p, char *q)
i = 0
while *p && (*p++ == *q++)
i++
return i

Because this algorithm looks at all pairs of substrings, it takes time proportional to n2, at least. We might be able to
speed it up by using a hash table to search for words in the phrases, but we'll instead take an entirely new approach.

Our program will process at most MAXN characters, which it stores in the array c:

#define MAXN 5000000
char c[MAXN], *a[MAXN];

We'll use a simple data structure known as a ``suffix array''; the structure has been used at least since the 1970's,
though the term was introduced in the 1990's. The structure is an array a of pointers to characters. As we read the
input, we initialize a so that each element points to the corresponding character in the input string:

while (ch = getchar()) != EOF
a[n] = &c[n]
c[n++] = ch
c[n] = 0

The final element of c contains a null character, which terminates all strings.
The element a[0] points to the entire string; the next element points to the suffix of the array beginning with the
second character, and so on. On the input string ``banana'', the array will represent these suffixes:

a[0]: banana
a[1]: anana
a[2]: nana
a[3]: ana
a[4]: na
a[5]: a

The pointers in the array a together point to every suffix in the string, hence the name ``suffix array''.If a long string occurs twice in the array c, it appears in two different suffixes. We will therefore sort the array to bring together equal suffixes (just as sorting brought together anagrams in Section 2.4). The ``banana'' array sorts to

a[0]: a
a[1]: ana
a[2]: anana
a[3]: banana
a[4]: na
a[5]: nana

We can then scan through this array comparing adjacent elements to find the longest repeated string, which in this case is ``ana''. We'll sort the suffix array with the qsort function: qsort(a, n, sizeof(char *), pstrcmp) The pstrcmp comparison function adds one level of indirection to the library strcmp function. This scan through the array uses the comlen function to count the number of letters that two adjacent words have in common:

for i = [0, n)
if comlen(a[i], a[i+1]) > maxlen
maxlen = comlen(a[i], a[i+1])
maxi = i
printf("%.*s\n", maxlen, a[maxi])
The printf statement uses the ``*'' precision to print maxlen characters of the string.

I ran the resulting program to find the longest repeated string in the 807,503 characters in Samuel Butler's
translation of Homer's Iliad. The program took 4.8 seconds to locate this string:
whose sake so many of the Achaeans have died at Troy, far from their homes? Go about at once among the
host, and speak fairly to them, man by man, that they draw not their ships into the sea.
The text first occurs when Juno suggests it to Minerva as an argument that might keep the Greeks (Achaeans) from
departing from Troy; it occurs shortly thereafter when Minerva repeats the argument verbatim to

Working Code:

#include
#include
#include
int pstrcmp(char **p, char **q)
{ return strcmp(*p, *q); }
int comlen(char *p, char *q)
{ int i = 0;
while (*p && (*p++ == *q++))
i++;
return i;
}
#define M 1
#define MAXN 5000000
char c[MAXN]="banana", *a[MAXN];
int main()
{ int i, ch, n = 0, maxi, maxlen = -1;
while ((ch = getchar()) != EOF) {
a[n] = &c[n];
c[n++] = ch;
}
c[n] = 0;
qsort(a, n, sizeof(char *), pstrcmp);
for (i = 0; i < n-M; i++) if (comlen(a[i], a[i+M]) > maxlen) {
maxlen = comlen(a[i], a[i+M]);
maxi = i;
}
printf("%.*s\n", maxlen, a[maxi]);
return 0;
}

Time Complexity O(NLogN)
Space Comple4xity O(1)
Auxillary Space O(N)
Run Here https://ideone.com/IQK8g

Source Jhon Bentely Programming Pearls
http://www.cs.pitt.edu/~kirk/cs1501/Pruhs/Fall2006/Assignments/repeatedstrings/index.html


Follow Up:
How would you modify the program for finding duplicated strings to find the longest string that occurs more than M times?

#include
#include
#include
int pstrcmp(char **p, char **q)
{ return strcmp(*p, *q); }
int comlen(char *p, char *q)
{ int i = 0;
while (*p && (*p++ == *q++))
i++;
return i;
}
#define M 1
#define MAXN 5000000
char c[MAXN], *a[MAXN];
int main()
{ int i, ch, n = 0, maxi, maxlen = -1;
while ((ch = getchar()) != EOF) {
a[n] = &c[n];
c[n++] = ch;
}
c[n] = 0;
qsort(a, n, sizeof(char *), pstrcmp);
for (i = 0; i < n-M; i++) if (comlen(a[i], a[i+M]) > maxlen) {
maxlen = comlen(a[i], a[i+M]);
maxi = i;
}
printf("%.*s\n", maxlen, a[maxi]);
return 0;
}


Here is good lecture on suffix tree http://www.youtube.com/watch?v=jXAHLqQthKw
Here you can try some of applications of suffix tree http://algs4.cs.princeton.edu/63suffix/