Monday, June 27, 2011

Longest Repeated SubString e.g. Maximum Length SubString

For instance, the longest repeated string in ``Ask not what your
country can do for you, but what you can do for your country'' is `` can do for you'', with `` your country'' a close
second place.

Suppose, though, that you had a chance to preprocess the body of text before performing searches. You could make a hash table (or search tree) to index every distinct word of the document, and store a list of every occurrence of
each word. Such an ``inverted index'' allows a program to look up a given word quickly. One can look up phrases by intersecting the lists of the words they contain, but this is subtle to implement and potentially slow. (Some web
search engines do, however, take exactly this approach.) We'll turn now to a powerful data structure and apply it to a small problem: given an input file of text, find the longest duplicated substring of characters in it. For instance, the longest repeated string in ``Ask not what your country can do for you, but what you can do for your country'' is `` can do for you'', with `` your country'' a close
second place. How would you write a program to solve this problem?

Data Structure :Suffix Array

Algorithm

This problem is reminiscent of the anagram problem that we saw in Section 2.4. If the input string is stored in c[0..n-1], then we could start by comparing every pair of substrings using pseudocode like this

maxlen = -1
for i = [0, n)
for j = (i, n)
if (thislen = comlen(&c[i], &c[j])) > maxlen
maxlen = thislen
maxi = i
maxj = j

The comlen function returns the length that its two parameter strings have in common, starting with their first
characters:

int comlen(char *p, char *q)
i = 0
while *p && (*p++ == *q++)
i++
return i

Because this algorithm looks at all pairs of substrings, it takes time proportional to n2, at least. We might be able to
speed it up by using a hash table to search for words in the phrases, but we'll instead take an entirely new approach.

Our program will process at most MAXN characters, which it stores in the array c:

#define MAXN 5000000
char c[MAXN], *a[MAXN];

We'll use a simple data structure known as a ``suffix array''; the structure has been used at least since the 1970's,
though the term was introduced in the 1990's. The structure is an array a of pointers to characters. As we read the
input, we initialize a so that each element points to the corresponding character in the input string:

while (ch = getchar()) != EOF
a[n] = &c[n]
c[n++] = ch
c[n] = 0

The final element of c contains a null character, which terminates all strings.
The element a[0] points to the entire string; the next element points to the suffix of the array beginning with the
second character, and so on. On the input string ``banana'', the array will represent these suffixes:

a[0]: banana
a[1]: anana
a[2]: nana
a[3]: ana
a[4]: na
a[5]: a

The pointers in the array a together point to every suffix in the string, hence the name ``suffix array''.If a long string occurs twice in the array c, it appears in two different suffixes. We will therefore sort the array to bring together equal suffixes (just as sorting brought together anagrams in Section 2.4). The ``banana'' array sorts to

a[0]: a
a[1]: ana
a[2]: anana
a[3]: banana
a[4]: na
a[5]: nana

We can then scan through this array comparing adjacent elements to find the longest repeated string, which in this case is ``ana''. We'll sort the suffix array with the qsort function: qsort(a, n, sizeof(char *), pstrcmp) The pstrcmp comparison function adds one level of indirection to the library strcmp function. This scan through the array uses the comlen function to count the number of letters that two adjacent words have in common:

for i = [0, n)
if comlen(a[i], a[i+1]) > maxlen
maxlen = comlen(a[i], a[i+1])
maxi = i
printf("%.*s\n", maxlen, a[maxi])
The printf statement uses the ``*'' precision to print maxlen characters of the string.

I ran the resulting program to find the longest repeated string in the 807,503 characters in Samuel Butler's
translation of Homer's Iliad. The program took 4.8 seconds to locate this string:
whose sake so many of the Achaeans have died at Troy, far from their homes? Go about at once among the
host, and speak fairly to them, man by man, that they draw not their ships into the sea.
The text first occurs when Juno suggests it to Minerva as an argument that might keep the Greeks (Achaeans) from
departing from Troy; it occurs shortly thereafter when Minerva repeats the argument verbatim to

Working Code:

#include
#include
#include
int pstrcmp(char **p, char **q)
{ return strcmp(*p, *q); }
int comlen(char *p, char *q)
{ int i = 0;
while (*p && (*p++ == *q++))
i++;
return i;
}
#define M 1
#define MAXN 5000000
char c[MAXN]="banana", *a[MAXN];
int main()
{ int i, ch, n = 0, maxi, maxlen = -1;
while ((ch = getchar()) != EOF) {
a[n] = &c[n];
c[n++] = ch;
}
c[n] = 0;
qsort(a, n, sizeof(char *), pstrcmp);
for (i = 0; i < n-M; i++) if (comlen(a[i], a[i+M]) > maxlen) {
maxlen = comlen(a[i], a[i+M]);
maxi = i;
}
printf("%.*s\n", maxlen, a[maxi]);
return 0;
}

Time Complexity O(NLogN)
Space Comple4xity O(1)
Auxillary Space O(N)
Run Here https://ideone.com/IQK8g

Source Jhon Bentely Programming Pearls
http://www.cs.pitt.edu/~kirk/cs1501/Pruhs/Fall2006/Assignments/repeatedstrings/index.html


Follow Up:
How would you modify the program for finding duplicated strings to find the longest string that occurs more than M times?

#include
#include
#include
int pstrcmp(char **p, char **q)
{ return strcmp(*p, *q); }
int comlen(char *p, char *q)
{ int i = 0;
while (*p && (*p++ == *q++))
i++;
return i;
}
#define M 1
#define MAXN 5000000
char c[MAXN], *a[MAXN];
int main()
{ int i, ch, n = 0, maxi, maxlen = -1;
while ((ch = getchar()) != EOF) {
a[n] = &c[n];
c[n++] = ch;
}
c[n] = 0;
qsort(a, n, sizeof(char *), pstrcmp);
for (i = 0; i < n-M; i++) if (comlen(a[i], a[i+M]) > maxlen) {
maxlen = comlen(a[i], a[i+M]);
maxi = i;
}
printf("%.*s\n", maxlen, a[maxi]);
return 0;
}


Here is good lecture on suffix tree http://www.youtube.com/watch?v=jXAHLqQthKw
Here you can try some of applications of suffix tree http://algs4.cs.princeton.edu/63suffix/

Largest Sum Contiguous Subarray

Problem Statement:Write an efficient C program to find the sum of contiguous subarray within a one-dimensional array of numbers which has the largest sum.

Data Structure Used:Array

O(N^3) Algorithm

#include
#include
#include


Find the longest arithmetic series that can be formed by choosing a sub-collection (possibly the entire collection)..TC problem

An arithmetic series consists of a sequence of terms such that each term minus its immediate predecessor gives the same result. For example, the sequence 3,7,11,15 is the terms of the arithmetic series 3+7+11+15; each term minus its predecessor equals 4. (Of course there is no requirement on the first term since it has no predecessor.)

Given a collection of integers, we want to find the longest arithmetic series that can be formed by choosing a sub-collection (possibly the entire collection). Create a class ASeries that contains a method longest that is given a int[] values and returns the length of the longest arithmetic series that can be formed from values.

Definition

Class: ASeries
Method: longest
Parameters: int[]
Returns: int
Method signature:int longest(int[] values) (be sure your method is public)

Constraints
- values will contain between 2 and 50 elements inclusive.
- Each element of values will be between -1,000,000 and 1,000,000 inclusive.

Examples
0)



{3,8,4,5,6,2,2}

Returns: 5

No arithmetic series using these values is longer than 2,3,4,5,6.
1)



{-1,-5,1,3}

Returns: 3

-1, 1, 3 is an arithmetic series (so is 3,-1,-5).
2)



{-10,-20,-10,-10}

Returns: 3

-10,-10,-10 is an arithmetic series.

I Took The Problem From My Friend Rizwan's Blog (he is Kind oF Computer Geek) :)
he told that success rate of this problem is about 52% so that makes me crazy to,solve it.

first let me try myself thinking how i can approach

Given an array of integers A, give an algorithm to find the longest Arithmetic progression in it

i.e find a sequence i1 < i2 < … < ik, such that
A[i1], A[i2], …, A[ik] forms an arithmetic progression, and k is the largest possible.
The sequence S1, S2, …, Sk is called an arithmetic progression if
Sj+1 – Sj is a constant

http://theory.cs.uiuc.edu/~jeffe/pubs/pdf/arith.pdf

Sunday, June 26, 2011

Given an array and find two numbers in the array having difference equal to given number

Data Structure:Array

Algorithm:
1.start using two pointer p1 & p2 which pints 1st & next element repectively
2.initialize p1=0 & p2=1;
3.loop down over array
a.check if their difference a[p2]-a[p1]==num if yes the print then & increment
both p1 & p2;
b.else if their difference a[p2]-a[p1]>num is less then number then increment j
only.
c.else if their difference a[p2]-a[p1]

int main()
{
int arr[9]={10, 14, 17, 22, 23, 25, 26, 27, 45};
int i = 0;
int j = 1;
int num = 2;
while(j<9) { if(arr[j] - arr[i] == num) { printf("two numbers whose difference is equal to num are %d %d",arr[i],arr[j]); i++;j++; } else if((arr[j]-arr[i])>num)
i++;
else if((arr[j]-arr[i]) j++;
}
}

Suggested by Rajcools
Time Complexity O(N)
Space complexity O(1)
Run Here https://ideone.com/zbdDh

Wap to Count Number of Set Bits in Hexadecimal Number

class Solution { public int hex_bitcount(String S); }

that given a string S containing big-endian hexadecimal representation of a non-negative integer N returns the number of bits set to 1 in binary representation of N.

Assume that the length of the string does not exceed 100,000. Assume that the string contains only characters '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'.

For example the function should return 5 when given S = "2F". The string "2F" represents the number 47. The binary representation of 47 is 101111 and it contains 5 bits set to 1.

Algorithm Used: Brian Kernighan’s Algorithm:
Subtraction of 1 from a number toggles all the bits (from right to left) till the rightmost set bit(including the righmost set bit). So if we subtract a number by 1 and do bitwise & with itself (n & (n-1)), we unset the righmost set bit. If we do n & (n-1) in a loop and count the no of times loop executes we get the set bit count.
Beauty of the this solution is number of times it loops is equal to the number of set bits in a given integer.

1 Initialize count: = 0
2 If integer n is not zero
(a) Do bitwise & with (n-1) and assign the value back to n
n: = n&(n-1)
(b) Increment count by 1
(c) go to step 2
3 Else return count


class Solution
{
int hex_bitcount ( String S )
{
int n=0,count=0;
if(S.length()<=100000)
{
n=Integer.parseInt(S,16);

while (n!=0)
{
n &= (n-1) ;
count++;
}
return count;
}
return -1;
}
public static void main(String a[])
{
Solution obj= new Solution();
System.out.println(obj.hex_bitcount("2F"));

}
}

Time Complexity O(logn)
Space Complexity O(1)
Run Here https://ideone.com/9vZ4V

Saturday, June 25, 2011

implement Bit operation for setting & re-setting the bit in range & at ith position

Operations are
Set(i) sets the ith bit O(1)
Unset(j) unsets the jth bit Set(i,j) O(1)
sets the bits b/w i and j Unset(i,j) o(logn)..??
unsets the bits b/w i and j O(logn) ???
Design a data structure for this problem with minimum time and space complexity.

#include
int set(int N,int i){
N = N | (1<=j;k--){
N = set(N,k);
}
return N;
}

int main(){

int N=7;
int i=3,j=2;
N=set(N,i);
printf("%d\n",N);

N=unset(N,j);
printf("%d\n",N);
N=set_range(0,3,1);
printf("%d\n",N);
/*unset(N,i,j);*/
return 0;
}

Time Complexity O(n)
Space Complexity O(1)

Optmization O(logn)
Yes we can use RMQ mechenism to do this , Segment Tree is Awesome Option to set upper bound for set(i,j,) & unset(i,j) will be O(logn)

Write an Efficient Method to Check if a Number is Multiple of 3 Efficiently

There is a pattern in binary representation of the number that can be used to find if number is a multiple of 3. If difference between count of odd set bits (Bits set at odd positions) and even set bits is multiple of 3 then is the number.

Example: 23 (00..10111)
1) Get count of all set bits at odd positions (For 23 it’s 3).
2) Get count of all set bits at even positions (For 23 it’s 1).
3) If difference of above two counts is a multiple of 3 then number is also a multiple of 3.

(For 23 it’s 2 so 23 is not a multiple of 3)

Take some more examples like 21, 15, etc…

Algorithm: isMutlipleOf3(n)
1) Make n positive if n is negative.
2) If number is 0 then return 1
3) If number is 1 then return 0
4) Initialize: odd_count = 0, even_count = 0
5) Loop while n != 0
a) If rightmost bit is set then increment odd count.
b) Right-shift n by 1 bit
c) If rightmost bit is set then increment even count.
d) Right-shift n by 1 bit
6) return isMutlipleOf3(odd_count - even_count)


This can be continued for all decimal numbers.
Above concept can be proved for 3 in binary numbers in the same way.

Time Complexity: O(logn)

Program:
?
#include

/* Fnction to check if n is a multiple of 3*/
int isMultipleOf3(int n)
{
int odd_count = 0;
int even_count = 0;

/* Make no positive if +n is multiple of 3
then is -n. We are doing this to avoid
stack overflow in recursion*/
if(n < 0) n = -n; if(n == 0) return 1; if(n == 1) return 0; while(n) { /* If odd bit is set then increment odd counter */ if(n & 1) odd_count++; n = n>>1;

/* If even bit is set then
increment even counter */
if(n & 1)
even_count++;
n = n>>1;
}

return isMultipleOf3(abs(odd_count - even_count));
}

/* Program to test function isMultipleOf3 */
int main()
{
int num = 23;
if (isMultipleOf3(num))
printf("num is multiple of 3");
else
printf("num is not a multiple of 3");
getchar();
return 0;
}

Time Complexity O(logn)
Space Complexity O(1)
Source http://www.geeksforgeeks?211

Write a program to merge two BST efficiently in O(n)

There are two methods to do this :-

1st
1)convert both tree into doubly linked list
2) merge both these doubly linked list
3) then create a tree from these merge linked list by taking median of this list as root and traverse left from root to make left subtree and traverse right from root to make right sub tree
this has complexity O(n1+n2) and space complexity O(1)

BinaryTree* sortedListToBST(ListNode *& list, int start, int end) {
if (start > end) return NULL;
// same as (start+end)/2, avoids overflow
int mid = start + (end - start) / 2;
BinaryTree *leftChild = sortedListToBST(list, start, mid-1);
BinaryTree *parent = new BinaryTree(list->data);
parent->left = leftChild;
list = list->next;
parent->right = sortedListToBST(list, mid+1, end);
return parent;
}



1)Find the inorder traversal of both the trees
2)Merge them into one (Giving one sorted tree)
3)Find the median of this sorted array(O(1))
Call this procedure recursively

struct tree
{
int data ;
tree * left , *right ;
};
void insert(int * array , int left , int right)
{
if (left<=right){ int mid=(left+right)/2; tree * root= malloc(sizeof (tree)); root->data=array[mid];
root->left=insert(array , left , mid-1);
root->right=insert(array , mid+1 , right);
return root;
}
}

Time Complexity O(N)

Imagine you have a special keyboard with the following keys: 1. A 2. Ctrl+A 3. Ctrl+C 4. Ctrl+V where CTRL+A, CTRL+C, CTRL+V each acts as one function key for “Select All”, “Copy”, and “Paste” operations respectively.

If you can only press the keyboard for N times (with the above four keys), please write a program to produce maximum numbers of A. If possible, please also print out the sequence of keys.

That is to say, the input parameter is N (No. of keys that you can press), the output is M (No. of As that you can produce).













A Most Effective Solution Is Given at
www.ihas1337code.com/2011/01/ctrla-ctrlc-ctrlv.html