Wednesday, August 3, 2011

Given a file containing 4,300,000,000 integers, how can you find one that appears at least twice?

A Theoretical Approach from Programming Pearls

Binary search finds an element that occurs at least twice by recursively searching the subinterval that contains more than half of the integers. My original solution did not guarantee that the number of integers is halved in each iteration, so the worst-case run time of its log2 n passes was proportional to n·log n. Jim Saxe reduced that to linear time by observing that the search can avoid carrying too many duplicates.

When his search knows that a duplicate must be in a current range of m integers, it will store only m+1 integers on its current work tape; if more integers would have gone onto the tape, his program discards them. Although his method frequently ignores input values, its strategy is conservative enough to ensure that it finds at least one duplicate.

The algorithm that Bentley is talking about works by repeatedly halving
the candidate range in which the duplicate element must lie. Initially
this range is 0..2^32-1. On each pass it throws away the half of the
range that contains fewer data values and it also throws away the data
lying in that half. Eventually the range decreases to a single value,
which must be a duplicate because the remaining data has at least two
elements. The problem that Bentley notes is that the data may still
have 4 billion elements at this stage! The final improvement he
mentions is that you can throw away data _inside_ the candidate range
as long as you keep enough data around to ensure that at least one
duplicate occurs in it. This is equal to the size of the current
candidate range plus 1!


Another Similar Approach I Found

Create a bit array of 2^32 bits, initialized to zero. That is about 512MB and will fit into RAM on any modern machine.

Read the file int by int and check the bit whose index equals the value of the int: if the bit is already set, you have found a duplicate; if it is zero, set it to one and proceed with the next int from the file.

The trick is to find a suitable data structure and algorithm. In this case everything fits into RAM with a suitable data structure, and a simple, efficient algorithm can be used.
If the numbers are int64 you need to find a suitable sorting strategy or make multiple passes, depending on how much additional storage you have available.
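A minimal sketch of that scan, with the universe size as a parameter (my assumption, so the same code can be tested on a small range; the real task would pass a universe of 2^32, allocating the 512MB bit array):

```cpp
#include <cstdint>
#include <vector>

// Returns the first value whose bit is already set, or -1 if nothing repeats.
int64_t first_duplicate(const std::vector<uint32_t>& stream, uint64_t universe) {
    std::vector<uint8_t> seen((universe + 7) / 8, 0);  // one bit per possible value
    for (uint32_t x : stream) {
        uint8_t mask = 1u << (x % 8);
        if (seen[x / 8] & mask) return x;  // bit already set: duplicate found
        seen[x / 8] |= mask;
    }
    return -1;
}
```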

The Pigeonhole Principle -- If you have N pigeons in M pigeonholes and N>M, there are at least 2 pigeons in some hole. The set of 32-bit integers gives our 2^32 pigeonholes, and the 4.3 billion numbers in our file are the pigeons. Since 4.3x10^9 > 2^32, we know there are duplicates.

You can apply this principle to test whether a duplicate lies in a given subset of the numbers at the cost of reading the whole file, without loading more than a little at a time into RAM: just count the number of times you see a number in your test range, and compare to the total number of integers in that range. For example, to check for a duplicate between 1,000,000 and 2,000,000 inclusive:

int pigeons = 0;
int pigeonholes = 2000000 - 1000000 + 1; // include both fenceposts
for (each number N in file) {
    if (N >= 1000000 && N <= 2000000) { pigeons++; }
}
if (pigeons > pigeonholes) {
    // one of the duplicates is between 1,000,000 and 2,000,000
    // try again with a narrower range
}

Picking how big a range to check versus how many times you want to read 16GB of data is up to you :)

As far as a general algorithm category goes, this is a combinatorics (math about counting) problem.

Given an array of integers (positive, negative, or zero), find the subarray that gives the largest product.

Data Structure Used: Array

Algorithm & Explanation

Let us take an array = {2, -25, 4, 5, -3, -5}

We keep three variables: P, the largest positive product of a subarray ending at the current position; N, the magnitude of the most negative such product; and Val, the best product seen so far.

Initially P = 1, N = 0, Val = 0.

The first value, A[0] = 2, is positive, so we multiply both P and N by A[0]:
P = 2, N = 0

Now A[1] = -25 is negative. We multiply P by -A[1] and N by -A[1]:

P = 50
N = 0

Because A[1] is negative, we then swap P and N:
P = 0, N = 50 (meaning the product of the subarray {2, -25} is -50)

If A[i] == 0,
we reset P = 1, N = 0, since no subarray product can extend across a zero.
At every step we record Val = max(Val, P), and when no positive product ends at the current position we restart, analogous to Kadane's "if (sumSoFar < 0)" check; restarting P must not throw N away, however, or we would lose a negative prefix that a later negative value turns into a large positive product. In short, we maintain the maximum positive product so far and the maximum-magnitude negative product so far; whenever we meet a negative value we multiply both by -A[i] and swap them, since negative × negative = positive and negative × positive = negative.

Working Code

#include <stdio.h>

void swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

int main()
{
    int a[] = {6, -1, 2, -33, 4, 15, -7, 28, -9, -10};
    int n = sizeof a / sizeof a[0];
    /* P = maximum product of a subarray ending here,
       N = minimum (most negative) product ending here */
    int P = a[0], N = a[0], val = a[0];
    int i;
    for (i = 1; i < n; i++) {
        if (a[i] < 0)
            swap(&P, &N);   /* a negative factor turns max into min */
        P = a[i] > P * a[i] ? a[i] : P * a[i];   /* extend, or restart at a[i] */
        N = a[i] < N * a[i] ? a[i] : N * a[i];
        if (P > val)
            val = P;
    }
    printf("%d\n", val);
    return 0;
}

Time Complexity O(N)
Space Complexity O(1)
Run Here https://ideone.com/VaKgk

Design An Algorithm to Exhibit Did You Mean Feature of Google Search Engine

When people search at Google, they often type two words without a space. For example, instead of typing "binary tree", someone may type "binarytree". Given such an input, what is a good way of finding whether it is a combination of two valid words? Directly, you can try adding a space between every pair of adjacent characters and check both halves against a dictionary. If you are using a data structure such as a trie into which the words have been inserted, you would maintain a boolean variable marking the end of a word (eow).


An example: let's say you have a phrase without any spaces, e.g. "thisisawesome". Given a dictionary, how would you add spaces to this string?








import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordSplit {

    // Returns s split into dictionary words, or null if no split exists.
    // A greedy left-to-right scan fails (it takes "i" before "is" and gets
    // stuck), so we backtrack. A HashSet stands in for the trie here; any
    // containsKey-style dictionary lookup works equally well.
    static String split(String s, int start, Set<String> dict) {
        if (start == s.length()) return "";
        for (int end = start + 1; end <= s.length(); end++) {
            String word = s.substring(start, end);
            if (dict.contains(word)) {
                String rest = split(s, end, dict);
                if (rest != null)   // this prefix leads to a full split
                    return rest.isEmpty() ? word : word + " " + rest;
            }
        }
        return null;                // backtrack: no valid word starts here
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
                "i", "this", "saw", "is", "a", "some", "awesome"));
        System.out.println(split("thisisawesome", 0, dict));
        // prints: this is awesome
    }
}

Another good working implementation you can refer to is Avinash's solution.


Time Complexity: exponential in the worst case for plain backtracking, where K is the length of the input string; memoizing the start indices that fail brings it down to O(K^2) substring checks.

Given a set of letters and a length N, produce all possible outputs (not permutations).

For example, given the letters (p, o) and length 3, produce the following output (in any order you want, not just this example order):

ppp ppo poo pop opp opo oop ooo

another example would be given (a, b) and length 2

answer: ab aa bb ba
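One way to produce them is a simple recursion over positions; this sketch (function names are mine) emits all |letters|^n strings, repetition allowed:

```cpp
#include <string>
#include <vector>

// Recursively fix one position at a time, trying every letter in each slot.
void enumerate(const std::string& letters, int n,
               std::string& cur, std::vector<std::string>& out) {
    if ((int)cur.size() == n) { out.push_back(cur); return; }
    for (char c : letters) {
        cur.push_back(c);
        enumerate(letters, n, cur, out);
        cur.pop_back();   // undo and try the next letter
    }
}

std::vector<std::string> all_strings(const std::string& letters, int n) {
    std::vector<std::string> out;
    std::string cur;
    enumerate(letters, n, cur, out);
    return out;   // letters "po", n = 3 yields the 8 strings listed above
}
```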

Tuesday, August 2, 2011

Division Operation without using the division operator!!! FAQ in Core Dev Company Telephonic Interviews

We can use bit logic to reduce the time complexity to O(log n), where n is the quotient.

The algorithm is as follows. As we know:

1. Left shifting an unsigned number by 1 multiplies that number by 2.
2. Right shifting an unsigned number by 1 divides that number by 2.

Therefore the procedure for the division algorithm, given a dividend and a divisor, is this:
the core logic is to left shift the divisor (multiply it by 2) until it exceeds the dividend, then repeat the same routine on the difference between the dividend and the shifted-back divisor, until the remaining dividend is less than the divisor or the difference is zero.

Let's see one example: dividend = 23, divisor = 3.

Left shifting turns 3 into 6, 12, 24, ... Since 12 <= 23 < 24 we stop at 12 (two shifts), giving a partial quotient of 4 and leaving 23 - 12 = 11. Repeating with 11: 3, 6, 12, ... we stop at 6 (6 <= 11 < 12), adding 2, so the quotient is 4 + 2 = 6 and 11 - 6 = 5 remains. Finally 3 <= 5 < 6 adds 1, leaving remainder 5 - 3 = 2. So the quotient is 7 and the remainder is 2.

Time Complexity O(log n), i.e., the number of bits in the quotient

#include <iostream>
using namespace std;

int rem;

int division(int p, int q) {
    /* if divisor and dividend are equal then quotient = 1 */
    if (p == q) {
        rem = 0;
        return 1;
    }
    /* if dividend is smaller than divisor then remainder = dividend */
    if (p < q) {
        rem = p;
        return 0;
    }
    /* shift the divisor left until it exceeds the dividend */
    int d = q, quotient = 1;
    while (d <= p) {
        d <<= 1;
        quotient <<= 1;
    }
    /* shift right once so the divisor is again smaller than the dividend */
    d >>= 1;
    quotient >>= 1;
    /* again call division recursively on what is left */
    quotient += division(p - d, q);
    return quotient;
}

int main() {
    int dividend, divisor;
    cout << "\nEnter dividend:"; cin >> dividend;
    cout << "\nEnter divisor:";  cin >> divisor;
    int q = division(dividend, divisor);
    cout << "\nQuotient:" << q << "\nRemainder:" << rem << endl;
}

Time Complexity O(log(Quotient))
Space Complexity O(log(Quotient)) for the recursion stack (the iterative version below is O(1))

Recall that right shifting one place divides a number by 2 and left shifting one place multiplies it by 2 (shifting n places multiplies or divides by 2^n). While the divisor is shifted left, the quotient is shifted left in step; after each round the function is called recursively on the remaining part of the dividend.



Similar  Iterative program for doing the same is


#include <stdio.h>

int divide(int a, int b)
{
    if (!b) return -1;   /* division by zero */
    if (a < b) {
        printf("%d / %d = %d , remainder %d\n", a, b, 0, a);
        return 0;
    }
    int q = 1, d = b, t = b << 1;
    while (t <= a)       /* find the largest b * 2^k <= a */
    {
        d = t;
        t <<= 1;
        q <<= 1;
    }
    while (d + b <= a)   /* add b until the next step would overshoot */
    {
        d += b;
        ++q;
    }
    printf("%d / %d = %d , remainder %d\n", a, b, q, a - d);
    return 0;
}


A good discussion can also be found here: Algogeek

Design an algorithm such that the maximum number of jobs can be performed in the given time interval.

You are given N jobs with their start and end times mentioned. These jobs may have overlapping timings. You have to suggest an algorithm such that the maximum number of jobs can be performed in the given time interval. (The solution below treats the more general weighted version; the unweighted problem is the special case where every job has weight 1.)

Given a set of n jobs j1, j2, ..., jn, find a set of compatible jobs with the maximum weight.

• Each job ji starts at s(ji) and finishes at f(ji).
• Each job ji has weight (value) v(ji).
• Two jobs are compatible if their intervals do not overlap.
In other words, we are trying to find a set of jobs ji1, ji2, ..., jik such that the following is maximized:

sum over x = 1 to k of v(jix)

The dynamic programming paradigm gives us two options when examining
each job. The optimal solution either contains the job, or it doesn't. As
an example, if the optimal solution contains j1, then we pick j1 and find the
optimal solution for jobs compatible with j1. If the optimal solution does
not contain j1, then we find the optimal solution for j2, j3, ..., jn. (This assumes that the jobs are arranged in order of their finish time, so that f(j1) ≤ f(j2) ≤ ... ≤ f(jn).)
Once we select a job to be part of the optimal solution, we exclude a
host of other jobs that are incompatible with the selection. To assist in our
definition of an optimal solution, we define j(ip) to be the largest-index job,
ip < i, that is compatible with ji. In other words, j(ip) is the latest-finishing
job with f(jip) ≤ s(ji), i.e., the job for which s(ji) − f(jip) is minimized while
remaining nonnegative.

We can then let OPT(i) be the set containing the optimal solution for
jobs j1, j2, ..., ji. It is defined as follows for each ji:

• Case 1: OPT selects ji
  – We cannot select jobs j(ip+1), ..., j(i−1), so we exclude them from
    consideration.
  – We recurse and find the optimal solution for jobs j1, j2, ..., j(ip) by
    setting OPT(i) = v(ji) + OPT(ip).

• Case 2: OPT does not select ji
  – We recurse and find the optimal solution for jobs j1, j2, ..., j(i−1)
    by setting OPT(i) = OPT(i − 1).

Since we don’t know which case will return the optimal solution, we must
try both. Once we have calculated both possibilities, we pick the case which
returns the larger set. Now we can create a more compact definition, together
with a simple implementation:

OPT(i) = max{ v(ji) + OPT(ip), OPT(i − 1) }   if i ≥ 1
       = 0                                    otherwise

The recursive algorithm looks like

COMPUTE_OPT(i):
    if i == 0:
        return 0
    return max {
        COMPUTE_OPT(i - 1),
        v[j(i)] + COMPUTE_OPT(p(i))
    }

The problem with the algorithm as it stands is that a lot of computation is
repeated. This is visualized by looking at the recursive tree structure:

-COMPUTE_OPT(5)
    -COMPUTE_OPT(3)
        -COMPUTE_OPT(1)
        -COMPUTE_OPT(2)
            -COMPUTE_OPT(0)
            -COMPUTE_OPT(1)
    -COMPUTE_OPT(4)
        -COMPUTE_OPT(2)
            -COMPUTE_OPT(0)
            -COMPUTE_OPT(1)
        -COMPUTE_OPT(3)
            -COMPUTE_OPT(1)
            -COMPUTE_OPT(2)
                -COMPUTE_OPT(0)
                -COMPUTE_OPT(1)

It is easy to see that for even a small set of jobs, the amount of repeated
computation is huge. To avoid this, we introduce memoization, which stores
the values for OPT(i) in a global array as they are calculated. If the value is
needed later, it is simply retrieved from the array, pruning the recursive tree
such that each descent only happens once. The following revised algorithm
implements memoization:
M_COMPUTE_OPT(i):
    if M[i] != null:
        return M[i]
    if i == 0:
        M[0] = 0
    else:
        M[i] = max {
            M_COMPUTE_OPT(i - 1),
            v[j(i)] + M_COMPUTE_OPT(p(i))
        }
    return M[i]

This memoized version saves valuable computational resources, but each
recursive call adds overhead. The algorithm runs in O(n) time, but its recursive
nature makes it bulky.

We can eliminate the overhead of recursion by constructing an iterative (nonrecursive)
algorithm that builds a bottom-up list of the values.
M[0] = 0
for i = 1 to n:
    M[i] = max {
        M[i - 1],
        v[j(i)] + M[p(i)]
    }
This is obviously linear, and without the added computational weight of
a recursive algorithm. To reconstruct the solution, we simply run backwards
through the array (take M[n] and compare to M[n-1], etc.).
PRINT_SOLUTION(i):
    if i == 0:
        return
    if v[j(i)] + M[p(i)] == M[i]:
        print j(i)
        PRINT_SOLUTION(p(i))
    else:
        PRINT_SOLUTION(i - 1)
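Putting the pieces together, here is a C++ sketch of the bottom-up algorithm (struct and function names are illustrative). It sorts the jobs by finish time and computes p(i) by binary search itself, so this version runs in O(n log n) overall:

```cpp
#include <algorithm>
#include <vector>

struct Job { int start, finish, value; };

// Bottom-up weighted interval scheduling: M[i] is the best total value
// using only the first i jobs (1-indexed, sorted by finish time).
int max_weight_schedule(std::vector<Job> jobs) {
    std::sort(jobs.begin(), jobs.end(),
              [](const Job& a, const Job& b) { return a.finish < b.finish; });
    int n = (int)jobs.size();
    std::vector<int> p(n + 1, 0), M(n + 1, 0);
    for (int i = 1; i <= n; i++) {
        // p[i]: largest index j < i with jobs[j-1].finish <= jobs[i-1].start
        int lo = 0, hi = i - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (jobs[mid - 1].finish <= jobs[i - 1].start) lo = mid;
            else hi = mid - 1;
        }
        p[i] = lo;
        // Take job i (value + best compatible prefix) or skip it.
        M[i] = std::max(M[i - 1], jobs[i - 1].value + M[p[i]]);
    }
    return M[n];
}
```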

Time Complexity O(N), assuming the jobs are already sorted by finish time and p(i) is precomputed
Space Complexity O(N)


Resources
http://en.wikipedia.org/wiki/Interval_scheduling
http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec3.pdf
http://www.cs.cornell.edu/courses/cs482/2007sp/dynamic.pdf
http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec10.pdf
http://www.cs.sfu.ca/~ssa121/personal/spring08/705/dp.pdf
http://www.cs.uiowa.edu/~hzhang/c31/6-dynamic.pdf
classes.soe.ucsc.edu/.../06dynamic-programming-weighted-interv-sched.ppt
http://www.cs.princeton.edu/courses/archive/spr05/cos423/lectures/06dynamic-programming.pdf
http://www.cse.psu.edu/~asmith/courses/cse565/F08/www/lec-notes/CSE565-F08-Lec-17.pdf

Fill the rectangle with squares

Given a rectangle with known width and height, design an algorithm to fill the rectangle using n squares (n is a given integer) while minimizing the wasted area. The side length of a square does not have to be an integer.
E.g., given width = 3, height = 2, n = 5, one solution is to fill the rectangle with five 1x1 squares, wasting area 1. Another solution could use five 0.9x0.9 squares, but the wasted area would be larger than in the first solution.
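One possible approach, under the assumption that the squares are laid out in a uniform grid of r rows and ceil(n/r) columns: try every r from 1 to n and keep the layout that permits the largest square side, which minimizes the wasted area w·h − n·s². A sketch:

```cpp
#include <algorithm>

// Try every grid of 'rows' rows and ceil(n/rows) columns; the largest
// side that fits any such grid gives the least wasted area.
double best_square_side(double w, double h, int n) {
    double best = 0.0;
    for (int rows = 1; rows <= n; rows++) {
        int cols = (n + rows - 1) / rows;            // ceil(n / rows)
        double side = std::min(w / cols, h / rows);  // largest side that fits
        best = std::max(best, side);
    }
    return best;   // wasted area = w*h - n*best*best
}
```

For width = 3, height = 2, n = 5 this yields side 1 (a 2x3 grid with one cell empty), matching the first solution above.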

Given two arrays of numbers, find whether the two arrays contain the same set of integers. Can you suggest an algorithm that runs faster than O(N log N) without using extra space?

Monday, August 1, 2011

Design a data structure that offers the following operations in O(1) time (i.e., constant time): insert, remove, contains, get random element (returning only elements currently present in the data structure).

We have to think of all the data structures that can do some of the operations in the desired complexity; then we can come up with a combination of data structures that gives us what the question is asking for :)

So it's clear that O(1) lookup implies a hashed data structure.

By comparison:

* O(1) insert/delete with O(N) lookup implies a linked list.
* O(1) insert, O(N) delete, and O(N) lookup implies an array-backed list
* O(logN) insert/delete/lookup implies a tree(BST) or heap.

We basically need to find some combination of the data structures above.

First I thought about a doubly linked list instead of an array, but soon realized that we won't get the desired complexity. Even though some of the operations can be achieved in O(1), how will you make sure all of them are O(1)?

So let's consider a data structure composed of a hashtable H and an array A. The hashtable keys are the elements in the data structure, and the values are their positions in the array.

Here is a possible way to achieve the desired time complexity:

1. insert(value): append the value to the array and let i be its index in A. Set H[value] = i.

2. remove(value): we replace the cell that contains value in A with the last element in A. Let d be the last element in the array A, at index m. Let i be H[value], the index in the array of the value to be removed. Set A[i] = d, H[d] = i, decrease the size of the array by one, and remove value from H.

3. contains(value): return H.contains(value) no doubt in this :)

4. getRandomElement(): let r = random(current size of A); return A[r].
E.g., index = random() % (current size of A) generates an in-range index, and we return the element at that index. Because A holds exactly the elements currently stored, this returns a random element from only the present elements of the data structure.

Time Complexity Will be O(1)
Space Complexity O(N) , N is size of hashtable
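A sketch of this combination in C++, with std::unordered_map standing in for H and std::vector for A (the class and method names are mine):

```cpp
#include <cstdlib>
#include <unordered_map>
#include <vector>

class RandomizedSet {
    std::unordered_map<int, size_t> pos;   // value -> index in vals (H)
    std::vector<int> vals;                 // the elements themselves (A)
public:
    bool insert(int v) {
        if (pos.count(v)) return false;
        pos[v] = vals.size();
        vals.push_back(v);
        return true;
    }
    bool remove(int v) {
        auto it = pos.find(v);
        if (it == pos.end()) return false;
        // Overwrite v's slot with the last element, then shrink the array.
        size_t i = it->second;
        vals[i] = vals.back();
        pos[vals[i]] = i;
        vals.pop_back();
        pos.erase(it);
        return true;
    }
    bool contains(int v) const { return pos.count(v) != 0; }
    int getRandom() const { return vals[std::rand() % vals.size()]; }
};
```

Every operation touches only a constant number of hashtable and array slots, so each runs in O(1) expected time.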
