Thursday, August 4, 2011
Design a Data Structure of size O(n) (e.g. of Given Tree Size n) so that you can answer any such query in O(log n) time.
Given a rooted tree of size n . You receive a series of online queries : "Give nearest common ancestor of u,v " . Your objective is to preprocess the tree in O(n) time to get a data structure of size O(n) so that you can answer any such query in O(log n) time.
Labels:Data
Amazon Interview
,
Google Interview
Give a data-structure which will guarantee O(log n) time per operation.
Complete binary tree as an efficient data-structure
You are given an array of size n(n being a power of two). All the entries of the array are initialized to zero. You have to perform a sequence of the following online operations :
* (i) Add(i,x) which adds x to the entry A[i].
* (ii) Report sum(i,j) = sum of the entries in the array from indices i to j for any 0 < i < j <= n.
It can be seen easily that we can perform the first operation in O(1) time whereas the second operation may cost O(n) in worst case. Your objective is to perform these operations efficiently. Give a data-structure which will guarantee O(log n) time per operation. (The title of the problem is a hint).
You are given an array of size n(n being a power of two). All the entries of the array are initialized to zero. You have to perform a sequence of the following online operations :
* (i) Add(i,x) which adds x to the entry A[i].
* (ii) Report sum(i,j) = sum of the entries in the array from indices i to j for any 0 < i < j <= n.
It can be seen easily that we can perform the first operation in O(1) time whereas the second operation may cost O(n) in worst case. Your objective is to perform these operations efficiently. Give a data-structure which will guarantee O(log n) time per operation. (The title of the problem is a hint).
Labels:Data
Amazon Interview
,
Google Interview
Merge the two Binary Search Trees in time O(log m + log n)
You are given two height balanced binary search trees T and T', storing m and n elements respectively. Every element of tree T is smaller than every element of tree T'. Every node u also stores height of the subtree rooted at it. Using this extra information how can you merge the two trees in time O(log m + log n) (preserving both the height balance and the order)?
Labels:Data
Amazon Interview
Determine whether the kth largest element of the heap is greater than x or not.
Consider a binary heap containing n numbers (the root stores the greatest number). You are given a positive integer k < n and a number x . You have to determine whether the kth largest element of the heap is greater than x or not. Your algorithm must take O(k) time. You may use O(k) extra storage.
Labels:Data
Amazon Interview
Searching for a friend
You are standing at a crossing from where there emerge four roads extending to infinity. Your friend is somewhere on one of the four roads. You do not know on which road he is and how far he is from you. You have to walk to your friend and the total distance traveled by you must be at most a constant times the actual distance of your friend from you. In terminology of algorithms, you should traverse O(d) distance, where d is the distance of your friend from you.
Labels:Data
Google Interview
Searching For Celebrity
Celebrity is a person whom everybody knows but he knows nobody. You have gone to a party. There are total n persons in the party. Your job is to find the celebrity in the party. You can ask questions of the form Does Mr. X know Mr. Y ?. You will get a binary answer for each such question asked. Find the celebrity by asking only O(n) questions.
Labels:Data
Amazon Interview
,
Google Interview
Wednesday, August 3, 2011
Given a file containing 4,300,000,000 integers, how can you find one that appears at least twice
A Theoritical Approach From Programming Pearls
Binary search find an element that occurs at least twice by recursively searching the subinterval that contains more than half of the integers. My original solution did not guarantee that the number of integers is halved in each iteration, so the worst case run time of its log2 n passes was proportional to n·log n. Jim Saxe reduced that to linear time by observing that the search can avoid carrying too many duplicates.
When his search knows that a duplicate must be in a current range of m integers, it will only store m+1 integers on its current work tape;
If more integers would have gone on the tape, his program discards them. Although his method frequently ignores input variables, its strategy is conservative enough to ensure that it finds at least one duplicate.
The algorithm that Bentley is talking about works by repeatedly halving
the candidate range in which the duplicate element must lie. Initially
this range is 0..2^32-1. On each pass it throws away the half of the
range that contains fewer data values and it also throws away the data
lying in that half. Eventually the range decreases to a single value,
which must be a duplicate because the remaining data has at least two
elements. The problem that Bentley notes is that the data may still
have 4 billion elements at this stage! The final improvement he
mentions is that you can throw away data _inside_ the candidate range
as long as you keep enough data around to ensure that at least one
duplicate occurs in it. This is equal to the size of the current
candidate range plus 1!
Another Similer Approach I Found
Create a bit array of length 2^32 bits (initialize to zero), that would be about 512MB and will fit into RAM on any modern machine.
Start reading the file, int by int, check bit with the same index as the value of the int, if the bit is set you have found a duplicate, if it is zero, set to one and proceed with the next int from the file.
The trick is to find a suitable data structure and algorithm. In this case everything fits into RAM with a suitable data structure and a simple and efficient algorithm can be used.
If the numbers are int64 you need to find a suitable sorting strategy or make multiple passes, depending on how much additional storage you have available.
The Pigeonhole Principle -- If you have N pigeons in M pigeonholes, and N>M, there are at least 2 pigeons in a hole. The set of 32-bit integers are our 2^32 pigeonholes, the 4.3 billion numbers in our file are the pigeons. Since 4.3x10^9 > 2^32, we know there are duplicates.
You can apply this principle to test if a duplicate we're looking for is in a subset of the numbers at the cost of reading the whole file, without loading more than a little at a time into RAM-- just count the number of times you see a number in your test range, and compare to the total number of integers in that range. For example, to check for a duplicate between 1,000,000 and 2,000,000 inclusive:
int pigeons = 0;
int pigeonholes = 2000000 - 1000000 + 1; // include both fenceposts
for (each number N in file) {
if ( N >= 1000000 && N <= 2000000 ) { pigeons++ } } if (pigeons > pigeonholes) {
// one of the duplicates is between 1,000,000 and 2,000,000
// try again with a narrower range
}
Picking how big of range(s) to check vs. how many times you want to read 16GB of data is up to you :)
As far as a general algorithm category goes, this is a combinatorics (math about counting) problem.
Binary search find an element that occurs at least twice by recursively searching the subinterval that contains more than half of the integers. My original solution did not guarantee that the number of integers is halved in each iteration, so the worst case run time of its log2 n passes was proportional to n·log n. Jim Saxe reduced that to linear time by observing that the search can avoid carrying too many duplicates.
When his search knows that a duplicate must be in a current range of m integers, it will only store m+1 integers on its current work tape;
If more integers would have gone on the tape, his program discards them. Although his method frequently ignores input variables, its strategy is conservative enough to ensure that it finds at least one duplicate.
The algorithm that Bentley is talking about works by repeatedly halving
the candidate range in which the duplicate element must lie. Initially
this range is 0..2^32-1. On each pass it throws away the half of the
range that contains fewer data values and it also throws away the data
lying in that half. Eventually the range decreases to a single value,
which must be a duplicate because the remaining data has at least two
elements. The problem that Bentley notes is that the data may still
have 4 billion elements at this stage! The final improvement he
mentions is that you can throw away data _inside_ the candidate range
as long as you keep enough data around to ensure that at least one
duplicate occurs in it. This is equal to the size of the current
candidate range plus 1!
Another Similer Approach I Found
Create a bit array of length 2^32 bits (initialize to zero), that would be about 512MB and will fit into RAM on any modern machine.
Start reading the file, int by int, check bit with the same index as the value of the int, if the bit is set you have found a duplicate, if it is zero, set to one and proceed with the next int from the file.
The trick is to find a suitable data structure and algorithm. In this case everything fits into RAM with a suitable data structure and a simple and efficient algorithm can be used.
If the numbers are int64 you need to find a suitable sorting strategy or make multiple passes, depending on how much additional storage you have available.
The Pigeonhole Principle -- If you have N pigeons in M pigeonholes, and N>M, there are at least 2 pigeons in a hole. The set of 32-bit integers are our 2^32 pigeonholes, the 4.3 billion numbers in our file are the pigeons. Since 4.3x10^9 > 2^32, we know there are duplicates.
You can apply this principle to test if a duplicate we're looking for is in a subset of the numbers at the cost of reading the whole file, without loading more than a little at a time into RAM-- just count the number of times you see a number in your test range, and compare to the total number of integers in that range. For example, to check for a duplicate between 1,000,000 and 2,000,000 inclusive:
int pigeons = 0;
int pigeonholes = 2000000 - 1000000 + 1; // include both fenceposts
for (each number N in file) {
if ( N >= 1000000 && N <= 2000000 ) { pigeons++ } } if (pigeons > pigeonholes) {
// one of the duplicates is between 1,000,000 and 2,000,000
// try again with a narrower range
}
Picking how big of range(s) to check vs. how many times you want to read 16GB of data is up to you :)
As far as a general algorithm category goes, this is a combinatorics (math about counting) problem.
Labels:Data
Google Interview
Given an array[] of integers (+ive , -ive, 0) find the subarray which gives the largest product.
Data Structure Used: Array
Algorithm & Expliantion
Lets us take an array = { 2,-25,4,5,-3,-5}
We take 3 variables P , N , Val
P=1 , N=0 , Val=0
first value is 2 , A[0] which is +ve . So we multiply both P & N with A[0] .
P=2 , N=0
now V[1] = -25 -ve .
We multiply P with -V[1] & N with -V[1] .
P=50
N=0
As V[1] is -ve we swap P & N .
P=0 , N=50
if V[i] == 0
We initialise P=1 , N=0
if( P < 1 ) /* analogous to kadane's algo's if( sumSoFar < 0 ) */ { P=1; N=0; } at every step val = max( val , P ) We proceed in the same fashion till the end . What we are trying to do is to maintain max Positive product so far & max -ve product so far . Hence when we encounter a -ve value we multiply both with - V[i] & then swap as -ve * -ve = +ve -ve * +ve = -ve Working Code #include
int swap(int *a,int *b)
{
int t;
t=*a;
*a=*b;
*b=t;
}
int main()
{
int a[]={6,-1,2,-33,4,15,-7,28,-9,-10};
int maxp=1,maxn=0;
int i=0;
for(i=0;i<10;i++) { if(a[i]>0)
{
maxp*=a[i];
maxn*=a[i];
}
else
{
maxp*=-a[i];
maxn*=-a[i];
swap(&maxp,&maxn);
}
}
printf("%d",maxp);
}
Time Complexity O(N)
Space Conmplexity O(1)
Run Here https://ideone.com/VaKgk
Algorithm & Expliantion
Lets us take an array = { 2,-25,4,5,-3,-5}
We take 3 variables P , N , Val
P=1 , N=0 , Val=0
first value is 2 , A[0] which is +ve . So we multiply both P & N with A[0] .
P=2 , N=0
now V[1] = -25 -ve .
We multiply P with -V[1] & N with -V[1] .
P=50
N=0
As V[1] is -ve we swap P & N .
P=0 , N=50
if V[i] == 0
We initialise P=1 , N=0
if( P < 1 ) /* analogous to kadane's algo's if( sumSoFar < 0 ) */ { P=1; N=0; } at every step val = max( val , P ) We proceed in the same fashion till the end . What we are trying to do is to maintain max Positive product so far & max -ve product so far . Hence when we encounter a -ve value we multiply both with - V[i] & then swap as -ve * -ve = +ve -ve * +ve = -ve Working Code #include
int swap(int *a,int *b)
{
int t;
t=*a;
*a=*b;
*b=t;
}
int main()
{
int a[]={6,-1,2,-33,4,15,-7,28,-9,-10};
int maxp=1,maxn=0;
int i=0;
for(i=0;i<10;i++) { if(a[i]>0)
{
maxp*=a[i];
maxn*=a[i];
}
else
{
maxp*=-a[i];
maxn*=-a[i];
swap(&maxp,&maxn);
}
}
printf("%d",maxp);
}
Time Complexity O(N)
Space Conmplexity O(1)
Run Here https://ideone.com/VaKgk
Labels:Data
Amazon Interview
,
Google Interview
Subscribe to:
Posts
(
Atom
)