Thursday, October 20, 2011

Given an integer N, there are 2^N binary codewords of length N. All of them are taken and sorted to create a sequence: Sort by the number of 1-bits in the codeword bit-string. If there is a tie, break tie so that the smaller number comes first in the output sequence

Example  and Testcases :You would be given 2 integers N (<=10) and K (1 <= K <= 2^N)
Output the codeword with index K into this sequence
The input is from stdin and output to stdout

Sample Testcases:
Input #00:
3 5
Output #00:
011
Explanation:
For N = 3, the sequence consists of {000,001,010,100,011,101,110,

111}. The 5th indexed number would be 011

Input #01:
7 127
Output #01:
1111110

An array of integers is given and you have to find the largest possible integer by concatenating all elements:

   Example:

array:  87  36  52
answer:  875236

array: 87 9 52
answer: 98752 

Provide an efficient algorithm to solve the question "Can I buy N items?

 A shop sells an item in packets of 6, 9, and 17. A customer can buy any number of packets, but a packet cannot be broken up. Provide an efficient algorithm to solve the question "Can I buy N items?". For example, is it possible to buy 29 items? What about 19? Your algorithm simply needs to return true or false.

Design Data Structure For Students-Courses Realtionship

In a university, students can enroll in different courses. A student may enroll for more than one course. Both students and courses can be identified by IDs given to them. Design a data structure to store students, courses, and the student-course relationships. You can use arrays, lists, stacks, trees, graphs, etc. or come up with your own data structures. Give the running times, in Big O notation, for the following operations for your data structure and justify the answers: a) Return all students in a list. b) Return all courses in a list. c) Return all courses in a list for a given student. d) Return all students in a list for a given course.



Thursday, October 13, 2011

Determine a user’s influence on the system by determining how many users he is responsible for


You Know Invitation Based System. on Google+, Gmail,Yahoo,Facebook etc...users can add their friends by emailing a . We want to determine a user’s influence on the system by determining how many users he is responsible for. A user’s influence is calculated by giving him 1 point for every user he’s added in addition to the sum of the influence scores of each user that he’s added.
Example: User 0 adds User 1 and User 2. User 1 adds User 3.
User 0’s influence score is 3. (He added two users and one of them added added a third user.)
User 1's is 1.
User 2’s is 0.
User 3’s is 0.

P.S.:One of The Question Asked By Google M.V.  From Me. :)
The above scenario is represented by the following input file. Line i is user ID i and position j within the line is marked with an X if user ID iadded user ID j. Both row and column indicies are 0-based:
 OXXO
 OOOX
 OOOO
 OOOO
      


-->

Sunday, October 9, 2011

How Google Won User's Heart :: Google Auto-Suggestion/Auto-Completion Algorithm Exposed

How Auto Suggestion Works e.g. How Google Won User's Heart
You've seen search engines suggest queries when you begin typing the first few letters of your search string. This is being done by Duck Duck Go as well as Google (to name a few). This is typically done by maintaining a list of past queries and/or important strings that the search engine thinks are worthy of being suggested to a user that is trying to find something similar. These suggestions are effective only if the search engine spits them out very fast since these should show up on the screen before the user has finished typing what he/she wanted to type. Hence the speed with which these suggestions are made is very critical to the usefulness of this feature.

Let us consider a situation (and a possible way of approaching this problem) in which when a user enters the first few letters of a search query, he/she is presented with some suggestions that have as their prefix, the string that the user has typed. Furthermore, these suggestions should be ordered by some score that is associated with each such suggestion.

Approach-1:

Our first attempt at solving this would probably involve keeping the initial list of suggestions sorted in lexicographic order so that a simple binary search can give us the 2 ends of the list of strings that serve as candidate suggestions. These are all the strings that have the user's search query as a prefix. We now need to sort all these candidates by their associated score in non-increasing order and return the first 6 (say). We will always return a very small subset (say 6) of the candidates because it is not feasible to show all candidates since the user's screen is of bounded size and we don't want to overload the user with too many options. The user will get better suggestions as he/she types in more letters into the query input box.

We immediately notice that if the candidate list (for small query prefixes say of length 3) is large (a few thousand), then we will be spending a lot of time sorting these candidates by their associated score. The cost of sorting is O(n log n) since the candidate list may be as large as the original list in the worst case. Hence, this is the total cost of the approch. Apache's solr uses this approach. Even if we keep the scores bound within a certain range and use bucket sort, the cost is still going to be O(n). We should definitely try to do better than this.


Approach-2:

One way of speeding things up is to use a Trie and store (pointers or references to) the top 6 suggestions at or below that node in the node itself. This idea is mentioned here. This results in O(m) query time, where m is the length of the prefix (or user's search query).

However, this results in too much wasted space because:
  1. Tries are wasteful of space and
  2. You need to store (pointers or references to) 6 suggestions at each node which results in a lot of redundancy of data

We can mitigate (1) by using Radix(or Patricia) Trees instead of Tries.


Approach-3:

There are also other approaches to auto-completion such as prefix expansion that are using by systems such asredis. However, these approaches use up memory proportional to the square of the size of each suggestion (string). The easy way to get around this is to store all the suggestions as a linear string (buffer) and represent each suggestion as an (index,offset) pair into that buffer. For example, suppose you have the strings:
  1. hello world
  2. hell breaks lose
  3. whiteboard
  4. blackboard

Then your buffer would look like this:
hello worldhell breaks losewhiteboardblackboard
And the 4 strings are actually represented as:
(0,11), (11,16), (27,10), (37,10)

Similarly, each prefix of the suggestion "whiteboard" is:
  1. w => (27,1)
  2. wh => (27,2)
  3. whi => (27,3)
  4. whit => (27,4)
  5. white => (27,5)
  6. whiteb => (27,6)
  7. and so on...

Approach-4:

We can do better by using Segment (or Interval) Trees. The idea is to keep the suggestions sorted (as in approach-1), but have an additional data structure called a Segment Tree which allows us to perform range searches very quickly. You can query a range of elements in Segment Tree very efficiently. Typically queries such as min, max, sum, etc... on ranges in a segment tree can be answered in O(log n) where n is the number of leaf nodes in the Segment Tree. So, once we have the 2 ends of the candidate list, we perform a range search to get the element with the highest score in the list of candidates. Once we get this node, we insert this range (with the maximum score in that range as the key) into the priority queue. The top element in the queue is popped and split at the location where the element with the highest score occurs and the scores for the 2 resulting ranges are computed and pushed back into the priority queue. This continues till we have popped 6 elements from the priority queue. It is easy to see that we will have never considered more than 2k ranges (here k = 6).

Hence, the complexity of the whole process is the sum of:
  1. The complexity for the range calculation: O(log n) (omitting prefix match cost) and
  2. The complexity for a range search on a Segment Tree performed 2k times: O(2k log n) (since the candidate list can be at most 'n' in length)

Giving a total complexity of:
O(log n) + O(2k log n)
which is: O(k log n)

Some Useful Links
http://en.wikipedia.org/wiki/Autocomplete
http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/
http://stevedaskam.wordpress.com/2009/05/17/autocomplete-data-structures/
http://sujitpal.blogspot.com/2007/02/three-autocomplete-implementations.html
http://stackoverflow.com/questions/73095/how-to-implement-a-simple-auto-complete-functionality
http://code.google.com/p/autocomplete-server/
http://suggesttree.sourceforge.net/
http://www.quora.com/What-is-the-best-open-source-solution-for-implementing-fast-auto-complete
http://www.phpfreaks.com/forums/index.php?topic=240881.0
http://dhruvbird.blogspot.com/2010/09/very-fast-approach-to-search.html