Monday, February 28, 2011

Given a function which generates a random integer in the range 1 to 7, write a function which generates a random integer in the range 1 to 10 uniformy

This has been asked in Google interview and Amazon interview, and appears quite often as one of the few probabilistic analysis questions. You should be familiar with the concept of expected value, as it could be extremely helpful in probabilistic analysis.

Basic Idea

In C++, rand() generates random integral number between 0 and 32K. Thus to generate a random number from 1 to 10, we write rand() % 10 + 1. As such, to generate a random number from integer a to integer b, we write rand() % (b - a + 1) + a.
The interviewer told you that you had a random generator from 0 to 1. It means floating-point number generator.
How to get the answer mathematically:
  1. Shift the question to a simple form such that the lower bound is 0.
  2. Scale the range by multiplication
  3. Re-shift to the required range.
For example: to generate R such that
a <= R <= b.  Apply rule 1, we get a-a <= R - a <= b-a 
                       0 <= R - a <= b - a.  
Think R - a as R1. How to generate R1 such that R1 has range from 0 to (b-a)?
R1 = rand(0, 1) * (b-a)   // by apply rule 2.
Now substitute R1 by R - a
R - a = rand(0,1) * (b-a)    ==>   R = a + rand(0,1) * (b-a)
==== 2nd question - without explanation ====
We have 1 <= R1 <= 5
==>   0 <= R1 - 1             <= 4
==>   0 <= (R1 - 1)/4         <= 1
==>   0 <= 6 * (R1 - 1)/4     <= 6
==>   1 <= 1 + 6 * (R1 - 1)/4 <= 7
Thus, Rand(1,7) = 1 + 6 * (rand(1,5) - 1) / 4



Hint:
Assume you could generate a random integer in the range 1 to 49. How would you generate a random integer in the range of 1 to 10? What would you do if the generated number is in the desired range? What if it’s not?

Solution:
This solution is based upon Rejection Sampling. The main idea is when you generate a number in the desired range, output that number immediately. If the number is out of the desired range, reject it and re-sample again. As each number in the desired range has the same probability of being chosen, a uniform distribution is produced.

Obviously, we have to run rand7() function at least twice, as there are not enough numbers in the range of 1 to 10. By running rand7() twice, we can get integers from 1 to 49 uniformly. Why?

1 2 3 4 5 6 7
1 1 2 3 4 5 6 7
2 8 9 10 1 2 3 4
3 5 6 7 8 9 10 1
4 2 3 4 5 6 7 8
5 9 10 1 2 3 4 5
6 6 7 8 9 10 * *
7 * * * * * * *

A table is used to illustrate the concept of rejection sampling. Calling rand7() twice will get us row and column index that corresponds to a unique position in the table above. Imagine that you are choosing a number randomly from the table above. If you hit a number, you return that number immediately. If you hit a *, you repeat the process again until you hit a number.

Since 49 is not a multiple of tens, we have to use rejection sampling. Our desired range is integers from 1 to 40, which we can return the answer immediately. If not (the integer falls between 41 to 49), we reject it and repeat the whole process again.


int rand10() {
int row, col, idx;
do {
row = rand7();
col = rand7();
idx = col + (row-1)*7;
} while (idx > 40);
return 1 + (idx-1)%10;
}

Now to calculate the expected value for the number of calls to rand7() function.

E(# calls to rand7) = 2 * (40/49) +
4 * (9/49) * (40/49) +
6 * (9/49)2 * (40/49) +
...


= ∑ 2k * (9/49)k-1 * (40/49)
k=1

= (80/49) / (1 - 9/49)2
= 2.45



No comments :