Tuesday, June 21, 2011

Which is the Best Data structure which is supposed to log number of user requests per second.

Make a Data structure which is supposed to log number of user requests per second. At any point of time your boss can ask you the number of hits for the last 60 seconds. It can be at any time for example he will say at 71st second that tell me how many hits for past 30 seconds or something, but this window can go maximum upto 60 seconds to in the previous example 11 to 71.


Data Structure Used: Queue(Basic Approach),Circuler Queue,Binary Index Tree(Most Efficient)

Algorithm & Solution

Take a queue of length 60. Each second do a Deque() and enque(#of users logged this second). For query traverse from tail of the queue to that many seconds as asked and sum all of those.

Algorithm: (Optimized with array)

N <- 60
A is an array of length N. holds the queue. Initialized to 0.
p is the pointer to head of the queue
tick is the system clock whose value increases by 1 each second

enque (n) // does a simultaneous deque()
{
p <- (p+N-1) mod N
A[p] <- n
}

update()
{
n <- 0
t <- tick
while true
{
if( t == tick && new user has logged in)
{
n <- n+1
}

else if (t < tick)
{
enque(n)
n <- 0
t <- tick
}

}
Query(t)
{
sum <- 0
for i <- 0 to t-1
{
sum <- sum + A[ (p+i) mod N]
}

return sum
}

where tick is the memory/register where system keeps track of each second passed by incrementing its value by 1 every second. tick can be read by any program in the system but can be written only by system clock hw/sw. Now, in the 'while' loop, eventually tick will be incremented by system clock hw/sw each second behind the scene, while, t will remain at previous value and an enque() will happen along with update of t.
This algorithm is optimized for enque() since this happens every second while a Query happens asynchronously and its frequency might be too less than frequency of enque() operations.

Time Complexity
Space Complexity

Optmization

We can use Binary Indexed Tree/ Fenwick Tree. Updating the array as well cumulative sum will be of O(log n) complexity.

Of course we have to put a limit to after which time(i/p) we cant query as we cant store infinite amount of data, or if yes partioning of data would be needed(if thats the intention the above is not efficient).

Feel Free to Comment or Optimize The Solution

No comments :