Saturday, March 19, 2011

WAP to Efficient Implementation of Trie Data Structure

TRIE is an interesting data-structure used mainly for manipulating with Words in a language. This word is got from the word retrieve. TRIE (pronounced as 'try') has a wide variety of applications in

* Spell checking
* Data compression
* Computational biology
* Routing table for IP addresses
* Storing/Querying XML documents etc.,

We shall see how to construct a basic TRIE data structure in Java.

Data Structure :Trie (Re"trie"ve)

Algorithm For Insertion In Trie

Any insertion would ideally be following the below algorithm.

1. If the input string length is zero, then set the marker for the root node to be true.
2. If the input string length is greater than zero, repeat steps 3 and 4 for each character
3. If the character is present in the child node of the current node, set the current node point to the child node.
4. If the character is not present in the child node, then insert a new node and set the current node to that newly inserted node.
5. Set the marker flag to true when the end character is reached.

Now if you go through the already written code for this, you can have a better understanding by comparing it with the above algorithm.


Algorithm For Searching In Trie

The search alogirthm involves the following steps

1. For each character in the string, see if there is a child node with that character as the content.
2. If that character does not exist, return false
3. If that character exist, repeat step 1.
4. Do the above steps until the end of string is reached.
5. When end of string is reached and if the marker of the current Node is set to true, return true, else return false.

Complexity Ananlysis

operations on how to work with a TRIE, we shall now see the space and time complexities in order to get a real feel of how good a TRIE data structure is. Lets take the two important operations INSERT and SEARCH to measure the complexity.

INSERT operation first. Lets always take into account the worst case timing first and later convince ourselves of the practical timings. For every Node in the TRIE we had something called as Collection where the Collection can be either a Set or a List. If we choose Set, the order of whatever operation we perform over that will be in O(1) time, whereas if we use a LinkedList the number of comparisons at worst will be 26 (the number of alphabets). So for moving from one node to another, there will be at least 26 comparisons will be required at each step.

Having these in mind, for inserting a word of length 'k' we need (k * 26) comparisons. By Applying the Big O notation it becomes O(k) which will be again O(1). Thus insert operations are performed in constant time irrespective of the length of the input string (this might look lik an understatement, but if we make the length of the input string a worst case maximum, this sentence holds true).

Same holds true for the search operation as well. The search operation exactly performs the way the insert does and its order is O(k*26) = O(1).

To understand this, you will have to have a deep knowledge on what "Order" means. The Big O notation is used to tell how the complexity increases according to the increase in the input space. For TRIE ADT, the input space is the number of words that we use. Assume we have 10 words, the insert operation will take a total of its word length multiplied by the 26 letters in the linked list. Which will be equal to k*26. Applying Big O we have O(k*26) = O(1). The reason is this does not depend upon the increase in the number of words that the TRIE is going to have.
Even if the TRIE has a billion words, the insert operation will take a constant time of k*26 which is irrespective of 'n'. So, if you plot a curve you could see that its constant and in Big O terms, it is called as O(1) or constant time insertion


Working Code:

import java.util.Collection;
import java.util.LinkedList;
import java.io.*;

class Node
{
char content;
boolean marker;
Collection child;

public Node(char c){
child = new LinkedList();
marker = false;
content = c;
}

public Node subNode(char c){
if(child!=null){
for(Node eachChild:child){
if(eachChild.content == c){
return eachChild;
}
}
}
return null;
}
}


class Trie{
private Node root;

public Trie(){
root = new Node(' ');
}

public void insert(String s){
Node current = root;
if(s.length()==0) //For an empty character
current.marker=true;
for(int i=0;i Node child = current.subNode(s.charAt(i));
if(child!=null){
current = child;
}
else{
current.child.add(new Node(s.charAt(i)));
current = current.subNode(s.charAt(i));
}
// Set marker to indicate end of the word
if(i==s.length()-1)
current.marker = true;
}
}

public boolean search(String s){
Node current = root;
while(current != null){
for(int i=0;i if(current.subNode(s.charAt(i)) == null)
return false;
else
current = current.subNode(s.charAt(i));
}
/*
* This means that a string exists, but make sure its
* a word by checking its 'marker' flag
*/
if (current.marker == true)
return true;
else
return false;
}
return false;
}
}

public class TrieLoader
{

public static Trie trieDSA;

public static void main(String[] args)
{
TrieLoader trieLoader = new TrieLoader();
trieLoader.load();

System.out.println(trieDSA.search("bate")) ;

}

public void load(){
trieDSA = new Trie();
BufferedReader br = null;
try {
br = new BufferedReader(new InputStreamReader(System.in));
String eachLine = null;
while(!(eachLine=br.readLine()).equals("quit")){
trieDSA.insert(eachLine);
}
} catch (Exception e) {
e.printStackTrace();
} finally{
if(br!=null){
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}

Processing Time of Dictionary is O(N) N is length of documnet we wants to preprocess.

Time Complexity & Space Complexity Explined Above in Detail as said it is contants e.g. O(k) lenth of String for all operation like inesrting,serching & deleting words in dictionary or trie even if we we wants to play around billions of words :)thats the power of trie.

Time Complexity O(K)
Run Here https://ideone.com/eHGjG

Example Application Trie
http://www.topcoder.com/stat?c=problem_statement&pm=7411
http://www.topcoder.com/stat?c=problem_statement&pm=6215

Algo & Program Developed By BregBoy( A Great Programmer & Tech Lover)

1 comment :

Anonymous said...

As if anyone can read your crap, why don't you indent your code ?