Excellent Study Material For Understanding Algorithms & implementing Suffix Tree
6.http://apps.topcoder.com/forums/?module=RevisionHistory&messageID=1171511
7.http://www.stanford.edu/class/cs97si/suffix-array.pdf
Application of Suffix Tree
1.Longest repeated substring problem:
The longest repeated substring problem is finding the longest substring of a string that occurs at least twice. This problem can be solved in linear time and space by building a suffix tree for
the string, and finding the deepest internal node in the tree. The
string spelled by the edges from the root to such a node is a longest
repeated substring. The problem of finding the longest substring with
at least k occurrences
can be found by first preprocessing the tree to count the number of
leaf descendants for each internal node, and then finding the deepest
node with at least k descendants.
By from Sartaj Sahni
Find the longest substring ofS
that appears at leastm > 1
times. This query can be answered inO(|S|)
time in the following way:
(a) Traverse the suffix tree labeling the branch nodes with the sum of the label lengths from the root and also with the number of information nodes in the subtrie. (b) Traverse the suffix tree visiting branch nodes with information node count>= m
. Determine the visited branch node with longest label length.
Note that step (a) needs to be done only once. Following this, we can do step (b) for as many values ofm
as is desired. Also, note that whenm = 2
we can avoid determining the number of information nodes in subtries. In a compressed trie, every subtrie rooted at a branch node has at least two information nodes in it.
Note that step (a) needs to be done only once. Following this, we can do step (b) for as many values ofm
as is desired. Also, note that whenm = 2
we can avoid determining the number of information nodes in subtries. In a compressed trie, every subtrie rooted at a branch node has at least two information nodes in it.
2. Pattern Searching/Sub-String Searching
Searching for a substring,pat[1..m]
, intxt[1..n]
, can be solved in O(m) time (after the suffix tree fortxt
has been built in O(n) time).
3. Longest Repeated Substring
4. Longest Common Subsequence
5. Longest Palindrome
http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
Add a special ``end of string'' character, e.g. `$', to i.e., `issi' in the case of `mississippi'. The longest repeated substring can be found in O(n) time using a suffix tree.
The longest common substring of two strings,
Equivalently, one can build a (basic) suffix tree for the string
(Try it using the HTML FORM above.)
txt[1..n]
and build a suffix tree; the longest repeated substring oftxt[1..n]
is indicated by the deepest fork node in the suffix tree, where depth is measured by the number of characters traversed from the root, The longest common substring of two strings,
txt1
and txt2
, can be found by building a generalized suffix tree for txt1
andtxt2
: Each node is marked to indicate if it represents a suffix of txt1
or txt2
or both. The deepest node marked for both txt1
and txt2
represents the longest common substring.Equivalently, one can build a (basic) suffix tree for the string
txt1$txt2#
, where `$' is a special terminator for txt1
and `#' is a special terminator for txt2
. The longest common substring is indicated by the deepest fork node that has both `...$...' and `...#...' (no $) beneath it.(Try it using the HTML FORM above.)
Note that the `longest common substring problem' is different to the `longest common subsequence problem' which is closely related to the `edit-distance problem': An instance of a subsequence can have gaps where it appears in
txt1
and in txt2
, but an instance of a substring cannot have gaps.A palindrome is a string, P, such that P=reverse(P). e.g. `abba'=reverse(`abba'). e.g. `ississi' is the longest palindrome in `mississippi'. The longest palindrome of
txt[1..n]
can be found in O(n) time, e.g. by building the suffix tree for txt$reverse(txt)#
or by building the generalized suffix tree for txt
and reverse(txt)
.(Try it.)
Others
--
Some Awesome Algorithms/Codes For Suffix Tree Implementation
No comments :
Post a Comment