Wednesday, August 24, 2011

Design an algorithm that takes strings S and r and returns if r matches s. (Assume r is a well-formed regular expression.)

Before starting solving we need to think about some test cases  like .,*,?,^,$  e.g about a regular expression grammar.

A regular expression is a sequence of characters that defines a set of
matching strings.For this problem , we define a simple subset of a full
regular expression Language.

Alphabetical and numerical characters match themselves. 
 1.For example aW9 will match that string of 3 letters wherever it appears.
The meta-characters ". and $ stand for the beginning and end of the
string. For example, .....aW9 matches aW9 only at the start of a string
aW9$ matches aW9 only at the end of a string, and ^aW9$ matches a string only if it is exactly equal to aW9.

2.The metacharacter . matches any single character. For example,
a.9 matches a89 and xyaW9123 but not aw89.

3.The metacharacter * specifies a repetition of the single previous
period or a literal character. For example,a. *9 matches aw89.
By definition, regular expression r matches string s if s contains a
substring starting at any position matching r. For example, aW9 and a. 9
match string xyaW9123 but .....aW9 does not.

Algorithm:
The key to solving this problem is using recursion effectively.
If the regular expression r starts with "', then s must match the remainder of r; otherwise, s must match r at some position.
Call the function that checks whether a string S matches r from
the beginning matchHere. This function has to check several cases
(1.) Iength-O regular expressions which match everything, 
(2.) a regular expression starting with a *match, 
(3.) the regular expression $,and 
(4.) a regular expression starting with an alphanumeric character or dot.
Of these, (1.) and (3.) are base cases, (4.) is a check followed by a call
to matchHere, and (3.) requires a new matchStar function.

More Info http://swtch.com/~rsc/regexp/regexp1.html


No comments :