Introduction to Regular Expressions
Regular expressions are one of those small technologies that are incredibly useful in a wide range of programs. They are commonly used for one specific purpose: locating substrings within larger strings. Regular expressions are not a new technology; they originated in the UNIX environment and were made popular by the Perl programming language. Microsoft ported them to Windows, where until now they have been used mostly with scripting languages. Today, however, regular expressions are supported by a number of .NET classes in the System.Text.RegularExpressions namespace.
Regular Expression Basics
With regular expressions you can perform quite sophisticated, high-level operations on strings. For example, you can:
1. Identify all repeated words in a string.
2. Convert all words to title case (for example, "this is me" to "This Is Me").
3. Ensure that sentences are properly capitalized.
4. Check whether a string is a valid email address, postal address, domain name, IP address, and so on.
Of course, all of the tasks above can be performed in C# using the various methods of System.String and System.Text.StringBuilder. However, in some cases this would involve writing a fair amount of C# code. With regular expressions, that code can normally be compressed to just a couple of lines.
You only need to instantiate a System.Text.RegularExpressions.Regex object with the regular expression, pass it the string to be processed, and you're done.
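As a rough illustration of how compact such code can be, here is a minimal sketch of the first two tasks from the list above. The class name RegexTasks and its two helper methods are hypothetical names chosen for this example, and the title-case helper is deliberately naive (it only handles the letters a to z and treats an apostrophe as a word break).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexTasks
{
    // Returns every word that is immediately followed by itself, e.g. "the the".
    public static List<string> FindRepeatedWords(string input)
    {
        var repeated = new List<string>();
        // \b(\w+) captures a whole word; \s+\1\b requires the same word again.
        foreach (Match m in Regex.Matches(input, @"\b(\w+)\s+\1\b", RegexOptions.IgnoreCase))
        {
            repeated.Add(m.Groups[1].Value);
        }
        return repeated;
    }

    // Upper-cases the first letter of each word (naive: a-z only).
    public static string ToTitleCase(string input)
    {
        return Regex.Replace(input, @"\b[a-z]", m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        foreach (string word in FindRepeatedWords("This is is a test of the the matcher"))
        {
            Console.WriteLine(word);                  // prints "is", then "the"
        }
        Console.WriteLine(ToTitleCase("this is me")); // prints "This Is Me"
    }
}
```

Each task comes down to a single pattern plus one call to Regex.Matches or Regex.Replace, which is the kind of compression described above.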
The following table lists some of the main special characters that you can use.
| Symbol | Meaning | Example | Matches |
| ^ | Beginning of input text | ^ABC | ABC, ABCDEFG, ABC123, ... |
| $ | End of input text | ABC$ | ABC, 234ABC, ... |
| | | Alternation | sam|ted | ted, sam |
| . | Any single character except the newline character (\n) | a.iation | aviation, asiation |
| * | Preceding character may be repeated 0 or more times | ba*t | bt, bat, baat, baaaat and so on |
| + | Preceding character may be repeated 1 or more times | ba+t | bat, baat, baaaat and so on |
| ? | Preceding character may be repeated 0 or 1 times | ba?t | bt, bat only |
| {...} | Explicit quantifier notation | ab{2}c | abbc |
| [...] | Explicit set of characters to match | a[bB]c | abc, aBc |
| (...) | Logical grouping of part of an expression | (abc){2} | abcabc |
Note that every ordinary character, that is, any character other than ". $ ^ { [ ( | ) ] } * + ? \", matches itself. For example, b matches b and c matches c.
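The rows of the table above can be checked directly with Regex.IsMatch. The following is a minimal sketch; the class MetacharacterDemo and the helper MatchesWhole are hypothetical names for this example, and the helper wraps each pattern in ^(?:...)$ so that it has to match the entire input rather than a part of it. The last two lines use Regex.Escape, which is one way to make special characters match themselves.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class MetacharacterDemo
{
    // True when the pattern matches the whole input, not just a substring.
    public static bool MatchesWhole(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWhole("bt", "ba*t"));         // True:  zero or more 'a'
        Console.WriteLine(MatchesWhole("bt", "ba+t"));         // False: at least one 'a' needed
        Console.WriteLine(MatchesWhole("baat", "ba?t"));       // False: at most one 'a' allowed
        Console.WriteLine(MatchesWhole("abbc", "ab{2}c"));     // True:  exactly two 'b'
        Console.WriteLine(MatchesWhole("aBc", "a[bB]c"));      // True:  'b' or 'B'
        Console.WriteLine(MatchesWhole("abcabc", "(abc){2}")); // True:  group repeated twice
        Console.WriteLine(MatchesWhole("ted", "sam|ted"));     // True:  either alternative

        // An unescaped '+' is a quantifier, so the literal text does not match itself.
        Console.WriteLine(MatchesWhole("1+1=2", "1+1=2"));               // False
        // Regex.Escape turns the text into a pattern that matches it literally.
        Console.WriteLine(MatchesWhole("1+1=2", Regex.Escape("1+1=2"))); // True
    }
}
```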
| Escaped character | Description |
| \a | Matches a bell (alarm) \u0007. |
| \b | Matches a backspace \u0008 if in a []; otherwise matches a word boundary (between \w and \W characters). |
| \t | Matches a tab \u0009. |
| \r | Matches a carriage return \u000D. |
| \v | Matches a vertical tab \u000B. |
| \f | Matches a form feed \u000C. |
| \n | Matches a new line \u000A. |
| \e | Matches an escape \u001B. |
| \040 | Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. For example, the character \040 represents a space. |
| \x20 | Matches an ASCII character using hexadecimal representation (exactly two digits). |
| \cC | Matches an ASCII control character; for example, \cC is control-C. |
| \u0020 | Matches a Unicode character using hexadecimal representation (exactly four digits). |
| \ | When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A. |
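A few of these escapes are easy to confuse, so here is a minimal sketch that exercises them. The class EscapeDemo and the helper CountWholeWord are hypothetical names for this example. Patterns are written as C# verbatim strings (@"...") so that the backslashes reach the regex engine unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EscapeDemo
{
    // Counts occurrences of a word that stand alone, using \b as a word boundary.
    public static int CountWholeWord(string input, string word)
    {
        return Regex.Matches(input, @"\b" + Regex.Escape(word) + @"\b").Count;
    }

    public static void Main()
    {
        // "concat" contains "cat", but not at a word boundary.
        Console.WriteLine(CountWholeWord("cat concat cat.", "cat")); // 2

        // Three ways to write a space: hexadecimal, Unicode and octal.
        Console.WriteLine(Regex.IsMatch("a b", @"a\x20b"));   // True
        Console.WriteLine(Regex.IsMatch("a b", @"a\u0020b")); // True
        Console.WriteLine(Regex.IsMatch("a b", @"a\040b"));   // True

        // \t in the pattern matches the tab character in the input.
        Console.WriteLine(Regex.IsMatch("a\tb", @"a\tb"));    // True

        // Inside [], \b means backspace (\u0008) rather than a word boundary.
        Console.WriteLine(Regex.IsMatch("a\bb", @"a[\b]b"));  // True

        // A backslash before a special character makes it literal.
        Console.WriteLine(Regex.IsMatch("a*b", @"a\*b"));     // True
    }
}
```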
Using Regular Expressions in the .NET Framework
To make regular expressions in .NET easier to understand, I will list a code snippet for each operation.
1. Searching a string using regular expressions
using System;
using System.Text.RegularExpressions;

string Text = @"Regular Expression in .NET has been
made very easy with the introduction of the .NET library. Regular Expression is very
powerful string manipulation algorithm and I love regular expression so much";
string Pattern = "ion";
// Find every occurrence of the pattern, ignoring case.
MatchCollection Matches = Regex.Matches(
    Text, Pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
foreach (Match NextMatch in Matches)
{
    // Index is the zero-based position of the match within Text.
    Console.WriteLine(NextMatch.Index);
}
The code above searches the text and prints the index position of each occurrence of "ion".
This code uses the static Matches method of the Regex class. You can also instantiate a Regex object and then call that object's Matches method.
The code below is equivalent to the code above.
string Text = @"Regular Expression in .NET has been
made very easy with the introduction of the .NET library. Regular Expression is very
powerful string manipulation algorithm and I love regular expression so much";
string Pattern = "ion";
// The pattern and options are supplied once, when the Regex object is created.
Regex oRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
MatchCollection Matches = oRegex.Matches(Text);
foreach (Match NextMatch in Matches)
{
    Console.WriteLine(NextMatch.Index);
}
2. Replacing a string using regular expressions
string Text = @"Regular Expression in .NET has been
made very easy with the introduction of the .NET library. Regular Expression is very
powerful string manipulation algorithm and I love regular expression so much";
string Pattern = "ion";
Regex oRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
// Replace substitutes every match when no count is given, so no loop is needed.
Text = oRegex.Replace(Text, "aon");
Console.WriteLine(Text);
Output (the line breaks of the original string are preserved; shown here on one line): "Regular Expressaon in .NET has been made very easy with the introductaon of the .NET library. Regular Expressaon is very powerful string manipulataon algorithm and I love regular expressaon so much"
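Replace can also be limited to a given number of matches, and the replacement string can refer back to captured groups. The following is a minimal sketch of both; the class ReplaceDemo and its method names are hypothetical, while the count overload of Regex.Replace and the $1, $2 substitutions are part of the .NET Regex class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ReplaceDemo
{
    // Replaces at most 'count' matches, scanning from the start of the input.
    public static string ReplaceFirst(string input, string pattern, string replacement, int count)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return regex.Replace(input, replacement, count);
    }

    // Swaps two space-separated words using the group substitutions $1 and $2.
    public static string SwapWords(string input)
    {
        return Regex.Replace(input, @"(\w+) (\w+)", "$2, $1");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceFirst("ion ion ion", "ion", "aon", 2)); // aon aon ion
        Console.WriteLine(SwapWords("John Smith"));                      // Smith, John
    }
}
```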
3. Using groups in regular expressions
Regex r = new Regex("(a(b))c");
Match m = r.Match("abdabc");
// Groups[0] is the whole match "abc"; Groups[1] is "ab" and Groups[2] is "b".
Console.WriteLine("Number of groups found = " + m.Groups.Count);
// Output: Number of groups found = 3
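To see what each of those three groups actually holds, the values can be read from the Groups collection. This is a minimal sketch using the same pattern and input as above; the class GroupDemo and the helper GroupValues are hypothetical names for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupDemo
{
    // Returns the text captured by every group of the first match,
    // or an empty array when the pattern does not match at all.
    public static string[] GroupValues(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return new string[0];
        }
        string[] values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] values = GroupValues("abdabc", "(a(b))c");
        for (int i = 0; i < values.Length; i++)
        {
            // Prints: Group 0 = abc, Group 1 = ab, Group 2 = b
            Console.WriteLine("Group " + i + " = " + values[i]);
        }
    }
}
```

Group 0 is always the entire match. The remaining groups are numbered by the position of their opening parenthesis, which is why the outer group (a(b)) comes before the inner group (b).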