I am a huge fan of Regular Expressions (Regex). There are a lot of cool technologies, but no other technology seems as much like total hocus pocus. I take almost every opportunity to use Regular Expressions in my applications, and I'm pretty much a sucker when a friend asks me to figure one out. I'll spend hours tweaking an expression in The Regulator, a cute little .NET regular expression utility. Actually, I just noticed that its last release was in 2004. What are other people using for regular expression development and testing?
There is one rule of thumb I observe whenever I use a Regular Expression in an application: make it overridable by a config setting. Regular Expressions are forever a work in progress, and very often the first cut needs some tweaking once it gets into the real world. This little practice has saved my ass more than once. Most recently, I was able to fix a production application just by tweaking a config value rather than pushing a whole patched release through QA and doing a full deployment.
At this point I just automatically do this when I put a regular expression into an application:
Let's start with a little application that just tells you the matches for a hard-coded Regex.
Notice how the pattern [a-yA-Z]+ leaves out lowercase 'z', so instead of matching 'lazy' as one word it returns 'la' and 'y'. This could be a real problem if this application made it into the wild. The good news is I've coded in some insurance.
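Something along these lines reproduces the problem. This is a minimal sketch, assuming a console app and the classic pangram as input; MatchWordsDemo and FindWords are hypothetical names, not the original app's. It prints each match of the hard-coded pattern on its own line, so the 'la' / 'y' split is easy to spot.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWordsDemo
{
    // Returns every match of the pattern in the input, in order.
    public static List<string> FindWords(string pattern, string input)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        // Assumed sample input (hypothetical; the original input isn't given).
        const string input = "The quick brown fox jumps over the lazy dog";

        // The buggy hard-coded pattern: the lowercase range stops at 'y'.
        foreach (string word in FindWords(@"[a-yA-Z]+", input))
        {
            Console.WriteLine(word);
        }
    }
}
```

With this input it prints The, quick, brown, fox, jumps, over, the, la, y, dog, one per line.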
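using System.Configuration; // requires a reference to System.Configuration.dll
using System.Text.RegularExpressions;

class MatchWords // class name assumed from the config key below
{
    // Must match the appSettings key in the config file
    private const string REGEX_CONFIG_KEY = "MatchWords.MatchWordRegex";
    private static Regex _regEx;

    private static Regex MatchWordRegex
    {
        get
        {
            if (_regEx == null)
            {
                string defaultPattern = @"[a-yA-Z]+";
                // A non-empty config value overrides the coded default
                string cfgValue =
                    ConfigurationManager.AppSettings[REGEX_CONFIG_KEY];
                if (!string.IsNullOrEmpty(cfgValue))
                    defaultPattern = cfgValue;
                _regEx = new Regex(defaultPattern);
            }
            return _regEx;
        }
    }
}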
You'll notice that I go out to the config and check whether there is a value there to override what I've coded in as the default. So I put a new value in the config:
<configuration>
  <appSettings>
    <add key="MatchWords.MatchWordRegex" value="[\w]+"/>
  </appSettings>
</configuration>
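ConfigurationManager.AppSettings needs an app.config at runtime, so to see the fallback rule on its own, here is a minimal sketch that takes the config value as a plain parameter instead. PatternOverrideDemo, BuildRegex and MatchesAsCsv are hypothetical names, not part of the original app.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternOverrideDemo
{
    private const string DefaultPattern = @"[a-yA-Z]+";

    // Same fallback rule as the MatchWordRegex property: a non-empty config
    // value wins, otherwise the coded default is used. The config lookup is
    // replaced by a parameter (an assumption) so this runs without an app.config.
    public static Regex BuildRegex(string configValue)
    {
        string pattern = string.IsNullOrEmpty(configValue) ? DefaultPattern : configValue;
        return new Regex(pattern);
    }

    // Joins all matches with commas for easy comparison.
    public static string MatchesAsCsv(Regex regex, string input)
    {
        var words = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            words.Add(m.Value);
        }
        return string.Join(",", words);
    }

    public static void Main()
    {
        const string input = "the lazy dog";
        Console.WriteLine(MatchesAsCsv(BuildRegex(null), input));     // the,la,y,dog
        Console.WriteLine(MatchesAsCsv(BuildRegex(@"[\w]+"), input)); // the,lazy,dog
    }
}
```

One thing to keep in mind with the [\w]+ override: in .NET, \w matches more than ASCII letters, including digits, the underscore and Unicode letters, so "abc_123" comes back as a single word. Also, a malformed pattern in the config makes new Regex(...) throw an ArgumentException the first time the property is read, so a config typo shows up at runtime rather than at build time.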
So now when I run the app I get the following:
This isn't rocket science, but it's a good practice for including Regular Expressions in your applications. I could give you some spiel about how with great power comes great responsibility, but truly, you'll pay the price if you don't take the time to do this right.
Incidentally, while I have your attention, here is the link to the MSDN System.Text.RegularExpressions namespace page. There is also a little tidbit you should know about Regular Expression performance in .NET, from the System.Text.RegularExpressions.Regex class documentation:
The Regex class contains several static (or Shared in Visual Basic) methods that allow you to use a regular expression without explicitly creating a Regex object. In the .NET Framework version 2.0, regular expressions compiled from static method calls are cached, whereas regular expressions compiled from instance method calls are not cached. By default, the regular expression engine caches the 15 most recently used static regular expressions. As a result, in applications that rely extensively on a fixed set of regular expressions to extract, modify, or validate text, you may prefer to call these static methods rather than their corresponding instance methods. Static overloads of the IsMatch, Match, Matches, Replace, and Split methods are available.
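Since MatchWordRegex builds its Regex once and keeps it in _regEx, it doesn't depend on that cache; the tidbit matters when code passes a pattern string to the static methods on every call. Here is a minimal sketch of both styles, plus the Regex.CacheSize property that sets how many static-call patterns the engine keeps (15 by default). RegexCacheDemo and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCacheDemo
{
    // Instance style: build the Regex once and reuse it from a static field,
    // the same idea as the lazily created _regEx in MatchWordRegex.
    private static readonly Regex WordRegex = new Regex(@"[\w]+");

    public static int CountWordsInstance(string input)
    {
        return WordRegex.Matches(input).Count;
    }

    // Static style: the pattern is passed on each call, and the engine looks it
    // up in its internal cache of recently used static-method patterns.
    public static int CountWordsStatic(string input)
    {
        return Regex.Matches(input, @"[\w]+").Count;
    }

    public static void Main()
    {
        // Current size of the static-method cache.
        Console.WriteLine(Regex.CacheSize);

        // Raise it if an app cycles through more distinct static patterns than that.
        Regex.CacheSize = 30;

        Console.WriteLine(CountWordsInstance("the lazy dog"));
        Console.WriteLine(CountWordsStatic("the lazy dog"));
    }
}
```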
Finally, if you're bored enough, here is the little console app.