January 11, 2008

Regular Expressions

Posted by Ben Simo

(bb|[^b]{2}); [Tt]hat is the \?\.

Regular expressions are great tools for testers. I have found them useful for describing GUI objects to GUI test automation tools, validating automation results, extracting the data I care about from voluminous log files, and manipulating data.

What are regular expressions? Regular expressions are patterns for finding text of interest. They are supported by many test tools, system utilities, text editors, and programming languages.
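As a rough illustration of "finding text of interest", here is a minimal Python sketch that pulls error entries out of log text. The log format, the sample lines, and the names `log_lines`, `error_pattern`, and `extract_errors` are all made up for this example; a real log would need its own pattern.

```python
import re

# Hypothetical log lines; the format is an assumption made for this sketch.
log_lines = [
    "2008-01-11 09:15:02 INFO  Login succeeded for user=alice",
    "2008-01-11 09:15:07 ERROR Timeout after 30 seconds for user=bob",
    "2008-01-11 09:16:44 INFO  Logout for user=alice",
    "2008-01-11 09:17:01 ERROR Disk full on volume=/var",
]

# ^ anchors at the start of the line, [0-9:] is a character class,
# and the parentheses capture the time stamp and the message text.
error_pattern = re.compile(r"^[0-9-]+ ([0-9:]+) ERROR (.*)$")

def extract_errors(lines):
    """Return (time, message) pairs for the lines that report an error."""
    results = []
    for line in lines:
        match = error_pattern.search(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results

errors = extract_errors(log_lines)
for time, message in errors:
    print(time, message)
```

The same idea scales to large files: read the file line by line and keep only the captured groups you care about.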

Regular expressions can include the following metacharacters to define patterns.

  • ^ Matches the beginning of a line or string
  • $ Matches the end of a line or string
  • . Matches any single character
  • * Matches zero or more occurrences of the preceding character or group
  • \ Escape character
  • ? Matches zero or one occurrence of the preceding character or group
  • + Matches one or more occurrences of the preceding character or group
  • [ ] Defines a character class
  • [^ ] Defines an exclusion-based character class
  • \{ \} Matches a specific number or range of occurrences of the preceding character or group
  • \( \) Treats the expression between \( and \) as a group
  • | Or. Use to match one of many expressions
  • \< Matches the beginning of a word
  • \> Matches the end of a word
  • \b Word boundary
  • \B Not a word boundary

* Many tools do not support all metacharacters.
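Tool differences are easy to check by experiment. The sketch below uses Python's `re` module as one example of a tool: there, parentheses and braces are metacharacters without backslashes, and `\<` and `\>` are not word anchors at all. The variable names are only for illustration, and other tools may behave differently.

```python
import re

# In Python's re module, ( ) and { } work as metacharacters WITHOUT backslashes.
grouped = re.search(r"(ab){2}", "xababx")

# With backslashes, parentheses are matched as literal characters.
literal = re.search(r"\(ab\)", "call(ab)")
no_literal = re.search(r"\(ab\)", "abab")

# Python has no \< or \> word anchors; \< simply matches a literal "<".
angle = re.search(r"\<", "a<b")

# \b marks a word boundary at either end of a word.
whole_word = re.compile(r"\bfrog\b")
word_alone = whole_word.search("a frog sat") is not None
word_inside = whole_word.search("bullfrog") is not None

# \B is the opposite: here "frog" must follow another word character.
inside_only = re.search(r"\Bfrog", "bullfrog") is not None

print(grouped.group(0), literal.group(0), no_literal, angle.group(0))
print(word_alone, word_inside, inside_only)
```

Running a handful of probes like these against your own tool is usually quicker than hunting through its documentation.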

Here are some example regular expressions:

“frog”
  • Matches “frog”, “bullfrog”, and “tree frog”; but not “Frog”

“^Frog”
  • Matches “Froggy went a courting”, but not “Quality Frog”

“frog$”
  • Matches “frog”, “bullfrog”, and “tree frog”; but not “froggy” or “The frog sat on a log.”

“.at”
  • Matches “cat”, “rat”, “bat”, “goat”, and “gnat”
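The literal, anchor, and dot examples above can be tried out with a few lines of Python. This is only a sketch: the helper `found` is a name invented here, and it uses `re.search`, which looks for the pattern anywhere in the text.

```python
import re

def found(pattern, text, flags=0):
    """True when pattern occurs anywhere in text (re.search semantics)."""
    return re.search(pattern, text, flags) is not None

examples = [
    ("frog", "bullfrog"),
    ("frog", "Frog"),
    ("^Frog", "Froggy went a courting"),
    ("^Frog", "Quality Frog"),
    ("frog$", "tree frog"),
    ("frog$", "The frog sat on a log."),
    (".at", "goat"),
]

results = [found(pattern, text) for pattern, text in examples]
for (pattern, text), result in zip(examples, results):
    print(f"{pattern!r:10} {text!r:28} {result}")

# Matching is case sensitive by default; a flag relaxes that in Python.
case_blind = found("frog", "Frog", re.IGNORECASE)
```

Note that “.at” finds “goat” only because the search is unanchored: the dot matches the “o” and the “g” is ignored.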

“20*5”
  • Matches “2005”, “20005”, “20000000000000000000000005”, “25”; but not “2ABC5” or “2006”

“Spee?d”
  • Matches “Sped” and “Speed”; but not “Speeed”

“20+5”
  • Matches “2005” and “20005”, but not “25”
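To see the three quantifiers side by side, here is a small Python sketch that filters sample strings with each pattern. The sample lists and variable names are invented for the comparison.

```python
import re

year_samples = ["25", "205", "2005", "20005", "2ABC5", "2006"]
speed_samples = ["Sped", "Speed", "Speeed"]

# * allows zero or more "0" characters between the "2" and the "5".
star_matches = [s for s in year_samples if re.search(r"20*5", s)]

# + requires at least one "0" between the "2" and the "5".
plus_matches = [s for s in year_samples if re.search(r"20+5", s)]

# ? makes the second "e" optional, so one or two "e" characters are accepted.
optional_matches = [s for s in speed_samples if re.search(r"Spee?d", s)]

print("20*5  :", star_matches)
print("20+5  :", plus_matches)
print("Spee?d:", optional_matches)
```

The only difference between the first two results is “25”, which has no zero at all.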

“200[5-9]”
  • Matches “2005”, “2006”, “2007”, “2008”, and “2009”; but not “2004”

“199[0-9]|200[0-9]”
  • Matches years 1990 through 2009.

“[0-9][0-9]*\.[0-9][0-9][^0-9]”
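  • Matches “1.29%”, “1.29 each”, and “1234.55 ” (a number with exactly two decimal places followed by any non-digit character); but not “1.299” or “.29”. Because [^0-9] must match a character, a bare “1.29” at the very end of the text does not match.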
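The trailing [^0-9] in this pattern has to consume one character, which is easy to overlook. The Python sketch below probes that behavior and then shows one possible variation, a negative lookahead, that also accepts a number at the very end of the text. The lookahead version is my own addition rather than part of the original pattern, and not every tool supports lookaheads.

```python
import re

samples = ["1.29%", "1.29 each", "1234.55.", "1.29", "1.299", ".29"]

# Pattern from the post: the final [^0-9] must match a real character.
consuming = re.compile(r"[0-9][0-9]*\.[0-9][0-9][^0-9]")
consuming_results = {s: bool(consuming.search(s)) for s in samples}

# Variation (an assumption, not from the post): (?![0-9]) only checks that
# the next character is not a digit, so it also succeeds at the end of text.
lookahead = re.compile(r"[0-9]+\.[0-9]{2}(?![0-9])")
lookahead_results = {s: bool(lookahead.search(s)) for s in samples}

for s in samples:
    print(f"{s!r:12} {consuming_results[s]!s:6} {lookahead_results[s]}")
```

The two patterns disagree only on the bare “1.29”, where nothing follows the second decimal digit.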

“A[LKRSZ]|C[AOT]|D[CE]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]”
  • Matches any valid 2-letter US postal abbreviation for a state or territory.
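A pattern like this is most useful for validation when it has to cover the whole input. The Python sketch below assumes every character class is closed with its bracket and that the M class covers MA, MD, ME, MH, MI, MN, MO, MP, MS, and MT, which is my reading of the postal list. The names `STATE_PATTERN` and `is_state_code` are invented for the example.

```python
import re

# Assumed set of codes: check it against the current postal list before relying on it.
STATE_PATTERN = re.compile(
    r"A[LKRSZ]|C[AOT]|D[CE]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]"
    r"|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]"
)

def is_state_code(text):
    """True only when the entire text is one of the 2-letter codes."""
    return STATE_PATTERN.fullmatch(text) is not None

for candidate in ["CO", "MD", "XX", "co", "COLORADO"]:
    print(candidate, is_state_code(candidate))

# An unanchored search is looser: it finds "CO" inside "COLORADO".
loose_hit = STATE_PATTERN.search("COLORADO") is not None
```

In tools without a whole-string match function, wrapping the alternatives in a group between ^ and $ gives a similar effect.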


Want to learn more?

Take a look at my slides from last night's presentation to the Denver Mercury User Group. Check out Wikipedia. Or try a Google Search. If you ask bb|[^b]{2}, check out Think Geek.

Ha[p]{2}y T[ea]sting\.



2 Comments:

January 15, 2008  
Anonymous wrote:

Hi!
As far as I know, the expression
\( \) matches literal "(" and ")" characters.

If we want an expression to be treated as a group, we should use just (some expression) without backslashes.

January 15, 2008  
Ben Simo wrote:

To escape or not to escape, that is the question.

Thanks for mentioning this. This is something I mentioned in my presentation but neglected to include in the blog post.

It depends on the tool. Older tools that did not originally support the parens may require that they be escaped to identify them as metacharacters. Newer tools likely require that they be escaped when they are not metacharacters.

For the items in my list with backslashes in front of them: you will need to determine what your specific tools require.