Tech Tip: New Regular Expression Syntax in Quick Fields 8 — Match Groups and Non-Capturing Groups
June 22nd, 2009
In Quick Fields 8, the syntax for pattern matching expressions (regular expressions) has changed to .NET regular expressions, which are also used by Workflow 8. When you migrate Quick Fields 7 sessions to Quick Fields 8, the migration should proceed successfully, but you may need to update processes that use pattern matching to the new syntax. In addition, Quick Fields 8 includes several new processes, such as Substitution and Text Identification, that use regular expressions to work with a document’s text, so familiarity with the new regular expression syntax will enable you to take full advantage of them.
One of the key changes in regular expression syntax that is likely to affect your sessions is how match groups are designated. A match group is a parenthesized part of a regular expression that captures the text matched inside it. When a pattern contains a match group, the result returned is only the text matched by the expression inside that group, not the entire match. If you want to group part of an expression without creating a match group (for example, so that a quantifier applies to the whole group) and still retrieve the text matched by the full expression, use a non-capturing group. In Quick Fields 8, the syntax for groups is as follows:
(expr): Match group
(?:expr): Non-capturing group
Also, the curly braces previously used to create match groups now designate a quantifier:
{n}: A quantifier matching exactly n occurrences of the previous element. For example, \d{4} is equivalent to \d\d\d\d.
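To see how the three constructs differ, here is a minimal sketch that uses Python's re module as a stand-in for the .NET engine. This is an assumption for illustration only: Quick Fields is not involved, but Python treats (expr), (?:expr) and {n} the same way as .NET for these patterns. The dash in the sample text is the en dash (–) used in this article's examples.

```python
import re

# The dash below is the en dash (U+2013) used in this article's examples.
text = "2009–April"

# (expr): a match group; the text it matched is available as group 1.
capturing = re.search(r"\d{4}–(\w+)", text)
print(capturing.group(0))  # whole match: 2009–April
print(capturing.group(1))  # match group only: April

# (?:expr): groups the expression but captures nothing, so there is no group 1.
non_capturing = re.search(r"\d{4}–(?:\w+)", text)
print(non_capturing.group(0))  # whole match: 2009–April
print(non_capturing.groups())  # ()

# {n}: exactly n of the previous element, so \d{4} means the same as \d\d\d\d.
for candidate in ["2009", "209", "20091"]:
    print(candidate, bool(re.fullmatch(r"\d{4}", candidate)))  # True, False, False
```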
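Below are some examples of how pattern matching expressions would be updated from Quick Fields 7 to Quick Fields 8. Note that the syntax for some character types has also changed: in these examples, the Quick Fields 7 token \c (alphanumeric character) becomes \w (word character), and \b (space) becomes \s (whitespace). Do not carry \b over unchanged, because in .NET syntax \b matches a word boundary rather than a space.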
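The changes in the examples below follow a small set of rewrite rules. The following sketch encodes them in a hypothetical Python helper, convert_qf7_pattern, written for illustration only. It is not a Quick Fields feature, and its rules are inferred from this article's examples rather than from a Quick Fields 7 specification, so it handles only the constructs shown here: {...} match groups, (...) plain groups, \c and \b.

```python
# Hypothetical helper for illustration: it covers only the Quick Fields 7
# constructs used in this article's examples, not the full Quick Fields 7 syntax.
QF7_ESCAPES = {
    "c": r"\w",  # Quick Fields 7 \c (alphanumeric) -> .NET \w (word character)
    "b": r"\s",  # Quick Fields 7 \b (space) -> .NET \s; in .NET, \b is a word boundary
}


def convert_qf7_pattern(pattern: str) -> str:
    """Rewrite a Quick Fields 7 pattern using the rules shown in this article."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Map the escapes that changed; keep others (such as \d) unchanged.
            nxt = pattern[i + 1]
            out.append(QF7_ESCAPES.get(nxt, "\\" + nxt))
            i += 2
            continue
        if ch == "{":
            out.append("(")    # Quick Fields 7 match group -> .NET match group
        elif ch == "}":
            out.append(")")
        elif ch == "(":
            out.append("(?:")  # Quick Fields 7 plain group -> .NET non-capturing group
        else:
            out.append(ch)     # ")", quantifiers and literals pass through unchanged
        i += 1
    return "".join(out)


for old in [r"\d\d\d\d–{\c+}", r"\d\d\d\d–(\c\b?)+", r"\d\d\d\d–{(\c\b?)+}"]:
    print(old, "->", convert_qf7_pattern(old))
```

The output keeps \d\d\d\d as written; shortening it to \d{4} is a separate, optional step.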
The following expression, which uses a match group, looks for four digits followed by a dash and extracts only the run of word characters that comes after the dash. For example, if this pattern encounters 2009–April, it returns only April. If it encounters 2007–January and February notes and agendas, it returns only January, because \w+ stops at the first space.
Quick Fields 7: \d\d\d\d–{\c+}
Quick Fields 8: \d\d\d\d–(\w+)
or \d{4}–(\w+)
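Here is a minimal sketch of what both Quick Fields 8 spellings extract, again using Python's re module as a stand-in for .NET (the patterns behave the same for these inputs). The helper name extract_after_dash is hypothetical. It assumes, as described above, that the result is the text of the match group.

```python
import re

# Both Quick Fields 8 spellings from the article; \d{4} is shorthand for \d\d\d\d.
# The dash is the en dash (U+2013), as in the article's examples.
PATTERNS = [re.compile(r"\d\d\d\d–(\w+)"), re.compile(r"\d{4}–(\w+)")]


def extract_after_dash(pattern: re.Pattern, text: str):
    """Return the text captured by the match group, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


for p in PATTERNS:
    print(extract_after_dash(p, "2009–April"))  # April
    print(extract_after_dash(p, "2007–January and February notes and agendas"))  # January
```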
The following pattern, which uses a non-capturing group, looks for four digits and a dash followed by one or more word characters, each optionally followed by a space, and returns the entire match. If this pattern encounters 2007–January and February notes and agendas, it returns that entire string: 2007–January and February notes and agendas.
Quick Fields 7: \d\d\d\d–(\c\b?)+
Quick Fields 8: \d\d\d\d–(?:\w\s?)+
or \d{4}–(?:\w\s?)+
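A short sketch of the non-capturing version, with Python's re module standing in for .NET as above. It shows that both spellings return the whole string and define no groups at all.

```python
import re

text = "2007–January and February notes and agendas"

# (?:\w\s?)+ repeats "one word character, optionally followed by whitespace";
# the group exists only so that + applies to the pair, and it captures nothing.
for pattern in [r"\d\d\d\d–(?:\w\s?)+", r"\d{4}–(?:\w\s?)+"]:
    m = re.search(pattern, text)
    print(m.group(0))  # 2007–January and February notes and agendas
    print(m.groups())  # (): no match groups, so the whole match is the result
```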
Match groups and non-capturing groups can also be nested. The following pattern looks for four digits and a dash followed by one or more word characters, each optionally followed by a space, and returns only the text after the dash. If it encounters 2007–January and February notes and agendas, it returns only January and February notes and agendas.
Quick Fields 7: \d\d\d\d–{(\c\b?)+}
Quick Fields 8: \d\d\d\d–((?:\w\s?)+)
or \d{4}–((?:\w\s?)+)
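In the nested pattern, the outer parentheses form the match group, and the inner (?:...) only lets + repeat the "word character plus optional space" unit. The sketch below uses Python's re module as a stand-in for .NET, as before. It also shows what would happen if the inner group were an ordinary match group. A repeated group keeps only its last repetition (Python and .NET both behave this way), so the extra group would hold just the final character.

```python
import re

text = "2007–January and February notes and agendas"

# Outer (...) is the match group; inner (?:...) only groups \w\s? for the + quantifier.
nested = re.search(r"\d{4}–((?:\w\s?)+)", text)
print(nested.group(1))       # January and February notes and agendas
print(len(nested.groups()))  # 1: only the outer group captures

# For comparison: making the inner group capturing adds a second group,
# and a repeated group keeps only its last repetition.
both = re.search(r"\d{4}–((\w\s?)+)", text)
print(both.group(1))  # January and February notes and agendas
print(both.group(2))  # s
```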
For more on .NET regular expression syntax, see the .NET Framework Developer’s Guide.


