Regular expression grammar

The Adapter Configuration Editor lets you use regular expressions to describe how log files are transformed into Common Base Event records. The following tables summarize the regular expression syntax you can use.

General rules

Regular expression matching

Expression Matches
{n,m} at least n but not more than m times
{n,} at least n times
{n} exactly n times
* 0 or more times
+ 1 or more times
? 0 or 1 times
. any single character except a newline (\n), unless the s modifier is in effect
^ a null (zero-width) token matching the beginning of a string or line (that is, the start of the string or, with the m modifier, the position right after a newline)
$ a null (zero-width) token matching the end of a string or line (that is, the end of the string, the position right before a newline at the end of the string, or, with the m modifier, the position right before any newline)
\b backspace, when used inside a character class (for example, [\b])
\b outside a character class, a null token matching a word boundary (\w on one side and \W on the other)
\B null token matching a boundary that isn't a word boundary
\A matches only at the beginning of the string
\Z matches only at the end of the string (or before a newline at the end)
\n newline
\r carriage return
\t tab
\f form feed
\d digit [0-9]
\D non-digit [^0-9]
\w word character [0-9a-z_A-Z]
\W non-word character [^0-9a-z_A-Z]
\s a whitespace character [ \t\n\r\f]
\S a non-whitespace character [^ \t\n\r\f]
\xnn the character whose hexadecimal code is nn
\cD the corresponding control character (for example, \cD is Control-D)
\nn or \nnn the character whose octal code is nn or nnn, unless the sequence is interpreted as a backreference
\1, \2, \3 ... whatever the first, second, third, and so on, parenthesized group matched. This is called a backreference. If there is no corresponding group, the number is interpreted as the octal representation of a character.
\0 the null character. Any other backslashed character matches itself.
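*? 0 or more times, as few times as possible (non-greedy)
+? 1 or more times, as few times as possible (non-greedy)
?? 0 or 1 times, preferring 0 (non-greedy)
{n}? exactly n times (equivalent to {n})
{n,}? at least n times, as few times as possible (non-greedy)
{n,m}? at least n but not more than m times, as few times as possible (non-greedy)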
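As a quick way to check how these rules behave, the following Perl sketch exercises several rows of the table. It assumes a standard Perl 5 interpreter; the sample strings are made up for illustration, and the regular expression engine inside the adapter may differ from Perl in edge cases.

```perl
use strict;
use warnings;

# Greedy vs. non-greedy: .* takes as much as it can, .*? as little as it can.
my $tags = '<b>bold</b>';
my ($greedy)     = $tags =~ /<(.*)>/;     # captures 'b>bold</b'
my ($non_greedy) = $tags =~ /<(.*?)>/;    # captures 'b'

# Counted quantifiers: {n} requires exactly n repetitions.
my $is_date = '2024-05-17' =~ /^\d{4}-\d{2}-\d{2}$/ ? 1 : 0;   # 1

# \b is a zero-width word boundary, so "cat" matches only as a whole word.
my $whole    = 'the cat sat' =~ /\bcat\b/ ? 1 : 0;   # 1
my $embedded = 'concatenate' =~ /\bcat\b/ ? 1 : 0;   # 0

# \Z matches at the end of the string or just before a final newline.
my $before_nl = "last line\n" =~ /line\Z/ ? 1 : 0;   # 1

# Character escapes: \x41 is "A" in hexadecimal, \cI is Control-I (a tab).
my $hex  = 'A'  =~ /^\x41$/ ? 1 : 0;    # 1
my $ctrl = "\t" =~ /^\cI$/  ? 1 : 0;    # 1

# Backreference: \1 must repeat exactly what group 1 matched (a doubled word).
my ($doubled) = 'this is is a typo' =~ /\b(\w+)\s+\1\b/;   # 'is'

print join(' ', $greedy, $non_greedy, $is_date, $whole, $embedded,
           $before_nl, $hex, $ctrl, $doubled), "\n";
```

The non-greedy form is usually the safer choice when a log line may contain the closing delimiter more than once.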

Grouping and extracting matches

To group parts of an expression, use the metacharacters ( ). This allows the regular expression in the parentheses to be treated as a single unit. For example, the regular expression

severity:(1|2)
matches the string severity:1 or severity:2.
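A minimal Perl sketch of the grouping example, assuming a Perl 5 interpreter and made-up input lines, shows why the parentheses matter: without them, | splits the whole expression into the two alternatives severity:1 and 2.

```perl
use strict;
use warnings;

my @lines = ('severity:1', 'severity:2', 'severity:3', '2');

# Grouped: the alternation applies only to the digit after "severity:".
my @matched = grep { /severity:(1|2)/ } @lines;     # severity:1, severity:2

# Ungrouped: matches any line containing "severity:1" or any line containing "2".
my @ungrouped = grep { /severity:1|2/ } @lines;     # severity:1, severity:2, 2

# A quantifier after a group applies to the whole group.
my $repeated = 'ababab' =~ /^(ab)+$/ ? 1 : 0;       # 1

print "grouped: @matched\n";
print "ungrouped: @ungrouped\n";
```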

To extract parts of a string that have been matched using the grouping metacharacters, use the special variables $1, $2, etc.

# Extract the URL and the link text from an HTML anchor tag
my $pattern = '<a href="secure_logon.html">Logon form</a>';
if ($pattern =~ m{<a href="(.*)">(.*)</a>}) {   # match using grouping
    my $url      = $1;    # $1 equals secure_logon.html
    my $pagename = $2;    # $2 equals Logon form
}
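Log records usually carry several fields, so it is often convenient to capture them all at once. In Perl, a successful match in list context returns the captured groups in order. The sketch below assumes a hypothetical log line format (date, time, severity:N, message); it is not a format defined by the Adapter Configuration Editor.

```perl
use strict;
use warnings;

# Hypothetical log record, for illustration only.
my $record = '2024-05-17 10:42:01 severity:2 Disk quota exceeded';

# List-context match: the groups are returned as ($1, $2, $3, $4).
my ($date, $time, $severity, $message) =
    $record =~ /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) severity:(\d) (.*)$/;

print "date=$date time=$time severity=$severity message=$message\n";

# A failed match returns an empty list, so $missing stays undefined.
# $1, $2, ... keep their values from the last successful match, so check
# whether a match succeeded before relying on them.
my ($missing) = 'no severity here' =~ /severity:(\d)/;
print defined $missing ? "found\n" : "no match\n";
```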

Perl 5 extended regular expressions

Expression Matches
(?#text) An embedded comment causing text to be ignored.
(?:regexp) Groups like ( ) but does not save the matched text, so the group is not counted for backreferences or for $1, $2, and so on.
(?=regexp) A zero-width positive lookahead assertion. For example, \w+(?=\s) matches a word followed by whitespace, without including the whitespace in the MatchResult.
(?!regexp) A zero-width negative lookahead assertion. For example, foo(?!bar) matches any occurrence of foo that is not followed by bar. Because the assertion is zero-width, a(?!b)d matches ad: a is followed by a character that is not b (the d), and d follows the zero-width assertion.
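The following Perl sketch, assuming a Perl 5 interpreter and illustrative input strings, checks the examples given for these extended constructs, including the claim that a(?!b)d matches ad.

```perl
use strict;
use warnings;

# (?#text): the comment is ignored, so this is just ^\d{4}$.
my $year = '2024' =~ /^\d{4}(?#four-digit year)$/ ? 1 : 0;        # 1

# (?:regexp): groups the alternation without saving it, so the digits are $1.
my ($code) = 'warning-42' =~ /(?:error|warning)-(\d+)/;           # '42'

# (?=regexp): words followed by whitespace; the whitespace is not consumed,
# and "three" is skipped because nothing follows it.
my @followed = 'one two three' =~ /\w+(?=\s)/g;                   # one, two

# (?!regexp): "foo" not followed by "bar".
my @no_bar = grep { /foo(?!bar)/ } ('foobar', 'foobaz', 'food');  # foobaz, food

# Zero-width: the assertion consumes nothing, so d must follow a directly.
my $ad  = 'ad'  =~ /a(?!b)d/ ? 1 : 0;                             # 1
my $abd = 'abd' =~ /a(?!b)d/ ? 1 : 0;                             # 0

print "$year $code @followed | @no_bar | $ad $abd\n";
```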
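(?imsx) One or more embedded pattern-match modifiers:
i enables case-insensitive matching
m enables multiline treatment of the input: ^ and $ also match right after and right before each newline
s enables single-line treatment of the input: . also matches a newline
x enables extended syntax: unescaped whitespace in the pattern is ignored, and # starts a comment that runs to the end of the line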
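A minimal Perl sketch of each modifier, assuming a Perl 5 interpreter; the modifiers are written inline with the (?imsx) syntax from the table, and the input strings are illustrative.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# i: case-insensitive matching.
my $ci = 'SEVERITY:1' =~ /(?i)severity:1/ ? 1 : 0;       # 1

# m: without it, ^ matches only at the start of the string.
my $no_m   = $text =~ /^second/     ? 1 : 0;             # 0
my $with_m = $text =~ /(?m)^second/ ? 1 : 0;             # 1

# s: without it, . does not match the newline between the lines.
my $no_s   = $text =~ /line.second/     ? 1 : 0;         # 0
my $with_s = $text =~ /(?s)line.second/ ? 1 : 0;         # 1

# x: the spaces, line breaks, and # comment inside the pattern are ignored.
my $time_ok = '10:42:01' =~ /(?x)
    ^ \d{2} : \d{2} : \d{2} \Z   # hh:mm:ss
/ ? 1 : 0;                                               # 1

print "$ci $no_m $with_m $no_s $with_s $time_ok\n";
```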
