Updated: 2016-02-05-Fri 04:13:21 UTC

Perl REGEX Voodoo:

Perl Regular Expressions Symbols


(?=)   - Positive look-ahead assertion: foo(?=bar) matches foo when followed by bar
(?!)   - Negative look-ahead assertion: foo(?!bar) matches foo when not followed by bar
(?<=)  - Positive look-behind assertion: (?<=foo)bar matches bar when preceded by foo
(?<!)  - Negative look-behind assertion: (?<!foo)bar matches bar when not preceded by foo
(?>)   - Once-only (atomic) subpattern: (?>\d+)bar fails faster when bar is not present, because \d+ never backtracks
(?(x)) - Conditional subpattern: (?(3)foo|fu)bar matches foo if the 3rd subpattern has matched, fu if not
(?#)   - Comment: (?# Pattern does x y or z)
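
The assertions above are easier to trust once you see them pass or fail. This is a minimal sketch, not a definitive reference: the sample strings and the conditional pattern (which uses group 1 rather than the group 3 shown above) are made up for illustration.

```perl
use strict;
use warnings;

# Positive look-ahead: "foo" matches only when "bar" follows; "bar" is not consumed.
my $pos_ahead = "foobar" =~ /foo(?=bar)/ ? 1 : 0;

# Negative look-ahead: "foo" matches only when "bar" does NOT follow.
my $neg_ahead = "foobaz" =~ /foo(?!bar)/ ? 1 : 0;

# Positive look-behind: "bar" matches only when "foo" precedes it.
my $pos_behind = "foobar" =~ /(?<=foo)bar/ ? 1 : 0;

# Once-only (atomic) group: \d+ keeps every digit it grabbed and never gives one back.
my $atomic_ok   = "123bar" =~ /(?>\d+)bar/ ? 1 : 0;
my $atomic_fail = "1234"   =~ /(?>\d+)4/   ? 1 : 0;   # \d+ already took the 4
my $plain_ok    = "1234"   =~ /\d+4/       ? 1 : 0;   # ordinary \d+ backtracks

# Conditional: require ">" only if the optional "<" (group 1) matched.
my $cond_re   = qr/^(<)?\w+(?(1)>)$/;
my $cond_both = "<tag>" =~ $cond_re ? 1 : 0;
my $cond_none = "tag"   =~ $cond_re ? 1 : 0;
my $cond_half = "<tag"  =~ $cond_re ? 1 : 0;

# Inline comment: ignored by the regex engine.
my $commented = "abc42" =~ /\d+(?# one or more digits)/ ? 1 : 0;

print "look-ahead: $pos_ahead, negative look-ahead: $neg_ahead, look-behind: $pos_behind\n";
print "atomic: $atomic_ok / $atomic_fail (plain: $plain_ok)\n";
print "conditional: $cond_both / $cond_none / $cond_half, comment: $commented\n";
```

The atomic case is the one worth studying: the plain pattern succeeds by giving back a digit, while the once-only version refuses to and fails.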

Perl has two operators that bind a pattern to a string.
One tests that the pattern is found in the string;
the other tests that the pattern is not found in the string.
Operator:  Meaning:                                       Example:
=~         Bind the pattern to the string.                my $x = "Hello";
           Returns TRUE if the pattern matches.           if ($x =~ /ell/) {
                                                            # TRUE (pattern found)
                                                          }
!~         Bind the pattern to the string.                my $x = "Hello";
           Returns TRUE if the pattern does NOT match.    if ($x !~ /empathy/) {
                                                            # TRUE (pattern not found)
                                                          }

Special case - matching characters that are reserved for REGEX:
12 special symbols to escape.
REGEX has a number of special characters that you have to learn
(e.g., '.' for any character but newline).
Sometimes, you are actually looking for that literal character
in the string you have bound your pattern to.
In these cases, you simply escape the character with a backslash
(i.e., "no, no, I'm really looking for a period in my string.").
REGEX Symbol:  Meaning:
\\             Looking for backslash in my REGEX-bound string.
\|             Looking for vertical-bar in my REGEX-bound string.
\(             Looking for l-paren in my REGEX-bound string.
\)             Looking for r-paren in my REGEX-bound string.
\[             Looking for l-square-brace in my REGEX-bound string.
\{             Looking for l-curly-brace in my REGEX-bound string.
\^             Looking for caret in my REGEX-bound string.
\$             Looking for dollar-sign in my REGEX-bound string.
\*             Looking for star in my REGEX-bound string.
\+             Looking for plus in my REGEX-bound string.
\?             Looking for question-mark in my REGEX-bound string.
\.             Looking for period in my REGEX-bound string.
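
To see why the backslash matters, here is a small sketch comparing an unescaped and an escaped period. The file name and price strings are invented sample data. The built-in quotemeta function and the \Q...\E sequence are shown as one way to escape a whole string at once instead of escaping each symbol by hand.

```perl
use strict;
use warnings;

my $file = "archive.tar.gz";   # sample string, chosen for illustration

# Unescaped '.' matches any character, so this wrongly accepts "archiveXtarYgz".
my $loose = "archiveXtarYgz" =~ /^archive.tar.gz$/ ? 1 : 0;

# Escaped '\.' matches only a literal period.
my $tight = "archiveXtarYgz" =~ /^archive\.tar\.gz$/ ? 1 : 0;
my $exact = $file            =~ /^archive\.tar\.gz$/ ? 1 : 0;

# \Q...\E escapes every metacharacter in the interpolated variable.
my $price   = 'cost: $5.00 (approx)';
my $literal = '$5.00 (approx)';
my $found   = $price =~ /\Q$literal\E/ ? 1 : 0;

# quotemeta does the same job outside of a pattern.
my $escaped = quotemeta('a.b');

print "loose: $loose, tight: $tight, exact: $exact\n";
print "found: $found, escaped: $escaped\n";
```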


Documentation from perldoc.perl.org
Metacharacter:  Explanation/Examples:
\               Escapes (or gives special meaning to) the character immediately following it
                (e.g., '\.' matches a period, '\d' matches a digit, '\\' matches a backslash, etc.).
^               Anchors pattern to match at the beginning of the string (or line, if /m is used).
$               Anchors pattern to match at the end of the string (or line, if /m is used).
.               Matches one character of any type except the newline character
                (unless the modifier 's' is used to override this).
(     )         Groups subexpressions for capturing to $1, $2, etc.
                If the entire pattern matches, then each parenthetical grouping (from left to right)
                assigns the substring that matched its subexpression to $1, $2, etc.
                Works well for associating characters together, often with alternation '|'
                (e.g., /FILENAME\.tar\.(bz2|gz)/).
(?:  )          Groups subexpressions without capturing them.
  |             Alternation: matches either the subexpression to the left or to the right of '|'
                (e.g., match either a gz or bz2 compressed tar file with /FILENAME\.tar\.(bz2|gz)/).
[     ]         Matches any one of the characters contained within the brackets.
[^   ]          Matches any one of the characters NOT contained within the brackets.
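
A short sketch of grouping, alternation, and character classes in action. It reuses the FILENAME.tar pattern from the table; the anchors and the "Hello" string are added here for illustration.

```perl
use strict;
use warnings;

my $name = "FILENAME.tar.bz2";   # sample file name

# Capturing group with alternation: $1 receives whichever alternative matched.
my $compression;
if ($name =~ /^FILENAME\.tar\.(bz2|gz)$/) {
    $compression = $1;
}

# Non-capturing group: same grouping, but nothing is stored in $1.
my $grouped_only = $name =~ /^FILENAME\.tar\.(?:bz2|gz)$/ ? 1 : 0;

# Character class: first vowel. Negated class: first non-vowel.
my ($first_vowel)     = "Hello" =~ /([aeiou])/;
my ($first_non_vowel) = "Hello" =~ /([^aeiou])/;

print "compression: $compression, grouped only: $grouped_only\n";
print "first vowel: $first_vowel, first non-vowel: $first_non_vowel\n";
```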

Perl REGEX Quantifiers:
Remember: Quantifiers are greedy unless otherwise specified (compare below).
Quantifier:  Range:  Greedy?:     Explanation/Examples:
*            [0,∞)   Greedy       Match the previous character, range, sequence or group 0 or more times.
*?           [0,∞)   Non-greedy   Match the previous character, range, sequence or group 0 or more times.
+            [1,∞)   Greedy       Match the previous character, range, sequence or group 1 or more times.
+?           [1,∞)   Non-greedy   Match the previous character, range, sequence or group 1 or more times.
?            [0,1]   Greedy       Match the previous character, range, sequence or group 0 or 1 time(s).
??           [0,1]   Non-greedy   Match the previous character, range, sequence or group 0 or 1 time(s).
{m}          [m,m]   Greedy       Match the previous character, range, sequence or group m times.
{m}?         [m,m]   Non-greedy   Match the previous character, range, sequence or group m times
                                  (the count is fixed, so this behaves the same as {m}).
{m,}         [m,∞)   Greedy       Match the previous character, range, sequence or group m or more times.
{m,}?        [m,∞)   Non-greedy   Match the previous character, range, sequence or group m or more times.
{m,n}        [m,n]   Greedy       Match the previous character, range, sequence or group between m & n times (both inclusive).
{m,n}?       [m,n]   Non-greedy   Match the previous character, range, sequence or group between m & n times (both inclusive).
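
The greedy versus non-greedy difference is easiest to see side by side. A minimal sketch, using an invented snippet of markup and a run of letters as sample data:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # sample text

# Greedy: .+ grabs as much as it can, so the match runs to the LAST ">".
my ($greedy) = $html =~ /(<.+>)/;

# Non-greedy: .+? grabs as little as it can, so the match stops at the FIRST ">".
my ($lazy) = $html =~ /(<.+?>)/;

# The same idea with a counted range.
my ($greedy_range) = "aaaaaa" =~ /(a{2,4})/;
my ($lazy_range)   = "aaaaaa" =~ /(a{2,4}?)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "range:  $greedy_range vs $lazy_range\n";
```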

Specific Escape Sequences
Escape Sequence:  Character:  Other symbols:
\t                tab         (HT, TAB)
\n                newline     (LF, NL)
\r                return      (CR)
This list is not exhaustive.

Special Variables
Perl has a number of built-in variables (like $_)
that are of particular use in REGEX.
Variable:     Meaning:
$`            (backtick): substring corresponding to the pre-match region for the pattern.
$&            (ampersand): substring corresponding to the match region for the pattern.
$'            (single-quote): substring corresponding to the post-match region for the pattern.
\1, \2, etc.  Back-reference group
$1, $2, etc.  Match group variable
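
Here is a small sketch of what each of these variables holds after a successful match. The strings are sample data chosen for illustration.

```perl
use strict;
use warnings;

my $text = "Hello World";   # sample string
my ($pre, $match, $post, $first, $second);

if ($text =~ /(l+)(o)/) {
    # Pre-match, match, and post-match regions of the string.
    ($pre, $match, $post) = ($`, $&, $');
    # Contents of the first and second capture groups.
    ($first, $second) = ($1, $2);
}

# \1 refers back to group 1 inside the same pattern: find a doubled letter.
my ($doubled) = "bookkeeper" =~ /(\w)\1/;

print "pre: '$pre', match: '$match', post: '$post'\n";
print "group 1: '$first', group 2: '$second', doubled letter: '$doubled'\n";
```

Note the division of labor: \1 is used inside the pattern, while $1 is read after the match has finished.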

Perl's REGEX modifiers:
The perldoc documentation (perlre) describes these in full.
Modifier:          Meaning:
m//i  or  s///i    Case-insensitive pattern matching (ignore case of characters).
m//g  or  s///g    Global match: find (or replace) every occurrence, not just the first.
m//m  or  s///m    Treat string as multiple lines.
                   Note: changes how /^.../ and /...$/ behave.
m//s  or  s///s    Treat string as single line.
                   Note: allows '.' to match even newline.
m//x  or  s///x    Extend legibility to permit whitespace and comments.
                   Ex:
                   my $str = "2019-01-32 14:37:43 myHOST SSHD[42]: Failed login...\n";
                   if ($str =~ /
                      \d{4}  #YEAR
                      -
                      \d{2}  #MONTH
                      -
                      \d{2}  #DAY
                    /x
                   ) {
                     # TRUE (the date portion matched)
                   }
s///e (ONLY sub!)  Applies only to s///. Evaluates the replacement as a Perl expression.
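
The modifiers can be combined, and each one is easy to check in isolation. A minimal sketch, with invented sample strings:

```perl
use strict;
use warnings;

my $log = "Failed login\nfailed LOGIN\nok\n";   # sample multi-line text

# /i: ignore case.
my $ci = "HELLO" =~ /hello/i ? 1 : 0;

# /g: in list context, return every match instead of just the first.
my @digits = "a1b22c333" =~ /\d+/g;

# /m: ^ matches at the start of each line (combined with /i and /g here).
my @fails = $log =~ /^failed/mig;

# /s: '.' also matches a newline.
my $no_s   = "a\nb" =~ /a.b/  ? 1 : 0;
my $with_s = "a\nb" =~ /a.b/s ? 1 : 0;

# /e: the replacement is evaluated as Perl code (s/// only).
(my $doubled = "3 apples, 5 pears") =~ s/(\d+)/$1 * 2/ge;

print "ci: $ci, digits: @digits, fails: @fails\n";
print "no /s: $no_s, with /s: $with_s\n";
print "doubled: $doubled\n";
```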