|
31 Regular expressions
/xxx/ is a string between slashes.
The string is called a Regular Expression.
($a =~ /abc/)       # the expression is true if "abc" is found in $a
$string =~ /the/    # is True if "the" is in the variable $string
$string !~ /the/    # is True if "the" is NOT in the variable $string
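A minimal sketch of the two binding operators above. The sample string and the variable names $has_the and $lacks_cat are made up for illustration.

```perl
use strict;
use warnings;

my $string = "over the hill";    # hypothetical sample text

# =~ is true when the pattern is found, !~ is true when it is not found
my $has_the   = ($string =~ /the/) ? 1 : 0;
my $lacks_cat = ($string !~ /cat/) ? 1 : 0;

print "contains 'the'\n"         if $has_the;
print "does not contain 'cat'\n" if $lacks_cat;
```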
Special characters between the slashes affect how the matching is tested
match            m//     # m/abc/ matches if abc is found in $_
substitute       s///    # s/abc/123/ substitutes 123 for abc
list matching    grep    # grep(/abc/, @list) returns the elements of @list that contain abc
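The three operators above can be sketched together as follows. The list contents and the names @lines, @hits and @changed are hypothetical.

```perl
use strict;
use warnings;

my @lines = ("abc one", "xyz two", "an abc three");    # hypothetical data

# grep in list context returns the elements for which the pattern matches
my @hits = grep { /abc/ } @lines;

# m// and s/// operate on $_ when no variable is bound with =~
my @changed = @lines;          # work on a copy so @lines stays untouched
for (@changed) {               # $_ is an alias for each element in turn
    s/abc/123/ if m/abc/;      # substitute 123 for abc
}

print "hit: $_\n"     for @hits;
print "changed: $_\n" for @changed;
```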
metacharacters   \n      newline
                 \t      TAB
                 .       any single character
                 *       zero or more of the preceding character
                 ?       zero or one of the preceding character
character class  [abcde]        # match any of a,b,c,d,e
                 [a-e]          # match any of a,b,c,d,e
                 [0-9]          # match a digit
                 [*!@#$%&()]    # match any of these punctuation marks
                 [^abcde]       # caret as first character inside the brackets negates the match
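A short sketch of character classes, including a negated one. The word list and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my @words = ("bed", "fig", "x9", "zzz");    # hypothetical sample words

my @has_a_to_e  = grep { /[a-e]/ }   @words;   # contain at least one of a..e
my @has_digit   = grep { /[0-9]/ }   @words;   # contain at least one digit
my @starts_else = grep { /^[^a-e]/ } @words;   # first character is NOT one of a..e

print "a-e:   @has_a_to_e\n";
print "digit: @has_digit\n";
print "other: @starts_else\n";
```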
anchors          /^xxx/         # matches if line starts with xxx
                 /xxx$/         # matches if line ends with xxx
Assertions: used to anchor parts of the pattern to word or string boundaries
                                                    Example    Matches      Doesn't Match
^      start of string                              ^fool      foolish      tomfoolery
$      end of string                                fool$      April fool   foolish
\b     word boundary                                be\b       be side      beside
\B     nonword boundary                             be\Bside   beside       be side
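One way to check assertions like these is a small table-driven test. This is only a sketch: the patterns and strings are illustrative choices, and @tests and $all_ok are hypothetical names.

```perl
use strict;
use warnings;

# each row: pattern, a string it should match, a string it should not match
my @tests = (
    [ '^fool',    'foolish',    'tomfoolery' ],
    [ 'fool$',    'April fool', 'foolish'    ],
    [ 'be\b',     'be side',    'beside'     ],
    [ 'be\Bside', 'beside',     'be side'    ],
);

my $all_ok = 1;
for my $t (@tests) {
    my ($pattern, $should, $should_not) = @$t;
    my $ok = ($should =~ /$pattern/ && $should_not !~ /$pattern/) ? 1 : 0;
    $all_ok = 0 unless $ok;
    printf "%-10s %s\n", $pattern, $ok ? "ok" : "unexpected";
}
```

Single quotes keep the backslashes literal, so the pattern reaches the regular expression engine unchanged.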
Atoms: building blocks of a regular expression
.      any character                                b.b        bob          bb
[]     list of characters in brackets               ^[Bb]      Bob, bob     Rbob
()     regular expression                           ^a(b.b)c$  abobc        abbc
Quantifiers: modifier for an atom
*      zero or more instances of the atom           ab*c       ac, abc      abb
+      one or more instances of the atom            ab+c       abc          ac
?      zero or one instance of the atom             ab?c       ac, abc      abbc
{n}    n instances of the atom                      ab{2}c     abbc         abbbc
{n,}   at least n instances of the atom             ab{2,}c    abbc, abbbc  abc
{n,m}  at least n, at most m instances of the atom  ab{2,3}c   abbc         abbbbc
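The effect of each quantifier can be seen by running the patterns against strings with zero to four b's. In this sketch the patterns are anchored with ^ and $ so that the whole string must match; the candidate list and the %matched hash are assumptions for the example.

```perl
use strict;
use warnings;

my @candidates = qw(ac abc abbc abbbc abbbbc);   # zero to four b's

my %matched;
for my $pattern ('ab*c', 'ab+c', 'ab?c', 'ab{2}c', 'ab{2,}c', 'ab{2,3}c') {
    my $re = qr/^(?:$pattern)$/;                 # whole string must match
    $matched{$pattern} = join ' ', grep { $_ =~ $re } @candidates;
    printf "%-9s %s\n", $pattern, $matched{$pattern};
}
```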
Special Characters:
\d     any digit                                    b\dd       b4d          bad
\D     nondigit                                     b\Dd       bdd          b4d
\n     newline
\t     TAB
\s     white space character
\S     non white space char
\w     alphanumeric char                            a\wb       a2b          a^b
\W     nonalphanumeric char                         a\Wb       aa^b         aabb
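A brief sketch of \d, \s and \w applied to a made-up record. The string and the variable names are hypothetical. Note that in Perl \w also accepts the underscore.

```perl
use strict;
use warnings;

my $line = "id42\tname_x  7";            # hypothetical record: TAB, then two spaces

my @digits = $line =~ /(\d+)/g;          # every run of digits
my @fields = split /\s+/, $line;         # split on runs of white space
my $word_only = ($fields[1] =~ /^\w+$/) ? 1 : 0;   # only \w characters in field 2

print "digits: @digits\n";
print "fields: ", join(", ", @fields), "\n";
```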
Match Options:
g      perform global matching - even after first match has been found
i      perform case-insensitive matching
o      evaluate regular expression one time only
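The difference the g and i options make can be sketched with a substitution. The sample text and variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "Dog chases dog; DOG sleeps";     # hypothetical sample text

my $first_only = $text;
$first_only =~ s/dog/cat/;                   # no options: first lower-case "dog" only

my $everywhere = $text;
$everywhere =~ s/dog/cat/gi;                 # g: every match, i: ignore case

my $count = () = $text =~ /dog/gi;           # number of case-insensitive matches

print "$first_only\n$everywhere\n$count\n";
```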
Backreferences:
When performing matches, you can direct Perl to track the parts of the string
in which the match succeeded by enclosing parts of the pattern in parentheses.
The results will be stored in the variables $1, $2, ...
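A minimal sketch of capturing with parentheses. The log entry format and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $entry = "2024-03-15 backup finished";    # hypothetical log entry

my ($year, $month, $day, $message);
if ($entry =~ /^(\d{4})-(\d{2})-(\d{2})\s+(.*)$/) {
    # each pair of parentheses fills the next numbered variable
    ($year, $month, $day, $message) = ($1, $2, $3, $4);
    print "year=$year month=$month day=$day message=$message\n";
}
```

The numbered variables are only meaningful after a successful match, so they are copied inside the if block.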
Examples:
$string =~ /^x/            tests for x at the start of the string
$string =~ /x$/            tests for x at the end of the string
$string =~ /./             tests for any single character
$string =~ /t.e/           tests for t and e separated by any one character
$string =~ /^$/            tests for a string with nothing in it
$string =~ /[a-z]/         tests for any one lower case letter
$string =~ /[a-zA-Z]/      tests for any one letter
$string =~ s/dog/cat/      replaces dog with cat the first time it appears in the string
$string =~ s/dog/cat/gi    replaces dog with cat anywhere in the string, case insensitive
|