31 Regular expressions

/xxx/ is a string between slashes. The string is called a regular expression.

($a =~ /abc/)          # the expression is true if "abc" is found in $a
$string =~ /the/       # is true if "the" is in the variable $string
$string !~ /the/       # is true if "the" is NOT in the variable $string
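
A minimal sketch of the two binding operators in use. The sample string and the variable names $has_the and $no_dog are made up for illustration.

```perl
use strict;
use warnings;

my $string = "over the hills";

# =~ binds the pattern to $string; true because "the" occurs in it
my $has_the = ($string =~ /the/) ? 1 : 0;

# !~ is the negated form; true because "dog" does not occur in it
my $no_dog = ($string !~ /dog/) ? 1 : 0;

print "contains 'the': $has_the\n";
print "lacks 'dog':    $no_dog\n";
```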

Special characters between the slashes affect how the matching is tested

match             m//       # m/abc/ matches if abc is found in $_
substitute        s///      # s/abc/123/ substitutes 123 for abc
list matching     grep      # grep(/abc/, @list) returns the elements of @list that match
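
The three operations above might be combined as in the following sketch. The word list and the names @with_abc and $changed are illustrative assumptions, not part of the original snippet.

```perl
use strict;
use warnings;

my @words = ("abc", "xyz", "cabbage", "dabcd");

# grep sets $_ to each element in turn and keeps those where /abc/ matches
my @with_abc = grep { /abc/ } @words;

# copy the string, then replace the first "abc" in the copy with "123"
(my $changed = "xxabcxx") =~ s/abc/123/;

print "@with_abc\n";   # abc dabcd
print "$changed\n";    # xx123xx
```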

metacharacters    \n  newline
                  \t  TAB
                  .   any single character (except newline)

character class   [abcde]        # match any of a,b,c,d,e
                  [a-e]          # match any of a,b,c,d,e
                  [0-9]          # match a digit
                  [*!\@#\$%&()]  # match any of these punctuation marks (@ and $ escaped so Perl does not interpolate them)
                  [^abcde]       # caret as first character negates the class: match anything except a,b,c,d,e
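
A short sketch of character classes, including a negated class. The test strings and variable names are chosen here for illustration only.

```perl
use strict;
use warnings;

# a range class: true because "cab" contains letters from a to e
my $has_a_to_e = ("cab" =~ /[a-e]/) ? 1 : 0;

# a digit class: true because "room 7" contains the digit 7
my $has_digit = ("room 7" =~ /[0-9]/) ? 1 : 0;

# a negated class matches one character NOT in the list
my $xyz_has_other = ("xyz" =~ /[^a-e]/) ? 1 : 0;   # x is outside a-e
my $cab_has_other = ("cab" =~ /[^a-e]/) ? 1 : 0;   # c, a, b are all inside a-e

print "$has_a_to_e $has_digit $xyz_has_other $cab_has_other\n";   # 1 1 1 0
```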

anchors           /^xxx/       # matches if line starts with xxx
                  /xxx$/       # matches if line ends with xxx

Assertions:  used to anchor parts of the pattern to word or string boundaries
                                                      Example        Matches      Doesn't Match
  ^   start of string                                 ^fool          foolish      tomfoolery
  $   end of string                                   fool$          April fool   foolish
  \b  word boundary                                   \bside         be side      beside
  \B  nonword boundary                                be\Bside       beside       be side
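
The assertions can be checked with a small table-driven script such as the sketch below. The row layout and the $failures counter are assumptions made for this example. The word-boundary rows use the pattern \bside, which matches only where "side" begins a word.

```perl
use strict;
use warnings;

# Each row: pattern, subject, expected result (1 = match, 0 = no match)
my @rows = (
    [ '^fool',    'foolish',    1 ],
    [ '^fool',    'tomfoolery', 0 ],
    [ 'fool$',    'April fool', 1 ],
    [ 'fool$',    'foolish',    0 ],
    [ '\bside',   'be side',    1 ],
    [ '\bside',   'beside',     0 ],
    [ 'be\Bside', 'beside',     1 ],
    [ 'be\Bside', 'be side',    0 ],
);

my $failures = 0;
for my $row (@rows) {
    my ($pattern, $subject, $expected) = @$row;
    my $got = ($subject =~ /$pattern/) ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-10s %-12s %s\n", $pattern, $subject, $got ? "matches" : "no match";
}
print "unexpected results: $failures\n";
```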

Atoms:  building blocks of a regular expression
  .   any character (except newline)                  b.b            bob          bb
  []  list of characters in brackets                  ^[Bb]          Bob, bob     Rbob
  ()  grouped regular expression                      ^a(b.b)c$      abobc        abbc

Quantifiers:  modifier for an atom
  *      zero or more instances of the atom           ab*c           ac, abc      abb
  +      one or more instances of the atom            ab+c           abc          ac
  ?      zero or one instances of the atom            ab?c           ac, abc      abbc
  {n}    n instances of the atom                      ab{2}c         abbc         abbbc
  {n,}   at least n instances of the atom             ab{2,}c        abbc, abbbc  abc
  {n,m}  at least n, at most m instances of the atom  ab{2,3}c       abbc         abbbbc
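
To see how the quantifiers differ, one option is to run each pattern over the same candidate strings, as in this sketch. The candidate list and the %matched hash are illustrative assumptions.

```perl
use strict;
use warnings;

# the same strings are tried against every pattern
my @candidates = qw(ac abc abbc abbbc abbbbc);

my %matched;
for my $pattern ('ab*c', 'ab+c', 'ab?c', 'ab{2}c', 'ab{2,}c', 'ab{2,3}c') {
    # keep the candidates in which the pattern is found
    $matched{$pattern} = [ grep { /$pattern/ } @candidates ];
    print "$pattern: @{ $matched{$pattern} }\n";
}
```

For example, ab?c keeps only ac and abc, while ab{2,3}c keeps only abbc and abbbc.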

Special Characters:
  \d   any digit                                      b\dd           b4d          bad
  \D   nondigit                                       b\Dd           bdd          b4d
  \n   newline
  \t   TAB
  \s   white space character
  \S   non white space char
  \w   alphanumeric char or underscore                a\wb           a2b          a^b
  \W   nonalphanumeric char                           a\Wb           aa^b         aabb
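
A brief sketch of \d, \s and \w applied to one line of text. The sample line and the names $number, @fields and $ident are invented for the example.

```perl
use strict;
use warnings;

my $line = "order 42\tshipped to bay_7";

# \d+ captures the first run of digits
my ($number) = $line =~ /(\d+)/;

# \s+ matches runs of white space (the space and the TAB alike)
my @fields = split /\s+/, $line;

# \w+ at the end of the string; \w also matches the underscore
my ($ident) = $line =~ /(\w+)$/;

print "number: $number\n";                # number: 42
print "fields: ", scalar(@fields), "\n";  # fields: 5
print "ident:  $ident\n";                 # ident:  bay_7
```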

Match Options:
  g    perform global matching: keep searching after the first match has been found
  i    perform case-insensitive matching
  o    evaluate regular expression one time only
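
The effect of the g and i options can be seen by counting matches, as in this sketch. The sample text and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "The cat and THE dog and the bird";

# in list context, /g returns every match rather than stopping at the first
my @lower_only = $text =~ /the/g;    # case-sensitive: only "the"
my @any_case   = $text =~ /the/gi;   # case-insensitive: "The", "THE", "the"

print scalar(@lower_only), "\n";   # 1
print scalar(@any_case), "\n";     # 3
```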

When performing matches, you can direct Perl to track the parts of the string
that matched by enclosing parts of the pattern in parentheses. The results will
be stored in the variables $1, $2, ...
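
A minimal sketch of captured matches, assuming a date string in year-month-day form. The string and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $date = "2024-03-15";
my ($year, $month, $day);

# each pair of parentheses stores what it matched in $1, $2, $3 in order
if ($date =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    ($year, $month, $day) = ($1, $2, $3);
    print "year=$year month=$month day=$day\n";
}
```

The captures are only meaningful after a successful match, which is why they are read inside the if block.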

$string =~ /^x/             # tests for x at the start of the string
$string =~ /x$/             # tests for x at the end of the string
$string =~ /./              # tests for any single character
$string =~ /t.e/            # tests for t and e separated by any one character
$string =~ /^$/             # tests for a string with nothing in it
$string =~ /[a-z]/          # tests for any one lower case letter
$string =~ /[a-zA-Z]/       # tests for any one letter
$string =~ s/dog/cat/       # replaces dog with cat the first time it appears in the string
$string =~ s/dog/cat/gi     # replaces dog with cat anywhere in the string, case insensitive
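
The difference between the two substitutions can be sketched as follows. The sample sentence and the names $first, $all and $count are illustrative assumptions.

```perl
use strict;
use warnings;

my $text = "Dog chases dog; DOG sleeps.";

# without options: only the first case-sensitive "dog" is replaced
(my $first = $text) =~ s/dog/cat/;

# with gi: every occurrence is replaced, regardless of case;
# s/// returns the number of substitutions it made
my $all   = $text;
my $count = ($all =~ s/dog/cat/gi);

print "$first\n";   # Dog chases cat; DOG sleeps.
print "$all\n";     # cat chases cat; cat sleeps.
print "$count\n";   # 3
```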