DavidSzczesniak
4/6/2018 - 2:21 PM

Character Classes & Shortcuts

A character class is a set of characters, written in square brackets, that matches exactly one character at one position in the string. The pattern matches at that position if the character there is any ONE of the characters in the brackets.

\!h # Some basic shortcuts:

# \s matches any whitespace character

$_ = "fred \t \t barney";
if (/fred\s+barney/) { # \s+ matches the whole run of spaces and tabs
  print "It matched!\n";
}

# \h matches horizontal whitespace only (spaces and tabs, for example)

# \v matches vertical whitespace (newlines, carriage returns, form feeds, and the like)
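A small sketch of the difference between \h and \v. The sample string is made up for illustration; the point is that \h stops at a newline while \v matches it.

```perl
use strict;
use warnings;

# Hypothetical sample text: a space, a tab, and a space between the first
# two names, then a newline before the third.
my $text = "fred \t barney\nwilma";

# \h+ matches the run of spaces and tabs on the first line.
my $horizontal = ($text =~ /fred\h+barney/) ? 1 : 0;

# \h does not match the newline, so this pattern fails.
my $crosses_line = ($text =~ /barney\h+wilma/) ? 1 : 0;

# \v does match the newline.
my $vertical = ($text =~ /barney\vwilma/) ? 1 : 0;

print "horizontal=$horizontal crosses_line=$crosses_line vertical=$vertical\n";
```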

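# \d abbreviates the character class for any digit
# (on Unicode strings it also matches non-ASCII digits; the /a flag limits it to [0-9])

/HAL-\d+/ # can be written instead of /HAL-[0-9]+/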
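A quick check that \d+ and [0-9]+ agree on ASCII digits, plus the Unicode caveat. The choice of U+0663 (ARABIC-INDIC DIGIT THREE) is just one example of a non-ASCII digit; the /a modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $model = "HAL-9000";

# \d+ and [0-9]+ behave the same on plain ASCII digits.
my $short = ($model =~ /HAL-\d+/)    ? 1 : 0;
my $long  = ($model =~ /HAL-[0-9]+/) ? 1 : 0;

# Without /a, \d also matches non-ASCII Unicode digits such as U+0663.
my $arabic_three = "\x{0663}";
my $unicode_d = ($arabic_three =~ /\d/)  ? 1 : 0;
# With /a, \d is restricted to [0-9], so this does not match.
my $ascii_d   = ($arabic_three =~ /\d/a) ? 1 : 0;

print "short=$short long=$long unicode_d=$unicode_d ascii_d=$ascii_d\n";
```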

# \w stands for 'word' character, but it actually matches identifier characters (letters, digits, and underscore), the kind used to name a variable or subroutine
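A sketch that keeps only strings made entirely of \w characters. The candidate names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical candidate names.
my @candidates = ("fred", "_count2", "barney-rubble", "3rd");

# \A and \z anchor the match to the whole string, so only strings made
# entirely of \w characters (letters, digits, underscore) are kept.
my @all_word = grep { /\A\w+\z/ } @candidates;

print "@all_word\n";    # fred _count2 3rd
```

Note that "3rd" passes even though a Perl variable name can't start with a digit: \w only says which characters are allowed, not where they may appear.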

# \R matches any linebreak, such as \n, \r, or the two-character \r\n (treated as a single linebreak)
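A sketch comparing \R with \v when splitting text that mixes line endings. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical text mixing Windows (\r\n), Unix (\n), and old Mac (\r) line endings.
my $mixed = "line one\r\nline two\nline three\rline four";

# \R treats \r\n as ONE linebreak, so this yields four lines.
my @lines = split /\R/, $mixed;

# \v matches \r and \n separately, so the \r\n pair leaves an empty field.
my @pieces = split /\v/, $mixed;

print scalar(@lines), " lines vs ", scalar(@pieces), " pieces\n";  # 4 lines vs 5 pieces
```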

\!h # Negating the shortcuts:
  
# \D - not a decimal digit
# \W - not a 'word' character
# \S - not whitespace
# \H - not horizontal whitespace
# \V - not vertical whitespace
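Two common uses of the negated shortcuts, sketched with made-up data: stripping everything that is not a digit, and pulling out runs of non-whitespace.

```perl
use strict;
use warnings;

# Hypothetical phone number.
my $phone = "(555) 867-5309";

# Copy the string, then delete every non-digit (\D) from the copy.
(my $digits = $phone) =~ s/\D//g;

# In list context, m//g returns every run of non-whitespace (\S+).
my @names = ("fred  barney\twilma" =~ /\S+/g);

print "$digits\n";     # 5558675309
print "@names\n";      # fred barney wilma
```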

\!h # NOTE: these shortcuts can be used on their own in place of a character class, or inside the square brackets of one.
\!h # For example, [\s\d] matches one character that is either whitespace or a digit.
\!h # Likewise, [\d\D] matches any digit or non-digit: any character at all, newline included (unlike ., which skips newlines unless you add /s).
\!h # Character classes themselves are written in square brackets, like [abcwxyz].
\!h # You can also specify a range instead, like [a-zA-Z] to match any one ASCII letter, or [0-9] to match any one digit.
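A sketch of the classes mentioned above, using made-up strings: [\d\D] versus . across a newline, [\s\d] picking out whitespace and digits, and a range-based class.

```perl
use strict;
use warnings;

# Hypothetical two-line string.
my $text = "first line\nsecond line";

# . does not match a newline (unless /s is used), so this fails.
my $dot = ($text =~ /line.second/) ? 1 : 0;

# [\d\D] matches any character, the newline included.
my $any = ($text =~ /line[\d\D]second/) ? 1 : 0;

# [\s\d] matches one character that is whitespace or a digit.
my @ws_or_digit = ("a 1b" =~ /[\s\d]/g);

# A range-based class: one ASCII letter followed by one digit.
my $room = ("Room B7" =~ /[a-zA-Z][0-9]/) ? 1 : 0;

print "dot=$dot any=$any room=$room matched=", scalar(@ws_or_digit), "\n";
```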

\!h # Example:

$_ = "The HAL-9000 requires authorization to continue.";
if (/HAL-[0-9]+/) {
  print "The string mentions some model of HAL computer.\n";
}

\!h # Another example: a character class can make a single position case insensitive (the /i flag does this for the whole pattern):

$_ = "Bamm-Bamm";
if (/Bamm-?[Bb]amm/) { # -? makes the hyphen optional; [Bb] matches B or b
  print "The string has Bamm-Bamm\n";
}
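A sketch contrasting the [Bb] class with the /i flag on a few hypothetical spellings. The class only relaxes one letter, while /i relaxes the whole pattern.

```perl
use strict;
use warnings;

# Hypothetical spellings to test.
my @names = ("Bamm-Bamm", "Bammbamm", "BAMM-BAMM");

my %result;
for my $name (@names) {
    # [Bb] relaxes only the second B; everything else is still case sensitive.
    my $class = ($name =~ /Bamm-?[Bb]amm/) ? 1 : 0;
    # /i makes every letter in the pattern case insensitive.
    my $flag  = ($name =~ /bamm-?bamm/i)   ? 1 : 0;
    $result{$name} = [$class, $flag];
    print "$name: class=$class /i=$flag\n";
}
```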

\!h # Negated classes: a caret (^) right after the opening bracket matches any one character NOT listed:

[^def]  # anything not d, e, or f
[^n-z]  # not a lowercase letter from n to z
[^n\-z] # not an n, -, or z (the backslash makes the hyphen literal instead of a range)
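A sketch running the three negated classes above against made-up strings. Each list-context m//g returns every single character the class accepts; the last line shows how dropping the backslash turns the hyphen back into a range.

```perl
use strict;
use warnings;

my @not_def      = ("defender" =~ /[^def]/g);   # n, r
my @not_n_to_z   = ("fred"     =~ /[^n-z]/g);   # f, e, d (r falls in n-z)
my @not_n_dash_z = ("x-n-z!"   =~ /[^n\-z]/g);  # x, !
my @range_again  = ("x-n-z!"   =~ /[^n-z]/g);   # -, -, ! (x falls in n-z)

print "@not_def | @not_n_to_z | @not_n_dash_z | @range_again\n";
```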