How it is used
- Test whether a string or one of its substrings matches a pattern.
  For example, check whether the user input in a form is all digits, or is a legal phone number, credit card number, or date.
- Replace or substitute a string pattern in a text string.
  For example, remove all tags in a web page, leaving only the text content.
- Extract a substring from a string based on a text pattern.
  For example, given a URL, extract the protocol, domain name, port number, and URI fields for further processing such as web crawling, web indexing/searching, or copying web pages for offline reading.
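The three uses above (test, substitute, extract) can be sketched in a few lines of Perl. This is only an illustration: the sample strings and variable names are made up, and the tag-stripping and URL patterns are deliberately naive rather than robust.

```perl
use strict;
use warnings;

# Hypothetical sample data, made up for illustration.
my $input = "7195551234";
my $html  = "<p>Hello, <b>world</b>!</p>";
my $url   = "http://cs.uccs.edu:8080/~cs301/testreg.html";

# 1. Test: does the whole input consist of digits only?
my $all_digits = ($input =~ /^\d+$/) ? 1 : 0;

# 2. Substitute: delete anything that looks like a tag
#    (naive; real HTML is better handled by a parser).
(my $text = $html) =~ s/<[^>]*>//g;

# 3. Extract: capture protocol, host, optional port, and path.
my ($protocol, $host, $port, $path) =
    $url =~ m{^(\w+)://([^/:]+)(?::(\d+))?(/.*)?$};

print "all digits: $all_digits\n";
print "text: $text\n";
print "protocol=$protocol host=$host port=$port path=$path\n";
```

In list context a match returns its captured groups, which is why the four fields can be assigned in one statement. The port group is optional, so `$port` would be undefined for a URL without an explicit port.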
Web Page for Testing Your Regular Expressions with Provided Data
http://cs.uccs.edu/~cs301/testreg.html
Reference:
- Mastering Regular Expressions by Jeffrey E. F. Friedl, O'Reilly.
- perlre man page ("man perlre")
Perl Metacharacter Summary
Items that match a single character
. | dot | Match any one character (except newline, unless /s is used) |
[...] | character class | Match any character listed |
[^...] | negated character class | Match any character not listed |
\t | tab | Match HT or TAB character |
\n | new line | Match LF or NL character |
\r | return | Match CR character |
\f | form feed | Match FF (Form Feed) character |
\a | alarm | Match BELL character |
\e | escape | Match ESC character |
\0nn | Character in octal, e.g. \033 | Match equivalent character |
\xnn | Character in hexadecimal, e.g. \x1B | Match equivalent character |
\cX | Control character, e.g. \cA (Control-A), \c[ (ESC) | Match the named control character |
\l | lowercase next character | |
\u | uppercase next character | |
\L | lowercase characters till \E | |
\U | uppercase characters till \E | |
\E | end case modification | |
\Q | quote (disable) pattern metacharacters till \E | |
Example 1: character class
if ($string =~ /[01][0-9]/) {
    print "$string contains a two-digit sequence from 00 to 19\n";
} else {
    print "$string does not contain a two-digit sequence from 00 to 19\n";
}
Example 2: negated character class
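# Use A-Za-z: the range A-z would also cover [ \ ] ^ _ and `, which lie between Z and a in ASCII.
if ($string =~ /[^A-Za-z]/) { print "$string contains non-letter characters\n"; }
else { print "$string does not contain non-letter characters.\n"; }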
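Some of the escapes in the table above can be tried out as follows. This is a minimal sketch with made-up sample strings; it shows that \x1B, \033, and \e all denote the same ESC character, how \Q...\E turns metacharacters into literals, and how the case-modification escapes work in an interpolated string.

```perl
use strict;
use warnings;

# Hypothetical sample strings, made up for illustration.
my $line = "name\tvalue";
my $esc  = "\e[1mbold";      # begins with the ESC character (ASCII 27)

# \t matches a tab; \x1B (hex), \033 (octal), and \e all match ESC.
my $has_tab = ($line =~ /\t/)    ? 1 : 0;
my $hex_esc = ($esc  =~ /^\x1B/) ? 1 : 0;
my $oct_esc = ($esc  =~ /^\033/) ? 1 : 0;
my $sym_esc = ($esc  =~ /^\e/)   ? 1 : 0;

# \Q...\E quotes metacharacters, so "." and "*" match only themselves.
my $literal  = "a.b*c";
my $quoted   = ("aXbc" =~ /^\Q$literal\E$/) ? 1 : 0;   # 0: must be exactly "a.b*c"
my $unquoted = ("aXbc" =~ /^$literal$/)     ? 1 : 0;   # 1: "." and "*" act as metacharacters

# \L lowercases until \E; \u uppercases the next character.
my $name  = "pERL";
my $fixed = "\u\L$name\E";   # "Perl"

print "$has_tab $hex_esc $oct_esc $sym_esc $quoted $unquoted $fixed\n";
```

Wrapping an interpolated variable in \Q...\E is the usual way to match user-supplied text literally.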
Class Shorthand: Items that match a single character in a predefined character class
\w | Match a "word" character (alphanumeric plus "_") |
\W | Match a non-word character |
\s | Match a whitespace character |
\S | Match a non-whitespace character |
\d | Match a digit character |
\D | Match a non-digit character |
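A short sketch of the class shorthands in use. The record string and variable names are hypothetical, chosen only to exercise each shorthand once.

```perl
use strict;
use warnings;

# Hypothetical record: an identifier, two spaces, and a date.
my $record = "id_42  2024-01-15";

# \w+ : the first run of word characters (letters, digits, underscore).
my ($word) = $record =~ /(\w+)/;                   # "id_42"

# \S+ : the first run of non-whitespace characters.
my ($token) = $record =~ /^(\S+)/;                 # "id_42"

# \d : digits, here in a fixed date layout.
my ($date) = $record =~ /(\d\d\d\d-\d\d-\d\d)/;    # "2024-01-15"

# \s+ : split on runs of whitespace.
my @fields = split /\s+/, $record;                 # ("id_42", "2024-01-15")

# \D : delete every non-digit character.
(my $digits_only = $record) =~ s/\D//g;            # "4220240115"

# \W : is there any non-word character (here spaces and hyphens)?
my $has_nonword = ($record =~ /\W/) ? 1 : 0;       # 1

print "$word|$token|$date|@fields|$digits_only|$has_nonword\n";
```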
Quantifiers: Items appended to provide "Counting"
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match 0 or 1 times |
{n} | Match exactly n times |
{n,} | Match at least n times |
{n,m} | Match at least n but no more than m times |
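The quantifiers can be compared side by side in a small sketch. The candidate strings below are made up for illustration, and the "port" check only counts digits; it does not validate the numeric range.

```perl
use strict;
use warnings;

# * : zero or more, so the group may capture an empty string.
my ($zeros) = "1000" =~ /1(0*)/;     # "000"
my ($none)  = "1"    =~ /1(0*)/;     # "" (zero occurrences still match)

# + : one or more, so "1" alone does not match /10+/.
my $plus = ("1" =~ /10+/) ? 1 : 0;   # 0

# ? : zero or one, making the "u" optional.
my @spellings = grep { /^colou?r$/ } qw(color colour colouur);

# {n} : exactly n times.
my $exact = ("555-1234" =~ /^\d{3}-\d{4}$/) ? 1 : 0;   # 1

# {n,} : at least n times.
my $at_least = ("abcdef" =~ /^\w{4,}$/) ? 1 : 0;       # 1

# {n,m} : between n and m times (here one to five digits).
my @candidates = ("80", "8080", "65535", "655350", "");
my @short_numbers = grep { /^\d{1,5}$/ } @candidates;

print "spellings: @spellings\n";          # color colour
print "short numbers: @short_numbers\n";  # 80 8080 65535
```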
Items That Match Positions
^ | Caret. Match start of the line (can match at several positions under /m, multiline matching) |
$ | Match end of the line (can match at several positions under /m, multiline matching) |
\b | Match a word boundary |
\B | Match anywhere that is not a word boundary |
\A | Match only at beginning of string |
\Z | Match only at end of string, or before newline at the end |
\z | Match only at end of string |
\G | Match only where previous m//g left off (works only with /g) |
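The position anchors match between characters rather than consuming them. The following sketch, using made-up strings, contrasts ^ with and without /m, \Z with \z, and shows \b and \G in a typical role.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# ^ matches only at the start of the string unless /m is given.
my @with_m    = $text =~ /^(\w+)/mg;   # ("first", "second")
my @without_m = $text =~ /^(\w+)/g;    # ("first")

# \b : count "cat" only where it stands as a whole word.
my $s     = "cat concat cat's";
my $count = () = $s =~ /\bcat\b/g;     # 2 ("concat" is skipped)

# \Z tolerates one trailing newline; \z does not.
my $t       = "abc\n";
my $big_z   = ($t =~ /c\Z/) ? 1 : 0;   # 1
my $small_z = ($t =~ /c\z/) ? 1 : 0;   # 0

# \G : continue exactly where the previous /g match stopped,
# which makes a simple left-to-right tokenizer.
my $src = "12ab34";
my @tokens;
while ($src =~ /\G(\d+|[a-z]+)/gc) {
    push @tokens, $1;
}

print "@with_m | @without_m | $count | $big_z $small_z | @tokens\n";
```

Because \G forbids skipping ahead, the tokenizer loop stops at the first character that fits neither alternative instead of silently jumping over it.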
Grouping and Alternation
| | Alternation, Match either expression it separates |
(...) | Limit scope of alternation, Provide grouping for the quantifiers, Capture matched substrings for backreferences. |
\1, \2, ... | Backreference, Match text previously matched within first, second, ..., set of parentheses. |
(?:...) | Grouping only, non-capturing parentheses |
(?=...) | Positive lookahead, non-capturing parentheses |
(?!...) | Negative lookahead, non-capturing parentheses |
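A brief sketch of grouping, capturing, backreferences, and lookahead. All sample strings are hypothetical and serve only to exercise each construct.

```perl
use strict;
use warnings;

# (?:...) groups the alternation without capturing it.
my $is_weekend = ("Sunday" =~ /^(?:Satur|Sun)day$/) ? 1 : 0;   # 1

# (...) captures; in list context the match returns the captures.
my ($hour, $minute) = "12:45" =~ /^(\d\d):(\d\d)$/;            # "12", "45"

# \1 matches the same text the first group matched: find a doubled word.
my ($dup) = "this is is a test" =~ /\b(\w+)\s+\1\b/;           # "is"

# (?=...) requires what follows but does not consume it.
my @usd = "apple 5USD pear 7EUR fig 9USD" =~ /(\d+)(?=USD)/g;  # (5, 9)

# (?!...) requires that what follows does NOT match.
my $foobar = ("foobar" =~ /foo(?!bar)/) ? 1 : 0;               # 0
my $foobaz = ("foobaz" =~ /foo(?!bar)/) ? 1 : 0;               # 1

print "$is_weekend $hour:$minute $dup @usd $foobar $foobaz\n";
```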
Modes (modifiers), appended at the end of the regular expression
i | ignore case |
g | global; in a substitution s/.../.../g, replace every match instead of only the first. |
m | multiline matching mode |
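The effect of each mode can be seen by running the same kind of pattern with and without it. The strings below, including the log text, are invented for illustration.

```perl
use strict;
use warnings;

# i : ignore case.
my $plain = ("HELLO World" =~ /hello/)  ? 1 : 0;   # 0
my $icase = ("HELLO World" =~ /hello/i) ? 1 : 0;   # 1

# g : without it only the first match is replaced.
(my $first = "a-b-c") =~ s/-/+/;     # "a+b-c"
(my $all   = "a-b-c") =~ s/-/+/g;    # "a+b+c"

# m : ^ and $ match at every line, not just at the ends of the string.
my $log  = "ok: 1\nerr: 2\nerr: 3\n";
my @errs = $log =~ /^err: (\d+)$/mg; # (2, 3)

print "$plain $icase $first $all @errs\n";
```

Modes can be combined, as in /mg above, where /m lets the anchors match on each line and /g collects every match.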
Reference: http://cs.uccs.edu/~cs301/perl/re.htm