How it is used
- Test whether a string or one of its substrings matches a pattern. For example, check whether user input in a form consists only of digits, or matches a legal phone number, credit card number, or date pattern.
- Replace or substitute a pattern in a text string. For example, remove all tags from a web page and leave only the text content.
- Extract a substring from a string based on a text pattern. For example, given a URL, extract the protocol, domain name, port number, and URI fields for further processing such as web crawling, web indexing/searching, or copying web pages for offline reading.
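The three uses can be sketched in a few lines of Perl. This is only an illustration: the sample input, the example.com URL, and the variable names are made up here, and the tag-stripping pattern is deliberately naive (real HTML is better handled by a parser).

```perl
use strict;
use warnings;

# 1. Test: does the input consist only of digits?
my $input      = "20240131";
my $all_digits = ($input =~ /^[0-9]+\z/) ? 1 : 0;

# 2. Replace: delete anything that looks like a tag (naive sketch).
my $html = "<p>Hello, <b>world</b>!</p>";
(my $text = $html) =~ s/<[^>]*>//g;

# 3. Extract: split a (hypothetical) URL into protocol, host, port, and path.
my $url = "http://www.example.com:8080/docs/index.html";
my ($protocol, $host, $port, $path) =
    $url =~ m{^(\w+)://([^/:]+)(?::(\d+))?(/.*)?\z};

print "all digits: $all_digits\n";
print "text: $text\n";
print "protocol=$protocol host=$host port=$port path=$path\n";
```

In list context a successful match returns the captured substrings, which is why the URL fields can be assigned in one statement.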
Web Page for Testing Your Regular Expressions with Provided Data
http://cs.uccs.edu/~cs301/testreg.html
Reference:
- Mastering Regular Expressions by Jeffrey E. F. Friedl, O'Reilly.
- perlre man page ("man perlre")
Perl Metacharacter Summary
Items that match a single character
| . | dot | Match any one character (by default, any character except newline) |
| [...] | character class | Match any character listed |
| [^...] | negated character class | Match any character not listed |
| \t | tab | Match HT or TAB character |
| \n | new line | Match LF or NL character |
| \r | return | Match CR character |
| \f | form feed | Match FF (Form Feed) character |
| \a | alarm | Match BELL character |
| \e | escape | Match ESC character |
| \0nnn | Character in octal, e.g. \033 | Match equivalent character |
| \xnn | Character in hexadecimal, e.g. \x1B | Match equivalent character |
| \cX | Control character, e.g. \c[ | Match the corresponding control character (\c[ is ESC) |
| \l | lowercase next character | |
| \u | uppercase next character | |
| \L | lowercase characters till \E | |
| \U | uppercase characters till \E | |
| \E | end case modification | |
| \Q | quote (disable) pattern metacharacters till \E | |
Example 1: character class
if ($string =~ /[01][0-9]/) {
    print "$string contains a two-digit number from 00 to 19\n";
} else {
    print "$string does not contain a two-digit number from 00 to 19\n";
}
Example 2: negated character class
if ($string =~ /[^A-Za-z]/) { print "$string contains non-letter characters\n"; }
else { print "$string does not contain non-letter characters\n"; }
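The quoting and case-modification escapes from the table (\Q, \u, \U, \E) are easiest to see with an interpolated variable. A minimal sketch, with made-up sample strings and variable names:

```perl
use strict;
use warnings;

my $literal = "a.b*c";   # contains the metacharacters . and *

# With \Q...\E the interpolated text is matched literally.
my $quoted_hit  = ("xx a.b*c yy" =~ /\Q$literal\E/) ? 1 : 0;

# Without \Q, "." and "*" stay special, so "aXbbbc" matches as well.
my $plain_hit   = ("aXbbbc" =~ /$literal/)      ? 1 : 0;
my $quoted_miss = ("aXbbbc" =~ /\Q$literal\E/)  ? 1 : 0;

# \u uppercases the next character; \U...\E uppercases a whole run.
(my $title = "perl regex") =~ s/(\w+) (\w+)/\u$1 \U$2\E/;

print "quoted_hit=$quoted_hit plain_hit=$plain_hit quoted_miss=$quoted_miss\n";
print "title=$title\n";
```

Quoting with \Q is worth the habit whenever the pattern text comes from user input rather than from the program itself.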
Class Shorthand: Items that match a single character in a predefined character class
| \w | Match a "word" character (alphanumeric plus "_") |
| \W | Match a non-word character |
| \s | Match a whitespace character |
| \S | Match a non-whitespace character |
| \d | Match a digit character |
| \D | Match a non-digit character |
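As a small illustration of the shorthand classes, the sketch below pulls pieces out of a made-up input line (the line and variable names are assumptions for the example):

```perl
use strict;
use warnings;

my $line = "order_id 4471  shipped";

my ($word)   = $line =~ /(\w+)/;    # first run of word characters
my ($number) = $line =~ /(\d+)/;    # first run of digits
my @fields   = split /\s+/, $line;  # split on runs of whitespace
(my $digits_only = $line) =~ s/\D//g;  # delete every non-digit character

print "word=$word number=$number digits_only=$digits_only\n";
print "fields=", join("|", @fields), "\n";
```

Note that \w includes the underscore, so "order_id" is captured as a single word.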
Quantifiers: Items appended to provide "Counting"
| * | Match 0 or more times |
| + | Match 1 or more times |
| ? | Match 0 or 1 times |
| {n} | Match exactly n times |
| {n,} | Match at least n times |
| {n,m} | Match at least n but no more than m times (no space after the comma) |
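The quantifiers can be compared side by side on a small list. In this sketch `matching` is a hypothetical helper written for the example, not a built-in function, and the sample data is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: return the items that match $re, joined by commas.
sub matching {
    my ($re, @items) = @_;
    return join ",", grep { $_ =~ $re } @items;
}

my @nums = ("7", "42", "2024", "12345");

my $exactly_four = matching(qr/^\d{4}\z/,   @nums);   # {n}
my $two_or_more  = matching(qr/^\d{2,}\z/,  @nums);   # {n,}
my $one_or_two   = matching(qr/^\d{1,2}\z/, @nums);   # {n,m}
my $optional_u   = matching(qr/^colou?r\z/, "color", "colour", "colouur");  # ?

# + needs at least one character; * is satisfied by zero characters,
# so it succeeds with an empty match at the very start of "baa".
my ($plus) = "baa" =~ /(a+)/;
my ($star) = "baa" =~ /(a*)/;

print "$exactly_four | $two_or_more | $one_or_two | $optional_u\n";
print "plus='$plus' star='$star'\n";
```

The last two lines show a common surprise: a pattern built only from `*` items always matches, possibly with an empty string.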
Items That Match Positions
| ^ | Caret. Match start of the string, or start of each line under /m (multiline matching) |
| $ | Dollar. Match end of the string (or before a final newline), or end of each line under /m (multiline matching) |
| \b | Match a word boundary |
| \B | Match anywhere that is not a word boundary |
| \A | Match only at beginning of string |
| \Z | Match only at end of string, or before newline at the end |
| \z | Match only at end of string |
| \G | Match only where previous m//g left off (works only with /g) |
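The position items match no characters themselves; they only constrain where a match may occur. A minimal sketch on a made-up three-line string:

```perl
use strict;
use warnings;

my $text = "cat\nconcat\ncat food\n";

my @no_m   = $text =~ /^cat/g;     # ^ anchors only at the start of the string
my @with_m = $text =~ /^cat/mg;    # with /m, ^ also anchors after each newline
my @whole  = $text =~ /\bcat\b/g;  # "cat" as a whole word, so not "concat"
my @any    = $text =~ /cat/g;      # every occurrence, including inside "concat"

# \Z tolerates one trailing newline; \z does not.
my $big_z   = ("abc\n" =~ /abc\Z/) ? 1 : 0;
my $small_z = ("abc\n" =~ /abc\z/) ? 1 : 0;

printf "no_m=%d with_m=%d whole=%d any=%d\n",
    scalar(@no_m), scalar(@with_m), scalar(@whole), scalar(@any);
print "big_z=$big_z small_z=$small_z\n";
```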
Grouping and Alternation
| | | Alternation, Match either expression it separates |
| (...) | Limit scope of alternation, Provide grouping for the quantifiers, Capture matched substrings for backreferences. |
| \1, \2, ... | Backreference, Match text previously matched within first, second, ..., set of parentheses. |
| (?:...) | Grouping only, non-capturing parentheses |
| (?=...) | Positive lookahead, non-capturing parentheses |
| (?!...) | Negative lookahead, non-capturing parentheses |
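The grouping constructs can be sketched as follows. The sample strings are invented for the example; the comma-insertion substitution is a well-known idiom rather than anything specific to this page.

```perl
use strict;
use warnings;

# Parentheses limit the scope of | and capture the chosen alternative.
my ($part) = "good evening" =~ /^good (morning|evening)\z/;

# Backreference: \1 must repeat exactly what the first group matched.
my ($dup) = "this is is a test" =~ /\b(\w+) \1\b/;

# (?:...) groups without capturing, so only month and day are returned.
my @caps = "2024-01-31" =~ /(?:\d{4})-(\d{2})-(\d{2})/;

# Positive lookahead: insert a comma after any digit that is followed
# by a whole number of three-digit groups, without consuming them.
(my $pretty = "1234567") =~ s/(\d)(?=(\d{3})+\z)/$1,/g;

# Negative lookahead: "foo" not immediately followed by "bar".
my @not_foobar = grep { /^foo(?!bar)/ } ("foobar", "foobaz", "food");

print "part=$part dup=$dup caps=@caps pretty=$pretty\n";
print "not_foobar=@not_foobar\n";
```

Lookaheads test what follows the current position but leave it unconsumed, which is what lets the comma substitution revisit the same digits on the next /g iteration.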
Modes, appended at the end of the regular expression
| i | ignore case |
| g | global; in a substitution s/.../.../g, replace every match instead of only the first. |
| m | multiline matching mode |
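A short sketch of the three modes, using made-up sample text. The `= () =` construct is a common Perl idiom for counting matches in list context.

```perl
use strict;
use warnings;

my $text = "Perl is fun.\nperl is terse.\nI like PERL.";

# /i: ignore case. Counting matches with and without it.
my $count_cs = () = $text =~ /perl/g;
my $count_ci = () = $text =~ /perl/gi;

# /g on a substitution: replace every match, not just the first.
(my $first_only = "a-b-c") =~ s/-/+/;
(my $all        = "a-b-c") =~ s/-/+/g;

# /m: ^ matches at the start of every line, so this collects
# the first word of each line.
my @first_words = $text =~ /^(\w+)/mg;

print "case-sensitive=$count_cs ignore-case=$count_ci\n";
print "first_only=$first_only all=$all\n";
print "first_words=@first_words\n";
```

Modes can be combined in any order, as in /gi or /mg above.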
Reference: http://cs.uccs.edu/~cs301/perl/re.htm
