Majordomo2 Patterns (Regular Expressions)
In the following examples, the double-quote characters (") are not ordinary
quotation marks: they are the delimiters of the pattern.
The Short Story
Where you can use a pattern, use something like:
- "user@example.com"
- "example.com"i
where the trailing 'i' specifies case insensitivity.
If you need more complicated matching, you can use:
- %user@*example.com%i
where an asterisk matches any number of any character.
If you need more powerful patterns, you can use a Perl regular expression
surrounded by '/' characters:
- /(someone|nobody)\@.*example.com/i
The Details
Majordomo2 is written in Perl and exposes
the power of the Perl pattern matching language to the user. Unfortunately,
Perl regular expressions (regexps for short) are somewhat complicated and
difficult to learn. They also have their own quirks, which can get in the
way of doing simple pattern matching tasks.
To eliminate some of the complexity, Majordomo2 provides two simpler
forms of pattern matching in addition to full Perl regular expressions.
A Majordomo2 pattern is composed of enclosing delimiters, modifiers,
and the pattern itself. Majordomo2 uses the delimiters to determine the
type of the pattern. The modifiers change the behavior of the pattern;
generally, the only useful modifier makes the match insensitive to case.
There are three supported types of pattern:
- substring
- shell-like
- Perl-like regular expressions
They differ in their delimiters and the number and function of various
special characters (the so-called metacharacters).
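The way the delimiter selects the pattern type can be sketched in a few lines of Python. This is only an illustration of the description above, not Majordomo2's own code; the function name split_pattern and the type labels are made up for this example.

```python
# Hypothetical helper: these names are illustrative, not Majordomo2's API.
DELIMITERS = {'"': "substring", "%": "shell-like", "/": "regexp"}

def split_pattern(text):
    """Split a delimited pattern into (type, body, modifiers)."""
    if len(text) < 2 or text[0] not in DELIMITERS:
        raise ValueError('pattern must start with ", % or /')
    delim = text[0]
    # Assumption: the closing delimiter is the last occurrence of the
    # opening one; whatever trails it is treated as modifiers (e.g. 'i').
    end = text.rfind(delim)
    if end == 0:
        raise ValueError("missing closing delimiter")
    return DELIMITERS[delim], text[1:end], text[end + 1:]

for example in ['"example.com"i', "%user@*example.com%i", "/.*/"]:
    print(example, "->", split_pattern(example))
```

The body returned here still has to be handed to the matching routine for its type; the sketch only covers the splitting step.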
Substring Patterns
Examples:
- "example.com"
- "user@somewhere.example.com"i
The delimiter is a double quote. There are no special characters; the
pattern matches if the given string occurs anywhere within the string
to be matched. A trailing 'i' specifies that the match is done insensitive
to case.
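In other words, a substring pattern behaves like a plain "contains" test. A minimal Python sketch of that behavior follows; the function name substring_match is hypothetical, and the delimiters and modifier are assumed to have been stripped off already.

```python
def substring_match(pattern, text, ignore_case=False):
    """Return True if pattern occurs anywhere within text."""
    if ignore_case:
        # The trailing 'i' modifier: compare without regard to case.
        pattern, text = pattern.lower(), text.lower()
    return pattern in text

print(substring_match("example.com", "user@mail.example.com"))
print(substring_match("example.com", "user@EXAMPLE.COM"))
print(substring_match("example.com", "user@EXAMPLE.COM", ignore_case=True))
```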
Shell-Like Patterns
Examples:
- %user@*example.com%i
- %u-???@*example.com%i
The delimiter is a percent sign. These patterns are reminiscent of
csh or DOS patterns in that a question mark matches any single character,
and an asterisk matches any number (including zero) of any character.
Character classes with '[' and ']' are also supported.
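The wildcard rules can be tried out with Python's fnmatch module, which implements the same '?', '*' and '[...]' conventions. This is a sketch under assumptions, not a description of Majordomo2 internals: the function name shell_match is made up, and the sketch requires the pattern to cover the whole string, which the text above does not state either way.

```python
import fnmatch
import re

def shell_match(pattern, text, ignore_case=False):
    """Match text against a shell-like pattern (delimiters already removed)."""
    flags = re.IGNORECASE if ignore_case else 0
    # fnmatch.translate turns the wildcard pattern into a regular
    # expression that must match the entire string (an assumption here).
    regex = re.compile(fnmatch.translate(pattern), flags)
    return regex.match(text) is not None

print(shell_match("user@*example.com", "USER@mail.example.com", ignore_case=True))
print(shell_match("u-???@*example.com", "u-abc@example.com"))
print(shell_match("u-???@*example.com", "u-ab@example.com"))
```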
Perl-Like Regular Expressions
Even more power is available through the Perl regular expression
language. These patterns are called "Perl-like" because
there is one important difference between them and real Perl regexps:
in Perl version 5 and above, unescaped '@' symbols are not allowed in
patterns. This is a constant source of problems for Majordomo2 users
because '@' signs are obviously very common in Internet mail addresses.
To ease this difficulty, Majordomo2 checks the syntax of user-supplied
regexps. If a regexp has syntax errors relating to unescaped '@' symbols,
all of its '@' symbols are escaped and the pattern is checked again. This
solves most of the problems. Be careful, though, when writing Perl code:
this leniency is specific to Majordomo2, and a regexp with unescaped '@'
symbols will only provide you with syntax errors in real Perl.
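The escaping step can be sketched as follows. This is a rough illustration only: the function name escape_at is hypothetical, the code is Python rather than Perl, and it covers just the "escape every unescaped '@'" part. The preceding syntax check is not reproduced, because Python's regular expressions do not treat '@' specially.

```python
def escape_at(pattern):
    """Put a backslash in front of every '@' that is not already escaped."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep an existing escape sequence (backslash plus one
            # character) exactly as it is.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "@":
            out.append("\\@")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(escape_at(r"johndoe@.*example\.com"))
print(escape_at(r"johndoe\@.*example\.com"))
```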
What follows is a basic discussion of Perl regular expressions. All
'@' symbols will be shown escaped for correctness. For more information,
consult the Perl documentation or any reasonable book on Perl.
Perl Regular Expressions in Depth
A regular expression is a concise way of expressing a pattern in a series
of characters. The full power of regular expressions can make some difficult
tasks quite easy, but we will only brush the surface here.
The character "/" is used to mark the beginning and end of a regular
expression. Letters and numbers stand for themselves. Many of the other
characters are symbolic. Some commonly used ones are:
| \@ | the "@" found in nearly all addresses; it must be preceded by a backslash to avoid errors in Perl |
| .  | (period) any character |
| *  | previous character, zero or more times; note especially... |
| .* | any character, zero or more times |
| +  | previous character, one or more times; so for example... |
| a+ | letter "a", one or more times |
| \  | next character stands for itself; so for example... |
| \. | literally a period, not meaning "any character" |
| ^  | beginning of the string; so for example... |
| ^a | a string beginning with letter "a" |
| $  | end of the string; so for example... |
| a$ | a string ending with letter "a" |
Example 1:
- /foo\.example\.com/
Note: the periods are preceded by a backslash so that they are
interpreted as periods, not as wildcards. This pattern matches any string
containing:
- foo.example.com
such as:
- foo.example.com
- bar.foo.example.com
- user@bar.foo.example.com
- users%bar.foo.example.com@example.com
Example 2:
- /johndoe\@.*foo\.example\.com/
The `@' has special meaning to Perl and should be prefixed with a backslash
to avoid errors. The string ".*" means "any character, zero or more times".
So this matches:
- johndoe@foo.example.com
- johndoe@terminus.foo.example.com
- ajohndoe@terminus.foo.example.com@example.com
But it doesn't match:
- johndoe@example.com
- brent@foo.example.com
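The matches listed for Example 2 can be checked with a short script. Python's re module is used here as a stand-in for Perl; for the simple constructs in this pattern the two behave the same way. The helper name matches_example2 is made up for this sketch.

```python
import re

# Example 2, with a single backslash in front of the '@' and the periods.
EXAMPLE2 = re.compile(r"johndoe\@.*foo\.example\.com")

def matches_example2(address):
    # search() looks for the pattern anywhere in the string, which is
    # how an unanchored regexp behaves.
    return EXAMPLE2.search(address) is not None

for address in [
    "johndoe@foo.example.com",
    "johndoe@terminus.foo.example.com",
    "ajohndoe@terminus.foo.example.com@example.com",
    "johndoe@example.com",
    "brent@foo.example.com",
]:
    print(address, matches_example2(address))
```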
Example 3:
- /^johndoe\@.*foo\.example\.org$/
This pattern is similar to Example 2 (with .org in place of .com) and matches
the corresponding first two strings:
- johndoe@foo.example.org
- johndoe@terminus.foo.example.org
But it doesn't match:
- ajohndoe@terminus.foo.example.org@example.com
because the regular expression says the string has to begin with letter "j" and
end with letter "g" (between the ^ and $ symbols) and neither of those
is true for ajohndoe@terminus.foo.example.org@example.com.
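The effect of the ^ and $ anchors can be seen by running the same strings through an anchored and an unanchored version of the pattern. This is a Python sketch standing in for Perl; the pattern below is written with foo.example.org so that it lines up with the strings listed above, and the names ANCHORED and UNANCHORED are only for illustration.

```python
import re

ANCHORED = re.compile(r"^johndoe\@.*foo\.example\.org$")
UNANCHORED = re.compile(r"johndoe\@.*foo\.example\.org")

def anchored_match(address):
    return ANCHORED.search(address) is not None

def unanchored_match(address):
    return UNANCHORED.search(address) is not None

for address in [
    "johndoe@foo.example.org",
    "johndoe@terminus.foo.example.org",
    "ajohndoe@terminus.foo.example.org@example.com",
]:
    print(address, anchored_match(address), unanchored_match(address))
```

The third string still contains the pattern, so the unanchored version accepts it; only the anchors cause it to be rejected.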
Example 4:
- /.*/
This is the regular expression that matches anything.
Example 5:
- /.\*johndoe/
Here the * is preceded by a \, so it refers literally to an asterisk
character and not the symbolic meaning "zero or more times." The . still
has its symbolic meaning of "any one character", so it would match:
- a*johndoe
- s*johndoe
Because the . by itself implies one character, it would not match:
- *johndoe
Example 6:
Normally all matches are case sensitive. You can make any match case
insensitive by appending an "i" to the end of the expression.
- /example\.com/i
This would match example.com, EXAMPLE.com, ExAmPlE.cOm, etc. Removing
the "i":
- /example\.com/
would match example.com but not EXAMPLE.com or any other capitalization.
To be on the safe side, put a \ in front of any characters in the regular
expression that are not numbers or letters. To put a / into the regular
expression, the same rule holds: precede it with a \. Thus, with a \ in
front of the / and = characters, we get:
- /\/CO\=US/
This pattern matches /CO=US and may be a useful regular expression to
those of you who need to deal with X.400 addresses that contain / characters.
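The "backslash before anything that is not a number or letter" rule is easy to apply mechanically. A minimal Python sketch follows; the function name escape_literal is hypothetical, and the result is meant to be placed between the / delimiters by hand.

```python
import re

def escape_literal(text):
    """Backslash-escape every character that is not an ASCII letter or digit."""
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", text)

escaped = escape_literal("/CO=US")
print(escaped)

# The escaped form matches the literal text it was built from.
print(re.search(escaped, "/C=US/CO=US/") is not None)
```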
Example 7:
Normally, all whitespace within a pattern is matched verbatim, but it
is sometimes desirable to add some additional space within a pattern to
make it more readable. For instance, here is a pattern matching common
quoting characters in email (-, :, >, word>):
- /^(-|:|>|[a-z]+>)/i
This can be a bit difficult to follow, so we can space it out a bit:
- /^( - | : | > | [a-z]+> )/xi
The 'x' modifier specifies that whitespace is to be ignored and makes
the pattern a bit easier to read. If you want to match actual whitespace,
use '\s'.
Note: the 'x' modifier provides additional functionality to Perl
code relating to comments, but because Majordomo2 requires patterns to
lie all on a single line, this is not significant here.
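The difference the 'x' modifier makes can be tried out in Python, whose re.VERBOSE flag plays the same role as 'x' in Perl for a pattern like this one. The sketch compares the compact pattern, the spaced-out pattern with the flag, and the spaced-out pattern without it; the variable and function names are made up for the example.

```python
import re

compact = re.compile(r"^(-|:|>|[a-z]+>)", re.IGNORECASE)
# With VERBOSE (Perl's 'x'), the spaces in the pattern are ignored.
spaced = re.compile(r"^( - | : | > | [a-z]+> )", re.IGNORECASE | re.VERBOSE)
# Without it, the same spaces must appear literally in the text.
literal = re.compile(r"^( - | : | > | [a-z]+> )", re.IGNORECASE)

def is_quoted(line, regex):
    return regex.search(line) is not None

for line in ["> quoted text", "Bob> quoted text", "- quoted text", "plain text"]:
    print(repr(line), is_quoted(line, compact), is_quoted(line, spaced),
          is_quoted(line, literal))
```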
Last modified June 15, 2005 by dlschmid