
Majordomo2 Patterns (Regular Expressions)

In the following examples, the double-quote characters (") are part of the pattern: they serve as its delimiters.


The Short Story

Where you can use a pattern, use something like:

"user@example.com"
"example.com"i

where the trailing 'i' specifies case insensitivity.

If you need more complicated matching, you can use:

%user@*example.com%i

where an asterisk matches any number of any character.

If you need more powerful patterns, you can use a Perl regular expression surrounded by '/' characters:

/(someone|nobody)\@.*example.com/i

The Details

Majordomo2 is written in Perl and exposes the power of the Perl pattern matching language to the user. Unfortunately, Perl regular expressions (regexps for short) are somewhat complicated and difficult to learn. They also have their own quirks, which can get in the way of doing simple pattern matching tasks.

To eliminate some of the complexity, Majordomo2 provides two simpler forms of pattern matching in addition to full Perl regular expressions.

A Majordomo2 pattern is composed of enclosing delimiters, modifiers, and the pattern itself. Majordomo2 uses the delimiters to determine the type of the pattern. The modifiers change the behavior of the pattern; generally, the only useful modifier makes the match insensitive to case. There are three supported types of pattern:

substring
shell-like
Perl-like regular expressions

They differ in their delimiters and the number and function of various special characters (the so-called metacharacters).
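The way the delimiters select the pattern type can be sketched in a few lines. This is only an illustration in Python, not Majordomo2's own Perl code; the function name classify_pattern and the assumption that modifiers are lowercase letters are hypothetical.

```python
import re

# Delimiter -> pattern type, as described in the text above.
_KINDS = {'"': "substring", "%": "shell", "/": "regexp"}

def classify_pattern(pattern):
    """Split a pattern into (kind, body, modifiers).

    Hypothetical helper: assumes the same delimiter opens and closes
    the pattern and that modifiers are trailing lowercase letters.
    """
    m = re.fullmatch(r'(["%/])(.*)\1([a-z]*)', pattern, re.DOTALL)
    if m is None:
        raise ValueError(f"unrecognized pattern: {pattern!r}")
    delim, body, modifiers = m.groups()
    return _KINDS[delim], body, modifiers

print(classify_pattern('"example.com"i'))
print(classify_pattern('%user@*example.com%i'))
print(classify_pattern(r'/\/CO\=US/'))
```

The greedy body group means that an escaped delimiter inside a regexp, as in the last call, stays part of the body.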


Substring Patterns

Examples:

  1. "example.com"
  2. "user@somewhere.example.com"i

The delimiter is a double quote. There are no special characters; the pattern matches if the given string occurs anywhere within the string to be matched. A trailing 'i' specifies that the match is done insensitive to case.
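The behavior of a substring pattern can be mimicked with a plain containment test. The sketch below is written in Python for illustration; substring_match is a hypothetical name, and case folding by lowercasing both strings is an assumption about how the 'i' modifier behaves for ordinary ASCII addresses.

```python
def substring_match(needle, text, ignore_case=False):
    """Return True if needle occurs anywhere in text.

    No character is special: a period matches only a period.
    """
    if ignore_case:
        needle, text = needle.lower(), text.lower()
    return needle in text

print(substring_match("example.com", "user@mail.example.com"))
print(substring_match("example.com", "user@EXAMPLE.COM"))
print(substring_match("example.com", "user@EXAMPLE.COM", ignore_case=True))
```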

Shell-Like Patterns

Examples:

  1. %user@*example.com%i
  2. %u-???@*example.com%i

The delimiter is a percent sign. These patterns are reminiscent of csh or DOS patterns in that a question mark matches any single character, and an asterisk matches any number (including zero) of any character. Character classes with '[' and ']' are also supported.
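Python's fnmatch module implements the same three wildcards (?, * and bracketed character classes), so it can serve as a rough model of shell-like patterns. Two assumptions are made here that the text does not confirm: the pattern must match the whole string, as in a shell, and the 'i' modifier is modeled by lowercasing both sides. The name shell_match is hypothetical.

```python
from fnmatch import fnmatchcase

def shell_match(pattern, text, ignore_case=False):
    """Match text against a shell-style pattern (whole-string match)."""
    if ignore_case:
        pattern, text = pattern.lower(), text.lower()
    return fnmatchcase(text, pattern)

# The asterisk may match zero characters or many.
print(shell_match("user@*example.com", "user@example.com"))
print(shell_match("user@*example.com", "USER@mail.example.com", ignore_case=True))
# Each question mark stands for exactly one character.
print(shell_match("u-???@*example.com", "u-abc@example.com"))
print(shell_match("u-???@*example.com", "u-ab@example.com"))
```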

Perl-Like Regular Expressions

More power is available by using the regular expression language available through Perl. These patterns are called "Perl-like" because there is one important difference between them and real Perl regexps: in Perl version 5 and above, unescaped '@' symbols are not allowed in patterns, because Perl tries to interpolate them as array variables. This is a constant source of problems for Majordomo2 users because '@' signs are obviously very common in Internet mail addresses.

To ease this difficulty, Majordomo2 checks the syntax of user-supplied regexps. If a regexp has syntax errors relating to unescaped '@' symbols, all of the '@' symbols are escaped and the pattern is checked again. This solves most of the problems. Be careful, though, when writing Perl code: this correction is specific to Majordomo2, so the same unescaped pattern used directly in Perl will only produce syntax errors.
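The escaping step can be pictured as a simple text substitution. The sketch below is a Python illustration, not Majordomo2's code: escape_at is a hypothetical name, and unlike Majordomo2 it escapes unconditionally instead of first checking for a syntax error. It also does not handle an '@' preceded by an escaped backslash.

```python
import re

def escape_at(pattern):
    """Put a backslash before every '@' that does not already have one."""
    return re.sub(r"(?<!\\)@", lambda m: "\\@", pattern)

print(escape_at("johndoe@.*example.com"))
print(escape_at(r"johndoe\@.*example.com"))
```

Applying the function twice gives the same result as applying it once, so a pattern that is already correct is left alone.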

What follows is a basic discussion of Perl regular expressions. All '@' symbols will be shown escaped for correctness. For more information, consult the Perl documentation or any reasonable book on Perl.


Perl Regular Expressions in Depth

A regular expression is a concise way of expressing a pattern in a series of characters. The full power of regular expressions can make some difficult tasks quite easy, but we will only brush the surface here.

The character "/" is used to mark the beginning and end of a regular expression. Letters and numbers stand for themselves. Many of the other characters are symbolic. Some commonly used ones are:

\@ the "@" found in nearly all addresses; it must be preceded by a backslash to avoid errors in Perl
. (period) any character
* previous character, zero or more times; note especially...
.* any character, zero or more times
+ previous character, one or more times; so for example...
a+ letter "a", one or more times
\ next character stands for itself; so for example...
\. literally a period, not meaning "any character"
^ beginning of the string; so for example...
^a a string beginning with letter "a"
$ end of the string; so for example...
a$ a string ending with letter "a"
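Each of the symbols listed above can be tried out quickly. The sketch below uses Python's re module as a stand-in for Perl; for these basic symbols the two behave the same way. The sample strings are made up for illustration.

```python
import re

# (pattern, sample string, whether a match is expected)
CASES = [
    (r"\@",   "user@example.com", True),   # escaped @ is a literal @
    (r"a.c",  "abc",    True),             # . is any one character
    (r"a.c",  "ac",     False),            # ... but there must be one
    (r"ab*c", "ac",     True),             # * allows zero occurrences
    (r"ab+c", "ac",     False),            # + needs at least one
    (r"ab+c", "abbbc",  True),
    (r"a\.c", "abc",    False),            # \. is a literal period
    (r"a\.c", "a.c",    True),
    (r"^a",   "apple",  True),             # ^ anchors at the beginning
    (r"^a",   "banana", False),
    (r"a$",   "banana", True),             # $ anchors at the end
    (r"a$",   "apple",  False),
]

for pattern, text, expected in CASES:
    found = re.search(pattern, text) is not None
    print(f"/{pattern}/ on {text!r}: {found}")
```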

Example 1:

/foo\.example\.com/

Note: the periods are preceded by a backslash so that they are interpreted as periods, not as wildcards. This pattern matches any string containing:

foo.example.com

such as:

foo.example.com
bar.foo.example.com
user@bar.foo.example.com
users%bar.foo.example.com@example.com
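To see why the backslashes before the periods matter, the escaped pattern can be compared with an unescaped one. This is a sketch in Python, whose syntax for this pattern is the same as Perl's; the address user@fooXexample-com.net is invented for the comparison.

```python
import re

escaped = re.compile(r"foo\.example\.com")
unescaped = re.compile(r"foo.example.com")   # periods act as wildcards

listed = [
    "foo.example.com",
    "bar.foo.example.com",
    "user@bar.foo.example.com",
    "users%bar.foo.example.com@example.com",
]
# Both forms find every string listed in the text.
print(all(escaped.search(s) for s in listed))
print(all(unescaped.search(s) for s in listed))

# Only the unescaped form accepts a string with other characters
# where the periods should be.
lookalike = "user@fooXexample-com.net"
print(bool(escaped.search(lookalike)))
print(bool(unescaped.search(lookalike)))
```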

Example 2:

/johndoe\@.*foo\.example\.com/

The "@" has special meaning to Perl and should be prefixed with a backslash to avoid errors. The string ".*" means "any character, zero or more times". So this matches:

johndoe@foo.example.com
johndoe@terminus.foo.example.com
ajohndoe@terminus.foo.example.com@example.com

But it doesn't match:

johndoe@example.com
brent@foo.example.com
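The five addresses above can be sorted into matches and non-matches mechanically. The sketch uses Python's re.search, which, like an unanchored Perl match, looks for the pattern anywhere in the string.

```python
import re

pattern = re.compile(r"johndoe\@.*foo\.example\.com")

addresses = [
    "johndoe@foo.example.com",
    "johndoe@terminus.foo.example.com",
    "ajohndoe@terminus.foo.example.com@example.com",
    "johndoe@example.com",
    "brent@foo.example.com",
]

matched = [a for a in addresses if pattern.search(a)]
unmatched = [a for a in addresses if not pattern.search(a)]

print("matched:", matched)
print("unmatched:", unmatched)
```

The third address matches because nothing in the pattern says where the match has to start or stop.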

Example 3:

/^johndoe\@.*foo\.example\.org$/

This pattern is similar to Example 2, but it is anchored at both ends and uses example.org. It matches the counterparts of the first two strings from Example 2:

johndoe@foo.example.org
johndoe@terminus.foo.example.org

But it doesn't match:

ajohndoe@terminus.foo.example.org@example.com

because the ^ and $ symbols require the string to begin with "johndoe" and end with "foo.example.org", and neither of those is true for ajohndoe@terminus.foo.example.org@example.com, which begins with "a" and ends with "example.com".
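The effect of the ^ and $ anchors shows up when the same pattern is tried with and without them. The sketch below uses Python's re module and a pattern ending in foo\.example\.org, chosen to fit the sample addresses in this example.

```python
import re

unanchored = re.compile(r"johndoe\@.*foo\.example\.org")
anchored = re.compile(r"^johndoe\@.*foo\.example\.org$")

plain = "johndoe@terminus.foo.example.org"
wrapped = "ajohndoe@terminus.foo.example.org@example.com"

# The plain address satisfies both forms.
print(bool(unanchored.search(plain)), bool(anchored.search(plain)))

# The wrapped address contains the pattern, but has extra text
# before "johndoe" and after "foo.example.org".
print(bool(unanchored.search(wrapped)), bool(anchored.search(wrapped)))
```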

Example 4:

/.*/

This is the regular expression that matches anything.

Example 5:

/.\*johndoe/

Here the * is preceded by a \, so it refers literally to an asterisk character and not the symbolic meaning "zero or more times." The . still has its symbolic meaning of "any one character", so it would match:

a*johndoe
s*johndoe

Because the . by itself implies one character, it would not match:

*johndoe
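A quick check of this example, sketched in Python (the escaped asterisk works the same way there as in Perl). The string **johndoe is an extra case added for illustration: the leading period may itself match an asterisk.

```python
import re

pattern = re.compile(r".\*johndoe")

for text in ["a*johndoe", "s*johndoe", "*johndoe", "**johndoe"]:
    # A match needs one character of any kind, then a literal asterisk.
    print(f"{text!r}: {bool(pattern.search(text))}")
```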

Example 6:

Normally all matches are case sensitive. You can make any match case insensitive by appending an "i" to the end of the expression.

/example\.com/i

This would match example.com, EXAMPLE.com, ExAmPlE.cOm, etc. Removing the "i":

/example\.com/

would match example.com but not EXAMPLE.com or any other capitalization.
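In Python, which is used here only as a stand-in for Perl, the trailing "i" corresponds to the re.IGNORECASE flag. The sketch compares the two forms on the capitalizations mentioned above.

```python
import re

insensitive = re.compile(r"example\.com", re.IGNORECASE)  # like /example\.com/i
sensitive = re.compile(r"example\.com")                   # like /example\.com/

for text in ["example.com", "EXAMPLE.com", "ExAmPlE.cOm"]:
    print(text, bool(insensitive.search(text)), bool(sensitive.search(text)))
```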

To be on the safe side, put a \ in front of any character in a regular expression that is not a number or letter and that should stand for itself. The same rule holds for putting a / into the regular expression: precede it with a \. Thus, with \ in front of the / and = characters:

/\/CO\=US/

This pattern matches /CO=US and may be useful to those of you who need to deal with X.400 addresses, which contain / characters.
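The "backslash everything that is not a letter or number" rule is easy to automate when a literal string has to be turned into a pattern. The following Python sketch shows the idea; escape_non_alnum is a hypothetical helper, it assumes plain ASCII input, and the sample X.400-style address is invented.

```python
import re

def escape_non_alnum(text):
    """Put a backslash before every character that is not A-Z, a-z or 0-9."""
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", text)

body = escape_non_alnum("/CO=US")
print(body)

# The escaped text matches the literal string it was built from.
address = "/G=John/S=Doe/CO=US/"
print(bool(re.search(body, address)))
```

Perl offers the built-in quotemeta function for the same purpose.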

Example 7:

Normally, all whitespace within a pattern is matched verbatim, but it is sometimes desirable to add some additional space within a pattern to make it more readable. For instance, here is a pattern matching common quoting characters in email (-, :, >, word>):

/^(-|:|>|[a-z]+>)/i

This can be a bit difficult to follow, so we can space it out a bit:

/^( - | : | > | [a-z]+> )/xi

The 'x' modifier specifies that whitespace is to be ignored and makes the pattern a bit easier to read. If you want to match actual whitespace, use '\s'.

Note: the 'x' modifier provides additional functionality to Perl code relating to comments, but because Majordomo2 requires patterns to lie all on a single line, this is not significant here.
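Python's re.VERBOSE flag plays the role of the 'x' modifier, so it can be used to confirm that the compact and spaced-out forms of the quoting pattern behave identically. This is a sketch with made-up sample lines, not Majordomo2 code.

```python
import re

compact = re.compile(r"^(-|:|>|[a-z]+>)", re.IGNORECASE)
spaced = re.compile(r"^( - | : | > | [a-z]+> )", re.IGNORECASE | re.VERBOSE)

lines = ["> quoted", ": quoted", "- quoted", "John> quoted", "not quoted"]
for line in lines:
    print(repr(line), bool(compact.search(line)), bool(spaced.search(line)))

# With whitespace ignored, \s is needed to require a real space.
needs_space = re.compile(r"^ > \s", re.VERBOSE)
print(bool(needs_space.search("> text")), bool(needs_space.search(">text")))
```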

 

Last modified June 15, 2005 by dlschmid
