.: monkey-mind :.

Evaluating variable regular expressions

We all know how to match a variable against a regular expression:

$var =~ /^foo.*(b[Aa]r)?$/;

But what if the regular expression itself is variable? Suppose we want to match the string “foobar” against a pattern held in an input variable $var.

1. Naive approach

We could just say:

'foobar' =~ /$var/;

This works as long as $var contains a valid regular expression. If it contains, say, “foo(bar”, you are likely to see the following (the heredoc delimiter is quoted so that the shell does not expand $var itself):

$ cat > foobar1 << 'EOF'
$var = 'foo(bar';
'foobar' =~ /$var/;
EOF
$ perl foobar1
Unmatched ( in regex; marked by <-- HERE in m/foo( <-- HERE bar/ at foobar1 line 2.

This error is fatal: unless something catches it, it will terminate your application.

2. Less naive approach

The way to catch exceptions in Perl is to use the eval() construct, right? So what if we just pack the whole thing in an eval():

eval "'foobar' =~ /$var/";

Mmmh, try this:

$ cat > foobar2 << 'EOF'
$var = 'foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar';
eval "'foobar' =~ /$var/";
EOF
$ perl foobar2
ALL YOUR BASE ARE BELONG TO US!

Here, $var is expanded first, while Perl builds the double-quoted string, so eval() receives:

'foobar' =~ /foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar/

Perl then evaluates that as code and dutifully prints the message.

If $var comes from someplace other than a constant expression under your direct control, you generally do not want this.

(If you do make this mistake, rest assured; you are in good company: an early version of the RIPE database software (around 1996) contained exactly this bug.)
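
To make the expansion order concrete, here is a minimal sketch that builds the same string foobar2 hands to eval(), but prints it instead of evaluating it. The variable name $code is introduced here purely for illustration.

```perl
use strict;
use warnings;

# Same hostile input as in foobar2 (single quotes: nothing is interpolated here).
my $var = 'foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar';

# Double quotes interpolate $var *before* eval() would ever see the string.
# $code is a hypothetical name used only for this illustration.
my $code = "'foobar' =~ /$var/";

# Print the program text instead of running it.
print "$code\n";
```

What gets printed is the text eval() would compile: three statements rather than one match, because the slashes inside $var end the regex early.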

3. The right approach

So, using eval() was a step in the right direction, but we need to prevent the execution of arbitrary code. Note that if you play around a bit and use the regex from foobar1 in foobar2, or vice versa, you won't see any adverse effect: the string eval() traps the syntax error, and the plain match treats the hostile input as pattern text rather than code. Maybe there's a way to combine the two?

How about:

eval "'foobar' =~ /\$var/";

Or even better:

eval { 'foobar' =~ /$var/ };

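Both forms look like the eval() from foobar2, but they delay expansion of $var until the match itself runs, so the effect is to execute the matching statement from foobar1 inside the eval(). The input is only ever interpreted as a pattern, never as Perl code, and an invalid pattern is trapped instead of killing the program. This avoids the flaws of both previous approaches. The block form is preferable because Perl compiles it along with the rest of the program instead of parsing a string at run time.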

To see for yourself, try running the following script:

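sub do_eval {
    my $var = shift;

    # The block eval traps the fatal error from an invalid regex;
    # $var is interpolated as a pattern only, never run as code.
    if (eval {'foobar' =~ /$var/}) {
        print "foobar matched /$var/\n";
    }
    # eval returned false and left a message in $@: the regex was invalid.
    elsif (length $@) {
        print "ERROR: bad regex given to do_eval()\n";
    }
    # eval returned false with $@ empty: valid regex, no match.
    else {
        print "foobar did not match /$var/\n";
    }
}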

do_eval('foo(bar');
do_eval('foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar');
do_eval('foo.*(bar)?');

This should produce:

ERROR: bad regex given to do_eval()
foobar did not match /foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar/
foobar matched /foo.*(bar)?/
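
If the same input pattern is going to be used more than once, one possible variation is to compile it once with qr// inside the eval block and keep the resulting regex object. This is only a sketch of that idea, not part of the script above; compile_pattern() is a hypothetical helper name.

```perl
use strict;
use warnings;

# compile_pattern() is a hypothetical helper, not part of the original script.
# It returns a compiled regex object, or undef if the pattern is invalid.
sub compile_pattern {
    my $pattern = shift;

    # qr// compiles the interpolated pattern at run time; a syntax error
    # in the pattern dies, and the eval block traps that and yields undef.
    my $re = eval { qr/$pattern/ };
    return $re;
}

for my $pattern ('foo(bar', 'foo.*(bar)?') {
    my $re = compile_pattern($pattern);

    if (!defined $re) {
        print "bad regex: $pattern\n";
    }
    elsif ('foobar' =~ $re) {
        print "foobar matched /$pattern/\n";
    }
    else {
        print "foobar did not match /$pattern/\n";
    }
}
```

Separating compilation from matching means "invalid pattern" and "no match" can no longer be confused, so there is no need to inspect $@ after the match.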