monkey-mind :: code :: perl tricks

Evaluating variable regular expressions

We all know how to match a variable against a regular expression:

$var =~ /^foo.*(b[Aa]r)?$/;

But what if the regular expression itself is variable? Suppose we want to match the string “foobar” against a pattern held in an input variable $var.

1. Naive approach

We could just say:

'foobar' =~ /$var/;

This works as long as $var contains a valid regular expression. If it contains, say, “foo(bar”, you are likely to see the following:

$ cat > foobar1 << 'EOF'
$var = 'foo(bar';
'foobar' =~ /$var/;
EOF
$ perl foobar1
Unmatched ( in regex; marked by <-- HERE in m/foo( <-- HERE bar/ at foobar1 line 2.

This error is fatal, and will usually crash your application.

2. Less naive approach

The way to catch exceptions in Perl is to use the eval() construct, right? So what if we just pack the whole thing in an eval():

eval "'foobar' =~ /$var/";

Mmmh, try this:

$ cat > foobar2 << 'EOF'
$var = 'foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar';
eval "'foobar' =~ /$var/";
EOF
$ perl foobar2
ALL YOUR BASE ARE BELONG TO US!

Here, the $var is expanded first, yielding:

'foobar' =~ /foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar/

Perl then evaluates that and correctly spits out the message.

If $var comes from someplace other than a constant expression under your direct control, you generally do not want this: whoever supplies $var gets to run arbitrary Perl code with the privileges of your program.

(If you do make this mistake, rest assured; you are in good company: an early version of the RIPE database software (around 1996) contained exactly this bug.)
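
To look at the expansion step from foobar2 in isolation, a minimal sketch like the following builds the same string but prints it instead of handing it to eval(). The helper name eval_source is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the source text that the string eval in
# foobar2 would pass to the Perl compiler, without executing it.
sub eval_source {
    my ($var) = @_;
    return "'foobar' =~ /$var/";    # $var is interpolated right here
}

my $var = 'foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar';

# Print the would-be program text. The contents of $var have become
# three separate statements.
print eval_source($var), "\n";
```

Anything that shows up outside the slashes in the printed text would be compiled and run as ordinary Perl by a string eval().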

3. The right approach

So, using eval() was already a step in the right direction, but we want to prevent the execution of arbitrary code. Note that if you play around a bit and use the regex from foobar1 in foobar2, or vice versa, you won't see any adverse effect: the string eval() traps the bad regex, and the plain match treats the injected code as just a pattern. Maybe there's a way to combine the two?

How about:

eval "'foobar' =~ /\$var/";

Or even better:

eval { 'foobar' =~ /$var/ };

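Both look like the eval() from foobar2, but they delay expansion of the $var reference. In the string form, the escaped \$var reaches the compiler as a variable name rather than as its contents. In the block form, the code is compiled together with the rest of the script, so $var is only ever interpolated as a pattern at run time. Either way, the effect is to execute the matching statement from foobar1 within the eval(). This avoids the flaws of the previous two approaches: a bad pattern is trapped instead of being fatal, and the contents of $var are never executed as code.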
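
If the same pattern is going to be used more than once, the block eval() combines nicely with qr//, so the pattern is checked once up front. This is a sketch of that variation, not something from the scripts above; the helper name compile_pattern is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: compile an untrusted pattern once.
# Returns a compiled regex object, or undef if the pattern is invalid
# (in which case the reason is left in $@).
sub compile_pattern {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };    # a bad pattern dies inside the eval
    return $re;
}

for my $pattern ('foo(bar', 'foo.*(bar)?') {
    my $re = compile_pattern($pattern);
    if (defined $re) {
        my $result = ('foobar' =~ $re) ? "yes" : "no";
        print "compiled /$pattern/, foobar matches: $result\n";
    }
    else {
        print "rejected /$pattern/\n";
    }
}
```

Once compile_pattern() has returned a regex object, later matches against it need no eval() of their own, since the compilation step that could fail has already happened.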

To see for yourself, try running the following script:

sub do_eval {
    my $var = shift;

    if (eval {'foobar' =~ /$var/}) {
        print "foobar matched /$var/\n";
    }
    elsif (length $@) {
        print "ERROR: bad regex given to do_eval()\n";
    }
    else {
        print "foobar did not match /$var/\n";
    }
}

do_eval('foo(bar');
do_eval('foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar');
do_eval('foo.*(bar)?');

This should produce:

ERROR: bad regex given to do_eval()
foobar did not match /foo/; print "ALL YOUR BASE ARE BELONG TO US!\n"; /bar/
foobar matched /foo.*(bar)?/
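
In a larger program you will probably want the result as a value rather than as printed text. A minimal sketch, assuming a three-way return value is acceptable (1 for a match, 0 for no match, undef for a bad pattern); the name try_match is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: match $string against an untrusted $pattern.
# Returns 1 on a match, 0 on no match, undef if the pattern is invalid.
sub try_match {
    my ($string, $pattern) = @_;
    # The block eval yields undef when the match dies on a bad pattern.
    return eval { $string =~ /$pattern/ ? 1 : 0 };
}

for my $pattern ('foo(bar', '^bar', 'foo.*(bar)?') {
    my $result = try_match('foobar', $pattern);
    my $text = !defined $result ? "bad regex"
             : $result          ? "matched"
             :                    "did not match";
    print "/$pattern/: $text\n";
}
```

Because the failure case is undef rather than a false-but-defined value, the caller can tell a bad pattern from a non-match without having to inspect $@.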

	Copyright © 2004-2025 Steven Bakker
