AllowedPattern is using python re module with user input

Bug #1217194 reported by Clint Byrum
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Heat
Triaged
Medium
Unassigned

Bug Description

A malicious user could eat up exponential amounts of CPU time and memory by using backreferences. For example

r"(\S+)+x"

If there are no "x" chars in a large body of text, the number of operations increases exponentially with every character being searched.

This capability must be controlled in some way, either by using a less capable regular expression matcher, or finding a way to detect and disable back references.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Just want to provide more data here, as it may not be clear how dangerous processing raw regular expressions can be:

clint@clint-HP:~/src$ cat /tmp/test.py
import re
import sys

n = int(sys.argv[1])

x = re.compile(('a?' * n) + ('a' * n))
print bool(x.match('a' * n))
clint@clint-HP:~/src$ time python /tmp/test.py 20
True

real 0m0.085s
user 0m0.080s
sys 0m0.004s
clint@clint-HP:~/src$ time python /tmp/test.py 25
True

real 0m2.062s
user 0m2.048s
sys 0m0.012s
clint@clint-HP:~/src$ time python /tmp/test.py 30
^CTraceback (most recent call last):
  File "/tmp/test.py", line 7, in <module>
    print bool(x.match('a' * n))
KeyboardInterrupt

real 0m22.336s
user 0m22.284s
sys 0m0.008s
clint@clint-HP:~/src$ time python /tmp/test.py 26
True

real 0m4.165s
user 0m4.148s
sys 0m0.008s
clint@clint-HP:~/src$ time python /tmp/test.py 27
True

real 0m8.501s
user 0m8.456s
sys 0m0.028s

Note that with every added character, the CPU time doubles.

Changed in heat:
importance: Medium → High
Revision history for this message
Jessica Lucci (jessicalucci14) wrote :

https://pypi.python.org/pypi/re2/

Is a python wrapper around google's re2 regex package. It provides the same functionality as most of perl's syntax, although it drops support for backrefs altogether. There's no way to really detect catastrophic backtracking 100% of the time (sans manual intervention), so I think it makes sense to drop the support for backrefs altogether.

Revision history for this message
Angus Salkeld (asalkeld) wrote :

It doesn't seem to install very nicely :(

Downloading/unpacking re2 (from -r requirements.txt (line 15))
  Downloading re2-0.2.20.tar.gz (1.9MB): 1.9MB downloaded
  Running setup.py egg_info for package re2
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/tmp/pip_build_root/re2/setup.py", line 45, in <module>
        raise OSError("Cannot find RE2 library. Please install it from http://code.google.com/p/re2/wiki/Install")
    OSError: Cannot find RE2 library. Please install it from http://code.google.com/p/re2/wiki/Install
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/tmp/pip_build_root/re2/setup.py", line 45, in <module>

    raise OSError("Cannot find RE2 library. Please install it from http://code.google.com/p/re2/wiki/Install")

OSError: Cannot find RE2 library. Please install it from http://code.google.com/p/re2/wiki/Install

Revision history for this message
Angus Salkeld (asalkeld) wrote :
Revision history for this message
Pavlo Shchelokovskyy (pshchelo) wrote :

I tried to use the "regex" package - it installs nicely and runs almost twice faster then the "re" given Clint's example from above. How could I check that it actually releases the GIL as it claims to?

Changed in heat:
assignee: nobody → Pavlo Shchelokovskyy (pshchelo)
Changed in heat:
assignee: Pavlo Shchelokovskyy (pshchelo) → nobody
Revision history for this message
Gregory Haynes (greghaynes) wrote :

Although a more performant regex engine is nice I dont think it actually solves the problem, just (possibly) makes it a little bit harder to make malicious input.

Could we use fnmatch (http://docs.python.org/2/library/fnmatch.html) rather than true regex's? Looking at the tests they appear all to be trivially convertable to this format.

Changed in heat:
assignee: nobody → Matthew Gilliard (matthew-gilliard-u)
Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

I don't think fnmatch will work, actually. The first example I looked at:

allowed_pattern: "[a-zA-Z]+"

is not convertible to fnmatch syntax.

Revision history for this message
jan grant (jan-grant) wrote :

The trick, perhaps, is to *actually* use "true" regexes - that is, regular (in the technical sense of the word) expressions. An NFA implementation typically has reasonable performance. The problem in the example arises when the parentheses (used for syntactic grouping only in REs) are overloaded with support for 'match groups' in the output. Do we ever need to get at these subgroups within heat?

If this is to support an "allowed pattern" only, then finding an RE library that turns off backref/match group support and falls back to a stock NFA implementation would be one potential approach.

Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

So. Options, in order of (my) preference:

(1) Work out how to package re2 so that it can be pip installed cleanly (https://github.com/axiak/pyre2/issues/27)

(2) background process for regex evaluation, with a watchdog timer

(3) inspect user-supplied regexes to filter out the more obvious degenerate cases

(4) restrict allowed expressions to those expressible by fnmatch

(5) write our own string-matching code

I'll continue working on

Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

(sorry, I hit "post" too soon)

I'll continue working on this. I don't see any NFA regex engine for python except the re2 bindings, and if that proves difficult to integrate I'll follow up with (2) unless I get any other feedback.

Changed in heat:
assignee: Matthew Gilliard (matthew-gilliard-u) → nobody
Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

I haven't spent enough time with this to make much progress. I've unassigned myself, although if nobody picks it up I will pick it up if I get more time.

Changed in heat:
assignee: nobody → Nikunj Aggarwal (nikunj2512)
Changed in heat:
assignee: Nikunj Aggarwal (nikunj2512) → nobody
Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

Spoke with someone about this on IRC. Compiling re2 as part of the pip installation should be possible, although it relies on a lib not available by default in RH. Would mean a stackforge fork of pyre2.

Angus Salkeld (asalkeld)
Changed in heat:
importance: High → Medium
Rico Lin (rico-lin)
Changed in heat:
milestone: none → no-priority-tag-bugs
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.