Regular expressions (regex) are a way of describing the order and type of characters that occur in a string. They are often used to validate input or search for “wildcard” matches within a set of strings. If the regular expression (rather than the string it is testing) is generated from untrusted input, or a regex that exists in your codebase is poorly designed, an attacker can perform a regex injection attack by sending malicious input that will take a huge amount of computing power to evaluate.
Risks
Regex injection is often used to perform denial-of-service attacks on vulnerable web-servers.
Regex Injection
The following code samples show a web-server that naively allows a “wildcard” search expression to be sent from the client, and evaluates the string as a regular expression against a list of potential matches:
Python
@app.route('/search/<pattern>')
def search(pattern):
# Don't evaluate regexes passed over HTTP!
regex = re.compile(pattern)
matches = (item for item in ITEMS if regex.match(item))
return matches
Java
public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException
{
// Don't evaluate regexes passed over HTTP!
String search = request.getParameter("search");
List<String> matches = ITEMS.stream()
.filter(item -> item.matches(search))
.collect(Collectors.toList());
response.setContentType("application/json");
PrintWriter out = response.getWriter();
out.write(new Gson().toJson(matches));
out.flush();
}
Node
app.get('/search', (request, response) => {
// Don't evaluate regexes passed over HTTP!
const regex = new RegExp(request.query.search)
const matches = items.filter(item => {
return item.match(regex)
})
response.send(
`<div>
<h1>Matches</h1>
<p>${matches.join('</p><p>')}</p>
<a href="/">Back</a>
</div>`)
})
C
public IActionResult Search(string search)
{
// Don't evaluate regexes passed over HTTP!
var regex = new Regex(search);
var matches = Items.Where(item => regex.IsMatch(item)).ToList();
return Json(matches);
}
Regular expression matching can be very slow if the expression matching engine has to perform a lot of backtracking. This occurs when the regular expression contains repeated matching groups, each of which contains a repeating symbol - which means the engine has to evaluate exponentially more different logical branches while scanning a string.
It this case, an “evil” regex in the following form can be supplied:
(.*a){20}
This pattern means “twenty occurrences of: zero or more characters followed by the letter a”. This expression will require an enormous amount of compute time to evaluate against a string such as:
aaaaaaaaaaaaaaaaaaaa!
Sending many such search requests to your server gives an attacker an easy way to perform a denial-of-service attack.
Regex Patterns in Validation
Attackers can take advantage of inefficient regexes even when they do not have control of the form of the regular expression itself. By passing a maliciously crafted “email” parameter to a sign-up page, for instance, they can probe for slow-running validation expressions and attempt to take your website offline.
Mitigation
- Don’t generate regular expressions directly from untrusted input - define them in your codebase.
- Use a search index like Elasticsearch or Lucene for complex searches, rather than running regular expression matches on large datasets.
-
Check any regexes within your codebase for repeating grouped patterns or ambiguous patterns. “Catastrophic” backtracking can be avoided if you follow these rules of thumb:
- Avoid nested quantifiers like
(a+)+
, where a pattern potentially matching multiple characters can be applied multiple times. - Avoid quantified overlapping disjunctions like
(a|a)+
. - Avoid quantified overlapping adjacencies like
\d+\d+
.
- Avoid nested quantifiers like
Automatically Detecting Unsafe Regular Expressions
The Node.js library safe-regex
library can be used to detect unsafe regular expressions:
$ node safe.js '(x+x+)+y'
false
$ node safe.js '(beep|boop)*'
true
$ node safe.js '(a+){10}'
false
$ node safe.js '\blocation\s*:[^:\n]+\b(Oakland|San Francisco)\b'
true
Make sure to test new expressions you define them in your codebase, and consider adding a unit test to scan for unsafe regular expressions if you develop in Node.js.