What is ReDoS? How to prevent it?

What is ReDoS? How to prevent it?

Published on: April 17, 2024

Regular expression Denial of Service (ReDoS) attacks may cause your web application to be slow and unresponsive. This attack relies on catastrophic backtracking caused by specially constructed input for unoptimized regular expressions.

WARNING: We’re not responsible for damage caused by ReDoS attacks! Malicious hacking is a computer crime and you may face legal consequences! This post is meant to gain awareness about ReDoS attacks and give a way to prevent those vulnerabilities.

"Evil Regex"

Let's take /^(a|a)*$/ as an example. The visualization looks like this:

Visualization of /^(a|a)*$/ regular expression

Visualization of /^(a|a)*$/ regular expression

As you will see, it will first match the beginning of the string, then any amount of either "a" or "a" character, then it will match the end of the string.

Let's try aaaaaaaaaaaaaaaaaaaaaaaab input on a backtracking regular expression engine. There is the beginning, then multiple "a" characters, but there is a "b" character, and there is another possibility of "a", so the regular expression engine will backtrack and try to match "a" again.

a
aa
aaa
aaaa
aaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaa[BACKTRACK]
...

As you will see, there is an exponential function in amount of backtracks (1 + 2 + 22 + ... + 2n = 2n+1 - 1). The amount of steps has an exponential function, so the amount grows very fast (faster than linear function). The fast growth of steps then causes the matching to be very slow for long strings.

Let's take another example: /^a*a*$/, that matches the same strings as previous regular expression.

Visualization of /^a*a*$/ regular expression

Visualization of /^a*a*$/ regular expression

As you will see, it will first match the beginning of the string, then any amount of "a" character, then also any amount of "a" character, then it will match the end of the string.

Let's try the same aaaaaaaaaaaaaaaaaaaaaaaab input as before on a backtracking regular expression engine. There is the beginning, then multiple "a" character (first a*), but there is a "b" character, and there is another possibility of any amount of "a" character (second a*), so the regular expression engine will backtrack and try to match any amount of "a".

a
aa
aaa
aaaa
aaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaa[BACKTRACK]
...

As you will see, the amount of steps after first backtrack is polymonial: (1 + 2 + 3 + 4 + 5 + ... + n = ((n+1) * n)/2) = (n2 + n)/2). The amount of steps has an polymonial function, so the amount grows faster than linear function. The fast growth of steps then causes the matching to be slow for very long strings.

Overall, "Evil Regex" contains:

  • Grouping with repetition
  • Inside the repeated group:
    • Repetition
    • Alternation with overlapping

There are also other examples of "Evil Regex":

  • (a+)+
  • ([a-zA-Z]+)*
  • (a|aa)+
  • (a|a?)+
  • (.*a){x} for x ≥ 10

All the above are causing slowness, when the aaaaaaaaaaaaaaaaaaaaaaaa! string is tried.

ReDoS prevention

You can prevent ReDoS by optimizing regular expressions: /^(a|a)*$/ and /^a*a*$/ could be optimized into /^a*$/.

Optimizations for examples of "Evil Regex":

  • (a+)+ could be optimized into a+
  • ([a-zA-Z]+)* could be optimized into [a-zA-Z]*
  • (a|aa)+ could be optimized into a+
  • (a|a?)+ could be optimized into a*
  • (.*a){x} could be optimized into ([^a]*a){x}

You can check if the regular expression is vulnerable by checking it with ReDoS checker; Devina.io has one of them.

/^(a|a)*$/ regular expression is checked

/^(a|a)*$/ regular expression is checked

You can also switch to non-backtracking regular expression engine, such as RE2 or one used in Rust regex library.