Tom Anderson | 18 Aug 19:52

Re: Performance tip

Brian Rectanus wrote:
> I don't think the comments should be for translating the RE, but
> instead what the RE's purpose is or the logical steps it is
> following in a chained rule.  So, not 'matches foo followed by any
> chars up to bar', but 'detect foobar attack by looking for attack
> signature from CVE-blah'.  Other comments could be an example of an
> attack request, etc.  I think these comments are far more useful than
> trying to explain the RE syntax.  For multiple attack signatures
> combined into a single rule via '|', this becomes hard to comment with
> just Apache comments.

Fair enough, I agree that rules should be commented to document their 
general effect or purpose.  Completely unrelated rules can be kept 
separate for logical clarity without costing too much performance.

> Although Ivan's note was for ORing simple rules -- which I think is
> good -- I am still not convinced that this gives that much performance
> benefit from anything but ORing simple matches like keywords.  In
> other words, I don't think all cases will benefit here.  I still have
> yet to see hard numbers that show that combining 100 complex rules
> down to one has performance benefits worth the extra complexity and
> error-prone nature of more complex rules.
> 
> Anyone have hard stats with numbers they would like to share?

I threw together a quick script to benchmark the difference between 
various optimizations.  You can view/download the script here: 
http://orderamidchaos.com/modsec/regex-benchmark

You can either enter your own request input as a parameter or choose 1, 
2, or 3 to test my built-in samples (actually taken from my audit log).

I tested four different cases using the same 29 distinct rules.  In the 
first case, they are atomized into one line per rule.  Next, they are 
combined into two distinct rules.  Third, they are condensed to reduce 
backtracking.  And finally, they are made non-capturing.  Here are the 
results:

./regex-benchmark 1

                  Rate    atomized     combined    condensed noncapturing
atomized       8029/s          --         -93%         -93%         -93%
combined     113298/s       1311%           --          -0%          -1%
condensed    113752/s       1317%           0%           --          -0%
noncapturing 114323/s       1324%           1%           1%           --

./regex-benchmark 2

                  Rate    atomized     combined    condensed noncapturing
atomized       6894/s          --         -93%         -93%         -93%
combined      99964/s       1350%           --          -0%          -0%
condensed    100161/s       1353%           0%           --          -0%
noncapturing 100238/s       1354%           0%           0%           --

./regex-benchmark 3

                  Rate    atomized     combined    condensed noncapturing
atomized       7854/s          --         -93%         -93%         -93%
combined     109253/s       1291%           --          -1%          -1%
condensed    110287/s       1304%           1%           --          -0%
noncapturing 110636/s       1309%           1%           0%           --
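
For anyone unsure how to read these tables: as far as I can tell, each 
percentage is the row's rate relative to the column's rate.  Here is a 
small Python sketch that reproduces two cells of the first table from 
its Rate column.  The helper name relative_percent is just something I 
made up for illustration, and the formula is my reading of the output, 
checked against the numbers above.

```python
# Rates (iterations per second) copied from the first table above.
rates = {
    "atomized": 8029,
    "combined": 113298,
    "condensed": 113752,
    "noncapturing": 114323,
}


def relative_percent(row: str, col: str) -> float:
    """Percent by which `row` is faster (positive) or slower (negative) than `col`.

    Assumed formula: (row rate / column rate - 1) * 100.
    """
    return (rates[row] / rates[col] - 1.0) * 100.0


print(round(relative_percent("noncapturing", "atomized")))  # 1324
print(round(relative_percent("atomized", "noncapturing")))  # -93
print(round(rates["noncapturing"] / rates["atomized"], 1))  # 14.2
```

So "1324%" faster means about 14.2 times the rate, and "-93%" is the 
same gap seen from the slower side.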

So you can see that there is a significant performance difference: the 
non-capturing condensed rules run at roughly 14 times the rate of the 
atomized rules (over 1300% faster) on all three samples.  Condensing and 
making the groups non-capturing add 1% or less on top of that, but 
simply combining distinct rules into fewer, more complex rules provides 
a major improvement.
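
To make the four cases concrete, here is a minimal sketch of the same 
kind of comparison in Python.  It is not the benchmark script: the 
keywords and the sample request are hypothetical stand-ins for the 29 
rules and the audit log samples, and Python's re module is a different 
engine from Perl's, so the rates it prints will not match the tables 
above.  It only shows how the atomized, combined, condensed, and 
non-capturing variants differ in shape.

```python
import re
import timeit

# Hypothetical keyword signatures standing in for the real rules.
KEYWORDS = ["wget", "wput", "uname", "union select", "passwd", "cmd.exe"]

# Hypothetical request line, not one of the audit log samples.
SAMPLE = "GET /index.php?page=about&id=42 HTTP/1.1"

ALTERNATION = "|".join(re.escape(k) for k in KEYWORDS)

# Atomized: one compiled pattern per rule, tried one after another.
ATOMIZED = [re.compile(re.escape(k)) for k in KEYWORDS]
# Combined: every alternative ORed inside a single capturing group.
COMBINED = re.compile("(" + ALTERNATION + ")")
# Condensed: shared prefixes factored out by hand to reduce backtracking.
CONDENSED = re.compile(r"(w(get|put)|un(ame|ion select)|passwd|cmd\.exe)")
# Non-capturing: the condensed pattern rewritten with (?:...) groups.
NONCAPTURING = re.compile(
    r"(?:w(?:get|put)|un(?:ame|ion select)|passwd|cmd\.exe)"
)


def match_atomized(text):
    """Return True if any of the per-rule patterns matches."""
    return any(p.search(text) for p in ATOMIZED)


def match_single(pattern, text):
    """Return True if the single merged pattern matches."""
    return pattern.search(text) is not None


CASES = {
    "atomized": match_atomized,
    "combined": lambda t: match_single(COMBINED, t),
    "condensed": lambda t: match_single(CONDENSED, t),
    "noncapturing": lambda t: match_single(NONCAPTURING, t),
}


def rate(fn, text=SAMPLE, number=20_000):
    """Return calls per second for fn(text), measured with timeit."""
    seconds = timeit.timeit(lambda: fn(text), number=number)
    return number / seconds


if __name__ == "__main__":
    for name, fn in CASES.items():
        print(f"{name:<13}{rate(fn):>10.0f}/s")
```

All four variants accept and reject the same inputs; only the way the 
alternatives are packaged changes, which is the point of the comparison.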

Granted, this is done natively in Perl, not in Apache.  But since 
ModSecurity on Apache 2 uses PCRE, a Perl-compatible regex library 
(though not Perl's own engine), the comparison should be reasonably 
close.  If anything, I would expect the differences to be more drastic 
in Apache due to additional per-rule overhead such as ModSecurity's 
processing and logging.

Furthermore, this is only a tiny subset of the rules contained in most 
ModSecurity configurations.  Combining dozens or hundreds of rules 
should provide even more benefit.

It would also follow that eliminating rules altogether should provide a 
nice performance boost, so weeding out those which are extraneous may 
well be worth your time.

Tom
