Jeremiah Grossman: web application security scanner vulnerability statistics assessment methodology

What vulnerabilities (black box / white box) scanners can and can't find is one of the most important topics in web application security. Innovation in this area will inevitably determine the industry-accepted vulnerability assessment methodology. Online business depends on this problem being addressed with the right blend of coverage, ease of use, and price. For us vendors it’s a battleground that will decide which solutions are ultimately successful in the market. Competitors who do not adapt and push the limits of the technology will not be around long. I’ve seen this coming for a while. To the delight of many and the frustration of some, I’ve offered presentations, released articles, and written blog posts on the subject.

Since founding WhiteHat Security I’ve believed that there was no way a scanner, built by me or someone else, could identify anywhere close to all the vulnerabilities in all websites. For years I had no good way to explain or justify my position. It wasn’t until I read a fascinating 2003 Dr. Dobb's article (The Halting Problem) that I found the basis of my current understanding. To quote:

"None other than Alan Turing proved that a Turing Machine (and, thus, all the computers we're interested in these days) cannot decide whether a given program will halt or run continuously. By extension, no program can detect all errors or run-time faults in another program."

Brilliant! This article instantly sparked my interest in the halting problem, undecidable problems, and a bunch of other mathematical proofs. Taking what I had learned, I later introduced the “technical vulnerabilities” and “business logic flaws” terminology at a 2004 Black Hat conference. I guess people liked the terminology, because I frequently see others using it. Loosely described, technical vulnerabilities are those that can be found by scanners, while business logic flaws must be found by humans (experts).
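To make the quoted halting-problem argument concrete, here is a minimal Python sketch of the classic diagonal construction behind Turing's proof. It is not taken from the Dr. Dobb's article, and every name in it (`make_contrarian`, `runs_to_completion`, `refuted`) is hypothetical. It assumes the candidate detector is deterministic and always returns an answer. Programs are modeled as generators so that an endless loop can be observed safely with a step budget.

```python
def make_contrarian(claims_to_halt):
    """Build a program that does the opposite of what the detector predicts for it."""
    def contrarian():
        if claims_to_halt(contrarian):
            # The detector says "halts", so loop forever (one yield per step).
            while True:
                yield
        # The detector says "never halts", so fall through and halt at once.
    return contrarian


def runs_to_completion(program, max_steps=1000):
    """Observe actual behaviour: True if the program finishes within max_steps."""
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)
        except StopIteration:
            return True
    # Did not finish within the budget; for `contrarian` this branch is only
    # reached when it has entered its endless loop.
    return False


def refuted(claims_to_halt):
    """True if the detector's prediction about its contrarian program is wrong."""
    program = make_contrarian(claims_to_halt)
    predicted = claims_to_halt(program)
    observed = runs_to_completion(program)
    return predicted != observed


# Two trivial candidate detectors; both are wrong about their own contrarian.
print(refuted(lambda program: True))
print(refuted(lambda program: False))
```

Whatever answer a detector gives, the contrarian program does the reverse, so no detector of this kind can be right about every program. The step from there to "no scanner can find every vulnerability" is the extension the quoted article draws, not something this sketch proves on its own.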

What needs to be understood is that finding each vulnerability class is not exclusive to a single method of identification. Scanners and humans can in fact identify both technical and logical vulnerabilities. How effective they are is the real question. Observe the following diagram. (Don’t take the numbers too literally; the diagram is meant to reinforce concepts more than to give precise measurements.)

Scanners are far more adept at finding the majority of technical vulnerabilities, mostly because the vast number of tests required to be exhaustive is too time consuming for a human (expert).
Humans (experts) are much better suited to finding business logic flaws. The issues are highly complex and require contextual understanding, which scanners (computers) lack.
Neither scanner nor human is likely, or provably able, to reach 100% vulnerability coverage. Software has bugs (vulnerabilities) and that will probably remain the case for a long time to come.
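The point about exhaustive testing being too time consuming for a human can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only that: every number in it (pages, inputs, payloads, seconds per test) is a made-up assumption for illustration, not a WhiteHat measurement, and the function names are hypothetical.

```python
def exhaustive_test_count(pages, inputs_per_page, payloads_per_input):
    """Requests needed to try every payload in every input on every page."""
    return pages * inputs_per_page * payloads_per_input


def hours_needed(tests, seconds_per_test):
    """Wall-clock hours to run `tests` one after another."""
    return tests * seconds_per_test / 3600


# Illustrative assumptions only: a modest site and a modest payload list.
tests = exhaustive_test_count(pages=50, inputs_per_page=5, payloads_per_input=100)

human_hours = hours_needed(tests, seconds_per_test=30)     # by hand, in a browser
scanner_hours = hours_needed(tests, seconds_per_test=0.1)  # automated requests

print(f"{tests} tests: about {human_hours:.0f} hours by hand, "
      f"about {scanner_hours:.1f} hours for a scanner")
```

Under these assumptions the manual effort runs to weeks of working time while the automated run takes under an hour, which is the gap the first point describes. The same arithmetic says nothing about business logic flaws, where the hard part is knowing what to test rather than how many times.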

The coverage scales will slide in different directions with each website encountered. A while back I posted some stats on how vulnerabilities are identified here at WhiteHat. Based on 100 websites, here are the findings.

These numbers are neat on a variety of levels. As more people dive into web application security, we’ll inevitably see more measurements, reviews, and statistics released. The cloud of the unknown will lift and the most effective assessment methodology will reveal itself. I welcome this trend, as I think I'm on the right track. Brass tacks...

From what I've seen, malicious web attacks typically target websites one by one rather than taking a shotgun-blast approach. The bad guys aren’t using commercial scanners, performing full-blown assessments, or even using open source tools for that matter, frankly because they don’t need to. A web browser and a single vulnerability are all they really need to profit. That’s why I’ve been harping on comprehensiveness and on measuring the effectiveness of assessment methodologies for so long. Finding some of the vulnerabilities, some of the time, on some of the websites ain’t going to cut it. You will get hacked this way. We need to find them all, all of the time, and as fast as possible.

My crystal ball (1-3 years):
1) Standalone black box scanners will transfer from the hands of security personnel to those in development and QA. They’ll merge with the white box scanners and finally integrate tightly inside established IDEs.
2) The one-off vulnerability assessment market (professional services) will give way to a managed service model, just as it already has in the network VA world.
3) Major industry consolidation will occur as customers look for single security vendors that can address the entirety of their vulnerability stack.

Jeremiah Grossman

Friday, November 17, 2006

What scanners can and can't find. Who cares and why does it matter?