Saturday, October 20, 2007

The Best Web Application Vulnerability Scanner in the World

Update: 10.23.2007 - Wow, people really seem to get emotional about measuring scanner effectiveness. I thought the war of words was limited to the comments on my blog and RSnake's, but apparently NTOBJECTives came under a DDoS attack after Larry's review was released. Looks like it has died down, because I can get to their website.

--

Looks like this...
Within a few moments of pressing the scan button it'll find every vulnerability, with zero false positives, generate a pretty-looking report, and voila, you're compliant with GLBA, HIPAA, and PCI-DSS. Of course, we all know such a web application scanner is simply not possible to create, for a variety of reasons. That's why each feature or configuration option in the GUI should be considered as compensating for a technical limitation. For example, if ScannerA has a feature ScannerB lacks, this doesn't mean ScannerB is missing something. It could very well mean ScannerB overcame a hurdle that ScannerA still needs a human to complete. Or, maybe it means ScannerB is indeed limited. It's often hard to tell which is which, even for an expert.

As another example, some scanners have a GUI option to configure what a customized 404 Not Found page looks like. Others don't need any assistance, because an algorithm handles that logic automatically. Some scanners offer both options, just in case. There are many similar examples. Until the scanner is run on a target website, it is impossible to tell what the outcome will be; and even after the fact, it's still tricky to figure out what happened. As such, scanners are not designed to perform an entire vulnerability assessment on their own. At least I hope not. Scanners are designed to help a person save time in completing the process. Unless this is explained up front to customers, they'll have improperly set expectations and eventual disappointment.
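For the curious, here is a rough sketch of what such an algorithm might do. The bogus path and similarity threshold below are made up for illustration, not how any particular product implements it; the idea is simply to request a page that cannot exist, then treat anything nearly identical to that response as the site's custom "not found" page:

```python
import difflib
import urllib.error
import urllib.request

def fetch(url):
    """Return the response body, even for non-200 statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode(errors="replace")
    except urllib.error.HTTPError as err:
        return err.read().decode(errors="replace")

def looks_like_not_found(base_url, candidate_path, threshold=0.9):
    """Heuristic custom-404 check: compare the candidate page against
    whatever the site serves for a clearly bogus path."""
    # Baseline: the site's response for a path that almost certainly
    # doesn't exist, even if it comes back with a 200 status code.
    baseline = fetch(base_url + "/this-page-should-never-exist-xyz123")
    page = fetch(base_url + candidate_path)

    # If the candidate page is nearly identical to the baseline,
    # treat it as the site's custom "not found" page.
    similarity = difflib.SequenceMatcher(None, baseline, page).ratio()
    return similarity >= threshold
```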

This brings us to the only two reviews published this year: one by Jordan Wiens of Network Computing and one by Larry Suto (an application security consultant in San Francisco). Jordan's Web Application Scanners Rolling Reviews focused on Ajax vulnerability detection with proper tool configuration, while Larry analyzed coverage depth in plain ol' default scan mode. Guess what!? The scanners ranked very differently. In Network Computing, though the scanners tested (WebInspect, Acunetix, Hailstorm, and N-Stalker) claimed to support Ajax automatically, all of them failed; only AppScan succeeded. Then Larry's results had little-known NTOSpider taking top honors for its ability to scan deeper than both AppScan and WebInspect. Strange, eh? I highly recommend reading both reviews and drawing your own conclusions.

Most disappointing for WhiteHat Security, though, Jordan ran out of time on his project before he was able to write up a full review of our Sentinel offering. I was really excited about the opportunity to demonstrate how our SaaS technology could spank all the scanners in Ajax support, vulnerability identification, false-positive rate, ease of deployment, reporting, ROI, and any other metric that matters. Another time and another place, I guess. However, we were able to run Sentinel through the same environment as everyone else and generate results. So, Jordan was able to publish the following kind words in his follow-up, which we appreciated. It speaks for itself:

“Besides nabbing all the vulnerabilities discovered by the scanning products, WhiteHat's Sentinel identified e-mail-based XSS vulnerabilities in our sample Web mail application through its combination of manual testing and automated scanning. WhiteHat navigated all our sample Ajax applications without any trouble.


Based on our testing, if you want automated scans of Ajax applications, your best options are Sentinel and AppScan.”


Here's what I want to leave this post with: evaluating web application vulnerability scanners is a difficult task for anyone. A person has to be knowledgeable in web application security, capable of understanding the report results, not to mention able to set up enough real-world websites to make the comparison reasonable. How many people does that eliminate? Then, what to measure? Everyone has a different point of view on what is meaningful. Do we measure vulns found, Ajax support, scan depth, usability, reporting capabilities, etc.? Each metric has value, but not to everyone all the time. To assist, Anurag Agarwal is helping WASC create a Web Application Security Scanner Evaluation Criteria (WASSEC) with input from the community. It should be highly useful when completed.

10 comments:

Jordan said...

I was disappointed that I didn't get to finish the full set of SaaS comparisons as well. It would have been much more valuable, I think, to have a more complete set of options laid out for folks.

Larry's report is quite interesting. I talked to NTObjectives late in the review process about possibly participating, and they were quite confident in their code coverage. But not only would we have had to squeeze them in at the very end time-wise, they also didn't claim to have the Ajax support we were testing for, and thus decided not to participate. Given the requirements of the review, that was probably the right decision.

Anonymous said...

So everyone agrees that Larry's report was bunk?

That, or IBM is looking to buy WhiteHat (and possibly using their money/power over NWC to sway opinion), or WhiteHat is looking to be bought by IBM.

I'm more likely to believe just the former.

Jeremiah Grossman said...

Heya Jordan,

SaaS match-up woulda been nice. :) Maybe something for early 2008 though. Someone is going to have to do it eventually. And NTO's performance actually didn't surprise me in the least. Very smart guys worked on that project and while relatively unknown, the technology has spent years in development. Personally I think we can simplify these reviews down to a few criteria, as long as the test websites are reasonable enough.

Hope all is well.

Jeremiah Grossman said...

ehehe, good ol' ntp.

For my part I don't think Larry's report was bunk at all. These were his results based on his criteria. There is no question that scanners also perform wildly differently from website to website.

Are you insinuating that Jordan was somehow bought off by IBM on his report? As someone who worked closely with Jordan throughout the process, I'm sure this was not the case. Time was indeed short.

dre said...

Yeah, I didn't actually think that was the case. Sorry for insinuating anything; it wasn't intended that way.

Can you create a post that addresses why you liked/didn't like certain things about Larry's paper, please?

Jeremiah Grossman said...

OMG, ntp == dre and dre == ntp! :)

I suppose I can dig into the report and perhaps find something people may find compelling. Who knows though. While there might not be anything particularly wrong with the report, I mean hey, crawl depth was his focus and it's titled that way... that's also not to say he focused on the most important things possible... and of course that's all subjective anyway. For myself, I prefer time-to-hack measurements, and I've never actually seen one done yet.

Jeremiah Grossman said...

Oh oh, and that doesn't mean the latter is true either. :)

dre said...

charlie miller talked about a time-to-hack measurement (as well as a `number of times discovered' metric) in his talks at toorcon this past weekend.

i think the real benefit of code coverage is to look at attack paths in unique ways (what charlie's talk was all about). code coverage itself is to be used as a tool, with no strict methodology or end-result from it. in other words, people should be using code coverage (or fuzzer tracking ... fault-injection trackers) in order to increase time between findings (time to hack).

i don't think code coverage (from both a software security testing standpoint and a developer-tester or quality tester perspective) is best utilized as a benchmarking tool or measurement. there are other, better benchmarking tools/measurements such as binary classifiers - sensitivity and specificity.
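as a rough illustration (the counts below are made up, and this assumes a benchmark site where the real vulnerabilities are known ahead of time), sensitivity and specificity fall out of a simple confusion-matrix tally:

```python
def classifier_metrics(tp, fp, fn, tn):
    """sensitivity: fraction of real vulns the scanner actually reported.
    specificity: fraction of non-vulnerable cases it correctly left alone."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# made-up counts from a hypothetical benchmark run
print(classifier_metrics(tp=42, fp=7, fn=13, tn=938))
```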

if we want web application security scanners to be "crawl tools" as their primary benchmark criterion, then i'm very worried about the state of the art in application security assurance. we're talking about spidering technology that is 12 years old and freely available in many mature open-source tools.

How do I shot web?

note: the above url is down right now (ed is having problems) so instead try this link for now

Jeremiah Grossman said...

hey dre, you make some good points and observations, and I'm not inclined to disagree with any of them... except the crawling bit.

While crawling is definitely older technology, it's anything but well understood when it comes to website VA. Crawling a website is WAY more involved now than parsing an HTML document and finding link tags. In VA, you have to worry about state management, forms, multiple logic flows, CAPTCHAs, JavaScript, and other RIA. It sounds simple because search engines are a dime a dozen, but I'd be surprised if commercial-grade VA products way outperformed any crawler available anywhere else, open source or otherwise.
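To make the contrast concrete, here's a toy sketch (illustrative only, not any vendor's actual code, and deliberately simplistic) of the classic parse-the-HTML-and-follow-links spider, with a comment marking exactly where VA crawling gets harder:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """The 'classic' spider's parser: pull href values out of anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def naive_crawl(start_url, limit=100):
    seen, queue = set(), [start_url]
    while queue and len(seen) < limit:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url).read().decode(errors="replace")
        except Exception:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(urljoin(url, link) for link in parser.links)
        # Everything VA crawling adds is missing here: no login or session
        # state, no form submission, no JavaScript/Ajax execution, no
        # CAPTCHA handling, no awareness of multi-step logic flows.
    return seen
```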

In fact, that might be a good topic for a future blog post.

Jordan said...

Thanks Jeremiah, appreciate the support.

@Dre: Hah! IBM swaying opinion? Not a chance.

1) I have no idea who pays what in the magazine.

2) I'm not a full-time writer, I'm a full-time security guy who just happens to write and work for a magazine on the side, so I don't really know all that much about what goes on behind the scenes at the magazines operationally.

3) No one else changed a single bit of the results in the reviews or suggested changes to the details, findings, etc. Fortunately, though, they do clean my writing up significantly, for which every person who reads the articles should be grateful.

4) I've found that all of the writers I've worked with at Information Week and Network Computing are more than willing to write what they believe is right, no matter who it's about or whether they're giving the magazine money. We are sometimes wrong (everybody's human), but when that happens it's certainly not because of outside influence.

5) I was actually somewhat surprised by AppScan taking the top spot. When I had last looked at these products (not for the magazine but for my day job) some time ago, Cenzic won out, as both WF and SPI had reliability issues.

So if I had any preconceived notions, it certainly wasn't that Watchfire would be the winner among the products.

All I did was set the criteria and let the products prove themselves.

Which is exactly the point Jeremiah made about NTO and the other report. Different needs produce different results and this isn't surprising or even a bad thing.

If one product was always terrible, they'd eventually go out of business. There's usually (though not always) a legitimate scenario where any given product is more appropriate.

The issue is whether that scenario is relevant to more or less folks in the real world.

Different strokes for different folks.

Which is again why I was bummed not to have more SaaS coverage. It's relevant to folks making buying decisions these days and therefore it's good to include in a review. I am glad that I at least got to highlight it somewhat, but do agree with Jeremiah that a more detailed comparison in the future would be useful.