Wednesday, December 22, 2010

Which mountain would you rather climb?

Some Web application vulnerability scanners, dynamic and static analysis, are designed for comprehensiveness over accuracy. For others, the exact opposite is true. The tradeoff is that as the number of "checks" a scanner attempts increases causes the amount of findings, false-positives, scan times, site impact, and required man-hour investment to grow exponentially. To allow users to choose their preferred spot between those two points, comprehensiveness and accuracy, most scanners offer a configuration dial typically referred to as a "policy." Policies essentially ask, "What do you want to check for?" Whichever direction the comprehensiveness dial is turned will have a profound effect on the workload to analyze the results. Only this subject isn't discussed much.

Before going further we need to define a few terms. A "finding" is something reported that’s of particular interest. It may be a vulnerability, the lack of a “best-practice” control, or perhaps just something weird warranting further investigation. Within those findings are sure to be "false-positives" (FP) and "duplicates" (DUP). A false-positive is a vulnerability that’s reported, but really isn’t one for any variety of potential reasons. Duplicates are when the same real vulnerability is reported multiple times. "False-negatives," (FN) which reside outside the findings pool, are real vulnerabilities with true organizational risk, that for whatever reason the scanner failed to identify.

Let’s say the website owner wants a "comprehensive" scan. A scan that will attempt to identify just about everything modern day automation is capable of checking for. In this use-case it is not uncommon for scanners to generate literally thousands, often tens or hundreds of thousands, of findings that need to be validated to isolate the ~10% of stuff that’s real (yes, a 90% FP/DUP rate). For some spending many many hours vetting is acceptable. For others, not so much. That’s why the larger product vendors all have substantial consulting divisions to handle deployment and integration post-purchase. Website owners can also opt for a more accurate (point-and-shoot) style of scan where comprehensiveness may be cut down by say half, but thousands of findings becomes a highly accurate hundreds or dozens thereby decreasing validation workload to something manageable.

At this point it is important to note, as illustrated in the diagram, even today’s top-of-the-line Web application vulnerability scanners can only reliably test for roughly half of the known Web application classes of attack. These are the technical vulnerability (aka syntax related) classes including SQL Injection, Cross-Site Scripting, Content-Spoofing, and so on. This holds true even when the scanner is well-configured (logged-in and forms filled out). Covering the other half, the business logic flaws (aka semantic related) such as Insufficient Authentication, Insufficient Authorization, Cross-Site Request Forgery, etc. require some level of human analysis.

With respect to scanner output, an organizations tolerance for false-negatives, false-positives, and personnel resources investment is what should dictate the type of product or scan configuration selected. The choice becomes a delicate balancing act. Dialing up scanner comprehensiveness too high, get buried in a tsunami of findings. What good is comprehensiveness if you can’t find the things that are truly important? On the other hand dialing down the noise too far reduces the number of vulnerabilities identified (and hopefully fixed) to the point where there's marginal risk reduction because the bad guys could easily find one that was missed. The answer is somewhere in the middle and one of risk management.

About 20 km west of Mount Everest (29,029 ft. ASL) is a peak called Cho Oyu (26,906 ft. ASL), the 6th highest mountain in the world. The difference being the two is only 2,000 ft. For some mountain climbers the physical difficulty, risk of incident, and monetary expense of that last 2,000 ft necessary to summit Everest is just not worth it. For others, it makes all the difference in the world. So, just like scanner selection, an individual decision must be made. Of course the vendor in me says just use WhiteHat Sentinel and we’ll give you a lift to the top of whichever mountain you’d like. :)
Vendors take Note: Historically, whenever I've discussed scanners and scanner performance the comments would typically be superficial marketing BS with no willingness to supply evidence to backup the claims. As always I encourage open discourse, but respectfully if you make claims about your product performance, and I sincerely hope you do, please be ready to do so with data. Without data, as Jack Daniel as concisely stated, we'll assume you are bluffing, guessing, or lying.


AppSec said...

I hate the phrasing false positive. It's like a double negative.

There's really misconfigured/defined policies and then theres an acceptance of a valid finding.

Those findings of the word password in a comment are correct, but they carry 0% risk. that is not a false postive.

The scanner that returns a XSS vulnerability because it doesn't undrestand that a static final variable doesn't change and won't accept user input is a misconfigured policy.

They are two different things.

Jeremiah Grossman said...

@AppSec: Much of the infosec industry is littered with terms or phrases we'd prefer to do away with. Unfortunately, changing ingrained language is tough, and engaging in such battles is one where we should choose very wisely. For me, the general use of the term false-positive is OK in my estimation. Cross-Site Scripting, now that's a whole other matter entirely. :)

eric said...

Hey,if thats the way,i wud like to climb to mount everest,as if we get there,getting famour will not take time