Before going further we need to define a few terms. A "finding" is something reported that’s of particular interest. It may be a vulnerability, the lack of a “best-practice” control, or perhaps just something weird warranting further investigation. Within those findings are sure to be "false-positives" (FP) and "duplicates" (DUP). A false-positive is a vulnerability that’s reported, but really isn’t one, for any of a variety of reasons. Duplicates are when the same real vulnerability is reported multiple times. "False-negatives" (FN), which reside outside the findings pool, are real vulnerabilities carrying true organizational risk that, for whatever reason, the scanner failed to identify.
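To make the terminology concrete, here is a minimal sketch (Python, with hypothetical names; not taken from any particular scanner’s data model) of how a triage pipeline might label scanner output. Note that false-negatives, by definition, never show up in this data at all.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class Disposition(Enum):
        CONFIRMED = "confirmed"        # a real vulnerability, verified by a human
        FALSE_POSITIVE = "fp"          # reported, but not actually a vulnerability
        DUPLICATE = "dup"              # the same real vulnerability reported again

    @dataclass
    class Finding:
        url: str
        vuln_class: str                             # e.g. "SQL Injection"
        disposition: Optional[Disposition] = None   # None until someone vets it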
Let’s say the website owner wants a "comprehensive" scan, one that will attempt to identify just about everything modern-day automation is capable of checking for. In this use-case it is not uncommon for scanners to generate literally thousands, often tens or hundreds of thousands, of findings that need to be validated to isolate the ~10% of stuff that’s real (yes, a 90% FP/DUP rate). For some, spending many, many hours vetting is acceptable. For others, not so much. That’s why the larger product vendors all have substantial consulting divisions to handle deployment and integration post-purchase. Website owners can also opt for a more accurate (point-and-shoot) style of scan, where comprehensiveness may be cut down by, say, half, but thousands of findings become a highly accurate few hundred or few dozen, decreasing the validation workload to something manageable.
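To put rough numbers on that trade-off, here is a back-of-the-envelope sketch; the findings counts, real-finding rates, and minutes-per-finding are illustrative assumptions, not measurements:

    # Back-of-the-envelope validation workload, illustrative numbers only.
    MINUTES_PER_FINDING = 3   # assumed average time for a human to vet one finding

    def workload(findings, real_rate):
        real_vulns = int(findings * real_rate)
        vetting_hours = findings * MINUTES_PER_FINDING / 60
        return real_vulns, vetting_hours

    # "Comprehensive" scan: 10,000 findings, ~10% real (90% FP/DUP).
    print(workload(10_000, 0.10))   # -> (1000, 500.0): 1,000 real vulns, ~500 hours of vetting

    # "Point-and-shoot" scan: a few hundred findings, most of them real.
    print(workload(300, 0.90))      # -> (270, 15.0): 270 real vulns, ~15 hours of vetting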
At this point it is important to note, as illustrated in the diagram, that even today’s top-of-the-line Web application vulnerability scanners can only reliably test for roughly half of the known classes of Web application attack. These are the technical vulnerability (a.k.a. syntax-related) classes, including SQL Injection, Cross-Site Scripting, Content Spoofing, and so on. This holds true even when the scanner is well-configured (logged-in and forms filled out). Covering the other half, the business logic flaws (a.k.a. semantic-related) such as Insufficient Authentication, Insufficient Authorization, Cross-Site Request Forgery, etc., requires some level of human analysis.
With respect to scanner output, an organization's tolerance for false-negatives and false-positives, and the personnel resources it is willing to invest, are what should dictate the type of product or scan configuration selected. The choice becomes a delicate balancing act. Dial up scanner comprehensiveness too high and you get buried in a tsunami of findings; what good is comprehensiveness if you can’t find the things that are truly important? On the other hand, dialing down the noise too far reduces the number of vulnerabilities identified (and hopefully fixed) to the point where there's marginal risk reduction, because the bad guys could easily find one that was missed. The answer lies somewhere in the middle and is a matter of risk management.
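One way to frame that balancing act, again with made-up numbers, is to score each candidate scan configuration by both the real vulnerabilities it is expected to miss (the false-negatives) and the validation hours its findings demand:

    # Hypothetical scan profiles: findings produced and the fraction of the site's
    # real vulnerabilities each is assumed to catch. All numbers are illustrative.
    profiles = {
        "comprehensive":   {"findings": 10_000, "coverage": 0.90},
        "balanced":        {"findings": 2_000,  "coverage": 0.70},
        "point-and-shoot": {"findings": 300,    "coverage": 0.45},
    }

    TRUE_VULNS = 1_000          # assumed real vulnerabilities present (unknowable in practice)
    MINUTES_PER_FINDING = 3     # assumed vetting time per finding

    for name, p in profiles.items():
        found = int(TRUE_VULNS * p["coverage"])
        missed = TRUE_VULNS - found                          # false-negatives left behind
        hours = p["findings"] * MINUTES_PER_FINDING / 60     # validation workload
        print(f"{name:>15}: ~{found} found, ~{missed} missed, ~{hours:.0f} vetting hours")

Under these assumed numbers the "balanced" profile misses far fewer vulnerabilities than point-and-shoot while demanding a fraction of the vetting hours of the comprehensive scan, which is exactly the middle-ground, risk-management call described above.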
About 20 km west of Mount Everest (29,029 ft. ASL) is a peak called Cho Oyu (26,906 ft. ASL), the 6th highest mountain in the world. The difference between the two is only about 2,000 ft. For some mountain climbers the physical difficulty, risk of incident, and monetary expense of that last 2,000 ft necessary to summit Everest is just not worth it. For others, it makes all the difference in the world. So, just like scanner selection, an individual decision must be made. Of course the vendor in me says just use WhiteHat Sentinel and we’ll give you a lift to the top of whichever mountain you’d like. :)
Vendors take note: Historically, whenever I've discussed scanners and scanner performance, the comments would typically be superficial marketing BS with no willingness to supply evidence to back up the claims. As always I encourage open discourse, but respectfully, if you make claims about your product's performance, and I sincerely hope you do, please be ready to support them with data. Without data, as Jack Daniel has concisely stated, we'll assume you are bluffing, guessing, or lying.