Monday, May 07, 2018

All these vulnerabilities, rarely matter.

There is a serious misalignment of interests between Application Security vulnerability assessment vendors and their customers. Vendors are incentivized to report everything they possibly can, even issues that rarely matter. On the other hand, customers just want the vulnerability reports that are likely to get them hacked. Every finding beyond that is a waste of time, money, and energy, which is precisely what’s happening every day. Let’s begin exploring this with some context:

Pick up any Application Security vulnerability statistics report published over the last 10 years and it’ll state that the vast majority of websites contain one or more serious issues, typically dozens. To be clear, we’re NOT talking about websites infected with malvertisements or network-based vulnerabilities that can be trivially found via Shodan and the like. Those are separate problems. I’m talking exclusively about Web application vulnerabilities such as SQL Injection, Cross-Site Scripting, Cross-Site Request Forgery, and several dozen more classes. The data shows only half of those reported vulnerabilities ever get fixed, and doing so takes many months. Pair this with Netcraft’s data that states there are over 1.7B sites on the Web. Simple multiplication tells us that’s A LOT of vulnerabilities in the ecosystem lying exposed.

The most interesting and unexplored question to me these days is NOT the sheer size of the vulnerability problem, or why so many issues remain unresolved, but instead figuring out why all those ‘serious’ website vulnerabilities are NOT exploited. Don’t get me wrong, a lot of websites certainly do get exploited, perhaps on the order of millions per year, but it’s certainly not in the realm of tens or even hundreds of millions like the data suggests it could be. And the fact is, for some reason, the vast majority of plainly vulnerable websites with these exact issues remain unexploited for years upon years.

Some possible theories as to why are:
  1. These ‘vulnerabilities’ are not really vulnerabilities in the directly exploitable sense.
  2. The vulnerabilities are too difficult for the majority of attackers to find and exploit.
  3. The vulnerabilities are only exploitable by insiders.
  4. There aren’t enough attackers to exploit all or even most of the vulnerabilities.
  5. There are more attractive targets or exploit vectors for attackers to focus on.
Other plausible theories?

As someone who worked in the Application Security vulnerability assessment vendor space for 15+ years, here is something to consider that speaks to theories #1 and #2 above.

During the typical sales process, ‘free’ competitive bakeoffs with multiple vendors are standard practice. 9 out of 10 times, the vendor who produces the best results in terms of high-severity vulnerabilities with low false-positives will win the deal. As such, every vendor is heavily incentivized to identify as many vulnerabilities as they can to demonstrate their skill and overall value. Predictably then, every little issue will be reported, from the most basic information disclosure issues to the extremely esoteric and difficult to exploit. No vendor wants to be the one who missed or didn’t report something that another vendor did and risk losing a deal. More is always better. As further evidence, ask any customer about the size and fluff of their assessment reports.

Understanding this, the top vulnerability assessment vendors invest millions upon millions of dollars each year in R&D to improve their scanning technology and assessment methodology to uncover every possible issue. And it makes sense because this is primarily how vendors win deals and grow their business.

Before going further, let’s briefly discuss the reason why we do vulnerability assessments in the first place. When it comes to Dynamic Application Security Testing (DAST), specifically testing in production, the whole point is to find and fix vulnerabilities BEFORE an attacker will find and exploit them. It’s just that simple. And technically, it just takes the exploitation of one vulnerability for the attacker to succeed.

Here’s the thing: if attackers really aren’t finding, exploiting, or even caring about these vulnerabilities, as we can infer from the supplied data, the value in discovering them in the first place becomes questionable. The application security industry is heavily incentivized to find vulnerabilities that for one reason or another have little chance of actual exploitation. If that’s the case, then all those vulnerabilities that DAST is finding rarely matter much and we’re collectively wasting precious time and resources focusing on them.

Let’s tackle Static Application Security Testing (SAST) next. 

The primary purpose of SAST is to find vulnerabilities during the software development process BEFORE they land in production where they’ll eventually be found by DAST and/or exploited by attackers. With this in mind, we must then ask what the overlap is between vulnerabilities found by SAST and DAST. If you ask someone who is an expert in both SAST and DAST, specifically those with experience in this area of vulnerability correlation, they’ll tell you the overlap is around 5-15%. Let’s state that more clearly: somewhere between 5-15% of the vulnerabilities reported by SAST are found by DAST. And let’s remember, from an I-don’t-want-to-be-hacked perspective, DAST or attacker-found vulnerabilities are really the only vulnerabilities that matter. Conceptually, SAST helps find those issues earlier. But, does it really? I challenge anyone, particularly the vendors, to show actual broad field evidence.

Anyway, what then are all those OTHER vulnerabilities that SAST is finding, which DAST / attackers are not? Obviously, it’ll be some combination of theories #1 - #3 above. They’re not really vulnerabilities, they’re too difficult to remotely find/exploit, or attackers don’t care about them. In any case, what’s the real value for the other 85-95% of vulnerabilities reported by SAST? A: Not much. If you want to know why so many reported 'vulnerabilities' aren’t fixed, this is your long-winded answer.
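To make the overlap figure concrete, here is a minimal sketch of how it could be measured once SAST and DAST findings have been correlated to a shared identifier. The finding IDs and counts are made up purely for illustration, and `overlap_ratio` is a hypothetical helper, not something taken from any vendor’s tooling.

```python
def overlap_ratio(sast: set, dast: set) -> float:
    """Fraction of SAST findings that DAST also reported."""
    if not sast:
        return 0.0
    return len(sast & dast) / len(sast)


# Made-up, already-correlated finding identifiers (assumption: some earlier
# step matched SAST and DAST results to the same underlying issue).
sast_findings = {f"finding-{i}" for i in range(1, 21)}                   # 20 SAST findings
dast_findings = {"finding-3", "finding-17", "finding-90", "finding-91"}  # 4 DAST findings, 2 shared

ratio = overlap_ratio(sast_findings, dast_findings)
print(f"Found by both SAST and DAST: {ratio:.0%}")
print(f"Found by SAST only:          {1 - ratio:.0%}")
```

With these invented sets, 2 of the 20 SAST findings are also seen by DAST, which lands in the middle of the 5-15% range. The hard part in practice is the correlation step, which this sketch simply assumes has already been done.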

This is also why cyber-insurance firms feel comfortable writing policies all day long, even if they know full well their clients are technically riddled with vulnerabilities, because statistically they know those issues are unlikely to be exploited or lead to claims. That last part is key: claims. Exploitation of a vulnerability does not automatically result in a ‘breach,’ which does not necessarily equate to a ‘material business loss,’ and loss is the only thing the business or their insurance carrier truly cares about. Many breaches do not result in losses. This is a crucial distinction that many InfoSec pros are unable to make: breach versus loss. They are NOT the same thing.

So far we’ve discussed the misalignment of interests between Application Security vulnerability assessment vendors and their customers. The net result of which is that we’re wasting huge amounts of time, money, and energy finding and fixing vulnerabilities that rarely matter. If so, the first thing we need to do is come up with a better way to prioritize and justify remediation, or not, of the vulnerabilities we already know exist and should care about. Secondly, we must more efficiently invest our resources in the application security testing process.

We’ll begin with the simplest risk formula: probability (of breach) x loss (expected) = risk.

Let’s make up some completely bogus numbers to fill in the variables. In a given website we know there’s a vanilla SQL Injection vulnerability in a non-authenticated portion of the application, which has a 50% likelihood of being exploited over a one-year period. If exploitation results in a material breach, the expected loss is $1,000,000 for incident handling and clean up. Applying our formula:

$1,000,000 (expected loss) x 0.5 (probability of breach) = $500,000 (risk)

In which case, it can be argued that if the SQL injection vulnerability in question costs less than $500,000 to fix, then that’s the reasonable choice. And, the sooner the better. If remediation costs more than $500,000, and I can’t imagine why, then leave it as is. The lesson is that the less a vulnerability costs to fix the more sense it makes to do so. Next, let’s change the variables to the other extreme. We’ll cut the expected loss figure in half and reduce the likelihood of breach to 1% over a year.

$500,000 (expected loss) x 0.01 (probability of breach) = $5,000 (risk)

Now, if remediation of the SQL Injection vulnerability costs less than $5,000, it makes sense to fix it. If more, or far more, then one could argue it makes business sense not to. This is the kind of decision that makes the vast majority of information security professionals extremely uncomfortable, which is why they instead like to ask the business to “accept the risk.” This way their hands are clean, they don’t have to expose their inability to do risk management, and they can safely pull an “I told you so” should an incident occur. Stated plainly, if your position is recommending that the business should fix each and every vulnerability immediately regardless of the cost, then you’re really not on the side of the business and you will continue being ignored.
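For anyone who prefers to see the arithmetic spelled out, the two scenarios above can be sketched in a few lines. The function names and the $20,000 remediation cost are my own assumptions for illustration; the probabilities and loss figures are the same bogus numbers used above.

```python
def annual_risk(probability_of_breach: float, expected_loss: float) -> float:
    """risk = probability (of breach) x loss (expected)."""
    return probability_of_breach * expected_loss


def worth_fixing(remediation_cost: float, risk: float) -> bool:
    """Fixing makes business sense when it costs less than the risk it removes."""
    return remediation_cost < risk


# Scenario 1: 50% likelihood of breach, $1,000,000 expected loss.
high_risk = annual_risk(0.5, 1_000_000)
# Scenario 2: 1% likelihood of breach, $500,000 expected loss.
low_risk = annual_risk(0.01, 500_000)

# Hypothetical remediation cost, not a figure from any real project.
remediation_cost = 20_000

for label, risk in (("scenario 1", high_risk), ("scenario 2", low_risk)):
    decision = "fix it" if worth_fixing(remediation_cost, risk) else "leave it"
    print(f"{label}: risk ${risk:,.0f}, remediation ${remediation_cost:,.0f} -> {decision}")
```

The same $20,000 fix is an easy yes in the first scenario and a defensible no in the second, which is the whole point: the decision depends on the numbers, not on the vulnerability class alone.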

What’s needed to enable better decision-making, specifically how to decide what known vulnerabilities to fix or not to fix, is a purpose-built risk matrix specifically for application security. A matrix that takes each vulnerability class, assigns a likelihood of actual exploitation using whatever data is available, and contains an expected loss range. Where things will get far more complicated is that the matrix should take into account the authentication status of the vulnerability, any mitigating controls, the industry, resident data volume and type, insider vs external threat actor, and a few other things to improve accuracy.
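As a rough sketch of what the simplest possible version of such a matrix might look like, consider the following. Every number here is a placeholder I invented, the class names are only examples, and the two adjustment multipliers are hypothetical stand-ins for the authentication-status and mitigating-control factors described above. A real matrix would have to be filled in from actual exploitation and loss data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixEntry:
    annual_likelihood: float  # chance of actual exploitation over a year (placeholder)
    loss_low: float           # lower bound of expected loss, in dollars (placeholder)
    loss_high: float          # upper bound of expected loss, in dollars (placeholder)


# Placeholder values only, not derived from any real data set.
RISK_MATRIX = {
    "sql_injection": MatrixEntry(0.50, 500_000, 1_000_000),
    "cross_site_scripting": MatrixEntry(0.10, 10_000, 100_000),
    "information_disclosure": MatrixEntry(0.01, 1_000, 10_000),
}

# Hypothetical multipliers applied to the likelihood.
AUTH_REQUIRED_FACTOR = 0.25      # issue sits behind authentication
MITIGATING_CONTROL_FACTOR = 0.5  # some compensating control is in place


def risk_range(vuln_class: str, requires_auth: bool = False, mitigated: bool = False):
    """Return (low, high) annual risk in dollars for one vulnerability."""
    entry = RISK_MATRIX[vuln_class]
    likelihood = entry.annual_likelihood
    if requires_auth:
        likelihood *= AUTH_REQUIRED_FACTOR
    if mitigated:
        likelihood *= MITIGATING_CONTROL_FACTOR
    return (likelihood * entry.loss_low, likelihood * entry.loss_high)


low, high = risk_range("sql_injection")
print(f"unauthenticated SQL Injection: ${low:,.0f} to ${high:,.0f}")

low_auth, high_auth = risk_range("sql_injection", requires_auth=True)
print(f"authenticated SQL Injection:   ${low_auth:,.0f} to ${high_auth:,.0f}")
```

Industry, data volume and type, and threat actor could be added as further factors in the same way. Whether simple multipliers are the right way to combine them is an open question that only real data can settle.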

While never perfect, as risk modeling never is, I’m certain we could begin with something incredibly simple that would far outperform the way we currently do things: HIGH, MEDIUM, LOW (BLEH!). When it comes to vulnerability remediation, how exactly is a business supposed to make good informed decisions about remediation using traffic light signals? As we’ve seen, and as all previous data indicates, they don’t. Everyone just guesses and 50% of issues go unfixed.

InfoSec's version of the traffic light: This light is green, because in most places where we put this light it makes sense to be green, but we're not taking into account anything about the current street’s situation, location or traffic patterns. Should you trust that light has your best interest at heart?  No.  Should you obey it anyway?  Yes. Because once you install something like that you end up having to follow it, no matter how stupid it is.

Assuming for a moment the aforementioned matrix is created, all of a sudden it fuels the solution to the lack of efficiency in the application security testing process. Since we’ll know exactly what types of vulnerabilities we care about in terms of actual business risk and financial loss, investment can be prioritized to only look for those and ignore all the other worthless junk. Those bulky vulnerability assessment reports would likely dramatically decrease in size and increase in value.

If we really want to push forward our collective understanding of application security and increase the value of our work, we need to completely change the way we think. We need to connect pools of data. Yes, we need to know what vulnerabilities websites currently have, specifically the ones that matter. We need to know what vulnerabilities various application security testing methodologies actually test for. Then we need to overlap this data set with what vulnerabilities attackers predominantly find and exploit. And finally, within that data set, we need to know which exploited vulnerabilities lead to the largest dollar losses.
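Here is a minimal sketch of what connecting those pools of data could look like, assuming each pool can be reduced to a set of vulnerability class names. All of the class names and dollar figures below are invented for illustration and come from no real data set.

```python
# Pool 1: vulnerability classes currently present on our websites (made up).
present = {"sql_injection", "cross_site_scripting", "csrf", "information_disclosure"}

# Pool 2: classes our testing methodology actually tests for (made up).
tested = {"sql_injection", "cross_site_scripting", "information_disclosure", "clickjacking"}

# Pool 3: classes attackers actually exploit, with total dollar losses (made up).
exploited_losses = {
    "sql_injection": 2_000_000,
    "cross_site_scripting": 150_000,
    "credential_stuffing": 900_000,
}
exploited = set(exploited_losses)

# Classes we have, test for, and that attackers really exploit, largest loss first.
matter = sorted(present & tested & exploited, key=exploited_losses.get, reverse=True)

# Classes attackers exploit that our testing never looks for.
coverage_gaps = sorted(exploited - tested)

# Classes we find and report that attackers are not seen exploiting.
low_value = sorted((present & tested) - exploited)

print("fix first:     ", matter)
print("testing gaps:  ", coverage_gaps)
print("report filler: ", low_value)
```

The set operations are trivial. Getting trustworthy data into each pool is the real work, and that is exactly the data the industry does not share today.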

If we can successfully do that, we’ll increase the remediation rates of the truly important vulnerabilities, decrease breaches AND losses, and more efficiently invest our vulnerability assessment dollars. Or, we can leave the status quo for the next 10 years and have the same conversations in 2028. We have work to do and a choice to make. 


Joshbw said...

You draw the conclusion that the 85-95% of SAST vulnerabilities that don't overlap with DAST are in categories 1-3. While my experience doesn't necessarily map to any other enterprise (we don't run out of the box security rules, for reasons you cite plus several others, instead opting to write our own rules), our code bases are large enough that I would call my experience a lot more than anecdotal. And that experience is that the lack of overlap is for several reasons:

1. DAST, for scenarios where it has some feasible way to map the interactive surface of a piece of code (e.g. Web UI, swagger defined services, etc.), still requires a lot of baby sitting to get anything approaching reasonable coverage, and on anything but trivial apps is never 100%. And there are apps where DAST feasibly could map the surface, but rarely does so effectively (Electron and Cordova apps, for example). SAST gets far more coverage with much less manual intervention, so we have more rules for SAST

2. The categories in #1 are the minority - the closest DAST gets for an out of the box solution for a whole boatload of app/service models is fuzzing, and that still requires a ton of manual work to get decent coverage. Given that DAST is either a manual or non-existent option for many project types, we have similarly invested in more rules for SAST

3. DAST tests are miles away from comprehensive. Even a slight variation from a standard exploit scenario stymies most DAST. It simply isn't adaptive like a human is, nor is it capable of tying together several inferential clues (though some DAST attempts this by reporting a ton of low quality "informational" findings). The state of DAST is that it can tell you if you are obviously exploitable, but some of us are protecting against adversaries that will invest the time into finding the unobvious exploits. At the end of the day, static analysis with a good taint flow engine (or control flow for certain issues) is simply much more accurate (for both false positives and false negatives) across a larger swath of code types than the current state of DAST at finding exploitable (not vulnerable - actually exploitable) scenarios, so we have invested more in SAST rules

4. Writing new DAST tests is a lot more labor intensive than writing tests for (good) SAST offerings

5. DAST is operationally much more expensive to run than SAST, which has informed where we invest in rule authoring

None of those things means that the mismatch between SAST and DAST rules is because of points 1-3. That said, as I copped to initially, we aren't indicative of the average experience given our threat model and resources to customize what we run (a whole lot of rules in both commercial SAST and DAST are absolute garbage, so anyone running the out of the box experience likely hates it). However, I think it's a mistake to say that if a rule is in SAST but not DAST that it's one of your first three categories of issues

Joshbw said...

Though to your larger point - I'd say that *most* applications (those without serious adversaries in their realistic threat model) should probably only run high accuracy checks for critical vulnerabilities - dangerous deserialization, SQL injection, etc. (basically the "run code on your server infrastructure" or "run code in most end users' DOM" checks), and the state of tooling makes that scenario a pain to do. Will most applications ever be attacked (outside of wormable scenarios), probably not, but ensuring that you don't have the critical issues is sort of like making sure your buildings have fire escapes - fires are rare, but you still want to be prepared because the results are catastrophic if you aren't.

Jeremiah Grossman said...

@Joshbw: Thank you for the response! "You draw the conclusion that the 85-95% of SAST vulnerabilities that don't overlap with DAST are in categories 1-3." Actually, no. The categories were possible theories as to why so many vulnerabilities in general go unexploited -- however they are found. Not so much those being reasons for why SAST doesn't find what DAST does and vice versa, which you nicely articulated and without any argument from me. In general though, let's say we take your environment and way you do testing, which is completely fine. My question is, you're going to produce a bunch of SAST findings... but what is the likelihood for each of those findings will be found, exploited, and lead to material business loss by a real adversary? If low, why do we bother? If high, I'd really like to see the data to back it up.

Effectively, I think we essentially touched on the same answer in your second comment. I just wish we had actual data to back up our thinking, rather than simple judgment calls that we can't justify!

Joshbw said...

"My question is, you're going to produce a bunch of SAST findings... but what is the likelihood for each of those findings will be found, exploited, and lead to material business loss by a real adversary? If low, why do we bother? If high, I'd really like to see the data to back it up. "

If you are asking about us specifically - for the most part we have to assume at least a subset of any particular class of vulnerability will bite us, because of both the volume and skill of adversaries. Would a particular SQL Injection on some obscure page actually be found and exploited if we let it through? Probably not. But if we allowed SQL Injection at any volume into our code bases a very non-trivial amount of it would be exploited and we would be on fire from it. It's not terribly feasible to speak of the likelihood of a particular instance of a vulnerability being exploited by an attacker, but we do know that when the density/prevalence increases, the likelihood we will be exploited does as well (which is reasonably obvious, but still). There are other factors that get figured into it (platform mitigations, what attackers seem to be looking for at a given point, if there is blood in the water around a certain scenario, etc.), but keeping the bug density low is one of the few elements that a dev team has to control their likelihood of exploitation.

A really good analogy is inoculation rates in the field of epidemiology - the chance of a measles outbreak is almost entirely tied to the vaccine level. The chance increases substantially with each % below 97%. Can a doctor tell if a specific person not getting vaccinated will be involved in the outbreak - certainly not - but when they look at the population in aggregate they can talk about outbreak chances pretty accurately. So given that we know our population of code will be exposed to constant attempts at exploitation, our best predictor is how prevalent a vulnerability is within that population.

If you weren't asking the question of us specifically, I think that everything I said above is true for companies in general EXCEPT that their population of code might be exposed to attempts at exploitation far less regularly. Using the above analogy, we don't vaccinate against smallpox anymore because nobody gets exposed - if that ever changed, given that the vaccine rate is essentially 0, we would be very screwed. In the epidemiology world risk is the combination of vaccine rate and the rate at which a population would be exposed to a pathogen. In our world it's probably loosely bug density combined with the rate at which a codebase faces scrutiny by attackers. The higher the scrutiny, the lower your density needs to be to survive. The lower the scrutiny, the more tolerance a company can have for high bug density - hence me thinking most companies, by virtue of having lower scrutiny, are probably fine focusing on high accuracy critical findings (and really solid patching in common platforms on their attack surface)

Anton Chuvakin said...

But wait... things are actually worse. Some of the DAST-found and even VA-found infrastructure vulnerabilities also "don't matter" (= do not lead to loss). The challenge is that most don't know which do. Unlike in advertising (“Half the money I spend on advertising is wasted; the trouble is I don't know which half.”), my fear now is that in vulnerability management 80% of the money is wasted, but nobody knows which 20% matter....

Unknown said...

I didn't see mentioned the requirement to understand the risks associated with the application itself. What kind of information is it handling? What happens to the business if the app is not available and for how long due to an attack or breach? What are the compliance requirements associated with the application and their implications to customers? I learned long ago that if you can't bring the assessment into meaningful terms to the business and align with business risks then all you are doing is creating a report for its own sake which in many cases will be ignored.

Anonymous said...

One report you might be interested in is the Technical Assessment Methodology (TAM) in development by the Electric Power Research Institute (EPRI) that performs a bounded analysis of components to assess and mitigate vulnerability. The attack surface characterization of an asset in the TAM focuses on the critical data and what means exist to interact with it. They avoid the boondoggle of chasing innumerable vulnerabilities by looking at exploit sequences comprised of attacker objectives, attack pathways, and attack mechanisms and identifying what residual vulnerabilities are not covered by the applied security control methods.

Unknown said...

Great article Jeremiah. And I have a ton of thoughts but in general, have been thinking along very similar lines. A quick question and some extended thoughts

What do you think about pen tests and bug bounties? Are they better than DAST even though the output is still focused on lists of bugs and to get as many as you can? The upside is at least they're typically putting in the work to show executability.

I've been finding myself telling security people this story often recently. The whole job of appsec historically has been to create lists of bugs and pass them to the developers. But what is the developer's experience of getting the lists of bugs? They say "you didn't have to buy a tool or pay a pen tester to tell me I have bugs in my code. I have tons of bugs filed against my code base and most of them are hindering important business functionality. So why should I care about these bugs compared to those ones? These are all just theoretical risks, not actual ones." Basically we think making longer lists of bugs is somehow doing our jobs better at appsec. But is a lack of bugs really the problem? Absolutely not. The problem is prioritizing those that actually matter. I know I'm preaching to the choir on this but thought I'd share how I've been explaining the same thing you laid out above to folks.

The follow-on point is one that you refer to a couple times above. So DAST is better than SAST and you say "where they’ll eventually be found by DAST and/or exploited by attackers." The "and/or exploited by attackers" bit is the key we've found to change the story here with both how we prioritize and how we get devs to give a shit in the first place. This also ties in with the equation for the risk you talk about later on (probability (of breach) x loss (expected) = risk). So how do you actually get this information? Our take is that REAL ATTACK DATA on your apps, showing both where attackers are attempting to attack and when they're actually having success, is crucial to being able to calculate the probability of breach part. If you can see all your attack traffic and understand what parts of your app it's focused on, then you can start numerically building an informed sense of the probability of breach. The real attack data also completely changes the conversation with devs because the attack is no longer a theoretical game nor is it something that you've paid someone to find in their code. It's an actual attacker attempting to breach their code as you speak with them. This leads devs to actually care to pay attention to the data in the first place and makes it much easier to have the infosec to app dev manager discussion each quarter about what parts of the code base you should be proactively hardening (the ones with the highest risk, aka high loss and highly targeted by attackers).

Unknown said...

Ah, there's an approval queue. Ok I'll post the second part now and hope you can figure out how to make them line up. Sorry J.

Unknown said...

Part two:

Anyways, YES "If we really want to push forward our collective understanding of application security and increase the value of our work, we need to completely change the way we think. ".

And YES YES YES "if your position is recommending that the business should fix each and every vulnerability immediately regardless of the cost, then you’re really not on the side of the business and you will continue being ignored." So many times YES.

I find myself explaining over and over to people "is your goal really to find all the bugs? fix them all? and do that all before they go into production? Then you're high." But stated this way no one says, "yes that's the goal". So then that leads very naturally into the next point which is, ok then you have to assume you have vulnerabilities in your production code at all times correct? Answer "yes". Then why are you not obsessed with getting visibility into where your attackers are attempting to breach your code so that you can know if one of those vulnerabilities is being breached or someone is at the very least attempting to do so? Answer "cause I didn't think to do that". We should do our best to find and solve as many vulnerabilities as we can but we have to assume we can't find them all and then production attack visibility and protection are paramount. On top of all that, many attacks against apps are now not even trying to find bugs. They're exploiting features through unintended use cases or misuse or abuse of application functionality. Functionality that doesn't even get picked up in sast or dast (which is another reason why I'd prob say bug bounty or pen test is ultimately more valuable).

Long response but just really resonated with how I've been thinking and talking about appsec recently. And makes me wonder how long security testing in its current incarnation will last. (spoiler is prob forever which is sad but doesn't mean we can't make progress one by one).

Sichao Wang said...

Maybe think of a different use case. Imagine a scenario where company A plans to acquire company B, which has web properties with major vulnerabilities. Even though these vulns have not yet been exploited (in the real-estate case, no foundation or fire damage has occurred yet), perhaps company B, as the real-estate industry does, needs to fully disclose all the "section 1 or section 2 issues"? It can be mandatory (by cyber insurance policy) for the seller to disclose and fix all major exploitable vulnerabilities before the transaction can go through. Verizon could have been better off in their Yahoo! acquisition had there been a "Section 1 form".