Friday, August 17, 2007

How to make a website harder to hack

I mean, that’s what web application security is all about. We know websites will never be 100% secure, just like software will never be 100% bug free. We also know web application hacks are targeted. All we have to do is look at CardSystems, the U.N., MySpace, CNBC, UC Davis, Microsoft UK, Google, Dolphin Stadium, Circuit City, T-Mobile, and many other incidents to figure that out. Bad guys don’t hammer away at eComSiteA then mistakenly hack into WebBankB. It doesn’t work like that. The victim is the one they’re targeting in the browser URL bar. So instead we should approach website security in terms of time and difficulty just like they’ve done for decades in physical security--with burglary resistance, fire resistance, alarm systems, etc.

For example GSA approved and U.L. certified products such as a:

Class 5 vault door - “shall be resistant to 20 man-hours surreptitious entry, 30 man-minutes covert entry and 10 man-minutes forced entry.”

Class 150-4 hours container - “must maintain an interior temperature less than 150°F and an interior relative humidity less than 85% when exposed to fire as per the Standard Time Temperature Curve for 4 hours to 2000°F.”

These benchmarks make sense. The problem in web application security is everyone so blindly and exclusively talks about “best practices” like the SDLC, input validation, threat modeling, PCI compliance, source code reviews, scanners, developer education, WAFs, and other topics. They forget about the big picture. Do these “solutions”, or combination thereof, actually make a website harder to hack? Yes! Well, probably. Err, maybe. And if so, how much harder? If the answer is unknown then how do we justify their cost in time and money? Oh right, “compliance.” Still, imagine telling the CIO you just spent the last 6 months, countless company man-hours, and hundreds of thousands of dollars implementing “best practices” only to raise the bar by maybe 30 minutes!?

Judging from WhiteHat Security’s vulnerability assessment statistics on hundreds of websites, this is exactly what’s happening. Vendors basically attempt to dazzle customers with the most blinky-red-lights, buzzword-compliant banter, confusing diagrams, meaningless ROI calculations, and reams of FUD to distract people from their main objective. MAKE A WEBSITE HARDER TO HACK. Do yourself a favor the next time a vendor is hawking their wares at you: ask them this simple question, “How much harder does your solution make my website to hack?” The answer might surprise you, if there is one.

Rather than continue ranting, I think it’s time the web application security industry began using this type of testing methodology so we may answer these questions in no uncertain terms. To do so we’d have to take into consideration the skill of the bad guys, the tools and techniques at their disposal, whether they would be internal or external, change rate of the website, and . . . anything else? The methodology probably wouldn’t need to be overly complicated. In fact, we might borrow ideas from physical security on how they set up their testing processes. Imagine obtaining a measurable degree of security assurance.


Anonymous said...

Interesting point, though are the physical door specs really valid either? I am sure they used some measurements, but I am betting those could be twisted to say different things as well.

While numbers are useful, they can also be heavily exploited. The "lies, damned lies and statistics" quote almost always rings true in a case like this.

I am sure such numbers will ultimately be required, but I am not sure they will be as valuable as it might seem.

Jeremiah Grossman said...

Certainly it's hard to say what's "valid", but it seems to work in the physical security world to at least some trusted extent, so why not in infosec? I'm open to a better way, but right now all I see are "best practice" requirements with no way of measuring.

kuza55 said...

Then "How much harder does your solution make my website to hack?" Your solution being WhiteHat Sentinel.

And can you provide any verifiable statistics as to your claims?

Jeremiah Grossman said...

Right, my solution is a fairly solid benchmark as to how well a website's defensive measures are functioning.... which could be input validation, WAFs, scrubbed error messages, output filters, prepared statements, etc etc etc.

So far I haven't made any Sentinel claims in the manner in which I described, but I'm considering what they might be. And yes I'd like to get them independently verified. To that end I'd like to formalize a measurement process if at all possible that people can buy into.

Right now we have no idea as to which solutions work and which don't.

Bob Rich said...

I've been thinking about this type of thing quite a bit recently. We're like the medicine men of IT, and until we can find some set of metrics to qualify and quantify risk and relief, companies are going to have little more than the word of their local shaman to determine how much sacrifice is enough.

I think the analogy to physical security solutions is as good a place as any other to start. It would seem that in the stated case of safe cracking, there are going to be preconditions (e.g. when installed as instructed), some parameters on the attacker (e.g. tools, experience), and probably an implicit objective (e.g. get something out of the safe).

To carry this through to web sites, we would probably have to define our set of conditions under which the specification applies, declare the basic skill set and tools available to the attacker, and specify the objective. All three of these would be fairly difficult, but a few folks and a white board (or wiki) could see if it's feasible in short order.

Of course, I'm sure there are caveats in the physical security world (e.g. an 'on average' kind of qualification), and the same would be true here, I would imagine. After all, most modern web apps are more like a refinery than a stack of bonds.

Anonymous said...

I would say that a concept parallel to physical security might be foundation OS security.

How much weight of consideration is put on trusted systems for web server security?

If not, why not?

Anonymous said...


How about publishing statistics about how long it takes your team to compromise the same site after auditing it multiple times? This will automatically include many real world factors, not the least of which is what happens when things that were reported in the past aren't fixed. (in that case, the time goes down, not up).

It will be a start to seeing if this model can be an effective one for information security.

Anonymous said...

@ rezn that's a pretty good idea but it presents at least two problems in the world of Infosec Consulting:

1) It assumes that you always use the same people for the same clients.
2) It assumes that clients do regular (or at least follow-up) assessments.

Personally in my experience the # of clients that do follow-up or regular assessments is dismally small. The vast majority do assessments as one offs either as a last minute thought before "going live" or in response to an incident.

Jeremiah Grossman said...

@Bob Rich > Well said. I'm not 100% convinced the model would work either, but if not, then I have to absolutely know why. Obviously we need a better way to measure. Maybe I should start a project or something and set up that wiki you are describing.

@rob lewis > Do you have any URL references I could read?

@rezn > We could do that, the problem is we're not in control of our customer's websites. Often vulns go without fixes for a LONG time, and even if some are resolved, the rest aren't.

Taking Thorin's comments into consideration, it might not be a good idea to have the same people/technology that perform the VA be the same ones who "hack" the website for the time measurement. I'd much prefer using someone outside of WH for that part.

Anonymous said...

I think rezn was thinking more along the lines of repeat customers.

ie: The first time we did the assessment we found holes x, y, and z which took a, b, and c minutes/hours to find and exploit vs the timing on any repeat assessments.

My argument had been that unless you had the same person doing the assessment the times would be inconsistent (due to methodology/personal style).

However, now that I've thought about it a bit more, if the same person is always doing it then there isn't a feasible check.

It will either be: it took me 5 mins (or less) to re-verify that x is still broken or 5 mins to find out it's fixed. But, still took b mins/hours to find something new (which can't be compared against the previous results).

So either you'd be comparing the same thing (semi-uselessly) over and over again or you'd be comparing apples and oranges.
(Once I know an apple is an apple it should basically always take me the same amount of time to identify the apple).

Having a 3rd or 4th party involved could become embarrassing but would also be a could quality control. (I think that opens up a completely different topic though).

Anonymous said...

UGH now I'm making typos :(

"...but would also be a good quality control"

Andy Steingruebl said...

A safe (most anyway) is a relatively static item that you can evaluate as a single item under attack.

What you don't find however are UL ratings for how secure an individual bank branch is. This would involve testing the safe, lock boxes, guards, tellers, alarm system, police response, etc.

I think we can feasibly do this style of certification against individual security components with a certain setup.

Stinger or .Net's input validation for example.

What you're talking about is more red-teaming, which we do against things like nuclear facilities, national labs, etc. Unfortunately it's hard to boil them down to a single rating...

Anonymous said...

Yes, my idea was that since WH sells itself as a service, and tries to get companies to sign up for multiple, scheduled assessments that it would be in a position to provide an apples to apples time comparison. I also think it is a reasonable suggestion since Jeremiah is the one who came up with the idea that UL style time-to-defeat ratings might be useful.

If you notice, however, I did not say I thought it would be a useful metric (for the reasons that security retentive has pointed out). I said that having WH start keeping track of and reporting on their times would start to show if this kind of metric is indeed meaningful for our industry. I don't actually think it is, but I'm not sure.

Anonymous said...

Do you mean general references on the topic of secure web servers?

Gartner's recommended Web security hierarchy

"The Inevitability of Failure: The Flawed Assumption of Security in Modern Computing Environments"

Anonymous said...

That link may have been cut off.

Reduce risks with these guidelines for updating Internet server security

by John Pescatore and Edward Younker

Anonymous said...

Now I see what you're getting at rezn.

Thanks for the links Rob.

Jeremiah Grossman said...

Several excellent points have been made in the comments and I’ve been taking some time to digest things.

It’s true we’re comparing apples to oranges, VA solution vs. Bad Guy, but I think that’s OK since this is the real-world scenario and what I’m hoping to measure against. The closer to reality the better. Also, I think what Retentive said holds true. The U.L. procedures are more component tests rather than testing of a live implementation of a series of components. Indeed what I’ve proposed is more properly described as red teaming. That being said, I believe we can move forward under those terms. I have to adjust the characterization of the testing procedure and the process for measurement. In a few days I have to draft up a methodology for people to review.

Jim Manico said...

Jeremiah, I am very grateful that you are putting some brain power into metrics, but I feel your argument is fatally flawed. The metrics in physical security are somewhat easy to obtain. You set a fire to a safe/door and measure it using standard scientific methods. The cost for such studies would be acceptable.

Now take web security. The depth needed to truly have multiple double-blind studies with web sec analysis comparing tools is a burden, at best. Also, when you ask if a door is secure, there are only a 1/2 dozen factors needed to measure. To measure the "security" of a website would require a kind of metric that is infinitely more difficult to calculate. And this is while folks who control web software projects want them done faster, cheaper, with more functionality.

I feel we do not need engineering testing rigor in terms of metrics, that's the cart before the ox. Let's actually apply true engineering rigor to building software in the first place. We do not even get close to that as an industry yet. Until we do, detailed measuring of metrics will only help your marketing department.

Jeremiah Grossman said...

@ Jim Manico > You and Security Retentive have made good points about the comparison to physical security, as it may not line up the way I’d like. However, I’m not ready to give up on it. In all likelihood I will have to adjust my expectations and the processes to get something useful out of the effort. Still thinking my way through the mess. On the application of true engineering rigor to software, I’m all for it, if it can be done.

However, the current state of web security suggests that the cart IS already before the ox. There are roughly 128 million websites out there and most (all?) are already insecure. These websites and the applications within are not going to be re-coded anytime soon with any kind of “secure software” methodology. What the security guys of my customers face is trying to figure out what solutions provide the most bang for the buck in a given scenario, even if they are only bandaids and don’t address the root of the problem.

This is why I’d like to get a solutions measurement methodology in place sooner rather than later because the bad guys aren’t waiting for us to get our act together.


Anonymous said...

Jeremiah –

This is an excellent line of reasoning (it also dovetails nicely with that post you had a while back about determining when you are ‘done’). You have all the tools you need to begin the process if you are collecting the amount of time spent seeking vulnerabilities and at what point you found each individual one. Then, you have a “10-minute vuln” and a “10-hour vuln” (for example).

Some thoughts:

- I would put these into buckets suggested by the numbers or that simply make sense to you – i.e. 10 minutes; 1 hour; 10 hours; etc.

- Note that averages are probably not appropriate because the minimum time is the crucial element.

- At this stage, don’t worry about the relative skill levels of the individuals involved in testing – that all washes out over time. (If you are really concerned, then pit two independent groups against each other).

- The more opportunistic your methodology, the better. If you have a rigid one that looks for certain types first, then you may have to find a way to adjust.

- You may need to adjust for information that is given to you that an attacker wouldn’t necessarily have available, if/when that occurs.

- (Btw, one way to tell you are “done” is when you have spent x person-hours unsuccessfully looking for new vulns.)

With this information, you can suggest a “time-to-attack” figure that is useful.

Pete Lindstrom
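
The bucketing idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bucket boundaries follow the "10 minutes; 1 hour; 10 hours" suggestion, the function names (`bucket_for`, `summarize`, `is_done`) are hypothetical, and the sample times are made up rather than taken from any real assessment.

```python
from collections import Counter

# Assumed bucket upper bounds in minutes, following the
# "10 minutes; 1 hour; 10 hours" suggestion in the comment.
BUCKETS = [(10, "<= 10 minutes"), (60, "<= 1 hour"), (600, "<= 10 hours")]
OVERFLOW = "> 10 hours"


def bucket_for(minutes):
    """Return the label of the first bucket whose upper bound covers `minutes`."""
    for upper, label in BUCKETS:
        if minutes <= upper:
            return label
    return OVERFLOW


def summarize(find_times):
    """Summarize per-vulnerability discovery times (in minutes).

    The headline figure is the minimum, not the average, because the
    fastest-to-find vulnerability is what sets the time-to-attack.
    """
    if not find_times:
        return {"buckets": {}, "time_to_attack": None}
    counts = Counter(bucket_for(m) for m in find_times)
    return {"buckets": dict(counts), "time_to_attack": min(find_times)}


def is_done(minutes_since_last_find, budget_minutes):
    """Stopping rule: done once the budget has been spent without a new find."""
    return minutes_since_last_find >= budget_minutes


if __name__ == "__main__":
    # Made-up sample: minutes elapsed when each vulnerability was found.
    sample = [8, 45, 600, 1300]
    print(summarize(sample))
    print(is_done(minutes_since_last_find=120, budget_minutes=90))
```

Reporting the minimum alongside the bucket counts keeps the "10-minute vuln" visible even when most findings took hours.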

Anonymous said...


The idea of using "time to defeat" as a network security metric has been around for a long time. I know that Black Dragon Software had a patent on this and actually sold a software product which measured threats and mitigations in this way.

Of course, Black Dragon went out of business in 2004, so it's hard to say if the idea was a good one or not.

Jeremiah Grossman said...

Hmph, good to know. I'll have to do some research there as well. Thanks!

Richard Bejtlich said...

Hi Jeremiah,

I am working on time-based metrics too. I totally agree that it's all about the result of your efforts. A few times in my blog I've mentioned this metric:

Time for a pen testing team of [low/high] skill with [external/internal] access to obtain unauthorized [stealthy/unstealthy] control of a specified asset using [public/custom] tools and [zero/complete] target knowledge.

This implies at least two assessments: an initial test, followed by subsequent tests whenever measures to (supposedly) improve security are applied.
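
One way to picture the metric template above is as a small record type, with the bracketed options as fields. The following Python sketch is an assumption-laden illustration, not anyone's actual tooling: the class names (`Scenario`, `Assessment`), the `change_in_hours` helper, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    """The bracketed parameters of the metric, one field per choice."""
    skill: str      # "low" or "high"
    access: str     # "external" or "internal"
    stealth: str    # "stealthy" or "unstealthy"
    tools: str      # "public" or "custom"
    knowledge: str  # "zero" or "complete"


@dataclass(frozen=True)
class Assessment:
    """One timed test of a specified asset under a given scenario."""
    scenario: Scenario
    asset: str
    # None means the team did not gain control within the test window.
    hours_to_control: Optional[float]


def change_in_hours(before, after):
    """Hours gained (positive) or lost (negative) between two assessments.

    Only assessments run under the same scenario against the same asset
    are comparable; anything else is apples to oranges.
    """
    if before.scenario != after.scenario or before.asset != after.asset:
        raise ValueError("assessments must share the same scenario and asset")
    if before.hours_to_control is None or after.hours_to_control is None:
        return None  # no finite time to compare
    return after.hours_to_control - before.hours_to_control


if __name__ == "__main__":
    scenario = Scenario("low", "external", "unstealthy", "public", "zero")
    initial = Assessment(scenario, "example-asset", 2.0)    # made-up figure
    followup = Assessment(scenario, "example-asset", 6.5)   # made-up figure
    print(change_in_hours(initial, followup))
```

Holding the scenario fixed between the initial and follow-up tests is what lets the difference be attributed to the security measures applied in between.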

Jeremiah Grossman said...

Hi Richard,

I remember our conversation well and I'm borrowing heavily from your work with a current side project. I have several of your posts on the subject bookmarked. :) I think the methodology can help us out a lot and doesn't necessarily need to replace others that people may find valuable.

The mission plan I'm working out is currently:

"time for an adversary with varying levels of skill with external access to identify a vulnerability in a specified website using any available tools and techniques and limited target knowledge"

It'll need some fine tuning and explanation, but I'm excited about the prospects.

Anonymous said...

Metrics... Can you really put a number on how secure an application is? What if/when a new disruptive class of vulnerabilities is found (as format string vulnerabilities were a while ago)? I just don't know how you can honestly come to a point where you can quantify the 'security' of an application.

Sure you can 'raise the bar' and state with some certainty that an application is 'more secure' than it was before after investing time into securing it... But that's just relative.

Jeremiah Grossman said...

While I can't assign a number for "security", I can assign a time to how long it takes to break into something, just like the safe industry does.