Recently I was asking a colleague how desktop black box web application vulnerability scanners, from a scalability perspective, approach scanning large numbers of websites (i.e. 100 to 500+) simultaneously. I was curious about the way they address the physical infrastructure requirements to support big enterprise deployments as compared to our own. Anyone with experience knows commercial desktop black box scanners can easily eat up several gigs of memory and disk space for even a single website. Nothing high-end workstations can’t handle, but multiplied out it’s an entirely different story. The person had an unexpected answer, “Desktop black box scanners don’t have the use-case you [WhiteHat] do, their technology doesn’t need to scale.” Say Whaaa!?!
Asking for clarification he said consider how black box scanners are normally used in the field. They are “developer” or “pen-test” tools where the use-case is one person, one machine, one website, one configured scan, and then let it run for however many hour or days it takes until completion. Attempting to perform dozens or hundreds of scans at the same time would be exceedingly rare, if ever, so the capability to do so doesn’t need to exist. He said, “Who beside you guys [WhiteHat] needs to scan that many websites at a time?” To which I humbly replied, “the customer.”
As we know new Web attack techniques are being published weekly and Web application code is change is rapidly (Agile). Web applications, even those that are old and unchanged, need to be tested often for these issues. Testing once a year, or even once a month, isn’t enough in an environment like the Web where daily attacks are normal. So if an enterprise has say 10 or more websites, to say nothing of those with hundreds or thousands, mass scanning is essential to get through them regularly. Burdening enterprises by having them wire scan nodes together with command and control systems to achieve scale is patently absurd. That's as an inefficient one-to-one model. 100 simultaneous scans = 100 scan boxes. Of course I’m sure they are happy to sell the hardware.
So yes I was a bit surprised that the desktop scanner guys haven’t seen fit to tackle the technology scaling problem, even though two of them are mega corps. They above all should know that scaling must be addressed if performing routine vulnerability assessments on all the Internet’s most important websites is to become a reality. To be fair, we’ve never pulled back the curtain to show off our own infrastructure. Maybe it’s time we did so because over the years we’ve invested heavily and it’s something we’re particularly proud of. I think others would be interested and impressed as well. The physical requirements for WhiteHat Sentinel, a SaaS-based website vulnerability management platform, are in a word -- massive.
Operationally we’re assessing over 2,000 websites on a fairly routine basis (~weekly). A dedicated IT staff is monitoring the systems for over 300 points of interest (utilization of network, cpu, memory, uptime, latency, etc.) ensuring everything is running smoothly 24x7. Metrics show at any moment 450 scans are running concurrently, generating about 300 million HTTP requests per month, and processing 90,000 potential vulnerabilities per day. We preserve a copy of every request sent and response received for audit, trending, tracking, and reporting purposes. This system itself is being access by over 350 different customers with tens of thousands of individual Sentinel users.
CPU and memory wise our ESX virtualization chassis allow us to control resource allocation and scale fast between multiple scanning instances and load balanced front-end & back-end Web servers. As you can see from the pictures we have some serious storage requirements. Our clustered storage arrays have 250TB ready to go (additional capacity at a moments notice), writing about 500GB to disk per day, and connected by dual 10GB backplane ethernet connections. Sick!
Oh, did I forgot to mention the two 100MB links to the Internet? Also very important is that the infrastructure is fully redundant. Pull any network cable, push any power button, and the system keeps on humming. I left out the pictures of the backside of the cages, which is every bit as cool as the front, but there’s a lot of network cords, firewalls, routers and other stuff we’d prefer to keep to ourselves. :) If someone else claims to have a SaaS scanning platform I’m wondering if it looks anything remotely like ours.
The data center where everything is housed is SAS70 Type II certified and state-of-the-art when it comes to power, fire protection, cabling, construction, cooling, and physical security. Guards are on site 24/7/365, active patrols both inside and outside the facility, with 54 closed circuit video cameras covering the interior and exterior of building. To get access to our area requires an appointment, government issued ID, thumbprint, retina scan and only then do they hand over the key to our private space where only two people at WhiteHat have access. I’m not one of them. :) Compare this to the scanner on laptop sitting somewhere unguarded in the enterprise. Clearly we’re not a desktop scanner behind a curtain like others out there. We’re not playing around. We take this stuff extremely seriously.