Wanted: Metrics for Measuring Cyber Performance and Effectiveness
Chief information security officers (CISOs) face a dizzying array of cybersecurity tools to choose from, each loaded with features and promised capabilities that are hard to measure or judge.
That leaves CISOs trying to balance unknown risks against growing costs, without a clear ability to justify the return on their cybersecurity investment. Not surprisingly, today’s high-threat environment makes it preferable to choose safe over sorry – regardless of cost. But is there a better way?
Some cyber insiders believe there is.
Acting U.S. Federal Chief Information Officer (CIO) Margie Graves acknowledges the problem.
“Defining the measure of success is hard sometimes, because it’s hard to measure things that don’t happen,” Graves said. President Trump’s Executive Order on Cybersecurity asks each agency to develop its own risk management plan, she noted. “It should be articulated on that plan how every dollar will be applied to buying down that risk.”
There is a difference, though, between a plan and an actual measure. A plan can justify an investment intended to reduce risk. But judgment, rather than hard knowledge, will determine how much risk is mitigated by any given tool.
The Defense Information Systems Agency (DISA) and the National Security Agency (NSA) have been trying to develop a methodology for measuring the actual value of a given cyber tool’s performance. Their NIPRNet/SIPRNet Cyber Security Architecture Review (NSCSAR – pronounced “NASCAR”) is a classified effort to define a framework for measuring cybersecurity performance, said DISA CIO and Risk Management Executive John Hickey.
“We just went through a drill of ‘what are those metrics that are actually going to show us the effectiveness of those tools,’ because a lot of times we make an investment, people want a return on that investment,” he told GovTechWorks in June. “Security is a poor example of what you are going after. It is really the effectiveness of the security tools or compliance capabilities.”
The NSCSAR review, conducted in partnership with NSA and the Defense Department, may point to a future means of measuring cyber defense capability. “It is a framework that actually looks at the kill chain, how the enemy will move through that kill chain and what defenses we have in place,” Hickey said, adding that NSA is working with DISA on an unclassified version of the framework that could be shared with other agencies or the private sector to measure cyber performance.
“It is a methodology,” Hickey explained. “We look at the sensors we have today and measure what functionality they perform against the threat.… We are tracking the effectiveness of the tools and capabilities to get after that threat, and then making our decisions on what priorities to fund.”
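The approach Hickey describes, mapping sensors against the stages of a kill chain and funding the weakest spots first, can be illustrated with a minimal Python sketch. NSCSAR itself is classified, so everything here is an assumption made for illustration: the stage names, the sensor names, the detection rates and the rule that a stage is only as well covered as its best sensor.

```python
# Hypothetical kill-chain stages (illustrative; not the NSCSAR stage list).
STAGES = [
    "reconnaissance",
    "delivery",
    "exploitation",
    "command_and_control",
    "exfiltration",
]

# Hypothetical sensors: stage -> fraction of test threats detected at that stage.
SENSORS = {
    "perimeter_firewall": {"reconnaissance": 0.6, "delivery": 0.7},
    "email_gateway": {"delivery": 0.8},
    "endpoint_agent": {"exploitation": 0.5, "command_and_control": 0.4},
}


def stage_coverage(sensors, stages):
    """Return the best detection rate per stage (0.0 where no sensor applies)."""
    best = {stage: 0.0 for stage in stages}
    for rates in sensors.values():
        for stage, rate in rates.items():
            best[stage] = max(best[stage], rate)
    return best


def funding_priorities(coverage):
    """Order stages from weakest to strongest coverage."""
    return sorted(coverage, key=lambda stage: coverage[stage])


coverage = stage_coverage(SENSORS, STAGES)
priorities = funding_priorities(coverage)
print(priorities)
```

In this made-up data set no sensor watches for exfiltration, so that stage comes first in the priority list. A real assessment would need measured detection rates rather than invented ones.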
NSS Labs Inc. independently tests the cybersecurity performance of firewalls and other cyber defenses, annually scoring products’ performance. The Austin, Texas, company evaluated 11 next-generation firewall (NGFW) products from 10 vendors in June 2017, comparing the effectiveness of their security performance, as well as the firewalls’ stability, reliability and total cost of ownership.
In the test, products were presumed to be able to provide basic packet filtering, stateful multi-layer inspection, network address translation, virtual private network capability, application awareness controls, user/group controls, integrated intrusion prevention, reputation services, anti-malware capabilities and SSL inspection. Among the findings:
- Eight of 11 products tested scored “above average” in terms of both performance and cost-effectiveness; three scored below average
- Overall security effectiveness ranged from 25.8 percent to 99.9 percent; average security effectiveness was 67.3 percent
- Four products scored below 78.5 percent
- Total cost of ownership ranged from $5 per protected megabit/second to $105, with an average of $22
- Nine products failed to detect at least one evasion, while only two detected all evasion attempts
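To make the cost figure above concrete, here is a minimal Python sketch of a cost-per-protected-megabit calculation. The article does not give the formula, so this assumes that total cost of ownership is divided by throughput discounted by security effectiveness. The function name and the product numbers are hypothetical.

```python
def tco_per_protected_mbps(tco_usd, throughput_mbps, effectiveness):
    """Cost per protected Mbps under the assumed formula.

    Assumption: protected throughput = tested throughput * security effectiveness,
    where effectiveness is a fraction in (0, 1].
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    if not 0 < effectiveness <= 1:
        raise ValueError("effectiveness must be in (0, 1]")
    return tco_usd / (throughput_mbps * effectiveness)


# Two hypothetical products with identical price and throughput.
strong = tco_per_protected_mbps(100_000, 5_000, 0.999)
weak = tco_per_protected_mbps(100_000, 5_000, 0.258)
print(f"strong: ${strong:.2f}/Mbps, weak: ${weak:.2f}/Mbps")
```

Under this assumption, a product that blocks fewer attacks costs more per protected megabit even when its sticker price and raw throughput match a stronger competitor.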
But point-in-time performance tests don’t provide a reliable measure of ongoing performance. And measuring the effectiveness of a single tool does not necessarily indicate how well it performs its particular duties as part of a suite of tools, notes Robert J. Carey, vice president within the Global Solutions division at General Dynamics Information Technology (GDIT). The former U.S. Navy CIO and Defense Department principal deputy CIO says that though these tests are valuable, they still make it hard to quantify and compare the performance of different products in an organization’s security stack.
The evolution and blurring of the lines between different cybersecurity tools – from firewalls to intrusion detection/protection, gateways, traffic analysis tools, threat intelligence, anomaly detection and so on – mean it’s easy to add another tool to one’s stack. But as with any multivariate function, it is hard to be sure of each tool’s individual contribution to threat protection, or to know which tools one could do without.
“We don’t know what an adequate cyber security stack looks like. What part of the threat does the firewall protect against, the intrusion detection tool, and so on?” Carey says. “We perceive that the tools are part of the solution. But it’s difficult to quantify the benefit. There’s too much marketing fluff about features and not enough facts.”
Mike Spanbauer, vice president of research strategy at NSS, says this is a common concern, especially in large, managed environments, as is often the case in government. One way to address it is to replicate the security stack in a test environment and experiment to see how tools perform against a range of known, current threats while under different configurations and settings.
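One way to picture such an experiment is a leave-one-out comparison: run the same threats against the full stack, then against the stack with each tool removed, and see how much the block rate drops. The Python sketch below uses invented tool names, threat IDs and results, and is not a description of any NSS method.

```python
# Hypothetical lab results: tool -> IDs of the test threats it blocked.
BLOCKED = {
    "firewall": {"t1", "t2", "t3"},
    "ids": {"t2", "t3", "t4"},
    "anti_malware": {"t3"},
}
THREATS = {"t1", "t2", "t3", "t4", "t5", "t6"}


def stack_block_rate(blocked, threats, exclude=None):
    """Fraction of threats blocked by at least one tool, optionally omitting one tool."""
    covered = set()
    for tool, ids in blocked.items():
        if tool != exclude:
            covered |= ids
    return len(covered & threats) / len(threats)


def marginal_contributions(blocked, threats):
    """Drop in the stack's block rate when each tool is removed in turn."""
    full = stack_block_rate(blocked, threats)
    return {
        tool: full - stack_block_rate(blocked, threats, exclude=tool)
        for tool in blocked
    }


contributions = marginal_contributions(BLOCKED, THREATS)
for tool, delta in contributions.items():
    print(f"{tool}: {delta:.1%}")
```

In this invented data set the anti-malware tool blocks only a threat that the other two tools already catch, so removing it changes nothing. That is the kind of overlap a test environment could expose, though only for the threats actually included in the test.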
Another solution is to add one more tool to monitor and measure performance. NSS’ Cyber Advanced Warning System (CAWS) provides continuous security validation monitoring by capturing live threats and then injecting them into a test environment mirroring customers’ actual security stacks. New threats are identified and tested non-stop. If they succeed in penetrating the stack, system owners are notified so they can update their policies to stop that threat in the future.
“We harvest the live threats and capture those in a very careful manner and preserve the complete properties,” Spanbauer said. “Then we bring those back into our virtual environment and run them across the [cyber stack] and determine whether it is detected.”
Adding more tools and solutions isn’t necessarily what Carey had in mind. While that monitoring may reduce risk, it also adds another expense.
And measuring value in terms of return on investment is a challenge when every new tool adds real cost and results are so difficult to define. In cybersecurity, though managing risk has become the name of the game, actually calculating risk is hard.
The National Institute of Standards and Technology (NIST) created the Special Publication 800-53 security controls and the cybersecurity risk management framework that encompass today’s best practices. Carey worries that risk management delivers an illusion of security by accepting some level of vulnerability, depending on the level of investment. The trouble is that this drives a compliance culture in which security departments focus more on following the framework than on defending the network and securing its applications and data.
“I’m in favor of moving away from risk management,” GDIT’s Carey says. “It’s what we’ve been doing for the past 25 years. It’s produced a lot of spend, but no measurable results. We should move to effects-based cyber. Instead of 60 shades of gray, maybe we should have just five well defined capability bands.”
The ultimate goal: Bring compliance into line with security so that doing the former delivers the latter. But the evolving nature of cyber threats suggests that may never be possible.
Automated tools will only be as good as the data and intelligence built into them. True, automation improves speed and efficiency, Carey says. “But it doesn’t necessarily make me better.”
System owners should be able to look at their cyber stack and determine exactly how much better security performance would be if they added another tool or upgraded an existing one. If that were the case, they could spend most of their time focused on stopping the most dangerous threats – zero-day vulnerabilities that no tool can identify because it has never seen them before – rather than ensuring all processes and controls are in place to minimize risk in the event of a breach.
Point-in-time measures based on known vulnerabilities and available threats help, but may be blind to new or emerging threats of the sort that the NSA identifies and often keeps secret.
The NSCSAR tests DISA and NSA perform include that kind of advanced threat. Rather than trying to measure overall security, they’ve determined that breaking it down into the different levels of security makes sense. Says DISA’s Hickey: “You’ve got to tackle ‘what are we doing at the perimeter, what are we doing at the region and what are we doing at the endpoint.’” A single overall picture isn’t really possible, he says. Rather, one has to ask: “What is that situational awareness? What are those gaps and seams? What do we stop [doing now] in order to do something else? Those are the types of measurements we are looking at.”