Harold T. Martin III was arrested in August 2016 for allegedly stealing and hoarding terabytes of highly classified documents taken from multiple intelligence agencies, where he’d worked as a contractor for more than 20 years. Around the same time, the Shadow Brokers hacker group exposed a number of sophisticated cyber-hacking tools developed by the National Security Agency (NSA) and now believed to have been downloaded by an agency insider. Those tools were later tied to a pair of major international cyberattacks in spring 2017. In a third incident from the same time frame, a retired DuPont engineer was accused by the company of stealing proprietary trade secrets and then leveraging them for his independent consulting business.
In all three cases, trusted individuals with direct access to highly sensitive information violated that trust and evaded internal safeguards before making off with data that was never supposed to leave the premises.
Today, all government organizations and most large private-sector firms operate insider threat programs to detect when insiders begin going rogue. These systems collect vast volumes of user data, sifting through it for anomalies that indicate a change in behavior patterns, such as sudden copying, downloading or printing of unusual numbers or types of files. What began as a simple quest to track and correlate information is becoming a costly challenge. Tracking each user’s digital behavior means capturing everything from when and where they log on and off, to what applications they use, which data they touch, and what and when they download, print or copy. Saving all that data – which can amount to terabytes daily and petabytes over time – isn’t free.
“You need to have a long-term data strategy that optimizes that balance between access and cost,” says David Sarmanian, an enterprise architect with systems integrator General Dynamics Information Technology (GDIT), which builds and manages IT systems for a range of government customers in both the classified and unclassified worlds. “Then you need to revisit that strategy annually to make sure it’s current and up-to-date with advancing technology.”
As network speeds and encryption use have grown, insider threat protection has focused on monitoring user activity at the endpoint, said Mike Crouse, director of data and insider threat business solutions for ForcePoint in Austin, Texas.
For agencies with little sensitive data, the collection logs are relatively small. But where national security systems are involved – or data is highly sensitive because it contains personal or perhaps proprietary industry information – the volumes needing collection ramp up quickly.
Minimum standards developed for the National Insider Threat Policy call for “monitoring of user activity on U.S. government networks” to detect, monitor and analyze anomalous user behavior. How much monitoring is necessary – including which data to collect – is left to the discretion of individual communities and agencies.
The Intelligence Community standard for collecting audit data is the most extensive, calling for the recording and collection of almost every user action: logging on and off; creating, accessing, deleting or modifying files; uploading, downloading, printing or copying files; changes to access or privilege levels; and application use. Even here, decisions must be made. Should the agency collect the actual files and the specific changes made in every instance, or just the fact that the files were accessed?
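The collect-the-content-or-just-the-metadata decision can be pictured as a configurable audit record. The sketch below is purely illustrative – the field names and function are assumptions for this article, not drawn from the IC standard or any real audit tool:

```python
import hashlib
import json
from datetime import datetime, timezone

def record_event(user, action, target, content=None, capture_content=False):
    """Build one audit record for a user action (hypothetical schema).

    With capture_content=False, only a hash of the file is kept -- far
    cheaper to retain, but forensics can later prove *that* a file
    changed, not *how* it changed.
    """
    event = {
        "user": user,
        "action": action,        # e.g. "logon", "file_modify", "print"
        "target": target,        # file path, application name, etc.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if content is not None:
        if capture_content:
            event["content"] = content  # full copy: high storage cost
        else:
            event["sha256"] = hashlib.sha256(content.encode()).hexdigest()
    return json.dumps(event)
```

Flipping a single flag like `capture_content` is the difference between storing a few hundred bytes per event and storing the file itself, which is exactly the cost trade-off the standard leaves to each agency.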
The latest best practices from the CERT division of Carnegie Mellon University’s Software Engineering Institute begin in the hiring process and continue throughout an employee’s tenure. Necessary steps include:
- Monitoring and responding to suspicious or disruptive behavior
- Monitoring and controlling remote access from all endpoints, including mobile devices
- Establishing a baseline of normal network device behavior
- Employing a log correlation engine or security information and event management (SIEM) system to log, monitor, and audit employee actions
Agencies must decide for themselves how long a period of data to maintain and review, said Michael Albrethsen, information system security analyst for the CERT division of the Carnegie Mellon Software Engineering Institute.
A baseline profile for each user tracks log-ins and log-outs to IT systems and applications, locations and devices used and specific data accessed or manipulated. Combined with job titles and functions, network and application permissions and physical access to facilities, this provides a portrait against which anomalous behavior can be detected.
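One way to picture that comparison: a per-user baseline of typical activity, with new activity flagged when it strays far from the norm. This is a minimal sketch, assuming a single feature (files copied per day) and a standard-deviation threshold – real products use far richer models:

```python
from statistics import mean, stdev

def build_baseline(daily_counts):
    """Baseline from historical per-day event counts for one user."""
    return {"mean": mean(daily_counts), "stdev": stdev(daily_counts)}

def is_anomalous(baseline, todays_count, z_threshold=3.0):
    """Flag activity more than z_threshold standard deviations above normal."""
    if baseline["stdev"] == 0:
        return todays_count > baseline["mean"]
    z = (todays_count - baseline["mean"]) / baseline["stdev"]
    return z > z_threshold

history = [4, 6, 5, 7, 5, 6, 4]   # files copied per day over the baseline window
baseline = build_baseline(history)
print(is_anomalous(baseline, 5))    # ordinary day -> False
print(is_anomalous(baseline, 250))  # sudden mass copying -> True
```

A spike like the second case is precisely the “sudden copying of unusual numbers of files” pattern described above.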
But that’s really the easy part, says ForcePoint’s Crouse. “We have visibility into anything the user does on the endpoint or in the cloud,” he says. “But the ability to collect doesn’t mean we should collect it. It isn’t an issue of Big Data or small data, it’s the right data.”
The right data depends on the organization’s risk tolerance, mission, size and budget. An online app for reserving a National Park camping spot does not require the same level of scrutiny as a classified database.
In the least critical applications, log data may be retained for as little as a few months. In typical government applications, it may be a year or more; in classified environments, audit data for forensic investigations might be kept for the user’s entire career – and even beyond. The DuPont engineer’s theft wasn’t apparent until after he left the company. That’s when data really stacks up and portability issues come into play. Data formats and media used today may be obsolete in a decade or two, so information security officers can’t afford to simply stockpile disks or tapes for future use.
Even when an organization is selective about the information gathered, it can quickly accumulate into the petabyte range (1 petabyte equals 1 million gigabytes) for the law enforcement, defense and intelligence communities, Albrethsen said. “That is where the state of the art is moving.”
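The arithmetic behind that growth is stark: even a modest per-user collection rate compounds across a large workforce and a multi-year retention window. A back-of-the-envelope estimate, using illustrative figures rather than any agency’s actual numbers:

```python
def retained_terabytes(users, mb_per_user_per_day, retention_years):
    """Total audit data retained, assuming nothing ages out of the window."""
    daily_gb = users * mb_per_user_per_day / 1000     # MB -> GB per day
    return daily_gb * 365 * retention_years / 1000    # GB -> TB total

# 50,000 users each generating 20 MB of audit logs per day, kept for 5 years:
print(retained_terabytes(50_000, 20, 5))  # 1825.0 TB, i.e. roughly 1.8 PB
```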
Containing Data Volumes
Managing this takes strategy. By strategically identifying which pieces of data must be saved from each event and how such data is stored, the volumes can be made more manageable.
“You’re not going to collect every bit and byte that comes from the endpoint. You don’t want to pay for data that you never use,” says Sarmanian. “And not all data being collected needs to be available for immediate access.”
Records from different sources will come in different formats. These logs and other data must be normalized before they can be stored and used. Typically, this is done at the point of ingestion, often by a SIEM solution designed for the purpose.
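Normalization at ingestion amounts to mapping each source’s format onto one common schema. The sketch below invents two hypothetical feeds and a target schema of its own – the field names are assumptions, not any real SIEM’s data model:

```python
import json

def normalize_syslog(line):
    """Parse a space-delimited line like '2024-03-01T12:00:00Z alice LOGIN ws-7'."""
    ts, user, action, host = line.split()
    return {"timestamp": ts, "user": user, "action": action.lower(), "source": host}

def normalize_json_event(raw):
    """Map a cloud app's JSON audit record, which uses different field names."""
    rec = json.loads(raw)
    return {
        "timestamp": rec["eventTime"],
        "user": rec["actor"],
        "action": rec["operation"].lower(),
        "source": rec.get("service", "cloud"),
    }

events = [
    normalize_syslog("2024-03-01T12:00:00Z alice LOGIN ws-7"),
    normalize_json_event(
        '{"eventTime": "2024-03-01T12:05:00Z", "actor": "alice", "operation": "Download"}'
    ),
]
# Both records now share one schema and can be stored and queried together.
print(all(e.keys() == {"timestamp", "user", "action", "source"} for e in events))
```

Doing this once at ingestion, rather than at every later query, is what makes cross-source correlation practical.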
The next step is storage. Storage requirements determine the long-term cost of a program. While storage costs are perpetually declining, the savings are eaten up by ever-increasing volumes of saved data.
“Storage is cheap, but you still have to buy a lot of it,” said James Cemelli, a GDIT project manager. “Most agencies now are storing data in the low petabyte range, but that will grow exponentially as long-term data collection continues.”
Fortunately, not all such data need be treated the same. Security officials should check network logs regularly for unusual activity and examine files for malware and malicious IP addresses. Network Operations teams will want to have 12 to 13 months of log data available. Tiered storage models provide lower-cost options for data only rarely needed, while still maintaining instant access to high-value current data.
Seldom-used data can be maintained in a storage-area network, while data needed only for forensic investigations following an event can be kept on less expensive media such as hard disk drives, optical discs or tape.
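The tiering described above boils down to a simple policy: route each record to a tier by its age and whether anyone actively needs it. A minimal sketch – the tier names and cutoffs are illustrative assumptions, not any vendor’s defaults:

```python
def storage_tier(age_days, needed_for_active_investigation=False):
    """Assign a retention tier to an audit record by age.

    hot  -- instant access for security and NetOps review (roughly the
            12 to 13 months teams want on hand)
    warm -- storage-area network for seldom-used data
    cold -- cheap offline media (disk, optical, tape) for forensics
    """
    if needed_for_active_investigation:
        return "hot"           # pull anything under investigation forward
    if age_days <= 395:        # ~13 months
        return "hot"
    if age_days <= 3 * 365:
        return "warm"
    return "cold"

print(storage_tier(30))        # hot
print(storage_tier(2 * 365))   # warm
print(storage_tier(10 * 365))  # cold
print(storage_tier(10 * 365, needed_for_active_investigation=True))  # hot
```

The last call shows the trade-off in the surrounding text: cold storage is cheap precisely because retrieval is slow, so an active investigation justifies promoting the data back to instant access.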
There is a tradeoff between cost and availability. Data collection and storage policies for an insider threat program should be based on cost-benefit-risk analysis. “Creating a long-term data storage strategy helps agencies maximize the benefits of advancing technology while still being able to support their mission of defeating insider threat,” said Cemelli.