5 Best Practices for Incident Response in Cloud Environments

essidsolutions

As businesses adopt cloud and container technologies, cyberattacks are growing. Companies need to quickly collect, sort, and analyze data to mitigate those attacks. James Campbell, CEO and co-founder of Cado Security, discusses five best practices for incident response in cloud and container environments.

When it comes to incident response, speed and efficiency are critical. However, the evolving computing landscape has made working fast and efficiently increasingly difficult. As cloud adoption continues, cyber attackers are changing how they operate. The attack surface is growing quickly, expanding all the way out to containers at the edge. The sudden surge in remote work during the pandemic compounded the problem significantly.

The time it takes to respond to a cyber incident can range from a matter of days, when a company is aware of the attack (as with stolen assets or a reported error), to weeks or many months for stealthy attacks, whether they originate inside or outside the organization. Many of the most damaging attacks are of the latter variety, as Verizon points out in its 2021 Data Breach Investigations Report. IBM's Cost of a Data Breach Report 2021 found that it took an average of 287 days to identify and contain a data breach (seven days longer than the year before), and that costs rose the longer a breach took to resolve. Breaches that took longer than 200 days to identify and contain cost an average of $4.87 million, compared with $3.61 million for breaches identified and contained in fewer than 200 days.

Organizations need to be able to quickly collect, sort, and analyze data to mitigate attacks and limit the damage. In a cloud environment, this requires automation and, for the best results, a cloud-native investigation platform. Automating evidence collection alone can save analysts days or even weeks during an investigation. Here are five best practices for dramatically reducing the time to investigation and response.

1) Identify data sources and collect prudently

Performing initial triage efficiently, by collecting the right set of artifacts, significantly reduces processing time and the use of acquisition resources. It also helps you identify additional data sources while ruling out others. As guidance from the SANS Institute and the National Institute of Standards and Technology (NIST) points out, live triage collection should be based on:

  • The artifacts of likely value to the investigation
  • The volatility of the data 
  • The amount of effort required to acquire that data 
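
These three criteria can be turned into a simple, repeatable collection order. The sketch below is a minimal illustration under assumed values: the 1-to-5 scores, the artifact names, and the scoring rule (most volatile first, then highest value per unit of effort) are hypothetical choices made for this example, not part of the SANS or NIST guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    value: int       # hypothetical 1-5 score: likely usefulness to the case
    volatility: int  # hypothetical 1-5 score: 5 = lost soonest (e.g., on reboot)
    effort: int      # hypothetical 1-5 score: time/resources needed to acquire


def collection_order(artifacts):
    """Return artifacts in the order they should be collected.

    Most volatile first, because it may disappear; among artifacts with
    equal volatility, highest value per unit of effort first. The name is
    a final tie-breaker so the order is deterministic.
    """
    return sorted(
        artifacts,
        key=lambda a: (-a.volatility, -(a.value / a.effort), a.name),
    )


if __name__ == "__main__":
    candidates = [
        Artifact("full_disk_image", value=5, volatility=1, effort=5),
        Artifact("volatile_memory", value=5, volatility=5, effort=3),
        Artifact("network_connections", value=4, volatility=5, effort=1),
        Artifact("event_logs", value=4, volatility=2, effort=1),
    ]
    for artifact in collection_order(candidates):
        print(artifact.name)
```

With these example scores, network connections and memory are captured before event logs, and the expensive full-disk image comes last, which matches the idea of triaging first and escalating only when the evidence warrants it.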

Standardize on a base set of artifacts for triage, including network connection state, logged-on users, currently executing processes, event logs, the $MFT (the NTFS Master File Table), registry hives and volatile memory. If analysis of triage evidence warrants a full-disk image, you can acquire, process and analyze it automatically using cloud-native tools.
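
A standardized base set is easiest to enforce when it is written down as data that collection tooling can check against. The following is a minimal sketch, assuming hypothetical artifact labels that mirror the list above; because $MFT and registry hives are Windows artifacts, this profile is Windows-specific, and the labels are illustrative rather than any product's schema.

```python
# Hypothetical standard triage profile for Windows hosts, mirroring the base
# set listed above. The labels are illustrative, not a tool's schema.
BASE_TRIAGE_SET = frozenset({
    "network_connections",
    "logged_on_users",
    "running_processes",
    "event_logs",
    "mft",             # NTFS $MFT (Master File Table)
    "registry_hives",
    "volatile_memory",
})


def missing_artifacts(collected, required=BASE_TRIAGE_SET):
    """Return required artifacts absent from a triage collection, sorted by name."""
    return sorted(set(required) - set(collected))


def triage_is_complete(collected, required=BASE_TRIAGE_SET):
    """True when every artifact in the standard set was collected."""
    return not missing_artifacts(collected, required)


if __name__ == "__main__":
    collected = ["event_logs", "mft", "registry_hives", "running_processes"]
    print(missing_artifacts(collected))
    # ['logged_on_users', 'network_connections', 'volatile_memory']
```

Checking every collection against the same profile makes gaps visible immediately, instead of being discovered days later during analysis.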

Full-disk captures have traditionally involved a time-consuming manual process using bootable USB sticks or shipping a device to a secure location. In the cloud, snapshots taken through a provider's APIs made this easier, but the process still required the knowledge and skills to work with each provider's APIs. Today, acquisition can be scripted and automated against those cloud-native APIs. Another option is to use a cloud-native investigation platform that abstracts away the cloud's complexity and fully automates the acquisition, processing and analysis of full cloud volumes. Because no agent is required, it can do this without impacting workloads.
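
To illustrate what automating acquisition through a provider's API can look like, here is a minimal sketch using AWS as one example provider; the article does not name one, and other clouds offer equivalent snapshot APIs. It assumes a boto3 EC2 client is passed in (for example `boto3.client("ec2")`), and the tag keys `ir-case-id` and `ir-acquired-utc` are hypothetical conventions. An EBS snapshot is taken at the storage layer, so no agent is needed on the instance.

```python
from datetime import datetime, timezone


def snapshot_for_investigation(ec2, volume_id, case_id, wait=True):
    """Snapshot an EBS volume and tag it with an incident case ID.

    `ec2` is a boto3 EC2 client. It is passed in rather than created here
    so the function can be exercised without AWS credentials.
    """
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"IR evidence for case {case_id}",
        TagSpecifications=[
            {
                "ResourceType": "snapshot",
                "Tags": [
                    # Hypothetical tag keys used to link evidence to a case.
                    {"Key": "ir-case-id", "Value": case_id},
                    {"Key": "ir-acquired-utc",
                     "Value": datetime.now(timezone.utc).isoformat()},
                ],
            }
        ],
    )
    snapshot_id = response["SnapshotId"]
    if wait:
        # Block until the snapshot is complete before handing it off.
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id
```

Processing the snapshot (for example, creating a volume from it and attaching it to an isolated analysis instance) would be the next step; a cloud-native investigation platform automates that part as well.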


2) Collect and process data efficiently

The faster you can analyze key events, the faster you can respond, thus reducing the risk to your organization. It’s best to document and standardize the collection and processing of evidence and, wherever possible, work with systems of interest in parallel. 
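
Working with systems of interest in parallel can be sketched with Python's standard thread pool. This is a minimal illustration, assuming a hypothetical `collect(host)` function that performs the actual collection (a remote command, an API call, and so on); a failure on one host is recorded instead of stopping collection from the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_from_hosts(hosts, collect, max_workers=8):
    """Run `collect(host)` against many hosts in parallel.

    Returns (results, errors), both dicts keyed by host. A failure on one
    host is recorded rather than aborting collection from the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(collect, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # keep going; record the failure
                errors[host] = repr(exc)
    return results, errors
```

Threads suit this case because collection is mostly I/O-bound (waiting on networks and APIs); the `errors` dict gives analysts a documented list of hosts that need a retry.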

Automation greatly streamlines the process and, in cloud and container environments, is critical to ensuring data is captured before it disappears. You can also use a Security Orchestration, Automation and Response (SOAR) platform to issue remote commands that collect data from multiple sources and perform automated actions based on a predefined playbook. Further, by integrating your cloud-native investigation platform with these other solutions, you can ensure that a deeper investigation kicks off immediately after high-severity detections.
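
The severity-based hand-off described above can be expressed as a tiny playbook. This is a hypothetical sketch, not any SOAR product's API: `collect_triage` and `start_deep_dive` are stand-in callables for a SOAR remote command and an investigation-platform API call, and the severity scale and "high" threshold are assumptions.

```python
from dataclasses import dataclass

# Hypothetical severity scale; real platforms define their own.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Alert:
    host: str
    severity: str
    rule: str


def run_playbook(alert, collect_triage, start_deep_dive, deep_dive_at="high"):
    """Always collect triage data; escalate alerts at or above `deep_dive_at`.

    `collect_triage` and `start_deep_dive` are hypothetical callables that
    take the alert and return a result. Returns the (step, result) pairs run.
    """
    steps = [("collect_triage", collect_triage(alert))]
    if SEVERITY_RANK[alert.severity] >= SEVERITY_RANK[deep_dive_at]:
        steps.append(("start_deep_dive", start_deep_dive(alert)))
    return steps
```

Keeping the threshold as a parameter lets the team tune escalation without rewriting the playbook, which is the kind of documented, standardized step this practice calls for.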

3) Standardize the preservation of data

The value and volatility of data typically determine its lifecycle management. Be sure to define and document where data will be stored and for how long, and who will have access to it. When possible, define hot and cold storage requirements and the full chain of custody, including proper tagging and labeling of evidence. 
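
A documented lifecycle and chain of custody can be captured in a small evidence record. The sketch below is illustrative only: the 30-day hot and 365-day cold retention windows, the `source-system` label and the field names are hypothetical, and a real policy would come from your legal and compliance requirements. Hashing each item at acquisition lets anyone later verify it has not changed.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: days in hot storage, then in cold storage.
RETENTION_POLICY = {"hot_days": 30, "cold_days": 365}


@dataclass
class EvidenceItem:
    evidence_id: str
    case_id: str
    sha256: str
    acquired_at: datetime
    custody: list = field(default_factory=list)

    def transfer(self, from_party, to_party, when=None):
        """Append a chain-of-custody entry recording a hand-off."""
        self.custody.append({
            "from": from_party,
            "to": to_party,
            "at": (when or datetime.now(timezone.utc)).isoformat(),
        })


def register_evidence(data, evidence_id, case_id, collector, acquired_at=None):
    """Hash new evidence (bytes) and open its chain of custody."""
    item = EvidenceItem(
        evidence_id=evidence_id,
        case_id=case_id,
        sha256=hashlib.sha256(data).hexdigest(),
        acquired_at=acquired_at or datetime.now(timezone.utc),
    )
    item.transfer("source-system", collector, item.acquired_at)
    return item


def verify(item, data):
    """Re-hash the evidence and confirm it matches the recorded digest."""
    return hashlib.sha256(data).hexdigest() == item.sha256


def storage_tier(item, now=None, policy=RETENTION_POLICY):
    """Place evidence in hot or cold storage, or flag it for review, by age."""
    age = (now or datetime.now(timezone.utc)) - item.acquired_at
    if age <= timedelta(days=policy["hot_days"]):
        return "hot"
    if age <= timedelta(days=policy["hot_days"] + policy["cold_days"]):
        return "cold"
    return "review_for_deletion"
```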

4) Analyze data in a holistic manner

Be prepared to collect and aggregate data at scale, giving you a comprehensive view across all systems and the ability to drill down into the data through a user-friendly interface, such as a timeline. Collected data should also be enriched with threat intelligence so that analysts can go straight to the most important evidence and pivot their investigation from there. A holistic view makes the investigation more effective and speeds up containment, eradication, and recovery.
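
A unified, enriched timeline can be sketched in a few lines. This minimal example assumes each source yields event dicts with an ISO 8601 `time` (including a UTC offset) and a free-text `detail`, and it treats threat intelligence as a plain set of indicator strings; the naive substring match is a simplification, since real enrichment would parse fields and normalize indicators.

```python
from datetime import datetime


def build_timeline(sources, iocs):
    """Merge events from several sources into one time-ordered timeline.

    `sources` maps a source name to a list of event dicts, each with an
    ISO 8601 "time" string and a "detail" string. Events whose detail
    contains any indicator in `iocs` are flagged, and flagged events are
    also returned separately so analysts can start with them.
    """
    timeline = []
    for source, events in sources.items():
        for event in events:
            hits = sorted(ioc for ioc in iocs if ioc in event["detail"])
            timeline.append({
                "time": datetime.fromisoformat(event["time"]),
                "source": source,
                "detail": event["detail"],
                "ioc_hits": hits,
            })
    timeline.sort(key=lambda e: e["time"])
    priority = [e for e in timeline if e["ioc_hits"]]
    return timeline, priority
```

The `priority` list is the "most important evidence first" view, while the full `timeline` keeps the surrounding context for pivoting.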

5) Refine and sharpen your toolset

Computing environments aren't static, and your incident response process shouldn't be either. It should adapt to changes in the computing and security landscapes. For example, the COVID-19 pandemic accelerated cloud adoption, forcing organizations either to stretch existing IR processes that were poorly suited to cloud investigations or simply to accept the risk of limited visibility and response capabilities in the cloud. Instead, you can treat the cloud as a security asset, especially by taking advantage of a cloud-native investigation platform. The cloud can also offer secure, flexible and efficient processes for collecting, processing and storing evidence.

Case in Point

In one example, combining EDR/XDR, SOAR, and a cloud-native investigation platform enables the proactive capture, processing and analysis of data across the environment.

EDR/XDR triggers alerts (possibly involving multiple events or systems), and the SOAR platform correlates the alerts, putting a playbook into action and issuing an API call to the cloud-native investigation platform. 
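
The article does not say how the SOAR platform correlates alerts; one simple, commonly used approach is to group alerts per host that arrive within a short time window of each other. The sketch below assumes alerts are `(host, timestamp, rule)` tuples and uses a hypothetical 15-minute window; each resulting group would then trigger a single playbook run instead of one per alert.

```python
from datetime import datetime, timedelta


def correlate_alerts(alerts, window=timedelta(minutes=15)):
    """Group alerts per host into incidents.

    An alert joins its host's current incident when it arrives within
    `window` of that incident's latest alert; otherwise it opens a new
    incident. `alerts` is an iterable of (host, timestamp, rule) tuples.
    Returns a list of incidents, each a time-ordered list of alerts.
    """
    incidents = []
    open_by_host = {}
    for alert in sorted(alerts, key=lambda a: a[1]):
        host, ts, _rule = alert
        current = open_by_host.get(host)
        if current and ts - current[-1][1] <= window:
            current.append(alert)
        else:
            current = [alert]
            open_by_host[host] = current
            incidents.append(current)
    return incidents
```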

Subsequently, the SOAR platform calls the EDR/XDR API (or a cloud agent) to execute a command on the host in question.

Meanwhile, a triage package is generated, uploaded to cloud storage and automatically processed into the investigation platform. Threat intel is applied to the artifacts, allowing analysts to view, search and collaborate on the investigation from a shared view in a single pane of glass, with a single timeline.
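
Generating a triage package can be as simple as bundling the collected files with a hash manifest so the evidence can be verified after upload. This is a minimal sketch using only the standard library; the archive layout and the `manifest.json` name are assumptions, and the upload step itself (for example, boto3's S3 `put_object`) is left out.

```python
import hashlib
import io
import json
import zipfile


def build_triage_package(artifacts):
    """Bundle artifacts into an in-memory zip with a SHA-256 manifest.

    `artifacts` maps an archive path (e.g. "logs/security.evtx") to bytes.
    Returns (zip_bytes, manifest). The manifest is also written into the
    archive as manifest.json so it travels with the evidence.
    """
    manifest = {name: hashlib.sha256(data).hexdigest()
                for name, data in sorted(artifacts.items())}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(artifacts.items()):
            zf.writestr(name, data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return buffer.getvalue(), manifest


def verify_triage_package(zip_bytes):
    """Re-hash every member listed in manifest.json; return names that differ."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return sorted(
            name for name, digest in manifest.items()
            if hashlib.sha256(zf.read(name)).hexdigest() != digest
        )
```

Running the verification on the receiving side, before processing, confirms that nothing was altered or corrupted in transit.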

If further investigation is needed, a full-disk acquisition can capture deeper artifacts. Once the investigation is complete, the organization can begin remediation.

Response at the Speed and Scale of Cloud

With incident response, speed is of the essence, but so is accurately assessing the threat. It's critical to have a well-defined, practiced plan for cloud and container investigations. Triage helps define incidents early on and focuses the investigation on the affected systems. Agentless full-disk acquisition allows for deeper analysis without disrupting production workloads. Automating the process end to end not only greatly increases speed but also improves its accuracy and consistency, further reducing the time to resolution.

The basic best practices for data collection and incident response haven't changed much in the past decade (NIST's Computer Security Incident Handling Guide, for instance, dates to 2012), but the computing environment and the tools available have changed substantially. Following these best practices with the help of automation and a cloud-native investigation platform enables a standardized data collection process, holistic analysis and a reduction in the time it takes to resolve incidents. It also allows security to keep pace with the rapidly evolving cloud landscape.

Which of these practices do you have in place to reduce the time to incident investigation and response? Let us know on Facebook, Twitter, and LinkedIn.
