Understanding the Data Lifecycle and Five Ways to Ensure Data Confidentiality


Data has become a treasure trove for cybercriminals, which is why companies now rely on data security technologies rather than huge vaults and security personnel to protect it. Here’s a quick introduction to each phase of the data security lifecycle and how companies can secure their data end to end. Read on.

Every data security model provides the same essential safeguards and requires continuous attention to remain effective. This article describes a common model often associated with the CISSP common body of knowledge. Although it is known as a lifecycle, it is really a set of data states; defining them enables effective policy creation, application of safeguards, and ongoing review of safeguard effectiveness, all of which help manage information handling risk.

The Data Lifecycle

As shown in Figure 1, the lifecycle consists of five nodes: create, store, use, share, and dispose. Each node requires unique policies and safeguards to achieve regulatory, ethical, and business requirements. Organizations that collect and process data, known as the data owners, use this model to ensure risk is managed across all nodes.

Figure 1: Data Lifecycle

Create

Data stored and processed by an organization is generally “created” in two ways. First, data is collected from customers, employees, business partners, and other entities with whom the organization has a business relationship. Second, data is derived as part of business processes. In both cases, managing information creation is the first step in mitigating data risk.

Information handling risk management begins with creating only what is needed to enable business operations. Collecting unneeded sensitive information creates unnecessary risk and should be avoided.

Figure 2: Create Data

Once the data is created, the organization must classify and categorize it. Data classification determines the sensitivity of the data and how important it is to protect its confidentiality, integrity, and retention. There are many approaches to classification. One approach uses three classification levels: public, private, and restricted.

  • Public – The unauthorized disclosure, alteration, or destruction of the data would cause little to no business impact.
  • Private – The data’s unauthorized disclosure, alteration, or destruction would cause moderate business impact because of applicable regulations, litigation, or business agreements.
  • Restricted – The data’s unauthorized disclosure, alteration, or destruction would cause significant business impact because of applicable regulations, litigation, or business agreements.

Correctly classifying information helps organizations apply suitable safeguards, including separating data by classification for better control.
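As a simple illustration, the sketch below shows one way a classification level and its baseline safeguards might be encoded programmatically. The enum values and the safeguard settings are hypothetical examples, not a prescribed standard.

```python
from enum import Enum

class Classification(Enum):
    """Hypothetical three-level classification scheme: public, private, restricted."""
    PUBLIC = 1
    PRIVATE = 2
    RESTRICTED = 3

# Illustrative baseline safeguards per classification level.
SAFEGUARDS = {
    Classification.PUBLIC:     {"encrypt_at_rest": False, "access_review_days": 365},
    Classification.PRIVATE:    {"encrypt_at_rest": True,  "access_review_days": 90},
    Classification.RESTRICTED: {"encrypt_at_rest": True,  "access_review_days": 30},
}

def required_safeguards(level: Classification) -> dict:
    """Return the baseline safeguards for a given classification level."""
    return SAFEGUARDS[level]

print(required_safeguards(Classification.RESTRICTED))
```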


Categorization measures the data’s value and management’s related risk tolerance. It is closely related to business continuity: keeping the data available and accurate for business operations. Table 1 provides guidelines for categorizing data as low, moderate, or high. Each security objective (confidentiality, integrity, and availability) receives its own categorization, enabling effective assessment of how to achieve each objective for every data set.

Table 1: Data Categorizations
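One common way to record the per-objective categorizations shown in Table 1 and derive an overall impact level is the "high watermark" approach used in FIPS 199: the data set takes the highest of its three categorizations. The sketch below assumes that approach; the data set names are hypothetical.

```python
# Impact levels ordered so the maximum can be taken (the "high watermark" approach).
LEVELS = {"low": 1, "moderate": 2, "high": 3}

# Hypothetical categorizations: (confidentiality, integrity, availability) per data set.
categorizations = {
    "marketing_site_content": ("low", "moderate", "moderate"),
    "customer_pii":           ("high", "moderate", "moderate"),
    "payment_records":        ("high", "high", "high"),
}

def overall_impact(c: str, i: str, a: str) -> str:
    """Overall impact is the highest of the three security-objective categorizations."""
    return max((c, i, a), key=LEVELS.get)

for name, (c, i, a) in categorizations.items():
    print(f"{name}: C={c}, I={i}, A={a} -> overall {overall_impact(c, i, a)}")
```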

The creation process is a critical part of the lifecycle, providing the foundation for properly protecting collected and derived information. Watch this video for more information on information classification and categorization.

Store

Safe information storage requires close attention to classification and categorization as organizations implement risk-based safeguards. Not all data needs the same protection, so organizations should separate data sets based on their classifications and categorizations, placing data on network segments that enforce zero trust and are protected by behavior analytics, traffic control, and authentication/authorization controls.

Figure 3 is a model showing how this might work. Organizations create network segments on which they place storage devices. Each segment is assigned a trust level: a level of protection commensurate with the data’s sensitivity, ensuring an acceptable level of risk relative to the data’s value.

Figure 3: Data Storage Planning

Only applications and database administrators, using approved tools, should have direct access to the stored data, protected by strong access controls that enforce need-to-know, least privilege, and separation of duties. 
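A minimal sketch of how need-to-know and least privilege might be checked before granting direct data access is shown below. The role names, permissions, and clearance mappings are hypothetical and only illustrate the combined check.

```python
# Least privilege: which actions each role may perform (hypothetical roles and permissions).
ROLE_PERMISSIONS = {
    "application_service": {"read", "write"},
    "database_admin":      {"read", "write", "schema_change"},
    "analyst":             {"read"},
}

# Need-to-know: which data classifications each role is cleared to access.
ROLE_CLEARANCE = {
    "application_service": {"public", "private", "restricted"},
    "database_admin":      {"public", "private", "restricted"},
    "analyst":             {"public", "private"},
}

def can_access(role: str, action: str, classification: str) -> bool:
    """Allow access only if the role holds the permission (least privilege)
    and is cleared for the data's classification (need-to-know)."""
    return (action in ROLE_PERMISSIONS.get(role, set())
            and classification in ROLE_CLEARANCE.get(role, set()))

print(can_access("analyst", "read", "restricted"))         # False: not cleared
print(can_access("database_admin", "read", "restricted"))  # True
```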

See the following for a detailed look at how to create secure network segments and planned network trust zones. 

Highly classified or categorized data usually requires more than basic access controls. Restricted data, for example, should also be encrypted with an industry-accepted symmetric algorithm such as AES. For a detailed look at the symmetric and asymmetric encryption addressed in the next section, refer to the Cryptography video series.
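To make the idea concrete, the sketch below encrypts a restricted record with AES-256 in GCM mode using the third-party cryptography package. Key handling is deliberately simplified for illustration; in practice the key would come from a key management system, never be generated and used in the same place.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party 'cryptography' package

# Illustration only: a real deployment obtains the key from a key management system.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

record = b"restricted: customer account details"
nonce = os.urandom(12)  # a GCM nonce must be unique per encryption under the same key

ciphertext = aesgcm.encrypt(nonce, record, None)   # third argument is optional associated data
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
assert plaintext == record
```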

An organization must apply the same focus to data collected by mobile devices: moving all sensitive data to network or cloud storage, preventing its long-term storage on mobile or desktop devices, and encrypting any data still resident on those devices.

Use

Protecting data in use includes implementing the proper levels of access control, monitoring to ensure correct subject behavior, and isolating processes and their data.

Subjects must access stored information to complete business tasks; that access is controlled by authentication and authorization approaches that match the sensitivity of the data accessed. For restricted data, organizations should consider strong authentication.

Strong authentication uses two or more authentication factors that fall into three types.

  • Type I – Something the subject knows (passwords, PINs, etc.)
  • Type II – Something the subject has (smartcards, one-time passwords, etc.)
  • Type III – Something the subject is (fingerprints, speech, retinas, etc.)

For a more detailed look at these types, watch the video Single-factor Authentication vs. Multifactor Authentication.
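As a small illustration of combining a Type I and a Type II factor, the sketch below checks a password and then a time-based one-time password using the third-party pyotp package. The stored enrollment values are hypothetical.

```python
import hashlib
import hmac
import pyotp  # third-party package implementing time-based one-time passwords (RFC 6238)

# Hypothetical stored enrollment data for one user.
stored_salt = b"example-salt"
stored_password_hash = hashlib.pbkdf2_hmac(
    "sha256", b"correct horse battery staple", stored_salt, 100_000
)
stored_totp_secret = pyotp.random_base32()

def authenticate(password: str, totp_code: str) -> bool:
    """Require something the subject knows (password) and something the subject has (TOTP device)."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), stored_salt, 100_000)
    knows = hmac.compare_digest(candidate, stored_password_hash)  # Type I factor
    has = pyotp.TOTP(stored_totp_secret).verify(totp_code)        # Type II factor
    return knows and has

# Simulate a login where the user reads the current code from an authenticator app.
current_code = pyotp.TOTP(stored_totp_secret).now()
print(authenticate("correct horse battery staple", current_code))  # True
```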

In the article Risk Based Access Control and the Role of Continuous Authentication, I wrote about the need to ensure that an authenticated device, application, or user continues to behave in expected ways. If a threat actor compromises an authenticated entity, its behavior will likely become anomalous: unusual attempts to move to other network segments and objects, and actions that deviate from a statistical norm. When this happens, the entity can be forced to re-authenticate. Continuous authentication can help prevent unwanted access to sensitive information when a threat actor uses authenticated credentials to access data.
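The sketch below illustrates the statistical-norm idea in its simplest form: compare an entity’s current activity rate against its historical mean and flag a re-authentication when it deviates too far. Real continuous-authentication products use far richer behavioral features; the threshold and baseline data here are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical baseline: requests per minute observed for this entity over recent sessions.
baseline = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]

def requires_reauthentication(current_rate: float, history: list[float], threshold: float = 3.0) -> bool:
    """Flag the entity for re-authentication when its activity deviates
    more than `threshold` standard deviations from its historical norm."""
    mu, sigma = mean(history), stdev(history)
    z = abs(current_rate - mu) / sigma
    return z > threshold

print(requires_reauthentication(14, baseline))  # False: within the norm
print(requires_reauthentication(90, baseline))  # True: anomalous burst of activity
```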

Once a subject accesses information for use, it must be decrypted, so threat actors can reach that data without ever leaving the safety of a compromised system. (Homomorphic encryption, covered later, does protect data in use, but it is impractical for everyday processing.) This makes data in use available to malicious payloads, as shown in Figure 4.

Figure 4: Data in Use Attack Surface

In this environment, compromised device firmware, drivers, the hypervisor, and other components can access data in use. However, confidential computing, increasingly used by organizations and cloud service providers, can help reduce the associated risk.

As shown in Figure 5, confidential computing, enabled in servers designed to support it, isolates processes and their data, protecting them from compromised system components and reducing this risk. By extending zero trust to individual processes, only explicitly trusted entities have access to those processes and their data. For a detailed look at confidential computing, see What Is Confidential Computing and Why It’s Key To Securing Data in Use.

Figure 5: Confidential Computing Isolation

Finally, all movement of information, whether on-premises or across the internet, must be protected. As the video Communication Channels and Authentication Protocols describes, organizations must consider using secure channels for moving data, including authentication traffic. These channels, including TLS and IPsec, provide encrypted tunnels and other methods for making information inaccessible to threat actors.
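As a minimal example of using a secure channel, the sketch below wraps an outbound socket in TLS with Python’s standard ssl module, verifying the server certificate against the system trust store. The destination host is only an example.

```python
import socket
import ssl

# The default context verifies the server certificate and hostname against the system trust store.
context = ssl.create_default_context()

hostname = "example.com"  # example destination only
with socket.create_connection((hostname, 443), timeout=10) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
        # All bytes written to tls_sock are encrypted in transit.
        print("Negotiated protocol:", tls_sock.version())  # e.g. 'TLSv1.3'
        print("Cipher suite:", tls_sock.cipher()[0])
```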

Share

In many cases, data owners share data sets with cloud service providers and researchers, often resulting in business planning information and other strategic data being returned to the data owner. During this data-sharing process, classified information is stored and processed by a third party, as shown in Figure 6.

Figure 6: Strategic Data Sharing

Once the data is outside the control of the data owner, there is no guarantee that the owner has a complete picture of the trust level of the external processor’s environment. Further, the external processor might operate in geographic areas prohibited by regulatory constraints such as the GDPR.

Disposition

Eventually, data is no longer needed for daily operations, requiring one of two approaches to disposition: archiving or sanitization. Disposition is vital for removing classified information from regularly used, easily accessible storage locations.

Archiving is needed when regulatory or business requirements dictate data retention for a set period. Depending on the regulations that apply to the organization, retention periods might range from one year to indefinitely. Federal regulations and agencies that govern retention include:

  • Internal Revenue Service (IRS)
  • Occupational Safety and Health Act (OSHA)
  • Family and Medical Leave Act (FMLA)
  • Health Insurance Portability and Accountability Act (HIPAA)

Organizations must also ensure compliance with state, local, and industry requirements. 

Storage of archived data requires the same attention to risk as data stored in production systems, including encryption and physical access controls. If encryption is used, organizations must ensure the retention of the related keys. Keys change over time, and failing to retain old keys, a core part of a comprehensive key management process, would make archived data unrecoverable.

Data no longer required under retention requirements and no longer needed for daily business operations must be destroyed and made unrecoverable, which includes removing archived data from operational systems via media sanitization. How media are sanitized depends on the classification of the data and the media involved. Figure 7 shows the sanitization decision process provided in NIST SP 800-88r1, a standard that provides detailed clear, purge, and destroy guidelines for each media type, including magnetic, optical, and solid-state storage.

Simply deleting data from media usually does not actually destroy it, leaving the data available for extraction through various methods.

Figure 7: Media Sanitization Decision Tree

In general:

  • Clear – Use software or hardware products to overwrite user-addressable storage areas on the media with non-sensitive data (for example, all-zeros or all-ones patterns); see the file-level sketch after this list.
  • Purge – Use multiple overwrites of the media, block erase, or cryptographic erasure.
  • Destroy – When the media will not be reused, for example when it is retired as part of a system retirement or replaced due to device failure, destroy it. Destruction includes shredding and burning.
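The sketch below illustrates the clear idea at the file level only: overwrite a file’s user-addressable bytes and then delete it. Device-level behavior such as SSD wear leveling means this is not a substitute for SP 800-88r1 media-level sanitization; it simply shows the overwrite concept, and the file name is hypothetical.

```python
import os

def clear_file(path: str, passes: int = 1) -> None:
    """Illustrative 'clear': overwrite a file's user-addressable bytes with zeros, then delete it.

    Note: this addresses only the file's logical extent. Real media sanitization must
    follow NIST SP 800-88r1 guidance for the specific media type.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(b"\x00" * size)
            f.flush()
            os.fsync(f.fileno())  # push the overwrite through to the storage device
    os.remove(path)

# Example usage with a hypothetical temporary file.
with open("obsolete_report.tmp", "wb") as f:
    f.write(b"restricted content no longer needed")
clear_file("obsolete_report.tmp")
```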

For a practical application of SP 800-88r1 and a guide for establishing media sanitization policies and processes, download the Media Sanitization Guide.


Ensuring the Confidentiality and Integrity of Shared Data

There are five popular and emerging ways to ensure the confidentiality or integrity of shared information and to prevent unwanted sharing: homomorphic encryption, data loss prevention, asymmetric encryption, hashing, and digital signatures.

Homomorphic encryption

Homomorphic encryption enables sharing of specific data elements without the external processor ever knowing their actual values. Further, the processing results are also encrypted, preventing anyone other than the data owner from ever seeing them. Figure 8 shows how this works.

Figure 8: Homomorphic Encrypted Sharing

  1. The data owner creates a data set for analysis.
  2. The data owner designs a homomorphic plan that enables the service provider to process specific data elements.
  3. The data owner implements the plan, encrypting the data, and creating the needed keys.
  4. The service provider receives the data and processes the specified data elements, creating homomorphically encrypted results without ever seeing the raw values or the results.
  5. The service provider returns the analysis results to the data owner.
  6. The data owner uses a private key, created when the data was originally homomorphically encrypted, to access the results.

During this entire process, no one other than the data owner can see the data. For a more detailed look at homomorphic encryption, see Homomorphic Encryption: How It Changes the Way We Protect Data.
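Fully homomorphic encryption libraries are complex, but the basic idea can be sketched with the third-party phe (python-paillier) package, which provides partially homomorphic (additive) encryption: the processor adds encrypted values without ever seeing them, and only the data owner’s private key can reveal the result. The salary figures are hypothetical.

```python
from phe import paillier  # third-party python-paillier package (additive homomorphic encryption)

# Data owner: generate a key pair and encrypt the sensitive values.
public_key, private_key = paillier.generate_paillier_keypair()
salaries = [52_000, 61_500, 58_250]
encrypted_salaries = [public_key.encrypt(s) for s in salaries]

# Service provider: computes on ciphertexts only; it never holds the private key
# and never sees the plaintext values or the plaintext result.
encrypted_total = sum(encrypted_salaries[1:], encrypted_salaries[0])

# Data owner: only the private key can decrypt the returned result.
print(private_key.decrypt(encrypted_total))  # 171750
```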

Data loss prevention

On a smaller scale, small bits of information, including PII, are commonly shared within the data-owning organization, with customers/patients, and with business partners. This information is controlled by privacy laws, business policies, and ethical conduct, requiring user training on how sharing is allowed and data loss prevention (DLP). DLP helps ensure user behavior does not result in improperly shared PII, ePHI, or intellectual property.
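A full DLP platform inspects content and context across email, endpoints, and cloud channels, but the core content-inspection idea can be sketched with simple pattern matching. The patterns below are simplified illustrations, not production-grade detectors.

```python
import re

# Simplified illustrative patterns for common PII; real DLP engines use far more
# robust detection (validation, context analysis, fingerprinting, machine learning).
PII_PATTERNS = {
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_outbound_message(text: str) -> list[str]:
    """Return the PII types detected in an outbound message so policy can block or warn."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

message = "Hi, my SSN is 123-45-6789 and you can reach me at jane.doe@example.com"
findings = scan_outbound_message(message)
if findings:
    print(f"Blocked: message contains {', '.join(findings)}")
```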

Another consideration when sharing information is ensuring that what the source sends remains unchanged until the receiver receives and uses it. Hashing and digital signatures are two ways to protect data content and provide nonrepudiation.

Hashing

Hashing is a one-way process that uses an industry-accepted algorithm, such as SHA-256, to create a long value representing the content of the sent file or message, as shown in Figure 9. The figure also shows how much a hash value changes when as little as one character is changed.

Figure 9: SHA-256 Hash

Regardless of the length of the data sent, the hash value created by a specific algorithm is always the same length.

Before the information is sent, the sender creates a hash value and provides it to the recipient. The recipient then rehashes the information and compares his hash value to the sender’s. If the values match, there is a high probability that the data has not changed.
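This behavior is easy to see with Python’s standard hashlib module: hashing the same message twice yields the same digest, while changing a single character produces a completely different digest of the same length. The message contents are hypothetical.

```python
import hashlib

original = b"Wire $1,000 to account 12345"
altered  = b"Wire $9,000 to account 12345"  # a single character changed

# The sender computes and shares this value alongside the message.
sender_digest = hashlib.sha256(original).hexdigest()

# The recipient rehashes what was received and compares against the sender's value.
print(hashlib.sha256(original).hexdigest() == sender_digest)  # True: content unchanged
print(hashlib.sha256(altered).hexdigest() == sender_digest)   # False: content was modified

# Regardless of message length, a SHA-256 digest is always 64 hex characters (256 bits).
print(len(sender_digest))  # 64
```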

Digital signatures & asymmetric encryption

The hash process is strengthened with digital signatures. Figure 10 depicts the digital signing process.

Figure 10: Digital Signature Process

  1. The sender uses the SHA-256 algorithm to hash the sensitive document.
  2. The sender encrypts the hash value with her private key.
  3. The sender attaches the encrypted hash to the document and sends it to the recipient.
  4. The recipient uses the sender’s public key to decrypt the hash value.
  5. The recipient recreates the hash for the document content, also using the SHA-256 algorithm.
  6. The recipient compares his hash value to the decrypted hash value.

If the hash values match, the recipient can assume two things with a high probability of truth. First, the document’s content has not changed, and its integrity is intact. Second, the represented sender actually signed the document, providing nonrepudiation.
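The same sequence can be sketched with the third-party cryptography package: the sender signs a SHA-256 digest of the document with a private key, and the recipient verifies it with the matching public key. Key distribution, normally handled by a PKI, is omitted here, and the document text is hypothetical.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

document = b"Quarterly forecast: restricted distribution"

# Sender: generate (or obtain from a PKI) an RSA key pair and sign the document's SHA-256 digest.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signature = private_key.sign(
    document,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# Recipient: verify with the sender's public key; verify() raises InvalidSignature on any mismatch.
public_key = private_key.public_key()
try:
    public_key.verify(
        signature,
        document,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
    print("Signature valid: content unchanged and sender confirmed")
except InvalidSignature:
    print("Signature invalid: content altered or signer mismatch")
```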

In addition to simply encrypting the hash value, the entire document can be encrypted, relying on asymmetric encryption to protect both confidentiality and integrity.

This approach to digital signing and encryption requires key pairs, usually created by a PKI and closely related to asymmetric encryption. For a detailed look at PKI and asymmetric encryption, watch the video Asymmetric Encryption.

Another digital signature approach is the Digital Signature Algorithm (DSA), approved by the U.S. federal government. It does not use a standard PKI-issued key pair, and it will no longer be approved for use in new implementations with the release of FIPS 186-5.

Final thoughts

From the time data records are collected to their destruction, the data owner is responsible for their protection. Regardless of the data lifecycle model used, organizations should pick one as a foundation for developing policies, standards, guidelines, and safeguards.

Threat actors continuously change how they compromise or disable systems and extract the data they need. The lifecycle is a continuous process, requiring daily review of emerging vulnerabilities and attack vectors.

