Why Data Discovery and Classification Are Crucial for Modern Enterprises


As data becomes more valuable, so too does data discovery and classification (DDC). A modern discovery and classification approach can help organizations drive forward their overall security strategy while meeting data privacy commitments, explains Trevor Morgan, product manager, comforte AG.

Data discovery and classification (DDC) is not a new phenomenon, but its significance in the modern digital world is paramount. Data, especially sensitive data or personally identifiable information (PII), is considered to be worth more than gold to most organizations.

As a by-product of this trend, the growth of data collection, storage, and usage has exploded. In fact, it is predicted that the amount of data in the world will reach 175 zettabytes by 2025, up from 33 zettabytes in 2018.

Data is a useful commodity for enterprises because the telemetry it provides offers insight into their customers’ behaviors, which can lead to improved products and services. However, an enterprise without data discovery or classification capabilities faces a serious problem in tracking and managing these troves of information, which is a significant red flag in this age of data governance and compliance.

Duty to Privacy

In recent years, we’ve seen a rise in the number of data privacy regulations from various jurisdictions. One of the most prominent is the European GDPR, which other governments have used as the blueprint to safeguard their citizens’ personal and sensitive data.

However, organizations that do not have data discovery and classification tools run the risk of being unable to comply with these crucial data privacy laws, which are sprouting up all over the world.

Here is a list of just a few of these laws: the US has CCPA, Brazil created LGPD, South Africa introduced POPIA, New Zealand formed the 2020 Privacy Act, and Canada has the DCIA. Furthermore, if an enterprise operates within a highly regulated industry, such as healthcare or financial services, it must adhere additionally to the rules of industry-specific data protection standards as defined by PCI and HIPAA.

As the importance of data grows, neglecting this strict data privacy legislation will lead to organizations being penalized with heavy fines that can do severe damage both financially and reputationally. This is hardly news and something the industry has been keen to get across to enterprises for years.


Make Use of the Right Data Discovery and Classification Solutions


Yet, many of these data privacy regulations overlap with specific industry requirements. Therefore, it makes sense for large enterprises operating in various locations to make use of the right data discovery and classification solutions to achieve cross-compliance. 

Geo-dispersed organizations can be smarter about compliance and invest in solutions which make their lives easier. For example, in order to meet compliance mandates, organizations must have processes in place to know where an individual’s data is stored, have the ability to provide that information back to that individual, and ultimately give that individual the ability to request deletion of his or her data. 
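The three obligations described above (knowing where an individual's data is stored, returning it on request, and deleting it on request) can be pictured with a small sketch. It is only an illustration under stated assumptions: the `DataInventory` class, its method names, and the location strings are hypothetical, and a real deployment would sit on top of the output of a discovery tool rather than an in-memory dictionary.

```python
from collections import defaultdict


class DataInventory:
    """Toy index of where each individual's data lives (hypothetical design)."""

    def __init__(self):
        # subject_id -> {location: record found at that location}
        self._locations = defaultdict(dict)

    def register(self, subject_id, location, record):
        """Record that discovery found data about subject_id at location."""
        self._locations[subject_id][location] = record

    def locate(self, subject_id):
        """Return the sorted list of locations holding this individual's data."""
        return sorted(self._locations.get(subject_id, {}))

    def export(self, subject_id):
        """Return a copy of everything held about the individual."""
        return dict(self._locations.get(subject_id, {}))

    def erase(self, subject_id):
        """Forget the individual and return how many locations were affected."""
        return len(self._locations.pop(subject_id, {}))


inventory = DataInventory()
inventory.register("subject-42", "crm.customers", {"email": "a@example.com"})
inventory.register("subject-42", "reports/q3.xlsx", {"email": "a@example.com"})

print(inventory.locate("subject-42"))  # where the data is stored
print(inventory.erase("subject-42"))   # number of locations erased
```

Note that `erase` here only removes entries from the index. In practice, each listed location would also have to be purged, which is exactly why the inventory needs to be complete and accurate in the first place.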

Here is where data discovery and classification can come into its own.

Without DDC, securing data becomes problematic at best. An organization may know what is in a particular database, yet its people won’t have visibility into what regulated data is housed there. Matters are made worse once this data is extracted, put into reports, PDFs, or Excel spreadsheets, and copied around in unstructured formats. Here, the risk of suffering a data breach rises sharply, as control and security are lost whenever unstructured data is manipulated, duplicated, and distributed.

Issues With Legacy DDC and Way Forward

Many traditional DDC tools have alleviated these concerns while helping automate many of the manual processes associated with data discovery and classification. However, most of these off-the-shelf tools have limitations that result in organizations compromising on specific, yet critical, capabilities.

Especially for unstructured data, discovery needs to be precise to help organizations understand the level of sensitivity and potential risk. Yet, traditional approaches to DDC can be described as both static and brittle. They are static because they can only conduct discovery on data at rest, and brittle because they tend to be based on pattern matching, which is more of a hindrance as it requires a lot of trawling through false positives.

For example, if a credit card number was discovered, the pattern-matching mechanism would have the ability to tell if it was indeed a credit card number based on the format, and nothing more. 
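To make the limitation concrete, here is a minimal sketch of format-based discovery. The regular expression, the function names, and the sample text are illustrative assumptions, not taken from any particular DDC product. The Luhn checksum is a standard check for card numbers that the article does not mention; it is included only to show how many format matches can turn out to be false positives.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list:
    """Return every digit run that looks like a card number, with its checksum result."""
    results = []
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        results.append({"match": m.group(), "luhn_ok": luhn_valid(digits)})
    return results


sample = "Paid with 4111 1111 1111 1111; order ref 1234567890123456."
for hit in find_card_candidates(sample):
    print(hit)
```

Both digit runs match the format, but only the first (a widely used test number) passes the checksum; the order reference is a false positive. Even for the genuine match, the code says nothing about whose card it is or which regulations apply to it.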

Modern DDC tools that leverage artificial intelligence and machine learning components can take this further and decipher the following: 

  • If the credit card is associated with an individual that the company has a business-oriented relationship with 
  • If the individual is a customer or employee, where the individual is based, whether that credit card number is from a European citizen or an American citizen, and other correlations. 
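The kind of context listed above can be pictured as enriching a raw finding with what the organization already knows about the individual. The sketch below swaps the AI and machine learning components for a plain dictionary lookup, purely to show the shape of the enriched result. The `Person` record, the reference data, and the country-to-regulation mapping are all hypothetical and deliberately incomplete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    relationship: str  # e.g. "customer" or "employee"
    country: str       # ISO 3166-1 alpha-2 country code


# Hypothetical reference data linking card numbers to known individuals.
# A real system would not keep raw card numbers in a lookup table like this.
KNOWN_CARDS = {
    "4111111111111111": Person("A. Example", "customer", "DE"),
}

# Hypothetical and incomplete mapping from country to one relevant regulation.
REGULATION_BY_COUNTRY = {"DE": "GDPR", "BR": "LGPD", "ZA": "POPIA"}


def enrich(card_number: str) -> dict:
    """Attach relationship, location, and regulation context to a finding."""
    person: Optional[Person] = KNOWN_CARDS.get(card_number)
    if person is None:
        return {"card": card_number, "known": False}
    return {
        "card": card_number,
        "known": True,
        "relationship": person.relationship,
        "country": person.country,
        "regulation": REGULATION_BY_COUNTRY.get(person.country),
    }


print(enrich("4111111111111111"))
print(enrich("5500005555555559"))
```

The first call returns a finding tied to a customer in Germany and flagged for GDPR, while the second is marked as unknown. A finding that carries this context can be prioritized and routed far more usefully than a bare format match.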

By quickly making sense of unstructured data, this level of detail and understanding can then help organizations address key compliance and security requirements, which is an important differentiator from traditional methods.

Another challenge that has escalated over the past year involves organizations migrating to cloud services such as Office 365 and Google Drive to allow employees remote access to data. Previously, many were cautious about embracing cloud technology, with organizations reluctant to store sensitive information in the cloud.

However, because the pandemic forced the majority of the workforce to work from home, on-premises infrastructures quickly had to be superseded by, at a minimum, hybrid deployments to facilitate remote working.

Yet, if the data moved is not structured or classified, then a lack of visibility will still persist. Also, traditional data classification tools are not suited to meet the demands of modern cloud environments as they were not designed with this technology in mind. This in turn presents gaps, issues and limitations on overall functionality.


The Journey From Data Discovery to Security

Many enterprises still have no idea where their data is located, and they cannot remain idle when it comes to their privacy and security responsibilities. Accurately discovering and classifying all sensitive information that is created, captured, and processed across all environments is the first step toward compliance. Then, organizations can pinpoint where their biggest risks are and whether security is being applied appropriately to the data they store.

The ultimate goal is to limit the exposure of sensitive data. Once it has been mapped across the infrastructure, data-centric security must follow. Using technology like tokenization can help secure data throughout the entirety of its lifecycle. Following a modern discovery and classification approach can help organizations drive forward their overall security strategy, all while meeting data privacy commitments.
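As a rough picture of what tokenization means, the sketch below replaces a sensitive value with a random token and keeps the mapping in a protected store. It is a toy, vault-style illustration under stated assumptions: the `TokenVault` class and the `tok_` prefix are hypothetical, and commercial tokenization products may work quite differently (for example, by preserving the format of the original value or by avoiding a vault altogether).

```python
import secrets


class TokenVault:
    """Toy in-memory token vault (hypothetical; not a production design)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for value, reusing the token if one exists."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # retry on the unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Return the original value; only trusted callers should reach this."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")
print(token)                    # safe to store in reports and spreadsheets
print(vault.detokenize(token))  # original value, recoverable only via the vault
```

Because the token is random rather than derived from the card number, copies of it that spread through reports or cloud storage reveal nothing on their own. The sensitive value stays in one place that can be tightly controlled.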
