Sensitive Data Governance Still a Difficult Challenge

Most companies lack a comprehensive inventory of their data: most only have tabs on about 10 to 20 percent of their total data estate. Alex Gorelik explains how companies can automate the process of finding all sensitive data across heterogeneous systems.

According to Dimensional Research, only 14 percent of U.S.-based companies are ready to comply with the California Consumer Privacy Act (CCPA), which officially goes into effect this January. The same research found that 44 percent haven’t even begun the implementation process.

Will organizations get their act together in time? Not if GDPR taught us anything. In another recent survey, conducted by the International Association of Privacy Professionals (IAPP), a little more than half of respondents said they were still working to comply with GDPR.

Hoping to Slip Through the Compliance Cracks?

Granted, EU organizations have reportedly experienced 60,000 data breaches through January (DLA Piper), yet only 100 fines were issued. Clearly, regulators are overtaxed, but businesses should not expect to slip through the cracks forever.

This was underscored in January when France’s data regulator, the National Commission on Informatics and Liberty (CNIL), fined Google €50 million ($57 million) for failing to properly inform users how data collected from its properties, including Google.com, Google Maps and YouTube, is used to present personalized advertising.

While it may be the largest GDPR-related penalty we’ve seen so far, larger fines are sure to come.

GDPR- and CCPA-Level Regulation: Here to Stay and More on the Way

GDPR and the CCPA are just the beginning as public opinion against the indiscriminate use of personal data begins to take hold. More governments are sure to follow suit.

Plus, there are the myriad industry regulations many organizations must adhere to anyway. The credit card industry, for example, has PCI standards that define how sensitive cardholder data should be handled. The medical industry has HIPAA. Regulation will only become more complex as the sources and nature of data change, so organizations would do well to accept it now as a growing fact of life. The shift is on: customers everywhere will probably, eventually, have full control over their own data.

Governance: An Enormous and Complex Task

The problem, of course, is the sheer enormity and complexity of the task.

Governing all of an organization’s mountains of sensitive data, or even knowing what sensitive data exists and where it’s located within the enterprise, isn’t easy. Data classification is hard to accomplish, and given the volume of data that must be discovered, it often isn’t done reliably, especially when the task is left to business users.

A large health care provider stores 4.1 billion columns of data. A financial services company takes in more than 10 million data sets per day. As the data pours in, only a small percentage of it, the so-called Critical Data Elements (CDEs), is tagged, in a painfully slow and error-prone manual process that leaves most data miscategorized, lost or still waiting to be discovered, and impossible to track. Most companies have between 100 and 200 CDEs, yet their data typically contains thousands and sometimes even millions of data elements, depending on their business, data organization and representation. This presents a risk because the CCPA covers any data you know and possess about your customers, not just the elements you have tagged.

It’s a near-impossible undertaking for today’s enterprises, and it’s a major reason why so many big data initiatives have stalled and continue to lag in compliance.

Automation in Governance: Helpful but Still Largely Unused

The good news is that automation in this area can accelerate the discovery of massive amounts of data and the subsequent governance of that data. The problem is that most organizations still haven’t even completed that crucial first step of discovery.
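To make the idea of automated discovery concrete, here is a minimal rule-based sketch in Python. It is not the author’s method or any vendor’s product. It assumes each column can be sampled as a list of strings, and the names (`PATTERNS`, `classify_column`, `scan_dataset`) and the 80 percent match threshold are hypothetical choices for illustration. A column is tagged when most of its sampled values match a pattern, and candidate card numbers must also pass the Luhn checksum to cut down on false positives.

```python
import re

# Hypothetical detection rules: label -> regex applied to each sampled value.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{13,19}$"),
}


def luhn_valid(number: str) -> bool:
    """Return True if a digit string passes the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def matches(label: str, value: str) -> bool:
    """Check one value against one rule; card numbers must also pass Luhn."""
    cleaned = value.strip()
    if label == "credit_card":
        cleaned = cleaned.replace(" ", "").replace("-", "")
        return bool(PATTERNS[label].match(cleaned)) and luhn_valid(cleaned)
    return bool(PATTERNS[label].match(cleaned))


def classify_column(samples: list[str], threshold: float = 0.8) -> list[str]:
    """Tag a column with every label whose match rate over non-empty samples meets the threshold."""
    values = [v for v in samples if v and v.strip()]
    if not values:
        return []
    tags = []
    for label in PATTERNS:
        hits = sum(matches(label, v) for v in values)
        if hits / len(values) >= threshold:
            tags.append(label)
    return tags


def scan_dataset(columns: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {column_name: tags} for every column that looks sensitive."""
    result = {}
    for name, samples in columns.items():
        tags = classify_column(samples)
        if tags:
            result[name] = tags
    return result


if __name__ == "__main__":
    dataset = {
        "contact": ["ann@example.com", "bob@example.org"],
        "card": ["4111 1111 1111 1111", "5500-0000-0000-0004"],
        "notes": ["called twice", "prefers email"],
    }
    print(scan_dataset(dataset))
    # prints {'contact': ['email'], 'card': ['credit_card']}
```

Real classifiers would likely combine more signals, such as column names, reference dictionaries and machine-learned models, but even simple rules like these can be run across thousands of datasets without anyone tagging columns by hand.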

Speaking at the most recent Catalyst Conference, Gartner analyst Sanjeev Mohan seemed stunned to discover that most of his audience of data professionals didn’t even know such automation capabilities existed.

Some organizations are therefore still reacting to regulations like GDPR and the CCPA by quarantining and limiting access to large volumes of data. By treating all data as sensitive, including data that isn’t, they force business analysts to submit formal access requests to understaffed IT groups that can take weeks, if not months, to respond.

As a result, much of their data’s value in today’s real-time world is lost to this blanket approach.

Sensitive Data: Out of Sight, Out of Mind?

Even data that’s buried somewhere and virtually inaccessible is still subject to regulations like GDPR’s right-to-erasure rule. This rule requires organizations to delete personal data on a number of grounds, including when it’s no longer necessary “in relation to the purposes for which they were collected or otherwise processed” and whenever the individual explicitly requests it.

If data is compromised, companies are required to notify customers about the breach. Imagine a customer who asked to be forgotten and was told the request had been fulfilled. Now imagine having to explain to that customer that their data was compromised anyway, because the company didn’t know it was sitting in a particular data set.

But how can you jettison certain personal information (let alone prove it has been discarded) if you don’t even know where it is? Alternatively, how can a U.S.-based organization that wants to remove itself from GDPR’s purview by discarding EU-based PII find that data?

These are the questions many businesses are now asking themselves. Most companies lack a comprehensive inventory of their data across development, test, production, warehouse and backup systems, so they only have tabs on about 10 to 20 percent of their total data estate. This lack of knowledge about data lineage can also inhibit an organization’s ability to mask sensitive data (another GDPR requirement) and to properly track all processing activities (yet another requirement), including categories of recipients of personal data, transfers of personal data to a third country or an international organization, and those who process data on behalf of the organization.
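Masking, too, becomes mechanical once sensitive columns are known. The sketch below is illustrative only: it assumes a mapping from column names to sensitivity labels is already available, and the names (`mask_record`, `mask_card`, `pseudonymize`, `PSEUDONYM_KEY`) and the choice of techniques are hypothetical. It keeps only the last four digits of card numbers and replaces other tagged values with a keyed HMAC-SHA256 digest, so records can still be joined without exposing the raw value.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key-management service.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def mask_card(number: str) -> str:
    """Replace all but the last four digits of a card number with '*'."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a value with a keyed HMAC-SHA256 hex digest (normalized to lowercase first)."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def mask_record(record: dict[str, str], tags: dict[str, list[str]]) -> dict[str, str]:
    """Return a copy of record with every tagged column masked or pseudonymized."""
    masked = dict(record)
    for column, labels in tags.items():
        if column not in masked:
            continue
        if "credit_card" in labels:
            masked[column] = mask_card(masked[column])
        else:
            masked[column] = pseudonymize(masked[column])
    return masked


if __name__ == "__main__":
    row = {"name": "Ann", "email": "Ann@Example.com", "card": "4111 1111 1111 1111"}
    print(mask_record(row, {"email": ["email"], "card": ["credit_card"]})["card"])
    # prints ************1111
```

Anyone holding the key can recompute digests for candidate values and re-link them to individuals, so under GDPR pseudonymized output like this is generally still treated as personal data; it reduces exposure rather than removing the obligation.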

Other Governance Challenges Remain

Despite what I believe are, in most cases, the best intentions, it’s still very difficult for organizations to deliver on consumers’ wishes for complete control over their personal information. Implementing consistent governance policies across heterogeneous systems that use different technologies, and that are managed by different teams with competing priorities, is a mind-bending challenge for most enterprises. Even with certain technologies available to assist, other challenges remain.

NewVantage Partners’ latest annual Big Data and AI Executive Survey of blue-chip organizations, for example, found that only 31 percent of the organizations surveyed have succeeded in creating a data-driven organization. What did 95 percent of respondents pin the blame on? An inability to create a data-minded company culture.

And we all know how easy changing company culture is, right?