Machine Learning and the Myth of the Silver Bullet

In this article, Jim Barkdoll, CEO, TITUS, reveals the three-pronged approach organizations need to take to infuse their data protection strategy with machine learning technology.

Organizations have embraced the use of artificial intelligence and machine learning to evaluate and understand the massive amounts of data generated and consumed daily. What this technology can do from a data management and analysis perspective is truly outstanding.

But what about its use in security?

We’re starting to see people leverage artificial intelligence, and machine learning in particular, to bolster their security efforts. The challenge is that many apply outdated thinking to these deployments, which will hinder their success. In short, just as organizations have done with cloud computing, encryption, and access management, many incorrectly view machine learning as a silver bullet for their data security woes.

When used in conjunction with existing security investments including data loss prevention, encryption and cloud access security broker (CASB) solutions (to name a few), machine learning can automate many tasks, effectively complementing the strategic work of security professionals.

This strategy succeeds because the right machine learning solution can help security professionals accurately establish context around the ever-increasing amount of data created in their organization. That context is critical because it enables security professionals to protect sensitive data, and it is particularly important when it comes to unstructured data.

What Exactly is Unstructured Data?

Consider the Sony breach back in 2014. It wasn’t structured data that caused the biggest issues. Structured data includes social analytics, application information and other more codified data such as policy numbers, medical codes, credit card numbers and so on.

The data that was more damaging to Sony was actually unstructured — details contained in emails and personal files, communications between employees and artists badmouthing co-workers and celebrities, executive PowerPoint presentations meant only for an internal audience.

Every organization takes this type of information for granted. We create new documents every day and fire off countless emails, many of which do contain language and information that we wouldn’t want the general public to view.

Another example might be an email from an executive to an HR rep containing protected health information (PHI) about another employee’s heart condition. The email might contain a medical code, which could be picked up by typical machine learning algorithms.

However, the email might include only a description of the health concerns, with words and phrases such as “congestive heart failure,” “cardiac arrest” or “high blood pressure.” This information is unstructured and, therefore, much more difficult to identify and secure using regular expressions. Yet every organization would surely want to protect the privacy of the individuals involved if it knew how to track this information.
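The gap between a coded and a described health detail can be seen in a small sketch. The pattern below is only a rough approximation of an ICD-10-style code (an assumption for illustration, not a complete validator), and the function name and sample emails are hypothetical.

```python
import re

# Rough, illustrative pattern for ICD-10-style codes such as "I50.9".
# This is an assumption for the example, not a full validator.
ICD10_LIKE = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")

def contains_medical_code(text: str) -> bool:
    """Return True if the text contains something shaped like a medical code."""
    return ICD10_LIKE.search(text) is not None

# Hypothetical email snippets.
coded = "Per HR request: diagnosis code I50.9 confirmed for the employee."
described = "He was diagnosed with congestive heart failure and high blood pressure."

print(contains_medical_code(coded))      # True
print(contains_medical_code(described))  # False
```

Both snippets disclose the same kind of sensitive detail, but only the one carrying a code-shaped token is caught by the pattern.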

And if every business understood the true value of its unstructured data, it would likely want to apply machine learning to better protect it. However, this type of unstructured data cannot be identified by applying only basic machine learning algorithms. More targeted machine learning is needed to help organizations learn what unstructured information they have.

Unstructured Data Remains at Risk

When we hear conversations around information security and data protection, the focus tends to be on structured data (Hadoop, data lakes, etc.). But what about unstructured data? According to the Harvard Business Review, less than 1 percent of unstructured data is being used or analyzed properly. So why is this? Why does unstructured data continue to be an issue?

The problem is twofold: First, unstructured data is harder to identify, as it includes emails and files that are created in ever-increasing numbers on a daily basis. Second, the detection algorithms typically applied to this data rely on fairly simple regular expressions and fail to take into consideration the full context of the data they encounter.

For example, while these algorithms might be able to easily identify a credit card number in a Word document, a description of an upcoming doctor’s appointment in an email exchange might be missed.
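To make the contrast concrete, here is a minimal sketch of how a model trained on labeled examples might flag text that a card-number pattern passes over. Everything in it is an assumption for illustration: the card pattern has no checksum validation, the `TinyNaiveBayes` class is a from-scratch toy, and the eight training sentences are hypothetical. A real deployment would need far more data and a properly evaluated model.

```python
import math
import re
from collections import Counter

# Simple 16-digit card-like pattern (illustrative; no checksum validation).
CARD_LIKE = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class TinyNaiveBayes:
    """Toy multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical, hand-labeled training sentences.
training = [
    ("my cardiologist appointment is on tuesday about my heart condition", "sensitive"),
    ("the doctor changed my blood pressure medication", "sensitive"),
    ("she was diagnosed with congestive heart failure last month", "sensitive"),
    ("his surgery follow up with the doctor is next week", "sensitive"),
    ("the quarterly sales meeting is on tuesday", "routine"),
    ("please review the attached budget slides", "routine"),
    ("lunch is booked for the team next week", "routine"),
    ("the release notes are ready for review", "routine"),
]
model = TinyNaiveBayes().fit(training)

card_text = "Card on file: 4111 1111 1111 1111"
appointment = "I have a doctor's appointment on Thursday to check my blood pressure"
routine = "Please review the budget slides before the sales meeting"

print(CARD_LIKE.search(card_text) is not None)    # True
print(CARD_LIKE.search(appointment) is not None)  # False
print(model.predict(appointment))                 # sensitive
print(model.predict(routine))                     # routine
```

The pattern finds the card number and nothing else, while the trained model scores the appointment message by the words around it, which is the kind of context the pattern ignores.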

What’s Next? Leveraging a Three-Pronged Approach.

To be successful, organizations need to take a three-pronged approach to infuse their data protection strategy with machine learning technology. Here’s how to do that.

  • Discover. First, organizations need a mechanism for using machine learning that will help them find and harvest information. By defining information handling policies and training machine learning tools to apply them, businesses can broadly begin to understand the unstructured data in their environment.
  • Analyze. Using computer analysis and human review, organizations can further understand what types of information they have and how it’s being used, stored and shared. Through data analysis, organizations can create contexts around different data types and begin to determine their values. What impacts — good and bad — could the unstructured data have on the business?
  • Reapply. Using the knowledge gleaned through analysis and review, organizations can refine their information handling policies and reapply machine learning to further implement them.

Ideally, this process operates on a continuous loop, where data policies are reviewed and tweaked on a regular basis, within day-to-day workflows.
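The discover, analyze and reapply loop can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of any particular product: a set of phrases stands in for a trained model, the `Policy` class and the three function names are hypothetical, and the "human review" is a hard-coded dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Hypothetical handling policy: phrases that mark a document as sensitive."""
    sensitive_phrases: set = field(default_factory=set)

def discover(documents, policy):
    """Discover: label every document under the current policy."""
    return {
        doc_id: any(phrase in text.lower() for phrase in policy.sensitive_phrases)
        for doc_id, text in documents.items()
    }

def analyze(labels, reviewer_findings):
    """Analyze: compare machine labels with human review and collect what was missed."""
    missed_phrases = set()
    for doc_id, (is_sensitive, phrase) in reviewer_findings.items():
        if is_sensitive and not labels[doc_id]:
            missed_phrases.add(phrase)
    return missed_phrases

def reapply(documents, policy, missed_phrases):
    """Reapply: refine the policy with the review results and relabel everything."""
    refined = Policy(policy.sensitive_phrases | missed_phrases)
    return refined, discover(documents, refined)

# Hypothetical documents and an initial, deliberately narrow policy.
documents = {
    "email-1": "Diagnosis code I50.9 is on file for this employee.",
    "email-2": "He is being treated for congestive heart failure.",
    "email-3": "The team lunch is moved to Friday.",
}
policy = Policy({"diagnosis code"})

first_pass = discover(documents, policy)
# A human reviewer marks email-2 as sensitive and notes the phrase that gave it away.
review = {"email-2": (True, "congestive heart failure")}
missed = analyze(first_pass, review)
policy, second_pass = reapply(documents, policy, missed)

print(first_pass)   # {'email-1': True, 'email-2': False, 'email-3': False}
print(second_pass)  # {'email-1': True, 'email-2': True, 'email-3': False}
```

Running the three steps again with fresh review findings is what turns this into the continuous loop described above.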

Setting up this type of machine learning approach to data analysis and protection poses three challenges: implementing the right digital tools to enable it, having enough data scientists and other staff to do the ongoing analysis, and getting executives to approve enough budget to cover the program.

Abandoning Silver Bullet Thinking

Machine learning is not a silver bullet technology. It’s a new way of learning about information and providing context so organizations can extract value and continue to use it in strategic ways.

Educating people — including those executives who approve IT budgets — within an organization is key. Identifying someone at a fairly high level who can champion the effort and oversee the process of getting everyone on board within a company is an important first step.

In a perfect world, every person within an organization would provide the context around every bit of information they create. But the average organization creates so much data on a day-to-day basis that it’s unrealistic; it would have a huge impact on productivity.

Most people simply don’t have the capacity to add identifiers to every email, new document or folder they create. Many wouldn’t understand the full context of the data they generate anyway, and organizations would be challenged to get everyone to treat the process consistently from department to department.

Machine learning enables organizations to automate processes based on existing information and then create policies with a high degree of confidence while offering a frictionless experience for end users. By using a data-driven workflow that is designed to build valuable information from which knowledge can be derived, organizations can create a successful data protection strategy.