Quality Over Quantity: Understanding the Importance of Data Quality


It is tempting to believe that data quality is something new, brought about by the advent of regulations such as ePrivacy and GDPR. It is not. Data, its management, and its quality have been around since information was first created. David Kolinek, vice president of product quality at Ataccama, discusses the importance and applications of quality data.

With the rise of new government regulations like ePrivacy laws in the U.S. and the European Union’s General Data Protection Regulation (GDPR), it’s easy to think data management, particularly functions related to improving the quality of data, has only recently emerged as a core enterprise function. But this has always been a necessary element in the world of information, right back to when people were carving thoughts on stone tablets.

To understand how and why this is important in today’s digital economy, we need to first settle on a working definition of data quality (DQ). According to the Data Management Body of Knowledge, “Data Quality is the planning, implementation, and control of activities that apply quality management techniques to data to ensure it is fit for consumption and meets the needs of data consumers.”

We can extend this definition a little further by adding that DQ is also the process of turning data into operational knowledge, allowing data consumers to make more informed decisions based on individual pieces of data and patterns that would otherwise go unnoticed.

In this light, DQ is not a single step but a series of actions that pull together multiple resources and functions centered on the idea of making data usable in a clearly defined, purposeful way. 

DQ Dimensions To Focus On

DQ itself resides under the broader realm of data management. Its job is to provide a broad, holistic view of entire datasets and examine them from various perspectives to ascertain what kind of data they contain and whether it meets the level of quality needed. These dimensions cover a range of attributes, including:

  • Completeness: Does a dataset contain gaps, and if so, where are they located? Some may be trivial, others highly consequential. A billing department, for instance, may require phone numbers and emails. Records lacking these may not be reflected in the final outcome, which can lead to poor decisions. 
  • Timeliness: Crucial applications like CRM thrive on real-time data. Data should be refreshed regularly because the digital universe moves fast. Still, organizations must define what is and isn’t timely. Some financial data is useful once per quarter, while other sets must be acted on within minutes or opportunities are lost.
  • Validity: Data teams should have confidence that emails and postal addresses are correctly populated. Validity checks are crucial to ensure data conforms to formats, type requirements and value ranges. This is particularly vital with automation because processes are more data-dependent.
  • Uniqueness: Duplicate data is the bane of effective processes. A uniqueness score can help identify duplicate data in a given set, either in a single column or an entire record. Mistyped information can often place the same order ID on two records, leading to incorrect sales figures, inventory counts, and more.
  • Accuracy: Errors creep into data in various ways and can snowball throughout a process. While total accuracy isn’t usually feasible, ever-increasing accuracy and consistency should be the goal. 
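The dimensions above can be scored programmatically. Here is a minimal Python sketch over a toy customer table; the field names, records, and scoring functions are illustrative assumptions, not any particular DQ product's implementation:

```python
from collections import Counter

# A toy customer table; the values are illustrative assumptions.
records = [
    {"id": "A100", "email": "ana@example.com", "phone": "555-0101"},
    {"id": "A101", "email": "", "phone": "555-0102"},
    {"id": "A100", "email": "ana@example.com", "phone": "555-0101"},  # duplicated ID
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(1 for r in rows if r[field]) / len(rows)

def validity(rows, field, predicate):
    """Share of populated values that pass a format check."""
    values = [r[field] for r in rows if r[field]]
    return sum(1 for v in values if predicate(v)) / len(values)

def uniqueness(rows, field):
    """Share of rows whose value appears on no other row."""
    counts = Counter(r[field] for r in rows)
    return sum(1 for r in rows if counts[r[field]] == 1) / len(rows)

print(round(completeness(records, "email"), 2))                  # 0.67
print(round(validity(records, "email", lambda v: "@" in v), 2))  # 1.0
print(round(uniqueness(records, "id"), 2))                       # 0.33
```

Scores like these can be computed per column or per record and tracked over time, which is the basis for the measurement discussed later in the piece.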

Quality Matters

Some datasets produce greater commercial and financial value, while others reduce risk. By determining value, enterprises can prioritize some datasets over others, along with their data sources and domains, enhancing DQ and the projects it is applied to.

For instance, employee salary data is vital to HR but not marketing, whereas customer information is crucial to marketing but not HR – this needs to be accounted for. When it comes to risk, personally identifiable information (PII), if not adequately protected, can expose an enterprise not only to fines but also to costly damage to public trust in the brand. And in finance, details like email addresses and credit card information are the foundation of most online transactions, which means proper profiling and reporting has a direct and vital link to sales.

In this light, it’s easy to see how quality counts and why DQ is a mission-critical function that can make or break everything from long-term strategic planning to daily operations. 


Poor Quality, Greater Risk

As enterprises become more sophisticated in DQ processes, they’ll discover crucial facts about themselves and the markets they serve. Decisions to take a proactive or reactive approach to changes can be made more quickly and far more accurately. On the other hand, organizations that fail to implement proper data capture and management will be at a gross disadvantage. Without validation fields, for example, they must rely on free-form input from a website, which can introduce faulty and inaccurate data into a system.

This isn’t simply a cost of doing business: poor DQ introduces risk and can disrupt operations. A company sending communications via mail can lose a lot of money if names and addresses are wrong. Customer experience may be undermined and a brand damaged. Companies failing to meet security and compliance standards could face steep fines. Analytics may be flawed, and AI models rendered unreliable. It can lead to unnecessary warehousing, fraud and abuse, product designs that ruin manufacturing runs, and more.

A 2020 Gartner survey noted that poor DQ costs organizations $12.8 million per year on average. Other estimates place the cost of bad data anywhere between 10% and 30% of revenue, while correcting individual mistakes can run between $1 and $10 per record.

Key Benefits of DQ

It would be impossible to list all of the benefits of better DQ, but leading contributors to the bottom line include:

  • Higher ROI for marketing and customer outreach efforts, in part by providing smoother delivery and more reliable targeting for electronic media and print. 
  • Reduced time and effort to correct erroneous data, saving up to $10 per record. 
  • Improved personalization for goods and services, support functions and customer relations.
  • Faster, better decision-making.
  • Streamlined compliance and a transition to a more customer-oriented, data-driven business model.

No two enterprises are alike, of course, so in the end, DQ will deliver benefits that are unique to your organization. Two capabilities are must-haves for getting there:

  • Data profiling: Profiling tools examine data sources far faster than hand-written SQL queries. They also help determine the transformations needed before data can be included in processes, and surface issues to address going forward.
  • Cleansing and transformation: Data structure often needs to be changed to improve DQ. Technology should allow for format standardization, data parsing into distinct attributes, data enrichment via external sources, deduplication, and masking to obfuscate data for heightened security.
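As a sketch of those cleansing steps, the following Python applies format standardization, parsing, masking, and deduplication to a pair of hypothetical records; the rules and field names are illustrative assumptions, not any vendor's actual transformations:

```python
import re

# Two hypothetical raw records that refer to the same customer.
raw = [
    {"name": "  Jane DOE ", "card": "4111111111111111"},
    {"name": "Jane Doe",    "card": "4111111111111111"},
]

def standardize(rec):
    """Format standardization: collapse whitespace, normalize casing."""
    rec = dict(rec)
    rec["name"] = " ".join(rec["name"].split()).title()
    return rec

def parse_name(rec):
    """Parsing into distinct attributes: split a full name."""
    first, last = rec["name"].split(" ", 1)
    return {**rec, "first": first, "last": last}

def mask_card(rec):
    """Masking for security: hide all but the last four digits."""
    rec = dict(rec)
    rec["card"] = re.sub(r"\d(?=\d{4})", "*", rec["card"])
    return rec

cleaned = [mask_card(parse_name(standardize(r))) for r in raw]

# Deduplication: keep one record per (name, card) pair.
seen, deduped = set(), []
for r in cleaned:
    key = (r["name"], r["card"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

print(len(deduped))          # 1 record survives deduplication
print(deduped[0]["card"])    # ************1111
```

Note that standardization happens before deduplication: until the two name spellings are normalized, the duplicate would go undetected.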

Most of these functions can be automated, so organizations should institute processes to validate and treat data before entering a given system. These “data quality firewalls” usually exist as algorithms that perform functions like checking website form data for format compliance, then automatically alerting users to enter data correctly. Similar algorithms can be applied to other customer-facing apps and even back-office processes. 
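A data quality firewall of this kind can be sketched as a set of per-field format rules checked before data enters the system, with messages returned to the user for any field that fails. The field names, patterns, and messages below are illustrative assumptions:

```python
import re

# Per-field format rules: (compiled pattern, message shown to the user).
# These rules are illustrative assumptions, not a standard.
RULES = {
    "email": (re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
              "Enter a valid email address."),
    "zip":   (re.compile(r"^\d{5}$"),
              "Enter a 5-digit ZIP code."),
}

def validate_form(form):
    """Return a dict of field -> error message for each field that fails."""
    errors = {}
    for field, (pattern, message) in RULES.items():
        value = form.get(field, "")
        if not pattern.fullmatch(value):
            errors[field] = message
    return errors

print(validate_form({"email": "user@example.com", "zip": "0210"}))
# {'zip': 'Enter a 5-digit ZIP code.'}
```

The same rule table can be reused server-side for other customer-facing apps and back-office feeds, so every entry point enforces one consistent definition of valid data.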

Measure for Measure

Finally, as management guru Peter Drucker once said, “If you can’t measure it, you can’t improve it.” By tracking changes in data over time, organizations gain new perspectives on how their operations are performing and what needs to be done to move them ahead.

This task becomes markedly easier and far more effective using a DQ dashboard. This can enable you to see if your efforts deliver the desired results or if the data metrics need to be tweaked. In addition, continuous monitoring can reveal sudden inflows of inaccurate data and locate its source while reducing the time and effort devoted to regulatory compliance.
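Continuous monitoring of this sort can be sketched as a simple anomaly check: compare the latest daily DQ score against a rolling baseline and alert on a sudden drop. The window size and tolerance below are illustrative assumptions:

```python
def detect_anomaly(history, latest, window=7, tolerance=0.05):
    """Alert when the latest score falls more than `tolerance`
    below the mean of the last `window` observations."""
    baseline = history[-window:]
    mean = sum(baseline) / len(baseline)
    return latest < mean - tolerance

# Hypothetical daily validity scores for one data source.
daily_validity = [0.97, 0.96, 0.98, 0.97, 0.96, 0.97, 0.98]

print(detect_anomaly(daily_validity, 0.85))  # True: sudden inflow of bad data
print(detect_anomaly(daily_validity, 0.96))  # False: within the normal range
```

Running a check like this per source is what lets a dashboard not only show that quality dropped but also point at which feed the bad data came from.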

Do you think stricter compliance regulations are needed to ensure greater data quality, or are the benefits motivation enough? Share with us on LinkedIn, Twitter, or Facebook. We’d love to know what you think!