Ensuring That Big Data Is Fully Reliable Still Has a Long Way to Go


For large corporations, big data is increasingly seen as the panacea for all business ills, but what if the data is wrong? What if somewhere during the process of gathering, moving or storing the data it got corrupted? This could mean insurance companies making wrong decisions based on personal data such as shopping habits, labs making mistaken diagnoses or oil firms wrongly predicting future problems on oil platforms.

Not only is there barely any regulation or standardization around big data, but national institutions and regulators are also far behind the curve on what is happening on the technological front.

Some of the work on best practices has been done by the industry itself, most notably by chip maker Intel, whose processors account for roughly 90% of the global data-center market.

Beyond that, the United Nations’ International Telecommunication Union (ITU) took the first stab at standardizing big data back in 2015, introducing an international standard detailing the requirements, capabilities and uses of cloud-based big data.

More recently, the National Physical Laboratory (NPL), the UK’s national measurement standards institute, has been creating a system intended to provide a measurable level of confidence in data. The institute, which is also the UK’s largest applied physics organization, is attempting to apply the approach it takes in the measurement domain to the digital world.

The laboratory analyzes collection, connection, comprehension and confidence in data. It verifies the source of the data and assesses its credibility and accuracy. The institute then looks at how the data was transported and whether there was proper error correction in the event of interference.
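To make the transport and error-checking step concrete, here is a minimal sketch in Python of one common approach: the source publishes a checksum alongside the data, and the receiver recomputes it to confirm that nothing changed in transit. The record, names and values are illustrative assumptions, not part of NPL’s framework.

```python
import hashlib
import json

def checksum(payload: bytes) -> str:
    """Return the SHA-256 digest of a raw data payload."""
    return hashlib.sha256(payload).hexdigest()

def verify_transfer(payload: bytes, digest_from_source: str) -> bool:
    """Confirm the received payload matches the digest published by the source."""
    return checksum(payload) == digest_from_source

# Hypothetical sensor record serialized at the source before transport.
record = json.dumps({"sensor": "rig-17", "pressure_kPa": 1013.2}).encode("utf-8")
sent_digest = checksum(record)                 # computed and shipped alongside the data
print(verify_transfer(record, sent_digest))    # True only if nothing was corrupted in transit
```

A check like this only detects corruption; recovering from it requires redundancy such as retransmission or error-correcting codes.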

The comprehension element means ensuring the data is properly understood, says Neil Stansfield, head of the digital sector at NPL. “When we’re doing analytics, using data from lots of sources, how do we ensure that uncertainty propagation through those data sources is properly understood?” he says.
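One generic way to reason about that question, sketched below in Python, is to propagate uncertainty numerically: sample each input from a distribution reflecting its stated uncertainty and observe the spread of the derived result. The quantities, values and the Monte Carlo approach here are illustrative assumptions, not a description of NPL’s own methodology.

```python
import random
import statistics

# Hypothetical inputs from two independent sources, each reported as
# (mean, standard uncertainty).
flow_rate = (12.4, 0.3)    # litres per second
density   = (0.86, 0.02)   # kilograms per litre

def mass_flow(q, rho):
    """Derived quantity that combines the two sources: mass flow in kg/s."""
    return q * rho

# Monte Carlo propagation: draw each input from a normal distribution and
# observe how the combined uncertainty shows up in the derived result.
samples = [
    mass_flow(random.gauss(*flow_rate), random.gauss(*density))
    for _ in range(100_000)
]
print(f"mass flow ≈ {statistics.mean(samples):.2f} ± {statistics.stdev(samples):.2f} kg/s")
```

The same idea scales, with more effort, to analytics pipelines that blend many sources: each input carries an uncertainty, and the output inherits a combined one.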

NPL measures the certainty of data by modeling it, which becomes more difficult as the model becomes more complex. The institute is developing a methodology to quantify the uncertainty associated with a model. It is mainly targeting the engineering domain, but also wants to address other areas such as satellite imaging and life sciences. NPL is also exploring how metadata around data quality might be stored in a machine-readable form, to make this information more accessible.
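As a rough illustration of what such machine-readable quality metadata could look like, here is a minimal sketch assuming a simple JSON representation generated in Python; every field name and value below is invented for the example.

```python
import json

# Hypothetical data-quality record attached to a dataset so that downstream
# analytics tools can read provenance and uncertainty automatically.
quality_metadata = {
    "dataset_id": "platform-vibration-2024-q1",        # illustrative identifier
    "source": "accelerometer array, rig-17",
    "collected": "2024-03-31T23:59:00Z",
    "calibration": {"last_calibrated": "2024-01-15",
                    "traceable_to": "national measurement standard"},
    "uncertainty": {"type": "standard", "value": 0.05, "units": "m/s^2"},
    "completeness": 0.997,                              # fraction of expected samples present
    "transport_checksum": "sha256:placeholder"          # integrity check from the transfer step
}

print(json.dumps(quality_metadata, indent=2))
```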

A number of large global corporations are already working on ensuring that the data is managed properly in practical product applications.

Intel has been working on data management with the US Department of Energy’s National Energy Research Scientific Computing Center (NERSC) and five Intel Parallel Computing Centers (IPCCs) to create a big data center that will provide an infrastructure for large-scale data management.

In the summer of 2017, Toyota and Intel agreed to create the Automotive Edge Computing Consortium to work on standards, best practices and architectures for emerging mobile technologies within the car sector.

In the big data stack sector, Intel and Hadoop specialist Cloudera have been working on open-source enterprise data management by tuning the Hadoop data-crunching platform for Intel architectures.

Nevertheless, given how widespread the use of big data is becoming, and how heavily it now figures in business decisions across most industries, the question of its reliability will only grow in importance.