Big Data Supports Investigative Journalism Case


In October, 13 million documents containing information about private accounts stashed in tax havens were leaked to journalists. The so-called Paradise Papers detailed the work of registries in places like Bermuda and the Bahamas.

While from a journalistic point of view this was a juicy leak, it was also nearly impossible to handle properly. It would take months, even years, for one journalist to sift through millions of loan agreements, financial statements, emails, trust deeds and other paperwork going back five years. Unsurprisingly, in the days after the leak the gossip-friendly portions of the information were digested the fastest, generating a series of damning headlines about celebrities and big companies.

But a more thorough analysis, one where investigative journalism overlapped with financial fraud investigation, required big data software. The 1.4 terabytes of leaked data were turned over to the International Consortium of Investigative Journalists (ICIJ), which in turn brought on board Linkurious, a French software company specializing in visualization and analysis software for money laundering and cyber-security cases. The company had previously worked with government agencies and journalists on similar financial data investigations, including the Swiss Leaks in 2015.

Linkurious’ software was designed to identify and analyze complex connected data and to present the results in a form that is accessible and easy to handle for journalists without a technical background. Neo4j, a graph database with a widely used open-source edition, stored the connected data that the visual part of the analysis was built on.

The investigation started from raw, unstructured bits of data, most of which were not machine-readable. They had to be organized according to a predefined data model so that parts of the investigation could be automated. Part of the problem was that the data was kept in silos, which made it difficult to cross-reference records and highlight connections.

ICIJ reporters used information from the leak but also from public databases, which made it essential for the siloed data to be connected.
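To make the idea of connecting siloed data concrete, here is a minimal sketch in Python. It is not the ICIJ or Linkurious pipeline, and it does not use Neo4j; it only illustrates, under simplifying assumptions, how records from a leak and from a public registry can be merged into one graph and then searched for a chain of connections. All names, records and the functions `build_graph` and `shortest_path` are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical records from two separate "silos"; every name is invented.
leaked_records = [
    {"officer": "Jane Doe", "company": "Alpha Holdings Ltd"},
    {"officer": "John Roe", "company": "Alpha Holdings Ltd"},
    {"officer": "John Roe", "company": "Beta Trust"},
]
public_registry = [
    {"company": "Beta Trust", "jurisdiction": "Bermuda"},
]


def build_graph(leaked, registry):
    """Merge both silos into one undirected graph of (type, name) nodes."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for row in leaked:
        link(("person", row["officer"]), ("company", row["company"]))
    for row in registry:
        link(("company", row["company"]), ("jurisdiction", row["jurisdiction"]))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain linking two nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection found


graph = build_graph(leaked_records, public_registry)
path = shortest_path(graph, ("person", "Jane Doe"), ("jurisdiction", "Bermuda"))
for kind, name in path:
    print(f"{kind}: {name}")
```

In this toy data, the link between the person and the jurisdiction only appears once both sources share the same company node, which is the kind of cross-reference that is hard to spot while the records sit in separate silos.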

With the help of graph technology, investigators, reporters and analysts are now able to handle the complexity of data-driven investigations such as the Paradise Papers or last year’s 2.6-terabyte Panama Papers, the largest leak in history. The Panama Papers are a case of a successful large-scale, data-driven investigation that illustrates the recent shift in fraud investigation.