The Top 5 Big Data Trends for 2018


Use of big data continued to evolve on all fronts during the past year, including software, hardware, financing and regulation. Some of the big themes included the wider use of artificial intelligence, preparations for impending new European regulation, IPOs of big data companies, the use of data fabric and the increasing popularity of R language. All of these will continue to shape big data well into 2018.

Artificial intelligence makes a leap

This year saw a boom in artificial intelligence thanks to emerging deep-learning technologies and techniques that deliver better and faster results from machine learning. AI is becoming omnipresent, not only as a business tool in almost every industry but also in day-to-day activities such as tagging friends in photos or providing reminders when shopping online.

What we are seeing now is only the first generation of artificial intelligence, which, though a potent tool, is at times not as intelligent as onlookers believe. Although AI can digest phenomenal amounts of data in a very short time and learn by repeating certain processes, it is not yet capable of the “sideways” thinking that a human brain can do. Hardware is one of the main limitations on the growth of AI, and as faster machines with greater capacity become routinely available, AI development will keep pace.

According to data analytics company Teradata, some 80% of businesses are already investing in AI, but many are still trying to work out where best to apply it and are concerned about the feasibility of taking on an AI-based initiative. In the year ahead, the AI boom is more likely to speed up than slow down as businesses digest lessons from their first set of AI applications.

Preparations for European big data regulation

The introduction of the new European Union regulation on big data is just around the corner, causing headaches for many a chief data officer, and not just in Europe. All the top social media and technology companies devoted large amounts of time and resources in 2017 to getting ready for what is to come.

Under the General Data Protection Regulation (GDPR), which comes into effect in May 2018, businesses will have to adopt strict rules when dealing with customer data; the regulation will mandate good data practices and give customers the right to be “forgotten”. Violations of record-keeping, security and breach-notification requirements could cost companies fines of up to 2% of their global gross revenue. Though technically a European regulation, it will have a bigger impact on US companies than any domestic laws surrounding big data. In a comment to the Financial Times earlier this year, Facebook said that initial compliance will cost the company several million dollars.

The new rules will also make big data much harder to use if individual users can “opt out” of being part of it. GDPR guidelines apply to all data gathered, whether willingly provided by customers or collected by automated systems. This includes personal data stored and used in data lakes and big data analytics platforms. Each of these aspects will have to be managed, tracked and reported.

Big data fundraising graduates to IPOs

Big data companies moved beyond their years of raising venture capital to fundraising via public listing with IPOs by Alteryx, Cloudera and MongoDB. If public comments from chief executives in other big data firms are anything to go by, 2018 will see even more listing activity.

Self-service data analytics specialist Alteryx was the first to IPO this year, raising $126 million; since then its stock has risen over 70%. It was followed by Cloudera, a machine learning platform optimized for the cloud, which listed on the NYSE in April and raised $228 million. The shares initially rallied, but by the end of the year they had lost almost a third of their value, despite the company being identified by Deloitte as one of the fastest-growing companies in North America. Last was MongoDB, which raised $192 million when it listed on NASDAQ.

High on the list of companies expected to come to the market in 2018 is MapR Technologies. A Hadoop big data company previously valued at about $500 million, MapR says it can handle customers with petabytes of data smoothly. The company’s chief executive Matt Mills says he doesn’t have a timetable in mind yet, and will come to the market when the time is right.

Speculation that security startup Palantir Technologies may also IPO next year was stoked by a leak claiming that the company made a stock purchase offer to employees. Nothing has been confirmed by the company, which habitually plays its cards close to its chest, as it provides big data solutions to several branches of the US government on software dealing with antiterrorism. In a league of its own compared with other startups, the company is valued at $20 billion.

Other expected IPOs include Couchbase and Anaconda.

The emergence of data fabrics

Data fabric is a technology that “sews” data in all its disparate forms into one usable package. It is being increasingly embraced by businesses as it helps them deal with the sea of compiled data, ever-changing applications and processing needs.

Companies frequently use several different applications, which can double or triple the amount of data they store, all in different formats. Data fabrics translate it all into one format and avoid duplication. Data currently maintained in files, database tables, data streams, objects, images, sensor feeds and even container-based applications can all be accessed through a number of standard interfaces. A data fabric can combine data from established systems, regardless of size and future scalability requirements, and make it available to applications.

Data fabrics also go by the clunkier name of converged data platforms and are available in commercial form from companies like MapR Technologies, NetApp and Talend.

R is for language

Big data is analyzed using several different methods, but one of the most convenient is the programming language R.

For most data analysis projects, the goal is to create the highest quality analysis in the shortest time. R language, an open-source analytics package, covers that requirement.

Although it has been around for over a decade, the use of R has picked up in the last few years. It is consistently a preferred language for data analysis and is particularly popular for academic and research projects. It is free and embedded in a number of commercial products such as Microsoft Azure Machine Learning Studio and SQL Server 2016. R is also strong in supporting machine learning. For deep neural networks, however, developers still need higher-performance computing than R can provide.

Developers in both academic and corporate environments like R because they can easily write their own software and distribute it in the form of add-on packages. There are already thousands of R packages available, covering everything from finance and psychometrics to genetics. This ease of use will keep R popular for years to come.
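That convenience is easy to illustrate. The sketch below is a hypothetical example (not drawn from any company mentioned above) of a typical R analysis using only base R and its built-in mtcars dataset; the commented lines show how an add-on package such as ggplot2 would be fetched from CRAN:

```r
# Fit a linear model: fuel efficiency (mpg) as a function of car weight (wt).
model <- lm(mpg ~ wt, data = mtcars)

# Inspect the fit: coefficients, residuals, R-squared, p-values.
summary(model)

# Predict mpg for a hypothetical car weighing 3,000 lb (wt is in 1,000 lb units).
predict(model, newdata = data.frame(wt = 3.0))

# Add-on packages from CRAN extend the language in one line each, e.g.:
# install.packages("ggplot2")   # fetch a plotting package
# library(ggplot2)              # load it into the session
```

A handful of lines covers model fitting, diagnostics and prediction, which is precisely the short path from data to analysis that draws analysts to the language.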