Unstructured Data’s Moment Has Arrived: 3 Key Predictions for 2023


Across businesses in every industry, data is still vastly underutilized, even completely dark. But vector databases, until recently the territory of hyperscalers and computer scientists, are poised in 2023 to make unstructured data processing practical and help ensure that powerful AI applications are within every business’s reach, shares Charles Xie, founder and CEO of Zilliz.

Despite all of the advances in data processing, database management and data warehousing over the last four decades, a basic, even uncomfortable, reality has held true: Most businesses have been largely unable to tap into the bulk of their vast data reservoirs to build real value. This is true for structured data in relational databases and even more so for semi-structured and unstructured data. Experts estimate that between 55% and more than 80% of a business’ data is dark, that is, not used in any way to derive insights for decision-making or to accelerate business growth.

At a typical enterprise, including ones leveraging cloud services, a centralized IT unit cleans and transforms some select datasets, moves copies through data pipelines, and then specialists derive nuggets of business intelligence from those datasets. The use cases are often limited, process and system complexity is significant, and the data services are typically expensive, proprietary and inefficient. Gartner observes that “traditional platforms across the data, analytics and AI markets struggle to accommodate the growing number of new data and analytics use cases.”

But as we enter 2023, certain powerful, not-so-traditional AI-driven technologies—until recently only in the domain of hyperscalers and their data science and research divisions—are emerging for widespread use to give any business genuinely expanded access to its data, as that data comes to exist in a more distributed fashion. By 2025, Gartner continues, “70% of organizations will be compelled to shift their focus from big data to small and wide data to leverage available data more effectively, either by reducing the required volume or by extracting more value from unstructured, diverse data sources.” 

The darkness around semi-structured and unstructured data processing is indeed lifting, and the democratized ability to explore untapped volumes and execute advanced, predictive analytics has arrived. 

Here are three major developments associated with this shift that will help direct data strategies in 2023.


1. Vector Databases Will Power AI Applications at Scale

This will unleash concrete new value from unstructured data across enterprises and SMBs.

Purpose-built for unstructured data similarity search and analytics, vector databases are the muscle behind some of the most practical AI applications. They are the tool for storing, indexing and searching through embeddings (numeric vector representations of data points in unstructured data) with extreme speed, accuracy and scale. Vector databases enable AI applications running personalized e-commerce searches, targeted smart advertising, recommendation systems including user-generated content recommendation, video and image analysis, antivirus cybersecurity, improved chatbots with more natural language skills, banking anti-fraud detection systems, geospatial analysis, drug discovery, and protein structure prediction, among many other use cases.

As more businesses embrace the AI era and attempt to make full use of its benefits in production, the volume of unstructured data will spike even further—and it is vector databases that will help businesses quickly make sense of the deluge of machine learning model output.
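To make the idea of similarity search over embeddings concrete, here is a minimal sketch in plain Python. It is not how any particular vector database is implemented: production systems typically rely on approximate nearest-neighbor indexes rather than the brute-force scan shown here. The three-dimensional vectors, the item names and the helper functions `cosine_similarity` and `top_k` are all hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, keep the k best."""
    scored = [(item_id, cosine_similarity(query, vec))
              for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones come from a machine learning
# model and usually have hundreds of dimensions.
EMBEDDINGS = {
    "red running shoes": [0.9, 0.1, 0.0],
    "blue trail sneakers": [0.8, 0.3, 0.1],
    "stainless steel kettle": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.2, 0.0], EMBEDDINGS, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

With this toy data the two shoe items rank ahead of the kettle because their vectors point in nearly the same direction as the query. A scan like this costs time proportional to the number of stored vectors, which is why dedicated indexing matters at scale.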

2. An Enduring Synergy Between Structured and Unstructured Data 

Despite the exponential growth of unstructured data, structured data in relational systems remains prevalent and will retain its substantial value for the foreseeable future. Organizations will almost inevitably have to handle structured and unstructured data together to realize maximum business growth. They’re increasingly doing that by turning to systems that recognize the many forms data takes, its distributed nature, and the fact that data treated as a product to be developed, shared and consumed can yield increased revenue.

Incumbent solutions originally engineered to deal with structured data for traditional data analytics can, in fact, extend their processing capabilities to unstructured data through plug-ins. Consider “native vector search” in Elasticsearch 8.0 and “vector similarity search” in Redis 6.0, for example. But for AI applications that process unstructured data intensively and are computationally heavy, a purpose-built solution like a vector database shines. On this front, Google has launched its Vertex AI Matching Engine, and open source Milvus, backed by the Linux Foundation, has found growing popularity, with a fully managed vector database from its creators on the market. Purpose-built vector databases can be complemented with hybrid search functionality that supports familiar kinds of relational filtering based on tags, attributes and so on.

Businesses should decide on a case-by-case basis which one suits them better. Solutions and platforms that make that choice and deployment as easy and inexpensive as possible will be clear winners.
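The hybrid search mentioned above, where vector similarity is combined with relational-style filtering on tags or attributes, can be sketched roughly as follows. This is an illustrative filter-then-rank approach under stated assumptions, not the API or query planner of any specific product; the record layout, the `hybrid_search` function and the sample catalog are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(query, records, required_tags, k=3):
    """Keep records carrying all required tags, then rank by similarity."""
    # Structured step: an attribute filter, like a WHERE clause.
    candidates = [r for r in records if required_tags <= r["tags"]]
    # Unstructured step: rank the survivors by closeness to the query.
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query, r["vector"]),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]


# Hypothetical catalog: each record pairs structured tags with an embedding.
CATALOG = [
    {"id": "sku-1", "tags": {"shoes", "in_stock"}, "vector": [0.9, 0.1]},
    {"id": "sku-2", "tags": {"shoes"}, "vector": [1.0, 0.0]},
    {"id": "sku-3", "tags": {"kitchen", "in_stock"}, "vector": [0.1, 0.9]},
    {"id": "sku-4", "tags": {"shoes", "in_stock"}, "vector": [0.5, 0.5]},
]

hits = hybrid_search([1.0, 0.0], CATALOG, {"shoes", "in_stock"}, k=2)
print(hits)
```

Here `sku-2` is the closest vector to the query but is excluded because it lacks the `in_stock` tag, which is the point of combining the two kinds of criteria. Whether filtering happens before, during or after the vector search is a design choice that varies between systems.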

3. Heterogeneous Computing Will Supercharge Performance

Heterogeneous computing refers to the use of different kinds of microprocessors – the hardware that executes programs – like various kinds of CPUs and next-generation GPUs in computing tasks. By assigning different aspects of a task to different processors, applications can dramatically improve performance. 

CPUs prevail as the processors supporting existing technology solutions in the market because of their widely recognized cost efficiency. But as the proliferation of AI gives rise to more unstructured data and more diverse applications, performance requirements rise with it. In certain scenarios, high throughput is a must-have that can only be achieved by GPU-accelerated solutions. Think billion-scale image search and video analysis at Meta, for example.

The rise of AI-driven vector databases for practical business use, the increasing interconnection of structured and unstructured data systems, and the mixing of processors to yield extraordinary performance are all developments that will kick into high gear in 2023 and continue into the foreseeable future. Making unstructured data processing common will help ensure that powerful AI applications and the value they create are within every business’s reach.

