Why the Analytics Warehouse Is the Answer to Big Data Analytics


Luke Han, co-founder and CEO of Kyligence, explains the significance of the analytics warehouse and how it can help businesses make informed, data-driven decisions. Adopt these best practices to realize the full potential of the analytics warehouse.

The data warehouse is not what it used to be. The cloud and the data lake have turned it into something else. Is it a lakehouse? Some analysts suggest calling it an analytics warehouse. These terms point to the same thing: the data warehouse has intrinsic value to an analytics strategy. But, at the same time, the data warehouse as we have known it is just not enough. Or, more specifically, it is simply not practical, technically or financially, to put all your data in a data warehouse.

So, the argument goes, there must be some approach that fulfills the roles of both the data lake, a large repository of data that allows for lighter-weight analytics and minimal ETL, and the data warehouse, the home of highly structured data that is ready for both traditional and forward-looking analytics approaches.

For those who will remain focused on the cloud as their analytics infrastructure, figuring out the logistics, semantics, and economics of the analytics warehouse is critical. If we go with the thesis that the analytics warehouse needs to provide the strengths (and avoid the weaknesses) of both the data warehouse and the data lake, then it is helpful to review those strengths and weaknesses.

Source: Kyligence


The Challenge

The challenge with each of these platforms is a relatively long adoption process, which in turn leads to higher costs and inefficiencies. But perhaps the greatest challenge is, and will be, containing all enterprise information in a single, all-encompassing system. Every data team knows that datasets have grown too large to move. A single, central analytics warehouse is therefore likely not practicable for most large organizations.

So, are we doomed to endure permanent data silos? We have learned from long experience that lack of visibility into data silos makes it difficult and expensive to identify and keep track of the organization’s most valuable data, especially since what is most valuable changes from day to day, from location to location, and from business unit to business unit. 

So, if you want an analytics warehouse that encompasses the strengths and utility of both the data warehouse and the data lake, you need to consider the following:

Cloud Native Design 

The analytics warehouse (AW) is, by definition, a cloud construct, so it must take advantage of the core benefits of cloud infrastructure. That means it must be designed as a cloud-native set of services that can scale elastically, scale compute and storage independently, refresh its datasets without disruption or slowdowns, and be highly automated and guided by machine intelligence (ML/AI).

Intelligent Data

For most analysis tasks, your data needs some type of structure. But that structure does not need to be as rigid as that of a data warehouse, which requires relational tables. More and more data will remain as CSV files, documents, text, and so on. The AW must also handle both historical data (data in place) and streaming data (data in motion). It must be able to handle petabytes of data under active analysis.

Data must also become more intelligent. A data warehouse keeps a certain amount of metadata about the runtime environment. With the AW, intelligence about the data must be created, maintained, and evolved as the data and analysis evolve. While “intelligence about intelligence” sounds like it may get confusing, it is essential to getting the most out of your data assets, especially today’s dark data.


New Analytical Models

The analytics warehouse will need to support multiple analytical models: BI/OLAP, ad hoc exploration, predictive analytics, machine learning and AI, and new hybrid models that combine two or more of these components. The need to quickly organize and structure datasets for these new hybrid analytical models will require something akin to governed data marts: a fungible, changeable, intelligent data organization that can quickly meet the requirements of these hybrid models.

The more we apply data science and the more sophisticated we get with distributed computing, the more we are going to want to accelerate the organization and modeling of our datasets. Vendors who are looking to become a standard component of new-generation data pipelines will need to automate many of the processes that prepare an organization for its analytics workloads. Intelligent data plus highly automated, AI-augmented operations provide maximum efficiency and stronger business linkages, and will drive greater innovation and entrepreneurship across the enterprise.

Operational Intelligence

To evolve beyond the classic data warehousing paradigm for analytics, intelligence must pervade all aspects of the analytics warehouse. As I mentioned, intelligent data is a minimum requirement. Intelligent self-tuning and predictive maintenance are going to become increasingly common as more organizations gain familiarity with AIOps and as it converges with DataOps and DevOps. Evolving data models and indexes via AI-augmented tools and techniques will also become crucial for defining and maintaining peak performance.

In the world of cloud analytics, this operational intelligence is essential for establishing cost-optimized and performance-optimized operations. Machine intelligence is required to move toward perfect infrastructure efficiency and to get maximum horsepower from containerized workloads running on commodity clusters. That same intelligence, in the form of intelligent agents, will ensure that analyzing data residing in cloud object storage (S3, ADLS, etc.) or relational tables delivers maximum throughput at minimum cost.

The way data is queried must be optimal for the analytical model, and the optimization must be done by the platform. That means leveraging different engines to balance performance and cost, and combining the best open source engines, including Apache Kylin and ClickHouse, into a full analytics warehouse.
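To make the idea of platform-side optimization concrete, here is a minimal Python sketch of one possible routing rule: pick the cheapest engine whose estimated latency fits the query's budget, and fall back to the fastest engine when none fits. Everything in it is an assumption for illustration only. The engine names, the latency and cost figures, and the `choose_engine` function are hypothetical and do not describe how Apache Kylin, ClickHouse, or any particular product behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """A query engine with made-up estimates for one class of query."""
    name: str
    est_latency_s: float  # hypothetical estimated latency, in seconds
    est_cost_usd: float   # hypothetical estimated cost per query, in USD


def choose_engine(engines, max_latency_s):
    """Return the cheapest engine that meets the latency budget.

    If no engine meets the budget, return the fastest one instead.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    eligible = [e for e in engines if e.est_latency_s <= max_latency_s]
    if eligible:
        return min(eligible, key=lambda e: e.est_cost_usd)
    return min(engines, key=lambda e: e.est_latency_s)


# Illustrative figures only; a real platform would estimate these
# from query plans, data statistics, and observed workloads.
ENGINES = [
    Engine("precomputed_olap", est_latency_s=0.5, est_cost_usd=0.02),
    Engine("columnar_scan", est_latency_s=3.0, est_cost_usd=0.005),
    Engine("object_store_scan", est_latency_s=30.0, est_cost_usd=0.001),
]

if __name__ == "__main__":
    for budget in (1.0, 5.0, 60.0):
        print(budget, choose_engine(ENGINES, budget).name)
```

In this sketch, an interactive dashboard with a tight latency budget lands on the precomputed engine, while a batch report with a generous budget lands on the cheapest scan. A production platform would likely weigh many more signals, such as data freshness and concurrency.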

When Does the Analytics Warehouse Arrive?

The arrival of the analytics warehouse is a question of when, not if. Organizations today do not really have a choice between running a data warehouse or a data lake. Virtually everyone is dealing with both. While the analytics warehouse may simply feel like this year’s strategic headache, it is a challenge that organizations are going to have to come to terms with. No single vendor can supply all the answers or all the components. No single cloud wins.

Data warehouses will evolve and thrive. Data lakes will grow and intelligently mutate. They will come together and be recognized as the dominant analytics paradigm. Hail the analytics warehouse.
