How To Accelerate Enterprise AI With Data Architectures


Over the past two years, while the pandemic has left IT organizations with stretched budgets and talent shortfalls, the demand to collect, store, and transform data into new insights has only grown. As artificial intelligence (AI) transforms the way companies do business, conduct research and development, and introduce innovations, they are beginning to realize that a data-centric architectural approach is the only way to de-silo the crush of unstructured data and gain the governance, security, and application performance that enable the operationalization and long-term success of AI projects.

At the heart of this data-centric architecture are technologies that were previously considered exotic in traditional IT settings. These include accelerated compute (GPUs and IPUs), high-speed networking (InfiniBand), and storage built on true parallel scale-out file systems, which together allow applications to address datasets that far exceed traditional capabilities.

Can Companies Address Both Data and Infrastructure Security?

By centralizing AI data, IT organizations can address both data and infrastructure security, two aspects that are critical to the AI pipeline. Many early AI experimenters and adopters were isolated lines of business that ran small trials against limited data sets on small systems, including computers sitting under a data scientist's desk. This approach allowed organizations to move swiftly but introduced risk, especially in regulated industries like healthcare and financial services. AI data became a more frequent target of data theft and other attacks, adding even more risk exposure. As AI moves into production, centralizing data assets is essential to creating a well-managed governance model that addresses both internal and external access concerns and standardizes protection to minimize the risk of misuse.
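In practice, a centralized governance model reduces every access request to a single, auditable policy check. The sketch below shows a minimal role-based version of such a check in Python; the roles, dataset names, and policy table are hypothetical illustrations, not any particular product's API.

```python
# Minimal sketch of a centralized data-access policy using a simple
# role-based model. All names here are hypothetical examples.
from dataclasses import dataclass

# Hypothetical policy: which roles may read which centralized datasets.
ACCESS_POLICY = {
    "clinical_trials_raw": {"data_scientist", "compliance_auditor"},
    "transactions_2023": {"fraud_analyst", "compliance_auditor"},
}

@dataclass
class User:
    name: str
    role: str

def can_read(user: User, dataset: str) -> bool:
    """Return True if the user's role is authorized for the dataset."""
    return user.role in ACCESS_POLICY.get(dataset, set())

if __name__ == "__main__":
    alice = User("alice", "data_scientist")
    print(can_read(alice, "clinical_trials_raw"))  # True
    print(can_read(alice, "transactions_2023"))    # False: not authorized
```

Because every request flows through one policy table rather than ad hoc rules on machines under individual desks, access decisions can be standardized, logged, and audited in one place.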

Factors To Consider for Application Performance

Application performance, especially as AI utilization increases across companies, is a product of multiple factors. The first is the availability of sufficient, high-quality data. For organizations utilizing complex unstructured data to inform their models, datasets are approaching multi-petabyte sizes. The retention of data for future reference is also driving the need for scalable systems that preserve fast access to all the data. Traditional IT systems are focused on high performance for highly structured and typically smaller workloads. The emergence of GPUs as the most popular compute architecture for the data-intensive parts of the AI pipeline means that equally performant storage must complement these analytics systems.
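One practical way to check whether storage can keep GPUs fed is to benchmark sequential read throughput against the ingest rate the training pipeline requires. The Python sketch below assumes a hypothetical dataset path and a 1 GB/s per-GPU target; a real benchmark would also need to defeat the OS page cache and test parallel reads.

```python
# Minimal sketch of a storage throughput check, to gauge whether a
# filesystem can keep a GPU-based training pipeline fed. The file path
# and the 1 GB/s target are hypothetical assumptions for illustration.
import time

CHUNK_SIZE = 64 * 1024 * 1024   # read in 64 MiB chunks
TARGET_GBPS = 1.0               # assumed per-GPU ingest requirement

def measure_read_throughput(path: str) -> float:
    """Sequentially read a file and return throughput in GB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e9

if __name__ == "__main__":
    gbps = measure_read_throughput("/mnt/datasets/sample.bin")  # hypothetical mount
    print(f"Sequential read: {gbps:.2f} GB/s")
    if gbps < TARGET_GBPS:
        print("Storage may bottleneck the GPUs at this throughput.")
```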


This is the reason many organizations are looking to true parallel scale-out file systems as the backbone of the data-centric architecture. A recent IDC survey indicated that 58% of enterprise organizations have file storage workloads that require a parallel file system to meet performance requirements (primarily throughput to a single file). These requirements are frequently driven by the performance needed to feed GPU systems running AI or analytics applications. These GPU systems are optimized to accelerate the matrix multiplication operations at the core of AI training and inference. AI models with billions of parameters are becoming commonplace outside of the academic research setting, and for that reason, companies are investing in GPU-based clusters, networking, and storage systems that support these requirements.
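To see why GPUs dominate this part of the pipeline, consider the single operation they accelerate most: a large matrix multiplication. The sketch below, assuming PyTorch is installed, times one such multiplication on the GPU with a CPU fallback; the 4096x4096 sizes are arbitrary stand-ins for a model's activations and weights.

```python
# Minimal sketch of the matrix multiplications at the core of AI training
# and inference, assuming PyTorch is installed. Matrix sizes are arbitrary.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Two large random matrices, as stand-ins for activations and weights.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.perf_counter()
c = a @ b                       # the kernel GPUs are optimized to accelerate
if device == "cuda":
    torch.cuda.synchronize()    # wait for the asynchronous GPU kernel to finish
elapsed = time.perf_counter() - start

print(f"{device}: 4096x4096 matmul in {elapsed * 1e3:.1f} ms")
```

A model with billions of parameters performs enormous numbers of these operations per training step, which is why compute, network, and storage must all be sized together.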

Harness the Transformative Nature of AI

Organizations looking to harness the transformative nature of AI should consider from the beginning what their path to success in production looks like. The landscape is littered with projects that failed because of insufficient computing performance or storage scale, roadblocks encountered right when they mattered most: when these applications reached the point of operationalization. By choosing a data-centric architectural approach that examines the true demands of the processing, management, and storage of data, organizations can significantly reduce the risk of failed projects. Infrastructure vendors are continuously pursuing integrations that remove the complexity of identifying, procuring, and deploying these systems, and choosing a highly integrated solution allows companies to simplify the AI stack. Finally, by centralizing these functions, IT organizations can also ensure long-term security and governance over the data that is at the heart of the pursuit of greater productivity, efficiency, and innovation.

Did you find this article helpful? Tell us what you think on LinkedIn, Twitter, or Facebook. We'd be thrilled to hear from you.