With AI and DL, storage is a cornerstone of handling the deluge of data constantly generated in today’s hyperconnected world. It is the vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency, and maximum concurrency.
The target audience for the guide is enterprise thought leaders and decision makers who understand that enterprise information is being amassed like never before, and that a data platform is both an enabler of and an accelerator for business innovation.
Introduction
The stage is set for enterprise competitive success: how fast an organization can consume and analyze its valuable data assets to yield important business insights now determines its edge. Technologies such as artificial intelligence (AI) and deep learning (DL) facilitate this strategy, and the efficiency of these learning systems can define the extent of an organization’s competitive advantage.
Many companies are strongly embracing AI. A March 2018 IDC spending guide on worldwide investment in cognitive and AI systems indicates that spending will reach $19.1 billion in 2018, an increase of 54.2% over 2017, and will continue to grow to $52.2 billion by 2021. By all indications, this is an industry on an upward trajectory, but limiting factors such as data storage and networking bottlenecks must be addressed to ensure the maximum benefit from AI and DL applications.
Enterprise machine learning algorithms have historically been implemented on traditional compute architectures, where compute and storage resources are paired over the same network interconnects that serve other business applications, constraining system throughput and data-access latency. With AI and DL, the increasing volume and velocity of arriving data are stressing these legacy architectures. While compute has made great strides with GPUs, the legacy file storage solutions commonly found in enterprise data centers haven’t kept pace.
Data is the New Source Code
Data’s role in the future of business cannot be overstated. DL is about growing autonomous capability by learning from very large amounts of data; in many ways, data is the new source code. An AI data platform must enable and streamline the entire workflow. AI and DL workflows are non-linear: rather than a process that starts, ends, and then moves on to the next iteration, the operations in the workflow happen concurrently and continuously (as depicted in the wheel graphic below). Data is ingested, then indexed and curated, and then used for training, validation, and inference, with all of these steps running at once; data continues to be collected while training occurs and models move to production. The goal is to iterate, completing each step as fast as possible through the acceleration a parallel storage architecture affords, getting the wheel turning and allowing customers to grow their infrastructure seamlessly as data sets grow and workflows evolve. The wheel gets bigger and turns faster as workflows mature.
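The concurrent, continuous nature of this workflow can be sketched in a few lines of code. The following is a minimal illustration only, not the guide’s architecture: three stages (ingest, curate, train) run as concurrent threads connected by queues, so curation and training proceed while new data is still arriving. The stage names, the toy records, and the per-label counting (a stand-in for incremental model updates) are all assumptions made for illustration.

```python
import queue
import threading

def ingest(raw_out, n_items):
    # New raw records keep arriving even while later stages run.
    for i in range(n_items):
        raw_out.put({"id": i, "value": i * 2})
    raw_out.put(None)  # sentinel: no more raw data

def curate(raw_in, curated_out):
    # Index/label each record as it arrives and pass it downstream.
    while (item := raw_in.get()) is not None:
        item["label"] = item["value"] % 3
        curated_out.put(item)
    curated_out.put(None)

def train(curated_in, results):
    # Consume curated records continuously; counting per label here
    # stands in for incremental training/validation work.
    counts = {}
    while (item := curated_in.get()) is not None:
        counts[item["label"]] = counts.get(item["label"], 0) + 1
    results["label_counts"] = counts

def run_pipeline(n_items=100):
    raw_q, curated_q, results = queue.Queue(), queue.Queue(), {}
    stages = [
        threading.Thread(target=ingest, args=(raw_q, n_items)),
        threading.Thread(target=curate, args=(raw_q, curated_q)),
        threading.Thread(target=train, args=(curated_q, results)),
    ]
    for t in stages:  # all stages run concurrently
        t.start()
    for t in stages:
        t.join()
    return results

print(run_pipeline(9))
```

In a real deployment the queues would be replaced by the shared data platform itself, which is why its throughput and concurrency characteristics dominate how fast the wheel can turn.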
Over the next few weeks we will explore these topics surrounding data platforms for AI & deep learning:
- Introduction, Data is the New Source Code
- Unique Storage Demands for AI and DL Workloads
- Characteristics of Storage Solutions Optimized for AI & DL
- Accelerated, Any-scale AI Solutions
- Data Storage for AI/DL Case Studies, Summary
If you prefer, the complete insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning is available for download from the insideBIGDATA White Paper Library, courtesy of DDN.