insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning – Part 2


With AI and DL, storage is a cornerstone of handling the deluge of data constantly generated in today’s hyperconnected world. It is a vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current implementations for AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency and maximum concurrency.

The target audience for the guide is enterprise thought leaders and decision makers who understand that enterprise information is being amassed like never before and that a data platform is both an enabler and accelerator for business innovation.

Unique Storage Demands for AI and DL Workloads

Artificial neural networks (ANNs), the primary components of AI and DL, consume extraordinary amounts of data, with practically limitless combinations of hyperparameter adjustments and data set samples. These applications pose exceptional challenges and put significant strain on compute, storage and network resources. Legacy file storage technologies and protocols like NFS can starve AI workloads of data, slowing down applications and delaying important insights. A true AI data platform must concurrently and efficiently service the entire spectrum of activities involved in DL workflows, including data ingest, data curation, training, inference, validation and simulation.

Training sits at the core of AI and DL, and it is both large in scale and complex. Training is essential to reach the desired accuracy for these algorithms, and it requires immense I/O, data storage and computational resources. Parallelizing the training process accelerates model refinement and shortens the transition to production.

Reliable and rapid inference depends on an iterative training process that validates accuracy: models with hyperparameter variations are run through multiple epochs (complete passes over the data sets).
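To make the I/O implication concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the worst case, in which every hyperparameter variant reads the full data set from storage on every epoch with no caching. The function names and the figures (a 1 TB data set, 50 epochs, 8 variants, a 24-hour window) are hypothetical and chosen only for illustration; they do not come from the guide.

```python
def total_bytes_read(dataset_bytes: int, epochs: int, variants: int) -> int:
    """Bytes pulled from storage if each variant makes `epochs` full passes.

    Assumption: no caching, so every pass re-reads the whole data set.
    """
    if min(dataset_bytes, epochs, variants) < 0:
        raise ValueError("all arguments must be non-negative")
    return dataset_bytes * epochs * variants


def required_throughput(total_bytes: int, deadline_seconds: float) -> float:
    """Average bytes per second needed to finish all reads by the deadline."""
    if deadline_seconds <= 0:
        raise ValueError("deadline_seconds must be positive")
    return total_bytes / deadline_seconds


if __name__ == "__main__":
    TB = 10**12
    GB = 10**9

    # Hypothetical workload: 1 TB data set, 50 epochs, 8 hyperparameter variants.
    total = total_bytes_read(dataset_bytes=1 * TB, epochs=50, variants=8)
    print(f"total read volume: {total / TB:.0f} TB")  # 400 TB

    # Sustained rate needed to complete the sweep within 24 hours.
    rate = required_throughput(total, deadline_seconds=24 * 3600)
    print(f"sustained throughput: {rate / GB:.2f} GB/s")  # about 4.63 GB/s
```

Under these assumptions a modest 1 TB data set turns into 400 TB of reads, which is why sustained throughput rather than raw capacity tends to dominate the storage requirement.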

AI and DL workflows happen concurrently and continuously, and they benefit from distributed computing. A shared storage architecture provides simultaneous access to data from multiple systems, enabling multiple operations to happen at the same time. An AI data platform must support the collection of, and access to, large amounts of heterogeneous data from a wide variety of sources. To be useful for the DL application, the ingested data sets must be indexed and curated. From a user perspective, it’s important to enable easy data discovery, which means making the information available everywhere, anytime, and through an interface easily accessible by data scientists and the applications.
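The idea of several consumers reading one shared data set at the same time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a temporary local directory stands in for a shared storage mount, threads stand in for separate compute systems, and the helper names (`make_sample_files`, `read_shard`, `parallel_read`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def make_sample_files(root, count, size):
    """Create `count` files of `size` bytes each and return their paths."""
    paths = []
    for i in range(count):
        path = Path(root) / f"sample_{i:04d}.bin"
        path.write_bytes(bytes(size))  # zero-filled placeholder data
        paths.append(path)
    return paths


def read_shard(paths):
    """Read every file in one shard and return the number of bytes read."""
    return sum(len(Path(p).read_bytes()) for p in paths)


def parallel_read(paths, workers):
    """Split `paths` into `workers` disjoint shards and read them concurrently."""
    shards = [paths[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_shard, shards))


if __name__ == "__main__":
    # The temporary directory is a stand-in for a shared storage mount.
    with tempfile.TemporaryDirectory() as shared_dir:
        files = make_sample_files(shared_dir, count=16, size=1024)
        total = parallel_read(files, workers=4)
        print(f"read {total} bytes from {len(files)} files using 4 workers")
```

Each worker receives a disjoint slice of the file list, so no file is read twice and the readers do not need to coordinate with one another while the pass is running.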

This is the second in a series of articles appearing over the next few weeks where we will explore these topics surrounding data platforms for AI & deep learning:

  • Introduction, Data is the New Source Code
  • Unique Storage Demands for AI and DL Workloads
  • Characteristics of Storage Solutions Optimized for AI & DL
  • Accelerated, Any-scale AI Solutions
  • Data Storage for AI/DL Case Studies, Summary

If you prefer, the complete insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning is available for download from the insideBIGDATA White Paper Library, courtesy of DDN.



