insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning – Part 5


With AI and DL, storage is the cornerstone for handling the deluge of data constantly generated in today’s hyperconnected world. It is a vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current implementations for AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency and maximum concurrency.

The target audience for the guide is enterprise thought leaders and decision makers who understand that enterprise information is being amassed like never before and that a data platform is both an enabler and accelerator for business innovation.

Data Storage for AI/DL Case Studies

In this section we’ll consider some compelling use case examples of how DDN storage systems have enabled customers to maximize the value of their data and to accelerate time to insight easily and reliably using AI and DL. DDN enables thousands of customers all around the world, in a wide range of industries, to accelerate their businesses using AI and DL. DDN A³I solutions are fully optimized to deliver massive performance acceleration to these enterprise applications.


Autonomous vehicles engage some of the toughest workloads in AI at unprecedented scale. They  require the handling, ingest and delivery of a broad range of data set types and sizes, generated from  many different sources such as video cameras, lidar, radar and other sensors. Very large data sets  captured over millions of miles undergo many cycles of processing, labeling, sub-sampling and  categorization, before being presented to the DL applications.

Self-driving vehicles require as many testing scenarios as possible to improve vehicle perception accuracy and operational autonomy. This requires a reliable data storage framework that scales to TB/sec of throughput and hundreds of PB of capacity.

For this customer, a massive data set for training neural networks was developed, data from  experimental vehicles and ridesharing engagements was collected, and an extensive and complex DL  framework was trained, tested and refined for the autonomous driving capability. The resulting  software was loaded onto experimental vehicles for evaluation in the field, and operational data from  the ride fed back into the loop to further enhance the DL process.

The customer’s requirement called for the creation of a very large-scale parallelized data storage system to feed an extremely large-scale GPU-based computing platform. The storage solution had to ingest, keep and deliver massive amounts of data rapidly and reliably, scaling linearly to extreme levels in performance and capacity. With original increments set at nearly one hundred petabytes of capacity, the highest data center density and efficiency, along with low management and support overhead, were additional must-haves.

The DDN storage platform effortlessly handles the concurrent ingest of these massive data streams,  organizing and structuring the underlying data sets.

Millions of GPU cores continuously access the DDN storage system, executing extensive and complex  training processes, continuously refining the self-driving capabilities of the fleet of vehicles. DDN  storage has enabled this customer to harness data at immense scale, successfully and reliably building  an advanced AI framework that is revolutionizing the transportation industry.


By using AI techniques such as machine learning and artificial neural networks, researchers are building systems to improve the detection, diagnosis, treatment and management of diseases. In  addition, clinicians, researchers and industry players are working to co-develop and validate  algorithms that can recognize patterns of disease and advance diagnostic capabilities. Corps of data  scientists, developers, and fellows train and test models with the potential for commercialization.  There is a focus on the pipeline of translation—from model conceptualization to clinical validation. AI  platforms enabled by DDN storage greatly enhance the ability of researchers to identify and cure  diseases.

DDN AI and DL in life sciences use case

A research facility selected DDN to implement a solution capable of covering all ingest, processing and  management of the data sets, training and inference from the DL applications, and real-time visualization.

The storage system was required to hold a large repository of data sets for neural network training, with rapid shared access for multiple GPUs that execute intense training, testing and inference. The deployed DDN all-flash system reliably handles complex data ingest while simultaneously supporting post-processing, inference, visualization, training and validation operations.


Another compelling use case involves a leader in next-generation retailing technology that developed groundbreaking software enabling consumers to shop without having to go through the cumbersome checkout process. A series of high-definition cameras within each store are coupled with advanced computer vision and DL to identify shoppers and keep track of which items they collect in real time. Shoppers are billed automatically for the items as they leave the store. Live feeds are ingested from each store’s video cameras during opening hours, while intensive training takes place in the limited window after closing time, leveraging the day’s collected data sets.

The customer selected DDN to meet their requirement for an all-flash component, driven by the limited training time window and the need to keep the GPUs used by the DL application saturated. DDN delivered a solution that ingests live feeds from cameras in real time and provides built-in scalability to handle the collection of additional daily data sets over time. The DDN solution combines an all-flash layer, with integrated controls for automatic staging of the day’s freshly acquired data set, with a hard disk layer for economical longer-term storage. GPUs get the fastest and most efficient access possible to the daily capture data and achieve the highest productivity.
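The link between a limited training window and the need for a fast flash layer can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name and every figure in the usage example (data set size, number of passes, window length) are hypothetical assumptions, not values from the customer deployment.

```python
def required_read_gb_per_s(dataset_tb: float, passes: int, window_hours: float) -> float:
    """Sustained read rate (GB/s) needed to stream a data set `passes` times
    within a training window of `window_hours` hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    total_gb = dataset_tb * 1000 * passes   # decimal units: 1 TB = 1000 GB
    window_seconds = window_hours * 3600
    return total_gb / window_seconds


# Hypothetical figures: a 36 TB daily data set, read 10 times
# during a 10-hour overnight window.
rate = required_read_gb_per_s(dataset_tb=36, passes=10, window_hours=10)
print(f"{rate:.1f} GB/s sustained read throughput")
```

If storage cannot sustain the computed rate, the GPUs sit idle for part of the window, which is why the shorter the window, the stronger the case for the faster tier.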


With the help of storage solutions fully optimized for AI and DL training and inference, data scientists, data engineers as well as academic researchers are able to focus their complete attention on what really  matters most—transforming valuable data assets into important insights with unparalleled velocity and accuracy.

In this technology guide, we’ve reviewed the unique storage demands of AI and DL workloads, along with the characteristics of storage solutions optimized for AI and DL. We also described products available from DDN and how they meet the requirements of storage solutions well adapted to workflows involving AI and DL. Here are some important takeaways to consider as you take the next steps in choosing your storage solution:

  1. Performance is a critical aspect of data storage for AI and DL workloads. Parallel data access is the key for keeping pace with the demands of these popular technologies.
  2. Flexibility in the AI workflow is also vital for dealing with multiple data types and engaging multiple workflows.
  3. Scalability lets you think ahead. Your needs today may be of limited scale. You may have a small data set in 2018, but there is a high likelihood that you’ll be on a path to collecting more data because you have new sensors, new connectivity such as the upcoming 5G, and higher-resolution data sets. The technologies that enable AI, such as GPUs, have a very fast refresh cycle: every 8 months your GPUs are quadrupling in capability. Suddenly you’re able to collect and process more information. In terms of scaling, enterprise applications are built on software, and that software iterates in real time as data scientists come up with new algorithms for consumption. The benefit comes from maximum performance. This is the difference between breakthrough innovation and incremental upgrade.
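To see what the refresh-cycle figure in the third takeaway implies for planning, the compounding can be worked out directly. This is a minimal sketch that takes the guide's "quadrupling every 8 months" figure at face value; the function name and its default parameters are illustrative assumptions, and real hardware gains will vary.

```python
def capability_multiplier(months: float,
                          period_months: float = 8.0,
                          factor: float = 4.0) -> float:
    """Relative GPU capability after `months`, assuming capability
    multiplies by `factor` every `period_months` months."""
    return factor ** (months / period_months)


# Compounded growth over one, two and three refresh cycles.
for months in (8, 16, 24):
    print(f"{months} months: {capability_multiplier(months):.0f}x")
```

Under that assumption, compute capability grows 64-fold over two years, so a storage system sized only for today's GPUs would have to deliver data far faster within a couple of refresh cycles to keep them busy.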

Time is of the essence in making strategic decisions about storage solutions for managing the accelerating demands of AI and DL applications. Your competitors are making the same decisions to gain strategic advantage in the marketplace. To take the next steps in learning how you can facilitate breakthrough innovation by leveraging the power of new turnkey AI solutions for the data center, visit DDN. By simultaneously expediting deployment and accelerating time to insight, DDN’s groundbreaking approach enables you to manage the entire AI lifecycle in place and simplify your data center. DDN can show how their storage solutions have the following advantages:

  • Easy-to-deploy AI solutions that immediately transform your AI concepts into business innovation
  • Long-term advantages that enable you to achieve high-performance AI at every stage of your growth
  • The greatest technical and economic benefits, realized by leveraging deep AI expertise

This is the fifth and final in a series of articles appearing over the last few weeks in which we explored topics surrounding data platforms for AI & deep learning.

If you prefer, the complete insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning is available for download from the insideBIGDATA White Paper Library, courtesy of DDN.

