insideBIGDATA Guide to Optimized Storage for AI and Deep Learning Workloads

Artificial Intelligence (AI) and Deep Learning (DL) are among the most demanding workloads in modern computing, presenting unique challenges to compute, storage and network resources. In this technology guide, insideBIGDATA Guide to Optimized Storage for AI and Deep Learning Workloads, we’ll see how traditional file storage technologies and protocols like NFS starve AI workloads of data, reducing application performance and impeding business innovation. A state-of-the-art AI-enabled data center should concurrently and efficiently service the entire spectrum of activities involved in DL workflows, including data ingest, data transformation, training, inference, and model evaluation.

The intended audience for this technology guide includes enterprise thought leaders (CIOs, director-level IT, etc.), along with data scientists and data engineers who are seeking guidance on infrastructure for AI and DL, particularly specialized hardware. The emphasis of the guide is on “real world” applications, workloads, and present-day challenges.

Frameworks for AI and DL Workflows

DDN storage solutions incorporate three core technologies that together give AI applications and workflows easy-to-manage, highly efficient access to data. These technologies are: (i) a high-performance parallel filesystem storage appliance, (ii) a highly scalable, very low latency, RDMA-capable networking fabric, and (iii) comprehensive management, monitoring and reliability-enhancing software.

With these technologies in place, the summary below highlights the acceleration benefits along with important benchmarks delivered by the DDN shared parallel architecture for widely-used AI and DL frameworks and convolutional neural networks:

TensorFlow

A commonly used open source framework for DL applications. Used for image, voice and sound recognition, TensorFlow applications depend on a large and diverse data set with rich media content. DDN systems provide the capacity needed to store and deliver massive heterogeneous data sets. DDN storage systems offer enhanced training throughput of 2x more images and 2x faster training times with TensorFlow for Inception v3.

Horovod

An open-source distributed DL framework for TensorFlow. The shared architecture of DDN systems provides a significant performance boost to saturate multiple GPUs engaged through distributed computing. This furthers the benefits of the Horovod DL framework for TensorFlow applications.
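To see why storage throughput matters for keeping multiple GPUs saturated, consider a minimal sketch of a data-bound training pipeline. The function name and all numbers below are illustrative assumptions, not DDN or Horovod measurements. The model simply assumes that training proceeds at the slower of the rate at which storage delivers images and the rate at which the GPUs can consume them.

```python
def effective_throughput(storage_images_per_sec, gpu_images_per_sec, num_gpus):
    """Return (images/sec actually trained, GPU utilization fraction).

    Simplified model (an assumption for illustration): the pipeline runs at
    the slower of what storage can deliver and what the GPUs can consume.
    """
    if num_gpus <= 0 or gpu_images_per_sec <= 0:
        raise ValueError("num_gpus and gpu_images_per_sec must be positive")
    gpu_demand = gpu_images_per_sec * num_gpus
    achieved = min(storage_images_per_sec, gpu_demand)
    return achieved, achieved / gpu_demand


# Hypothetical numbers: 8 GPUs that can each consume 1,000 images/sec.
for storage_rate in (4_000, 8_000, 16_000):
    achieved, utilization = effective_throughput(storage_rate, 1_000, 8)
    print(f"storage={storage_rate:>6} img/s -> trained={achieved:>6} img/s, "
          f"GPU utilization={utilization:.0%}")
```

In this simplified model, storage that delivers only half of the aggregate GPU demand leaves the GPUs half idle, while bandwidth beyond the demand brings no further gain. Real pipelines also depend on caching, prefetching and preprocessing cost, which the sketch ignores.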

TensorRT

A high-performance DL inference optimizer from NVIDIA. DDN storage platforms enable TensorRT to deliver maximum improvements to neural networks using distributed computing at large scale.

Torch and PyTorch

Torch is an open-source scientific computing framework which provides a wide range of algorithms for DL that are optimized for parallel execution on GPUs. PyTorch is a Python package based on Torch designed for rapid neural network development through an intuitive interface. DDN systems enhance and accelerate Torch and PyTorch frameworks. The DDN A³I solution’s shared filesystem architecture provides accelerated distributed computing on multiple systems, with no data management overhead. Concurrent access to multiple data sets from all computing systems enables workflow flexibility, allowing complete freedom for data scientists to design and engage neural network training activities. DDN storage systems offer 3x faster training time for ResNet-152, VGG16, AlexNet, and ResNet-50 with PyTorch.
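The idea of several workers reading the same data set concurrently from one shared location can be sketched with the Python standard library alone. This is a hedged illustration, not DDN's or PyTorch's API: the function names are hypothetical, and a temporary local directory stands in for the path where a shared filesystem would be mounted on every node.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_sample(path):
    """Read one sample file and return its size in bytes."""
    return len(Path(path).read_bytes())


def read_dataset_concurrently(dataset_dir, max_workers=4):
    """Read every file in dataset_dir using a pool of worker threads."""
    paths = sorted(p for p in Path(dataset_dir).iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sizes = list(pool.map(read_sample, paths))
    return dict(zip((p.name for p in paths), sizes))


# Stand-in for a shared mount point (an assumption for illustration):
# a temporary local directory holding a few dummy "sample" files.
with tempfile.TemporaryDirectory() as shared_dir:
    for i in range(8):
        (Path(shared_dir) / f"sample_{i}.bin").write_bytes(b"\0" * 1024 * (i + 1))
    sizes = read_dataset_concurrently(shared_dir)
    print(f"read {len(sizes)} files, {sum(sizes.values())} bytes total")
```

Because every worker opens files by path from the same directory, no per-worker copy of the data set is needed. On a cluster, the same pattern would apply with the workers spread across machines that all mount the shared filesystem.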

Caffe and Caffe2

Caffe (Convolutional Architecture for Fast Feature Embedding) is an open-source DL framework that’s optimized for image classification and image segmentation. DDN storage systems offer training throughput with 2.4x more images and 2x faster training time for Caffe GoogLeNet. Caffe2 is a flexible DL framework that extends the capabilities of the original Caffe and addresses its architectural bottlenecks. DDN storage systems offer 3x faster training time for AlexNet, and 2x faster training time for Inception v3 with Caffe2.

CNTK

The Microsoft Cognitive Toolkit is a DL framework highly optimized for speed, scale and accuracy. DDN storage systems offer enhanced training throughput of 3x more images and 2.5x faster training times with CNTK for ResNet-50.

MXNet

An open-source DL framework for training and deploying state-of-the-art models, including deep neural networks, convolutional neural networks, and long short-term memory (LSTM) networks. DDN storage systems offer enhanced training throughput of 2x more images for CIFAR-10, and 2.2x more images for Inception v3 with MXNet.

Theano

A Python library for rapid, efficient definition, optimization and evaluation of mathematical expressions using multi-dimensional arrays. DDN storage systems offer enhanced training throughput of 3.5x more images and 3x faster training times with Theano for AlexNet.

This is the third article in a series, appearing over the next few weeks, that explores the following topics surrounding data platforms for AI and deep learning:

Introduction, How Optimized Storage Solves AI Challenges
A³I – Accelerated, Any-scale AI Solutions
Frameworks for AI and DL Workflows
Partners Important Role for Leading-Edge Case Studies, Summary

If you prefer, the complete insideBIGDATA Guide to Optimized Storage for AI and Deep Learning Workloads is available for download from the insideBIGDATA White Paper Library, courtesy of DDN.
