insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning

With AI and DL, storage is the cornerstone for handling the deluge of data constantly generated in today’s hyperconnected world. It is a vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current implementations for AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency and maximum concurrency.

The target audience for the guide is enterprise thought leaders and decision makers who understand that enterprise information is being amassed like never before and that a data platform is both an enabler and accelerator for business innovation.

Data Storage for AI/DL Case Studies

In this section we’ll consider some compelling use case examples of how DDN storage systems have enabled customers to maximize the value of their data and easily and reliably accelerate time to insight using AI and DL. DDN enables thousands of customers all around the world, in a wide range of industries, to accelerate their businesses using AI and DL. DDN A³I solutions are fully optimized to deliver massive performance acceleration to these enterprise applications.

AUTONOMOUS VEHICLES

Autonomous vehicles engage some of the toughest workloads in AI at unprecedented scale. They require the handling, ingest and delivery of a broad range of data set types and sizes, generated from many different sources such as video cameras, lidar, radar and other sensors. Very large data sets captured over millions of miles undergo many cycles of processing, labeling, sub-sampling and categorization, before being presented to the DL applications.

Self-driving vehicles require as many testing scenarios as possible to improve vehicle perception accuracy and operational autonomy. A reliable data storage framework that scales to TB/sec of throughput and hundreds of PB of capacity is therefore essential.
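To give a sense of what those scales imply, here is a back-of-the-envelope sketch of how long one full pass over a data set takes at a given sustained throughput. The 100 PB capacity and the two throughput values are illustrative assumptions picked from the ranges mentioned above, not measurements from any deployment, and decimal units (1 PB = 1000 TB) are assumed.

```python
# Back-of-the-envelope: time for one full read pass over a data set.
# All capacity and throughput values are illustrative assumptions.

TB_PER_PB = 1000  # decimal units assumed


def full_pass_hours(capacity_pb: float, throughput_tb_per_s: float) -> float:
    """Hours needed to read capacity_pb once at throughput_tb_per_s."""
    seconds = capacity_pb * TB_PER_PB / throughput_tb_per_s
    return seconds / 3600


if __name__ == "__main__":
    for tput in (0.1, 1.0):
        hours = full_pass_hours(100, tput)
        print(f"100 PB at {tput} TB/s: {hours:.1f} hours")
```

Under these assumptions, a tenfold difference in sustained throughput is the difference between a full pass that takes roughly a day and one that takes well over a week, which is why throughput and capacity have to scale together.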

For this customer, a massive data set for training neural networks was developed, data from experimental vehicles and ridesharing engagements was collected, and an extensive and complex DL framework was trained, tested and refined for the autonomous driving capability. The resulting software was loaded onto experimental vehicles for evaluation in the field, and operational data from the ride fed back into the loop to further enhance the DL process.

The customer’s requirement called for the creation of a very large-scale parallelized data storage system to feed an extremely large-scale GPU-based computing platform. The storage solution had to ingest, keep and deliver massive amounts of data rapidly and reliably, scaling linearly to extreme levels in performance and capacity. With original increments set at nearly one hundred petabytes of capacity, the highest data center density and efficiency, with low management and support overhead, were additional must-haves.

The DDN storage platform effortlessly handles the concurrent ingest of these massive data streams, organizing and structuring the underlying data sets.

Millions of GPU cores continuously access the DDN storage system, executing extensive and complex training processes, continuously refining the self-driving capabilities of the fleet of vehicles. DDN storage has enabled this customer to harness data at immense scale, successfully and reliably building an advanced AI framework that is revolutionizing the transportation industry.

LIFE SCIENCES AND HEALTHCARE

By using AI techniques such as machine learning and artificial neural networks, researchers are building systems to improve the detection, diagnosis, treatment and management of diseases. In addition, clinicians, researchers and industry players are working to co-develop and validate algorithms that can recognize patterns of disease and advance diagnostic capabilities. Corps of data scientists, developers, and fellows train and test models with the potential for commercialization. There is a focus on the pipeline of translation—from model conceptualization to clinical validation. AI platforms enabled by DDN storage greatly enhance the ability of researchers to identify and cure diseases.

DDN AI and DL in life sciences use case

A research facility selected DDN to implement a solution capable of covering all ingest, processing and management of the data sets, training and inference from the DL applications, and real-time visualization.

The storage system was required to hold a large repository of data sets for neural network training, with rapid shared access for multiple GPUs that execute intense training, testing and inference. The DDN all-flash system that was deployed reliably handles complex data ingest while simultaneously supporting post-processing, inference, visualization, training and validation operations.

CONSUMER RETAIL

Another compelling use case involves a leader in next-generation retailing technology that developed groundbreaking software enabling consumers to shop without having to go through the cumbersome checkout process. A series of high-definition cameras within each store is coupled with advanced computer vision and DL to identify shoppers and keep track of which items they collect in real time. Shoppers are billed automatically for the items as they leave the store. Live feeds are ingested from each store’s video cameras during opening hours, while intensive training runs in the limited window after closing time, leveraging the day’s collected data sets.

The customer selected DDN to meet their requirement for an all-flash component, both because of the limited training time window and to ensure saturation of the GPUs used by the DL application. DDN delivered a solution that ingests live feeds from cameras in real time and provides built-in scalability to handle the collection of additional daily data sets over time. The DDN solution combines an all-flash layer, with integrated controls for automatic staging of the day’s freshly acquired data set, with a hard disk layer for longer-term economical storage. GPUs get the fastest and most efficient access possible to the daily captured data and achieve the highest productivity.
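The idea of staging a day’s fresh data onto a fast tier can be pictured with a small sketch. This is not DDN’s implementation: the function name stage_fresh_data, the two directories standing in for the hard disk and flash layers, and the 24-hour window are all hypothetical, and the sketch simply copies recently modified files from one directory to another.

```python
# Minimal sketch of age-based staging between two storage tiers.
# Hypothetical illustration only: plain directories stand in for the
# hard disk ("capacity") and all-flash layers.
import shutil
import time
from pathlib import Path
from typing import List, Optional


def stage_fresh_data(
    capacity_dir: Path,
    flash_dir: Path,
    max_age_s: float = 24 * 3600,  # assumed "one day" window
    now: Optional[float] = None,
) -> List[Path]:
    """Copy files modified within max_age_s seconds into flash_dir."""
    now = time.time() if now is None else now
    flash_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(capacity_dir.iterdir()):
        # Only regular files that are recent enough get staged.
        if src.is_file() and now - src.stat().st_mtime <= max_age_s:
            dst = flash_dir / src.name
            shutil.copy2(src, dst)  # copy2 preserves timestamps
            staged.append(dst)
    return staged
```

A nightly job could call stage_fresh_data just before the training window opens, so that the training processes read only from the fast tier while older data stays on the economical one.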

Summary

With the help of storage solutions fully optimized for AI and DL training and inference, data scientists, data engineers as well as academic researchers are able to focus their complete attention on what really matters most—transforming valuable data assets into important insights with unparalleled velocity and accuracy.

In this technology guide, we’ve reviewed the unique storage demands of AI and DL workloads, along with the characteristics of storage solutions optimized for AI and DL. We also provided a description of products available from DDN and how they suit the requirements of storage solutions well-adapted for workflows involving AI and DL. Here are some important takeaways when considering next steps to take in choosing your storage solution:

Performance is a critical aspect of data storage for AI and DL workloads. Parallel data access is the key for keeping pace with the demands of these popular technologies.
Flexibility in the AI workflow is also vital in order to be able to deal with multiple data types, and engage multiple workflows.
Scalability lets you think ahead. Your needs today may be of limited scale. You may have a small data set in 2018, but there is a high likelihood that you’ll be on a path of collecting more data because you have new sensors, new connectivity such as the upcoming 5G, and higher resolution data sets. The technologies that are enabling AI, like GPUs, have a very fast refresh cycle: every 8 months your GPUs are quadrupling in capability. Suddenly you’re able to collect and process more information. In terms of scaling, enterprise applications are built on software, and that iteration happens in real time as data scientists come up with new algorithms for consumption. The benefit comes from maximum performance. This is the difference between breakthrough innovation and incremental upgrade.
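To see how quickly the scalability point compounds, the short sketch below takes the guide’s figure of a fourfold GPU capability gain every 8 months at face value. That figure is the guide’s claim rather than a measured benchmark, and the function name capability_multiplier is a hypothetical helper introduced here for illustration.

```python
# Compound growth under the guide's stated assumption:
# GPU capability multiplies by 4 every 8 months.


def capability_multiplier(
    months: float, period_months: float = 8, factor: float = 4
) -> float:
    """Relative capability after `months`, given factor-x growth per period."""
    return factor ** (months / period_months)


if __name__ == "__main__":
    for months in (8, 16, 24):
        print(f"after {months} months: {capability_multiplier(months):.0f}x")
    # after 8 months: 4x
    # after 16 months: 16x
    # after 24 months: 64x
```

If compute grew at anything like that rate, a storage system sized only for today’s GPUs would become the bottleneck within a refresh cycle or two, which is the argument for planning storage scalability up front.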

Time is of the essence in making strategic decisions about storage solutions for managing the accelerating demands created by AI and DL applications. Your competitors are making the same decisions to gain strategic advantage in the marketplace. To take the next steps in learning how you can facilitate breakthrough innovation by leveraging the power of new turnkey AI solutions for the data center, visit DDN. By simultaneously expediting deployment and accelerating time to insight, DDN’s groundbreaking approach enables you to manage the entire AI lifecycle in place and simplify your data center. DDN can show how their storage solutions have the following advantages:

Easy-to-deploy AI solutions that immediately transform your AI concepts into business innovation
Long-term advantages that enable you to achieve high-performance AI at every stage of your growth
The greatest technical and economic benefits, realized by leveraging deep AI expertise

This is the fifth and final article in a series published over the last few weeks, in which we explored these topics surrounding data platforms for AI & deep learning:

Introduction, Data is the New Source Code
Unique Storage Demands for AI and DL Workloads
Characteristics of Storage Solutions Optimized for AI & DL
Accelerated, Any-scale AI Solutions
Data Storage for AI/DL Case Studies, Summary

If you prefer, the complete insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning is available for download from the insideBIGDATA White Paper Library, courtesy of DDN.
