
Machine Learning Deployment Options: in the Cloud vs. at the Edge

In this special guest feature, Neil Cohen, Vice President at Edge Intelligence, examines the question: where should businesses develop and execute machine learning? This article explores the pros and cons of the cloud versus the edge. Neil brings more than 15 years of combined marketing and product management experience to his role as VP of Product Management & Marketing for Edge Intelligence. Previously, he was VP of Global Marketing at Akamai Technologies, where he ran worldwide marketing for a $1.3 billion cybersecurity and web performance business. He was also VP of Product Marketing at Akamai, where he helped the organization double revenue, repeatedly launched new products, and grew several of them into businesses exceeding hundreds of millions of dollars. He has held several other senior marketing and product positions.

Just as companies once had to determine where to deploy their workloads — on-premises, in the cloud, or a mixture of the two — they now face a decision about where to apply machine learning algorithms. It turns out that where data science algorithms are trained and deployed can arguably have as much of an impact as, if not more than, the algorithms themselves.

So where should businesses develop and execute machine learning? Let’s explore the pros and cons of the cloud versus the edge.

Machine Learning Applied in the Cloud

Applying machine learning in the cloud is the predominant approach today. Services offered by the large cloud platform providers enable data scientists and developers to quickly and easily build, train and deploy machine learning models. Support for deep learning frameworks allows for an open, flexible environment, and it’s easy to prepare and load data for machine learning directly from the provider’s cloud-based storage and data warehouse services.

The primary limitation of this approach is the challenge of having to move the data from where it’s generated to a cloud data center so it can be used to prepare and develop machine learning models. Latency, bandwidth limitations and cost issues often make it impractical to move large volumes of data to a centralized data center.

In a world where many connected devices generate hundreds of megabytes or even terabytes of data daily, this delay presents many challenges. Since machine learning accuracy is only as good as the data itself, businesses face an unwanted compromise. Because it’s often infeasible to move all their data, they must choose “sample” data to transfer to the cloud to train and continuously refine their machine learning algorithms. As a result, businesses are delayed in obtaining actionable insight from newly generated data and are restricted in their ability to analyze a complete dataset combining the newest data with historical data. In addition, shipping data across geographies can raise privacy and geopolitical concerns.
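As one illustration of what choosing “sample” data might mean in practice, a device can maintain a fixed-size uniform sample of its data stream for upload. The article doesn’t prescribe a sampling method; reservoir sampling is one hypothetical sketch of how this could look:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = random.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# Made-up example: keep 1,000 of a million readings for upload to the cloud.
readings = range(1_000_000)
subset = reservoir_sample(readings, 1000)
```

Whatever the method, the key point stands: the model only ever sees a fraction of what the devices generate.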

Machine Learning Applied at the Edge

With data volumes rapidly rising and the growing need to take action in real- or near-real-time, it becomes increasingly important to consider shifting aspects of machine learning to the edge, instead of — or in combination with — the cloud.

Businesses can develop and train models in the cloud and then deploy the trained algorithm to the device, where it executes close to where data is generated. This method preserves the ease and flexibility of developing machine learning in the cloud while running the algorithm close to the data source.
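A minimal sketch of this train-centrally, execute-locally pattern, assuming a trivial statistical “model” and a JSON artifact as the deployment format (both hypothetical — a real deployment would use a proper ML framework and model format):

```python
import json
import statistics

# --- Cloud side (hypothetical training job) ---
# Train a trivial anomaly threshold from historical sensor readings.
history = [20.1, 19.8, 20.4, 21.0, 19.5, 20.2, 35.0]  # made-up data
model = {
    "mean": statistics.mean(history),
    "stdev": statistics.stdev(history),
    "z_cutoff": 3.0,
}
artifact = json.dumps(model)  # serialized artifact shipped to the device

# --- Device side (hypothetical edge runtime) ---
# The device only loads and executes the model; no training happens here.
m = json.loads(artifact)

def is_anomaly(x, m=m):
    return abs(x - m["mean"]) > m["z_cutoff"] * m["stdev"]
```

The design point is the split itself: training stays where compute and historical data are plentiful, while inference runs next to the data source.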

Another option is to develop and train machine learning within the device itself, based on the data flowing through it. This approach relies on intelligent-edge software, in which embedded-systems design or operational technology-friendly GUI environments serve as the foundation for development.

While these two approaches help overcome latency concerns, they can present accuracy issues. Cloud-developed machine learning brought to the edge is based on a sampling of data. And developing highly accurate machine learning models solely within edge devices can be challenging because the devices are optimized for low cost and low power, and thus store only a limited amount of data for analytics.

A newer approach is to perform machine learning at the edge, near the device. This incorporates edge resources that reside near — but not within — the device and avoids having to transfer data to a centralized location. Analytics can be distributed at the device edge by deploying compute and storage in very close proximity to devices located in places like factories, hospitals, oil rigs, banks and retail stores. Similarly, analytics can be deployed at the infrastructure edge, within the same last-mile network as the devices, distributed across thousands of locations such as cell towers and micro data centers.

This approach aggregates data from millions of devices, scales to petabytes, uses data of any age and federates across geographies. As a result, it brings the attributes of a big data warehouse to edge computing environments while applying analytics and machine learning to distributed data from all edges and devices simultaneously. There is no need to sample data, and all of the data appears as though it were in a single location. It’s also well suited to cases where data privacy or compliance rules prohibit moving data across geographies.
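One way to picture analytics over federated data without centralizing it is for each site to return only small partial aggregates that a coordinator merges. This toy sketch (my illustration, not Edge Intelligence’s actual mechanism) computes a global mean and variance that match what pooled data would give, while raw records never leave their site:

```python
def local_aggregate(records):
    """Each edge site computes only (count, sum, sum of squares) locally."""
    return (len(records), sum(records), sum(x * x for x in records))

def merge(parts):
    """The coordinator merges partial aggregates into global statistics."""
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return n, mean, variance

# Made-up data at two sites; only the aggregates cross the network.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]
n, mean, var = merge([local_aggregate(site_a), local_aggregate(site_b)])
```

Because the merge is exact for these statistics, the result is identical to computing over the pooled data in one location — which is the “appears as though it were in a single location” property described above, applied to a very simple query.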

The downside of this approach is that fewer tools currently exist to perform machine learning on distributed, federated data. It will take time before these environments give developers choices similar to what exists in the cloud today.

Summary

Choosing where to develop and execute machine learning algorithms is a critical decision. The final choice will greatly affect how accurately applications can make decisions and how quickly they can respond to events. Most likely, the cloud and edge will serve complementary roles as the use of machine learning evolves across industries with varying requirements for precision and responsiveness.
