The Challenges of Pruning AI Models on the Edge

In this special guest feature, Nick Romano, CEO of Deeplite, discusses how struggling to fit advanced models onto edge devices with limited resources forces deep learning teams to start “pruning” their models – essentially trimming away the parts deemed not critical. Deeplite is a startup whose mission is to enable AI for everyday life; the company is headquartered in Montreal and has an office in Toronto. Nick is a serial entrepreneur and accomplished CEO who has delivered successful outcomes for over 20 years. Most recently, he co-founded and scaled an enterprise SaaS platform with multi-million-dollar recurring revenues and over 100 employees. He has been honored by McMaster University Engineering as a Top 150 Alumni.

It’s no secret that the power of AI technology is taking off around the globe. AI models are impacting nearly every facet of our personal and professional lives – from online shopping and facial recognition at airports, to improving the real-time detection capabilities of cybersecurity defenses and enabling autonomous vehicles.

And while these capabilities are far-reaching and essential to the continued innovation of society, they don’t come without challenges. The deep neural networks (DNNs) that enable these tasks are built on large AI models, which require vast amounts of data. With plenty of training and tuning, these models do become very accurate, but they also need a significant amount of storage, memory and compute resources. This requirement can make AI projects too complex and costly for many organizations.

Pruning AI Models

If you’re running these large AI models in a data center, you probably don’t have to worry about pruning them. But what about a small camera on a manufacturing floor, or a drone surveying farmland? These small devices cannot support the massive amounts of data and computing power required to make the models work efficiently and accurately. So how do deep learning teams try to make their models more efficient? The common approach is to “prune” them – essentially trimming out the parts deemed least critical. Pruning can reduce the size of an AI model, but it comes at a price: it can significantly reduce the model’s accuracy, meaning it doesn’t perform as well as the original.
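
To make the mechanics concrete, here is a minimal sketch of magnitude-based pruning using PyTorch’s built-in torch.nn.utils.prune module. The toy two-layer model and the 50% sparsity level are illustrative assumptions, not a recommendation:

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Toy model standing in for a trained network (assumption for illustration).
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10),
    )

    # Zero out the 50% of weights with the smallest L1 magnitude in each layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)

    # Make the pruning permanent (removes the re-parameterization hooks).
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.remove(module, "weight")

Half the weights are now zero, so the model can be stored and executed more cheaply – but nothing guarantees those weights were unimportant, which is exactly where the accuracy loss comes from.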

To minimize the impact on accuracy, that same deep learning team must manually determine which parts of the model to prune and which to leave intact. This trial and error takes significant time, effort and cost, and generally yields only marginal improvements. Consider a mission-critical application like person detection for an autonomous vehicle: pruning the model to make it smaller reduces its accuracy, and lower accuracy in person detection is a non-starter for that use case.
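
A hedged sketch of that trial-and-error search might look like the following. The toy model, the candidate sparsity levels and the random placeholder validation data are all assumptions for illustration; a real team would evaluate on its actual held-out set:

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    x_val = torch.randn(128, 784)           # placeholder validation inputs
    y_val = torch.randint(0, 10, (128,))    # placeholder validation labels

    def evaluate(net):
        # Stand-in for a real validation pass over held-out data.
        with torch.no_grad():
            return (net(x_val).argmax(dim=1) == y_val).float().mean().item()

    results = {}
    for amount in (0.2, 0.4, 0.6, 0.8):     # candidate sparsity levels to try
        candidate = copy.deepcopy(model)
        for module in candidate.modules():
            if isinstance(module, nn.Linear):
                prune.l1_unstructured(module, name="weight", amount=amount)
        results[amount] = evaluate(candidate)

    # Keep the highest sparsity whose accuracy drop is tolerable, then refine
    # layer by layer -- the slow, manual search described above.

Each iteration requires a full evaluation (and often retraining), and the search space grows with every layer, which is why this process consumes so much time and budget for modest gains.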

And that’s exactly what we don’t want to be doing with AI. The power of AI should be unleashed at the edge with full accuracy and the ability to run on devices with limited resources. That means optimizing the models themselves, not simply cutting pieces out of them.

The Power of Optimization

Automated optimization engines are making this possible. With these tools, AI models don’t have to sacrifice accuracy by pruning away large parts of the network. Instead, automated optimization engines focus on how the model was constructed in the first place and transform it into an optimized version of itself. Model developers can focus on training their model to the best possible accuracy, and rely on automated optimization engines to quickly and reliably produce a compact version that doesn’t sacrifice functional performance.
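
The article doesn’t name a specific engine, but one familiar, widely available example of transforming a trained model without deleting weights is post-training dynamic quantization, which stores Linear-layer weights as 8-bit integers instead of 32-bit floats. The sketch below uses a toy model as an assumption; commercial optimization engines apply much broader transformations than this single technique:

    import torch
    import torch.nn as nn

    # Toy trained model (assumption for illustration).
    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    model.eval()

    # Convert Linear weights to int8; the architecture is unchanged.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Roughly 4x smaller weights and faster int8 matrix multiplies on CPU,
    # typically with near-identical accuracy -- compression without pruning.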

Further, by using training data along with the optimization process, these tools ensure that the new model’s accuracy is not only preserved, but can be made more robust on totally unseen (real-world) data. Models that are smaller, faster and consume less power – but are just as accurate – can then solve complex problems directly on devices like smartphones and automobiles, as well as on the tools that keep enterprises running, like video cameras at manufacturing plants.
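
One well-known way to fold training data into the optimization loop is knowledge distillation, where the compact model is fine-tuned to match the original model’s outputs. The sketch below is a minimal illustration under assumed toy models, placeholder data and a conventional temperature setting – a stand-in for the general idea, not any particular vendor’s method:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # "Teacher" is the original accurate model; "student" is the compact one.
    teacher = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
    teacher.eval()

    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 4.0                                 # softening temperature (assumption)
    x = torch.randn(128, 784)               # placeholder batch of training data

    # One illustrative training step; in practice this loops over the dataset.
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)
    loss = F.kl_div(
        F.log_softmax(student(x) / T, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because the compact model is trained against the original’s behavior on real data rather than assembled by deleting weights, its accuracy can match – and sometimes generalize better than – the larger network it replaces.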

Making AI faster and smaller while maintaining accuracy isn’t the only thing optimization engines can do. They also reduce costs on cloud and hardware back-ends, which helps businesses scale their AI services, and they shorten time-to-market by finding robust, efficient designs more easily. In fact, computer vision applications such as image classification, object detection and segmentation can now be operational within days or even hours, rather than several months.

These AI optimization tools can be used across devices of all kinds, including smartphones, cameras, vehicles, drones, connected IoT devices and much more. Optimization can be applied to different application domains such as vision, audio and natural language processing. In some cases, optimized AI models can achieve the same accuracy as their original counterparts with up to 100x less memory and 10x less power.

AI is becoming a part of everyday life, and with optimization engines, pruning models can become a thing of the past, which will help AI take off faster than ever before.
