The Future of AI is the Edge; Can GPUs Take the Heat?

In this special guest feature, Ludovic Larzul, Founder and CEO of Mipsology, describes how, in the future, AI will be everywhere; and though some computation will continue to take place in data centers, more will happen at the edge. Ludovic Larzul has more than 25 years of experience driving product development and has authored 16 technical patents. He previously co-founded and served as VP of engineering for Emulation and Verification Engineering (EVE), a startup that designed specialized supercomputers for ASIC validation. Ludovic led the company to a 2012 acquisition by Synopsys, where he served as R&D group director before founding Mipsology in 2018. He holds an MS in Microelectronics from France’s Université de Nantes and an MS in Computer Science from Polytech Nantes.

AI is playing a larger role in our lives each day, and in many ways is becoming the biggest man-machine collaboration in history. AI that runs autonomously within edge devices has limitless potential, thanks to its ability to give real-world devices low-latency, secure, and adaptive decision-making.

But despite the constant desire and need to be at the edge, the majority of AI processing remains in cloud-based or enterprise data centers that require a wide array of energy and compute resources. With edge AI, data is stored and processed locally on the device. This brings numerous benefits, including greater responsiveness, flexibility, and an improved user experience, but it also comes with challenges and constraints, including scalability, stability, data management and movement, power efficiency, accuracy, and more.

The Edge AI Challenge

Clearly the data center is the optimal location for deep learning neural network (DLNN) model training and research. Training requires an enormous amount of computation, so it is best suited to a climate-controlled data center that is protected from the elements and has room to house the big iron it requires. Not only does the data center provide the multitude of computers necessary; it can also store the substantial amount of curated input data needed to match the complexity of the AI.

But the most useful AI can only happen in the field. This is where edge computing comes into play. Computing on the edge offers opportunities in all markets spanning vital areas – cars, surgery, security, retail, robotics, assembly lines and more.

Unfortunately, moving the neural network (NN) to the edge is not as simple as you might think. Deploying NN models presents as many problems as training does. How much computing is actually available? Does the computing system have external cooling or a power constraint? What about the input data: do the real world and the curated training data match? Is latency an issue due to constraints on the response time? Simply put, it’s an illusion to believe that the main challenges are behind you once an NN has been trained in the tidy data center environment.

One Size Does Not Fit All

In addition, different edge AI applications can have radically different requirements.

Take autonomous vehicles, for example. They require extremely accurate processing of a massive number of computations in a split second. If the AI isn’t quick enough to recognize a pedestrian entering an intersection, the results could be disastrous; people could die. Pressure also comes from other directions, such as price: to make autonomous vehicles more consumer-friendly and affordable, manufacturers need the best AI at the lowest possible price.

At the opposite end of the spectrum are smart devices such as those that make up the Internet of Things (IoT). For some reason, consumers want everything to be smart now, from doorbells to cameras and even litter boxes! The most crucial element enabling this kind of edge AI is battery life. Consumers certainly don’t want to charge all their smart devices every day when these products were born out of the promise of ease of use. Quality and accuracy are generally less important. After all, it’s unlikely that anyone has died because Siri didn’t understand their song request.

Finally, there’s everything in between: security devices, smart cities, smart hospitals, and so on. While they may not require the split-second accuracy of a self-driving car, they require far better performance than a litter box.

Edge AI applications don’t all require the same level of accuracy, but they do share one trait: they’re not located inside a data center. Most, if not all, need to react in real time and can’t afford the time it takes to send data back to a data center. In fact, there are numerous reasons why edge applications can’t rely solely on data center processing. Some, such as transportation, retail, access control, and security applications, require continuity of service. Others, including smart cities, robots, and autonomous vehicles, require a rapid reaction time. Applications using large video streams need the ability to transmit massive amounts of data. Healthcare and security devices require data to be extremely secure and protected. And others simply can’t afford the cost of cloud processing or a 24/7 data center!

Above all, these devices and applications must be optimized to react immediately — especially, as mentioned, for things like autonomous vehicles. It’s impossible for input to be processed through a data center in time to prevent an accident from occurring.

GPUs Fail in the Field

By now the question you’re probably asking yourself is, “How do we solve this problem?” Currently, the industry’s most popular offering for powering AI is the graphics processing unit (GPU). Data scientists almost always use GPUs to train NNs. However, GPUs can encounter a multitude of issues once the application is ready for use in the real world.

The key issue is that GPUs are only stable below 80 degrees Celsius, and they rapidly exceed that temperature when exposed to the summer heat on street corners or in public markets. That figure may seem hard to reach, but GPUs idle around 40 degrees Celsius and operate at around 60-75 degrees under a heavy workload in an air-conditioned environment. Moving GPUs to the real world exposes them to much warmer environments than the cooled data center and can push operating temperatures well past the 80-degree mark.
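As a rough illustration of how this drift can be observed in the field, the sketch below polls nvidia-smi for the GPU’s temperature and SM clock. It assumes an NVIDIA GPU with nvidia-smi on the PATH, and the 80-degree threshold used here is simply the figure cited above, not a vendor specification.

```python
# Minimal monitoring sketch: watch GPU temperature and SM clock to spot thermal
# throttling as it happens. Assumes an NVIDIA GPU and that nvidia-smi is installed.
import subprocess
import time

THROTTLE_SUSPECT_C = 80  # temperature above which throttling is likely (figure cited in this article)

def read_gpu_stats():
    """Return (temperature_c, sm_clock_mhz) for GPU 0 as reported by nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=temperature.gpu,clocks.sm",
            "--format=csv,noheader,nounits",
            "--id=0",
        ],
        text=True,
    )
    temp_str, clock_str = out.strip().split(",")
    return int(temp_str), int(clock_str)

if __name__ == "__main__":
    while True:
        temp_c, sm_mhz = read_gpu_stats()
        flag = "  <-- likely throttling" if temp_c >= THROTTLE_SUSPECT_C else ""
        print(f"temp={temp_c} C  sm_clock={sm_mhz} MHz{flag}")
        time.sleep(5)
```

Run in an air-conditioned room and then in a hot enclosure, the same script makes the clock drop described below easy to see.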

These high temperatures significantly reduce a GPU’s life span, as do excessive vibration from motion, inclement weather, and the wide temperature swings that occur over the course of a year (or even within a day on our warming planet). Edge AI applications are highly sophisticated and can’t afford to have a chip replaced every two years.

It’s also very difficult to provide adequate cooling at the edge, something GPUs need because of their high power usage. GPUs heat up quickly, and constant operation at high temperatures means less computing gets done per second; in the most extreme cases, performance drops to half of what is expected.

As the temperature increases, a GPU must decrease its clock frequency to prevent further overheating and hardware damage. As a result, it takes much more time for the NN to make a decision. This isn’t a major issue for a device such as a smart refrigerator, but for an autonomous vehicle, taking more than double the response time can be catastrophic.
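To make the latency impact concrete, here is a back-of-the-envelope calculation. The clock and latency numbers are illustrative assumptions, not measurements; the point is only that latency scales roughly inversely with clock frequency.

```python
# Illustrative numbers only: how a thermally throttled clock stretches inference time.
base_clock_mhz = 1500        # assumed boost clock in a cool environment
throttled_clock_mhz = 750    # assumed clock after heavy thermal throttling
base_latency_ms = 20         # assumed inference latency for one camera frame at full clock

# To a first approximation, inference latency scales inversely with clock frequency.
throttled_latency_ms = base_latency_ms * (base_clock_mhz / throttled_clock_mhz)
print(throttled_latency_ms)  # 40.0 ms: the same NN now takes twice as long to respond
```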

What’s even worse is that GPUs in smart city applications have no good way to cool down. A self-driving car has fans or air conditioning, but for a device located outdoors, such as a camera at an intersection, the fan is subject to environmental elements like dust, wind, rain and even vandals!

Not only does this increase the risk of failure, but it boosts the total cost of ownership as well. If a GPU fails in the data center, it’s easy to replace; at the edge, that concept goes out the window. Replacing millions of units that cost thousands of dollars each is economically impractical, and fixing or replacing broken systems requires on-site technicians, another barrier that makes the process far less streamlined than it needs to be.

Advancing Edge AI with FPGAs

If your application needs to be deployed at the edge, consider using a field-programmable gate array (FPGA). In the past, FPGAs had a steep learning curve and required specific expertise to program. That barrier has been removed by new software tools that provide the same abstraction levels as GPU tooling; GPUs succeeded for NNs for this very reason, because software libraries like cuDNN/CUDA hide the programming complexity of the hardware. With these software tools, FPGAs can be enabled for NN inference with a single command.
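As an illustration of the kind of abstraction that made GPUs easy to adopt, and that newer FPGA tools aim to match, here is a minimal sketch using PyTorch, where cuDNN/CUDA stay hidden behind a one-line device choice. This is only an analogy by example; it is not Mipsology’s Zebra API or any FPGA vendor’s actual interface.

```python
# A minimal sketch of framework-level abstraction: the NN code does not change,
# only the target device does. PyTorch routes the work through cuDNN/CUDA when a
# GPU is selected; FPGA inference tools aim to offer the same "one switch" experience.
import torch
import torchvision.models as models  # requires a recent torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"   # swap the accelerator here

model = models.resnet50(weights=None).eval().to(device)   # same ResNet50 family as in the test below
frame = torch.randn(1, 3, 224, 224, device=device)        # stand-in for one camera frame

with torch.no_grad():
    scores = model(frame)                                  # inference runs on whichever device was chosen
print(scores.shape)                                        # torch.Size([1, 1000])
```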

FPGAs are massively parallel chips that can handle computation-heavy loads like NNs; the largest FPGAs can perform an astonishing 25,000 operations per clock tick, with approximately a billion clock ticks per second. They can also be programmed at the hardware level, so they are better equipped to meet the requirements of a specific application than the graphics-oriented architecture of the GPU. In addition, they do not require tailoring for every minute variation of the NN, thanks to overlays that keep them generic while still specializing them for NNs.
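A quick back-of-the-envelope check on those figures: 25,000 operations per tick at roughly a billion ticks per second works out to about 25 tera-operations per second of peak throughput, before any real-world efficiency losses.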

Another critical aspect of FPGAs is that they use less power than GPUs for an equivalent computing load. This reduces the need for cooling, which in turn makes them optimal for use at the edge (though they can still be used in the data center, of course). By using less power, FPGAs operate at a lower temperature than GPUs and have a significantly longer lifespan. Even after constant 24/7/365 usage, some FPGA-based systems are still functional after a decade. They’re so reliable that NASA has even used them on Mars, so it’s safe to say they can withstand the elements here on Earth!

The following illustration shows the temperature of a Xilinx Alveo U50LV card running Zebra to compute image classification with a classical ResNet50. No HVAC was used in this experiment, so the temperature reached 70 degrees Celsius (158 degrees Fahrenheit); nevertheless, the clock frequency remained stable. Because FPGAs rely on greater parallelism, the fact that their frequency is lower than a GPU’s doesn’t hurt computing performance. The figure shows that FPGAs remain stable while running dense workloads like image classification at very high speed. A GPU, by contrast, would not hold a stable frequency throughout the same computation, which could make life at a street crossing much more dangerous than anyone wants. The bottom line is that FPGA-based systems can be used more successfully and efficiently at the edge than GPU-based systems.

This graph shows the temperature (red) and clock frequency (blue) of a Xilinx Alveo U50LV computing ResNet50 with Zebra 2021.02 for two hours. The FPGA clock is set to 550 MHz. The temperature at rest is 45°C, and a stable 70°C is reached after 10 minutes. Once the two hours have passed and the computation is complete, the temperature drops immediately and is back to 45°C within 10 minutes. (Source: Mipsology)

Not only are FPGAs more efficient than GPUs at the edge, but they also come in so many sizes that they can fit anywhere. Whether it’s a small home application, a mid-sized smart city application like a camera, or a large application with demanding requirements like an autonomous vehicle, FPGAs fit. They compute NNs more efficiently, and their size flexibility lets them match virtually any computing need.

In the future, AI will be everywhere. And though some computation will continue to take place in data centers, more will happen at the edge. There is just no other way for smart consumer devices, smart city applications and autonomous cars to work. As such, application developers must ensure that their designs work as well in the real world as they do in the climate-controlled lab.
