
Can FPGAs Challenge GPUs as a Platform for Deep Learning?

This market research report was originally published at Tractica's website. It is reprinted here with the permission of Tractica.

Over the past several years, graphics processing units (GPUs) have become the de facto standard for implementing deep learning algorithms in computer vision and other applications. GPUs offer a large number of processing elements, a stable and expanding ecosystem, support for standards such as OpenCL, and a wide range of intellectual property to develop applications rapidly.

However, as the industry matures, field programmable gate arrays (FPGAs) are starting to emerge as credible competition to GPUs for implementing deep learning algorithms. A recently published paper from Microsoft Research garnered considerable attention in the industry when it contended that FPGAs could be as much as 10 times more power efficient than GPUs. Although the performance of the FPGAs was much lower than that of the GPUs, the FPGA used for comparison was a mid-range device, which left the door open for further lowering the power on FPGAs. Such low power consumption could have significant implications for the many applications where high performance is not the top priority.
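As a rough illustration of how a slower device can still win on power efficiency, the sketch below computes performance per watt for two devices. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the Microsoft Research paper.

```python
def perf_per_watt(throughput: float, power_watts: float) -> float:
    """Return throughput (e.g. images per second) divided by power draw in watts."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput / power_watts


def efficiency_ratio(a_throughput: float, a_power: float,
                     b_throughput: float, b_power: float) -> float:
    """Return how many times more power efficient device A is than device B."""
    return perf_per_watt(a_throughput, a_power) / perf_per_watt(b_throughput, b_power)


# Hypothetical figures, for illustration only (not measured data):
# device A is five times slower than device B but draws far less power.
ratio = efficiency_ratio(a_throughput=100.0, a_power=20.0,
                         b_throughput=500.0, b_power=250.0)
print(f"Device A is {ratio:.1f}x more power efficient than device B")
```

With these made-up inputs, device A delivers one fifth of the throughput yet comes out ahead on performance per watt, which is the kind of tradeoff that matters when high performance is not the top priority.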

FPGAs, in essence, make hardware “soft.” It is possible to reconfigure FPGAs on the fly, making them as adaptable as GPU software. FPGA manufacturers take pride in their advantage over GPUs in performance per watt, and have begun to invest further in deep learning as a target market. Intel’s Altera group, for example, has published a white paper on using OpenCL for FPGA development. Meanwhile, Xilinx recently invested in a startup called Teradeep that accelerates deep learning algorithms on a Xilinx FPGA platform.

While FPGAs may look attractive based on the power savings, there are a few areas where they need to gain ground on GPUs before they can become a credible alternative. The development flow, for instance, is radically different for FPGAs compared to GPUs. The GPU runs software, while the FPGA runs hardware. One can easily take a software program and run it on a GPU. For FPGAs, however, one has to convert the software algorithm into hardware blocks before it can be mapped onto the FPGA. Given the complex nature of deep learning algorithms, mapping them to hardware becomes an extremely complicated problem, and many of the possible solutions are currently only being studied in academia. A recent UCLA paper states that optimal implementation of convolutional neural networks (CNNs) on an FPGA platform is a function of computational resources and represents a tradeoff in terms of memory bandwidth. If the hardware is not carefully designed, its computing throughput may not match the memory bandwidth, leading to a significant degradation of performance. Given these issues, converting algorithms to hardware remains a challenge for FPGA implementation.
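The mismatch between computing throughput and memory bandwidth described above can be sketched with a roofline-style bound, a common way to reason about this tradeoff. This is an illustrative assumption, not necessarily the exact method of the UCLA paper, and the function names and example figures below are hypothetical.

```python
def attainable_gflops(compute_roof_gflops: float,
                      bandwidth_gb_s: float,
                      ops_per_byte: float) -> float:
    """Roofline-style bound on achievable throughput.

    A design cannot run faster than its peak compute rate, nor faster than
    the rate at which memory can feed it: bandwidth (GB/s) multiplied by
    the number of operations performed per byte transferred.
    """
    memory_roof = bandwidth_gb_s * ops_per_byte
    return min(compute_roof_gflops, memory_roof)


def is_memory_bound(compute_roof_gflops: float,
                    bandwidth_gb_s: float,
                    ops_per_byte: float) -> bool:
    """True when memory bandwidth, not compute, limits the design."""
    return bandwidth_gb_s * ops_per_byte < compute_roof_gflops


# Hypothetical design points, for illustration only:
# the same compute resources with low and high data reuse.
for ops_per_byte in (10.0, 50.0):
    bound = attainable_gflops(100.0, 4.0, ops_per_byte)
    limited = is_memory_bound(100.0, 4.0, ops_per_byte)
    print(f"ops/byte={ops_per_byte}: bound={bound} GFLOPS, memory bound={limited}")
```

In this sketch, the design with low data reuse leaves most of its compute resources idle because memory cannot keep up, which is the kind of performance degradation a poorly balanced hardware mapping would suffer.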

FPGAs are also not well suited to training deep learning algorithms. FPGAs have a limited number of equivalent gates in each product family, capping the number of nodes one can use for training. Training is also much more computationally intensive, requiring constant adjustment of parameters and making it harder to implement the algorithm architecturally in hardware. In practical terms, this may not be much of a concern for FPGA vendors, as it is the deployed CNNs that are used in the final device, and that is where the volume comes from.

The overall ecosystem for deep learning application development on FPGAs is also in its early stages, and there are currently not enough vendors offering solutions to accelerate time to market for deep learning applications.

Training engineers on both deep learning and hardware is another issue that the FPGA vendors will need to address. As it is, there is a shortage of engineers who understand deep learning. The number of engineers who understand both deep learning and the hardware development process is even smaller. A good digital signal processor (DSP) hardware engineer may be trained to learn about computer vision algorithms, but training such an engineer to become an expert in CNNs would be a greater challenge.

Given this, FPGAs could be a platform of choice for well-understood deep learning problems. In the long run, if and when FPGA vendors overcome some of these challenges, they will become more formidable competitors to GPU manufacturers in the deep learning market. But who knows, by then the GPU manufacturers may have invented something new to keep themselves ahead.

By Anand Joshi
Principal Analyst
Tractica

nagesh_gupta

Hi Anand,

At Auviz Systems (www.auvizsystems.com), we develop libraries of algorithms optimized for Xilinx FPGAs. AuvizDNN is our library for implementing CNNs on the FPGA. This library, combined with the OpenCL infrastructure from Xilinx, enables customers to make use of the FPGA without having to learn the hardware or the intricacies of programming FPGAs. Auviz has demonstrated several networks for image classification, and also for semantic segmentation, using Xilinx FPGAs. The most recent demonstration was at the Embedded Vision Summit, and was for "Free Space Detection Using FPGAs".

Thanks,

Nagesh