Performance Analysis for Optimizing Embedded Deep Learning Inference Software

Summit Track: Technical Insights I

Deep learning on embedded devices is enjoying significant success in a number of vision applications, particularly on smartphones, where increasingly prevalent AI cameras can enhance every captured moment. However, the large number of new network architectures proposed every year poses real challenges for software developers, who must implement these demanding algorithms very efficiently.

In this presentation, we describe a structured approach to the performance analysis of deep learning software implementations. We examine the fundamentals of performance analysis for deep learning, including the relevant metrics and methodologies. We then show how our top-down approach can be used to detect and fix performance bottlenecks and produce efficient deep neural network software implementations. Finally, we illustrate typical software optimizations that make the best use of the available computational resources.


Gian Marco Iodice

Staff Compute Performance Software Engineer, Arm

Gian Marco Iodice is a Staff Compute Performance Software Engineer in the Machine Learning Group at Arm. With more than five years of experience developing and optimizing computer vision and machine learning software for embedded devices, Gian Marco currently leads the ML performance optimization software team for the Compute Library across Arm CPUs and GPUs. He received an MSc with honours in electronic engineering from the University of Pisa, Italy, where he specialized in hardware/software co-design. In recent years, Gian Marco has been a frequent speaker at the Embedded Vision Summit (EVS) and Arm TechCon, where he has presented optimization techniques and design solutions for CNNs on embedded devices.

See you at the Summit! May 20-23 in Santa Clara, California!