Using Deep Learning for Video Event Detection on a Compute Budget

Summit Track: Technical Insights I

Convolutional neural networks have made tremendous strides in object detection and recognition in recent years. However, extending the CNN approach to understanding video or volumetric data poses tough challenges, including trade-offs between representation quality and computational complexity, a concern that is particularly acute on embedded platforms with tight computational budgets. This presentation explores the use of CNNs for video understanding. We review the evolution of deep representation learning methods based on spatio-temporal fusion, from C3D to Conv-LSTMs, for vision-based human activity detection. We then propose a decoupled alternative to this fusion: an approach that combines a low-complexity predictive temporal segment proposal model with a fine-grained (and potentially high-complexity) inference model. We find that this hybrid approach not only reduces computational load with minimal loss of accuracy, but also enables effective solutions to such high-complexity inference tasks.
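The abstract stops at the architectural idea, so the sketch below is only a rough illustration of the decoupled pipeline it describes: a cheap per-frame proposal model scores the whole video, and an expensive 3D-CNN classifier runs only on the temporal segments that score above a threshold. The network definitions, the detect_events helper, and parameters such as threshold and min_len are illustrative assumptions, not the speaker's actual models.

```python
# Minimal PyTorch sketch of a decoupled proposal + inference pipeline.
# All layer sizes, thresholds, and class counts are illustrative assumptions.
import torch
import torch.nn as nn


class LightweightProposalNet(nn.Module):
    """Low-complexity per-frame scorer that flags temporal segments of interest."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.score = nn.Linear(16, 1)  # per-frame "activity likelihood" logit

    def forward(self, frames):                      # frames: (T, C, H, W)
        feats = self.features(frames).flatten(1)    # (T, 16)
        return torch.sigmoid(self.score(feats)).squeeze(-1)  # (T,)


class HeavyInferenceNet(nn.Module):
    """High-complexity 3D-CNN classifier applied only to proposed segments."""

    def __init__(self, num_classes=10, in_channels=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clip):                        # clip: (1, C, T, H, W)
        return self.classifier(self.backbone(clip).flatten(1))


def detect_events(frames, proposal_net, inference_net, threshold=0.5, min_len=8):
    """Score every frame with the cheap model; run the expensive model only on
    contiguous runs of frames whose score exceeds the threshold."""
    with torch.no_grad():
        scores = proposal_net(frames)               # (T,)
    active = (scores > threshold).tolist()

    results, start = [], None
    for t, flag in enumerate(active + [False]):     # sentinel closes the last run
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_len:
                clip = frames[start:t].permute(1, 0, 2, 3).unsqueeze(0)  # (1,C,T,H,W)
                with torch.no_grad():
                    logits = inference_net(clip)
                results.append((start, t, logits.argmax(dim=1).item()))
            start = None
    return results


if __name__ == "__main__":
    video = torch.randn(64, 3, 112, 112)            # 64 dummy RGB frames
    proposals = LightweightProposalNet()
    classifier = HeavyInferenceNet(num_classes=10)
    print(detect_events(video, proposals, classifier))
```

Under these assumptions, the expensive 3D model touches only the frames inside proposed segments, which is where the computational savings of the decoupled design would come from.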

Speaker(s):

Praveen Nayak

Tech Lead, Pathpartner

Praveen Nayak is currently a Tech Lead on the Automotive IP Team at Pathpartner Technology, where he develops driver monitoring solutions. His areas of interest include complexity-aware design of learning algorithms and structured prediction, with potential applications in computer vision and computational modeling. Praveen received his B.E. in Electronics and Communication Engineering from M.S. Ramaiah Institute of Technology, Bangalore, India, in 2014 and his M.S. in Electrical Engineering from the University of California, Santa Barbara, in 2016.

See you at the Summit! May 20-23 in Santa Clara, California!
Register today and reserve your hotel room!