Getting More from Your Datasets: Data Augmentation, Annotation and Generative Techniques

Tuesday, May 22, 2:50 PM - 3:50 PM
Summit Track: Technical Insights II
Room 203/204

Deep learning for embedded vision requires large datasets. Indeed, the more varied the training data, the more accurate the trained network. But acquiring and accurately annotating datasets costs time and money. This talk will show how to get more from existing datasets. First, state-of-the-art data augmentation techniques are reviewed, and a new approach, smart augmentation, is explained: CNN network-A is trained to learn optimal augmentation strategies for CNN network-B. Second, Generative Adversarial Networks (GANs) learn the structure of an existing dataset, and several example use cases show how GANs can generate “new” data corresponding to the original dataset. The example of creating a very large dataset of facial training data is presented. But building a dataset is not the whole problem: data must be annotated in a way that is meaningful for the training process. An example of training a GAN from a dataset that incorporates ‘annotations’ is given. This enables ‘pre-annotated data’ to be generated, providing an exciting way to create large datasets at significantly reduced cost.
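
For readers unfamiliar with the term, the following is a minimal sketch of conventional (hand-designed) data augmentation, the baseline that the abstract contrasts with smart augmentation. It is not taken from the talk: the function name augment, the grayscale list-of-rows image format, and the specific transforms (flip, one-pixel shift, brightness gain) are all illustrative assumptions, and the learned network-A/network-B scheme is not shown.

```python
import random


def augment(image, rng):
    """Return a randomly altered copy of a grayscale image.

    `image` is a list of rows, each a list of floats in [0, 1].
    The transforms and their ranges are illustrative assumptions.
    """
    h, w = len(image), len(image[0])
    rows = [list(row) for row in image]  # copy so the input is untouched

    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        rows = [row[::-1] for row in rows]

    # Shift by up to one pixel in each direction, repeating edge pixels.
    dy, dx = rng.randint(-1, 1), rng.randint(-1, 1)

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    rows = [
        [rows[clamp(y + dy, h)][clamp(x + dx, w)] for x in range(w)]
        for y in range(h)
    ]

    # Brightness gain in [0.8, 1.2], clipped back to the valid range.
    gain = rng.uniform(0.8, 1.2)
    return [[min(1.0, max(0.0, v * gain)) for v in row] for row in rows]


# Generate several variants of one tiny 2x3 example image.
rng = random.Random(0)
image = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
variants = [augment(image, rng) for _ in range(4)]
```

Each call yields a differently perturbed copy of the same image, so one annotated sample can stand in for several during training; the label is assumed to be unchanged by these transforms.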


Peter Corcoran

University Professor & Science Foundation Ireland, Lead PI, Xperi

Peter Corcoran is a Fellow of IEEE, the Founding Editor of IEEE Consumer Electronics Magazine and holds a Personal Chair in Electronic Engineering at the College of Engineering & Informatics at National University of Ireland Galway (NUIG). He is a co-founder of FotoNation, now a core business unit of Xperi Inc., headquartered in San Jose. He is lead Principal Investigator and Director of C3Imaging, an industry/academic research partnership between Xperi Corporation and the National University of Ireland Galway, partly funded by Science Foundation Ireland (SFI). His current research interests include biometrics, deep learning, embedded computer vision and consumer electronics. He is co-author of 300+ technical publications and co-inventor on c. 300 granted US patents.

See you at the Summit! May 20-23 in Santa Clara, California!