Intermediate



This chapter describes some general concepts to consider when optimizing kernels.

Academy

This chapter describes some useful optimization methods, the reasoning behind them, and the results they produce on the test platform.

Academy

Song Han of Stanford delivers a presentation at the March 2016 Embedded Vision Alliance Member Meeting.

Academy

This chapter describes some initial (as well as the simplest and most intuitive) implementations of convolution algorithms.

Academy

Convolution operations are important in image processing, particularly in filtering. GPU compute can improve performance significantly.

Academy

This article discusses the motivation for optimizing the FFT on ARM Mali GPUs, the vectorization techniques used, and the resulting performance.

Academy

This article extends the mixed-radix FFT OpenCL implementation to two dimensions and explains optimizations for mobile ARM Mali GPUs.

Academy

This article analyzes the three main computation blocks of the mixed-radix FFT step by step, in both theory and implementation.

Academy

This article builds up the background for the 1D complex-to-complex FFT algorithm, pointing out the limits of computing the DFT directly.

Academy

Middleware libraries together with SDAccel enable software developers to program DNNs in their native C/C++ environment.

Academy