ARM Guide to OpenCL Optimizing Pyramid: Conclusion

This chapter summarizes the conclusions drawn from the example optimization process.


This example shows one way to implement and optimize the creation of a Gaussian image pyramid using OpenCL and OpenCL buffers.
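For reference, one level of a Gaussian pyramid is commonly produced by blurring the image and then keeping every second row and column. The host-side C sketch below shows that unoptimized baseline for a single 8-bit channel. It is not the guide's OpenCL kernel: the 5-tap binomial weights [1 4 6 4 1]/16, the clamp-to-edge border rule, and the names `pyramid_reduce`, `clamp_index`, and `kTaps` are assumptions made for illustration. Note the per-pixel clamping inside the innermost loop, which is the kind of boundary check that the techniques listed at the end of this chapter aim to remove.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 1D binomial approximation of a Gaussian: [1 4 6 4 1] / 16.
 * The 5x5 2D kernel is the outer product of this vector with itself,
 * so its weights sum to 16 * 16 = 256. */
static const int kTaps[5] = {1, 4, 6, 4, 1};

/* Clamp a coordinate to [0, n-1] (clamp-to-edge boundary handling). */
static int clamp_index(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Hypothetical helper: build the next pyramid level from a w x h
 * single-channel image by blurring with the 5x5 binomial kernel and
 * sampling every second pixel in each direction.
 * dst must hold ((w + 1) / 2) * ((h + 1) / 2) bytes. */
void pyramid_reduce(const unsigned char *src, int w, int h, unsigned char *dst) {
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    int dw = (w + 1) / 2, dh = (h + 1) / 2;
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            int sum = 0;
            for (int ky = -2; ky <= 2; ++ky) {
                /* Boundary check on every tap: the cost padding avoids. */
                int sy = clamp_index(2 * y + ky, h);
                for (int kx = -2; kx <= 2; ++kx) {
                    int sx = clamp_index(2 * x + kx, w);
                    sum += kTaps[ky + 2] * kTaps[kx + 2] * src[sy * w + sx];
                }
            }
            /* Weights sum to 256; adding 128 rounds to nearest. */
            dst[y * dw + x] = (unsigned char)((sum + 128) >> 8);
        }
    }
}
```

For an 8 x 6 input, `pyramid_reduce` writes a 4 x 3 output, and a uniform image stays uniform because the weights sum to 256.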

Small changes in the OpenCL code can produce significant performance improvements. For example, processing the R, G, and B color planes separately and recombining them afterwards reduces the number of loads, simplifies the handling of pixel borders, and enables some kernels to be merged.
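Splitting the color planes is essentially a change of data layout. The host-side C sketch below assumes 8-bit interleaved RGB input; the function names `split_rgb` and `merge_rgb` are hypothetical and do not come from the guide. After the split, each plane is a contiguous single-channel image, so a kernel can read it with wide contiguous loads (for example, OpenCL's `vloadn` built-ins) instead of striding over interleaved channels.

```c
#include <assert.h>
#include <stddef.h>

/* Deinterleave R0 G0 B0 R1 G1 B1 ... into three contiguous planes. */
void split_rgb(const unsigned char *rgb, size_t n_pixels,
               unsigned char *r, unsigned char *g, unsigned char *b) {
    assert(rgb != NULL && r != NULL && g != NULL && b != NULL);
    for (size_t i = 0; i < n_pixels; ++i) {
        r[i] = rgb[3 * i + 0];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}

/* Re-interleave three planes into R0 G0 B0 R1 G1 B1 ... order,
 * e.g. after each plane has been filtered and downsampled. */
void merge_rgb(const unsigned char *r, const unsigned char *g,
               const unsigned char *b, size_t n_pixels, unsigned char *rgb) {
    assert(rgb != NULL && r != NULL && g != NULL && b != NULL);
    for (size_t i = 0; i < n_pixels; ++i) {
        rgb[3 * i + 0] = r[i];
        rgb[3 * i + 1] = g[i];
        rgb[3 * i + 2] = b[i];
    }
}
```

Each plane can then go through the same single-channel processing, and `merge_rgb` restores the interleaved layout at the end.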

The following techniques are useful for optimizing pyramid image generation:

  • Separate the convolution stage into horizontal and vertical one-dimensional passes.
  • Use padding to avoid expensive boundary checks.
  • Split the image into its individual color planes.
  • Change the storage method to improve the vectorization of the loads.
  • Merge kernels to reduce the time spent enqueuing kernels and reduce the execution time of the most expensive kernel.
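The first two techniques in the list can be sketched together. Assuming a 5 x 5 binomial kernel, which is the outer product of the 1D kernel [1 4 6 4 1] with itself, the 2D convolution can be replaced by a horizontal pass followed by a vertical pass, costing 10 multiply-adds per output pixel instead of 25. Copying the input into a buffer with a replicated two-pixel border first means that neither pass needs a boundary check. This is a host-side C illustration under those assumptions, not the guide's OpenCL code; `blur5_separable`, `pad_replicate`, and `PAD` are hypothetical names, and the downsampling step that follows the blur in a pyramid is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define PAD 2 /* radius of the assumed 5-tap kernel */
static const int kTaps[5] = {1, 4, 6, 4, 1};

/* Copy a w x h image into a (w + 2*PAD) x (h + 2*PAD) buffer,
 * replicating edge pixels into the border. */
static void pad_replicate(const unsigned char *src, int w, int h,
                          unsigned char *padded) {
    int pw = w + 2 * PAD;
    for (int y = 0; y < h + 2 * PAD; ++y) {
        int sy = y - PAD;
        if (sy < 0) sy = 0;
        if (sy >= h) sy = h - 1;
        for (int x = 0; x < pw; ++x) {
            int sx = x - PAD;
            if (sx < 0) sx = 0;
            if (sx >= w) sx = w - 1;
            padded[y * pw + x] = src[sy * w + sx];
        }
    }
}

/* Separable 5x5 binomial blur of a w x h single-channel image.
 * Returns 0 on success, -1 if allocation fails. */
int blur5_separable(const unsigned char *src, int w, int h, unsigned char *dst) {
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    int pw = w + 2 * PAD, ph = h + 2 * PAD;
    unsigned char *padded = malloc((size_t)pw * ph);
    int *tmp = malloc(sizeof(int) * (size_t)w * ph); /* horizontal results */
    if (padded == NULL || tmp == NULL) {
        free(padded);
        free(tmp);
        return -1;
    }
    pad_replicate(src, w, h, padded);
    /* Horizontal pass over every padded row: taps never leave the buffer. */
    for (int y = 0; y < ph; ++y)
        for (int x = 0; x < w; ++x) {
            int s = 0;
            for (int k = 0; k < 5; ++k) s += kTaps[k] * padded[y * pw + x + k];
            tmp[y * w + x] = s;
        }
    /* Vertical pass; integer intermediates keep the result exact. */
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int s = 0;
            for (int k = 0; k < 5; ++k) s += kTaps[k] * tmp[(y + k) * w + x];
            dst[y * w + x] = (unsigned char)((s + 128) >> 8); /* /256, rounded */
        }
    free(padded);
    free(tmp);
    return 0;
}
```

Because the intermediate buffer holds unrounded integer sums, the output matches a direct 2D convolution with clamp-to-edge borders exactly; only the amount of work and the absence of branches in the inner loops differ.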