
ARM Guide to OpenCL Optimizing Convolution: The Test Method

From the Embedded Vision Academy's free technical training content.

This chapter describes how to re-use the code from this sample in your application, the limitations of the test method that produces the results in this document, and a method of analyzing the results of each optimization step.

Before re-using the sample kernels

This section describes the steps you must take to ensure that your optimizations are as effective as possible and produce the correct result.

Integer values for the convolution matrix

Instead of using floating-point values in a convolution matrix, it is more memory efficient to use integer values. The integer matrix is combined with a scale factor to ensure that the resulting image has the same brightness as the original.


Figure 9-1: Float equation

Replacing the floating-point coefficients of the float equation with integer coefficients, and dividing the convolution by a scale factor, gives the integer equation. R_H is the set of discrete points covered by the function H.


Figure 9-2: Integer equation

The following figure shows that the scale is equal to the sum of all elements of the convolution matrix H.


Figure 9-3: Scale equation
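Figures 9-1 to 9-3 are images that are not reproduced in this text. The following is a hedged reconstruction of one common way to write them, not a copy of the figures. It uses the assumed names I for the input image, G for the output image, and H_f for the floating-point matrix, with H as the integer matrix and R_H as defined above:

```latex
G(x, y) = \sum_{(i, j) \in R_H} H_f(i, j)\, I(x - i,\, y - j)

G(x, y) = \frac{1}{\mathit{scale}} \sum_{(i, j) \in R_H} H(i, j)\, I(x - i,\, y - j)

\mathit{scale} =
\begin{cases}
  \sum_{(i, j) \in R_H} H(i, j) & \text{if this sum is not } 0 \\
  1 & \text{otherwise}
\end{cases}
```

Here H_f(i, j) = H(i, j) / scale. The fallback value of 1 covers matrices whose elements sum to zero, such as Sobel and Laplacian.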

Note: If the sum of all elements of the convolution matrix is zero, the scale is set to one to avoid a divide by zero.
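The integer equation and the scale rule can be sketched in plain C for a single output pixel. This is a scalar, host-side illustration, not the sample's OpenCL kernel: the names compute_scale, convolve_at, GAUSSIAN_3X3, and LAPLACIAN_3X3 are hypothetical, the image is assumed to be 8-bit greyscale stored row by row, and border handling is omitted.

```c
#include <assert.h>

/* Hypothetical example matrices with common textbook coefficients. */
const signed char GAUSSIAN_3X3[3][3]  = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
const signed char LAPLACIAN_3X3[3][3] = {{0, 1, 0}, {1, -4, 1}, {0, 1, 0}};

/* Scale = sum of all matrix elements. When the elements sum to zero,
 * the scale falls back to 1 so that convolve_at never divides by zero. */
int compute_scale(const signed char h[3][3])
{
    int sum = 0;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            sum += h[r][c];
    return sum != 0 ? sum : 1;
}

/* Integer equation at pixel (x, y): h[j + 1][i + 1] holds H(i, j) for
 * offsets i, j in -1..1, multiplied by I(x - i, y - j).
 * No border handling: (x, y) must not lie on the image edge. */
int convolve_at(const unsigned char *img, int width, int height,
                int x, int y, const signed char h[3][3], int scale)
{
    assert(x >= 1 && x < width - 1 && y >= 1 && y < height - 1);
    assert(scale != 0);
    int acc = 0; /* 32-bit accumulator */
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i)
            acc += h[j + 1][i + 1] * img[(y - j) * width + (x - i)];
    return acc / scale; /* one division by the scale at the end */
}
```

On a flat image where every pixel is 100, the Gaussian matrix (scale 16) returns 100, so the brightness is preserved, while the Laplacian matrix (elements sum to zero, scale 1) returns 0. Results can be negative or exceed 255, so code that stores 8-bit output would typically need to clamp them first.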

Correct data types for the convolution matrix

You must use the correct data types for the convolution matrix to avoid performance loss.

Before implementing the convolution sample, calculate the range of values the convolution can produce and compare it to the limits of the different data types. For example, if every value stays below 32 768, you do not need a 32-bit int to represent the convolution matrix; a 16-bit short is enough. If you use a 16-bit type, the range must stay below 32 768 to avoid integer overflows and incorrect outputs.
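One way to check the range is to bound every intermediate sum by its two extremes. The sketch below assumes 8-bit unsigned input pixels (0 to 255): the largest sum occurs when every positive coefficient sees 255 and every negative coefficient sees 0, and the smallest sum in the opposite case. The helper name accumulator_bits is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 16 when every intermediate sum of the
 * convolution fits in a signed 16-bit short (-32768..32767), else 32.
 * Assumes 8-bit unsigned input pixels (0..255). */
int accumulator_bits(const int *h, int count)
{
    assert(count > 0);
    int32_t pos = 0; /* sum of the positive coefficients */
    int32_t neg = 0; /* sum of the negative coefficients */
    for (int k = 0; k < count; ++k) {
        if (h[k] > 0)
            pos += h[k];
        else
            neg += h[k];
    }
    /* Every partial sum, in any accumulation order, lies in
     * [255 * neg, 255 * pos]. */
    int32_t upper = 255 * pos;
    int32_t lower = 255 * neg;
    if (upper <= INT16_MAX && lower >= INT16_MIN)
        return 16;
    return 32;
}
```

For a 3 x 3 Gaussian matrix (coefficients summing to 16) the worst case is 4080, and for a 3 x 3 Sobel matrix it is plus or minus 1020, so both fit in 16 bits.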

Most common filters, such as Sobel, Gaussian, and Laplacian, use very small coefficients, so you can use 8-bit signed integers to represent the convolution matrix. If you are performing a 3 x 3 convolution, the data type selection has no effect on performance, because 16-bit and 8-bit matrices are both loaded in a single instruction.
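Before switching a matrix to 8-bit storage, it is worth checking that every coefficient actually fits. A minimal sketch, using a hypothetical helper named narrow_to_int8, copies the coefficients into int8_t storage and rejects any value outside -128 to 127. A 3 x 3 matrix then occupies 9 bytes instead of 18 bytes as int16_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: copies a convolution matrix into int8_t storage.
 * Returns false, leaving dst partially written, if any coefficient
 * falls outside the signed 8-bit range -128..127. */
bool narrow_to_int8(const int *h, int8_t *dst, int count)
{
    assert(count > 0);
    for (int k = 0; k < count; ++k) {
        if (h[k] < INT8_MIN || h[k] > INT8_MAX)
            return false;
        dst[k] = (int8_t)h[k];
    }
    return true;
}
```

Sobel, Laplacian, and small Gaussian matrices pass this check; a matrix with a coefficient such as 200 does not and needs a wider type.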

To further reduce the risk of an overflow:

  • Use an int8 variable (an OpenCL vector of eight 32-bit integers) to accumulate the partial convolutions. However, this method introduces more arithmetic instructions.
  • Use short variables and apply the scale to the result of each partial convolution. This method introduces some inaccuracies and extra...
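The trade-off between the two accumulation options in the list can be illustrated with a scalar C sketch. The names accumulate_wide and accumulate_prescaled are hypothetical, each row of a 3 x 3 matrix is treated as one partial convolution (an assumption for illustration), and plain scalars stand in for the OpenCL int8 and short vector types.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: accumulate the partial convolutions in 32 bits and divide
 * by the scale once at the end, so only one truncation occurs. */
int32_t accumulate_wide(const int16_t partial[3], int32_t scale)
{
    assert(scale != 0);
    int32_t acc = 0;
    for (int r = 0; r < 3; ++r)
        acc += partial[r];
    return acc / scale;
}

/* Option 2: divide each partial convolution by the scale before adding,
 * so the running sum stays small enough for a 16-bit short.
 * Each division truncates, so the rounding errors add up. */
int16_t accumulate_prescaled(const int16_t partial[3], int16_t scale)
{
    assert(scale != 0);
    int16_t acc = 0;
    for (int r = 0; r < 3; ++r)
        acc = (int16_t)(acc + partial[r] / scale);
    return acc;
}
```

For the 3 x 3 Gaussian matrix (scale 16) over a white area, the row partial sums are 1020, 2040, and 1020. The first function returns 255, while the second returns 253 because each per-row division truncates.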