ARM Guide to OpenCL Optimizing Pyramid: Optimization Process


This chapter describes an example optimization process for creating image pyramids.

Convolution matrix separability

Convolution with an n x n convolution matrix requires n² multiplications and n² - 1 additions per pixel. However, a property of linear algebra called separability enables this task to be completed more efficiently. A separable matrix can be applied as two one-dimensional passes, each requiring n multiplications and n - 1 additions per pixel, for a total of 2n multiplications and 2(n - 1) additions. For a 5x5 matrix, this is 10 multiplications per pixel instead of 25.
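As a quick check of these counts, the following minimal C sketch returns the per-pixel cost of both approaches, counting both one-dimensional passes in the separable case. The function names are illustrative and are not taken from the guide.

```c
#include <assert.h>

/* Per-pixel cost of a single pass with the full n x n matrix. */
int full_mults(int n) { assert(n > 0); return n * n; }
int full_adds(int n)  { assert(n > 0); return n * n - 1; }

/* Per-pixel cost of two 1D passes of length n (separable case). */
int separable_mults(int n) { assert(n > 0); return 2 * n; }
int separable_adds(int n)  { assert(n > 0); return 2 * (n - 1); }
```

For n = 5, the full matrix costs 25 multiplications and 24 additions per pixel, while the two separated passes together cost 10 multiplications and 8 additions.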

If a matrix is separable, then it can be represented as the outer product of two vectors with dimension n.

The following figure shows the separated parts of the Gaussian 5x5 matrix.


Figure 4-1 Separating the Gaussian matrix
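The outer-product relationship can be sketched in C as follows. The coefficients are an assumption: they are the common binomial approximation [1 4 6 4 1]/16 of a Gaussian, and the figure in the guide may use different values or scaling. The names gauss_1d and outer_product are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N 5

/* Assumed 1D coefficients (binomial approximation of a Gaussian). */
static const float gauss_1d[N] = {
    1.0f / 16, 4.0f / 16, 6.0f / 16, 4.0f / 16, 1.0f / 16
};

/* Fill out[r][c] = col[r] * row[c], the outer product of two length-N vectors. */
void outer_product(const float col[N], const float row[N], float out[N][N])
{
    assert(col != NULL && row != NULL && out != NULL);
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            out[r][c] = col[r] * row[c];
        }
    }
}
```

Calling outer_product(gauss_1d, gauss_1d, m) rebuilds the full 5x5 matrix from its two separated parts. With these assumed coefficients, the centre entry is 36/256 and the entries sum to one.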

A matrix is separable if its rank is one. The rank of a matrix is the maximum number of linearly independent columns of the matrix or the maximum number of linearly independent rows of the matrix.

A set of rows or columns is linearly independent if none of them can be expressed as a linear combination of the others. In a rank-one matrix, every column is a scalar multiple of a single column, so the equation c0 = alpha x c1 holds for any two non-zero columns c0 and c1. The same applies to the rows.

The Gaussian 5x5 matrix has a rank of one and is therefore separable.
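One way to test this property numerically is sketched below, assuming a fixed 5x5 size. A non-zero matrix has rank one exactly when every 2x2 minor that contains a chosen non-zero pivot entry vanishes. The function name is_rank_one and the tolerance parameter are hypothetical and not part of the guide.

```c
#include <assert.h>
#include <stddef.h>

#define N 5

/* Absolute value helper, to avoid depending on the math library. */
static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Returns 1 if m has rank one (within tol), otherwise 0. */
int is_rank_one(const float m[N][N], float tol)
{
    assert(m != NULL && tol >= 0.0f);

    /* Pick the entry with the largest magnitude as the pivot. */
    int p = -1, q = -1;
    float best = 0.0f;
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            if (abs_f(m[r][c]) > best) {
                best = abs_f(m[r][c]);
                p = r;
                q = c;
            }
        }
    }
    if (p < 0) {
        return 0; /* The all-zero matrix has rank zero. */
    }

    /* Every 2x2 minor through the pivot must be zero. */
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            float minor = m[r][c] * m[p][q] - m[r][q] * m[p][c];
            if (abs_f(minor) > tol) {
                return 0;
            }
        }
    }
    return 1;
}
```

When the check succeeds, the pivot row and the pivot column divided by the pivot value give the two vectors whose outer product reproduces the matrix.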

To use a separable convolution matrix efficiently, perform the total convolution by sequentially applying two one-dimensional convolutions using the separate parts. Apply one of the convolutions along the x direction of the image, and store the intermediate results in a temporary buffer. Then apply the second convolution along the y direction of the temporary buffer. The order of the two passes does not affect the result. Apart from floating-point rounding differences, the result is identical to the result that the full unseparated matrix provides, but it requires fewer operations.
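The equivalence of the two approaches can be checked with a small C sketch that runs both on the same image. Everything here is an assumption made for illustration: the coefficients [1 4 6 4 1]/16, the clamp-to-edge border handling, the row-major float image layout, and the function names.

```c
#include <assert.h>

#define RADIUS 2
#define TAPS (2 * RADIUS + 1)

/* Assumed 1D coefficients (binomial approximation of a Gaussian). */
static const float coeffs[TAPS] = {
    1 / 16.0f, 4 / 16.0f, 6 / 16.0f, 4 / 16.0f, 1 / 16.0f
};

/* Read a pixel, clamping coordinates to the image edge (assumed border policy). */
static float get_pixel(const float *img, int width, int height, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= width) x = width - 1;
    if (y < 0) y = 0;
    if (y >= height) y = height - 1;
    return img[y * width + x];
}

/* Two passes: x direction from src into tmp, then y direction from tmp into dst. */
void convolve_separable(const float *src, float *tmp, float *dst,
                        int width, int height)
{
    assert(src && tmp && dst && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int i = -RADIUS; i <= RADIUS; i++) {
                sum += coeffs[i + RADIUS] * get_pixel(src, width, height, x + i, y);
            }
            tmp[y * width + x] = sum;
        }
    }
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int i = -RADIUS; i <= RADIUS; i++) {
                sum += coeffs[i + RADIUS] * get_pixel(tmp, width, height, x, y + i);
            }
            dst[y * width + x] = sum;
        }
    }
}

/* Reference: one pass with the full 5x5 matrix coeffs[i] * coeffs[j]. */
void convolve_full(const float *src, float *dst, int width, int height)
{
    assert(src && dst && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int i = -RADIUS; i <= RADIUS; i++) {
                for (int j = -RADIUS; j <= RADIUS; j++) {
                    sum += coeffs[i + RADIUS] * coeffs[j + RADIUS]
                         * get_pixel(src, width, height, x + j, y + i);
                }
            }
            dst[y * width + x] = sum;
        }
    }
}
```

Running both functions on the same source image and comparing the two outputs element by element, within a small tolerance, confirms that they agree. The temporary buffer must be the same size as the image, because the second pass reads neighbouring rows of the intermediate result.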

The following code shows how this task can be achieved.

// Pseudo code
// Convolution 1D along Y direction
for(int y = 0; y < height; y++)
{
     for(int x = 0; x < width; x++)
     {
          sum = 0.0;

          for(int i = -2; i <= 2; i++)
          {
               // Get value from SOURCE image
               pixel = get_pixel(src, x, y + i);
               sum = sum + coeffs[i + 2]*pixel;...