ARM Guide to OpenCL Optimizing Canny Edge Detection: The Test Method

This chapter describes how to reuse the sample implementation and measure the performance gain of each optimization.

How to test the optimization performance

The results shown in this guide were produced with the following test method:

  1. Measure the time that the kernel takes on the GPU as the difference between the OpenCL profiling timestamps CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, which are queried for the kernel's event.
  2. Evaluate the performance increase that each optimization achieves by measuring the ratio of the execution time without the optimization to the execution time with it. This shows the benefit of each optimization as it is added.
  3. Run the kernel at several image resolutions to see how each optimization affects each resolution. Depending on the use case, the performance at one resolution might matter more than at the others. For example, a real-time webcam feed has different performance requirements from taking a high-resolution photo with a camera.
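Steps 1 and 2 can be sketched in C as follows. This is a minimal sketch, not the guide's own code. It assumes that the command queue was created with CL_QUEUE_PROFILING_ENABLE (without that flag the profiling queries fail) and that the kernel's event has completed before its timestamps are read with clGetEventProfilingInfo(). The helper names kernel_time_ms and speedup_ratio are hypothetical.

```c
#include <stdint.h>

/* Sketch, assuming start_ns and end_ns are the values that
 * clGetEventProfilingInfo() returns for CL_PROFILING_COMMAND_START and
 * CL_PROFILING_COMMAND_END on the kernel's event. Those values are cl_ulong
 * device timestamps in nanoseconds; uint64_t is used here so that the sketch
 * compiles without the OpenCL headers. Helper names are hypothetical. */

/* Step 1: kernel execution time in milliseconds. */
static double kernel_time_ms(uint64_t start_ns, uint64_t end_ns)
{
    return (double)(end_ns - start_ns) / 1.0e6;
}

/* Step 2: ratio of the execution time without the optimization to the
 * execution time with it. A value above 1.0 means the kernel got faster. */
static double speedup_ratio(double unoptimized_ms, double optimized_ms)
{
    return unoptimized_ms / optimized_ms;
}
```

For example, illustrative timestamps of 1,000,000 ns and 9,000,000 ns give kernel_time_ms() = 8.0, and comparing an 8.0 ms unoptimized kernel with a 4.0 ms optimized one gives a ratio of 2.0.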

The resolutions that have been tested are:

  • 1280 x 720.
  • 1920 x 1080.
  • 3840 x 2160.

The results in this guide are the average of 10 runs, which reduces the effect that any individual run has on the results.
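The averaging step can be sketched as follows. This is a minimal C sketch, assuming that the per-run kernel times (in milliseconds) have already been collected for one resolution and one version of the code. The names RUNS_PER_CONFIG, kResolutions, and mean_ms are hypothetical, and the plain arithmetic mean is used because the guide only states that 10 runs are averaged.

```c
#include <stddef.h>

/* Hypothetical constant mirroring the test setup: runs averaged per
 * resolution and per version of the code. */
#define RUNS_PER_CONFIG 10

struct resolution {
    unsigned width;
    unsigned height;
};

/* The three resolutions tested in this guide. */
static const struct resolution kResolutions[] = {
    {1280, 720},
    {1920, 1080},
    {3840, 2160},
};

/* Arithmetic mean of n per-run kernel times in milliseconds.
 * Returns 0.0 for an empty sample set instead of dividing by zero. */
static double mean_ms(const double *samples_ms, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += samples_ms[i];
    return n > 0 ? sum / (double)n : 0.0;
}
```

In a test harness, you would collect RUNS_PER_CONFIG samples for each entry in kResolutions and report mean_ms() of each set, so that one unusually slow or fast run has less influence on the reported figure.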

After you have measured the performance of the code with one optimization, you can add the next. Each time you add an optimization, repeat the test steps and compare the results with those from the code before that optimization was added.
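The incremental comparison can be sketched as follows. This is a minimal C sketch, assuming an array of averaged kernel times where times_ms[0] is the unoptimized code and times_ms[k] is the code after the k-th optimization has been added on top of the earlier ones. The function names step_speedup and cumulative_speedup are hypothetical.

```c
#include <stddef.h>

/* Sketch, assuming times_ms[0] is the averaged kernel time of the
 * unoptimized code and times_ms[k] (k >= 1) is the averaged time after the
 * k-th optimization is added. Function names are hypothetical. */

/* Effect of the k-th optimization alone: previous version vs. current one. */
static double step_speedup(const double *times_ms, size_t k)
{
    return times_ms[k - 1] / times_ms[k];
}

/* Combined effect of optimizations 1..k relative to the unoptimized code. */
static double cumulative_speedup(const double *times_ms, size_t k)
{
    return times_ms[0] / times_ms[k];
}
```

Because each step is compared with the version immediately before it, a step ratio close to 1.0 flags an optimization that adds little at that resolution, even when the cumulative ratio still looks good.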

Mali Offline Compiler

The Mali™ Offline Compiler is a command-line tool that translates OpenCL kernel source code into binaries for execution on Mali GPUs.

You can use the Offline Compiler to produce static analysis output that shows:

  • The number of work and uniform registers that the code uses when it runs.
  • The number of instruction words that are emitted for each pipeline.
  • The number of cycles for the shortest path for each pipeline.
  • The number of cycles for the longest path for each pipeline.
  • The source of the current bottleneck.

To run the Offline Compiler and produce the static analysis output, execute the command mali_clcc -v on the kernel source file.

To get the Offline Compiler, see,