GPU benchmark tool for image processing

As you might have heard, modern graphics cards, normally used for gaming, can now be applied to accelerate general-purpose computations. In this project we compared several optimized Intel OpenCV library functions with their GPU analogs, written using OpenGL and the GLSL shading language. Intel OpenCV uses MMX/SSE instructions for fast image processing on the CPU, while the GPU (Graphics Processing Unit) uses many parallel processors (up to hundreds in modern generations, such as the nVIDIA 8800 GTX). The approach of using the GPU for general computations has become known as “GPGPU” and has formed a corresponding community; see http://www.gpgpu.org/.

A sample benchmark application can be downloaded here. To install, simply unzip the archive into a folder and run install.bat. To start the demo, run gpudemo.exe. We scan our files with antivirus software, so it should be safe. To install updates, unzip their contents into the same folder and allow the file(s) to be overwritten.

Download GPUBenchmarkTool.zip, v 1.2.3, 2.95 Mb - includes all updates
Download update (15-OCT-2007), 0.05 Mb

The following OpenCV functions are considered in this app:
- cvAbsDiff (absolute difference image)
- cvResize (downsampling)
- cvCvtColor (convert to greyscale from RGB)
- cvThreshold (apply a given threshold to obtain a B/W image)
- cvSmooth (median filtering with a given radius)
- cvUpdateMotionHistory (update the spatio-temporal motion history image with a given time depth)
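
To make the per-pixel meaning of some of these operations concrete, here is a minimal pure-Python sketch of four of them. It is not OpenCV's API and says nothing about its speed; the function names, the list-of-rows image layout, and the rounding are assumptions made for illustration. The luma weights and the motion history rule follow the commonly documented behaviour of cvCvtColor and cvUpdateMotionHistory.

```python
# Pure-Python sketch of four per-pixel operations from the list above.
# Images are lists of rows; names and layout are illustrative assumptions.

def abs_diff(a, b):
    """Absolute difference of two grayscale images (cf. cvAbsDiff)."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_gray(rgb):
    """RGB to grayscale with the usual 0.299/0.587/0.114 weights (cf. cvCvtColor)."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b + 0.5) for r, g, b in row]
            for row in rgb]


def threshold(img, t, max_val=255):
    """Binary threshold: max_val where the pixel exceeds t, else 0 (cf. cvThreshold)."""
    return [[max_val if p > t else 0 for p in row] for row in img]


def update_mhi(mhi, silhouette, timestamp, duration):
    """Motion history update (cf. cvUpdateMotionHistory).

    Pixels where the silhouette is non-zero receive the current timestamp;
    pixels whose last motion is older than `duration` are reset to 0;
    all other pixels keep their previous value.
    """
    out = []
    for mrow, srow in zip(mhi, silhouette):
        out.append([
            timestamp if s != 0 else (0 if m < timestamp - duration else m)
            for m, s in zip(mrow, srow)
        ])
    return out
```

A typical chain would be `threshold(abs_diff(prev, cur), t)` to obtain the silhouette, followed by `update_mhi(...)` with the current time and the chosen history depth.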

Note that on the GPU it does not make sense to issue every function call separately, so the computation sequence is divided into three distinct steps, with one shader doing the work for each step. When programming on the CPU one would normally also combine a few operations in the same loop. However, since OpenCV functions are highly optimized, it is faster to call several of them in sequence than to code a single loop yourself.
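
The difference between separate calls and a single fused step can be sketched as follows. This is an illustration only: which operations the demo actually groups into each of its three shaders is not stated here, so the grouping below (grayscale, difference, threshold) and all function names are assumptions. The fused kernel plays the role of one fragment-shader invocation per output pixel.

```python
# Sketch: several full-image passes versus one fused per-pixel kernel.
# The grouping of operations is an assumption, not the demo's actual shader layout.

def luma(px):
    """Grayscale value of one RGB pixel."""
    r, g, b = px
    return int(0.299 * r + 0.587 * g + 0.114 * b + 0.5)


def separate_passes(prev_rgb, cur_rgb, t):
    """One full pass per operation, each producing an intermediate image."""
    gray_prev = [[luma(p) for p in row] for row in prev_rgb]
    gray_cur = [[luma(p) for p in row] for row in cur_rgb]
    diff = [[abs(a - b) for a, b in zip(ra, rb)]
            for ra, rb in zip(gray_prev, gray_cur)]
    return [[255 if d > t else 0 for d in row] for row in diff]


def fused_kernel(prev_px, cur_px, t):
    """Everything a single shader invocation would do for one output pixel."""
    return 255 if abs(luma(prev_px) - luma(cur_px)) > t else 0


def fused_pass(prev_rgb, cur_rgb, t):
    """One pass over the image with no intermediate images."""
    return [[fused_kernel(p, c, t) for p, c in zip(rp, rc)]
            for rp, rc in zip(prev_rgb, cur_rgb)]
```

Both variants produce the same silhouette; the fused form simply avoids writing and re-reading intermediate images, which on a GPU would each cost a separate render pass.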



Timing. The demo shows the time in milliseconds for each OpenCV function call, grouped by the steps that correspond to a single GLSL shader doing the same computations on the GPU. The corresponding GPU timing is on the right side. All numbers are averaged over time, so after you change a parameter in the dialog, wait a few seconds until the numbers “converge”. Background activity on your computer subjects these numbers to slight fluctuations. Seeing the numbers “alive” is a feature: if they are frozen, they are not updating for some reason. The last two rows of the timing table show the total time in milliseconds for the CPU and GPU variants, and the framerate. The GPU column includes data upload and readback time, which is, unfortunately, an obvious bottleneck. It therefore makes sense to perform as much computation on the GPU as possible without transferring data back to the CPU. Newer graphics cards do a better job in terms of data transfer speed, so hopefully this issue will become less critical in the near future. The framerate row may look disappointing, but this is due to overhead on the CPU (data capture or simulation, resolution simulation, displaying large images in windows, OS background activity, antivirus, etc.). The most interesting benchmark data is therefore in the step rows of the table, where you can see how much less time the GPU spends on a step than even optimized CPU code. Try pausing your antivirus software as well to get the best results. Sample results for the nVIDIA 7800 GTX can be found here.
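
Why the numbers need a few seconds to “converge” can be illustrated with a smoothed timer. How the demo actually averages its timings is not described, so the exponential moving average below, the class name SmoothedTimer, and the alpha parameter are assumptions chosen for illustration.

```python
# Sketch of time-averaged timing; the averaging method is an assumption.
import time


class SmoothedTimer:
    """Exponential moving average of per-frame timings in milliseconds."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # weight given to each new sample
        self.value = None    # current smoothed value, None until the first sample

    def update(self, sample_ms):
        """Blend a new sample into the average and return the smoothed value."""
        if self.value is None:
            self.value = sample_ms
        else:
            self.value += self.alpha * (sample_ms - self.value)
        return self.value


def time_call_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

With a small alpha the displayed value moves only a fraction of the way toward each new measurement, so after a parameter change it takes a number of frames before the reading settles near the new level.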

Quick program reference
- System check dialog. Appears after the program starts and verifies that your GPU supports the features essential for general-purpose computations. So far we have successfully tested it on nVIDIA GeForce 6600, 7600 GT, 7800 GTX; ATI x1300, x1650 PRO; ATI Mobility FireGL V5600.
- Input Window. Shows the input video stream. If you don't have a webcam, the stream will be emulated.
- CPU / GPU switch. Chooses whether computations are performed on the CPU or on the GPU. Note that timing is only updated for the current “branch”: if you change parameters, don’t forget to switch to the other branch before you compare any numbers.
- Video Source. Lets you select a video camera (a DirectShow-compatible one) as the source. If you don’t have one, the video sequence will be emulated.
- Video Format Settings button. Lets you choose the resolution and framerate of your camera (the driver window will appear).
- Video Image Settings button. Lets you change capture parameters such as exposure (one of the major parameters in our experience).
- Resolution multiplier. The demo can simulate operating on larger data arrays than your camera is capable of delivering. However, there is a limit on the maximum possible size. An image that is too large may exceed your GPU memory, in which case the output window for the GPU branch will turn black.

- Output Window. Shows the result of the computations (the extracted motion pattern, called a “silhouette”).
- Downsample factor. Both the CPU and the GPU spend some time on image downsampling. Compare which does it faster.
- Threshold. Raising this number makes the system less sensitive both to noise and to the useful signal. The speed of computation does not depend on this parameter, but playing with it gives some sense of what is going on in the computation.
- Median Filter Radius. OpenCV uses an optimized median filtering algorithm with O(1) complexity. The GPU version is much simpler, with O(n*n) complexity, yet it achieves better results over a wide range of reasonable settings.
- Motion History Depth. Another parameter that does not influence speed. Useful only for playing with the picture.
- STEP checkboxes. Let you exclude or include the 2nd and 3rd steps of the computation. Performing less work on the GPU increases the relative share of time spent on GPU data upload and readback, so it is worth experimenting and comparing total times for the CPU and GPU under different step configurations.
- Input/Output image unhide buttons. Reopen the corresponding windows if you accidentally closed them.
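
The O(n*n) remark under Median Filter Radius can be made concrete with a naive median filter. This is a sketch of the simple approach in general, not the demo's GLSL shader: it assumes n is the side of the filter window (2 * radius + 1), and the clamp-to-edge border handling and the function name are likewise assumptions.

```python
# Naive median filter: for every output pixel, gather the whole
# (2*radius+1) x (2*radius+1) window and take the middle value.
# Work per pixel grows with the window area, i.e. O(n*n) for window side n.

def median_filter(img, radius):
    """Median-filter a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)      # clamp to image border
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)  # clamp to image border
                    window.append(img[yy][xx])
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out
```

Every output pixel is computed independently of the others, which is what makes this simple form a natural fit for a per-pixel shader even though it does more work per pixel than an optimized CPU algorithm.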

Enjoy!

We would be interested in your comments and in any benchmark reports you produce with this tool. If you have any, please write to us. If you are going to share this tool or publish papers using results obtained with it, please don't forget to mention our website.

Thanks!

 
 