
Does it matter for performance if I visualize GPGPU processes?

Started by May 14, 2017 03:25 PM
5 comments, last by Alberth 7 years, 6 months ago

I have to program some lame GPGPU, and I'm learning it now. I am using DirectX, but I am not going to use the compute engine. Should I avoid using a window? If I visualize the visual nonsense produced by the GPGPU work, would the GPU adapter heat up more? Would it slow my GPGPU computing in some way? The only thing I'm sure about is that it's not going to break a pixel :D (I am a hobbyist, and a beginner.)

For you, as a hobbyist and a beginner, I would advise not worrying about performance now; it's counter-productive. Learn to write good overall code first. That helps you more in becoming a good programmer who can write clean and (eventually) fast code.

Performance matters in about 5% of your code. For the other 95%, it doesn't matter at all, as long as you don't write bad code. To make it more concrete, in a program of 10,000 lines, 5% is just 500 lines of code. For the remaining 9,500 lines, the important thing is not to mess up. Worrying about the precise speed or making it extra fast is just wasted effort.

To see why: that 95% of the code takes, say, 10% of your run time; the other 90% is spent in those 500 lines (exaggerating a bit to make my point perhaps, but the numbers are not far off). So suppose you write those 9,500 lines very carefully and manage to make them twice as fast. Congratulations: their 10% of the run time has become 5%, and the overall run time has dropped from 100 to 95. That is a lot of effort across all 9,500 lines for a 5% gain.

If instead you manage to cut the run time of the 500 lines that matter by 50%, you save 45% of the total, going from 100 down to 55 overall, with careful writing of just 500 lines.
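
To make the arithmetic explicit, here is the same calculation as a tiny C++ snippet (the 10/90 split is just the illustrative number from above, not a measurement):

#include <cstdio>

int main()
{
    // Illustrative split: total run time 100 units, where the "cold" 95%
    // of the code accounts for 10 units and the "hot" 5% (the 500 lines)
    // accounts for 90 units.
    const double cold = 10.0, hot = 90.0;

    // Scenario 1: make the 9,500 cold lines twice as fast.
    const double afterCold = cold / 2.0 + hot;   // 95 units
    // Scenario 2: make the 500 hot lines twice as fast instead.
    const double afterHot  = cold + hot / 2.0;   // 55 units

    std::printf("optimize the 95%% of the code: 100 -> %.0f units\n", afterCold);
    std::printf("optimize the 5%% of the code : 100 -> %.0f units\n", afterHot);
    return 0;
}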

So it really pays to not bother about the 95% of the code and to concentrate on the 5%. A major problem is that nobody can tell you beforehand where that relevant 5% is going to be hiding in those 10,000 lines. Modern processors are so complicated that we cannot reason about them well enough to pinpoint the problems up front. Instead, the general approach is to write all 10,000 lines like the 95% code: good clean code that is easy to read and understand, with reasonable choices.

Once you have that, you can measure a) whether it is fast enough, and b) where the hot spots in the code are. If a) holds, you're done: it's fast enough and doesn't need to be any faster. This leaves you with as much good clean code as possible, which makes maintenance and future changes simpler.

If "a)" doesn't hold, you can use the results to find "b)", where in the 10,000 lines of code are the problems? Find them, and fix them. This is generally very simple in the beginning, but it gets more complicated very rapidly. Fixes also tend to make a big special case handling for speed-up, which makes that the code is harder to maintain in the future.

When done, do the next round of measuring, and either stop or optimize further.
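
As a minimal sketch of what that measuring step can look like (a real profiler gives you the same information for the whole program without hand instrumentation; the workload below is just a placeholder):

#include <chrono>
#include <cstdio>

// Time a piece of work with std::chrono; wrap a suspected hot spot in it.
template <typename Fn>
double MeasureMilliseconds(Fn&& fn)
{
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    // Placeholder workload; in practice you would wrap your own function here.
    volatile double sink = 0.0;
    const double ms = MeasureMilliseconds([&] {
        for (int i = 0; i < 1000000; ++i) sink = sink + i * 0.5;
    });
    std::printf("workload took %.3f ms\n", ms);
    return 0;
}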

Probably you don't need to worry.
You can use GPU timestamps to measure execution time for individual compute and / or graphics workloads, so you can differentiate between both easily.
(assuming you do both things serially and not async in parallel)

If your graphics are so intense that they heat up the GPU, this might affect compute a little bit, but in real life it's very hard to saturate a GPU that much.

So as a beginner this is the last thing you should worry about; just visualize whatever you want.
You can keep the graphics optional and print the compute execution time to a text file to verify - there should be little to no difference.
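
For reference, a rough sketch of what such a GPU timing could look like with Direct3D 11 timestamp queries (assuming D3D11, which the original post does not specify; the function and variable names are made up, and the same Begin/End bracket works around draw calls instead of a dispatch):

#include <d3d11.h>
#include <cstdio>

// Rough sketch: measure how long one compute dispatch takes on the GPU
// using D3D11 timestamp queries. Error handling is omitted for brevity.
void TimeDispatch(ID3D11Device* device, ID3D11DeviceContext* ctx,
                  UINT groupsX, UINT groupsY, UINT groupsZ)
{
    D3D11_QUERY_DESC qd = {};
    ID3D11Query *disjoint = nullptr, *tsBegin = nullptr, *tsEnd = nullptr;

    qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    device->CreateQuery(&qd, &disjoint);
    qd.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&qd, &tsBegin);
    device->CreateQuery(&qd, &tsEnd);

    ctx->Begin(disjoint);
    ctx->End(tsBegin);                        // timestamp before the work
    ctx->Dispatch(groupsX, groupsY, groupsZ); // the workload to time
    ctx->End(tsEnd);                          // timestamp after the work
    ctx->End(disjoint);

    // Busy-wait for the results; fine for debugging, not for shipping code.
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    while (ctx->GetData(disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    UINT64 t0 = 0, t1 = 0;
    while (ctx->GetData(tsBegin, &t0, sizeof(t0), 0) != S_OK) {}
    while (ctx->GetData(tsEnd, &t1, sizeof(t1), 0) != S_OK) {}

    if (!dj.Disjoint && dj.Frequency != 0)
    {
        const double ms = double(t1 - t0) / double(dj.Frequency) * 1000.0;
        std::fprintf(stdout, "compute work: %.3f ms\n", ms);
    }

    tsEnd->Release(); tsBegin->Release(); disjoint->Release();
}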

Thank you, Alberth!

My question was more related to hardware. I often read in books things like "...the GPU does this or that for free...", so I have the feeling that there are some "free" (otherwise unused) circuits in the hardware that are so well optimized that it hardly matters whether the GPU shows the computation in a window. Say the hardware has a direct wire from the memory the back buffer is written to out to the screen, so displaying it costs next to nothing; it would only briefly block that buffer while some "free" part of the hardware scans it out. Under the hood, maybe the GPU implements it in a way that makes no difference for me. If, once at the start of the program, Windows configures some state inside the hardware so that real-time rendering doesn't copy/offset/scale/etc. but shoots directly onto the screen, I'd consider that a fast OS implementation. If presenting the swap chain involves copying memory specifically for display purposes, I would prefer not to show the computation, or only show it when some key is pressed.

Anyway, thanks for the worked-out answer. It could be very helpful for other users too, because it is a very important topic: coding productivity vs. run-time speed.

Thank you, JoeJ!
I will print it then. At this point, as a beginner, seeing that something is happening on screen gives me a hint that something is actually happening.
Still, in the end I could remove the printing too, but only once everything is working nicely.

seeing that something is happening on screen gives me a hint that something is actually happening.


I use a debug buffer that I pass to every shader. I upload it before the actual work starts and download it afterwards.
Using this I have some communication with the GPU, since there is no debugger or logger or anything.
E.g. shader 1 does an atomic add on dbg[0] from each thread (so I can see I launched the right number of threads),
shader 2 writes some results to dbg[1-5] (so I can see they are as expected and verify the shader works properly),
shader 3 reads some index from dbg[6], and the thread working on that index writes its result to dbg[7] (so I can check one unique piece of work), and so forth...

You can print the first 500 numbers of the debug buffer to the screen or to a file, and you can enable/disable the debugging with the preprocessor.

Something like that should help a lot more than e.g. displaying some memory region as a bitmap on screen.
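
If it helps, here is a rough Direct3D 11 sketch of that kind of debug buffer read-back (assuming D3D11; all buffer, file, and function names are invented for illustration, and the HLSL side is only indicated in a comment):

#include <d3d11.h>
#include <cstdint>
#include <cstdio>

// Rough sketch: the shaders write into an RWStructuredBuffer<uint>, and
// after the dispatch we copy it to a staging buffer and dump it to a file.
//
// HLSL side (compiled separately), for example:
//   RWStructuredBuffer<uint> gDbg : register(u1);
//   [numthreads(64, 1, 1)]
//   void CSMain(uint3 id : SV_DispatchThreadID)
//   {
//   #ifdef DEBUG_BUFFER                // toggle debugging via the preprocessor
//       InterlockedAdd(gDbg[0], 1);    // count how many threads really ran
//       if (id.x == 0) gDbg[1] = 1234; // write a known value to verify the shader ran
//   #endif
//       // ... actual work ...
//   }

static const UINT kDbgCount = 500;      // how many values to dump

void DumpDebugBuffer(ID3D11Device* device, ID3D11DeviceContext* ctx,
                     ID3D11Buffer* dbgBuffer) // the GPU-side buffer behind the UAV
{
    // Create a CPU-readable staging copy with the same size.
    D3D11_BUFFER_DESC desc = {};
    dbgBuffer->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;
    desc.BindFlags = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags = 0;

    ID3D11Buffer* staging = nullptr;
    if (FAILED(device->CreateBuffer(&desc, nullptr, &staging)))
        return;

    ctx->CopyResource(staging, dbgBuffer);   // GPU -> staging

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(ctx->Map(staging, 0, D3D11_MAP_READ, 0, &mapped)))
    {
        const uint32_t* dbg = static_cast<const uint32_t*>(mapped.pData);
        if (FILE* f = std::fopen("gpu_debug.txt", "w"))
        {
            for (UINT i = 0; i < kDbgCount && i < desc.ByteWidth / sizeof(uint32_t); ++i)
                std::fprintf(f, "dbg[%u] = %u\n", i, dbg[i]);
            std::fclose(f);
        }
        ctx->Unmap(staging, 0);
    }
    staging->Release();
}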

My question was more related to hardware. I often read in books things like "...the GPU does this or that for free...", so I have the feeling that there are some "free" (otherwise unused) circuits in the hardware that are so well optimized that it hardly matters whether the GPU shows the computation in a window.
The GPU has a big pipe to the video display, so dumping a chunk of memory to video is likely not very costly.

I have no insight into how GPU hardware works, but it would seem weird if they created dedicated circuitry for such a thing, since it takes chip area that could otherwise have been used for more parallel processing power. As such, there is never really "for free"; it is more likely "we compute it anyway, and making a copy isn't expensive."

However, my story mostly works for hardware use too. Programming is a very complicated activity, and it is very slow. If you can get help from anything, use it. Don't refrain from using the GPU facilities because "it may slow down execution" or "the GPU may become warmer". Hardware is very very very cheap compared to labor costs.

Until you can code things adequately for the general case (i.e. the large 95% bulk), optimization is not relevant. Messing up in the common code is your main enemy as a beginner, as that has a much larger negative impact than any low-level speed optimization you can do.

This topic is closed to new replies.
