SuperVGA said:
But “obviously” the issue can be to make the compute stuff run as stupidly as possible, making for fewer potential issues on the GPU.
You are right. I did not do it that way, partly because of the need for speed but, honestly, partly because of my mental problem with premature optimization. I guess it would have worked without the optimizations that I made not out of need but out of… the mental thing. My TDR time limit is 30 seconds. For example, without the optimizations, one of the shaders would have been 50 times slower.
Usually I first write the shader the dumb way, then the monitor turns black and AMD says "timeout error". I am like - WTF, not even 30 secs is enough? And I start optimizing. (Add some of the mental thing to it…) The same goes for making it fit inside the VRAM.
Inside the debug code, for example, I do it in a very dumb way.
I have many folders with big data.
But I take only one of them and work on it.
I make the GPU parse a small chunk of the data in that folder so that it fits inside the VRAM. I divide the work into N parts, then I call dispatch() from inside a loop in the C++ code. Every iteration parses a small amount of data; otherwise it does not fit inside the VRAM. On that small amount I can run some cheap tests that cover all the dimensions of the dispatch. Some tests are too heavy, and they run on only one dimension/tick. Like that -
#ifdef DEBUG
// …completely unoptimized debugging code goes here…
// a lighter test here (sends a little debug data to the resource)
if (threadID.y > 8 && threadID.y < 16) {
    // a heavier test here (sends more debug data to the resource)
}
if (threadID.z == 12) {
    // sends a lot of debug data to the resource
}
#endif
It is all very tight. Once a month I run it with #define DEBUG commented out, and it takes a lot of time to compute. That is not a problem for me, though. This will be the final product, and most probably the client will run it on his supercomputer, not me. I just need it fast enough that my development process does not stall.
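As a rough sketch of the chunked dispatch loop described above (not the actual project code): the data is split into pieces no larger than an assumed VRAM budget, and one dispatch is issued per piece. `dispatchChunk`, `processInChunks` and `maxChunk` are hypothetical names, and the "dispatch" here just sums the chunk on the CPU so the sketch runs without a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for "upload one chunk and call Dispatch()".
// It only sums the chunk on the CPU so the sketch runs anywhere.
long long dispatchChunk(const int* data, std::size_t count) {
    long long sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += data[i];
    return sum;
}

// Split `data` into pieces of at most `maxChunk` elements (the assumed
// VRAM budget) and issue one dispatch per piece. Optionally reports how
// many dispatches were issued.
long long processInChunks(const std::vector<int>& data,
                          std::size_t maxChunk,
                          std::size_t* dispatchCount = nullptr) {
    assert(maxChunk > 0);
    long long total = 0;
    std::size_t dispatches = 0;
    for (std::size_t offset = 0; offset < data.size(); offset += maxChunk) {
        // The last piece may be smaller than maxChunk.
        const std::size_t count = std::min(maxChunk, data.size() - offset);
        total += dispatchChunk(data.data() + offset, count);
        ++dispatches;
    }
    if (dispatchCount) *dispatchCount = dispatches;
    return total;
}
```

With 10 elements and `maxChunk` set to 4, this issues three dispatches (4, 4 and 2 elements). In the real thing each iteration would bind that slice of the buffer before the dispatch call.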
SuperVGA said:
Alright, I have it the other way around wrt. language, but it's a preferential thing - It also depends on what I'm making. If you don't run a lot of stuff on the CPU anyways, I guess it hardly even matters when it comes to performance.
The CPU sits idle, waiting for the GPU to finish the first passes. Much later, much much later, the CPU will work on the data the GPU produced from folder.00001 while the GPU is working on folder.00002.
SuperVGA said:
If that style was used from the birth of the project, I'm quite sure that would in deed have helped with the productivity.
Not quite from the start. For my first tests I was taking the image of the debug resource, putting it in Krita, then making changes, running the shaders again, putting the new image in Krita too, and hiding/unhiding the layers until I spotted an error by eye. Then I made HTML/JS compare the two resources. Then I started putting in more and more effort. I still use HTML/JS for visualizing the data, when I want to see some graph and not only the message “all tests passed OK!” With big data, sometimes I can say - “it is not pink enough. Something is wrong” haha
Inside the HLSL, in some situations, I don't initialize variables, because I want the shader to crash if one of my “if {} else if” chains is missing the closing "else".
And my code inside C++ is intentionally different from the HLSL version. If I used “while” inside the shader, I use "for" inside C++ (it does matter in my case). The C++ version is explicitly different -
C++:
if (!(n <= m)) {
var = bim(n);
var += bam(var);
} else {
var = bim(m);
var += bam(var);
}
HLSL:
if (n > m) {
var = bim(n);
} else {
var = bim(m);
}
var += bam(var);
That is just a simplified example; the real code is more complicated. It helps to have two different ways of doing the same thing, because it makes me rethink what exactly I am doing. The computer will not do it wrong, and to the computer both versions are the same; the point is to find my own human-made bugs. So I do not copy-paste code from the shader into C++.
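A minimal sketch of that cross-check, under loud assumptions: the real `bim()` and `bam()` are not shown in the post, so the bodies below are made up, and the "shader" version is ported to C++ here only so both can run side by side on the CPU. `referenceVersion`, `shaderStyleVersion` and `countMismatches` are hypothetical names.

```cpp
// Hypothetical stand-ins; the real bim()/bam() are not shown in the post.
int bim(int x) { return x * 2 + 1; }
int bam(int x) { return x / 3; }

// Structured like the C++ reference: negated condition, bam() in each branch.
int referenceVersion(int n, int m) {
    int var;
    if (!(n <= m)) {
        var = bim(n);
        var += bam(var);
    } else {
        var = bim(m);
        var += bam(var);
    }
    return var;
}

// Structured like the shader: direct condition, bam() hoisted after the branch.
int shaderStyleVersion(int n, int m) {
    int var;
    if (n > m) {
        var = bim(n);
    } else {
        var = bim(m);
    }
    var += bam(var);
    return var;
}

// Count the inputs in [lo, hi] x [lo, hi] where the two versions disagree.
int countMismatches(int lo, int hi) {
    int mismatches = 0;
    for (int n = lo; n <= hi; ++n)
        for (int m = lo; m <= hi; ++m)
            if (referenceVersion(n, m) != shaderStyleVersion(n, m))
                ++mismatches;
    return mismatches;
}
```

Any non-zero count would mean one of the two versions carries a human-made bug. In the setup described above, the second result would be read back from the GPU rather than computed on the CPU.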
Then in the tests I am explicit - Is a pixel at the correct place? Does it have the correct value? If a pixel is not there, is it expected to be missing? And so on, and on.
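Those explicit questions could be sketched like this. It is only an illustration: the `Pixel` struct, the `comparePixels` function and the idea of a flat array of optional pixels are assumptions, not the actual test harness.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pixel record: `present` says whether a pixel exists at this
// position, `value` is only meaningful when it does.
struct Pixel {
    bool present;
    int value;
};

// Compare the shader output against the expected data, position by position.
// Returns a description of the first failure, or an empty string if all pass.
std::string comparePixels(const std::vector<Pixel>& expected,
                          const std::vector<Pixel>& actual) {
    if (expected.size() != actual.size()) return "size mismatch";
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const Pixel& e = expected[i];
        const Pixel& a = actual[i];
        // Is the pixel at the correct place?
        if (e.present && !a.present)
            return "pixel " + std::to_string(i) + " is missing";
        // If there is a pixel, was this position expected to be empty?
        if (!e.present && a.present)
            return "pixel " + std::to_string(i) + " should be absent";
        // Does it have the correct value?
        if (e.present && e.value != a.value)
            return "pixel " + std::to_string(i) + " has the wrong value";
    }
    return "";
}
```

Returning the first failure as text keeps the "all tests passed OK!" case trivial to detect (an empty string) while still saying what went wrong otherwise.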
One thing is for sure: compared to before the tests, I have one mental problem less - the anxiety over whether it works correctly. With these tests I sleep much, much better.
And the client will take it, if they take it, the way it is. They have enough money to hire a lot of top-tier programmers to port it to any language/hardware they want. Once I sell it, I am free as a bird.