frob said:
It can conceptually be easier to work with multiple data streams working in parallel, and drivers for Vulkan and modern DX can work with those multiple CPU threads and make them work internally with the drivers, but under the hood, there's still only a single hardware interface.
I guess it's a win if you really have complex rendering going on, so MT helps to translate many API calls into data and commands the GPU can actually use. The bus bottleneck after that remains, of course.
But to me it feels more attractive to minimize the number of materials, pipelines, draw calls etc., and then do bindless and GPU driven rendering. The rendering cost for CPU should be so small no MT is needed for gfx.
Though, i'm not sure this works as well as i hope - still need to learn about this first…
Gnollrunner said:
What I do is keep N number of threads waiting on a queue and I have a dispatcher thread which sends work to the first available thread or waits until there is one that becomes free. I noticed a huge speed up going to this method, from just simply creating and destroying threads every time I needed to do something.
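For reference, the pattern in the quote can be sketched roughly like this. It is a minimal sketch, not Gnollrunner's actual code: the names `WorkerPool`, `submit` and `wait_idle` are made up for illustration, and instead of a separate dispatcher thread it lets a condition variable hand each job to whichever worker is idle first.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: N long-lived workers block on one shared queue,
// so no thread is created or destroyed per job.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Queue a job; one sleeping worker (if any) is woken to take it.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Block until every job submitted so far has finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
            {
                std::lock_guard<std::mutex> lock(m_);
                if (--pending_ == 0) idle_.notify_all();
            }
        }
    }

    std::mutex m_;
    std::condition_variable wake_, idle_;
    std::queue<std::function<void()>> jobs_;
    std::size_t pending_ = 0;   // queued + currently running jobs
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Usage would be something like `WorkerPool pool(4); pool.submit([]{ /* work */ }); pool.wait_idle();`. The threads are created once in the constructor and reused for every job, which is where the speed-up over create/destroy per task comes from.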
When i started work on my current editor / offline tools, i tried to drop my cumbersome job system and use only the newer C++ features for MT instead.
But this was much too slow even for offline needs. It kept launching new threads constantly, and also way too many threads at the same time. : (
The burden with the job system is that i need to write a callback for everything i want to do in parallel, plus usually some struct to hold the context. Very often i'm just too lazy for that, so lots of my stuff remains single threaded.
Having all those callbacks around also makes the code harder to maintain. It sucks and feels old school.
I know it's possible to do better. The physics engine i use can parallelize lambda functions without any callback. I guess it has some extra cost from using function objects, but not sure. I should look how this actually works…
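A rough idea of how such a lambda interface can look, as a sketch only. This is not how that physics engine does it (i have not checked), and `parallel_for` with its parameters is a hypothetical helper. The point is that the body is a template parameter, so the lambda is passed by its own type and the captures replace the hand-written context struct. For brevity it spawns threads per call; a real job system would hand the chunks to persistent workers instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: call body(i) for every i in [begin, end), split into
// contiguous chunks with one thread per chunk.
// Body is deduced from the lambda, so no std::function wrapper is involved.
template <typename Body>
void parallel_for(std::size_t begin, std::size_t end,
                  std::size_t num_threads, const Body& body) {
    assert(num_threads > 0);
    if (begin >= end) return;

    const std::size_t count = end - begin;
    const std::size_t chunks = std::min(num_threads, count);
    const std::size_t chunk = (count + chunks - 1) / chunks;  // round up

    std::vector<std::thread> threads;
    threads.reserve(chunks);
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, end);
        // Each thread owns a disjoint index range [lo, hi).
        threads.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& t : threads) t.join();
}
```

Calling it reads almost like a normal loop: `parallel_for(0, data.size(), 4, [&](std::size_t i) { data[i] *= 3; });`. The body must only touch data belonging to its own index, otherwise the usual synchronization is still needed. Type erasure via `std::function` only becomes necessary once jobs of different types have to be stored in one queue.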
I must say it is still a bit hard to do efficient parallelization even for CPU. Too much extra work and expertise is required for something which should be easily accessible and have native support in languages. Still hoping for future C++ improvements, but it's a long wait.