What components of a game engine can I put on dedicated threads? How best to multithread the renderer?


Started by Key_C0de December 19, 2020 02:40 AM

7 comments, last by Vilem Otte 4 years ago

Key_C0de

Author

December 19, 2020 02:40 AM

First things first: Windows event handling must stay on the main thread.

Can the audio system run on its own separate thread?

Can the physics system run on its own separate thread?

Can AI run on its own separate thread, or be multithreaded?

Can networking run on its own separate thread, or be multithreaded?

For submitting tasks to RenderQueues, loading from files, textures, etc., can I have a thread pool (with as many threads as my system supports) and dynamically dispatch threads from it to handle them? (Each Pass has its own render queue of tasks, and an object has an Effect, which links the object to potentially multiple passes' render queues: the object puts itself there for a later draw call.) This is my number 1 concern, because RenderQueues often contain tasks submitted in a certain order, and a Pass sometimes needs to be drawn before another, isn't that right? So, can I record command lists for each render queue (on separate threads taken dynamically from the thread pool) and later play them all back on the main rendering thread? How best to parallelize that?

I'm on DX11, by the way. I'd really love an evaluation of, and thoughts about, my multithreading strategy, as well as anything else you'd like to add. Thanks.


Vilem Otte


December 19, 2020 04:13 AM

So, the short answer to your questions: yes, yes, yes to both, and yes to both. But I guess you don't want the short answer, otherwise I wouldn't be writing this.

The long answer is: it depends, and different projects will need or want different approaches. At a lower level you can design a more generic job system; I use one in some of my projects. Here is a simplified description:

The base component is a Task, which has flags (None, Repeating, Thread Safe and Frame Sync) and a virtual method that executes it. The flags are almost self-explanatory:

Repeating - task runs repeatedly until it is killed
Thread Safe - task can be executed by worker threads instead of the main thread
Frame Sync - task has to be synchronized every step/frame
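To make the flags concrete, here is a minimal C++ sketch of such a Task base class. The names (TaskFlags, HasFlag, Kill, CounterTask) are hypothetical and not taken from the actual engine; it only shows one way to combine the flags as a bit set next to a virtual Execute().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag set mirroring the flags described above.
enum class TaskFlags : std::uint32_t {
    None       = 0,
    Repeating  = 1u << 0,  // re-queued after each run until killed
    ThreadSafe = 1u << 1,  // may run on a worker thread instead of the main thread
    FrameSync  = 1u << 2,  // must complete before the current frame ends
};

constexpr TaskFlags operator|(TaskFlags a, TaskFlags b) {
    return static_cast<TaskFlags>(static_cast<std::uint32_t>(a) |
                                  static_cast<std::uint32_t>(b));
}

constexpr bool HasFlag(TaskFlags set, TaskFlags flag) {
    return (static_cast<std::uint32_t>(set) & static_cast<std::uint32_t>(flag)) != 0;
}

// Hypothetical base class: concrete tasks override Execute().
class Task {
public:
    explicit Task(TaskFlags flags) : flags_(flags) {}
    virtual ~Task() = default;
    virtual void Execute() = 0;

    TaskFlags Flags() const { return flags_; }
    bool IsKilled() const { return killed_.load(); }
    void Kill() { killed_.store(true); }  // stops a Repeating task from being re-queued

private:
    TaskFlags flags_;
    std::atomic<bool> killed_{false};  // atomic: may be set from another thread
};

// Hypothetical example task: counts how often it has run.
class CounterTask : public Task {
public:
    CounterTask() : Task(TaskFlags::Repeating | TaskFlags::ThreadSafe) {}
    void Execute() override { ++runs; }
    int runs = 0;
};
```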

Then there is the Scheduler, which has a concurrent queue of tasks that have to run on the main thread (this one is double buffered, so you can push repeating tasks back into the queue), plus two other queues: one for background tasks and one for tasks requiring synchronization. It also has a pool of worker threads that execute background tasks. Tasks that run on the main thread are run in its main loop, and once a single iteration is finished, you wait for all frame-sync tasks.
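The scheduler described above could look roughly like the following C++ sketch, assuming std::function jobs instead of Task objects to keep it short. All names (Job, Scheduler, PushMain, PushBackground, RunFrame) are hypothetical. Main-thread jobs are swapped out of a double buffer each frame (repeating ones go back into the other buffer), background jobs go to a fixed worker pool, and RunFrame() blocks until every frame-sync job has completed.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job record: a callable plus two of the flags described above.
struct Job {
    std::function<void()> fn;
    bool repeating = false;
    bool frameSync = false;
};

class Scheduler {
public:
    explicit Scheduler(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            workers_.emplace_back([this] { WorkerLoop(); });
    }
    ~Scheduler() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Main-thread job: lands in the "next" buffer of the double buffer.
    void PushMain(Job j) {
        std::lock_guard<std::mutex> lk(mainM_);
        mainNext_.push_back(std::move(j));
    }

    // Background job: executed by a worker; frame-sync jobs are counted.
    void PushBackground(Job j) {
        {
            std::lock_guard<std::mutex> lk(m_);
            if (j.frameSync) ++pendingSync_;
            background_.push(std::move(j));
        }
        cv_.notify_one();
    }

    // One iteration of the main loop.
    void RunFrame() {
        std::vector<Job> current;
        { std::lock_guard<std::mutex> lk(mainM_); current.swap(mainNext_); }
        for (auto& j : current) {
            j.fn();
            if (j.repeating) PushMain(std::move(j));  // back into the other buffer
        }
        // Wait until every frame-sync background job submitted so far is done.
        std::unique_lock<std::mutex> lk(m_);
        syncDone_.wait(lk, [this] { return pendingSync_ == 0; });
    }

private:
    void WorkerLoop() {
        for (;;) {
            Job j;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !background_.empty(); });
                if (stop_ && background_.empty()) return;
                j = std::move(background_.front());
                background_.pop();
            }
            j.fn();
            if (j.frameSync) {
                std::lock_guard<std::mutex> lk(m_);
                if (--pendingSync_ == 0) syncDone_.notify_all();
            }
        }
    }

    std::mutex mainM_;
    std::vector<Job> mainNext_;
    std::mutex m_;
    std::condition_variable cv_, syncDone_;
    std::queue<Job> background_;
    int pendingSync_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

The workers are created once in the constructor and then only receive work; nothing spawns threads per frame. Repeating background jobs are left out of this sketch.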

…

The point of such a system is simple: you do not need to start any threads at runtime; you have a set of worker threads and only assign work to them. All of this is also abstracted at a higher level into Systems, which can communicate with each other through Events. I won't go into the implementation details on my side right now (that would be material for an article or more), but let's move on.
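Systems talking through Events can be as simple as a publish/subscribe bus. Below is a minimal C++ sketch with hypothetical names (EventBus, Subscribe, Publish) and string payloads for brevity. It delivers events synchronously on the publishing thread; a real engine might instead queue them and deliver them as tasks.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical event bus: systems subscribe to named events and publish to
// them without holding direct references to each other.
class EventBus {
public:
    using Handler = std::function<void(const std::string& payload)>;

    void Subscribe(const std::string& event, Handler h) {
        std::lock_guard<std::mutex> lk(m_);
        handlers_[event].push_back(std::move(h));
    }

    // Copies the handler list under the lock, then calls the handlers without
    // it, so a handler may itself publish or subscribe without deadlocking.
    void Publish(const std::string& event, const std::string& payload) {
        std::vector<Handler> targets;
        {
            std::lock_guard<std::mutex> lk(m_);
            auto it = handlers_.find(event);
            if (it != handlers_.end()) targets = it->second;
        }
        for (auto& h : targets) h(payload);
    }

private:
    std::mutex m_;
    std::unordered_map<std::string, std::vector<Handler>> handlers_;
};
```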

A project on this engine will generally run multiple systems (PhysicsSystem, GameSystem, RenderingSystem(s), NetworkingSystem, etc.), depending on its requirements. Let's look at one in detail:

“PhysicsSystem”

This would be a repeating background task. Whether you need to frame-sync it or not depends on your implementation. It takes the world in one state, performs a physics simulation step, and ends up with the world in another state. You always get a valid step (along with a timestamp, in case you want to interpolate or extrapolate).
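A hedged sketch of the “world in one state, step, world in another state” idea, with a timestamp on each state so the renderer can interpolate. WorldState, StateBuffer and Step are hypothetical names, and the world is reduced to one coordinate per body:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical world state produced by one physics step.
struct WorldState {
    double time = 0.0;             // simulation timestamp of this state
    std::vector<float> positions;  // one coordinate per body, for brevity
};

// Keeps the last two completed steps. The physics task publishes; the
// renderer reads a consistent pair under the lock and interpolates.
class StateBuffer {
public:
    void Publish(WorldState next) {
        std::lock_guard<std::mutex> lk(m_);
        prev_ = std::move(curr_);
        curr_ = std::move(next);
    }

    // Linear interpolation at time t, clamped to [prev.time, curr.time].
    std::vector<float> Sample(double t) const {
        std::lock_guard<std::mutex> lk(m_);
        const double span = curr_.time - prev_.time;
        double a = span > 0.0 ? (t - prev_.time) / span : 1.0;
        a = std::min(1.0, std::max(0.0, a));
        std::vector<float> out(curr_.positions.size());
        for (std::size_t i = 0; i < out.size(); ++i) {
            const float p0 = i < prev_.positions.size() ? prev_.positions[i]
                                                        : curr_.positions[i];
            out[i] = static_cast<float>(p0 + a * (curr_.positions[i] - p0));
        }
        return out;
    }

private:
    mutable std::mutex m_;
    WorldState prev_, curr_;
};

// Hypothetical step: integrate velocities (same size as positions) over dt.
WorldState Step(const WorldState& s, const std::vector<float>& velocities, double dt) {
    WorldState n{s.time + dt, s.positions};
    for (std::size_t i = 0; i < n.positions.size(); ++i)
        n.positions[i] += static_cast<float>(velocities[i] * dt);
    return n;
}
```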

You may go even further: the actual physics integration may not even live in this system. Instead, you could start multiple “PhysicsSolverSystems” that run in parallel as background tasks. Once those are finished, “PhysicsSystem” would update.

But you can just as well do it as a single-threaded repeating task executed on the main thread. That may be enough for your project.
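Splitting the integration across several solver tasks could be sketched like this, assuming independent bodies (real constraint solvers would first need to partition bodies into independent islands). std::async stands in for the engine's worker pool, and ParallelIntegrate is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical: integrate velocities into positions, split across solver tasks.
// Each task writes only its own index range, so no locking is needed; the
// caller (the "PhysicsSystem") continues only after every future completes.
// Precondition: solverCount > 0 and vel.size() == pos.size().
void ParallelIntegrate(std::vector<float>& pos, const std::vector<float>& vel,
                       float dt, std::size_t solverCount) {
    const std::size_t n = pos.size();
    const std::size_t chunk = (n + solverCount - 1) / solverCount;
    std::vector<std::future<void>> solvers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        solvers.push_back(std::async(std::launch::async, [&pos, &vel, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) pos[i] += vel[i] * dt;
        }));
    }
    for (auto& f : solvers) f.get();  // join point: all solver tasks finished
}
```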

…

I think you get my point in answering your 4 questions: it entirely depends on how you want to handle it. You can implement the lower level in a way that makes things a bit simpler for you, but the actual decision and design will be project dependent.

…

Now, briefly, to your rendering question.

You can record your command lists at any point and, I believe, on any thread (in D3D11, each recording thread would use its own deferred context). What matters is when you execute them and in what order; the actual work submission can be done entirely from your main thread. I believe that's the standard approach right now.
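A C++ sketch of “record anywhere, submit in order”. CommandList here is a hypothetical stand-in holding strings; in D3D11 the recording side would be a deferred context per thread ending in FinishCommandList(), and the playback side ExecuteCommandList() on the immediate context. Recording runs in parallel, one slot per pass, and playback walks the slots in pass order:

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a recorded command list.
struct CommandList {
    std::vector<std::string> commands;
};

// Record one list per pass in parallel; each task writes only its own slot,
// and the vector of slots is never resized while tasks run.
std::vector<CommandList> RecordPasses(const std::vector<std::string>& passNames,
                                      const std::vector<std::string>& objects) {
    std::vector<CommandList> lists(passNames.size());
    std::vector<std::future<void>> jobs;
    for (std::size_t p = 0; p < passNames.size(); ++p) {
        jobs.push_back(std::async(std::launch::async, [&, p] {
            for (const auto& obj : objects)
                lists[p].commands.push_back(passNames[p] + ":draw " + obj);
        }));
    }
    for (auto& j : jobs) j.get();
    return lists;
}

// "Main thread" playback: the order of this loop, not the order in which the
// recording tasks happened to finish, decides what the GPU sees first.
std::vector<std::string> Playback(const std::vector<CommandList>& lists) {
    std::vector<std::string> submitted;
    for (const auto& list : lists)  // lists are indexed in pass order
        submitted.insert(submitted.end(), list.commands.begin(), list.commands.end());
    return submitted;
}
```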

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Gnollrunner


December 19, 2020 08:20 AM

I'm by no means an expert on this, but for what it's worth, here's the general strategy I've used in both my libraries. I have the rendering itself in a single thread; in my case, I've found that's all I need. However, I keep a copy of the geometry on the CPU side, which I update using a thread pool. I'm using procedural generation and also have a rather exacting LOD system, so in my case the thread pool can contain many threads. Even with more traditional techniques, though, I think you could probably do something similar.

To add a bit more detail, I usually have one controlling build thread, a dispatcher thread that simply waits until it gets a job from the build thread and passes the job off to one of the pool threads, and of course the pool threads themselves. One thing I try to avoid is having threads poll. I try to make sure the threads are in a waiting state when they have nothing to do, so as not to burn CPU time unnecessarily. In my tests I found that thread pools are well worth the trouble, especially if you are handing them relatively small jobs; creating and destroying threads all the time seems to hurt performance quite a bit.

On the GPU I always have 2 copies of the geometry: one that is in use by the rendering thread, and one that gets built from the CPU geometry when it's ready. When a new copy of the geometry is ready to go, I simply switch over and start the building process again. This has the advantage that the frame rate never really slows down, since the graphics never have to wait for anything to render. On the downside, it uses basically 2X the GPU memory required for meshes.
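The two-copy scheme can be sketched as a double buffer with an atomic front index. This is a simplification with CPU-side vectors standing in for GPU buffers; Mesh and DoubleBufferedMesh are hypothetical names, and it assumes a single builder thread:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical mesh payload (stands in for a GPU vertex buffer).
struct Mesh { std::vector<float> vertices; };

// The renderer reads the front copy while the builder fills the back copy;
// publishing a finished build is a single atomic index flip.
class DoubleBufferedMesh {
public:
    const Mesh& Front() const { return buffers_[front_.load(std::memory_order_acquire)]; }
    Mesh& Back() { return buffers_[1 - front_.load(std::memory_order_relaxed)]; }

    // Call only after the back copy is completely built and nothing is still
    // reading the old front copy (it becomes the next build target).
    void Swap() {
        front_.store(1 - front_.load(std::memory_order_relaxed), std::memory_order_release);
    }

private:
    Mesh buffers_[2];  // this is where the 2X memory cost comes from
    std::atomic<int> front_{0};
};
```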

As for physics, that also has its own thread. For the kinds of things I'm doing, the physics is less demanding than the graphics. In general, I don't want the graphics and LOD systems to slow down the physics: even if the LOD were to lag, the physics should remain responsive. I keep it separate from the rendering system as much as possible.
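One common way (an assumption here, not necessarily what Gnollrunner does) to keep a physics thread responsive regardless of render or LOD lag is a fixed-timestep accumulator: the thread measures elapsed wall time and runs as many constant-size steps as fit. FixedStepClock is a hypothetical name:

```cpp
#include <cassert>

// Fixed-timestep accumulator for a physics thread: it advances in constant dt
// steps no matter how long other systems took. Advance() returns how many
// steps to run now and keeps the leftover time for the next call. Capping at
// maxSteps drops the backlog instead of falling further and further behind.
struct FixedStepClock {
    double dt;
    int maxSteps;
    double accumulator = 0.0;

    int Advance(double elapsed) {
        accumulator += elapsed;
        int steps = 0;
        while (accumulator >= dt && steps < maxSteps) {
            accumulator -= dt;
            ++steps;
        }
        if (steps == maxSteps && accumulator >= dt) accumulator = 0.0;  // drop backlog
        return steps;
    }
};
```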

Not sure how much of this applies to what you're doing, but this is how I handle it. I'm sure a lot depends on exactly what kind of game you are programming.

Key_C0de

Author

December 21, 2020 12:00 PM

@Gnollrunner Strange system. You pass work from a build thread to a dispatcher thread which passes work to the thread pool threads. Never heard it before. Why would you have a thread that is only a mediator and simply passes work elsewhere? That's strange.

I have a thread pool too, which I feed with work from the main thread (in your case you could call the “main” thread the “builder” thread). You keep a copy of the geometry.

Physics on its own thread; got it. I'm on the verge of doing this too. Did you face any serious gotchas here?

Why would the LOD lag anything? LOD is simply a matter of choosing the right offset into the imported geometry of a model, with a couple of simple calculations per mesh.
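The “offset into the imported geometry” approach can be sketched as a per-mesh table of index ranges plus a distance test. LodRange, SelectLod and the threshold values are hypothetical. Selection like this is cheap; in Gnollrunner's case the LOD system presumably regenerates procedural geometry, which is the part that can fall behind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical LOD table entry: all levels live in one index buffer, and each
// level is an (offset, count) range into it.
struct LodRange {
    std::size_t indexOffset;
    std::size_t indexCount;
};

// Pick a level from camera distance: level i is used while
// distance < thresholds[i]; anything farther uses the last (coarsest) level,
// so a mesh with k thresholds needs k + 1 ranges.
std::size_t SelectLod(float distance, const std::vector<float>& thresholds) {
    std::size_t level = 0;
    while (level < thresholds.size() && distance >= thresholds[level]) ++level;
    return level;
}
```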

OK, you keep the rendering in its own thread. As far as I can see, the rendering thread is the most demanding one, and it's wasteful not to leverage D3D11's (or later) multithreading ability: record command lists on separate threads (using your thread pool), place them into a render queue, and once everything is built, have the main thread play back the command lists, sending all work/draw calls to the GPU. That's the thing I'm puzzled about, though. There can be a draw call A for object O1, which belongs to pass P1, that needs to be sent to the GPU before draw call B for object O1, which belongs to pass P2, because this object's Effect requires P1 to precede P2. That's how my passes blend properly to achieve the desired visual effect on the object.
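Besides keeping one queue per pass and submitting the queues in pass order, another common way to guarantee that P1's draws precede P2's is to encode the pass order in the high bits of a sort key and sort all draw items once. A hypothetical C++ sketch (DrawItem, MakeKey and SortForSubmission are illustrative names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw item: pass order in the high 32 bits of the key, so one
// sort puts every P1 draw before every P2 draw; the low bits can group draws
// by material/state within a pass.
struct DrawItem {
    std::uint64_t key;
    int objectId;
};

constexpr std::uint64_t MakeKey(std::uint32_t passOrder, std::uint32_t materialId) {
    return (static_cast<std::uint64_t>(passOrder) << 32) | materialId;
}

// Items can be appended from many threads (e.g. into per-thread vectors that
// are merged afterwards); only this final sort has to happen in one place.
void SortForSubmission(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) { return a.key < b.key; });
}
```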

As for miscellaneous loading (files, whatever from disk, basically everything required by the rendering tasks), I also want to dispatch it to a thread pool, since loading meshes/textures from files can take time. (I haven't done this yet, but I will; I figured it's a good candidate for multithreading, but I wanted opinions first.)
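A minimal sketch of background loading with futures, assuming a hypothetical LoadAllAsync helper and an injected loader function (a real one would read and decode the file). std::async is used for brevity; an engine would more likely hand these jobs to its existing pool. As far as I know, D3D11 resource creation through ID3D11Device is thread-safe, so textures and buffers can be created on loader threads, while the immediate context must stay on one thread.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

using Bytes = std::vector<char>;
// Hypothetical loader signature: path in, raw bytes out.
using Loader = std::function<Bytes(const std::string& path)>;

// Starts every load in the background and hands back futures; the render
// thread calls get() on a result only when it actually needs it. std::async
// copies both the loader and the path into each task.
std::vector<std::future<Bytes>> LoadAllAsync(const std::vector<std::string>& paths,
                                             const Loader& load) {
    std::vector<std::future<Bytes>> pending;
    for (const auto& p : paths)
        pending.push_back(std::async(std::launch::async, load, p));
    return pending;
}
```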


Key_C0de

Author

December 21, 2020 12:09 PM

@Vilem Otte Your thread dispatching depending on the granularity of work is interesting. Your multiple threads for physics integrations seem overkill to me, but you may be working on a big game, maybe even AAA quality. If you could elaborate a bit on those integrator threads, or share a link for further study, that would be nice.

“what matters is, when you execute them and in what order” Well, yeah, that was my point of confusion; that's what I was asking advice for. I suppose I may be overcomplicating it myself, though. I have a separate render queue for each pass, so I simply submit the passes that must run before others to the GPU first, since in the end all work (draw call) submission to the GPU is done from the main thread (a DX11 “limitation”). I'm not sure where I was confused here, but discussing things with others certainly helps me too.


Gnollrunner


December 21, 2020 02:00 PM

@Key_C0de

Key_C0de said:
@Gnollrunner Strange system. You pass work from a build thread to a dispatcher thread which passes work to the thread pool threads. Never heard it before. Why would you have a thread that is only a mediator and simply passes work elsewhere? That's strange.

As I said, I am trying to avoid polling. I tried it without the dispatcher, but it would lock up on rare occasions. For instance, take the degenerate case where you have one thread in your pool. That thread checks for a new job, finds none, and then goes on to wait on a condition variable. However, in the meantime, before it gets to the wait, a job comes in, and since the thread isn't waiting yet, it misses the notify. Now the thread is just waiting for a message that it will never get. It's basically a race condition (a “lost wakeup”).

There may be ways to fix this without the dispatcher and still without polling, but threading can be tricky, and I want to avoid any lockup situations. Having the dispatcher simplifies things, since it can lock the main queue while the dispatcher is running, so that no new jobs can come in. Hence when one does come in, it will definitely see the notify. It also gets a notify whenever a pool thread ends, so it always knows when it can send a job to the pool and when all pool threads are in use. Since all it does is move some pointers around, its loop is pretty quick, so it won't lock the job queue for very long, and therefore won't block the main thread for very long either.
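The lost-wakeup race can also be avoided without a dispatcher, as long as the “is there work?” check and the wait happen under the same mutex the producer holds while pushing. A hypothetical C++ BlockingQueue showing that pattern with std::condition_variable::wait(lock, predicate):

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical queue. The race disappears because (1) the producer modifies
// the queue while holding the mutex, and (2) the consumer evaluates "is there
// work?" and goes to sleep under that same mutex. wait(lock, pred) releases
// the mutex atomically as it blocks and re-checks pred after every wakeup
// (including spurious ones), so a push cannot slip in between check and wait.
template <typename T>
class BlockingQueue {
public:
    void Push(T value) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(std::move(value));
        }
        // Notifying after unlocking is fine: the state change is already
        // visible to any consumer that checks the predicate.
        cv_.notify_one();
    }

    T Pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });  // sleeps instead of polling
        T v = std::move(q_.front());
        q_.pop_front();
        return v;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
};
```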

Key_C0de

Author

December 21, 2020 02:56 PM

@Gnollrunner Hmm, that looks like a nasty corner case, or it could be a sign of a badly designed thread pool. I'll keep it in mind.


Vilem Otte


December 22, 2020 12:56 AM

Key_C0de said:
Your thread dispatching depending on the granularity of work is interesting. Your multiple threads for physics integrations seem overkill to me, but you may be working on a big game, maybe even AAA quality.

An example of when that happens (not directly physics related, though it could be used there too): each frame I need to build an acceleration structure for rendering (I'm using multi-level BVHs). So for each ray-traced frame there is a background task, let's call it BVHUpdaterTask, which loops through all dynamic objects in the scene (those whose geometry was updated) and, for each of them, spawns a new BVHBuilderTask that builds a single BVH for that object (technically the bottom-level BVH for that object). In short, this task determines which BVH algorithm the object uses, builds the BVH with it, flattens it into a linear BVH for the rendering representation, and then signals that it has finished.

Once all these tasks are done, another BVHBuilderTask is started, which this time builds the top level of the BVH. Once that is done, the BVHUpdaterTask is finished, and we know we can continue with ray tracing.
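The fork-join shape of BVHUpdaterTask can be sketched as follows. The builds themselves are deliberately reduced to bounding-box computations (a real bottom-level or top-level BVH builder is far more involved), std::async stands in for the task system, and BuildBottomLevel, BuildTopLevel and UpdateBvh are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical 1-D bounding box, standing in for a whole BVH.
struct Aabb { float min, max; };

// Stand-in for a per-object (bottom-level) build. Assumes >= 1 vertex.
Aabb BuildBottomLevel(const std::vector<float>& vertices) {
    auto mm = std::minmax_element(vertices.begin(), vertices.end());
    return {*mm.first, *mm.second};
}

// Stand-in for the top-level build over all bottom levels. Assumes >= 1 entry.
Aabb BuildTopLevel(const std::vector<Aabb>& blas) {
    Aabb top = blas.front();
    for (const auto& b : blas) {
        top.min = std::min(top.min, b.min);
        top.max = std::max(top.max, b.max);
    }
    return top;
}

// "BVHUpdaterTask": one builder task per dynamic object, wait for all of them,
// then build the top level; only after that can ray tracing for the frame start.
Aabb UpdateBvh(const std::vector<std::vector<float>>& dynamicObjects) {
    std::vector<std::future<Aabb>> builders;
    for (const auto& obj : dynamicObjects)
        builders.push_back(std::async(std::launch::async, BuildBottomLevel, std::cref(obj)));
    std::vector<Aabb> blas;
    for (auto& f : builders) blas.push_back(f.get());  // join all bottom-level builds
    return BuildTopLevel(blas);
}
```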

…

This is probably one of the smallest examples of how I used this system.

The game is, sadly, still quite incomplete (I'm currently working on the animation/skinning system … which will then have to be properly used in the ray tracer), and this year took a severe hit on my timetable for obvious reasons (I'm basically months behind). I've finally been able to find at least some time on weekends to do some actual work on it.



This topic is closed to new replies.
