Understanding data flow in a multi-threaded render pipeline

Started by February 22, 2018 07:42 PM
6 comments, last by getoutofmycar 6 years, 11 months ago

I'm having some difficulty understanding how data would flow or get inserted into a multi-threaded OpenGL renderer that has a thread pool, a render thread, and an update thread (possibly main). My understanding is that the thread pool continually executes jobs and assembles the results, then sends them off to be rendered, where I can sort them further and achieve some cheap form of statelessness. I don't want anything overly complicated or too fine-grained (fibers, job stealing, etc.). My end goal is simply to have my renderer isolated in its own thread, concerned only with drawing and swapping buffers.

My questions are:

1. At what point in this pipeline are resources created?

Say I have a


class CCommandList
{
   void SetVertexBuffer(...);
   void SetIndexBuffer(...);
   void SetVertexShader(...);
   void SetPixelShader(...);
}

borrowed from an existing post here. I would need to generate a VAO at some point and call glGenBuffers etc., especially if I start with an empty scene. If my context lives on another thread, how do I call these commands if the command list is only supposed to be a collection of state and which command to use? I don't think the render thread should do this and somehow add a task to the queue, or am I wrong?

Or could I do some variation where I do the loading in a thread with a shared context and from there generate a command that has the handle to the resources needed?

 

2. How do I know all my jobs are done?

I'm working with C++. Is this as simple as knowing how many objects there are in the scene, incrementing a counter for every task that gets added, and signalling the renderer that the command list is ready once the counter matches that count? I was thinking a condition_variable or something similar would suffice to alert the render thread that work is ready.

 

3. Does all work come from a singular queue that the thread pool constantly cycles over?

With the notion of jobs, we are basically sending the same work repeatedly, right? Do all jobs need to be added to a single persistent queue to be submitted over and over again?

 

4. Are resources destroyed with commands?

As with initialization, and assuming #3 is correct, removing an item from the scene would mean removing it from the job queue, no? Would I need to send a one-time command to the renderer to clean up?

First off, welcome to GameDev.

A lot of your questions have super open ended answers and they all depend on how your engine works and how you designed it to work.

A good read about the flow of how a modern 3D engine works is found in these blog posts about the Autodesk Stingray Engine (now discontinued I believe): http://bitsquid.blogspot.com/2017/02/stingray-renderer-walkthrough.html

It's a great blog in general; reading over the older posts wouldn't hurt either.

Another good read, though it's mostly for Direct3D 11/12 and Vulkan (but the concepts are sound and would work for OpenGL) if I recall correctly is: 

As this sounds like your first go at a multithreaded engine, you're quite likely going to make design mistakes and refactor or restart several times, which is fine. It's part of learning, and the best way to learn is from one's own mistakes.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

Thanks for the welcome, Mike! I agree they are open ended, as you deduced. I have a very vague idea of what to do, but I'm new to the threading bits, so I wouldn't mind blindly following anyone's working approach until it sits with me and I can experiment. I will give those articles a read. I've also read the excellent high-level ones over on the molecular blog, but the inner workings are lost on me.

Also forgot about these videos:

And:

In case you had any free time left ;-)

Thanks. I've actually watched these and more, Naughty Dog's one and others as well, but they are all still high-level overviews that eventually wind down to stuff about atomics, actors, and how to scale. None really delve into the intrinsics that I ask about, though they will no doubt be useful once I can get this running properly. Stuff like bgfx seems like what I would want, but it is a bit too dense to pick apart with all the backend stuff and whatnot.

Maybe my skull is extra thick, but none of my questions are answered in these. As I said before I do have a general idea of what to do, I'm more looking for actual implementation details mostly around how to handle resources, even pseudocode would be fine. I'm still reading the articles you linked above and will finish them tonight, hopefully I can glean anything from them to translate to code.

The easiest way to get started is with a single "main thread" and an unstructured job system. The API for that might look as simple as:


void PushJob( std::function<void()> );
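For anyone wondering what could sit behind that one function, here is a minimal sketch, assuming a fixed number of worker threads that all drain one shared queue. The class name JobSystem and everything inside it are made up for illustration and are not from any particular engine.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal job system: a fixed set of workers draining one shared queue.
class JobSystem
{
public:
  explicit JobSystem( std::size_t numWorkers )
  {
    assert( numWorkers > 0 );
    for( std::size_t i = 0; i != numWorkers; ++i )
      workers.emplace_back( [this]{ WorkerLoop(); } );
  }

  // Lets the workers finish every job that is already queued, then joins them.
  ~JobSystem()
  {
    {
      std::lock_guard<std::mutex> lock( mutex );
      quit = true;
    }
    wake.notify_all();
    for( std::thread& w : workers )
      w.join();
  }

  // Callable from any thread.
  void PushJob( std::function<void()> job )
  {
    {
      std::lock_guard<std::mutex> lock( mutex );
      jobs.push( std::move(job) );
    }
    wake.notify_one();
  }

private:
  void WorkerLoop()
  {
    for(;;)
    {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock( mutex );
        // sleep until there is work or shutdown was requested
        wake.wait( lock, [this]{ return quit || !jobs.empty(); } );
        if( jobs.empty() )
          return; // quit was requested and nothing is left to run
        job = std::move( jobs.front() );
        jobs.pop();
      }
      job(); // run outside the lock so other workers can keep popping
    }
  }

  std::mutex mutex;
  std::condition_variable wake;
  std::queue<std::function<void()>> jobs;
  bool quit = false;
  std::vector<std::thread> workers;
};
```

The workers sleep on a condition variable while the queue is empty, so an idle pool costs nothing. A free function PushJob like the one above could simply forward to one global instance of this class.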

From your main thread, whenever you have a data-parallel workload, you can then farm it off to some worker threads:


//original
vector<Object*> visible;
for( size_t i=0; i!=objects.size(); ++i )
  if( IsVisible(camera, objects[i]) )
    visible.push_back(objects[i]);

//jobified for a split into 4 jobs:
#define NUM_THREADS 4
vector<Object*> visiblePerThread[NUM_THREADS];
Atomic32 jobsComplete[NUM_THREADS] = {0};
int numObjects = (int)objects.size();
for( int t=0; t!=NUM_THREADS; ++t )
{
  vector<Object*>& visible = visiblePerThread[t];
  Atomic32& jobComplete = jobsComplete[t];
  int minimumWorkPerJob = 64;//don't bother splitting up workloads smaller than some amount
  //round the division up so the last few objects aren't dropped when numObjects isn't a multiple of NUM_THREADS
  int workPerThread = max( minimumWorkPerJob, (numObjects + NUM_THREADS - 1)/NUM_THREADS );
  //calculate a range of the data set for each thread to consume
  int start = workPerThread * t;
  int end = workPerThread * (t+1);
  start = min(start,numObjects);
  end = min(end,numObjects);
  if( start == end )//if nothing for this thread to do, just mark as complete instead of launching
    jobComplete = 1;
  else//push this functor into the job queue
    PushJob([&objects, &camera, &visible, &jobComplete, start, end]()
    {
      for( int i=start; i!=end; ++i )
        if( IsVisible(camera, objects[i]) )
          visible.push_back(objects[i]);
      jobComplete = 1;//must be the last thing the job does
    });
}
//at some point before "visible" is to be used:
//use one thread to join all the results into a single list
for( int t=0; t!=NUM_THREADS; ++t )
{
  //block until the job is complete (job 0 included, as its list is the one we append to)
  BusyWaitUntil( [&](){ return jobsComplete[t] == 1; } );
  //append result set [t] onto result set [0]
  if( t != 0 )
    visiblePerThread[0].insert(visiblePerThread[0].end(), visiblePerThread[t].begin(), visiblePerThread[t].end());
}
vector<Object*>& visible = visiblePerThread[0];

This simple job API is easy to use from anywhere, but it puts some extra strain on its users. For example, above, the user of the job API needs to (re)invent their own way of figuring out that a job has finished each time. In this example, the user makes an atomic integer for each job, which gets set to 1 when the job is finished. The main thread can then busy wait (very bad!) until these integers change from 0 to 1.

In a fancier job system, PushJob would return some kind of handle, which the main thread could pass into a "WaitUntilJobIsComplete" type function.
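As a rough sketch of that handle idea, assuming the standard library's std::future is good enough to play the role of the handle: JobHandle is a made-up alias, and std::async is only a stand-in for a real job queue so the example runs on its own.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <utility>

// Hypothetical handle type: a future already carries "has it finished yet?".
using JobHandle = std::future<void>;

// Stand-in for a real job queue: std::async runs the job on another thread
// and returns a future that becomes ready when the job returns.
JobHandle PushJob( std::function<void()> job )
{
  assert( job ); // an empty std::function would throw when invoked
  return std::async( std::launch::async, std::move(job) );
}

// Blocks (sleeping, not spinning) until the job has finished.
void WaitUntilJobIsComplete( JobHandle& handle )
{
  handle.wait();
}
```

Usage would be something like `JobHandle h = PushJob( []{ /* work */ } );` followed later by `WaitUntilJobIsComplete( h );`. Compared with the atomic flags above, the caller no longer has to invent a completion flag per job, and the wait does not burn a core.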

These are the basics of how game engines spread their workloads across any number of threads these days. Once you're comfortable with these basic job systems, the very fancy ones use pre-declared job structures and pre-scheduled graphs, rather than the on-demand, ad-hoc "push anything, anytime" structure above.

The other paradigm is having multiple "main" threads -- e.g. a game update thread and a rendering thread. This is basically just message passing with one very big message -- the game state required by the renderer.
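To make the "one very big message" idea concrete, here is a minimal sketch, assuming a one-slot mailbox guarded by a mutex and a condition_variable. FrameState, FrameMailbox and RunFrames are hypothetical names, and the frame contents are placeholders for whatever the renderer actually needs.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical snapshot of everything the renderer needs for one frame.
struct FrameState
{
  int frameNumber = 0;
  std::vector<int> visibleObjectIds;
};

// One-slot mailbox: the update thread publishes a frame, the render thread takes it.
class FrameMailbox
{
public:
  // Called by the update thread. Blocks while the previous frame is still unclaimed,
  // so the update thread can never get more than one frame ahead.
  void Publish( FrameState frame )
  {
    std::unique_lock<std::mutex> lock( mutex );
    changed.wait( lock, [this]{ return !full; } );
    slot = std::move( frame );
    full = true;
    changed.notify_all();
  }

  // Called by the render thread. Blocks until a frame is available.
  FrameState Take()
  {
    std::unique_lock<std::mutex> lock( mutex );
    changed.wait( lock, [this]{ return full; } );
    FrameState frame = std::move( slot );
    full = false;
    changed.notify_all();
    return frame;
  }

private:
  std::mutex mutex;
  std::condition_variable changed;
  FrameState slot;
  bool full = false;
};

// Example driver: simulates frameCount frames and returns the frame numbers
// in the order the render thread received them.
std::vector<int> RunFrames( int frameCount )
{
  assert( frameCount >= 0 );
  FrameMailbox mailbox;
  std::vector<int> rendered;
  std::thread renderThread( [&]{
    for( int i = 0; i != frameCount; ++i )
      rendered.push_back( mailbox.Take().frameNumber ); // a real renderer would draw and swap here
  } );
  for( int i = 0; i != frameCount; ++i )
  {
    FrameState frame;
    frame.frameNumber = i; // a real update thread would fill in the culled, sorted draw list
    mailbox.Publish( std::move(frame) );
  }
  renderThread.join();
  return rendered;
}
```

Because the frame is moved into the slot and moved out again, the two threads never touch the same FrameState at the same time. A deeper queue would let the update thread run further ahead at the cost of latency.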

Going off this bare-bones/basic job system, to answer your questions --
1. Probably on the main thread, maybe on a job if it's parallelisable work.
2. Above I used simple atomic flags. If you're using a nice threading API, then semaphores might be a safer choice.
3. Yes.
4. Objects are not persistent in the job queue -- the main thread pushes transient algorithms into the job queue, so there's nothing to remove. The logic is the same as a single-threaded game.

With a fancier job system, the answers could/would be different :)
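For question 2 in particular, the counter plus condition_variable idea from the original post can be sketched roughly as follows. JobCounter and SumInParallel are hypothetical names, and plain std::thread objects stand in for pool workers to keep the example self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical completion counter: starts at the number of jobs,
// and each job decrements it as its very last step.
class JobCounter
{
public:
  explicit JobCounter( int jobCount ) : remaining( jobCount ) { assert( jobCount >= 0 ); }

  void JobDone()
  {
    std::lock_guard<std::mutex> lock( mutex );
    assert( remaining > 0 );
    if( --remaining == 0 )
      allDone.notify_all();
  }

  // Called by the thread that needs the results. Sleeps instead of spinning.
  void WaitForAll()
  {
    std::unique_lock<std::mutex> lock( mutex );
    allDone.wait( lock, [this]{ return remaining == 0; } );
  }

private:
  std::mutex mutex;
  std::condition_variable allDone;
  int remaining;
};

// Example: sums values using jobCount threads standing in for pool workers.
long SumInParallel( const std::vector<int>& values, int jobCount )
{
  assert( jobCount > 0 );
  std::vector<long> partial( static_cast<std::size_t>(jobCount), 0L );
  JobCounter counter( jobCount );
  std::vector<std::thread> threads;
  const std::size_t n = values.size();
  for( int j = 0; j != jobCount; ++j )
  {
    // each job owns one slot of partial, so the sums themselves need no locking
    threads.emplace_back( [&, j]{
      for( std::size_t i = n * j / jobCount; i != n * (j + 1) / jobCount; ++i )
        partial[j] += values[i];
      counter.JobDone();
    } );
  }
  counter.WaitForAll(); // the point where a renderer could be told "the list is ready"
  long total = 0;
  for( long p : partial )
    total += p;
  for( std::thread& t : threads )
    t.join();
  return total;
}
```

The count here is the number of jobs pushed, not the number of objects in the scene, since one job usually covers a whole range of objects.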

17 hours ago, getoutofmycar said:

I would need to generate a VAO at some point and call glGenBuffers etc especially if I start with an empty scene.

GL is the worst API for this. In D3D11/D3D12/Vulkan, resource creation is free-threaded (creation of textures/buffers/shaders/states can be done from any thread), and in D3D12/Vulkan you can also use many threads to record actual state-setting/drawing commands (D3D11 can too, but you get no performance boost from it, generally). It's probably worthwhile to do all your GL calls on a single "main rendering thread", rather than trying to make your own multi-threaded-command-queue wrapper over GL.

Nonetheless, the entire non-GL / non-D3D part of your renderer can still be multi-threaded. That involves preparing drawable objects, traversing the scene, culling/visibility, sorting objects, etc. In a D3D12/Vulkan version, you can also multi-thread the actual submission of drawable objects to the API.
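One possible way to keep every GL call on the rendering thread while still letting other threads request resources is sketched below. It assumes the render thread drains a small queue of closures once per frame. RenderCommandQueue and CreateBufferAsync are hypothetical names, and FakeCreateGpuBuffer is a stand-in for the real glGenBuffers/glBufferData calls so that the example runs without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue of work that may only run on the thread owning the GL context.
class RenderCommandQueue
{
public:
  // Callable from any thread.
  void Post( std::function<void()> command )
  {
    std::lock_guard<std::mutex> lock( mutex );
    pending.push_back( std::move(command) );
  }

  // Called by the render thread once per frame, before drawing.
  void Drain()
  {
    std::vector<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> lock( mutex );
      batch.swap( pending ); // hold the lock only long enough to grab the batch
    }
    for( auto& command : batch )
      command(); // in a real renderer the GL context is current here
  }

private:
  std::mutex mutex;
  std::vector<std::function<void()>> pending;
};

// Stand-in for glGenBuffers + glBufferData (an assumption, not real GL).
unsigned FakeCreateGpuBuffer( std::size_t bytes )
{
  assert( bytes > 0 );
  static unsigned nextHandle = 1; // only ever touched from the render thread
  return nextHandle++;
}

// Callable from any thread: asks the render thread to create a buffer and
// returns a future that will hold the resulting handle.
std::future<unsigned> CreateBufferAsync( RenderCommandQueue& queue, std::size_t bytes )
{
  // shared_ptr because std::function requires a copyable callable
  auto promise = std::make_shared<std::promise<unsigned>>();
  std::future<unsigned> handle = promise->get_future();
  queue.Post( [promise, bytes]{ promise->set_value( FakeCreateGpuBuffer(bytes) ); } );
  return handle;
}
```

The loading code can hold on to the future and only build draw commands for the object once the handle is ready. Destruction could go through the same queue as a one-time posted command, which would cover question 4 as well.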

Thank you, this is some great affirmation of my shaky logic. Your talk was actually what inspired me to try my hand at this, as it was the most approachable for a threading newbie. I think one of my misconceptions was about how I would be blocking the rendering thread when doing any allocation. I'm going to finish up my simple spec and tackle this over the weekend.

