Tips about sync between several threads

Started by June 02, 2017 10:59 AM
4 comments, last by Tonyx97 7 years, 8 months ago

Hello guys, I've reached the point where I have to care about synchronization between threads. I ran into a big problem a long time ago, and now it has become really annoying. My project has 4 threads, one of which (audio) doesn't need sync. The other threads are: Scripting (LUA), Physics (Bullet 2.83.7) and Rendering.

I tried adding some ghetto bools to see whether the threads sync better. It works, but I don't want to do it this way since it's not good, and it also doesn't work properly when the physics thread has a lot of entities to process. You can see the main problem here. As you can see, when there are no entities the character moves fairly smoothly, but when there are tons of entities it's very laggy and annoying.

There are multithreading examples in Bullet, but I don't know if they're enough to cover my problem, since the scripting thread also changes entities' positions, velocities etc. I tried using the following simple code to sync the threads, but I don't think it's enough, nor good practice.


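```cpp
#include <atomic>
#include <thread>

// One thread blocks in wait_continue() until another calls set_continue().
// m_gen is a generation counter that set_continue() advances on each release.
// Note: a set_continue() that runs before wait_continue() reads m_gen is missed.
class CBarrier
{
public:
	CBarrier(const CBarrier&) = delete;
	CBarrier& operator=(const CBarrier&) = delete;
	explicit CBarrier() : m_continue(true), m_gen(0) {}

	void wait_continue()
	{
		unsigned int gen = m_gen.load();

		m_continue = false;

		// Busy-wait: calls into the OS scheduler on every iteration.
		while (m_gen == gen && !m_continue)
		{
			std::this_thread::yield();
		}
	}

	void set_continue()
	{
		unsigned int gen = m_gen.load();

		// compare_exchange_weak may fail spuriously and silently drop the release;
		// the strong variant only fails if m_gen really changed in between.
		if (m_gen.compare_exchange_strong(gen, gen + 1))
		{
			m_continue = true;
		}
	}

private:
	std::atomic<bool> m_continue;
	std::atomic<unsigned int> m_gen;
};
```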

The class is very simple to use, so you can probably guess how it works. I've heard that the rendering thread should wait until the physics thread is done (like Unity does). I need some tips on how to sync everything properly: what should I use? WinAPI critical sections, mutexes, atomics, a mix of these...? I'm a bit confused now. Maybe someone knows about this and can recommend what I should use to achieve my goal, perhaps some way of organizing the threads, the correct flow of the game etc. Thanks for reading :)

Edit: I've been reading about Bullet physics timesteps and fixed timesteps; could that be the problem? Also, entities that are processed only by the physics thread, without me changing anything about them (only gravity and collisions), move smoothly, but the problem appears as soon as I move my character from the scripting thread. Should I really wait for the scripting thread before rendering...?
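
On the fixed-timestep question, the usual pattern is an accumulator: frame times vary, but physics always advances in equal steps, and the renderer interpolates between the last two physics states. Bullet's btDynamicsWorld::stepSimulation(timeStep, maxSubSteps, fixedTimeStep) does this kind of substepping internally; the sketch below shows the idea on its own, without Bullet. FixedStepper, State and integrate are hypothetical names, and the 1-D Euler update is only a placeholder for real physics.

```cpp
#include <cmath>

// Hypothetical 1-D body standing in for a physics entity.
struct State { double pos; double vel; };

// One fixed-size step (explicit Euler, no forces): placeholder physics.
State integrate(State s, double dt) {
    s.pos += s.vel * dt;
    return s;
}

// Accumulates variable frame times and consumes them in constant-size steps;
// the returned state is interpolated between the last two physics states.
struct FixedStepper {
    double step;         // fixed physics step in seconds, e.g. 1.0 / 60.0
    int maxSubSteps;     // cap on steps per frame so one slow frame can't snowball
    double accumulator = 0.0;
    State previous{0.0, 0.0};
    State current{0.0, 0.0};

    FixedStepper(double step_, int maxSubSteps_) : step(step_), maxSubSteps(maxSubSteps_) {}

    State advance(double frameTime) {
        accumulator += frameTime;
        int steps = 0;
        while (accumulator >= step && steps < maxSubSteps) {
            previous = current;
            current = integrate(current, step);
            accumulator -= step;
            ++steps;
        }
        // Cap reached: drop the whole steps we couldn't afford, keep the remainder.
        if (accumulator >= step) accumulator = std::fmod(accumulator, step);
        const double alpha = accumulator / step;  // fraction of a step left over, 0..1
        return State{previous.pos + (current.pos - previous.pos) * alpha, current.vel};
    }
};
```

Feeding advance() the measured frame time keeps the physics step constant even when the frame rate fluctuates. Variable steps are a common source of jitter, though whether that is the cause here is not established.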

There's one thing I don't understand, and one thing that is immediately obvious as being "disastrous".

I don't quite understand the purpose of m_gen. The render thread can only proceed after the physics simulation has finished, so all it really needs to know is "are we there yet?". I'm not sure what good an increasing gen value does, nor why it must be compare-exchanged, which costs roughly twice as much as an increment. Are there several physics simulations competing with each other?

The "disastrous" thing is that you spin when you want to block. std::this_thread::yield() calls something like SwitchToThread() under Windows and sched_yield() under Linux. Either of these burns somewhere around 2,000-20,000 cycles invoking the operating system's scheduler, and then likely (or at least possibly) returns right away, so you spin and invoke the scheduler again. That burns an awful lot of CPU time that you would rather spend on evaluating physics!

A much better approach would be to atomically check the are_we_there boolean and, if it's not set, wait for an event to be signalled (a KEV or a futex if you're courageous enough to write the low-level stuff; use a condition variable otherwise). This uses zero CPU time when there is nothing to do, and continues as soon as the prerequisites are in place.
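
A minimal sketch of that blocking wait using the condition-variable option. FrameSignal, signal_done and wait_done are hypothetical names, and it assumes one producer (physics) and one consumer (render) per frame.

```cpp
#include <condition_variable>
#include <mutex>

// "Are we there yet?" flag: the render thread sleeps until physics marks the
// frame as done, instead of spinning.
class FrameSignal {
public:
    // Producer (e.g. physics thread): call when the step has finished.
    void signal_done() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_one();
    }

    // Consumer (e.g. render thread): returns at once if the flag is already set,
    // otherwise sleeps until signal_done(). Resets the flag for the next frame.
    void wait_done() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_done; });  // predicate guards against spurious wakeups
        m_done = false;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_done = false;
};
```

The bool is only read and written under the mutex, so checking it and going to sleep cannot race with signal_done(), and no wakeup is lost. The mutex also makes everything physics wrote before signal_done() visible to the render thread after wait_done().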

Optionally, you could spin a few times, but you only want to do that if there is a compelling likelihood that wait times are very short. In your example, with physics lagging behind, that is almost certainly not the case. Either way, when spinning on x86 you would want to emit a rep nop (PAUSE) instruction so that the sibling hyper-thread can use the core's resources, but you wouldn't want to invoke the scheduler.
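
A sketch of the optional spin-then-block variant, assuming x86 for the PAUSE hint (_mm_pause from <immintrin.h>, the intrinsic for rep nop); on other targets the hint compiles to nothing. SpinThenBlock and the spin limit of 1000 are illustrative assumptions, not tuned values, and it again assumes one producer and one consumer.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#endif

// Busy-wait hint: PAUSE on x86, no-op elsewhere. Never calls the scheduler.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    _mm_pause();
#endif
}

class SpinThenBlock {
public:
    void signal() {
        {
            // Set under the mutex so a waiter between its predicate check and
            // going to sleep cannot miss the notification.
            std::lock_guard<std::mutex> lock(m_mutex);
            m_ready.store(true, std::memory_order_release);
        }
        m_cv.notify_one();
    }

    // Returns true if the flag was seen while spinning, false if it had to sleep.
    bool wait(int spin_limit = 1000) {
        for (int i = 0; i < spin_limit; ++i) {
            if (m_ready.load(std::memory_order_acquire)) {
                m_ready.store(false, std::memory_order_relaxed);  // consume the signal
                return true;
            }
            cpu_relax();
        }
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_ready.load(std::memory_order_acquire); });
        m_ready.store(false, std::memory_order_relaxed);
        return false;
    }

private:
    std::atomic<bool> m_ready{false};
    std::mutex m_mutex;
    std::condition_variable m_cv;
};
```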

My question: why are you locking each thread to doing only one thing? Use them to run tasks instead. For example, all threads do physics work now, then the next thing. Your render thread is likely idle a lot of the time, since nowadays all we do is push commands to the graphics card. Make a few extra threads for lower-priority things like loading in the background or waiting on other data, etc. When your game runs on an 8-core CPU, it will be wasting a lot of cycles it could be using for faster simulation *shrugs*.
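
A minimal sketch of the task-based idea: a fixed set of worker threads pulls jobs from one shared queue, so the same threads can run physics jobs, then the next phase, then background loading. TaskPool, submit and wait_idle are hypothetical names; tasks are assumed not to throw, and there are no priorities or work stealing.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class TaskPool {
public:
    explicit TaskPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            m_threads.emplace_back([this] { worker_loop(); });
    }
    ~TaskPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& t : m_threads) t.join();  // workers drain the queue, then exit
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
            ++m_pending;
        }
        m_cv.notify_one();
    }

    // Block until every submitted task has finished (e.g. end of a frame phase).
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_idle_cv.wait(lock, [this] { return m_pending == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_stop && m_tasks.empty()) return;
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task();  // run outside the lock so other workers can dequeue
            std::lock_guard<std::mutex> lock(m_mutex);
            if (--m_pending == 0) m_idle_cv.notify_all();
        }
    }

    std::vector<std::thread> m_threads;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_cv, m_idle_cv;
    std::size_t m_pending = 0;
    bool m_stop = false;
};
```

A frame could then submit one phase's jobs, call wait_idle(), and move on to the next phase, with any idle worker picking up whatever is queued.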

<rant>

Anytime I see a C in front of a class name I die a little inside. In this day and age there is no use for it. IntelliSense is everywhere, and honestly it just makes things harder and slower to read.

</rant>

Thanks to both of you for answering. I'll look further into condition variables etc.

Adding a C in front of a class name is purely aesthetic. I like it, I won't have naming problems elsewhere, and it doesn't conflict with other libraries, so it's fine. I don't see why a class name would be harder to read when it already has the class keyword in front of it and it's just a simple C. If you told me a leading underscore is bad naming, I'd agree, since it can conflict with other headers and libs.

Anyway, as far as I know, the renderer thread has to wait for scripting and physics before the GPU work. I was curious, so I opened Unity and checked that the frame time depended on the CPU time. For example, it showed "CPU: main 15 ms | renderer thread: 0.2 ms" and the FPS was 1000 / 15 (about 67), so I guess the CPU work goes first and then the renderer.

You mentioned tasks: I guess this is a kind of queue that gets run by priority each frame before rendering, isn't it? Also, I'm not really locking threads now. I stopped using that class (it isn't mine, I just pasted it to test it). I used an atomic bool between the renderer, physics and scripting threads, but it's not enough. I think I have to implement some kind of task manager...

My question, why are you locking threads to do only one thing? Instead use them to do tasks. Like all threads do physics work now.
While that is generally a good approach, I don't think you can gain a lot in this case. Running physics in parallel is quite non-trivial, and Bullet, to my knowledge, already uses multithreading to the extent that it makes sense.

On the other hand, there is not that much to gain from parallelism here. It's better than nothing, sure, but it's not like it automatically runs twice as fast only because you have two threads, or 6 times as fast on 6 threads. You can do some work such as update texture data or submit draw calls for static geometry while the physics simulation is running, OK... but that's probably very fast.

Then you need to wait until physics are done, otherwise you can't continue doing the dynamic stuff. So, calling stepSimulation in a separate thread can really only, in the best case, save you as much time as it takes you to submit the static geometry. And then, if you need shadow maps (which include dynamic objects) for drawing the static geometry, you might possibly not even be able to do that (unless you're OK with reusing previous frame data).
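
The overlap described above (submit static geometry while physics runs, then wait before drawing dynamic objects) can be sketched with std::async. step_physics, Renderer, draw_static and draw_dynamic are hypothetical stand-ins: the physics function applies a placeholder update instead of calling Bullet's stepSimulation, and the renderer only counts calls.

```cpp
#include <future>
#include <utility>
#include <vector>

struct Transform { float x, y, z; };

// Hypothetical stand-in for stepSimulation plus reading back dynamic transforms.
// Runs on a worker thread; the update is a placeholder, not real physics.
std::vector<Transform> step_physics(std::vector<Transform> bodies, float dt) {
    for (auto& b : bodies) b.y -= 9.81f * dt;
    return bodies;
}

// Hypothetical renderer that just counts what would be submitted.
struct Renderer {
    int static_draws = 0;
    int dynamic_draws = 0;
    void draw_static() { ++static_draws; }
    void draw_dynamic(const std::vector<Transform>& bodies) {
        dynamic_draws += static_cast<int>(bodies.size());
    }
};

// One frame: start physics on another thread, overlap it with static draws,
// then block on the future before anything that depends on dynamic objects.
std::vector<Transform> run_frame(Renderer& r, std::vector<Transform> bodies, float dt) {
    auto physics = std::async(std::launch::async, step_physics, std::move(bodies), dt);
    r.draw_static();                                 // independent of this frame's physics
    std::vector<Transform> result = physics.get();   // the single sync point per frame
    r.draw_dynamic(result);                          // needs the new transforms
    return result;
}
```

If shadow maps include dynamic objects, those passes also have to come after physics.get(), which shrinks the overlap further, as noted above.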

I am not saying "scratch multithreading altogether", because it still buys you something compared to running draw_static(); stepSimulation(); draw_dynamic(); in one thread. But it's nowhere near galactic gains. Trying to go really complicated with a task-based approach that somehow tries to further parallelize physics (I wouldn't even know how that is easily possible without rewriting Bullet!) for a marginal gain is definitely not something I'd do. Complexity usually doesn't improve systems.

Okay, I understand. I just want to find a good way to avoid this problem. What do you think about this? It's an example class for using tasks etc. I've been testing some things in my project and realized something: no matter what, the character moves laggily if I put the call that changes its velocity in various places in each thread. I tried putting setvelocity in all 3 threads (before and after everything), and it still lags when there are a lot of entities. It even flickers when there are no entities, just less often. Maybe it's the friction, I don't know.
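
One commonly used way to stop several threads from writing the same body's velocity is to make the physics thread the only writer: the scripting thread queues commands, and the physics thread applies them once per step, just before stepping. This is not something proposed in the thread; CommandQueue, SetVelocity, Velocity and apply_commands are hypothetical, and in Bullet the apply step would presumably call btRigidBody::setLinearVelocity on the physics thread.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical command: "set this entity's linear velocity".
struct SetVelocity { std::uint32_t entity; float vx, vy, vz; };

// Scripting thread pushes; physics thread drains once per step.
class CommandQueue {
public:
    void push(const SetVelocity& cmd) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(cmd);
    }
    // Take everything queued so far; the lock is held only for the swap.
    std::vector<SetVelocity> drain() {
        std::vector<SetVelocity> out;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            out.swap(m_pending);
        }
        return out;
    }
private:
    std::mutex m_mutex;
    std::vector<SetVelocity> m_pending;
};

// Hypothetical physics-side velocity, owned by the physics thread only.
struct Velocity { float x, y, z; };

// Called on the physics thread right before stepping.
void apply_commands(CommandQueue& q, std::unordered_map<std::uint32_t, Velocity>& bodies) {
    for (const SetVelocity& c : q.drain())
        bodies[c.entity] = Velocity{c.vx, c.vy, c.vz};  // last command in the step wins
}
```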
