
Low-latency, determinism, and multithreading

Started by silikone November 07, 2017 10:53 AM
7 comments, last by Shaarigan 6 years, 10 months ago

Usually, an engine has to strike a balance between these three factors, sacrificing at least one to maximize another.

I'm looking for some information on how various engines deal with each of these factors, helping one to make the right choice depending on the requirements of a game. For example, determinism is imperative for physics puzzle games, low-latency is in high demand for twitchy shooters, and multithreading suits large-scale simulations.

5 minutes ago, silikone said:

Usually, an engine has to strike a balance between these three factors, sacrificing at least one to maximize another.

I wouldn't agree with that statement. They're all independent concerns.

Every modern console game is multi-threaded. My workstation has roughly the same CPU model as the XbOne/PS4, except that the Consoles are clocked around 1.6GHz, while my PC is clocked at 4GHz. The consoles are slowwwwwwwwww... but they have 8 cores. So you have to write your code so that it will actually run across 8 cores. The PS3/Xb360 were the first platforms to force this change in mindset, so multi-threaded games have been a thing for about a decade now.

By latency I assume you mean the latency between the user physically pressing a button and the monitor showing some change as a result of that button press -- input-to-photon latency.

In a typical game this will be about 3-4 frames, or 2-3 on a CRT/projector... In the best possible situation it looks something like:
* user presses button, the game will poll for the button state at the beginning of the next frame. This will be somewhere from 0 to 1 frames from now (+0.5 frames on average).
* game starts a new frame, polls the input devices, updates game state (+1 frame)
* game finishes the frame by issuing the rendering commands to the GPU.
* GPU then spends an entire frame executing these rendering commands (+1 frame).
* Your nice LCD monitor then decides to buffer the image for about a frame before displaying it (+1 frame)

At 60Hz, that's around 50-60ms. The easiest way to reduce those numbers is to just run at a higher framerate. If you can run the loop at 200Hz (no vsync) then the latency will be <20ms.
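
Spelling that arithmetic out (with the rough assumption that every stage, including the display buffer, scales with the loop rate): 0.5 + 1 + 1 + 1 = 3.5 frames, so 3.5 × 16.7 ms ≈ 58 ms at 60 Hz, versus 3.5 × 5 ms ≈ 17.5 ms at 200 Hz.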

Games may make tradeoffs that produce worse latency than this, but not because of multi-threading/determinism. A common one is using a fixed-time-step for your gameplay update loop, in order to keep the physics more stable... actually yeah, stable physics is a determinism concern, so you're right there! In order to use a fixed-time-step and keep the visual animation perfectly smooth you have three choices:
1) use a very small time-step (like 1000Hz) where the jittery movement won't be noticed.
2) buffer an extra frame of game-state and interpolate object positions.
3) extrapolate object positions instead of interpolating, though this still causes jittery movement when objects change direction/collide/etc..
Choice #2 adds another frame to the above calculations.
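
For the curious, a minimal C# sketch of choice #2 (GameState, Simulate, Interpolate and Render are placeholder names, not from any particular engine): the simulation steps at a fixed rate while the renderer blends between the two most recent states, which is exactly where the extra buffered frame of latency comes from.

using System.Diagnostics;

// Minimal fixed-time-step loop with interpolated rendering (choice #2).
// GameState, Simulate, Interpolate and Render are placeholders for illustration.
struct GameState { public double X; public double Velocity; }

static class FixedStepLoop
{
    const double FixedDt = 1.0 / 60.0;   // simulation always steps at 60 Hz, whatever the render rate

    static GameState Simulate(GameState s, double dt)
    {
        s.X += s.Velocity * dt;          // deterministic fixed step
        return s;
    }

    static GameState Interpolate(GameState a, GameState b, double alpha) =>
        new GameState { X = a.X + (b.X - a.X) * alpha, Velocity = b.Velocity };

    static void Render(GameState s) { /* draw using s.X */ }

    public static void Run()
    {
        var timer = Stopwatch.StartNew();
        double accumulator = 0, last = 0;
        GameState previous = default, current = new GameState { Velocity = 1 };

        while (true)
        {
            double now = timer.Elapsed.TotalSeconds;
            accumulator += now - last;   // real time since the previous iteration
            last = now;

            while (accumulator >= FixedDt)   // run zero or more fixed steps
            {
                previous = current;
                current = Simulate(current, FixedDt);
                accumulator -= FixedDt;
            }

            // Blend between the two buffered states; this buffering is the extra frame of latency.
            Render(Interpolate(previous, current, accumulator / FixedDt));
        }
    }
}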

Another reason to add an extra frame of latency is if input rhythm / relative timing of inputs is important. To get perfect rhythm, you can poll the input device very fast (e.g. at 1000Hz) and push all the inputs into a queue with a timestamp of when they were recorded. The game can buffer up a whole frame's worth of inputs in this manner, and then in the gameplay logic it can process the most recent frame's worth (e.g. a 16.67ms slice of inputs) at once, taking the timestamps into account. You're then able to process inputs with sub-frame accuracy -- e.g. even though you still receive all inputs right at the start of an update cycle, you can determine facts such as- the player pushed this button 3/4ths of the way through the frame.
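
A rough C# sketch of that timestamped queue (the type and method names are made up for illustration, and the 1000Hz poller is assumed to run on its own thread):

using System;
using System.Collections.Concurrent;

// Sketch of sub-frame input timing: a fast poller records timestamped events,
// the gameplay update drains one frame's worth of them at a time.
struct InputEvent { public double Time; public int Button; public bool Down; }

class InputQueue
{
    readonly ConcurrentQueue<InputEvent> events = new ConcurrentQueue<InputEvent>();

    // Called from a high-frequency polling thread (e.g. every 1 ms).
    public void Record(int button, bool down, double time) =>
        events.Enqueue(new InputEvent { Time = time, Button = button, Down = down });

    // Called once per gameplay update; applies every event up to the end of the frame slice.
    public void Drain(double frameEndTime, Action<InputEvent> apply)
    {
        while (events.TryPeek(out var e) && e.Time <= frameEndTime)
        {
            events.TryDequeue(out e);
            // e.Time tells the gameplay code when within the frame the press happened,
            // e.g. 3/4 of the way through, which is what rhythm-sensitive logic needs.
            apply(e);
        }
    }
}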

Determinism is often down to our desire for fast floating point math. The IEEE has a specification on how floats should behave, and you can tell your compiler to follow that specification to the letter, and then you know that your calculations are reproducible... However, this is often a hell of a lot slower than telling your compiler to ignore the IEEE specification. Then there's loads of other things like being very careful how you generate random numbers, and very careful about when and how any kind of user input is allowed to affect the simulation. e.g. in a peer-to-peer RTS game, you might need to transmit everyone's user-inputs to every other player first, get acknowledgement, then apply everyone's user inputs simultaneously on an agreed upon simulation frame. Ok... that's also a case where determinism does necessitate higher input latency :) I'm getting your post now! However, in that situation, the local client can start playing sound effects and animations immediately, which makes it seem like there's no input latency, even though they're not allowed to actually modify the simulation state for another 500ms.
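
A simplified C# sketch of that peer-to-peer lockstep scheme (class and method names are invented for illustration; real netcode also handles acks, packet loss, and so on): local inputs are stamped with a future simulation frame, broadcast, and only applied once every peer's input for that frame has arrived.

using System;
using System.Collections.Generic;

// Simplified lockstep sketch: the deterministic simulation only advances to frame F
// once every peer's input for frame F has arrived, so all machines stay in sync.
class LockstepSession
{
    const int InputDelayFrames = 8;   // roughly 130 ms at 60 Hz, hidden by local effects
    readonly int peerCount;
    // inputs[frame][peerId] = that peer's input for that simulation frame
    readonly Dictionary<int, Dictionary<int, byte>> inputs = new Dictionary<int, Dictionary<int, byte>>();
    int simulationFrame;

    public LockstepSession(int peerCount) { this.peerCount = peerCount; }

    // Local input is scheduled for a future frame and broadcast to the other peers.
    // Sound effects and animations can start here immediately to hide the delay.
    public void SubmitLocalInput(int localPeerId, byte input, Action<int, byte> broadcast)
    {
        int targetFrame = simulationFrame + InputDelayFrames;
        StoreInput(targetFrame, localPeerId, input);
        broadcast(targetFrame, input);   // network send; acks and resends handled elsewhere
    }

    public void OnRemoteInput(int frame, int peerId, byte input) => StoreInput(frame, peerId, input);

    void StoreInput(int frame, int peerId, byte input)
    {
        if (!inputs.TryGetValue(frame, out var perPeer))
            inputs[frame] = perPeer = new Dictionary<int, byte>();
        perPeer[peerId] = input;
    }

    // Steps the simulation only when the current frame has inputs from everyone.
    public bool TryAdvance(Action<Dictionary<int, byte>> step)
    {
        if (!inputs.TryGetValue(simulationFrame, out var perPeer) || perPeer.Count < peerCount)
            return false;                // stall until all inputs for this frame arrive
        step(perPeer);
        inputs.Remove(simulationFrame++);
        return true;
    }
}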

There are situations where certain multi-threading approaches can introduce non-determinism, but all of those approaches are wrong. If you're multi-threading your game in such a way where your support for threading means that the game is no longer deterministic, then you're doing something terribly wrong and need to turn around and go back. I can't stress that one enough. There's no reason that multi-threaded gameplay code shouldn't behave exactly the same as a single-threaded version.
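
One way to get that (a sketch, assuming a plain double-buffered entity array): each worker reads only the previous frame's immutable data and writes only its own output slots, so scheduling order can never change the result.

using System.Threading.Tasks;

// Deterministic parallel update sketch: each entity's new state depends only on the
// previous frame's (read-only) data, and each index is written by exactly one iteration,
// so the output is identical to a single-threaded loop no matter how work is scheduled.
static class ParallelStep
{
    public static void Update(float[] previousX, float[] velocity, float[] nextX, float dt)
    {
        Parallel.For(0, previousX.Length, i =>
        {
            nextX[i] = previousX[i] + velocity[i] * dt;   // reads old data, writes only slot i
        });
        // Any cross-entity interaction (collision resolution, etc.) happens afterwards
        // in a fixed index order, so thread timing never changes the result.
    }
}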

19 minutes ago, Hodgman said:

There are situations where certain multi-threading approaches can introduce non-determinism, but all of those approaches are wrong. If you're multi-threading your game in such a way where your support for threading means that the game is no longer deterministic, then you're doing something terribly wrong and need to turn around and go back. I can't stress that one enough. There's no reason that multi-threaded gameplay code shouldn't behave exactly the same as a single-threaded version.

This is where one may or may not see latency introduced in order to stay deterministic. If you are smart about it, you could detach the game simulation from the player input and feedback. It is crucial that the mouse feels instant, but gunfire and animations don't have to be instant. Of course, for the best programmers, nothing beats having instant everything.

Java can also guarantee deterministic floating-point operations, but that's not the case for C#. So in the .NET case you can make your own int-based fixed-point math library with bit-shifting wizardry and Taylor series for functions like sine/cosine/etc. (I don't know if there's a better way to approximate a function). You could also get a bit more precision in exchange for range using something like this.
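
A bare-bones illustration of that idea (a 16.16 fixed-point struct with a short Taylor series for sine; purely a sketch, not a production-quality library):

// Minimal 16.16 fixed-point sketch for deterministic math in .NET.
// All state is integer, so results are bit-identical on every machine.
public struct Fix32
{
    public const int FracBits = 16;
    public int Raw;                       // value = Raw / 2^16

    public static Fix32 FromInt(int v) => new Fix32 { Raw = v << FracBits };

    public static Fix32 operator +(Fix32 a, Fix32 b) => new Fix32 { Raw = a.Raw + b.Raw };
    public static Fix32 operator -(Fix32 a, Fix32 b) => new Fix32 { Raw = a.Raw - b.Raw };
    // Multiply through 64 bits, then shift back down (the "bitshifting wizardry").
    public static Fix32 operator *(Fix32 a, Fix32 b) =>
        new Fix32 { Raw = (int)(((long)a.Raw * b.Raw) >> FracBits) };
    public static Fix32 operator /(Fix32 a, Fix32 b) =>
        new Fix32 { Raw = (int)(((long)a.Raw << FracBits) / b.Raw) };

    // sin(x) approximated by the Taylor series x - x^3/3! + x^5/5!, good enough for small |x|.
    public static Fix32 Sin(Fix32 x)
    {
        Fix32 x2 = x * x;
        Fix32 x3 = x2 * x;
        Fix32 x5 = x3 * x2;
        return x - x3 / FromInt(6) + x5 / FromInt(120);
    }
}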

Deterministic gameplay also depends on the implementation of your engine. Even if your engine runs multithreaded, you might have a system like Unity 3D that does not allow your gameplay code to be executed on different threads (without some management-heavy synchronization tricks), while a well-tested and maintained task/job system can speed up your gameplay code (depending on its complexity and dependencies) by splitting different sections into tasks and spreading them across your system so they can run in parallel. That gets you performance and stays deterministic, but it needs to be adapted for each scenario.
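
As a rough illustration of that splitting, using the standard .NET Task API as a stand-in for a real engine job system (the three systems and their dependency here are just an example):

using System.Threading.Tasks;

// Sketch of task-based gameplay: independent systems run as parallel tasks and a
// dependent system waits on the ones it needs. The dependency graph is fixed, so the
// outcome is the same every run even though the work is spread across cores.
static class FrameJobs
{
    public static async Task RunFrame()
    {
        Task ai        = Task.Run(() => UpdateAI());
        Task animation = Task.Run(() => UpdateAnimation());

        await Task.WhenAll(ai, animation);   // physics (in this example) depends on both
        await Task.Run(() => UpdatePhysics());
    }

    static void UpdateAI()        { /* ... */ }
    static void UpdateAnimation() { /* ... */ }
    static void UpdatePhysics()   { /* ... */ }
}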

Multithreading, as @Hodgman mentioned, has been part of game development since Unreal Engine 3 first released (and I think even earlier). Any AAA game engine these days supports multiple threads to run those massive titles. Sure, a desktop CPU runs at a much higher clock, but on a console platform you don't have to share the hardware with other processes, so your game can exclusively use every core but one (on PS4 at least; I don't know about XB1) without being scheduled into a wait queue. That's what makes consoles "seem" fast ;)

Game development these days is more or less about planning what runs when and where, and how lock waits can be reduced given the dependencies between different tasks/jobs and systems, rather than a trade-off between being deterministic and being performant. Sure, float versus double precision is another topic worth discussing; I have had many bugs in my career (mostly related to AI) where floating-point precision led to tiny differences, but in the end that can even be exploited to make an AI's behavior look more natural :D

I'm developing a game that uses OpenCL, and it has low enough latency even though it uses millions of GPU threads. Did you mean real-time computing?

Need an open-source multi-GPU OpenCL load-balancer for C#? Here it is: https://github.com/tugrul512bit/Cekirdekler/wiki. It also has pipelining.

Hello world in all GPUs:


// Builds a number cruncher for every GPU in the system and compiles the kernel source
// (assumes the Cekirdekler namespace from the library above is imported).
ClNumberCruncher cr = new ClNumberCruncher(
    AcceleratorType.GPU, @"
      __kernel void hello(__global char * arr)
      {
           printf(""hello world"");
      }
");
// 1000-element byte buffer that gets bound to the kernel's arr parameter.
ClArray<byte> array = new ClArray<byte>(1000);
// Launches the "hello" kernel; the numeric arguments are a compute id,
// the global work size (1000) and the work-group size (100).
array.compute(cr, 1, "hello", 1000, 100);

What advantage does OpenCL have for games? That seems like an unusual approach.

OpenCL/CUDA is mostly used to speed up physics computation, at the cost of consuming some GPU power. It has some advantage in compute-heavy situations, but a well-designed and well-tested game loop (for example in a multithreaded, task-based environment) does the job just as well without OpenCL/CUDA.

This topic is closed to new replies.
