Hello, I have been lurking on this forum for quite a while, soaking in knowledge, and the time has come for me to make my first post.
The short version:
I am looking to start a discussion on ways to deal with jitter in the simulation frequency of client commands in a client-server setup with an authoritative server. In this setup, the client sends its input state to the server and the server simulates this input. There is of course no guarantee that these messages arrive at a constant rate, which presents a problem: how do we synchronize the state of an entity E controlled by client C with other clients, so that those clients see E move smoothly (no position snapping or velocity spikes in the usual case), when we synchronize E's state at a constant rate but don't update it at a constant rate?
I try to explain the problem in depth in the [Problem in depth] section and then present some solution ideas I have come up with or come across in the [Suggested solutions] section. If you are up for it you can also read the long version, which gives you insight into the networking model I am using, to better understand the exact situation. As I am very new to the area of networking in games, I would greatly appreciate any opinions on the presented solutions and their applicability, as well as suggestions for other ways to deal with this problem efficiently. I am pretty sure it is solvable, otherwise one wouldn't be able to play an MMO over Wi-Fi without everyone else seeing the avatar warp insufferably all the time.
The game for which I am working on a networking component is not twitch heavy, in the sense that you cannot alter your orientation or velocity vector very abruptly from one frame to the next, but the simulation has to be precise enough to support shooting using raycasting and some melee using shapecasting; there are no inter-player collisions. This post, however, only deals with issues of smooth movement in adverse conditions.
The long version: (the long long version is this article on the Source networking model, which is essentially what I use)
A couple of commonly used terms, so I don't type myself to death:
- Simulation Timestep (STS): Time period being simulated, usually your previous frame time. Your dt.
- User Command (UC): A structure containing all relevant user input to the simulation, plus the STS, for the current frame. It would contain data such as velocities derived from your WASD keys. Along with the current world state it constitutes the simulation input.
- Networked entity (Entity): Anything whose state has to be synchronized over the network. May be controlled by a client (player character) or server (npc).
- Authoritative server (AS): A network setup where (simplified) clients send UCs to the server, which runs the simulation with the received UCs and sends the resulting entity state back to the clients. A key element of this setup is that all inter-client communication happens via the server, never directly between two clients. Also used to refer to the server component in such a setup.
- Client side prediction (CSP): A technique in networked games aimed at providing instant feedback to player actions. In a plain AS setup, the client sends a UC and then must wait for the server to send back the results before displaying them. This translates into a noticeable, unpleasant delay between a key press and cool stuff happening. You can remove this delay by simulating the UC on the client at the same time it is sent to the server. If the UC's simulation result (player position, velocity, etc.) received from the server is the same on the client and the server, no adjustments have to be made and the client enjoys instantaneous feedback to his actions.
- Server time/simulation time/network time (ST): The point in time that the simulation currently represents. It is ideally the exact same value for the server and all clients, regardless of when they connected.
- Client interpolation/extrapolation (CI/CE): A technique used on the client to display smooth movement of entities controlled by other clients. A client receives entity updates from the server at some given rate (say 10-20 times/s). Due to network latency, however, these states are already in the past. When rendering, the client can choose to render the other entities at some time t = ST - lerp_delay (say lerp_delay = 100-200 ms). While the client will never have access to the entity state at ST (the current state), given low enough latency the entity state for time t has already arrived from the server. The client can thus interpolate between the two states bracketing t to present entities as moving smoothly. If no update newer than t is available, the entity state may optionally be extrapolated from the last known position and velocity.
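To make the CI/CE sampling concrete, here is a minimal sketch of how I picture it in C++. The StateSnapshot layout and the history container are illustrative stand-ins, not code from any particular engine.
[code]
#include <cstddef>
#include <deque>
#include <optional>

struct Vec3 { float x = 0, y = 0, z = 0; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// One entity state as received from the server (illustrative layout).
struct StateSnapshot {
    double serverTime = 0.0;  // the ST stamp of this state
    Vec3   position;
    Vec3   velocity;
};

// Returns the state to render at renderTime = ST - lerp_delay.
// history is assumed to be sorted by serverTime, oldest first.
std::optional<StateSnapshot> SampleEntityState(const std::deque<StateSnapshot>& history,
                                               double renderTime)
{
    if (history.empty())
        return std::nullopt;

    for (std::size_t i = 0; i < history.size(); ++i) {
        if (history[i].serverTime >= renderTime) {
            if (i == 0)
                return history.front();  // renderTime is older than everything we have

            // CI: interpolate between the two snapshots bracketing renderTime.
            const StateSnapshot& a = history[i - 1];
            const StateSnapshot& b = history[i];
            float t = float((renderTime - a.serverTime) / (b.serverTime - a.serverTime));
            StateSnapshot out;
            out.serverTime = renderTime;
            out.position   = a.position + (b.position - a.position) * t;
            out.velocity   = b.velocity;
            return out;
        }
    }

    // CE: every snapshot is older than renderTime, extrapolate from the last one.
    StateSnapshot out = history.back();
    out.position   = out.position + out.velocity * float(renderTime - out.serverTime);
    out.serverTime = renderTime;
    return out;
}
[/code]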
[What is the goal]
The goal is to facilitate synchronization of moving entities throughout a game world driven by an AS with an emphasis on smooth movement of entities on all clients.
[Network setup]
Let's illustrate the network setup by observing how the state of two player characters is synchronized with the AS and the other clients.
Assume that we have a mechanism in place that reliably initializes the entity on all clients and the server to an initial state S at time Tstart.
Client frame loop (device-dependent FPS)
- Pack the simulation input and STS into a UC, and save the UC into a UC history buffer.
- Send the UC from (1) to the server for execution.
- Perform CSP by running the simulation with the UC from (1).
- If any arrived, receive the states of other entities from the server (position, orientation, velocity, ST timestamp) and add each to the state history buffer for the given entity.
- If any arrived, process the server acknowledgement for the latest UC sent in (1) in some previous frame. Adjust the position of the player and remove all UCs from the UC history buffer from (1) up to and including the acknowledged UC. Then rerun the simulation client side with all UCs remaining in the UC buffer, in turn. If the CSP was successful (client- and server-side simulation results were identical), the client will see no noticeable jerk in the movement of the entity he controls (see the sketch after this list).
- Render the game world. For the entity controlled by the client, use the state computed in (3)/(5). For other entities use CI/CE.
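To make steps (3) and (5) concrete, here is a minimal, self-contained sketch of the prediction and reconciliation part, with a deliberately dumb 1D simulation. All type and member names are illustrative only, and the actual sending/receiving is left out.
[code]
#include <cstdint>
#include <deque>

// Minimal 1D stand-ins for an entity state and a user command.
struct EntityState {
    float position = 0.0f;
    float velocity = 0.0f;
};

struct UserCommand {
    uint32_t sequence = 0;              // strictly increasing, assigned by the client
    float    wishVelocity = 0.0f;       // derived from input
    float    simulationTimestep = 0.0f; // STS
};

// Deterministic simulation step shared by client and server.
EntityState Simulate(EntityState state, const UserCommand& uc)
{
    state.velocity  = uc.wishVelocity;
    state.position += state.velocity * uc.simulationTimestep;
    return state;
}

struct PredictedPlayer {
    EntityState             predicted;  // what the client renders for itself (CSP)
    std::deque<UserCommand> ucHistory;  // UCs not yet acknowledged by the server

    // Steps (1)-(3): record and predict a freshly built UC (sending is omitted).
    void ApplyLocalCommand(const UserCommand& uc)
    {
        ucHistory.push_back(uc);
        predicted = Simulate(predicted, uc);
    }

    // Step (5): the server acknowledged ackSequence with authoritative ackState.
    void Reconcile(uint32_t ackSequence, const EntityState& ackState)
    {
        // Drop every UC up to and including the acknowledged one.
        while (!ucHistory.empty() && ucHistory.front().sequence <= ackSequence)
            ucHistory.pop_front();

        // Restart from the authoritative state and replay the remaining UCs.
        predicted = ackState;
        for (const UserCommand& uc : ucHistory)
            predicted = Simulate(predicted, uc);
    }
};
[/code]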
Server frame loop (60 FPS)
- For each client, for each UC received: simulate the UC representing the client's actions in the current world state. Save the resulting client state and stamp it with the ST it corresponds to (which is the stamp of the last state + the STS from the UC). Send a message to the client acknowledging the processed UC along with the state obtained (a small sketch of this step follows the list).
- Simulate server controlled entities.
- [Every 3rd frame or so (~20 times/s)] Synchronize entity states with clients. For each entity: broadcast the latest entity state along with the current ST to each client except the one who controls it (i.e.: NPC state to everyone, player character state to everyone except the client who controls that character).
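And the matching server-side piece of step (1), mirroring the illustrative types from the client sketch; the broadcast in step (3) and the actual networking are omitted.
[code]
#include <cstdint>
#include <deque>
#include <vector>

struct UserCommand {
    uint32_t sequence = 0;
    float    wishVelocity = 0.0f;
    float    simulationTimestep = 0.0f;  // STS
};

struct EntityState {
    double serverTime = 0.0;             // the ST this state corresponds to
    float  position = 0.0f;
    float  velocity = 0.0f;
};

struct Ack {                             // what the server sends back for step (client 5)
    uint32_t    ucSequence;
    EntityState state;
};

// Step (server 1) for one client: simulate its queued UCs in order, stamping each
// resulting state with last_state_ST + STS and producing one ack per UC.
std::vector<Ack> SimulateClientCommands(EntityState& state, std::deque<UserCommand>& queue)
{
    std::vector<Ack> acks;
    while (!queue.empty()) {
        const UserCommand uc = queue.front();
        queue.pop_front();

        state.velocity    = uc.wishVelocity;
        state.position   += state.velocity * uc.simulationTimestep;
        state.serverTime += uc.simulationTimestep;   // ST advances by the UC's STS

        acks.push_back({uc.sequence, state});
    }
    return acks;
}
[/code]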
[The problem in depth]
So the idea is: Your character moves smoothly due to the CSP. Server sees you a bit in the past, but that is fine, because other clients render you even further back in time. On other clients, your character movement is smooth because they smoothly interpolate it between two closest available states.
This all works well as long as you are on a network with very low packet loss. Once you try this on something wireless, you run into issues.
The state synchronization from server to client still works fine, because CI can easily hide one or even a few lost state syncs once in a while; with a large enough lerp_delay you might even consider sending the state syncs over a reliable channel if you have the need and the bandwidth budget. The weak link proves to be getting UCs from the client to the server at a sort-of-constant rate.
The UCs have to be executed in the same order on the server as on the client. Otherwise CSP on the client has little point, because even a potentially correct prediction would get corrected just because a packet carrying a UC was lost or the UCs arrived in a different order on the server. Thus UCs are sent reliably and executed in order. When a packet carrying a UC is lost, the server must postpone simulating all following UCs until the reliable protocol you are using to send UCs realizes it was lost and resends it. This may take quite some time.
In an ideal world (wired network), UCs from a client arrive at the server at a more or less constant rate of one every R ms (disregarding latency jitter, which manifests itself in the same way as packet loss, possibly less pronounced). As a consequence, a UC from a client C is also simulated on the server roughly once every R ms. A delay introduced by packet loss translates into a sudden drop (the wait for the lost packet) and a subsequent spike (simulating the queued-up UCs immediately) in the frequency of simulation execution.
The problem is that the server-to-client state synchronization frequency is constant, while different state syncs don't necessarily correspond to the same number of executed UCs. Effectively, even if a client moves at a constant velocity in a straight line, the state syncs sent to other clients may not have equally spaced positions while having equally spaced timestamps, leading to visually very noticeable speed bumps even with CI. For example, with a UC every 16 ms and a sync every 48 ms, a 150 ms retransmission stall means one sync carries a position that has barely advanced, while a later sync, sent right after the backlog is simulated in a single frame, shows the position leaping ahead.
[Suggested solutions]
- Use an unreliable channel to send UCs. Keep the SEQ# of the last processed command, only process arriving commands with a greater SEQ#, and discard the rest (a minimal sketch of this filter follows the list). [pros] This would help reduce drops and spikes in the UC simulation frequency on the server. Every lost/discarded UC is still a drop in frequency, but at least there are no spikes (which is more important, I dare say). [cons] The (big) downside is that if a perfectly valid UC gets lost, it has already been predicted on the client machine, so when you send an ACK for some later UC with a different position, the client gets a screen jerk (you can spread the position correction over a couple of frames if the difference is small enough). It also does not account for latency jitter at all, and it makes the game straight-up unfair to players with a bad connection to the server by ignoring some of their commands. I would say that last trait makes it unusable, as it would drive players nuts in any kind of player-versus-player scenario; I haven't actually tested it, though.
- Extrapolate the entity state to the current ST at synchronization. In step (server 1), mark each entity with the time of its last update; when syncing, use dt = current_ST - last_state_update_time and compute the position as last_state_position + last_state_velocity * dt. [pros] Works well for movement in a straight line or when dt is reasonably small. [cons] Any time dt > 0 you are sending merely an approximation of the future entity state, not the entity state that will later be computed once the additional UCs are simulated. This would make stuff like aiming unfair. Additionally, it breaks down for anything but very small dt and entities start warping violently.
- When synchronizing, don't send the state at the current ST but at t = ST - sync_offset, i.e.: synchronize the past. If t <= last_state_update_time, use the state history to interpolate smoothly; otherwise you may use the latest state or extrapolate from last_state_update_time to t the same way as in (2). [pros] You enlarge the time window in which a UC is accepted into the current sync tick. [cons] It will further complicate hit detection and range queries down the road (in addition to how CI/CE already complicates them). You can't pick too high a sync_offset or you make a player play too much 'in the past', allowing noticeable 'hits from the past'. Additionally, it does not solve the problem for players with otherwise high ping (a packet drop will almost always lead to extrapolation or to using the last state anyway, as the retransmission delay is so high).
- Synchronize client-controlled entities on their own timeline, using the last_update_time rather than the sync-tick ST. In step (server 1) we save states into a state history for each entity, tagged with the actual ST they correspond to. When synchronizing client-controlled entities, send the last available state, but tagged with the actual ST that corresponds to that state instead of the current ST. If no new state was computed since the last sync, just send a special message indicating as much. If a new UC arrives and we could not synchronize on the last sync tick, send a state update for that entity immediately after processing that UC (even if it's not time for the usual synchronization yet). A rough sketch of this follows below as well. [pros] Every entity only synchronizes an actual position at an actual time, no approximations. It's a sort-of best-effort model for maintaining constant spacing between the entity state timestamps synchronized with clients. The CI delay gives you time to deliver the data in time for it to be used for interpolation, given reasonable client latencies. Haven't tried this out yet, it's the first thing on my schedule though. [cons] There are definitely some which will show up when I try to implement it. ;]
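For completeness, here is what I imagine the filter from (1) looking like on the server side; a minimal sketch with illustrative names, nothing more.
[code]
#include <cstdint>

struct UserCommand {
    uint32_t sequence = 0;               // client-assigned, strictly increasing
    float    simulationTimestep = 0.0f;
    // ... movement input, view angles, etc.
};

struct ClientCommandFilter {
    uint32_t lastProcessedSeq = 0;

    // Returns true if the UC should be simulated, false if it is stale or a duplicate.
    bool Accept(const UserCommand& uc)
    {
        if (uc.sequence <= lastProcessedSeq)
            return false;                // late or duplicated packet: discard
        lastProcessedSeq = uc.sequence;  // note: any skipped sequence numbers are simply lost
        return true;
    }
};
[/code]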
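And a rough, untested sketch of (4), i.e. tagging syncs for client-controlled entities with the actual ST of their latest simulated state; again, every name here is illustrative and this is just how I picture it, not working code from my project.
[code]
#include <cstdint>
#include <deque>

struct EntityState {
    double serverTime = 0.0;   // the actual ST this state corresponds to
    float  position = 0.0f;
    float  velocity = 0.0f;
};

struct SyncMessage {
    uint32_t    entityId = 0;
    bool        hasState = false;  // false = "nothing new since the last sync"
    EntityState state;             // valid only if hasState is true
};

struct SyncedEntity {
    uint32_t                entityId = 0;
    std::deque<EntityState> stateHistory;        // filled in step (server 1)
    double                  lastSyncedStateTime = 0.0;
    bool                    missedLastTick = false;
};

// Called on the regular sync tick for one client-controlled entity.
// Assumes the history holds at least the initial state S from Tstart.
SyncMessage BuildSyncMessage(SyncedEntity& e)
{
    SyncMessage msg;
    msg.entityId = e.entityId;

    const EntityState& latest = e.stateHistory.back();
    if (latest.serverTime <= e.lastSyncedStateTime) {
        // No UC was simulated since the last sync: say so instead of re-sending a stale state.
        e.missedLastTick = true;
        return msg;
    }

    msg.hasState = true;
    msg.state    = latest;                 // tagged with its actual ST, not the tick's ST
    e.lastSyncedStateTime = latest.serverTime;
    e.missedLastTick = false;
    return msg;
}

// Called right after a UC for this entity is simulated in step (server 1): if the
// previous tick had nothing to send, an update should go out immediately.
bool ShouldSyncImmediately(const SyncedEntity& e)
{
    return e.missedLastTick;
}
[/code]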
What are your thoughts on these solutions? Have I possibly missed something crucial in the basic network setup that makes the jitter problem more pronounced than usual? Which one would you pick? Have you perhaps used any of these in the past, or do you have good experience with some other solution?
Medal for the tenacity you displayed getting all the way down here; seriously, I adore you. Cheers ;]