
Client/Server sync - reasoning and specific techniques?

Started by May 06, 2020 05:25 AM
13 comments, last by hplus0603 4 years, 6 months ago

Hello, I'm working on networking for a 2D MMO and have researched enough for my system design thoughts to start coalescing. However, I'm having a hard time piecing together the various approaches to decide what my client/server synchronization should look like.

Some potentially relevant decisions (that could be changed, I'd appreciate a sanity check):

  1. I'm using a fixed timestep.
  2. I'm counting simulation ticks instead of using wall time.
  3. The client postdates “input” messages (with the server giving feedback if the date is too early or late).
  4. The client buffers received “NPC state” messages and processes ones from some amount earlier than its current tick.
  5. The client saves inputs and replays them over every received “player state” message from the server.

Many of these rely on the relation between the client tick and the server tick, but it's hard for me to conceptualize what the important points of that relationship are.

Reasoning questions:

  1. Must they always be “in sync”, as in, processing the same tick at the same real-world time? What would happen if the server sent the client its tick number during a handshake and the client just worked “in the past”, postdating its messages enough that the server saw them as valid?
  2. What's the benefit of pinging to find latency in ms? When should I do it, and what do I use it for?

On top of that conceptual understanding, I have some specific questions about the technical approach:

  1. I saw on hplus's site (http://www.mindcontrol.org/~hplus/time-offset.html) a method for using the wall time to establish sync. Are there more techniques that I should know about? Any good resources for these?
  2. I saw it mentioned in a post somewhere that the server's feedback on the client's postdate should be enough to maintain sync, but how do you know that the ticks are still in sync and you haven't just compensated for them being off?
  3. What if the client is frozen and doesn't process ticks for a long while, then comes back? How do you detect this, and do you need to re-establish sync using the initial method?

There's an important realization in simulation: it doesn't matter if the events happen at the exact same real-world time, as long as they happen in the correct order! This realization underpins what's called a “Lamport Clock,” which you can read about on Wikipedia.

Generally, games use systems a bit fancier than a plain Lamport clock, by advancing some kind of “goal clock” in sync with wall-clock time, so the raw Lamport algorithm isn't sufficient, but it's a good theoretical thing to understand. Also, games don't generally use raw vector clocks. Those solve a different problem. Again, however, the concept is useful to understand.
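To make the ordering idea concrete, here is a minimal sketch of a plain Lamport clock. It is only the textbook algorithm, not what a game would ship; the class and method names are made up for illustration.

```python
class LamportClock:
    """Logical clock: orders events without relying on wall time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event (sending a message counts as one)."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


client, server = LamportClock(), LamportClock()
stamp = client.tick()                  # client sends a message with this stamp
server.tick()
server.tick()                          # server handled two local events meanwhile
received_at = server.receive(stamp)    # always later than the send stamp
```

The receive event always gets a larger number than the send event, which is the only guarantee the plain algorithm gives.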

You don't need to separately “ping” to find the latency to the server; you can derive the network transmission and queuing time by inspecting timestamps of sent and received packets. Deriving the ping is useful for estimating how far in advance you need to send your command packets (if you use queued commands), and how far into the future to extrapolate states for other entities you receive (if you use extrapolation). It's also useful to display to the user, as an indication of their connection quality, to set gameplay experience expectations.
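As a rough sketch of deriving latency from packet timestamps, assume each server packet echoes three values: the client's send time for its last message, the server's receive time for that message, and the server's send time for the reply. The field and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimingEcho:
    client_send_ms: float  # client clock when it sent its last message (echoed back)
    server_recv_ms: float  # server clock when that message arrived
    server_send_ms: float  # server clock when this reply was sent


def network_rtt_ms(echo: TimingEcho, client_now_ms: float) -> float:
    """Round-trip time on the wire, excluding time the server held the message."""
    total = client_now_ms - echo.client_send_ms
    server_hold = echo.server_send_ms - echo.server_recv_ms
    return total - server_hold


# The two clocks never need to agree: each difference uses one clock only.
echo = TimingEcho(client_send_ms=1000.0, server_recv_ms=50_000.0, server_send_ms=50_020.0)
rtt = network_rtt_ms(echo, client_now_ms=1100.0)
```

Because each subtraction stays on a single machine's clock, no clock synchronization is needed to get the round-trip figure.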

For frozen clients that wake up (this happens with mobile network disconnects, suspended laptops, and such), you typically want a system where the client detects that the wall clock has jumped too much at once (say, more than 10 seconds) and automatically disconnects itself. You similarly want a system that detects a client being gone for too long (again, say 10 seconds) and disconnects it. Whether you can tolerate 100 milliseconds or 1 second or 10 seconds or 10 minutes of outage depends on your specific gameplay. If the disconnect lasts too long, your session cookie/authorization token may even have expired, so the client needs to be prepared for being rejected and display a reasonable error message to the user.
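A minimal client-side sketch of the wall-clock jump check might look like this. The 10-second threshold is just the example figure from above, and `StallDetector` is a made-up name; call `check()` once per frame or tick.

```python
import time

MAX_STALL_SECONDS = 10.0  # example threshold; tune for your gameplay


class StallDetector:
    """Flags a gap between consecutive checks that is larger than max_stall."""

    def __init__(self, now=None, max_stall=MAX_STALL_SECONDS):
        self.max_stall = max_stall
        # time.time() is wall-clock time, which keeps moving while a laptop sleeps.
        self.last = time.time() if now is None else now

    def check(self, now=None) -> bool:
        """Return True if too much wall time passed since the previous check."""
        now = time.time() if now is None else now
        stalled = (now - self.last) > self.max_stall
        self.last = now
        return stalled


detector = StallDetector(now=0.0)
normal_frame = detector.check(now=0.016)   # an ordinary frame gap
after_sleep = detector.check(now=12.0)     # the process was frozen for ~12 seconds
```

When `check()` returns True, the client would drop the connection and go through its normal reconnect path rather than try to catch up.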

enum Bool { True, False, FileNotFound };

@hplus0603 Thanks for the info, really helpful!

I'm about to add timestamps into my messages, and I have some additional questions:

  1. What is the recommended data to put over the wire when it comes to establishing and maintaining sync? I've seen wall time (both sides sending "S is my current clock, and your last message was received timestamped your clock Y and my clock C"), simulation tick number, and server-calculated tick offset adjustment all as independent approaches, and I'm having trouble understanding which are alternatives to each other and which complement each other.
  2. When the client first receives the simulation tick number of the server, should it attempt to calculate what tick the server is currently on? (add the server's processing time and ½RTT) Or should it just run with whatever number it gets and account for the difference through the offset?
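For reference, the first option in question 2 (estimating the server's current tick from the received one) could be sketched like this. It assumes the one-way delay is half the RTT and ignores server processing time; the function name and parameters are made up for illustration.

```python
def estimate_server_tick(received_tick: int, rtt_seconds: float, tick_rate_hz: float) -> int:
    """Guess the tick the server is on now, given a tick number that took
    roughly half a round trip to arrive (assumes a symmetric route)."""
    one_way_seconds = rtt_seconds / 2.0
    return received_tick + round(one_way_seconds * tick_rate_hz)


# 120 ms RTT at 20 ticks per second: about 1.2 ticks elapsed in flight.
estimated = estimate_server_tick(received_tick=1000, rtt_seconds=0.12, tick_rate_hz=20.0)
```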

(For any future people following along, here are the posts that I'm currently considering):

https://www.gamedev.net/forums/topic/609269-synchronizing-server-and-client-time/4854213/?page=3

https://www.gamedev.net/forums/topic/704579-need-help-understanding-tick-sync-tick-offset/5416883/

https://www.gamedev.net/forums/topic/609269-synchronizing-server-and-client-time/4854092/?page=3

@Archduke “What is the recommended data to put over the wire when it comes to establishing and maintaining sync? ”

Five, duh!

Whoa, Lamport Clock!!! nu uh!

None

Lamport Clock: Yeah, uh!

Data in the packets: the minimum amount of data you need to include is “the server tick number was X when this packet was sent.” (You can even compress that down to fewer bytes for most updates, if you send a baseline checkpoint every once in a while. On the client, when the small (say, one-byte) value wraps over, you just assume that the rest of the clock value has incremented by one. The savings here are maybe not always worth it.)
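A small sketch of the wrap-over handling on the client, assuming a one-byte truncated tick, ticks that only move forward, and fewer than 256 ticks between received updates (`expand_tick` is a made-up name):

```python
def expand_tick(last_full_tick: int, low_byte: int) -> int:
    """Rebuild a full tick number from its low 8 bits and the last known full tick."""
    candidate = (last_full_tick & ~0xFF) | low_byte
    if candidate < last_full_tick:
        # The low byte wrapped over, so the rest of the value advanced by one.
        candidate += 0x100
    return candidate


no_wrap = expand_tick(1000, 1005 & 0xFF)   # low byte went from 0xE8 to 0xED
wrapped = expand_tick(1020, 1030 & 0xFF)   # low byte went from 0xFC to 0x06
```

If more than 255 ticks can pass between updates, the guess is wrong, which is why the occasional full baseline matters.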

If you want to be fancy, you can also include other pieces of data: The server-clock-time when the packet was sent, the server-clock-time when your last message was received, and the server-timestamp-time when your last message was received. These will let you calculate whether you're early/late for the server, how much of your “ping” time is server processing delay versus network, and so on. Exactly whether this is needed or not is up to you.

The somewhat-fancy version of this would be four bytes:
Byte 1: if bit 0x80 is set, this is followed by N bytes (4 or 8) that include the full server tick number; else it is the lowest 7 bits of the current server tick number. If it “wraps over” to a much smaller number than you last saw, this means the server clock incremented by 128 and then set the lower bits to that value.
Byte 2: if the value is 0x80, this is followed by N bytes (4 or 8) that include the full delta between your-tick-as-sent, and what-the-server-thought it should be. Else, this byte is a signed int8 which measures the delta from your tick number as sent and the tick number the server needed. Positive numbers mean you can send packets later; negative numbers mean your tick arrived too late.
Byte 3 and 4: If bit 0x8000 is set, this is followed by N bytes (typically 4 or 8) of the server-clock value, truncated to something like 1/10th of a millisecond units. Else, the low 15 bits of this value are the low 15 bits of the server-clock (at 1/10th millisecond units) at time of packet send.

The server would start by sending the full values when you first connect, until it's gotten a few good packets from you, and then move over to the smaller delta encoding, perhaps including the full values every 200-1000 packets or so, just to make sure you can assert out on the client if you lose sync.
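Here is one way the header described above might be decoded. The layout details are assumptions on my part: N is fixed at 4 bytes, values are big-endian, and each full value directly follows the byte(s) that announce it. The type and function names are invented for the sketch.

```python
import struct
from typing import NamedTuple


class Field(NamedTuple):
    is_full: bool  # True if the full value was sent, False for the short form
    value: int


class SyncHeader(NamedTuple):
    tick: Field    # server tick (full, or low 7 bits)
    delta: Field   # client tick error (full, or signed int8)
    clock: Field   # server clock in 0.1 ms units (full, or low 15 bits)


def decode_sync_header(buf: bytes) -> SyncHeader:
    pos = 0

    def read(fmt: str) -> int:
        nonlocal pos
        (value,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return value

    b1 = read(">B")
    tick = Field(True, read(">I")) if b1 & 0x80 else Field(False, b1 & 0x7F)

    b2 = read(">B")
    if b2 == 0x80:                      # escape value: full signed delta follows
        delta = Field(True, read(">i"))
    else:                               # reinterpret the byte as a signed int8
        delta = Field(False, b2 - 256 if b2 > 127 else b2)

    w = read(">H")
    clock = Field(True, read(">I")) if w & 0x8000 else Field(False, w & 0x7FFF)
    return SyncHeader(tick, delta, clock)


short_form = decode_sync_header(bytes([0x05, 0xFE, 0x12, 0x34]))
full_form = decode_sync_header(
    bytes([0x80]) + struct.pack(">I", 100_000)
    + bytes([0x80]) + struct.pack(">i", -300)
    + struct.pack(">H", 0x8000) + struct.pack(">I", 123_456)
)
```

The short form costs four bytes per packet; the full form is what the server would send on connect and at the occasional checkpoint.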

enum Bool { True, False, FileNotFound };

Alright, I'm back with more questions! I'm quickly learning just how much systems architecture design has to go into networking code to keep it from becoming an unmaintainable mess.

So, I'm sending messages from the client → server. These messages contain a tick number that is the tick that the client was on when the message was sent + the offset that the client is maintaining to keep messages arriving ahead of the tick that the server is processing. I then have the server looking at the received tick number and telling the client if it needs to be adjusted (currently targeting receiving messages 3 ticks before they're processed).
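In sketch form, the server-side check I'm describing is roughly the following. The sign convention is my own assumption: the returned value is meant to be added to the client's offset, and the names are placeholders.

```python
TARGET_LEAD_TICKS = 3  # want messages to arrive 3 ticks before they're processed


def offset_adjustment(message_tick: int, server_tick: int,
                      target_lead: int = TARGET_LEAD_TICKS) -> int:
    """How much the client should add to its offset so that its messages
    arrive target_lead ticks ahead of the tick the server is processing."""
    lead = message_tick - server_tick
    return target_lead - lead


too_early = offset_adjustment(message_tick=105, server_tick=100)  # 5 ticks ahead
too_late = offset_adjustment(message_tick=101, server_tick=100)   # 1 tick ahead
on_target = offset_adjustment(message_tick=103, server_tick=100)
```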

The issue is in double-sending adjustments, such as in this scenario:

Client sends input update
Server receives input update
Server sends state update including adjustment
Client sends input update
Client receives adjustment
Server receives input update
Server sends state update including (same) adjustment

As you can see, the server sent the same adjustment twice. This causes overshoot, where the adjustment is ping-ponging between -1 and 1 instead of settling at 0. Are there any elegant ways to solve this?

The only thing I've been able to come up with is adding the wall-time timestamp triple to every message ("time you sent last message, time I received last message, time I sent this message") so that I can calculate RTT, and using that to set a “grace period” on sending adjustments.


Yes, the “adjust twice” problem is a well known gotcha! (I don't recall if I mentioned it in the above, but in other threads, it comes up.)

The simplest solution to that is to include a measure of your current amount of adjustment in the packets you send up, and then include that back again when the server sends it back down.

So, the client will send “this is for tick A, and my current adjustment is B”

The client will then send “this is for tick A+1, and my current adjustment is B”

The server will send back “when your adjustment was B, your packet was off by an amount C.”

The client will adjust clock and send “this is for tick A+2, and my current adjustment is C”

Then the server will send back “when your adjustment was B, your packet was off by an amount D”

Because the client's current adjustment when it receives that packet is not B, it will ignore that adjustment.
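A client-side sketch of that exchange, under the assumption that the “adjustment” the client reports is simply its current tick offset and that the server's “off by” amount gets added to it (class and field names are invented here):

```python
class ClientOffset:
    """Tracks the client's tick offset and rejects stale server feedback."""

    def __init__(self, offset: int = 0) -> None:
        self.offset = offset

    def outgoing(self, client_tick: int) -> dict:
        """Header fields for an input message: target tick plus current adjustment."""
        return {"tick": client_tick + self.offset, "adjustment": self.offset}

    def on_feedback(self, echoed_adjustment: int, off_by: int) -> bool:
        """Apply feedback only if it was measured against the current adjustment."""
        if echoed_adjustment != self.offset:
            return False          # feedback for an older adjustment: ignore it
        self.offset += off_by
        return True


client = ClientOffset(offset=4)
first = client.outgoing(150)                                     # tick 154, adjustment 4
applied = client.on_feedback(echoed_adjustment=4, off_by=-1)     # offset becomes 3
duplicate = client.on_feedback(echoed_adjustment=4, off_by=-1)   # stale, ignored
```

One caveat with echoing the raw value: if the offset ever returns to an earlier value, old feedback could look current again. A separate counter that the client bumps on every accepted adjustment avoids that ambiguity.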

enum Bool { True, False, FileNotFound };

hplus0603 said:

Yes, the “adjust twice” problem is a well known gotcha! (I don't recall if I mentioned it in the above, but in other threads, it comes up.)

Ah, of course! I didn't remember seeing it elsewhere, but sure enough I found this post from last year https://www.gamedev.net/forums/topic/704579-need-help-understanding-tick-sync-tick-offset/5416883/

If I was to use a “tick generation” like in that post, is there a reason to start it at the client and loop it back instead of the server managing it and only sending it in the one direction? It seems like that would be sufficient for avoiding this specific problem, but I might not be considering something else.

Found the answer: the client needs to be the one that increments the generation counter, since the server receives a delayed view of when an adjustment has been accepted! Thus, the client needs to loop the current generation through the server.

I finally understand enough to grok the final problem people have been running into in all the threads I've been going through. Big problem, though.

The illustration below shows a client and server at tick 150 (already connected). The client is sending messages containing data and the tickNum that it should be processed on (currentTick + offset). The server then tells it to adjust the offset -1:

The server ends up having two messages for the same tick:

Both sides have enough data for their sims to proceed as normal, it's not like we hit a lag spike or something, but this shifting of tick number relative to each other seems like it forces us to drop a message, or process 2 messages in 1 tick (desync due to holding inputs for different amounts of time on client and server). The opposite issue happens with adjustment in the positive direction.

From reading through threads, it seems like the solution is some sort of speed-up or slow-down of the client's sim. That makes sense to me with other synchronization approaches, but this approach specifically has the offset as the thing that's moving around, with the client's actual sim proceeding as normal. I'm not sure if I'm failing to understand some aspect of this approach, or if it just doesn't work as I thought.

This topic is closed to new replies.
