
How Best to Handle Snapshot Deltas and Serialization?

Started by Karatakos, May 21, 2024 06:21 AM
7 comments, last by frob 5 months ago

I want to delta-compress snapshots before sending them over the wire unreliably (UDP) for a client/server game. The goal is to serialize and compress an array of components into a snapshot struct, generate a delta between two snapshots, apply that delta on the other end, and finally interpolate between any given two snapshots on the client. This probably sounds familiar to most of you.

tl;dr: there are two questions embedded in this post: 1. Should snapshots be stored in the snapshot buffer in binary or deserialized form? 2. What libraries or mechanisms are recommended for generating a delta and applying interpolation generically, or at all?

A few naive approaches to this, and some concerns:

  1. If snapshots are stored in the snapshot buffer in a serialized state, i.e. as byte arrays, I can use FossilDelta or a similar lib to generate and apply the deltas directly in binary, and a fast, simple-to-use lib such as MemoryPack to serialize (see the sketch after this list). However, it means deserializing snapshot binary data when the client is ready to interpolate, or else building something that can interpolate without deserializing via reflection, which sounds like overkill. It also doesn't work completely as-is, because I need some data from the snapshot on reception, such as the server tick. This still seems like the best solution if it can work.
  2. If the snapshot buffer maintains an array of deserialized snapshots, then interpolation is no longer a concern, but now I have to serialize snapshots before generating or applying deltas, which is obviously wasteful, as would be storing both binary and object versions of a snapshot.
  3. Instead of using MemoryPack and FossilDelta, I could use something like the NetCode lib, whose BitWriter allows you to serialize (and deserialize) directly against a baseline object. I would no longer need the FossilDelta lib, since NetCode's BitWriter supports delta compression during serialization. This would give me the best of both worlds, but I would now have to write all the serialization code by hand using the BitWriter for every class and struct I want serialized.
  4. Another approach may be to use FlatBuffers via FlatSharp and the corresponding C# delta lib, FlatSharpDelta, which solves for #2 since it provides accessors I could use to iterate a snapshot's components and data for client-side interpolation without deserializing. However, this means I now have to manage a separate FlatBuffers schema and effectively duplicate component structs just for the sake of serialization. Accessors are cool, but I don't need schema versioning anyway, so it feels like overkill.
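
For concreteness, here is roughly what I mean for approach 1: serialize the snapshot body with MemoryPack (assuming its MemoryPackSerializer.Serialize/Deserialize API) and prefix a tiny plain header containing the server tick, so the tick can be read on reception without deserializing the body; the delta lib would then operate on the packed byte arrays. The Snapshot/ComponentState types below are made up for illustration:

    using System;
    using System.Buffers.Binary;
    using MemoryPack;

    [MemoryPackable]
    public partial class ComponentState
    {
        public int EntityId { get; set; }
        public float X { get; set; }
        public float Y { get; set; }
    }

    [MemoryPackable]
    public partial class Snapshot
    {
        public uint ServerTick { get; set; }
        public ComponentState[] Components { get; set; } = Array.Empty<ComponentState>();
    }

    public static class SnapshotCodec
    {
        // 4-byte tick header + MemoryPack body; the header is readable without deserializing.
        public static byte[] Pack(Snapshot snapshot)
        {
            byte[] body = MemoryPackSerializer.Serialize(snapshot);
            byte[] packet = new byte[4 + body.Length];
            BinaryPrimitives.WriteUInt32LittleEndian(packet, snapshot.ServerTick);
            body.CopyTo(packet, 4);
            return packet;
        }

        public static uint PeekTick(ReadOnlySpan<byte> packet) =>
            BinaryPrimitives.ReadUInt32LittleEndian(packet);

        public static Snapshot Unpack(byte[] packet) =>
            MemoryPackSerializer.Deserialize<Snapshot>(packet.AsSpan(4))!;
    }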

I also need to send reliable data in the form of larger nested object arrays, and so ideally, I can stick to a single serialization library.

In general there's no "best" and no "should"; there are a bunch of tradeoffs and decisions that work in some scenarios and not well in others. You might look at refining your questions a bit after reading this.

So, focusing on the questions you did ask:

Sending snapshot differences can work well for some types of data and not for others. To understand your questions a little better: what are you synchronizing? Do you need complete synchronization in lockstep, roughly the equivalent of transmitting the state of a board game on turn 1, then turn 2, then turn 3? Or were you considering the view-dependent or context-dependent synchronization common in FPS games, where the position of a creature about to attack directly in front of you, or standing in the sniper crosshairs, is updated with much higher priority than something standing outside the visible area of the map? Both have ways to stay in sync, but the process they follow is often quite different.

Assuming you really do need to work with uniform snapshots rather than continuous prioritized updates, there are a variety of approaches. One is to create blocks of data, whether that's kilobytes of array space or perhaps small individual memory-mapped files, take a binary difference of the snapshot, compress it, and send that over the wire for restoration on the other side. For some scenarios it works great. If the size of the encoding is an issue (it may or may not be, depending on your game's details), then reducing it with tools like bit writers or protobuf comes with the benefits and drawbacks you suggested: you're sending the minimum number of bits to encode specific things, but doing significant processing work to minimize bandwidth. For some scenarios that's an important and necessary tradeoff, and there's a full spectrum of options in between. To understand it: what are the tradeoffs you consider important? How much data are you synchronizing, how much of it is different, and how often are you synchronizing it? Packing data down to 200 bytes of differences sent 20 times per second is quite different from 200 KB of differences sent 20 times per second, and different again from 200 KB of differences synchronized every 20 seconds. What are the details of the "larger nested object arrays" you're updating? Do they need to be updated all at once, or can they be incrementally improved? Think in terms of streamed video, which has progressive refinement when few things are moving but blurs when many things are moving. Are there tradeoffs like that which might fit your model?
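
As a toy illustration of that binary-difference idea (nothing library-specific; it assumes fixed-size blocks and that both sides agree on which baseline snapshot the diff was taken against):

    // XOR two equal-length snapshot blocks byte-by-byte; unchanged regions become runs of
    // zeros that a general-purpose compressor squeezes down well. Applying the same XOR
    // against the baseline restores the new snapshot on the receiving side.
    static byte[] CreateDelta(byte[] baseline, byte[] current)
    {
        // Assumes baseline.Length == current.Length (fixed-size blocks).
        var delta = new byte[current.Length];
        for (int i = 0; i < current.Length; i++)
            delta[i] = (byte)(baseline[i] ^ current[i]);
        return delta; // compress this before sending
    }

    static byte[] ApplyDelta(byte[] baseline, byte[] delta)
    {
        var restored = new byte[delta.Length];
        for (int i = 0; i < delta.Length; i++)
            restored[i] = (byte)(baseline[i] ^ delta[i]);
        return restored;
    }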

The questions about UDP and unreliable transport make me think you're working with ideas from 1994 rather than 2024. It is rare for developers to work at Transport-level communications like UDP, where there is no data security or encryption, there is difficulty crossing IPv4 and IPv6 networks, there is exposure to DDoS attacks, and more. Certainly you can implement those details, library developers have to work with them, and there is nothing wrong with programming UDP sockets if that's what you want to learn, but that's not where most games operate these days. Instead, games today generally operate at a Session layer, relying on libraries to establish sessions between machines securely, and the developers simply queue content to go over the session. The way you talk about "unreliable" also suggests you've got more to learn: just because the UDP protocol is unreliable (packets may be lost, reordered, and duplicated) doesn't mean that everything using the protocol must be unreliable; it's a quite common pattern to implement reliable data channels over UDP. If you're building snapshots that are deltas from previous states, you're going to have to implement some degree of reliability so you know the prior state is present on the other side. What are you considering for data transport? What's your reason for wanting to work at the UDP Transport layer?

You mention C# and C# libraries, why? Are you working within Unity, or building something of your own on .NET? Is this about limited familiarity with other tools and ecosystems, or constraints that you have?


Thanks for the detailed follow-up. I tried to keep my post relatively brief and so skipped some context that should address your callouts:

1. The game is a 2D top-down action game with up to 5 players (4v1) on a small, procedurally generated map, hence the need to transmit what I expect to be small packets at high frequency, e.g. ≤ 1400 bytes 20 times per second. I will transmit only what is in view for any given player, e.g. within an x-tile radius, and only the delta since the last snapshot. I expect data such as position/velocity, state, etc., from the server. Only inputs will be sent from the client.

2. I'm aware of which layer of the OSI model I need to be focusing on, and no, I'm not building a low-level protocol over UDP from scratch. For now I'm leveraging LiteNetLib, a library that provides an API abstraction over a UDP socket for connections and for reliable and unreliable sending of byte arrays. Nor am I looking to reinvent the wheel for serialization, hence the mention in my post of libraries such as MemoryPack, FlatBuffers, or simply a BitWriter.


3. It's 2024, but assuming one is not trying to reinvent the wheel and build a UDP protocol from the ground up, UDP is still the way to go, since I do not care about packet loss; speed is a priority over reliability. I'm working with MonoGame (C#), not Unity.

4. Finally, regarding the 'reliable' data I will be sending: since it's a procedural map, the server is responsible for sending said map (tile map) to clients after the initial connection. If this were hypothetically sent as uncompressed JSON strings, it would be around 30 KB per packet over 3 or 4 packets. Snapshots are only for game state and are sent unreliably. In retrospect this point could have been omitted from my post; I mentioned it only to drive home that a one-size-fits-both (map data and delta snapshots) solution for serialization would be the preference where possible.

I hope that adds some clarity. For the sake of a discussion focused on the snapshot delta implementation specifically, as opposed to abstract design or the why, it's more productive if we assume that snapshot deltas are the system that matches the brief. I believe the four potential approaches I mentioned line up with my needs and are somewhat viable, so the intent of the post was really to get feedback from someone more knowledgeable in this area, especially where there may be some idiomatic way of approaching this that I'm not seeing.

Okay, I've used that before on a couple of professional projects.

I'd start with #4 first then, and by extension #3. Your library already supports this, so likely you just need to learn how to use it. Just because you're using UDP doesn't mean messages need to be unreliable. In fact, it's a fairly common exercise in undergraduate network programming courses to implement a reliable channel over a lossy, unreliable virtual network. There is zero reason to ever have an unreliable message if you don't want it.

Your library's send calls take a DeliveryMethod flag: Unreliable, ReliableUnordered, Sequenced, ReliableOrdered, and ReliableSequenced. You can use ReliableOrdered to effectively turn the channel into a perfect sequential I/O stream. That means any discussion about "unreliable", and even the use of the letters "UDP", becomes moot: you can go as deep as you want discussing it, but it is ultimately meaningless in practice. Just use your library properly and you've got fully reliable, sequential communications.
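
For example, with the library you mentioned it is one flag on the send call (a sketch, assuming an already-connected NetPeer and a payload you've already serialized):

    // Same payload, different DeliveryMethod. One flag decides whether this is
    // fire-and-forget or part of a lossless, in-order stream.
    static void SendSnapshot(LiteNetLib.NetPeer peer, byte[] snapshotBytes, bool reliable)
    {
        peer.Send(snapshotBytes, reliable ? LiteNetLib.DeliveryMethod.ReliableOrdered
                                          : LiteNetLib.DeliveryMethod.Unreliable);
    }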

Next, you write that you're sending under 1400 bytes 20 times per second, so 28 kB/s max, and up to 5 players, so 140 kB/s for all players. Why are you bothering to do anything with it? You could serialize the entire state without any delta encoding or compression, mark it ReliableOrdered, and still be below most FPS games, which are often around 30-50 kB/s these days.

If that 28 kB/s limit is too big, I'd run your serialized data through the .NET compression library in System.IO.Compression (MemoryStream lives in System.IO). From a few seconds on Google, the two directions are roughly:

    // Compress a byte[] with Deflate. Disposing the DeflateStream flushes any
    // buffered compressed bytes into `output` before ToArray() is called.
    static byte[] Compress(byte[] data)
    {
        MemoryStream output = new MemoryStream();
        using (DeflateStream dstream = new DeflateStream(output, CompressionLevel.Optimal))
        {
            dstream.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }

and

    // Decompress Deflate-compressed data back into the original byte[].
    static byte[] Decompress(byte[] data)
    {
        MemoryStream input = new MemoryStream(data);
        MemoryStream output = new MemoryStream();
        using (DeflateStream dstream = new DeflateStream(input, CompressionMode.Decompress))
        {
            dstream.CopyTo(output);
        }
        return output.ToArray();
    }

While the details will depend on your actual data, for reasonably large data streams Deflate often gets you to around 20% of the original size or better, which would bring you below 6 kB/s without doing any significant work.

You also mention JSON a few times: don't use it. It was designed for a completely different purpose, and it artificially inflates your communication size by orders of magnitude. There is no reason to convert your data into text; just use a NetDataReader and NetDataWriter to work with byte arrays.
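
Something along these lines, as a sketch with made-up fields (NetDataWriter and NetDataReader are in LiteNetLib.Utils):

    using LiteNetLib.Utils;

    public static class EntityCodec
    {
        // Write a (hypothetical) entity update as raw binary instead of JSON text.
        public static NetDataWriter Write(int entityId, float x, float y, ushort hp)
        {
            var writer = new NetDataWriter();
            writer.Put(entityId);
            writer.Put(x);
            writer.Put(y);
            writer.Put(hp);
            return writer; // hand this to peer.Send(writer, deliveryMethod)
        }

        // Read the fields back on the other side, in the same order they were written.
        public static (int Id, float X, float Y, ushort Hp) Read(NetDataReader reader) =>
            (reader.GetInt(), reader.GetFloat(), reader.GetFloat(), reader.GetUShort());
    }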

Thanks for the feedback, much appreciated.

Re delta compression: am I on the right track, in your opinion, by focusing on a binary diff between two serialized snapshots (see the FossilDelta lib referenced in my initial post), or is there a more appropriate or common way of calculating the snapshot delta? I'm a little hung up on this point, as it seems difficult to find information on how this is commonly done in modern games in modern languages such as C# without turning to runtime reflection.

I think you have difficulty finding it because it is relatively rare in games.

Yes, differences are sent, but it is usually less about synchronized world state and more about the degree of staleness and relative differences. Different types of data have different needs: some are high priority and others lower priority. Some must be exact like score values and others may be quantized like world positions rounded to centimeters to save bits.
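
As a side note, quantization can be as simple as something like this, where the range and precision are whatever your game can tolerate (the numbers here are purely illustrative):

    // Lossy quantization: a coordinate known to fit in [0, 655.35) meters can be rounded
    // to whole centimeters and sent as a ushort (16 bits) instead of a 32-bit float.
    static ushort QuantizePosition(float meters) => (ushort)System.MathF.Round(meters * 100f);

    static float DequantizePosition(ushort centimeters) => centimeters / 100f;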

The flow of data in a shooter is different from the flow in a brawler, and both are different from a JRPG or a board game. Large worlds require significant effort around interest management and location-specific updates, effort that doesn't exist for a brawler level, for example.

Keeping a synchronized world state like you described is a better fit for board games, anything turn-based, and the data tables in JRPG-style games or old-school tile-based real-time strategy games, and it is less appropriate for an action game. Action games are usually about sending events and the scale of events, with rare periodic updates for the few things that were modified outside of the event-driven changes.


Thanks again for the reply; this is a bit of a delayed follow-up.

I perhaps have not been clear enough, but the approach I'm taking is quite common, i.e. Quake 3 state snapshots. I'm not syncing the whole world state, just a diff of the state for every connected client based on what they see, e.g. sending all updates to client A for entities within a 5-tile radius.

Re: the original question of how best to compute and apply the delta, I tested delta size with a few approaches:

Given two snapshots A and B, both containing a Dictionary<Entity, List<Components>>:

1. Serialize the whole dictionary via BitWriter, diff the two serialized snapshot byte[] outputs via XOR, and compress the output with LZ4.

2. As above, but compute the diff with the Fossil algorithm.

3. Serialize components individually first, iterating over and XOR'ing each. Write each component's byte[] for an entity, as well as the entity ID, to a delta byte[] only if there is a diff.

What I learned:

1. XOR performs well, producing a small output when RLE- or LZ4-compressed, but once the collections change, e.g. adding/removing an entity or an entity's component, the delta can end up as large (after LZ4 compression) as the original snapshot. This is a real problem.
2. Taking the collections out of the equation and XOR'ing the individual components (test #3) didn't help with the above issue. This was a surprise, since I figured the initial issue was the effect of XOR'ing entity/component collections defeating the delta, given that a single component add/remove would offset bytes in the byte[]. This was also a pain to implement vs. just XOR'ing the whole snapshot byte[]s.

3. Alongside the above issues, and as per my original concern, to create or apply the delta via XOR both the client and server snapshot ring buffers would need to hold the byte[] as well as the deserialized snapshot, for the purpose of interpolation on the client (and there are likely use cases for peeking at component data in any given snapshot on the server too). This seems like a big memory concern (doubling the ring buffer size).

Unless there is a better way of handling this via the XOR/binary-diff approach, the only solution I can think of now is to give components SerializeDelta and DeserializeDelta methods that take a baseline of the same component type as a parameter and only serialize changed fields (1 bit for changed, 0 for unchanged), roughly as sketched below.
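
Roughly what I have in mind, as a hand-written sketch with a made-up PositionComponent and a plain BinaryWriter standing in for whatever bit writer I end up using:

    using System.IO;

    // One bit per field in a change mask: 1 = the field's new value follows, 0 = unchanged.
    public struct PositionComponent
    {
        public float X, Y, VelX, VelY;

        public void SerializeDelta(BinaryWriter writer, in PositionComponent baseline)
        {
            byte mask = 0;
            if (X != baseline.X)       mask |= (byte)(1 << 0);
            if (Y != baseline.Y)       mask |= (byte)(1 << 1);
            if (VelX != baseline.VelX) mask |= (byte)(1 << 2);
            if (VelY != baseline.VelY) mask |= (byte)(1 << 3);

            writer.Write(mask);
            if ((mask & (1 << 0)) != 0) writer.Write(X);
            if ((mask & (1 << 1)) != 0) writer.Write(Y);
            if ((mask & (1 << 2)) != 0) writer.Write(VelX);
            if ((mask & (1 << 3)) != 0) writer.Write(VelY);
        }

        // Reconstructs the new component from the baseline plus whichever fields were sent.
        public static PositionComponent DeserializeDelta(BinaryReader reader, in PositionComponent baseline)
        {
            PositionComponent result = baseline;
            byte mask = reader.ReadByte();
            if ((mask & (1 << 0)) != 0) result.X = reader.ReadSingle();
            if ((mask & (1 << 1)) != 0) result.Y = reader.ReadSingle();
            if ((mask & (1 << 2)) != 0) result.VelX = reader.ReadSingle();
            if ((mask & (1 << 3)) != 0) result.VelY = reader.ReadSingle();
            return result;
        }
    }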

Honestly, I find the XOR method to be much less of a hassle, and it lets me plug in any serializer if I'm not bothered about quantization and bit packing (or the serializer provides it). It's much cleaner, but the memory concern above is significant.

Anyway, I may post this separately but a related question:

Edit: Posted here https://www.gamedev.net/forums/topic/717066-should-an/

I understand the delta should only contain changes; that makes sense.

However, if a snapshot delta no longer contains some component x, how does the client know whether it's because there is no state change or because the server removed the component? The same goes for entities: an entity may simply have no state change over the last few ticks, or it may have been destroyed on the server, perhaps killed while we missed that state update.

Now, of course, I could use a separate command system to relay key entity or component events, e.g. entity x destroyed/spawned, etc., but that seems like overkill if it could be handled implicitly via state updates. I could also send all in-view entity IDs and component types in any given snapshot delta, even if the corresponding components are zeroed out (no change); see the sketch after the examples below.

Examples:

P2, standing in view of P1, goes AFK for a few seconds, so the last x snapshots for P1 didn't contain the P2 entity. P1 doesn't know whether P2 is dead, and so its entity needs destroying, or whether it's actually just doing nothing. The reverse is also true, i.e. P2 could have been killed and the server destroyed that entity, but P1 missed the state update that indicated the death.
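
To illustrate that second option, each delta could carry the full list of in-view entity IDs as a small header, so an ID missing from the list reads as "removed" rather than "unchanged". A sketch with made-up helpers:

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    public static class Presence
    {
        // Server side: write the set of in-view entity IDs into every delta so absence is meaningful.
        public static void WriteHeader(BinaryWriter writer, IReadOnlyCollection<int> inViewEntityIds)
        {
            writer.Write((ushort)inViewEntityIds.Count);
            foreach (int id in inViewEntityIds)
                writer.Write(id);
            // ...component deltas for changed entities follow...
        }

        // Client side: anything we were tracking that is no longer listed was destroyed/removed;
        // anything listed but carrying no changed component data is simply unchanged.
        public static IEnumerable<int> FindRemoved(IEnumerable<int> trackedIds, HashSet<int> idsInDelta) =>
            trackedIds.Where(id => !idsInDelta.Contains(id));
    }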


Karatakos said:
I perhaps have not been clear enough, but the approach I’m taking is quite common, i.e. Quake 3 state snapshots.

The approach WAS quite common in the late '90s, 25 years ago, yes. It was also an era when 4 players was hard and 16 players was extreme and AAA, and a "high bandwidth" connection was 250 Kbps.

While still sometimes used, it largely fell out of favor for action and shooter games about 20 years ago, and engines like UE3 followed a different approach.

The model changed about that point in the early 2000s: no longer were games running lockstep simulations; instead, every game client simulated a subset of the world and periodically synced up the minimal content it needed. Machine memory ballooned from 64 MB to 512 MB, even 1 GB. Dual-core and quad-core machines appeared. Dialup vanished and "broadband" became typical (though consumers still don't understand the word), and games adapted to leverage the new technology. The size of game worlds EXPLODED: not just a few players and their weapons, but full open worlds.

Karatakos said:
Given two snapshots A and B, both containing a Dictionary<Entity, List<Components>>:

How big? Is it large enough to matter?

We're no longer living in 1999, when 250 Kbps DSL was pretty common, 1.5 Mbps was growing in popularity and affordability, and machines commonly had 16 MB or 32 MB of main memory.

All this talk about compressing and encoding is meaningless unless you also talk about actual sizes.

Many gamers are at 300Mbps bandwidth, some are at 1 gigabit, 2 gigabit, and services like Google Fiber offer 8 Gbps to home users in some cities.

Consumer games these days are often in the 30-50 kB/s (roughly 500 kbps) range, some at 1 Mbps, and that's often pushing the limit of meaningful data the game has to send, not the player's bandwidth. Game studios don't need to hire experts to minimize the number of bits traveling the wire any more; most games can send all the data they want and more besides.

