Data serialization and datagram sizes

Started by March 26, 2016 05:40 PM
8 comments, last by hplus0603 8 years, 7 months ago

Hello. I am new to multiplayer networking and I have two basic questions.

First a brief synopsis:

I have created a series of multiplayer game components in C#. They are

- Generic TCP/UDP client

- Generic TCP/UDP server

- Game server, matchmaking server, chat server

To get through my prototyping, I used JSON serialization with easy success. But I feel uneasy about JSON's ability to keep networking messages small. For example, I am developing a sports game with Unity as a prototype for the aforementioned components. The game supports up to 8 players, but there's not much data that needs to be sent: really just two Vector3s for each player (position and trajectory) and one for the game object (the ball). With JSON and a modest approach, I think I can keep the size of each message to around 1 KB. But I have no idea if this is a lot of data, or nothing at all.

The clients would send much less than that to the server, at least twice per second.

The server would send this ~1 KB message to each client, at least twice per second (so 16 KB/s from the server for 8 players).

Is this a lot?

If it is, I was interested in using Google's Protocol Buffers (protobuf), but it looks like using that with Unity is a bad idea. Are there any other techniques, or would I be OK with JSON? Or should I write my own? The other issue with JSON is that it's a lot of string parsing for the server (imagine 1000s of clients).

If you keep the size of the packet to 1 KB and the send rate to 2 packets per second, that's not so bad.
If you want low-latency reaction time for actions the user takes, though, you will find that 20-30 packets per second is more common.
And 30 packets per second, times 1 KB, to each of 8 players, means 240 KB/second of upstream data from your server for each game instance.
That starts feeling like a lot, although exactly how much "a lot" that is depends on your specific situation.
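To make that arithmetic easy to replay with your own numbers, here is a minimal sketch in Python. The function name is made up for illustration, and 1 KB is treated as 1000 bytes to match the round figures above.

```python
def server_upstream_bytes_per_second(packet_bytes, packets_per_second, players):
    """Bytes per second the server sends for one game instance."""
    return packet_bytes * packets_per_second * players

KB = 1000  # assumption: 1 KB counted as 1000 bytes, as in the round numbers above

slow = server_upstream_bytes_per_second(1 * KB, 2, 8)
fast = server_upstream_bytes_per_second(1 * KB, 30, 8)

print(slow / KB, "KB/s at 2 packets per second")
print(fast / KB, "KB/s at 30 packets per second")
```

Multiply the result by the number of game instances you expect to host to get a rough figure for the whole server.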

That being said: JSON repeats the key names for each structure, which is generally totally unnecessary, and uses text serialization, which is in general 2x the size of binary serialization.
If you know that there are 8 players and 1 ball, and you need six floating point values for each, then the theoretical packet size is (4*6*(8+1)) == 216 bytes.
If you then start playing tricks with quantizing resolution and knowing the maximum range of each of the values, you can probably squeeze that into < 150 bytes.
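As a rough illustration of the 216-byte figure, here is a sketch using Python's standard struct module. The layout (six little-endian 32-bit floats per entity, written in a fixed order) and the function names are assumptions for the example, not a required format.

```python
import struct

# Assumed layout: position (x, y, z) and trajectory (x, y, z) per entity,
# each stored as a little-endian 32-bit float.
ENTITY_FORMAT = "<6f"
ENTITY_COUNT = 8 + 1  # 8 players and 1 ball

def pack_snapshot(entities):
    """entities: list of 9 tuples holding 6 floats each."""
    assert len(entities) == ENTITY_COUNT
    return b"".join(struct.pack(ENTITY_FORMAT, *e) for e in entities)

def unpack_snapshot(data):
    """Reverse of pack_snapshot: returns a list of 6-float tuples."""
    size = struct.calcsize(ENTITY_FORMAT)
    return [struct.unpack_from(ENTITY_FORMAT, data, i * size)
            for i in range(ENTITY_COUNT)]

snapshot = [(float(i), 0.0, 1.5, 0.25, 0.0, -0.5) for i in range(ENTITY_COUNT)]
packet = pack_snapshot(snapshot)
print(len(packet))  # 216
```

No key names or type markers are written; both ends simply agree on the order and size of the fields.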
enum Bool { True, False, FileNotFound };
If your data is that small and uncomplicated, I don't really see the benefit of JSON or some other complex custom protocol.

Secondly, 16 KB/s is tiny. The target bandwidth for the XONE TCR is, I believe, 256Kb/s or thereabouts (it used to be 64Kb/s on the old 360).

Thirdly, don't forget to take into account the packet overhead, which is quite chunky (around 48 bytes for UDP/IP) if you do fast updates or P2P (sending to multiple targets).

The MTU (maximum transmission unit, or basically, how big your packets can be before being fragmented or dropped) is typically somewhere between 1400 and 1500 bytes (1500 on plain Ethernet, a bit less over PPPoE or tunnels). A 1 KB packet target is fine.

And finally, you can compress (quantize) your vectors to optimise bandwidth usage, but that's only worth considering if it is actually a problem.
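For what quantizing a vector might look like, here is a minimal sketch in Python. The coordinate range of -100 to +100 and the 16 bits per component are assumptions picked for the example; you would choose them from what your game actually needs.

```python
import struct

def quantize(value, lo, hi, bits=16):
    """Map a float in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)

def dequantize(q, lo, hi, bits=16):
    """Map the integer back to an approximate float in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + (q / levels) * (hi - lo)

# Assumed playing field: every coordinate lies between -100 and +100 units.
LO, HI = -100.0, 100.0
position = (12.34, 0.0, -57.8)

# Three unsigned 16-bit values instead of three 32-bit floats.
packed = struct.pack("<3H", *(quantize(v, LO, HI) for v in position))
restored = [dequantize(q, LO, HI) for q in struct.unpack("<3H", packed)]

print(len(packed))  # 6 bytes instead of 12
print(restored)     # close to the original, within about 0.002 units
```

The price is precision: with these assumed numbers each component is only accurate to roughly 0.003 units, which may or may not be acceptable.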

This is from a C/C++ perspective, I'm not sure what C# can do to help you with serialisation / reflection and all that kind of jazz.

Everything is better with Metal.

Thanks. How can I learn more about this? What should I research?

About what part?

The first part is direct math. The numbers have changed over time but the math is the same.

Figure out the target speed of the connection. If people are on modems from the 1980s and 90s, that might be 1200 bps or 2400 bps, or roughly 120 or 240 characters per second once you account for internal communication controls. If they're on 56Kbps dialup, that's about 5000 characters per second. Or you may want to plan on 1Mbps or 10Mbps or 50Mbps or faster.

If you've got a 1KB block (roughly 10 kilobits on the wire, using the same ten-bits-per-character estimate) to communicate to 8 players, that gives 80 kilobits to send. If the player is expected to be on a 2400 bps modem, that's going to take over 30 seconds to send. If they're expected to be on a 56Kbps modem, that's about one and a half seconds. If they're expected to be on a 1Mbps connection, about a tenth of a second. On 50Mbps, about two milliseconds.
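The same estimate can be written as a few lines of Python. This is only a sketch: the function name is invented, and the 10 bits per byte is the rough rule of thumb used above, not a measured figure.

```python
BITS_PER_BYTE_ON_WIRE = 10  # rough estimate: 8 data bits plus link overhead

def seconds_to_send(block_bytes, recipients, link_bits_per_second):
    """Time to push one block to every recipient over a single link."""
    total_bits = block_bytes * BITS_PER_BYTE_ON_WIRE * recipients
    return total_bits / link_bits_per_second

links = [("2400 bps", 2400), ("56 Kbps", 56_000),
         ("1 Mbps", 1_000_000), ("50 Mbps", 50_000_000)]

for label, bps in links:
    print(f"{label}: {seconds_to_send(1000, 8, bps):.4f} s")
```

Swap in your own block size, player count and link speed to see whether the result fits your update rate.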

Are those speeds acceptable? It all depends on your details. Something played over a 3G mobile phone is going to have different throughput than a cable modem or fiber connection. A chess game has different acceptable communication rates than a first person shooter.


The second part, about the content of JSON format, can be seen through direct observation.

Storing textual versions of names takes a lot of space, and some formats have clear redundancy: opening and closing tags, spaces between blocks, quotation marks, duplicate names. Many times there are ways to eliminate that.
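One way to observe this directly is to serialize the same values both ways and compare lengths. The sketch below uses Python's json and struct modules; the key names "position" and "trajectory" and the sample values are assumptions for illustration.

```python
import json
import struct

# Hypothetical per-player state with the two vectors described earlier.
player = {
    "position": {"x": 1.5, "y": 0.0, "z": -2.25},
    "trajectory": {"x": 0.5, "y": 0.0, "z": 0.25},
}

as_json = json.dumps(player).encode("utf-8")

# Same six numbers as 32-bit floats in a fixed, agreed-upon order.
as_binary = struct.pack(
    "<6f",
    player["position"]["x"], player["position"]["y"], player["position"]["z"],
    player["trajectory"]["x"], player["trajectory"]["y"], player["trajectory"]["z"],
)

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as binary")
```

Most of the JSON bytes are key names, quotes, braces and separators rather than the numbers themselves.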

For using shorter representations, that is something that takes knowing your data.

You can see that a string like "1758483662" requires ten bytes even though it is a 31-bit number, so you could use 4 bytes for the value instead of 10. The notation "3.14159e0" takes 9 bytes, but as a 32-bit floating point number it could be represented in 4 bytes.
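A quick check of those sizes, sketched with Python's struct module (little-endian byte order is an arbitrary choice here):

```python
import struct

text_int = "1758483662".encode("ascii")
binary_int = struct.pack("<I", 1758483662)  # unsigned 32-bit integer

text_float = "3.14159e0".encode("ascii")
binary_float = struct.pack("<f", 3.14159)   # 32-bit float

print(len(text_int), len(binary_int))      # 10 4
print(len(text_float), len(binary_float))  # 9 4
```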

If you know your number is a 16 bit number you can use 2 bytes rather than 4. If it is an 8 bit number you can use a single byte. If a variable will only have values 0-3 you might design your system to send just those two bits. Since most network systems are built around 8-bit bytes, this means you need to pack several of these smaller values together into each byte, but if you need to reduce space this is one way to do it.
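As a small sketch of that kind of packing, here are four values in the range 0-3 stored in a single byte. The function names and the bit order (first value in the lowest bits) are arbitrary choices for the example.

```python
def pack_2bit(values):
    """Pack four values in the range 0-3 into one byte."""
    assert len(values) == 4 and all(0 <= v <= 3 for v in values)
    byte = 0
    for i, v in enumerate(values):
        byte |= v << (2 * i)  # value i occupies bits 2i and 2i+1
    return byte

def unpack_2bit(byte):
    """Recover the four 2-bit values from one byte."""
    return [(byte >> (2 * i)) & 0b11 for i in range(4)]

packed = pack_2bit([3, 0, 2, 1])
print(packed)               # 99
print(unpack_2bit(packed))  # [3, 0, 2, 1]
```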

I have nothing to add since other people explained it very well, but I want to suggest reading this tutorial, which I found to be useful: http://gafferongames.com/game-physics/networked-physics/

Thanks to both of you.

This is a great site Matth posted: http://gafferongames.com/networking-for-game-programmers/

I'm still having a difficult time trying to decide how to serialize my data. I cannot find any information on how to write a custom serializer. I would rather even leave it to the experts, but because I will be dealing with custom types (Unity Vector3, for example) I have to have something that is from scratch or easily customizable.

I can't even think how I would send a message that says

data type | length of data | prop1 | prop2 | prop3
----------+----------------+-------+-------+------
2         | 15             | 10    | 12    | 4

I mean I would just send a text message of 21510124 but I think that defeats the purpose. I have dealt with these messages before, having to byte shift around to extract data, but I had documentation and I only need to extract, not build.

Typically, you serialise your data into a 'stream', writing into a binary container (a memory buffer). You can use either a bit stream (everything is packed and indexed by bit position, e.g. adding a bool requires a single bit) or a byte stream (everything is aligned to a byte, so adding a bool requires a whole byte).

It's similar to an I/O stream, if you will, like when you do cout << position.x << position.y << position.z; Note that the stream can be in either binary form or text form; text is useful for debugging, for example, and binary gives a smaller size.

So you'd do stream << data_type << length_of_data << prop1 << prop2 << prop3;

then send(stream.getByteBuffer().getData(), stream.getByteBuffer().getSize());

and do the reverse process on the receiving end.
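Here is a rough byte-stream sketch of that idea in Python rather than C++ or C#. The ByteWriter and ByteReader classes are invented for the example, and the field sizes (one byte each for type and length, 32-bit integers for the properties) are assumptions. The length field is computed from the payload, so it comes out as 12 here rather than the 15 in the question.

```python
import struct

class ByteWriter:
    """Appends little-endian values to a growing buffer."""
    def __init__(self):
        self._buf = bytearray()

    def write_u8(self, v):
        self._buf += struct.pack("<B", v)
        return self  # returning self allows chaining, a bit like <<

    def write_i32(self, v):
        self._buf += struct.pack("<i", v)
        return self

    def to_bytes(self):
        return bytes(self._buf)

class ByteReader:
    """Reads values back in the same order they were written."""
    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read_u8(self):
        (v,) = struct.unpack_from("<B", self._data, self._pos)
        self._pos += 1
        return v

    def read_i32(self):
        (v,) = struct.unpack_from("<i", self._data, self._pos)
        self._pos += 4
        return v

# Sender: data type 2, then the payload length, then three properties.
props = ByteWriter().write_i32(10).write_i32(12).write_i32(4).to_bytes()
packet = ByteWriter().write_u8(2).write_u8(len(props)).to_bytes() + props

# Receiver: the reverse process.
reader = ByteReader(packet)
data_type = reader.read_u8()
length = reader.read_u8()
values = [reader.read_i32() for _ in range(length // 4)]

print(len(packet), data_type, length, values)
```

The resulting bytes are what you would hand to your socket's send call. Fixing the byte order explicitly ("<" for little-endian) costs nothing and avoids surprises later.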

So basically, you could use the standard C# stream interface (for example, a MemoryStream wrapped in a BinaryWriter), then grab the buffer and send that buffer as a raw packet, not worrying about endianness and that kind of stuff. Your protocol may require more thought, but that's the general idea.

For starters, I also wouldn't worry about optimum packing efficiency, endianness, etc. Just keep it simple.

If you really want something flexible, use a serialisation / marshalling library. Writing your own custom, 'flexible' protocol will take a long time (it's one of those 'reinventing the wheel' dilemmas), and using an external library usually requires a lot of setup time. But you can whip up something simple (like the stream example above) very quickly.

"packet overhead, which is quite chunky around (48 bytes for UDP/IP)"

The packet header for IP (IPv4) is 20 bytes; the packet header for UDP on top of that is 8 bytes. For TCP, it's 20 bytes on top of IP.
IP options and TCP options may add some bytes, depending on situation.
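To see what those headers cost at a given send rate, here is a small sketch in Python. It counts only the IPv4 and UDP or TCP headers without options, ignores link-layer framing, and borrows the 216-byte payload and 30 packets per second from earlier in the thread as example inputs.

```python
IPV4_HEADER = 20  # bytes, without options
UDP_HEADER = 8    # bytes
TCP_HEADER = 20   # bytes, without options

def wire_bytes_per_second(payload_bytes, packets_per_second, transport_header):
    """Payload plus IP and transport headers, per second, for one stream."""
    per_packet = IPV4_HEADER + transport_header + payload_bytes
    return per_packet * packets_per_second

udp_total = wire_bytes_per_second(216, 30, UDP_HEADER)
tcp_total = wire_bytes_per_second(216, 30, TCP_HEADER)
udp_header_only = (IPV4_HEADER + UDP_HEADER) * 30

print(udp_total, "bytes/s over UDP, of which", udp_header_only, "are headers")
print(tcp_total, "bytes/s over TCP")
```

The smaller the payload and the higher the send rate, the larger the share taken by headers.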

"I'm still having a difficult time trying to decide how to serialize my data."

Think of it as a file on disk. You're writing the data to a "file" (data packet.)
You need to write enough data that you can read it back in again and understand it, but no more.
Serializing is nothing more than generating bytes into a byte array, as a way of "writing," plus some set of rules for how you do that so the other end knows how to undo it.

Serialization formats are typically classified along three axes:
- how much metadata is included
- how is data itself encoded
- what is the external state acting on the data

So, for example, JSON encodes the full name of each key, each time you add it. It also encodes the type of the data, because strings start with ", objects start with {, etc. It encodes the actual data values as UTF-8 text.
This is highly robust and easy to reconstruct, but also highly redundant and uses lots of space.
An encoding like ASN.1 BER (a.k.a. "asinine 1") encodes, per piece of data, the data type, and the data itself, each as a binary value. This is less metadata, and a more compact encoding, and thus takes less space.
I think it still wastes space, because the recipient needs to know in what order the data elements are encoded, in order to decode them (only the type, not the "name" of each field is included,) and thus the recipient might also know the type so that doesn't have to be encoded.
Google Protocol Buffers encodes both the type and the "name" of each field (as a small numeric field tag rather than a string), yet uses bit packing to make the encoded data no bigger than ASN.1. It's a pretty good encoding for robust interchange, where the protocol versions may be different at both ends.
Straight up binary data serialization is most often used by games. Copy the fields you need, at the precision you need, into a byte stream. Do the reverse on the other end. You know the order of fields, and the data type of fields already, so you don't have to encode those. A 32-bit integer is written as 4 bytes.
Finally, the semantic layer can help you compress. For example, the "Quake III delta compression" mechanism knows what the last acknowledged state of an entity is from the other end, and only sends properties that have changed since then, and a bitmask up front for which properties are actually included in the message.
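To give a feel for the bitmask idea, here is a much simplified sketch in Python. It is not the actual Quake III implementation: the field names, the one-byte mask and the 32-bit float encoding are all assumptions made for the example.

```python
import struct

FIELDS = ["px", "py", "pz", "vx", "vy", "vz"]  # hypothetical field names

def encode_delta(baseline, current):
    """Write a 1-byte bitmask, then only the fields that differ from baseline."""
    mask = 0
    payload = b""
    for i, name in enumerate(FIELDS):
        if current[name] != baseline[name]:
            mask |= 1 << i
            payload += struct.pack("<f", current[name])
    return bytes([mask]) + payload

def decode_delta(baseline, data):
    """Rebuild the full state from the baseline plus the changed fields."""
    mask = data[0]
    state = dict(baseline)
    offset = 1
    for i, name in enumerate(FIELDS):
        if mask & (1 << i):
            (state[name],) = struct.unpack_from("<f", data, offset)
            offset += 4
    return state

# Last state the other end acknowledged, and the current state.
acked = {"px": 1.0, "py": 0.0, "pz": 2.0, "vx": 0.5, "vy": 0.0, "vz": 0.0}
now = dict(acked, px=1.25, vx=0.75)

delta = encode_delta(acked, now)
print(len(delta))  # 9 bytes instead of 24 for the full state
```

This only works if both ends agree on which baseline the delta was computed against, which is why the acknowledgement from the other end matters.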

What's right for you? Depends. Learning about these things happens by doing them, many times, in many contexts. You're at the beginning of an exciting road :-)
