
Data compression/optimization strategies

Started August 25, 2013 03:38 AM
8 comments, last by wodinoneeye 11 years, 5 months ago

For games where data throughput and size is an issue, what do most people use for compression? Is it generally worthwhile using different strategies for say floating point data versus other stuff, or do most just compress everything with something like zlib and call it a day?

The specific case I have is an mmo client/server. I'm working on the scale of say something like Guild wars II or Eve online as far as requirements. Right now I'm able to track several thousand objects on the server side and do efficient neighbor queries on them, it's bandwidth that's becoming an issue. Sending the coordinates of 1000 players to the client is a lot of data, and that's just movement, I haven't even gotten into combat and other stuff that will bump it up even more.

So far this is what I'm doing to optimize location tracking:

- Use integers instead of floats. I don't generally need the precision for x,y movement coords.

- Send local grid coordinates not world coordinates.

- base 62 encoding of string IDs

- don't send data if it hasn't changed since last update

- partial coordinates. Only send coordinate values that have changed by a certain threshold. So a client might get a vector with x being empty, which means use the last known value.
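A minimal sketch of that partial-coordinate idea (the dict-based format and all names here are made up for illustration):

```python
# Sketch of the partial-coordinate idea above: only axes that moved
# past a threshold are included; a missing axis means "reuse last value".
# The threshold and the dict format are illustrative, not a real protocol.

THRESHOLD = 2  # world units; below this, don't bother resending an axis

def encode_delta(last, current, threshold=THRESHOLD):
    """Server side: return only the axes that changed by more than the threshold."""
    out = {}
    for axis in ("x", "y"):
        if abs(current[axis] - last[axis]) >= threshold:
            out[axis] = current[axis]
    return out

def apply_delta(last, delta):
    """Client side: fill missing axes from the last known position."""
    return {axis: delta.get(axis, last[axis]) for axis in ("x", "y")}
```

An unchanged object then costs an empty update (or, combined with the "don't send if unchanged" rule, nothing at all).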

If anyone has ideas on best compression algorithms, or other tricks to keep bandwidth usage down, let me know!

Chris

- Use integers instead of floats. I don't generally need the precision for x,y movement coords.

Both are 32 bits, unless there is something you are not telling us.

- Send local grid coordinates not world coordinates.

How is that smaller?

- base 62 encoding of string IDs

Why base 62? Your goal is to reduce the number of bits sent, that would increase the number.

- don't send data if it hasn't changed since last update

Excellent.

- partial coordinates. Only send coordinate values that have changed by a certain threshold. So a client might get a vector with x being empty, which means use the last known value.

Excellent.

If anyone has ideas on best compression algorithms, or other tricks to keep bandwidth usage down, let me know!

Only two of those items above actually reduce the number of bits. Do more things that reduce the number of bits.

When you encode and decode your larger packets you should generally have some form of bit packer. If there are only 3 options, don't add a full int (32 bits) to the message; you can represent it with just 2 bits instead of 32. If your integer has a small numeric range, reduce the number of bits to just those that cover the range. If your number can be reasonably represented by a 16-bit float, use that instead of a 32-bit float. Delta-encoding is another way to reduce the number of bits; if you know something can only have moved up to /n/ distance in the most extreme case, subtract the previous value from the new one and send only the necessary log2(n) bits rather than 32. Repeat the process of stripping out unused bits for each of the messages that you need to send.
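A rough sketch of those bit-packing ideas (this is not a real library; the field sizes and ranges are invented to match the examples above):

```python
# Toy bit packer: write each value with only as many bits as its range needs.

class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        """Append `value` using exactly `nbits` bits (most significant first)."""
        for i in reversed(range(nbits)):
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        # pad the final byte with zero bits
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(
            sum(bit << (7 - j) for j, bit in enumerate(padded[k:k + 8]))
            for k in range(0, len(padded), 8)
        )

def bits_needed(max_value):
    """Smallest number of bits that can hold 0..max_value."""
    return max(1, max_value.bit_length())

w = BitWriter()
w.write(2, 2)                    # a 3-option enum: 2 bits instead of 32
w.write(75, bits_needed(99))     # grid coord known to be 0..99: 7 bits
delta = 200 - 188                # delta-encoded position: store the difference
w.write(delta, bits_needed(31))  # movement bounded by 31 units: 5 bits
packet = w.to_bytes()            # 14 bits total -> 2 bytes instead of 12
```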

If your messages are STILL too large, use a compression algorithm. For TCP you can use a stream-based general-purpose compression algorithm; for UDP you might be able to use a stream algorithm, but you sadly might need to compress each packet individually. Stream-based compression may need more data to pass through before it pays off, but it can give better overall results. Per-message compression is hard because the compressor only has a few bytes to work with, and if you have already done the bit packing described above those bytes are hopefully high-entropy, so the compressor will struggle to improve them.

Ya, I should have said: I'm using bit packing already via protocol buffers, which have variable-length integers.

Base 62 is the smallest encoding you can get for upper and lower case alphanumeric.

Local grid coordinates means the current grid within the spatial partitioning scheme I use. So a local grid might just be 100 x 100, where the world grid is in the thousands. You can use less data by sending the local grid coordinate plus the grid number. The calculation to get the world coordinate from that is fairly cheap.
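That local-grid scheme could be sketched like this (the cell size matches the 100 x 100 figure above, but the world width is an assumption made up for the example):

```python
# Sketch of the local-grid encoding described above: a position is sent
# as (cell index, local x, local y). Local coords fit in 7 bits each
# (0..99) instead of full world-sized integers.

CELL_SIZE = 100
WORLD_CELLS_PER_ROW = 50   # assumed world width of 5000 units

def to_local(wx, wy):
    """Server side: split a world coordinate into cell index + local offset."""
    cell = (wy // CELL_SIZE) * WORLD_CELLS_PER_ROW + (wx // CELL_SIZE)
    return cell, wx % CELL_SIZE, wy % CELL_SIZE

def to_world(cell, lx, ly):
    """Client side: the cheap reconstruction mentioned above."""
    cx = (cell % WORLD_CELLS_PER_ROW) * CELL_SIZE
    cy = (cell // WORLD_CELLS_PER_ROW) * CELL_SIZE
    return cx + lx, cy + ly
```

The cell index itself is shared implicitly by interest management (clients mostly receive updates for their own cell and its neighbors), so often only the two small local offsets need to travel per object.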

I'm using UDP. I didn't even stop to think that only being able to compress a packet at a time would hurt, but ya, that makes sense. Seems like the best win is to just stick with application-specific ways to keep size down.

Chris

Base 62 is the smallest encoding you can get for upper and lower case alphanumeric.

Ah. Usually an encoding like that, such as base-64 encoding or uuencoding, converts binary data into values that can fit into plain-text newsgroups and other textual places. It encodes six bits as an ASCII character, expanding them to 8 bits in length. So it gives an instant 1/3 growth in size, plus a tiny bit of overhead for the header, footer, and padding at the end of the message.
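That 1/3 growth is easy to see with the standard library (the 30-byte payload is arbitrary):

```python
import base64

# Demonstrating the growth mentioned above: base-64 turns every 3 bytes
# of binary into 4 ASCII characters (plus '=' padding when the input
# length isn't a multiple of 3).
raw = bytes(range(30))           # 30 bytes of arbitrary binary data
encoded = base64.b64encode(raw)  # 40 ASCII bytes: 4/3 the size
```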

I've never heard text-to-binary conversion called 'base62', probably because it is a new thing and because Base X already has a defined meaning different from that. You apparently are going the other way: accepting only 62 ASCII values and turning them into six bits --- with two extra values left over for your own nefarious purposes. :-)

But yes, if you are looking for size the best bet is to use your own knowledge of the data to remove all unnecessary bits. If your data is still large a more traditional encoder might work for you, or might not if your data is small or if it has high entropy.

The most compact data is the data that you do not send.

Work very hard on not sending data.

Why send position every update when you can send position when starting out, and then just send player (or object) inputs, and run the simulation on the client, for example?
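A toy illustration of that idea, assuming a deterministic, shared movement rule (the speed constant and the input encoding are made up):

```python
# Send inputs, not positions: both server and client run the same
# deterministic simulation, so only the small input stream travels.

SPEED = 5  # units per tick; an assumed constant shared by both sides

def simulate(start, inputs):
    """Replay a list of per-tick inputs ('N','S','E','W') from a start position."""
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    x, y = start
    for key in inputs:
        dx, dy = moves[key]
        x += dx * SPEED
        y += dy * SPEED
    return (x, y)
```

Since each input fits in 2 bits here, a whole second of movement can cost less than one raw coordinate pair; the catch is that the simulation must be bit-for-bit deterministic on both ends, with occasional authoritative corrections.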

When it comes to "string ids," presumably those are known in advance, and can be sent as a small index into a look-up table.

If they are not, then you can build a dynamic string table (per connection.)

When sending a string, first look it up in a hash table. If there is a "previous ID" for that string, then send only that ID. Else, allocate a new ID, and send a command saying "here's a string and here's the ID I will use when referring to it again."

Strings that contain ever-changing data (like object IDs or timestamps or coordinates) are a bad match for this pattern, though.
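A sketch of that per-connection dynamic string table (the message shapes and class names are illustrative; a real protocol would also frame and acknowledge these messages):

```python
# Per-connection string table: the first time a string is sent it goes
# over the wire with a newly allocated small ID; afterwards only the ID
# is sent and the receiver looks it up in its mirror table.

class StringTable:
    """Sender side."""
    def __init__(self):
        self.ids = {}       # string -> small integer ID
        self.next_id = 0

    def encode(self, s):
        if s in self.ids:
            return ("REF", self.ids[s])
        new_id = self.next_id
        self.next_id += 1
        self.ids[s] = new_id
        return ("NEW", new_id, s)

class StringTableReader:
    """Receiver side: mirrors the sender's table."""
    def __init__(self):
        self.strings = {}   # ID -> string

    def decode(self, msg):
        if msg[0] == "NEW":
            self.strings[msg[1]] = msg[2]
            return msg[2]
        return self.strings[msg[1]]
```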

enum Bool { True, False, FileNotFound };

I would recommend utilizing static Huffman compression to further reduce the size of your packets; that's probably about as small as they can get after the higher-level protocol optimizations (eg: "not sending data" is definitely crucial).
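The "static" part means the code table is built once, offline, from representative traffic, then shared by client and server; a minimal sketch (the frequency table here is entirely made up):

```python
import heapq

# Static Huffman: build the code table ONCE from assumed, representative
# symbol frequencies, then reuse it for every packet. No tree needs to
# be shipped per-message because both ends compiled in the same table.

def build_codes(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> bit string."""
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)   # two rarest subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Made-up message-type frequencies: common symbols get short codes.
FREQS = {"move": 60, "attack": 25, "chat": 10, "trade": 5}
CODES = build_codes(FREQS)

def encode(symbols):
    return "".join(CODES[s] for s in symbols)
```

With these invented frequencies, the overwhelmingly common "move" message costs a single bit of header instead of a full opcode byte.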

Try to allow the client to make as many assumptions as possible with as little data coming from the server as possible to update it. Objects on the periphery of the player's perspective of the game world don't need updates as quickly, or with as much precision, as those closer to the player.

[ deftware.org ]


Thanks a lot guys, some great ideas here.

I'm fairly new to this particular problem, although I think it's critical to solve well.

What I do know is that I've never seen a game handle this problem 'well' from a player perspective. Eve online literally just slows the game clock at a certain point. Guild wars II basically stops sending anything other than location updates and auto attacks with just a hundred or so players in visible range.

Chris

The most compact data is the data that you do not send.

Work very hard on not sending data.

This, this, a thousand times this.

The art of multiplayer game development is knowing what not to send, and how often to not send it. That is where the craftsmanship comes in.

Connecting machines together with well defined APIs is not a difficult task. Basic housekeeping tasks like matching up network ports and game identity are not hard things to master.

There are two arts to master. One is how to hide or design around latency (since if you have a work-around for the speed of light, you have bigger fish to fry). The other is how to maximize the efficient use of bandwidth. The former is fundamentally a design issue (although technological mistakes can make it worse). The latter is fundamentally an engineering issue (although design mistakes can make it worse).

If you aren't making sure your networking and game architecture makes it easy for the network developers to easily route traffic such that nothing unnecessary hits the wire, all of your bit packing efforts are fundamentally just optimizing a bubble sort.

Ya, I should have said: I'm using bit packing already via protocol buffers, which have variable-length integers.

Base 62 is the smallest encoding you can get for upper and lower case alphanumeric.

Local grid coordinates means the current grid within the spatial partitioning scheme I use. So a local grid might just be 100 x 100, where the world grid is in the thousands. You can use less data by sending the local grid coordinate plus the grid number. The calculation to get the world coordinate from that is fairly cheap.

I'm using UDP. I didn't even stop to think that only being able to compress a packet at a time would hurt, but ya, that makes sense. Seems like the best win is to just stick with application-specific ways to keep size down.

Chris

That's UDP with a guaranteed delivery scheme built into it, presumably (otherwise delta compression becomes problematic).

Fundamental to the base issue (sending as little as possible) is also handling the overhead in your packet driver efficiently.

If you can group packets into known counts/blocks, preassembling them before sending (making it more like a file transfer), ACK processing can be optimized.

Checking ACKs in a pass for all the packets in the 'file' and resending any missing ones can reduce the number of ACKs sent (even compared to other windowing schemes) and some of the overhead in the individual packets.

---

With the change-threshold updates you may have to periodically send a syncing update to correct accumulated minor drift (which might still be significant for client visibility), which then leads into a round-robin scheme to space those syncs out over time.

---

Use fixed-point values instead of floats (like incremental object facings, where 256 steps are enough: a BYTE value is decoded by a table lookup on the other end when it's used, or translated to populate the client's local data).
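The facing example above could look like this (the 256-step resolution comes from the post; the function names are made up):

```python
import math

# Quantize an object facing to 256 steps so it fits in one byte; the
# receiver decodes via a precomputed lookup table, as described above.

STEPS = 256
DECODE_TABLE = [i * 2 * math.pi / STEPS for i in range(STEPS)]  # byte -> radians

def encode_facing(radians):
    """Quantize an angle to the nearest of 256 steps (one byte on the wire)."""
    return round(radians % (2 * math.pi) / (2 * math.pi) * STEPS) % STEPS

def decode_facing(byte_value):
    return DECODE_TABLE[byte_value]
```

Worst-case error is half a step (about 0.7 degrees), invisible for most rendering, at a quarter of the size of a 32-bit float.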

---

Classification of objects as near vs far and significant vs trivial, with different update schedules: the more important ones get updated much more frequently (basically a level-of-detail scheme, possibly with many more than 2 levels).

Part of that may include app-level throttling: if the network does start bottlenecking, the most critical data for the player still tries to get through if possible.

---

Ratings are Opinion, not Fact

This topic is closed to new replies.
