
Optimizing multiple field encoding

Started by snacktime July 18, 2018 10:33 PM
6 comments, last by lawnjelly 6 years, 4 months ago

So I'm working in C#, and normally I use protocol buffers with heavy use of varints, but the best existing library boxes value types. So for a handful of our high-traffic messages I created a simple code-generation-based encoder. To signify which fields are included in the stream I use one bit per field; those bits are encoded as varints and go in the header. I use 32-bit chunks for this (BitVector32, since it makes the bit twiddling easier).
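Roughly, the generated encoder looks something like this (a simplified sketch, not the actual generated code; the field names are just examples):

using System.Collections.Specialized;
using System.IO;

static class FieldEncoderSketch
{
    // Standard base-128 varint, least-significant 7-bit group first.
    public static void WriteVarUInt32(Stream s, uint value)
    {
        while (value >= 0x80)
        {
            s.WriteByte((byte)(value | 0x80));
            value >>= 7;
        }
        s.WriteByte((byte)value);
    }

    // Example message with three optional fields.
    public static void Encode(Stream s, int? health, float? x, float? y)
    {
        var presence = new BitVector32(0);
        int bitHealth = BitVector32.CreateMask();
        int bitX = BitVector32.CreateMask(bitHealth);
        int bitY = BitVector32.CreateMask(bitX);

        presence[bitHealth] = health.HasValue;
        presence[bitX] = x.HasValue;
        presence[bitY] = y.HasValue;

        // Header: the presence bits, varint-encoded so unused high bits cost nothing.
        WriteVarUInt32(s, (uint)presence.Data);

        // Body: only the fields whose presence bit is set.
        using var w = new BinaryWriter(s, System.Text.Encoding.UTF8, leaveOpen: true);
        if (health.HasValue) w.Write(health.Value);
        if (x.HasValue) w.Write(x.Value);
        if (y.HasValue) w.Write(y.Value);
    }
}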

An obviously more efficient approach is to create more granular messaging so you don't have a lot of fields with default values, or to leverage other things that are context-specific. I'm trying to find the perfect middle ground. It seems there should be a way to use less than a bit per field: some type of mask that takes less space, maybe some type of lookup table, even if it does use a bit of memory.
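For the lookup-table idea, I was imagining something along these lines (completely unproven, just to show the shape of it; both sides would have to agree on the table contents):

static class MaskTableExample
{
    public static void WriteMask(System.IO.Stream s, uint mask, uint[] commonMasks)
    {
        int index = System.Array.IndexOf(commonMasks, mask);
        if (index >= 0 && index < 127)
        {
            // High bit set = "index into the shared table" (one byte total).
            s.WriteByte((byte)(0x80 | index));
        }
        else
        {
            // Marker byte followed by the raw 32-bit mask.
            s.WriteByte(0x00);
            s.WriteByte((byte)(mask & 0xFF));
            s.WriteByte((byte)((mask >> 8) & 0xFF));
            s.WriteByte((byte)((mask >> 16) & 0xFF));
            s.WriteByte((byte)((mask >> 24) & 0xFF));
        }
    }
}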

You could write field indices / IDs, followed by values. When there's a huge number of fields but a small number of changes, this will be much smaller. However, when there's a large number of changes, or very few fields, it will be larger. So you could use a single bit in the header to specify which of the two strategies (bit per field, or field indices) is being used. 
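As a rough sketch (the byte-level framing here is made up for illustration; a real bit-packed stream would spend only one bit on the strategy flag):

static class PresenceEncodingExample
{
    public static byte[] EncodePresence(bool[] present)
    {
        int changedCount = 0;
        for (int i = 0; i < present.Length; i++)
            if (present[i]) changedCount++;

        int bitmaskBytes = (present.Length + 7) / 8;
        int indexListBytes = 1 + changedCount;   // 1 byte for the count, 1 byte per changed index

        using var ms = new System.IO.MemoryStream();
        if (bitmaskBytes <= indexListBytes)
        {
            ms.WriteByte(0);   // strategy flag: bitmask follows
            for (int b = 0; b < bitmaskBytes; b++)
            {
                byte chunk = 0;
                for (int bit = 0; bit < 8 && b * 8 + bit < present.Length; bit++)
                    if (present[b * 8 + bit]) chunk |= (byte)(1 << bit);
                ms.WriteByte(chunk);
            }
        }
        else
        {
            ms.WriteByte(1);   // strategy flag: list of changed field indices follows
            ms.WriteByte((byte)changedCount);
            for (int i = 0; i < present.Length; i++)
                if (present[i]) ms.WriteByte((byte)i);
        }
        return ms.ToArray();
    }
}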


You could also recursively encode which bits in the flag field are used. Send four bits, for whether bits 0-7, 8-15, 16-23, and 24-31 have any data. Then send those particular bit fields.
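In code that might look roughly like this (illustrative only; a real bit writer would pack the 4 summary bits instead of spending a whole byte on them):

static class HierarchicalFlagsExample
{
    // 4 summary bits say which bytes of the 32-bit presence word are non-zero,
    // then only the non-zero bytes follow.
    public static void WriteHierarchicalFlags(System.IO.Stream s, uint flags)
    {
        byte summary = 0;
        for (int i = 0; i < 4; i++)
            if (((flags >> (i * 8)) & 0xFF) != 0)
                summary |= (byte)(1 << i);

        s.WriteByte(summary); // only 4 bits of this are meaningful

        for (int i = 0; i < 4; i++)
        {
            byte chunk = (byte)((flags >> (i * 8)) & 0xFF);
            if (chunk != 0)
                s.WriteByte(chunk);
        }
    }
}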

But, more importantly, is this a problem worth solving? How big are your updates? How many of them do you pack into each packet? How many packets do you send per second? If you could send 0 bits instead of 32 bits, saving 4 bytes, would that actually matter at all?

enum Bool { True, False, FileNotFound };
6 hours ago, snacktime said:

To signify which fields to include in the stream I use a bit per field.

Protobuf lets you completely omit fields if you don't want to deserialize them.  Do you mean something else?  Are you trying to eliminate the field+wiretype bytes completely?  Did I misunderstand and you're NOT using protobuf anymore?

If you have regular patterns that you're sending, the ultimate technique is memoization of the entire packet.
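For example, something roughly like this (a sketch only; it assumes the receiver builds an identical table as packets arrive, and that you handle table resets and overflow somehow):

using System;
using System.Collections.Generic;

class PacketMemoizer
{
    // Maps previously sent payloads to a short ID.
    private readonly Dictionary<string, ushort> _seen = new Dictionary<string, ushort>();

    public byte[] Encode(byte[] payload)
    {
        string key = Convert.ToBase64String(payload);
        if (_seen.TryGetValue(key, out ushort id))
        {
            // Repeat packet: 1 tag byte + 2-byte ID instead of the full payload.
            return new byte[] { 1, (byte)(id & 0xFF), (byte)(id >> 8) };
        }

        ushort newId = (ushort)_seen.Count;
        _seen[key] = newId;

        // First time seen: tag 0 means "full payload follows"; the receiver
        // records it under newId on its side.
        var result = new byte[1 + payload.Length];
        result[0] = 0;
        Array.Copy(payload, 0, result, 1, payload.Length);
        return result;
    }
}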

You are asking us about compression of some data without telling us specifically what the data is (or at least giving some kind of idea). There are two aspects to this:

  1. Encoding what fields are in the stream (packet?)
  2. Encoding the data within the field

By far the biggest 'win' is usually from (2), and given that you are already using bitfields for (1), the question arises what is the data and how are you compressing it?

The best way of compressing something is tightly linked with what that data is, and what it statistically is most likely to be.

Yeah, I should have said that the reason for skipping fields is default values.

It's really more of an experiment. Since I had to revisit this code anyway, it seemed like a good time to re-evaluate optimization of the value-per-field approach. We use the protobuf packed approach for high-frequency stuff, flattening the data into an array per type.

In our case we are pretty much always on the edge, so even for less frequent messaging I'll go further with optimization than most games would, although there are limits. For high-frequency messaging a couple of bytes is a big deal. For lower-frequency stuff, some of which comes in bursts, it can be a big deal too. So if I can find a good way to keep fairly normalized messaging and have it be almost as good as packed, that would be ideal.


It sounds like you are already using some fairly sensible compression methods. If you want to squeeze out the last bits, this will totally depend on the data, and it will be very hard for anyone to do anything but throw random compression methods at you without knowing what that data is. We don't even know what type of game it is. You haven't even confirmed whether this is for packets over the interwebs.

I suspect at this stage you may be better off concentrating your effort on optimizing when and what you do send, and interpolating what you don't have, rather than trying to compress the data more. This could offer orders of magnitude improvements, whereas from what you have said you may only see small improvements by working on the compression itself.

This topic is closed to new replies.
