Dynamically discovering and transmitting the format of your data is likely always going to require more representation than an implementation that takes advantage of static knowledge. That's the price you pay for dynamicism.
However, the good thing about dynamic solutions is that they adapt to do the best they can with whatever input data is available; this means that a dynamic algorithm is likely to out-perform a poorly tuned static algorithm, and it will likely do a good job on new data without needing re-tuning.
When it comes to specific representation, how about a byte stream of edit commands? The decoder would work something like:
while( data ) { b = nextbyte; if( b < 0 ) { copy 1-b bytes from source increment pointers by 1-b } else { copy b+2 bytes from the next data in the packet increment pointers by b+2 }}
This allows you to send as much data as you want, where the overhead is, worst case, one byte for every 129 bytes of state, and the best possible savings is sending only a single byte per 129 bytes of state. The adjustment-by-2 comes from the fact that stopping an emitted data stream just to copy a single byte is actually a loss; you only want to copy chunks greater than one byte (or, depending, two bytes).