
How do I use zlib to compress/decompress a char array?

Started by January 04, 2022 11:49 PM
6 comments, last by SkipD 2 years, 11 months ago

I've finished a very large project for making stories/some types of games, but I want to add compression. My software has a Data folder where all the stuff you use in projects is placed, and you run a tool to pack everything into a single file.

It would be so much better if all the files were written to this file compressed, then decompressed as they're loaded, so the resulting packed file could be smaller. But I just can't understand how zlib works…

I've seen examples, but they're usually operations on single files or vectors. I just want a single char buffer I can compress, add to my file, then load from my file and decompress.

When I did find something that does this, the size of the compressed buffer was larger than the source. That wouldn't work for saving those buffers inside the equivalent of a zip file, would it?

I don't think I understand it properly. Can someone please show me an example of how to do this?

Thanks!

First, you're going to be better off in the long run designing your system to operate on streams as opposed to requiring the whole file to be loaded into memory first.

Second, it's important to know that the DEFLATE stream produced by zlib is different from the .zip file format (though .zip uses DEFLATE compression internally). zlib doesn't handle packing multiple files together into an archive, only the compression part.

To decode using zlib, you need an input buffer (e.g. 32 KB) that you fill with the compressed data stream, and refill whenever it runs dry. When you want to read some decompressed data, you point next_in at the current position in the input buffer, set avail_in to the number of compressed bytes remaining there, and call inflate(). The z_stream must also be populated with an output buffer via the next_out and avail_out members. After inflate() returns, the number of decompressed bytes produced is your original avail_out minus whatever avail_out holds now (zlib counts it down as it writes). Then you can copy that decompressed block elsewhere. This continues until you run out of compressed input data.
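Roughly, a one-shot version of the decode path for a buffer that's entirely in memory (which sounds like your case) looks like this. This is a sketch, not hardened code: real code should check every return value, and the 32 KB chunk size is just a reasonable default.

#include <zlib.h>
#include <cstring>
#include <vector>

// Decompress a complete zlib (DEFLATE) stream held in memory.
// Returns an empty vector on error.
std::vector<unsigned char> inflateBuffer(const unsigned char* src, size_t srcLen)
{
    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    if (inflateInit(&zs) != Z_OK)
        return {};

    zs.next_in  = const_cast<Bytef*>(src);   // zlib's API is not const-correct
    zs.avail_in = static_cast<uInt>(srcLen);

    std::vector<unsigned char> out;
    unsigned char chunk[32768];              // 32 KB scratch output buffer
    int ret = Z_OK;
    while (ret != Z_STREAM_END)
    {
        zs.next_out  = chunk;
        zs.avail_out = sizeof(chunk);
        ret = inflate(&zs, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END)  // corrupt or truncated input
        {
            inflateEnd(&zs);
            return {};
        }
        // Bytes produced this pass = buffer size minus the space left unused.
        out.insert(out.end(), chunk, chunk + (sizeof(chunk) - zs.avail_out));
    }
    inflateEnd(&zs);
    return out;
}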

To encode using zlib, it works the same way in reverse: rather than calling inflate() whenever the input buffer runs dry, you call deflate() when the input buffer is full of uncompressed data. After calling deflate(), the compressed data sits in the output buffer, starting where next_out pointed before the call; again, the amount produced is the original avail_out minus its value afterwards.
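The encode direction as a one-shot sketch (again, check return values in real code; deflateBound() gives the worst-case output size up front, so a single output buffer is enough):

#include <zlib.h>
#include <cstring>
#include <vector>

// Compress an in-memory buffer into a zlib (DEFLATE) stream.
// Z_FINISH is fine here because all of the input is already present.
std::vector<unsigned char> deflateBuffer(const unsigned char* src, size_t srcLen)
{
    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return {};

    zs.next_in  = const_cast<Bytef*>(src);
    zs.avail_in = static_cast<uInt>(srcLen);

    std::vector<unsigned char> out(deflateBound(&zs, srcLen)); // worst case
    zs.next_out  = out.data();
    zs.avail_out = static_cast<uInt>(out.size());

    int ret = deflate(&zs, Z_FINISH);   // Z_STREAM_END on success
    uLong produced = zs.total_out;      // actual compressed size
    deflateEnd(&zs);
    if (ret != Z_STREAM_END)
        return {};
    out.resize(produced);
    return out;
}

Two notes: zlib also ships one-shot helpers, compress2() and uncompress(), that wrap all of this, but for uncompress() you must store the uncompressed size alongside the data yourself. And on tiny or already-compressed inputs (PNG, OGG, etc.) the output can come out slightly larger than the input; that's expected, and pack tools typically store such files uncompressed with a flag in the index.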

I recommend reading the header documentation of inflate(), deflate(), and z_stream for further details.


@Aressera Uh oh… do I have a worse problem? I use al_fopen() (using Allegro 5), read the index of files, sizes, and offsets, then seek to the start of the data item I want, grab the data, then close the file… maybe there'd be a problem if the packfile is very large?? How would I use streams for that?? I almost feel compression is now just a luxury? As if I should figure out this streaming first, then consider compression?

The C++ file streaming I looked up, such as fstream, doesn't seem to have the functionality I want, like seeking and stuff??
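Here's roughly what my loading does now, simplified (the Entry struct is a made-up stand-in for my real index, which has more fields):

#include <allegro5/allegro.h>
#include <cstdint>
#include <vector>

// Hypothetical index entry; the real archive header stores more than this.
struct Entry { int64_t offset; int64_t size; };

std::vector<char> loadEntry(const char* packPath, const Entry& e)
{
    std::vector<char> data(static_cast<size_t>(e.size));
    ALLEGRO_FILE* f = al_fopen(packPath, "rb");
    if (!f) return {};
    al_fseek(f, e.offset, ALLEGRO_SEEK_SET); // jump straight to the blob
    al_fread(f, data.data(), data.size());   // read only this one entry
    al_fclose(f);                            // nothing else gets loaded
    return data;
}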

Edit: I forgot to mention I'm not using zip or any standard archive format; it's my own encrypted archive design.

I also see PC games using gigabytes of RAM, so I'm guessing it wouldn't be a big deal anyway?

Oops… man, am I dumb! Using compression on PNG and OGG, those kinds of files, would be useless anyway!

Wow, was I confused today. I'll keep my file access the same too; I'm not sure it's even loading the whole file into memory, and even if it does, it's not worse by far than how much memory big PC games use today.

Yes, the approach you are using isn't ideal.

Games use compression all the time, but you must recognize the tradeoffs you are making.

The ideal is when the compressed object is used directly. Graphics formats like ASTC and DDS are good examples: they are compressed in a format that graphics cards can use directly. They are not as tightly compressed as PNG or JPEG, which need an expensive decompression step costing both processing time and significant memory.

Games also use streaming compression as described above. Resources are quite often loaded sequentially, so they can be decompressed on the fly as they stream in. Many libraries focus on efficient decoding, paying a higher cost at encode time to build a better data dictionary so that decoding takes less memory and less processing.

Random access in files as you describe approaches the worst case for compressed files. Formats can provide options to estimate where blocks begin and end, but there is no direct mapping to a location in the data stream. A single byte may encode three data tokens that happen to expand to twelve bytes, the next 8 bits may expand to two bytes, and the next may encode less than a single byte. It is difficult to seek to a position in the decompressed data without actually decoding everything before it. Block formats are a little easier: you can seek to a block, but you still need to decode from the beginning of the block until you reach the desired offset.
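The usual way pack files sidestep this is to compress each asset independently and record both sizes in the index, so the loader can seek straight to an entry and decompress only that entry. A sketch of that idea (field names invented for illustration; inflateBuffer() is the zlib helper sketched earlier in the thread):

#include <cstdint>
#include <cstdio>
#include <vector>

struct PackEntry {
    uint64_t offset;          // start of the compressed blob in the pack file
    uint64_t compressedSize;  // bytes to read from disk
    uint64_t rawSize;         // decompressed size, for sizing the output
};

std::vector<unsigned char> loadCompressedEntry(std::FILE* pack, const PackEntry& e)
{
    std::vector<unsigned char> compressed(static_cast<size_t>(e.compressedSize));
    std::fseek(pack, static_cast<long>(e.offset), SEEK_SET);
    std::fread(compressed.data(), 1, compressed.size(), pack);
    return inflateBuffer(compressed.data(), compressed.size());
}

No other entry has to be touched or decoded, which is exactly the access pattern you already have.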

Games use gigabytes because it is available to them. While some games waste it, many leverage all they can and will still be able to use more.

Finally, there are libraries out there that try to bridge the gap for you. PhysFS generally works like a file system but abstracts the compression and packaging away. It follows the efficient patterns where it can, while also letting you use inefficient access patterns with relative ease. It still does the work, just behind your back.
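The usage ends up looking something like this (a rough sketch assuming PhysFS 2.1+; the archive and file names are made up):

#include <physfs.h>
#include <vector>

// Read an entire virtual file out of whatever archives are mounted.
std::vector<char> readWholeFile(const char* name)
{
    PHYSFS_File* f = PHYSFS_openRead(name);
    if (!f) return {};
    std::vector<char> data(static_cast<size_t>(PHYSFS_fileLength(f)));
    PHYSFS_readBytes(f, data.data(), data.size());
    PHYSFS_close(f);
    return data;
}

int main(int argc, char** argv)
{
    PHYSFS_init(argv[0]);
    PHYSFS_mount("assets.zip", "/", 1);  // the archive behaves like a directory
    std::vector<char> png = readWholeFile("textures/player.png");
    PHYSFS_deinit();
    return 0;
}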

Text compression is somewhat useless depending on how much text you have, unless you use an algorithm specialized for text. A simple option is to encode the text as UTF-8, which comes out smaller than the UTF-16 encoding Windows uses internally.
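For mostly-ASCII text the saving is easy to see:

// For ASCII-range characters, UTF-8 uses one byte per character,
// while UTF-16 always uses at least two. ("Hello" is valid UTF-8 as-is.)
const char     utf8[]  = "Hello";  // 5 bytes of text
const char16_t utf16[] = u"Hello"; // 10 bytes of text (5 units x 2 bytes)
static_assert(sizeof(utf8)  - 1 == 5,  "one byte per ASCII char in UTF-8");
static_assert(sizeof(utf16) - 2 == 10, "two bytes per ASCII char in UTF-16");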

A more generic compression algorithm may create larger compressed streams, since it tries to give the best result for arbitrary data, which usually has a more random byte distribution (the difference between two blocks of binary data) than text does. An image, for example, has a different “footprint” than a string. This is why GZIP or LZ4 give better results for generic binary data.

I worked on a city-building game a few years ago at an indie studio. That game had its maps compressed with GZIP too, and I had to add a bunch of random bytes to it in order to get GZIP to work properly. We compressed our raw data of 100 MB into less than 10 MB just by adding those bytes.

This said, games usually use a couple of different compression algorithms to solve different problems. Asset packages are then loaded into memory as chunks, usually using memory-mapped files; the operating system puts the file into a set of memory pages which are swapped in and out as needed. You then get a pointer to the data in memory and can write a wrapper around that pointer which presents it as a stream. The data position and length are usually stored in a dictionary or header which is placed in its own chunk apart from the data.

We wrote our own packaging tool that does exactly this. It creates a header with the asset type, size, and a location ID. That ID is the chunk number plus the offset from the beginning of the chunk to the data. The tool then tries to find the best packing of the data so that an asset isn't split across more than one chunk (unless it exceeds one chunk) and fills everything else with trash bytes. A chunk is 64 KB, which is the usual disk cache size, while a memory page is usually 4 KB, so a full chunk is loaded into 16 memory pages. Our tool also does a few more steps to digitally sign the asset package.
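Not our actual code, but the shape of such a directory entry is roughly this (names invented for illustration):

#include <cstdint>

constexpr uint32_t kChunkSize = 64 * 1024; // 64 KB, matching the disk cache size

struct AssetRecord {
    uint32_t type;        // what kind of asset this is (texture, sound, ...)
    uint32_t size;        // byte length of the asset's data
    uint32_t chunkIndex;  // which 64 KB chunk the asset starts in
    uint32_t chunkOffset; // offset from the start of that chunk to the data
};

// Absolute file position of an asset, derived from its location ID.
inline uint64_t assetFilePos(const AssetRecord& r)
{
    return uint64_t(r.chunkIndex) * kChunkSize + r.chunkOffset;
}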


Yeah, turns out I was a little confused. Compression won't work on the file types I use, and also, while the resource archive is accessed, only the objects I ask to load are loaded as images, sounds, etc.

As this project isn't a game at all, it doesn't use anything but PNG images, OGG/WAV for sound, etc.

It was a really dumb idea to think I could compress such things, so it'll be fine as is.

Thanks everyone for the help though!

This topic is closed to new replies.
