
Creating and Using Custom Binary File Formats in Java

Started by March 14, 2022 02:27 PM
12 comments, last by frob 2 years, 7 months ago

Foreword:

I understand that this is a very popular topic and there are several forums online that try to detail what one would do to create custom binaries for game data. However, I have come to the conclusion that most of the answers are out of context or misleading. So please try to provide answers that are intuitive and lead to an end result rather than answers that simply explain something without a followable example. Also, please try to refrain from suggesting other solutions; let's stick with custom binary file formats and keep things straightforward :D

Description:

As a game programmer, I am currently working on version 2 of a prototype engine that draws on knowledge from myself and from others with great experience. For this particular version I aim to tackle some of the bad implementations that make my engine inflexible. One such implementation is the misuse of Java's serialization and Serializable. After reading chapter 12 of Effective Java, Third Edition, by Joshua Bloch, I came to a conclusion that there must be a way for me to properly organize and protect my game resources such that they are easily accessible, difficult to tamper with, easily independently updateable and perhaps efficient. Research has brought me to custom binary file formats, which I am interested in for the Java platform.

I would like to better understand:

  • How I can define a custom format with Java code as an example
  • How I can convert my data into a file that uses this format and back with Java
  • What should I watch out for

Boss Fight, Boss Fight, Boss Fight!

properly organize and protect my game resources such that they are easily accessible, difficult to tamper with, easily independently updateable and perhaps efficient

Instead of storing vertices interleaved in a binary buffer, xyzxyzxyzxyzxyzxyz, you could store them as xxxxxxxxxyyyyyyyyyzzzzzzzzz as one step in obfuscating your data. Another easy way is to replace the header of a common format with your favorite uint32 magic number. No matter what you do, someone dedicated enough will rip your assets. Also, it adds an extra step for you or your artists.
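A minimal sketch of the layout trick above, assuming a plain float[] of interleaved xyz positions whose length is a multiple of 3. The class name VertexLayout and the magic value 0x4E424631 (the ASCII bytes "NBF1") are made up for illustration. As noted, this only slows down a casual reader; it is not protection.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class VertexLayout {
    // Hypothetical magic number: the ASCII bytes "NBF1" packed into a big-endian int.
    static final int MAGIC = 0x4E424631;

    /** Reorders interleaved xyzxyz... positions into planar xx...yy...zz... order. */
    static float[] toPlanar(float[] xyz) {
        int n = xyz.length / 3; // number of vertices
        float[] planar = new float[xyz.length];
        for (int i = 0; i < n; i++) {
            planar[i] = xyz[3 * i];             // x block
            planar[n + i] = xyz[3 * i + 1];     // y block
            planar[2 * n + i] = xyz[3 * i + 2]; // z block
        }
        return planar;
    }

    /** Inverse of toPlanar: restores the interleaved order after loading. */
    static float[] toInterleaved(float[] planar) {
        int n = planar.length / 3;
        float[] xyz = new float[planar.length];
        for (int i = 0; i < n; i++) {
            xyz[3 * i] = planar[i];
            xyz[3 * i + 1] = planar[n + i];
            xyz[3 * i + 2] = planar[2 * n + i];
        }
        return xyz;
    }

    /** File image: magic number, vertex count, then the planar floats (all big-endian). */
    static byte[] encode(float[] xyz) throws IOException {
        if (xyz.length % 3 != 0) throw new IllegalArgumentException("length must be a multiple of 3");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeInt(xyz.length / 3);
            for (float f : toPlanar(xyz)) out.writeFloat(f);
        }
        return bytes.toByteArray();
    }
}
```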

Since you are asking how to read and write binary data to a file in Java, I recommend using the ubiquitous file formats (PNG, OBJ, MP3, etc.) until you gain more experience.

  • Related: MPQ file format

@Ninja Boss Fight You can use the java.io.FileInputStream and java.io.FileOutputStream classes to read/write binary files.
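For example, a sketch that writes an int array to disk and reads it back, wrapping FileOutputStream/FileInputStream in buffered DataOutputStream/DataInputStream so that each int is written as 4 big-endian bytes. The ScoreFile name and the count-prefixed layout are illustrative assumptions, not a standard format.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

final class ScoreFile {
    /** Layout: a 4-byte count followed by that many 4-byte ints. */
    static void write(String path, int[] scores) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            out.writeInt(scores.length);
            for (int s : scores) out.writeInt(s);
        }
    }

    /** Reads the fields back in exactly the order they were written. */
    static int[] read(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            int[] scores = new int[in.readInt()];
            for (int i = 0; i < scores.length; i++) scores[i] = in.readInt();
            return scores;
        }
    }
}
```

The reader must consume fields in the same order the writer produced them; the bytes themselves carry no type information.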

Exactly what you put into the files depends on what you want the files to do. It's usually a pretty good idea to start with some magic number that lets you quickly screen out files that are written by some other writer, and then some version number, and then some table of contents or sequence of “type, size, data” chunks.
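A sketch of a writer for the magic / version / chunk layout described above. Everything specific here is a hypothetical choice: the 4-byte magic "GDAT", a 2-byte version, and chunks made of a 4-character ASCII type tag, a 4-byte big-endian payload length and the payload bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class ChunkWriter {
    static final byte[] MAGIC = {'G', 'D', 'A', 'T'}; // hypothetical 4-byte signature
    static final int VERSION = 1;                     // bump whenever the layout changes

    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    /** Starts the file image with the magic number and a 2-byte version. */
    ChunkWriter() throws IOException {
        out.write(MAGIC);
        out.writeShort(VERSION);
    }

    /** Appends one chunk: 4-character ASCII type tag, 4-byte payload length, payload. */
    ChunkWriter chunk(String type, byte[] payload) throws IOException {
        byte[] tag = type.getBytes(StandardCharsets.US_ASCII);
        if (tag.length != 4) throw new IllegalArgumentException("chunk type must be 4 characters: " + type);
        out.write(tag);
        out.writeInt(payload.length);
        out.write(payload);
        return this;
    }

    /** Returns everything written so far as one byte array. */
    byte[] toByteArray() throws IOException {
        out.flush();
        return bytes.toByteArray();
    }
}
```

Something like new ChunkWriter().chunk("NAME", nameBytes).chunk("MESH", meshBytes).toByteArray() yields a complete file image that can then be written out with any OutputStream.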

Check out the description of various other file formats, like how RIFF files work, how ZIP files work, and so on, to get a better idea of what typically will be in a file format. Then it's up to you to read and write the appropriate data, using the available file I/O classes.
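The size field is what makes RIFF-style chunk formats easy to extend: a reader can jump over chunk types it does not understand. A sketch of a reader for a hypothetical layout (4-byte "GDAT" magic, 2-byte version, then type/size/payload chunks) that rejects foreign files by magic number, refuses versions newer than it knows, and skips unwanted chunks using their sizes. ChunkReader and its method are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ChunkReader {
    // Hypothetical format: "GDAT" magic, 2-byte version, then (4-byte type, 4-byte size, payload) chunks.
    static final byte[] MAGIC = {'G', 'D', 'A', 'T'};
    static final int NEWEST_KNOWN_VERSION = 1;

    /** Returns the payloads of the wanted chunk types; every other chunk is skipped via its size field. */
    static Map<String, byte[]> read(byte[] file, Set<String> wanted) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] magic = new byte[4];
        in.readFully(magic); // throws EOFException if the file is too short
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not a GDAT file");
        int version = in.readUnsignedShort();
        if (version > NEWEST_KNOWN_VERSION) throw new IOException("file written by a newer version: " + version);

        Map<String, byte[]> chunks = new LinkedHashMap<>();
        byte[] tag = new byte[4];
        while (in.available() > 0) {
            in.readFully(tag);
            int size = in.readInt();
            if (size < 0) throw new IOException("corrupt chunk size " + size);
            String type = new String(tag, StandardCharsets.US_ASCII);
            if (wanted.contains(type)) {
                byte[] payload = new byte[size];
                in.readFully(payload);
                chunks.put(type, payload);
            } else if (in.skipBytes(size) != size) { // unknown chunk: jump over it
                throw new EOFException("truncated chunk " + type);
            }
        }
        return chunks;
    }
}
```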

There's also the question of byte order – you need to marshal your integers, floats, and strings, to byte arrays in a fixed order, before you write those bytes into the file. There are tons of marshaling packages available, so you can pick one that suits your particular needs, or you can roll your own if you're comfortable in low-level byte representations. (Built-in functions like floatToIntBits may also help here.)
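A small sketch of marshaling with an explicit byte order, here little-endian (chosen arbitrarily; what matters is that writer and reader agree). putIntLE does it by hand with shifts, putFloatLE goes through Float.floatToIntBits, and encode does the same job with java.nio.ByteBuffer, storing a string as a 4-byte length followed by its UTF-8 bytes. The Marshal class and the record layout (id, speed, name) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Marshal {
    /** Writes an int as 4 little-endian bytes by hand: lowest byte first. */
    static void putIntLE(byte[] dst, int offset, int v) {
        dst[offset] = (byte) v;
        dst[offset + 1] = (byte) (v >>> 8);
        dst[offset + 2] = (byte) (v >>> 16);
        dst[offset + 3] = (byte) (v >>> 24);
    }

    /** A float is stored as its IEEE 754 bit pattern, obtained via Float.floatToIntBits. */
    static void putFloatLE(byte[] dst, int offset, float f) {
        putIntLE(dst, offset, Float.floatToIntBits(f));
    }

    /** Same idea with ByteBuffer: id, speed, then a length-prefixed UTF-8 name, all little-endian. */
    static byte[] encode(int id, float speed, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(id).putFloat(speed).putInt(utf8.length).put(utf8);
        return buf.array();
    }

    /** Reads the name back by skipping the two fixed-size fields and honouring the length prefix. */
    static String decodeName(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(8); // skip id (4 bytes) and speed (4 bytes)
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```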

enum Bool { True, False, FileNotFound };

To get some idea how things work, I'd suggest you try loading an uncompressed BMP file https://en.wikipedia.org/wiki/BMP_file_format

It's quite a simple format and well documented.

Once you can do that, create such a file and load it into whatever paint program you have. That should give you a lot of understanding about file formats and how to deal with them.
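To make the exercise concrete, a sketch of writing and re-reading an uncompressed 24-bit BMP: a 14-byte file header plus a 40-byte BITMAPINFOHEADER, little-endian fields, and bottom-up rows stored as B, G, R and padded to a multiple of 4 bytes. Bmp24, the 0xRRGGBB pixel input and the 2835 pixels-per-metre resolution are choices made for this example; the offsets follow the format description linked above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Bmp24 {
    private static final int HEADER_SIZE = 14 + 40; // file header + BITMAPINFOHEADER

    /** Encodes top-down 0xRRGGBB pixels as an uncompressed 24-bit BMP. */
    static byte[] write(int width, int height, int[] rgb) {
        if (rgb.length != width * height) throw new IllegalArgumentException("need width*height pixels");
        int rowSize = (width * 3 + 3) & ~3; // each row is padded to a multiple of 4 bytes
        int imageSize = rowSize * height;
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + imageSize).order(ByteOrder.LITTLE_ENDIAN);
        // 14-byte file header
        buf.put((byte) 'B').put((byte) 'M');
        buf.putInt(HEADER_SIZE + imageSize); // total file size
        buf.putInt(0);                       // two reserved 16-bit fields
        buf.putInt(HEADER_SIZE);             // offset of the pixel data
        // 40-byte BITMAPINFOHEADER
        buf.putInt(40);                      // size of this header
        buf.putInt(width);
        buf.putInt(height);                  // positive height: rows stored bottom-up
        buf.putShort((short) 1);             // colour planes
        buf.putShort((short) 24);            // bits per pixel
        buf.putInt(0);                       // BI_RGB, i.e. no compression
        buf.putInt(imageSize);
        buf.putInt(2835).putInt(2835);       // horizontal/vertical resolution, about 72 DPI
        buf.putInt(0).putInt(0);             // palette colour counts (unused at 24 bpp)
        for (int y = height - 1; y >= 0; y--) {
            int rowStart = buf.position();
            for (int x = 0; x < width; x++) {
                int p = rgb[y * width + x];
                buf.put((byte) p).put((byte) (p >> 8)).put((byte) (p >> 16)); // B, G, R
            }
            buf.position(rowStart + rowSize); // padding bytes are already zero
        }
        return buf.array();
    }

    /** Reads width, height and bits per pixel back out of the headers. */
    static int[] readInfo(byte[] bmp) throws IOException {
        if (bmp.length < HEADER_SIZE || bmp[0] != 'B' || bmp[1] != 'M') throw new IOException("not a BMP file");
        ByteBuffer buf = ByteBuffer.wrap(bmp).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] {buf.getInt(18), buf.getInt(22), buf.getShort(28) & 0xFFFF};
    }
}
```

Saving the array returned by Bmp24.write(...) to a .bmp file (for example with Files.write) should give an image a paint program can open, which is a quick way to check that the header fields are right.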

Ninja Boss Fight said:
I came to a conclusion that there must be a way for me to properly organize and protect my game resources such that they are easily accessible, difficult to tamper with, easily independently updateable and perhaps efficient. Research has brought me to custom binary file formats which I am interested in for the java platform.

I would like to better understand:

  • How I can define a custom format with Java code as an example
  • How I can convert my data into a file that uses this format and back with Java
  • What should I watch out for

So reading that slowly, I see you are looking for 7 items.

1> organize and protect my game resources so that they are easily accessible

Organize in a structure that works well for your game. Exactly how you organize your assets will depend on your game.

You might keep them organized by object, perhaps on a per-creature basis, with textures, sounds, animations, and other resources in a directory per creature. Or you might organize them by level: the ice level has all its textures, sounds, animations, and other resources, the fire level has its own directory, the water level has its own directory, and so on.

Many tools have common organizational patterns that are well documented within those tools. For example, Unity has frequently used patterns that keep animations, audio, materials, models, textures, scenes, etc., all in their own trees. Unreal has its own frequently used patterns for the content folder, often with maps, characters, environment objects, sound, vehicles, weapons, etc., which were popularized by the ShooterGame template.

2> difficult to tamper with

By whom? Anyone running your game will eventually be able to view the decoded files. Sooner or later you must send models and textures to the graphics hardware and audio to the audio card, and both of those can be intercepted. You must decode whatever animation data you have. You can never overcome the analog hole.

Some companies in history have invested fortunes in a cat-and-mouse game trying to protect assets. Most people recognize it is futile and pointless beyond a legal requirement, so we stick with basic encryption when required. If your game is popular whatever protections you create will be inadequate. If your game isn't popular nobody will bother.

Many good formats incorporate checksums, CRCs, or cryptographic hashes, which detect most data corruption. Simple password protection on the zip file can protect against basic, naive tampering in addition to the CRCs.
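A sketch of the checksum idea using java.util.zip.CRC32: the writer appends a CRC-32 of the payload and the reader recomputes it before trusting the data. The 4-byte trailer layout and the Checksummed name are assumptions for this example. A CRC catches accidental corruption, but anyone editing the file deliberately can simply recompute it, which matches the point above about naive tampering.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

final class Checksummed {
    /** Appends a CRC-32 of the payload as 4 trailing bytes (big-endian, ByteBuffer's default). */
    static byte[] seal(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) crc.getValue())
                .array();
    }

    /** Returns the payload if the stored CRC matches, otherwise throws. */
    static byte[] open(byte[] sealed) {
        if (sealed.length < 4) throw new IllegalArgumentException("too short to contain a CRC");
        byte[] payload = Arrays.copyOf(sealed, sealed.length - 4);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        int stored = ByteBuffer.wrap(sealed).getInt(sealed.length - 4);
        if (stored != (int) crc.getValue()) {
            throw new IllegalStateException("checksum mismatch: data is corrupt or modified");
        }
        return payload;
    }
}
```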

3> easily independently updatable

What's your distribution method?

Steam does it automatically for you, and they do a great job. Just update your files in the directory tree, publish the patch, and they do the hard work.

If you're building installers, you're going to need to figure that out before you get too far along. If you don't have fancy tools that can do incremental patches, I recommend compressed archives that can be swapped out wholesale with updates. Unity and Unreal can both help you prepare patched package files, but getting it right takes some time. But you're not using those if you're asking about Java.

4> perhaps efficient

Efficient in what way? For spinny disks and SSDs it has historically been faster to use compressed formats, because reading from disk took far more time than decompressing. A few NVMe drives have broken that pattern, but only at seriously high transfer speeds.

Compress it, or use a library that compresses it.

Big game engines use compression algorithms optimized for rapid decoding. Smaller projects use zip files.
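Since zip files came up, a sketch of packing several named resources into one archive and unpacking them again with the JDK's java.util.zip classes. Entries are deflate-compressed by default, and ZipInputStream checks each entry's CRC-32 while reading. The AssetArchive name and the in-memory Map<String, byte[]> interface are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class AssetArchive {
    /** Packs named resources into a deflate-compressed ZIP archive held in memory. */
    static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Unpacks every entry; a damaged entry makes ZipInputStream throw a ZipException. */
    static Map<String, byte[]> unpack(byte[] archive) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) out.write(buffer, 0, n);
                files.put(entry.getName(), out.toByteArray());
            }
        }
        return files;
    }
}
```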

5> How I can define a custom format with Java code as an example

There are tons of tutorials, like this paid one. You might be familiar enough with the language to skip the first few, but judging from your questions, most of them have useful content.

That chain of examples starts with simple serialization to plain text, moves on to simple text data formats, then to a directory tree containing XML and PNG images, and finally shows you how to use the io classes to write your own arbitrary binary files.

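You can also use official tutorials like this one about automatically generating the data files. Creating a ZipInputStream and using JAXB to process its contents automatically is straightforward, with tons of examples and documentation. (Note that java.util.zip cannot read password-protected entries, so encryption needs a third-party library, and since Java 11 JAXB ships as a separate dependency rather than as part of the JDK.)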

6> How I can convert my data in to a file that uses this format and back with Java

That's the same topic as serialization. It's a huge topic. I recommend AGAINST implementing it yourself unless you have to. Do you want to spend your time making a game, or spend your time rewriting serialization libraries?

Graphics cards take well-established formats, like DDS, ETC, PVRTC, ASTC or similar. Artists can work directly with them in tools like Photoshop, cards use them directly.

At this point it's far easier to use Java's built-in serializers. Add a few annotations and you get XML for free. Follow the tutorial above and the XML is kept in zip files for free, which have CRC protection built in. You get it all for free, or you can spend months (or years) reinventing the wheel.

7> What should I watch out for

Be careful that you spend your time doing what is most valuable to your goal. Don't waste your time on things you don't actually need. Is your goal to make a game?

If you are making a game, you DO need the ability to easily serialize data to and from storage. You DO need a way to organize data for distribution. You DO need data formats that can be directly used by your tools and your game.

Fortunately, as mentioned above, those are all provided for you in Java's own serialization libraries. As mentioned, zip streaming and automatic XML serialization with JAXB annotations are readily available.

But if you are making a game, you DO NOT need to write your own encryption, you DO NOT need to write your own obfuscation, you DO NOT need to invent your own graphics formats, you DO NOT need fancy encodings that save a few bytes of data on machines that have terabytes of storage and gigabytes of RAM. Those are certainly interesting research topics if your goal is to research them, but they're completely unnecessary if your goal is to make a game.

@euthyphro perhaps I need to rephrase. It should not matter what data I want to store. I want to understand a detailed process for creating a custom binary file format in Java. It's like asking how to drive: it doesn't matter what car. There is a process to it. What is the process one would follow to create a format? I can figure out the details later. Does that make sense?


@frob Obviously, I have my own goals for why I want to know this information. Making a game is just the context. Your answer is great, but it doesn't really provide me with the information I need. I just need to know the process of creating a binary format.


@alberth this does not really show me how to make custom binary formats. This just seems like file I/O for existing formats.


Ninja Boss Fight said:
I want to understand a detailed process in creating a custom binary file format in Java.

I described exactly how to do that, except I didn't, like, write the code. That's up to you!

I again want to repeat that it's a really good idea to read the file format specifications for a variety of file formats – google “TGA file format” and “WAV file format” and “ZIP file format” and, if you're brave, “ELF file format” and “MP4 file format,” and then write some kind of decoder program that opens the file and prints information about it, without necessarily decoding whatever the payload is on the inside, to get a feel for how the pieces go together.
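A sketch of such an inspection tool for the RIFF container used by WAV files: it checks the "RIFF" signature, reports the form type, and then walks the top-level chunks using only their 4-byte IDs and little-endian sizes, never decoding a payload. Chunks are padded to an even length, hence the (size & 1). The RiffLister name and its string output format are made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RiffLister {
    /** Describes the RIFF header and every top-level chunk ("ID size") without decoding payloads. */
    static List<String> list(byte[] file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN); // RIFF sizes are little-endian
        if (file.length < 12 || !tag(buf, 0).equals("RIFF")) throw new IOException("not a RIFF file");
        List<String> lines = new ArrayList<>();
        lines.add("RIFF form " + tag(buf, 8) + ", declared size " + Integer.toUnsignedLong(buf.getInt(4)));
        long pos = 12; // first chunk follows "RIFF", the size field and the form type
        while (pos + 8 <= file.length) {
            String id = tag(buf, (int) pos);
            long size = Integer.toUnsignedLong(buf.getInt((int) pos + 4));
            lines.add(id + " " + size);
            pos += 8 + size + (size & 1); // skip header, payload and the pad byte after odd sizes
        }
        return lines;
    }

    /** Reads a 4-character ASCII chunk ID at the given offset. */
    private static String tag(ByteBuffer buf, int offset) {
        byte[] b = new byte[4];
        for (int i = 0; i < 4; i++) b[i] = buf.get(offset + i);
        return new String(b, StandardCharsets.US_ASCII);
    }
}
```

Printing each line of RiffLister.list(Files.readAllBytes(Paths.get("sound.wav"))), where sound.wav is any WAV file you have, should show a "fmt " chunk and a "data" chunk, and often a few others.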


Ninja Boss Fight said:
this does not really show me how to make custom binary formats. This just seems like file I/O for existing formats

It doesn't indeed, but it brings you into close contact with binary data formats and using them to read/write real binary data (in this case a file). Reading some format descriptions and translating them into working code gives a lot of hands-on knowledge about how to work with these data formats, what you can do, and what things you can express. It shows you what a description contains and how a description relates to the real bits in the file. It also gives you experience in “how to do it in Java”. Last but not least, it gives an idea of the kinds of things you can express in a data stream, how you typically organize that in a file, and how it can be used.

I believe you cannot write smart data formats if you never wrote a few encoders and decoders yourself against raw sequential byte access. You need that experience to judge whether a format that you design makes sense (can be built and will work).

In words, the process to write a data format is really simple. You take whatever data you have, and you figure out a way to squeeze all its information into a sequence of bytes (or bits if you prefer), such that you can extract all the information from the sequence again at a later point in time.
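A sketch of that process for a hypothetical Creature record: every field is written in a fixed order with DataOutput, variable-length parts (the tag list) get a count prefix, and the reader mirrors the writer step by step. Note that writeUTF/readUTF use Java's modified UTF-8 with a 2-byte length prefix, so each string is limited to 65535 encoded bytes.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class Creature {
    final String name;
    final int hitPoints;
    final float x, y;
    final List<String> tags;

    Creature(String name, int hitPoints, float x, float y, List<String> tags) {
        this.name = name;
        this.hitPoints = hitPoints;
        this.x = x;
        this.y = y;
        this.tags = new ArrayList<>(tags);
    }

    /** Writes every field in a fixed order; the list gets a count prefix so the reader knows when to stop. */
    void writeTo(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(hitPoints);
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeInt(tags.size());
        for (String tag : tags) out.writeUTF(tag);
    }

    /** Mirrors writeTo step by step; any change to one must be made to the other. */
    static Creature readFrom(DataInput in) throws IOException {
        String name = in.readUTF();
        int hitPoints = in.readInt();
        float x = in.readFloat();
        float y = in.readFloat();
        int count = in.readInt();
        if (count < 0) throw new IOException("corrupt tag count " + count);
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < count; i++) tags.add(in.readUTF());
        return new Creature(name, hitPoints, x, y, tags);
    }
}
```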

Reproduction of the data is the easiest case. However, good data formats allow you to extract just the information you need without reading the entire file. You really don't want to read the 5TB database file to get one number from it. They may allow for different storage options, so you can store for minimal size or for fast access, etc.
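A sketch of how a format can support reading one item without loading the whole file: a small directory of fixed-size entries (id, offset, length) up front, then the data blobs, with RandomAccessFile.seek jumping straight to the requested blob. IndexedFile, the integer ids and the 16-byte entry layout are assumptions made for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

final class IndexedFile {
    private static final int ENTRY_SIZE = 4 + 8 + 4; // id (int), offset (long), length (int)

    /** Layout: [count][count x (id, offset, length)][blob 0][blob 1]..., all big-endian. */
    static void write(Path file, int[] ids, byte[][] blobs) throws IOException {
        if (ids.length != blobs.length) throw new IllegalArgumentException("one id per blob");
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(ids.length);
            long offset = 4L + (long) ENTRY_SIZE * ids.length; // first blob starts right after the directory
            for (int i = 0; i < ids.length; i++) {
                out.writeInt(ids[i]);
                out.writeLong(offset);
                out.writeInt(blobs[i].length);
                offset += blobs[i].length;
            }
            for (byte[] blob : blobs) out.write(blob);
        }
    }

    /** Reads only the directory, then seeks straight to the requested blob; returns null if absent. */
    static byte[] readOne(Path file, int wantedId) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int count = raf.readInt();
            for (int i = 0; i < count; i++) {
                int id = raf.readInt();
                long offset = raf.readLong();
                int length = raf.readInt();
                if (id == wantedId) {
                    byte[] blob = new byte[length];
                    raf.seek(offset);
                    raf.readFully(blob);
                    return blob;
                }
            }
            return null;
        }
    }
}
```

One trade-off of this layout: the writer must know every blob's size before writing the directory, which is one reason some formats (ZIP among them) put their central directory at the end of the file instead.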

In all cases, you need practical knowledge of how to convert integers, floats, strings, arrays, pixels, tables, sets, lists, trees, and whatnot into a sequence of bytes in various ways, and what the trade-offs are. You also need some knowledge of how to design a format so you can jump around in a file and just grab the bootstrap code from a disk or get that one integer from the 5TB data. Existing file formats and some practical experience give you the handles to visualize how a programmer can implement an encoder or decoder for the format. If you don't have that knowledge, chances are you will invent a format that looks good on paper but is difficult or even impossible to implement.
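As one example of flattening a tree, a sketch that writes a hypothetical SceneNode hierarchy in pre-order, each node as its name followed by its child count, so the reader can rebuild the structure recursively without storing any pointers in the file.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SceneNode {
    final String name;
    final List<SceneNode> children = new ArrayList<>();

    SceneNode(String name) { this.name = name; }

    /** Adds a child and returns this node, so small trees can be built in one expression. */
    SceneNode add(SceneNode child) {
        children.add(child);
        return this;
    }

    /** Pre-order: the node's name, its child count, then each child recursively. */
    void writeTo(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(children.size());
        for (SceneNode c : children) c.writeTo(out);
    }

    /** Rebuilds the tree in the same pre-order sequence. */
    static SceneNode readFrom(DataInput in) throws IOException {
        SceneNode node = new SceneNode(in.readUTF());
        int count = in.readInt();
        if (count < 0) throw new IOException("corrupt child count " + count);
        for (int i = 0; i < count; i++) node.children.add(readFrom(in));
        return node;
    }

    /** Compact text form, handy for checking a round trip, e.g. "root(a,b(c))". */
    @Override
    public String toString() {
        if (children.isEmpty()) return name;
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(children.get(i));
        }
        return sb.append(')').toString();
    }
}
```

The design choice here is compactness over random access: to reach a deep subtree the reader has to parse everything before it, so a format that needs to jump around would add offsets, as in a table of contents.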

This topic is closed to new replies.
