
Serialization/Deserialization

Started by January 03, 2018 12:02 AM
7 comments, last by Danpaz 6 years, 10 months ago

I hope this is the right place to put this.  I am not a newbie developer, and I made several text-based games in my late teens, but this is my first attempt at a more modern (yet retro!) game.  I have been doing quite well so far, but now I seem to have hit a wall.  I am using MonoGame, and at this point, I have the following:

A nice custom ECS I am very happy with
A very fast custom QuadTree for broad-phase collision and queries
Some tutorial-driven code for narrow phase collision
Basic animation
Basic keyboard, mouse, and gamepad controls
A custom editor for generating metadata about my assets

Where I've hit a wall is moving the data between my editor and the game.  I've written a couple half-solutions, but neither of them felt quite right.  The first was some manually constructed and read XML.  The second was a reflection-based serializer/deserializer that can handle almost any non-recursive data structure.

I really want to like the reflection-based approach, but having to avoid recursion and references is a real pain.  For performance reasons, I need references stored in my classes, and yet that would prompt my naive serializer to duplicate data all over the place.  Initially, this led me to create a bunch of duplicate "definition" classes that only held ID values and whatnot, until I realized what a terrible idea that was.  If I'm going to manually translate my classes to/from a definition class, I might as well manually translate them to/from XML, after all.  Anyway, how do I reconcile this?  Create something like RefFieldAttribute and incorporate the idea of a reference into my serializer?  Did I just answer my own question?

Regardless of that, how would you handle this?  There will be a lot of configuration data.  The game will be highly procedural requiring a lot of metadata, and I want full support for modding.  I'd also like to re-use a good chunk of the code for save games and such.  Any ideas and/or advice would be greatly appreciated.

You can use JSON; it is less verbose than XML and has good library support as well. Have a class with the state information that you want to serialize, and convert the objects to JSON or get objects back from JSON. A quick Google search will give you examples and libraries that help you with this. Also take a look at http://json-schema.org/implementations.html if you are interested in creating a schema. It also allows for recursive schema definitions, e.g.

 https://stackoverflow.com/questions/24989786/recursive-json-schema

https://stackoverflow.com/questions/20752716/json-schema-recursive-schema-definition
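
As a rough illustration of the "state class to JSON and back" idea, here is a minimal sketch. It is written in Python with only the standard library to keep it short; in C# a library such as JSON.NET would play the same role. The PlayerState class and its fields are made up for the example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PlayerState:
    """Hypothetical state class holding only what should be saved."""
    name: str
    level: int
    inventory: list

state = PlayerState("hero", 3, ["sword", "potion"])

# Object -> JSON text
text = json.dumps(asdict(state), indent=2)

# JSON text -> object
restored = PlayerState(**json.loads(text))
```

This only works as-is for flat data; nested objects and shared references need extra handling.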


Thanks, the JSON.NET serializer looks very interesting.  I will definitely check it out as it just might solve some of my problems.

Any higher level, structural, or strategy tips?  I have two applications sharing the same data.  One is the game, and the other is my editor.  The editor has a lot of extra fields and properties to support the UI that simply are not needed in the game, and I was thinking of going classic MVVM with the model belonging to the game and the viewmodel belonging to the editor.  But if there's an easier way, I'd love to hear about it.

 

Edit:

Yes, JSON.NET looks like exactly what I need for the nitty gritty details.  I'm cloning the repo as we speak.  If it does everything it says it does, I should only need to worry about the higher level, structural stuff now.  Thanks a ton for the link and advice.

I don't know if JSON.NET intrinsically can handle object->object references.

I have implemented my own reflection-based serializer before and it handles arbitrary objects, including cyclic object references (what I assume you meant by recursion).

The way it works is fairly simple:  Do two passes.  On the first pass, traverse all reference types and make a unique ID each time you run into a new one.  Skip any you've already seen.  On the second pass, actually serialize data:  the number of reference types, followed by the Type of each reference type (so you can support polymorphism), and finally followed by the values for all of the fields in each object.  When a field is a reference type, serialize the ID (reserve a number like 0 or -1 for null).  When a field is a value type, serialize its entire contents directly.
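
A minimal sketch of those two passes, in Python rather than C# so it stays short. It is not the actual serializer described here: the Node class, the JSON layout, and the "ref" key are all assumptions made for the example, and "reference type" is approximated as "any object with a __dict__".

```python
import json

class Node:
    """Hypothetical reference type whose 'next' link can form a cycle."""
    def __init__(self, name):
        self.name = name
        self.next = None

def assign_ids(root):
    """Pass 1: walk the graph and give each newly seen object the next ID.
    ID 0 is reserved for null."""
    ids = {}     # id(obj) -> serial ID
    order = []   # objects in ID order
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj is None or id(obj) in ids:
            continue  # skip nulls and anything already seen
        ids[id(obj)] = len(order) + 1
        order.append(obj)
        for value in vars(obj).values():
            if hasattr(value, "__dict__"):
                stack.append(value)
    return ids, order

def serialize(root):
    """Pass 2: write the object count, the type of each object,
    then every object's fields, with references replaced by IDs."""
    ids, order = assign_ids(root)

    def encode(value):
        if value is None:
            return {"ref": 0}
        if hasattr(value, "__dict__"):
            return {"ref": ids[id(value)]}
        return value  # plain values are written directly

    return json.dumps({
        "count": len(order),
        "types": [type(o).__name__ for o in order],
        "objects": [{k: encode(v) for k, v in vars(o).items()} for o in order],
        "root": ids[id(root)],
    })

# Two nodes pointing at each other: a cycle that a naive serializer would loop on.
a, b = Node("a"), Node("b")
a.next, b.next = b, a
text = serialize(a)
```

Each object is written exactly once, so the cycle costs nothing extra in the output.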

Deserialization then uses FormatterServices.GetUninitializedObject (or equivalent if you're not using .NET) to create all of the reference types first, then when fields are being deserialized, references to other objects can be assigned immediately.
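
A sketch of that load order, again in Python for brevity. Here cls.__new__(cls) plays the role of GetUninitializedObject by allocating an instance without running its constructor. The payload layout (a "types" list, an "objects" list, {"ref": n} for references with 0 meaning null), the Node class, and the KNOWN_TYPES table are assumptions made for the example.

```python
class Node:
    """Hypothetical reference type; __init__ is deliberately never called on load."""
    def __init__(self, name):
        self.name = name
        self.next = None

# Type name -> class, so a payload can name any known (sub)type.
KNOWN_TYPES = {"Node": Node}

def deserialize(data):
    # Step 1: allocate every object up front, without running constructors.
    objects = [KNOWN_TYPES[name].__new__(KNOWN_TYPES[name]) for name in data["types"]]

    # Step 2: fill in fields. Every referenced object already exists,
    # so references can be assigned immediately, even in a cycle.
    def decode(value):
        if isinstance(value, dict) and "ref" in value:
            return None if value["ref"] == 0 else objects[value["ref"] - 1]
        return value

    for obj, fields in zip(objects, data["objects"]):
        for key, value in fields.items():
            setattr(obj, key, decode(value))
    return objects[data["root"] - 1]

# Two nodes that point at each other.
payload = {
    "types": ["Node", "Node"],
    "objects": [
        {"name": "a", "next": {"ref": 2}},
        {"name": "b", "next": {"ref": 1}},
    ],
    "root": 1,
}
a = deserialize(payload)
```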

This supports polymorphism and arbitrary graphs of objects, which as far as I know is everything that can be represented in any data structure.

The underlying data can be binary, JSON, XML, or whatever you want.  You can optimize each of the individual pieces of data as much as you feel like: well-known type IDs instead of names, zigzag integer encoding and field IDs/sizes like protobuf has for versioning compatibility, etc.
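
For anyone unfamiliar with zigzag encoding: it maps signed integers to unsigned ones so that small negative numbers stay small, which matters when the result is then written as a variable-length integer. A minimal sketch for 64-bit values, in Python; the function names are made up for the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK64

def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)

def encode_varint(z: int) -> bytes:
    """Write an unsigned integer 7 bits at a time, low bits first.
    The high bit of each byte says whether more bytes follow."""
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# -1 zigzags to 1, so it fits in one byte instead of ten.
small = encode_varint(zigzag_encode(-1))
```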

The JSON.NET website claims it does support references, and I am testing it out now.  Its settings object allows you to specify a "ReferenceResolver" and a "ReferenceResolverProvider", implying that I might actually have a bit of control over how it's handled.  For me, that's a plus.  That said, my first test was attempting to serialize my editor's main ViewModel, and that resulted in 2 GB memory consumption and a failure to load.  Definitely related to cyclic references, and I need to figure out how to configure around it.

Thank you for your in-depth response.  I had considered multiple passes and ID placeholders, and my main concerns were where to put the actual object data so that it stays modder-friendly and how to decorate my classes to make such things clear to the serializer without being too messy for me.  Keeping in mind I want this to be modder-friendly, I can't have a giant list of reference type data in the output, linked together with nothing but integer IDs or even names.  Some reference type data will need to stay right where it is defined in the hierarchy so modders aren't having to jump around the XML/JSON/YML for every little thing, and other reference types will need a sensible "home" in the output that's easy to find and reason about.  I hope that makes sense.  I also can't base the choice of class or struct on how I want the data to be serialized, which is why I mentioned the possibility of attributes--opt-in serialization referencing at the type and field levels rather than automatic.  It would also give me a chance to provide additional information about how it should be stored.  This would all be much easier if I were not intent on making things easy for modders.

Anyway, I have not given versioning too much thought, yet.  I started to, but then I realized I was going to overload myself.  Figured I'd start by getting something functional, first.  But I do know I will need it later.

We needed something similar when we wrote our own database system, so we ran into the same question of how to handle object references. We settled on the following:

Our serializer first generates a class model from a type. The type is stored in the database as a plain .NET string, from which we later rebuild a Type object. From the Type object we gather all fields, which gives us a template for that type to work from. The template then serializes the data so that small, plain types such as strings and structs, as well as arrays and dictionaries, are handled by their own small serializer classes, while anything else is handled by its class template.

Object references are skipped in the first step. Instead, a placeholder (a unique ID) is written into the data, and the reference is stored in a list so that the placeholder can later be fixed up with the real ID (its position in the data chunk). Other serialization processes also consult this list to check whether the object reference is already there. If it has an ID, they simply proceed; if it is a new reference or its ID is missing, they register it for later fixup and proceed.

This keeps our data small and clear, which is what a database is for.

3 hours ago, Danpaz said:

Some reference type data will need to stay right where it is defined in the hierarchy so modders aren't having to jump around the XML/JSON/YML for every little thing, and other reference types will need a sensible "home" in the output that's easy to find and reason about.  I hope that makes sense.  I also can't base the choice of class or struct on how I want the data to be serialized, which is why I mentioned the possibility of attributes--opt-in serialization referencing at the type and field levels rather than automatic.

You could definitely control "inline" vs "use reference id" with attributes.  If it makes the most sense, you could even have inline be the default.  You could automatically switch to reference ids only if a cycle is detected.  Things like that.
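
A rough sketch of what "inline by default, reference ID when marked" could look like. It is in Python, with dataclass field metadata standing in for a C# attribute such as the RefFieldAttribute floated earlier in the thread. The ref_field helper, the example classes, and the "homes" layout are all hypothetical, and automatic cycle detection is not covered.

```python
from dataclasses import dataclass, field, fields, is_dataclass

def ref_field(**kw):
    """Hypothetical marker: store this field as an ID instead of inlining it."""
    return field(metadata={"ref": True}, **kw)

@dataclass
class Sprite:
    id: str
    sheet: str

@dataclass
class Hitbox:
    width: int
    height: int

@dataclass
class Monster:
    id: str
    hitbox: Hitbox                # inline: stays where it is defined
    sprite: Sprite = ref_field()  # reference: written once in its own "home"

def to_doc(obj, homes):
    """Turn obj into plain dicts. Marked fields become IDs, and their
    targets are collected in homes[type name][id]."""
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None:
            out[f.name] = None
        elif f.metadata.get("ref"):
            homes.setdefault(type(value).__name__, {})[value.id] = to_doc(value, homes)
            out[f.name] = value.id
        elif is_dataclass(value):
            out[f.name] = to_doc(value, homes)
        else:
            out[f.name] = value
    return out

homes = {}
slime = Monster("slime", Hitbox(16, 12), Sprite("slime_idle", "monsters.png"))
monster_doc = to_doc(slime, homes)
```

The hitbox ends up nested inside the monster entry, while the sprite appears there only as "slime_idle" and is defined once under homes["Sprite"], which gives modders one predictable place to find it.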

I've been all out of creativity the last couple of days from work, so I have not worked on the code at all, but I have a solid plan now, thanks to your input.  I will explore JSON.NET to see if it can meet my requirements and reduce the load, and if it cannot, I will go back to homebrew serialization with attribute decoration.  In my editor, I will treat the serialized/deserialized classes as a model in MVVM style, and my game will simply use them as-is.  Thank you for your time and engagement.

