Pupping - a method for serializing data

Published August 31, 2016 by Daniel Randle, posted by EarthBanana

Introduction

Serialization is the process of taking structures and objects along with their states and converting them to data that is reproducible in any computer environment. There are many ways to do this, and it can be a struggle to find one that works consistently. Here I will talk about a pattern, or method, that I like to call pupping. It is very similar to the method boost uses for its serialization library.

Packing and Unpacking

Pup stands for pack-unpack, so pupping is packing/unpacking. The idea is this: rather than creating serialize/deserialize functions for each object type and/or each type of medium (file, network, GUI widgets), create a pupper object for each type of medium. Each pupper is bundled with a set of read/write functions, one for each fundamental data type. A pupper object contains the data and functions needed to read from and write to its specific medium. For example, a binary file pupper might contain a file stream that each of its pup functions reads from or writes to.
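To make the idea concrete, here is a minimal sketch of a pupper whose medium is an in-memory byte buffer rather than a file. It makes two assumptions:

- It uses a cut-down version of the pupper base class that handles only int32_t and float.
- It uses a hypothetical memory_pupper class.

Neither is part of the article's attached code. Note that the read/write decision happens in exactly one place, the private bytes helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

#define PUP_OUT 1
#define PUP_IN 2

// Cut-down var_info and pupper (two fundamental types only).
struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(int32_t & val_, const var_info & info_) = 0;
    virtual void pup(float & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, int32_t & val_, const var_info & info_) { p->pup(val_, info_); }
void pup(pupper * p, float & val_, const var_info & info_) { p->pup(val_, info_); }

// Hypothetical pupper whose medium is a byte buffer in memory.
struct memory_pupper : public pupper {
    memory_pupper(std::vector<char> & buf_, int32_t io_): pupper(io_), buf(buf_), pos(0) {}
    void pup(int32_t & val_, const var_info &) override { bytes(&val_, sizeof(val_)); }
    void pup(float & val_, const var_info &) override { bytes(&val_, sizeof(val_)); }

private:
    // The only place where reading and writing differ.
    void bytes(void * data, std::size_t n) {
        if (io == PUP_OUT) {
            const char * src = static_cast<const char *>(data);
            buf.insert(buf.end(), src, src + n);    // append the raw bytes
        } else {
            std::memcpy(data, buf.data() + pos, n); // consume the next n bytes
            pos += n;
        }
    }
    std::vector<char> & buf;
    std::size_t pos;
};
```

The same pup(&p, value, var_info("name")) call writes when p was constructed with PUP_OUT and reads when it was constructed with PUP_IN.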

Explanation

This pattern is fairly similar to boost serialization, though I was using it before I had heard of boost. It is worth understanding in any case, and a custom implementation means no boost dependency is needed. The "pupper" is roughly equivalent to the boost archive, and pup functions are equivalent to boost serialize functions. The code presented here is simpler than boost and does not overload operators as boost does. It is as non-invasive as possible and not template heavy.

The idea is that any object, no matter how complex, can be serialized to a stream of bytes by recursively breaking it down until only fundamental data types remain. Any fundamental data type can be represented directly as bytes. The process of saving, loading, or transmitting the raw data is therefore separable from serializing objects. The code to serialize an object only has to be written once, and anything can then be done with the raw data.

The pupper pattern differs from most serialization methods in a few ways:
1) Read and write operations are not separated except at the lowest level (in the pupper object)
2) Objects that need to be serialized do not need to inherit from any special classes
3) It can be implemented with very small overhead, using no external libraries, while remaining extendable and flexible
4) Writing class methods, virtual or otherwise, is largely unnecessary

If polymorphic serialization is required, a virtual method is needed in base classes. CRTP can be used to aid in this process; this case is covered later.

Instead of creating a class method in each object to provide for serialization, a global function is created for each object. All of these global functions should have the same name and parameters, except for the parameter holding the object to be serialized. Making any object "serializable" is then just a matter of writing a global function. The functions can have any name as long as they all share it, but I find "pup" fitting.
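The point that serialization code is written once and reused for any medium can be illustrated with a pupper that has no medium at all. The sketch below makes three assumptions:

- It uses a cut-down pupper base (int32_t and float only).
- It uses a hypothetical particle struct.
- It uses a hypothetical size_pupper class.

It reuses an unchanged pup function for particle to compute how many bytes a raw binary pupper would write.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

#define PUP_OUT 1
#define PUP_IN 2

// Cut-down var_info and pupper (two fundamental types only).
struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(int32_t & val_, const var_info & info_) = 0;
    virtual void pup(float & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, int32_t & val_, const var_info & info_) { p->pup(val_, info_); }
void pup(pupper * p, float & val_, const var_info & info_) { p->pup(val_, info_); }

// Hypothetical example object.
struct particle {
    float x = 0.0f;
    float y = 0.0f;
    int32_t id = 0;
};

// Written once; it never mentions any particular medium.
void pup(pupper * p, particle & pt, const var_info & info) {
    pup(p, pt.x, var_info(info.name + ".x"));
    pup(p, pt.y, var_info(info.name + ".y"));
    pup(p, pt.id, var_info(info.name + ".id"));
}

// Hypothetical pupper with no medium: it only counts the bytes
// that a raw binary pupper would write for each value.
struct size_pupper : public pupper {
    size_pupper(): pupper(PUP_OUT), total(0) {}
    void pup(int32_t &, const var_info &) override { total += sizeof(int32_t); }
    void pup(float &, const var_info &) override { total += sizeof(float); }
    std::size_t total;
};
```

Swapping size_pupper for a file or network pupper would require no change to the particle pup function.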
Some examples of pup prototypes for STL containers are shown below.

```cpp
template<class K, class V> void pup(pupper * p, std::map<K, V> & map, const var_info & info);
template<class T> void pup(pupper * p, std::vector<T> & vec, const var_info & info);
template<class T> void pup(pupper * p, std::set<T> & set, const var_info & info);
```
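The prototypes above only declare the container functions. One possible body for the std::map version is sketched below. It assumes a cut-down pupper base (uint32_t and float only) and a hypothetical memory_pupper, included only so the sketch can be exercised. Map keys are const, so this sketch branches on p->io rather than pupping the elements in place. That is a small departure from keeping reads and writes together. Like the vector version shown later, it requires K and V to be default-constructible.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

#define PUP_OUT 1
#define PUP_IN 2

struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(uint32_t & val_, const var_info & info_) = 0;
    virtual void pup(float & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, uint32_t & val_, const var_info & info_) { p->pup(val_, info_); }
void pup(pupper * p, float & val_, const var_info & info_) { p->pup(val_, info_); }

// Hypothetical in-memory pupper used to exercise the map function.
struct memory_pupper : public pupper {
    memory_pupper(std::vector<char> & buf_, int32_t io_): pupper(io_), buf(buf_), pos(0) {}
    void pup(uint32_t & val_, const var_info &) override { bytes(&val_, sizeof(val_)); }
    void pup(float & val_, const var_info &) override { bytes(&val_, sizeof(val_)); }
private:
    void bytes(void * data, std::size_t n) {
        if (io == PUP_OUT) {
            const char * src = static_cast<const char *>(data);
            buf.insert(buf.end(), src, src + n);
        } else {
            std::memcpy(data, buf.data() + pos, n);
            pos += n;
        }
    }
    std::vector<char> & buf;
    std::size_t pos;
};

// Element count first, then each key/value pair.
template<class K, class V>
void pup(pupper * p, std::map<K, V> & map, const var_info & info) {
    uint32_t size = static_cast<uint32_t>(map.size());
    pup(p, size, var_info(info.name + ".size"));
    if (p->io == PUP_OUT) {
        uint32_t i = 0;
        for (auto & kv : map) {
            K key = kv.first;  // keys are const inside a map, so pup a copy
            std::string elem = info.name + "[" + std::to_string(i++) + "]";
            pup(p, key, var_info(elem + ".key"));
            pup(p, kv.second, var_info(elem + ".value"));
        }
    } else {
        map.clear();  // discard whatever the map held before reading
        for (uint32_t i = 0; i < size; ++i) {
            std::string elem = info.name + "[" + std::to_string(i) + "]";
            K key{};
            V val{};
            pup(p, key, var_info(elem + ".key"));
            pup(p, val, var_info(elem + ".value"));
            map.emplace(key, val);
        }
    }
}
```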

The pupper pointer and var_info reference parameters will be explained later. The important thing is that the serialization work is done in a global function, not a member function. The pup pattern is easiest to show by example. In this article, pupper objects for binary and text file saving/loading are coded, and a few example objects are saved/loaded using them. An example is given that uses the pupper pattern along with CRTP to serialize polymorphic objects. Also, a std::vector of polymorphic base objects is saved/loaded, illustrating the flexibility this pattern allows when using other library-defined types (std::vector). So without further ado, take a look at the pupper header file.

```cpp
#pragma once

#define PUP_OUT 1
#define PUP_IN 2

#include <cstdint>
#include <string>

// Describes the variable being pupped; extend it by deriving from it.
struct var_info
{
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

// Every medium implements one read/write method per fundamental type.
struct pupper
{
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(char & val_, const var_info & info_) = 0;
    virtual void pup(wchar_t & val_, const var_info & info_) = 0;
    virtual void pup(int8_t & val_, const var_info & info_) = 0;
    virtual void pup(int16_t & val_, const var_info & info_) = 0;
    virtual void pup(int32_t & val_, const var_info & info_) = 0;
    virtual void pup(int64_t & val_, const var_info & info_) = 0;
    virtual void pup(uint8_t & val_, const var_info & info_) = 0;
    virtual void pup(uint16_t & val_, const var_info & info_) = 0;
    virtual void pup(uint32_t & val_, const var_info & info_) = 0;
    virtual void pup(uint64_t & val_, const var_info & info_) = 0;
    virtual void pup(float & val_, const var_info & info_) = 0;
    virtual void pup(double & val_, const var_info & info_) = 0;
    virtual void pup(long double & val_, const var_info & info_) = 0;
    virtual void pup(bool & val_, const var_info & info_) = 0;
    int32_t io; // PUP_OUT (write) or PUP_IN (read)
};

// Global pup functions for the fundamental types; each is defined
// elsewhere and simply forwards to p->pup(val_, info_).
void pup(pupper * p, char & val_, const var_info & info_);
void pup(pupper * p, wchar_t & val_, const var_info & info_);
void pup(pupper * p, int8_t & val_, const var_info & info_);
void pup(pupper * p, int16_t & val_, const var_info & info_);
void pup(pupper * p, int32_t & val_, const var_info & info_);
void pup(pupper * p, int64_t & val_, const var_info & info_);
void pup(pupper * p, uint8_t & val_, const var_info & info_);
void pup(pupper * p, uint16_t & val_, const var_info & info_);
void pup(pupper * p, uint32_t & val_, const var_info & info_);
void pup(pupper * p, uint64_t & val_, const var_info & info_);
void pup(pupper * p, float & val_, const var_info & info_);
void pup(pupper * p, double & val_, const var_info & info_);
void pup(pupper * p, long double & val_, const var_info & info_);
void pup(pupper * p, bool & val_, const var_info & info_);
```

A var_info struct is declared first. For now it simply has a name field; this is where information about the pupped variable belongs. It is filled out during the pupping process, so its constructor requires the field information to make sure it isn't forgotten later.

The pupper base class defines the set of methods that any type of pupper must implement: one method to read/write each fundamental data type from/to the medium. A set of global functions named "pup" is declared and defined, establishing the fundamental usage of the pupping pattern. The idea is to be able to call pup(pupper, object, description) almost anywhere in code in order to serialize/deserialize any object that should be serializable.

Creating a new pupper object type means implementing a pup method for each fundamental data type. These methods are used by the global pup functions, which are in turn used by the pup functions for more complicated types. No matter how many new pupper types are created, the pup functions to serialize each object need only be written once. This is exactly what makes the pattern useful:

- To make all objects serializable to a binary file, create a binary file pupper.
- To make all objects serializable to a text file, create a text file pupper.
- To make all objects serializable to a Qt dialog, create a Qt dialog pupper.

Some types of pupper objects may require additional information about the variables. For example, there are multiple ways a double can be represented in a GUI: a vertical slider, a horizontal slider, a spin box, etc. The var_info struct allows new information about variables to be added, and any pupper object that does not need that information can simply ignore it. In the Qt example, a flag could be added to the var_info struct and used by the Qt pupper object. The objects that need to be shown in a GUI would then set the flag, and all pupper objects that have no use for the flag would ignore it.
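Here is a sketch of the var_info extension idea. It assumes two hypothetical types:

- A gui_var_info subclass that carries a widget_hint.
- A widget_pupper that records which widget it would create instead of building real Qt widgets.

The dynamic_cast relies on var_info being a polymorphic type, which its virtual destructor guarantees. A pupper that finds no hint falls back to a default.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#define PUP_OUT 1
#define PUP_IN 2

struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}  // makes var_info polymorphic, so dynamic_cast works
    std::string name;
};

// Hypothetical extra information intended for GUI puppers.
enum class widget_hint { spin_box, horizontal_slider, vertical_slider };

struct gui_var_info : public var_info {
    gui_var_info(const std::string & name_, widget_hint hint_): var_info(name_), hint(hint_) {}
    widget_hint hint;
};

// Cut-down pupper handling only double.
struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(double & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, double & val_, const var_info & info_) { p->pup(val_, info_); }

// Hypothetical GUI pupper: records the widget it would create for each variable.
struct widget_pupper : public pupper {
    widget_pupper(): pupper(PUP_OUT) {}
    void pup(double &, const var_info & info_) override {
        const gui_var_info * gui = dynamic_cast<const gui_var_info *>(&info_);
        widget_hint hint = gui ? gui->hint : widget_hint::spin_box;  // default without a hint
        created.push_back(info_.name + ":" + hint_name(hint));
    }
    static std::string hint_name(widget_hint h) {
        if (h == widget_hint::horizontal_slider) return "horizontal_slider";
        if (h == widget_hint::vertical_slider) return "vertical_slider";
        return "spin_box";
    }
    std::vector<std::string> created;
};
```

A binary or text pupper handed the same gui_var_info would only read its name and ignore the hint.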
Because the destructor of var_info is virtual, var_info is a polymorphic type that can be safely extended through inheritance. This is especially useful when creating a library that others will use. Users can create their own pupper object types and add any data they need in a class derived from var_info, without editing the library source code.

There are a few reasons for using pup(pupper, object, description) instead of pupper->pup(object, description) or object->pup(pupper, description).

The reasons for not using pupper->pup(object, description) are:
1) The base pupper class would have to be extended for every new type of object. If a library has extendable classes, its users would have to edit the base pupper class for every library class they extend that the library is still responsible for serializing
2) The pack/unpack code would be separated from the object, making it prone to bugs when the object changes

The reasons for not using object->pup(pupper, description) are:
1) Third-party library objects (such as std::vector) cannot easily be extended with a pup member function; they would require a special function or wrapper class
2) Since many objects would not include a "pup" member function, pup usage would be inconsistent. This is purely an aesthetics/convenience argument, and of course an opinion. But I would argue that writing:

```cpp
pup(pupper, obj1, desc1);
pup(pupper, obj2, desc2);
pup(pupper, obj3, desc3);
pup(pupper, obj4, desc4);
// etc...
```

is both easier to understand and remember than:

```cpp
obj1->pup(pupper, desc1);
pup(pupper, obj2, desc2);
obj3->pup(pupper, desc3);
pup(pupper, obj4, desc4);
// etc...
```

If the same pup function format is used for everything, writing pup functions becomes trivial because they are just combinations of other pup functions of the same format. Creating concrete pupper objects can be easy; binary and text file pupper objects are included as examples. Their definitions are boring, so they won't be shown here, but the declarations are below.

```cpp
// binary_file_pupper header
#pragma once
#include "pupper.h"
#include <fstream>

struct binary_file_pupper : public pupper
{
    binary_file_pupper(std::fstream & fstrm, int mode);
    std::fstream & fs;
    void pup(char & val_, const var_info & info_);
    void pup(wchar_t & val_, const var_info & info_);
    void pup(int8_t & val_, const var_info & info_);
    void pup(int16_t & val_, const var_info & info_);
    void pup(int32_t & val_, const var_info & info_);
    void pup(int64_t & val_, const var_info & info_);
    void pup(uint8_t & val_, const var_info & info_);
    void pup(uint16_t & val_, const var_info & info_);
    void pup(uint32_t & val_, const var_info & info_);
    void pup(uint64_t & val_, const var_info & info_);
    void pup(float & val_, const var_info & info_);
    void pup(double & val_, const var_info & info_);
    void pup(long double & val_, const var_info & info_);
    void pup(bool & val_, const var_info & info_);
};

// Reads or writes the raw in-memory bytes of val_.
template<class T>
void pup_bytes(binary_file_pupper * p, T & val_)
{
    if (p->io == PUP_IN)
        p->fs.read((char*)&val_, sizeof(T));
    else
        p->fs.write((char*)&val_, sizeof(T));
}

// text_file_pupper header
#pragma once
#include "pupper.h"
#include <fstream>
#include <string>

struct text_file_pupper : public pupper
{
    text_file_pupper(std::fstream & fstrm, int mode);
    std::fstream & fs;
    void pup(char & val_, const var_info & info_);
    void pup(wchar_t & val_, const var_info & info_);
    void pup(int8_t & val_, const var_info & info_);
    void pup(int16_t & val_, const var_info & info_);
    void pup(int32_t & val_, const var_info & info_);
    void pup(int64_t & val_, const var_info & info_);
    void pup(uint8_t & val_, const var_info & info_);
    void pup(uint16_t & val_, const var_info & info_);
    void pup(uint32_t & val_, const var_info & info_);
    void pup(uint64_t & val_, const var_info & info_);
    void pup(float & val_, const var_info & info_);
    void pup(double & val_, const var_info & info_);
    void pup(long double & val_, const var_info & info_);
    void pup(bool & val_, const var_info & info_);
};

// Writes "<name>value</name>" or, when reading, puts the text between the tags in line.
template<class T>
void pup_text(text_file_pupper * p, T val, const var_info & info, std::string & line)
{
    std::string begtag, endtag;
    begtag = "<" + info.name + ">";
    endtag = "</" + info.name + ">";
    if (p->io == PUP_OUT)
    {
        p->fs << begtag << val << endtag << "\n";
    }
    else
    {
        std::getline(p->fs, line);
        size_t beg = begtag.size();
        size_t loc = line.find(endtag);
        line = line.substr(beg, loc - beg);
    }
}
```

The template functions are there as a convenience; all of the pupper methods use them. In read mode, pup_text fills the string "line" with the text found between the variable's tags and leaves the conversion to the calling pup method. In write mode, it writes the tagged value to the file stream and leaves line empty. The pup_bytes function simply reads or writes the raw bytes of the value. I know it's not multi-platform safe: byte order and the sizes of types such as wchar_t and long double differ between platforms. Writing pup functions to serialize objects requires no specific knowledge of the pupper object; it just needs to be passed along. Take a look at the header and definition file for an example object (obj_a).
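Since the definitions are omitted, here is one way the text pupper methods might use pup_text, sketched for int32_t and float only. It assumes a hypothetical text_stream_pupper that holds a std::iostream& instead of a std::fstream&, so that it can be exercised with a std::stringstream. After pup_text extracts the text between the tags, the string is converted with std::stol or std::stof. The stream precision is raised to std::numeric_limits<float>::max_digits10 so that floats survive the round trip.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>

#define PUP_OUT 1
#define PUP_IN 2

struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(int32_t & val_, const var_info & info_) = 0;
    virtual void pup(float & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, int32_t & val_, const var_info & info_) { p->pup(val_, info_); }
void pup(pupper * p, float & val_, const var_info & info_) { p->pup(val_, info_); }

// Hypothetical text pupper over any iostream (the article's holds a std::fstream &).
struct text_stream_pupper : public pupper {
    text_stream_pupper(std::iostream & stream, int32_t io_): pupper(io_), fs(stream) {
        fs.precision(std::numeric_limits<float>::max_digits10);  // enough digits to round-trip a float
    }
    void pup(int32_t & val_, const var_info & info_) override;
    void pup(float & val_, const var_info & info_) override;
    std::iostream & fs;
};

// Same logic as the article's pup_text.
template<class T>
void pup_text(text_stream_pupper * p, T val, const var_info & info, std::string & line) {
    std::string begtag = "<" + info.name + ">";
    std::string endtag = "</" + info.name + ">";
    if (p->io == PUP_OUT) {
        p->fs << begtag << val << endtag << "\n";
    } else {
        std::getline(p->fs, line);
        std::size_t beg = begtag.size();
        std::size_t loc = line.find(endtag);
        line = line.substr(beg, loc - beg);  // text between the tags
    }
}

void text_stream_pupper::pup(int32_t & val_, const var_info & info_) {
    std::string line;
    pup_text(this, val_, info_, line);
    if (io == PUP_IN) val_ = static_cast<int32_t>(std::stol(line));
}

void text_stream_pupper::pup(float & val_, const var_info & info_) {
    std::string line;
    pup_text(this, val_, info_, line);
    if (io == PUP_IN) val_ = std::stof(line);
}
```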

```cpp
// header
#include "pupper.h"
#include "math_structs.h"

class obj_a
{
public:
    friend void pup(pupper * p_, obj_a & oa, const var_info & info);
    void set_transform(const fmat4 & tform);
    void set_velocity(const fvec4 & vel);
    const fvec4 & get_velocity() const;
    const fmat4 & get_transform() const;
private:
    fmat4 transform;
    fvec4 velocity;
};

// definition
void pup(pupper * p_, obj_a & oa, const var_info & info)
{
    pup(p_, oa.transform, var_info(info.name + ".transform"));
    pup(p_, oa.velocity, var_info(info.name + ".velocity"));
}
```

The pup function responsible for serializing obj_a calls the pup functions responsible for serializing fmat4's and fvec4's. Take a look at the code defining fmat4 and fvec4.

```cpp
struct fvec4
{
    fvec4(float x_=0.0f, float y_=0.0f, float z_=0.0f, float w_=0.0f);
    union
    {
        struct { float x; float y; float z; float w; };
        struct { float r; float g; float b; float a; };
        float data[4];
    };
    fvec4 operator+(const fvec4 & rhs);
    fvec4 operator-(const fvec4 & rhs);
    fvec4 & operator+=(const fvec4 & rhs);
    fvec4 & operator-=(const fvec4 & rhs);
};

void pup(pupper * p_, fvec4 & vc, const var_info & info)
{
    pup(p_, vc.data[0], var_info(info.name + ".x"));
    pup(p_, vc.data[1], var_info(info.name + ".y"));
    pup(p_, vc.data[2], var_info(info.name + ".z"));
    pup(p_, vc.data[3], var_info(info.name + ".w"));
}

struct fmat4
{
    fmat4(fvec4 row1_ = fvec4(1.0f,0.0f,0.0f,0.0f),
          fvec4 row2_ = fvec4(0.0f,1.0f,0.0f,0.0f),
          fvec4 row3_ = fvec4(0.0f,0.0f,1.0f,0.0f),
          fvec4 row4_ = fvec4(0.0f,0.0f,0.0f,1.0f));
    union
    {
        struct { fvec4 rows[4]; };
        float data[16];
    };
};

void pup(pupper * p_, fmat4 & tf, const var_info & info)
{
    pup(p_, tf.rows[0], var_info(info.name + ".row1"));
    pup(p_, tf.rows[1], var_info(info.name + ".row2"));
    pup(p_, tf.rows[2], var_info(info.name + ".row3"));
    pup(p_, tf.rows[3], var_info(info.name + ".row4"));
}
```

The pup function for fvec4 calls the pup function for floats (declared in pupper.h) four times, once per component. The fmat4 pup function calls the fvec4 pup function four times, once per row. Notice that no matter which concrete pupper is used, none of these functions change. It is easy to write pup functions for other library types too. As an example, take a look at the pup function for a std::vector.
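The recursion is easier to see with a pupper that only records the names it is handed. The sketch below makes three assumptions:

- It uses hypothetical 2D stand-ins, fvec2 and fmat2, instead of fvec4 and fmat4 to keep it short.
- It uses a cut-down, float-only pupper base.
- It uses a hypothetical name_pupper.

Each level of the recursion appends to var_info's name, so the float pupper receives fully qualified names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#define PUP_OUT 1
#define PUP_IN 2

struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

// Cut-down pupper handling only float.
struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(float & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, float & val_, const var_info & info_) { p->pup(val_, info_); }

// Hypothetical 2D stand-ins for fvec4 and fmat4.
struct fvec2 { float x = 0.0f; float y = 0.0f; };
struct fmat2 { fvec2 rows[2]; };

void pup(pupper * p, fvec2 & v, const var_info & info) {
    pup(p, v.x, var_info(info.name + ".x"));
    pup(p, v.y, var_info(info.name + ".y"));
}

void pup(pupper * p, fmat2 & m, const var_info & info) {
    pup(p, m.rows[0], var_info(info.name + ".row1"));
    pup(p, m.rows[1], var_info(info.name + ".row2"));
}

// Hypothetical pupper that records the name of every fundamental value it sees.
struct name_pupper : public pupper {
    name_pupper(): pupper(PUP_OUT) {}
    void pup(float &, const var_info & info_) override { names.push_back(info_.name); }
    std::vector<std::string> names;
};
```

With the real fvec4/fmat4 functions, the same pupper would record 16 names for one fmat4, from row1.x to row4.w.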

```cpp
template<class T>
void pup(pupper * p_, std::vector<T> & vec, const var_info & info)
{
    uint32_t size = static_cast<uint32_t>(vec.size());
    pup(p_, size, var_info(info.name + ".size"));
    vec.resize(size); // no-op when writing; creates default-constructed elements when reading
    for (uint32_t i = 0; i < size; ++i)
        pup(p_, vec[i], var_info(info.name + "[" + std::to_string(i) + "]"));
}
```

There are some disadvantages to this particular function. Mainly, it won't work for types T that don't have a default constructor, because resize default-constructs the new elements. There are ways to write this function so that it will work, like with parameter packs, but they aren't needed here. With a pup function for std::vector, any vector contained in an object can be pupped. Take a look at obj_a_container, which owns the obj_a's it contains.

```cpp
#include "pupper.h"
#include "derived_obj_a.h"
#include <vector>

struct obj_a_desc
{
    obj_a_desc();
    int8_t type;
    obj_a * ptr;
};

struct obj_a_container
{
    ~obj_a_container();
    void release();
    std::vector<obj_a_desc> obj_a_vec;
};

void pup(pupper * p_, obj_a_desc & oa_d, const var_info & info)
{
    pup(p_, oa_d.type, var_info(info.name + ".type"));
    if (oa_d.ptr == nullptr)
    {
        // This is a bit of a cheat because I don't feel like writing factory code
        if (oa_d.type == 1)
            oa_d.ptr = new obj_a;
        else
            oa_d.ptr = new derived_obj_a;
    }
    pup(p_, *oa_d.ptr, info);
}

void pup(pupper * p_, obj_a_container & oa, const var_info & info)
{
    pup(p_, oa.obj_a_vec, var_info("obj_a_vec"));
}
```

Giving each obj_a pointer a pup function (here through the obj_a_desc wrapper) allows memory to be allocated when needed. The vector pup function calls the obj_a_desc pup function on every element, which allocates an object if the pointer is null.

But how does the pupper pattern handle polymorphic class types that should still be serialized by the library? For example, it should be possible to derive from obj_a and still have obj_a_container pup the derived object given only a pointer to the base object. It may be apparent that something strange is going on in that pup function. Each obj_a pointer is wrapped in an obj_a_desc struct along with a type field that is checked against 1. This is a makeshift way to allow obj_a_container to allocate derived_obj_a's (which will be defined shortly). Normally this would be done with some type of factory, but that's not the focus of the article. Instead, the description struct is serialized so that the pup function knows which object to allocate. It's serialized using (you guessed it) a pup function. After allocation, the pup function for obj_a is called. The obj_a_desc pup function never calls derived_obj_a's pup function directly, and no pointer casting is done.

To accomplish this, a bit of curiously recurring template pattern (CRTP) code is needed:

1) A virtual pack/unpack method is added to obj_a.
2) Instead of requiring every derived object to implement this method, a template class is created which inherits from obj_a and implements it.
3) The derived classes then inherit from the template class, with their own type as the template parameter.

For clarity, the virtual method will be called pack_unpack instead of pup. The new header file for obj_a is shown below.
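The text notes that a factory would normally replace the type == 1 check. A minimal sketch of such a factory follows. It makes two assumptions:

- obj_factory, register_type and create are hypothetical names.
- base_obj and derived_obj stand in for obj_a and derived_obj_a.

In the obj_a_desc pup function, the if/else would become something like oa_d.ptr = factory.create(oa_d.type), with a null result treated as an unknown type id.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Stand-ins for obj_a and derived_obj_a.
struct base_obj {
    virtual ~base_obj() {}
    virtual std::string kind() const { return "base"; }
};

struct derived_obj : public base_obj {
    std::string kind() const override { return "derived"; }
};

// Hypothetical factory mapping serialized type ids to creation functions.
class obj_factory {
public:
    using creator = std::function<base_obj * ()>;

    void register_type(int8_t type, creator fn) { creators[type] = fn; }

    // Returns nullptr for an id that was never registered
    // (for example, a corrupt file or one written by a newer version).
    base_obj * create(int8_t type) const {
        auto it = creators.find(type);
        return it == creators.end() ? nullptr : it->second();
    }

private:
    std::map<int8_t, creator> creators;
};
```

Registering a new derived type then only touches the factory setup, not the obj_a_desc pup function.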

```cpp
#include "pupper.h"
#include "math_structs.h"

class obj_a
{
public:
    friend void pup(pupper * p_, obj_a & oa, const var_info & info);
    virtual ~obj_a() {} // needed when derived objects are deleted through obj_a pointers
    void set_transform(const fmat4 & tform);
    void set_velocity(const fvec4 & vel);
    const fvec4 & get_velocity() const;
    const fmat4 & get_transform() const;
protected:
    virtual void pack_unpack(pupper * p, const var_info & info) {}
private:
    fmat4 transform;
    fvec4 velocity;
};

// CRTP helper: derived classes inherit from puppable_obj_a<their own type>.
template<class T>
class puppable_obj_a : public obj_a
{
public:
    puppable_obj_a(T & derived_): derived(derived_) {}
protected:
    void pack_unpack(pupper * p, const var_info & info) { pup(p, derived, info); }
private:
    T & derived;
};

void pup(pupper * p_, obj_a & oa, const var_info & info)
{
    oa.pack_unpack(p_, info); // pups the derived part first, if any
    pup(p_, oa.transform, var_info(info.name + ".transform"));
    pup(p_, oa.velocity, var_info(info.name + ".velocity"));
}
```

The call oa.pack_unpack(p_, info) dispatches to the derived class's pack_unpack, which in turn calls the correct pup function for the derived type. This means that the pup function for derived_obj_a is called first, followed by the pup functions that serialize transform (fmat4) and velocity (fvec4). The code for derived_obj_a is shown below.

```cpp
class derived_obj_a : public puppable_obj_a<derived_obj_a>
{
public:
    derived_obj_a(float health_=100.0f, float max_health_=100.0f):
        puppable_obj_a<derived_obj_a>(*this),
        health(health_),
        max_health(max_health_)
    {}
    float health;
    float max_health;
};

void pup(pupper * p, derived_obj_a & oa, const var_info & info)
{
    pup(p, oa.health, var_info(info.name + ".health"));
    pup(p, oa.max_health, var_info(info.name + ".max_health"));
}
```

As stated earlier, the derived class inherits from the template class puppable_obj_a with its own type as the template parameter. puppable_obj_a in turn inherits from obj_a and overrides the pack_unpack method. Doing this allows the correct pup function to be called, pupping the health and max_health fields of derived_obj_a. From the outside looking in, only a pup call is needed to serialize an obj_a, even through a base pointer that actually points to a derived_obj_a. An example program is included illustrating this fact. The read and write to file functions show the pup pattern in action.
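A common variation of this CRTP helper drops the stored T & derived member. It is swapped in here and is not the author's code. Instead of storing a reference, it recovers the derived object with static_cast<T &>(*this) on each call. A stored reference is bound once, at construction. A copied derived_obj_a would therefore still refer to the object it was copied from, and its copy assignment would be implicitly deleted. The cast avoids both problems.

The sketch assumes a cut-down, float-only pupper base and hypothetical classes base_obj, puppable, unit and record_pupper. The first three stand in for obj_a, puppable_obj_a and derived_obj_a.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#define PUP_OUT 1
#define PUP_IN 2

struct var_info {
    var_info(const std::string & name_): name(name_) {}
    virtual ~var_info() {}
    std::string name;
};

struct pupper {
    pupper(int32_t io_): io(io_) {}
    virtual ~pupper() {}
    virtual void pup(float & val_, const var_info & info_) = 0;
    int32_t io;
};

void pup(pupper * p, float & val_, const var_info & info_) { p->pup(val_, info_); }

// Stand-in for obj_a.
class base_obj {
public:
    virtual ~base_obj() {}
    friend void pup(pupper * p, base_obj & b, const var_info & info);
protected:
    virtual void pack_unpack(pupper *, const var_info &) {}
private:
    float mass = 1.0f;
};

// Stand-in for puppable_obj_a: no stored reference; *this is cast on each call.
template<class T>
class puppable : public base_obj {
protected:
    void pack_unpack(pupper * p, const var_info & info) override {
        pup(p, static_cast<T &>(*this), info);
    }
};

void pup(pupper * p, base_obj & b, const var_info & info) {
    b.pack_unpack(p, info);  // derived fields first, as in the article
    pup(p, b.mass, var_info(info.name + ".mass"));
}

// Stand-in for derived_obj_a; its pup is declared before puppable<unit> is instantiated.
class unit;
void pup(pupper * p, unit & u, const var_info & info);

class unit : public puppable<unit> {
public:
    float health = 100.0f;
    float max_health = 100.0f;
};

void pup(pupper * p, unit & u, const var_info & info) {
    pup(p, u.health, var_info(info.name + ".health"));
    pup(p, u.max_health, var_info(info.name + ".max_health"));
}

// Hypothetical pupper that records each name and value it is handed.
struct record_pupper : public pupper {
    record_pupper(): pupper(PUP_OUT) {}
    void pup(float & val_, const var_info & info_) override {
        names.push_back(info_.name);
        values.push_back(val_);
    }
    std::vector<std::string> names;
    std::vector<float> values;
};
```

Pupping a copy through a base_obj pointer reports the copy's own values, not the original's.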

```cpp
#include <fstream>
#include <iostream>
#include <string>

void read_data_from_file(obj_a_container * a_cont, const std::string & fname, int save_mode)
{
    std::fstream file;
    pupper * p = nullptr;
    std::ios_base::openmode flags = std::fstream::in;

    if (save_mode) // if save mode is one then read in binary mode
    {
        flags |= std::fstream::binary;
        p = new binary_file_pupper(file, PUP_IN);
    }
    else
    {
        p = new text_file_pupper(file, PUP_IN);
    }

    file.open(fname, flags);
    if (!file.is_open())
    {
        std::cout << "Error opening file " << fname << std::endl;
        delete p;
        return;
    }

    a_cont->release();
    pup(p, *a_cont, var_info(""));
    delete p; // the pupper is no longer needed
    std::cout << "Finished reading data from file " << fname << std::endl;
}

void write_data_to_file(obj_a_container * a_cont, const std::string & fname, int save_mode)
{
    std::fstream file;
    pupper * p = nullptr;
    std::ios_base::openmode flags = std::fstream::out;

    if (save_mode) // if save mode is one then write in binary mode
    {
        flags |= std::fstream::binary;
        p = new binary_file_pupper(file, PUP_OUT);
    }
    else
    {
        p = new text_file_pupper(file, PUP_OUT);
    }

    file.open(fname, flags);
    if (!file.is_open())
    {
        std::cout << "Error opening file " << fname << std::endl;
        delete p;
        return;
    }

    pup(p, *a_cont, var_info(""));
    delete p; // the pupper is no longer needed
    std::cout << "Finished writing data to file " << fname << std::endl;
}
```

Saving all of the obj_a's within the obj_a_container is as simple as creating a pupper object and calling pup passing in the pupper object, the object to be serialized, and a var_info describing the object. The pupper object can be as simple or as complicated as it needs to be. The binary pupper, for example, could make sure (and probably should) that endianness is consistent for multi-byte chunks. It could also hold a buffer and only write to file once the buffer was full. Whatever - do what you gotta do. There is definitely some flexibility though.
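As an example of the endianness handling mentioned above, here is a sketch of helpers that a binary pupper could call for 32-bit values. pup_u32_le and pup_f32_le are hypothetical names. Rather than copying the host's in-memory bytes as pup_bytes does, they always store the least significant byte first by shifting. The float version additionally assumes a 32-bit IEEE-754 float and copies its bits with std::memcpy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#define PUP_OUT 1
#define PUP_IN 2

// Hypothetical helper: little-endian uint32_t regardless of host byte order.
void pup_u32_le(int io, std::vector<unsigned char> & buf, std::size_t & pos, uint32_t & val) {
    if (io == PUP_OUT) {
        for (int i = 0; i < 4; ++i)
            buf.push_back(static_cast<unsigned char>((val >> (8 * i)) & 0xFFu));
    } else {
        val = 0;
        for (int i = 0; i < 4; ++i)
            val |= static_cast<uint32_t>(buf[pos + i]) << (8 * i);
        pos += 4;
    }
}

// Hypothetical helper: float stored via its 32-bit pattern (assumes IEEE-754).
void pup_f32_le(int io, std::vector<unsigned char> & buf, std::size_t & pos, float & val) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
    uint32_t bits = 0;
    std::memcpy(&bits, &val, sizeof bits);
    pup_u32_le(io, buf, pos, bits);
    if (io == PUP_IN)
        std::memcpy(&val, &bits, sizeof bits);
}
```

A binary pupper built on helpers like these produces the same bytes on little- and big-endian machines.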

Conclusion

Serialization can be daunting, but if it is broken down into small enough pieces, it's easy to handle. The main benefit of the method presented in this article is that serialization code only needs to be written once, and different puppers can be created as needed. Doing this with global functions is non-intrusive and easy to understand. Ousting templates (mostly) makes for easier error messages when a pup function is forgotten. Lastly, the code presented here has virtually no dependencies and can be extended with minimal or no changes to existing code.

I hope this article has provided something useful. I look forward to hearing your feedback and thoughts. All source code is attached along with a CMakeLists.txt for easy compilation. CMake should make the compilation process painless. Even without CMake, no external libraries are needed, so just add the files to a project and it should be good to go. Feel free to use, copy/paste, or redistribute any of the source code in any way desired. Here is the git repo; the zip is also attached.

Credits

I did not invent this serialization method; it was introduced to me by my simulations class professor, Orion Lawlor, at the University of Alaska Fairbanks. He showed it to me after watching me give a presentation where I had painfully hand-created dialog fields to edit the attributes of a particle system. There were a lot of fields, and after class he showed me some code examples he'd written using the method and suggested I use it instead of manually making such dialogs. I followed his advice and tweaked the method for my own use, and it has been great! I'm hoping to get a link for the original code samples he gave me soon.


Comments

Oberon_Command
I like where this is going. I use similar serializations in my own code and some code I've worked with professionally uses a (vastly more complicated) scheme to do something similar. I've also worked with multiple systems that do this under the hood, but wrap it in a declarative syntax to make specifying "PUP" structures less error-prone.

If I had one critique I would say that your code snippets could use some cleanup as the spacing is a little inconsistent and some of the more "pseudo-code-y" bits aren't in code tags.
August 26, 2016 11:38 PM
EarthBanana


Yeah you really can make it arbitrarily complicated depending on what's needed - for example I used this method to write network code for integrated systems that ended up being much more complicated than what is shown here.

I am trying to edit the code to clear up the spacing issue but I'm having troubles - it shows correctly in edit mode so I think some invisible formatting stuff has carried over from copy pasting - I might end up just going through line by line and have the auto indent in the gamedev editor do the indenting.

Thanks for the feedback!

August 29, 2016 04:35 PM
Oberon_Command

If all else fails, I sometimes paste the whole thing into Notepad and edit the post in raw BBCode, then post it that way.
September 02, 2016 03:50 AM
efigenio

Thanks a lot for your insights, it has been very useful for developing my projects!!!

November 21, 2016 06:23 PM

A look into a useful pattern for serializing data which separates the serialization process from the resulting data. This pattern is very similar to boost serialization without the boost dependencies and templates. It is also easy to understand and expandable as needed.
