Banshee Engine Architecture - RTTI & Serialization

Published February 07, 2015 by Marko Pintera, posted by BearishSun

Series introduction

This article is part of a larger series describing implementation details of various systems in Banshee Engine. If you are not familiar with Banshee make sure to check out the previous article: Banshee Engine Architecture - Introduction

Introduction

A run-time type information (RTTI) system with serialization support was one of the first systems developed for Banshee. It has wide-ranging consequences for many high-level modules and classes and is hard to do properly as an afterthought. The system had to be general enough to work for most objects in the engine, from resources and configuration files to entire hierarchies of scene objects and components for saving entire levels.

For those not familiar, run-time type information allows you to query information about types during program execution. For example, you may find out exactly which fields (i.e. variables) a class contains, get the class name, determine the actual type of a polymorphic object or even determine a class's base and subtypes. This has many uses, but a really useful one is serialization. With enough information you can serialize entire objects or hierarchies of objects into a stream of bytes without needing to write special serialization code for specific object types. As long as the object has RTTI it can be serialized using the same code, while in the past you might have been forced to write save/serialize/write methods (and their read equivalents) for each type.

C++ comes with RTTI support built into the compiler, however it is a very limited form of support, used primarily for finding out if an object is of a certain polymorphic type and if it can be safely cast to another type. This form of RTTI also applies to every single type whether you need it or not (unless the compiler is smart enough to optimize certain cases out, but exact control is lacking regardless). Higher level languages like C# come with more advanced RTTI support, allowing you to query pretty much every single little detail about classes, fields, methods and most other language constructs.

Banshee needed a lot more control than the C++ RTTI system offers, but without implementing a full reflection-type system like the one present in C#.
When creating it I had this set of requirements in mind:

Ability to determine the exact polymorphic type from an object pointer
Ability to create a new empty instance of an object with just a class name or RTTI identifier
Serialization that handles object references so complex structures may be saved (e.g. entire game levels)
Serialization that supports versioning so that adding new fields or deleting old ones doesn't break previously serialized data (e.g. player has a save file and after you patch the game that save file must still work even if serialized class structure is no longer the same)
Serialization that works automatically with inheritance hierarchies
Support for data transformations during serialization/deserialization. Sometimes what you keep in memory is not the same as the thing you want to serialize (e.g. you might want to compress image data before saving it)
Serialization that natively supports arrays of data
Serialization needs to play well with external references (e.g. references to resources, which are saved separately)
Serialization needs to play well with the scripting engine, as the scripts have a more relaxed serialization scheme which is still built on top of the same system
Ability to write serialized data in various formats (e.g. binary, text, xml, etc.)
Binary serialization had to produce compact results

To give you a taste here are some examples of what the finalized RTTI and serialization system will allow you to do. Serialize an in-memory object to a file and then restore it later, while keeping all its references intact. This will even save complex hierarchies like game levels and external references to resources.

    MyObjectPtr myObject = ...;

    FileSerializer fs;
    fs.encode(myObject, "C:\\myObject.obj");
    std::shared_ptr<IReflectable> myRestoredObject = fs.decode("C:\\myObject.obj");

The MyObject class can even change in between the encode and decode calls without breaking the serialization. If a field was removed, deserialization will throw away the saved data, and if a new field was added it will be initialized to its default value. This means you can safely store your levels/resources/save games and not worry about versioning or converting them if your types change (during development or due to a patch, for example).

You may check if an object is of the proper type using the rtti_is_of_type helper method:

    IReflectable* myObject = ...;
    if(rtti_is_of_type(myObject))
        ...

Check if a class derives from some other using the rtti_is_subclass helper method:

    if(rtti_is_subclass(myObject))
        ...

Create a new instance of an object just from the type name or ID using the rtti_create helper method:

    myObject = rtti_create(TID_Texture);

Helper methods for converting many standard types to/from bytes are also provided. Aside from being useful when writing to disk, you may use this for encoding data to send over the network.

    Map> myMap;
    ... fill myMap with data ...

    UINT32 size = rttiGetElemSize(myMap);
    UINT8* buffer = bs_alloc(size);
    rttiWriteElem(myMap, buffer);
    ...

It is also immensely useful to be able to iterate over all fields of a class. For example, imagine you wanted to iterate over your entire level hierarchy and create a list of all resources used by the level (or just by a specific game object). A very simplified example where we collect resources only on a single component myComponent would look something like this:

    RTTITypeBase* type = myComponent->getRTTI();
    UINT32 numFields = type->getNumFields();

    for(UINT32 i = 0; i < numFields; i++)
    {
        RTTIField* field = type->getField(i);

        if(field->isReflectableType()) // Resource handles are IReflectable value types
        {
            RTTIReflectableFieldBase* reflField = static_cast<RTTIReflectableFieldBase*>(field);

            if(reflField->getType() == ResourceHandle::getRTTIStatic())
            {
                HResource handle = reflField->getValue(myComponent);
                ... // Save the handle to some list of dependencies
            }
        }
    }

Now that you have an idea what it can do, let's see how it does it.

RTTI implementation

Banshee uses a manual approach for defining RTTI data. That is, you must manually specify the fields and classes you wish to have RTTI information for in a separate C++ file. Other engines often use a similar approach, although most seem to prefer using macros in the source class' header itself. This has the advantage of programmers being less likely to forget to add a field to RTTI when a class changes, but it also pollutes the header with RTTI information that I would rather keep external. Aside from that, macros are harder to read and often confuse IDEs, especially when attempting automatic refactoring.

Banshee also requires you to specify additional information along with each field that helps it handle all the complex cases mentioned above, which would further pollute the header. On top of that, data transformations (e.g. compressing a texture before saving) cannot be handled easily with macros. They would require a special case, as well as introducing even more RTTI data in the source class' files. In the end this is a personal preference more than anything else.

The ideal approach to handling RTTI data is programmatic, using code generation. You would have an external tool that preprocesses your files before compilation, parsing C++ code and some optional parameters you specify and generating RTTI C++ files from that information. This isn't an approach I have seen in any current C++-based engine, but I do believe this was due to the lack of a simple way to parse C++ files and generate the necessary information. Recently, with the appearance of libraries like libclang, creating a fairly extensive C++ RTTI system should be fairly doable. However this was out of the scope of what was needed for Banshee.

RTTI definitions in C++ are less of a problem with Banshee than with other fully C++ based solutions, as it is expected you will write most of the high level code using C# scripting, which provides fully automatic serialization (built on top of the same system described here). Additionally, even with an automatic system you would still have to manually specify serialization information in case of data transformations and other special cases. Therefore it was not something I felt was worth the effort, but I thought it was still worth mentioning.

In Banshee, to create a class that supports RTTI you first must ensure it implements the IReflectable interface. It is a minimal interface that requires the class to implement a couple of methods that retrieve an RTTIType object. The returned RTTIType object contains all the needed RTTI data. For a very simple Texture class (with no actual data) the implementation would look like this:

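    class Texture : public IReflectable
    {
        int width, height; // Private, TextureRTTI accesses them as a friend

    public:
        friend class TextureRTTI;

        // RTTI lookup without an object instance
        static RTTITypeBase* getRTTIStatic() { return TextureRTTI::instance(); }

        // RTTI lookup from an object instance
        virtual RTTITypeBase* getRTTI() const { return Texture::getRTTIStatic(); }
    };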

The getRTTI method allows you to retrieve RTTI data from an object instance, and getRTTIStatic is a static method for when you don't have an object instance. TextureRTTI is an RTTIType implementation specific to the Texture class. It holds all the RTTI information, keeping the source class clean of RTTI and serialization-specific code aside from the simple interface implementation. We will cover how to create your own RTTIType in the next section.
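To make the instance/static pairing concrete, here is a minimal standalone sketch of the pattern. It does not use Banshee's headers: TypeInfo, Reflectable, isOfType and isSubclassOf are hypothetical stand-ins invented for this illustration, and the real helpers may work differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for RTTITypeBase: one shared descriptor per class.
struct TypeInfo
{
    std::string name;
    unsigned id;
    const TypeInfo* base; // nullptr for the root of the hierarchy
};

// Hypothetical stand-in for IReflectable.
struct Reflectable
{
    virtual ~Reflectable() = default;
    virtual const TypeInfo* getRTTI() const = 0;
};

struct Resource : Reflectable
{
    static const TypeInfo* getRTTIStatic()
    {
        static const TypeInfo info{"Resource", 1, nullptr};
        return &info;
    }
    const TypeInfo* getRTTI() const override { return getRTTIStatic(); }
};

struct Texture : Resource
{
    static const TypeInfo* getRTTIStatic()
    {
        static const TypeInfo info{"Texture", 2, Resource::getRTTIStatic()};
        return &info;
    }
    const TypeInfo* getRTTI() const override { return getRTTIStatic(); }
};

// True only when the object's most derived type is exactly T.
template <class T>
bool isOfType(const Reflectable* obj)
{
    return obj->getRTTI() == T::getRTTIStatic();
}

// True when the object's type is T or derives from T.
template <class T>
bool isSubclassOf(const Reflectable* obj)
{
    for (const TypeInfo* t = obj->getRTTI(); t != nullptr; t = t->base)
        if (t == T::getRTTIStatic())
            return true;
    return false;
}
```

Because the virtual getRTTI call always reaches the most derived override, a base class pointer is enough to recover the exact type, and walking the base chain answers the "derives from" question.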

RTTIType

RTTIType allows you to provide various information about the source type, including its name, place in inheritance hierarchy and field definitions, along with optional logic to trigger during serialization/deserialization. TextureRTTI class mentioned above might look something like the following.

    class TextureRTTI : public RTTIType<Texture, IReflectable, TextureRTTI>
    {
        // Getter/setter pairs, one per serialized member
        int& getWidth(Texture* obj) { return obj->width; }
        void setWidth(Texture* obj, int& value) { obj->width = value; }

        int& getHeight(Texture* obj) { return obj->height; }
        void setHeight(Texture* obj, int& value) { obj->height = value; }

    public:
        TextureRTTI()
        {
            // Field name, unique field ID, getter, setter
            addPlainField("width", 0, &TextureRTTI::getWidth, &TextureRTTI::setWidth);
            addPlainField("height", 1, &TextureRTTI::getHeight, &TextureRTTI::setHeight);
        }

        const String& getRTTIName()
        {
            static String name = "Texture";
            return name;
        }

        UINT32 getRTTIId() { return TID_Texture; }

        // Creates a new empty instance of the source type
        std::shared_ptr<IReflectable> newRTTIObject() { return bs_shared_ptr<Texture>(); }
    };

RTTIType is a template whose parameters tell the system which source class the RTTI information is provided for, along with its base class: RTTIType<Texture, IReflectable, TextureRTTI>. The first parameter is the source type, the second is the base class of the source type (usually IReflectable) and the last is the RTTI type itself. The RTTI type itself is not strictly needed for RTTI purposes but is there to allow generation of some repetitive code you would otherwise need to write yourself.

All RTTIType implementations get registered with the runtime automatically when the application is started or when the dynamic library containing the type is loaded. This means all you need to do is implement the interface and it will be usable on the next compile.

In the class itself you will find field definitions for the source class members you wish to include in RTTI. In this case we have width and height members with their getter/setter methods. Those methods need to follow a certain format, and once declared they need to be passed to one of the add*Field methods to register them with the type. We will talk about field definitions in more detail later.

Following the field definitions are methods for retrieving the source class' unique name and ID. The class name can usually be the same as the C++ class name and the ID must be a unique integer, in this case provided in the form of an enum. It is usually good to ensure that all IDs in a library start far away from other libraries' IDs in order to avoid conflicts, although the system will warn you if you accidentally use the same ID for multiple types.

Finally, the newRTTIObject() method is used for creating a new empty object of the source type. This means that you will normally want to have a parameterless constructor for classes that support RTTI. You may make the constructor private, as making the RTTI type class a friend of the source type class is usually a good idea anyway so you can more easily access its members. If the source class is an abstract class that should never be constructed this method may return null. The returned value is always wrapped in a shared pointer to ensure proper cleanup. With complex hierarchies the serializers will often need to construct dozens or hundreds of objects, and leaving the deallocation as a worry to the user is not practical. Those that find the use of shared pointers too heavyweight (which should only be extreme cases) can use the simpler serialization technique described later.

This concludes the example. The RTTI type shown above is fully functional - more complex types will require additional fields and different field types, along with possibly some data transformation and additional initialization logic, but the basic concept remains as shown here. The following chapter will cover the creation of field information, which we just skipped over in this section.
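Automatic registration, duplicate ID warnings and creating an empty object from an ID can be sketched in a few lines of standalone C++. This is only an illustration of the general technique, not Banshee's implementation: registry, registerType, AutoRegister, createById and the value chosen for TID_Texture are all assumptions made for this example.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for IReflectable.
struct Reflectable { virtual ~Reflectable() = default; };

using Factory = std::function<std::shared_ptr<Reflectable>()>;

// A function-local static avoids static initialization order problems.
inline std::map<unsigned, Factory>& registry()
{
    static std::map<unsigned, Factory> instance;
    return instance;
}

// Returns false (and prints a warning) if the ID is already taken.
inline bool registerType(unsigned id, Factory factory)
{
    bool inserted = registry().emplace(id, std::move(factory)).second;
    if (!inserted)
        std::cerr << "Warning: duplicate RTTI type ID " << id << "\n";
    return inserted;
}

// Defining one of these at namespace scope registers T before main() runs.
template <class T>
struct AutoRegister
{
    explicit AutoRegister(unsigned id)
    {
        registerType(id, [] { return std::make_shared<T>(); });
    }
};

// Creates an empty object from its ID, or returns null for unknown IDs.
inline std::shared_ptr<Reflectable> createById(unsigned id)
{
    auto it = registry().find(id);
    if (it == registry().end())
        return nullptr;
    return it->second();
}

enum : unsigned { TID_Texture = 1001 }; // illustrative value only

struct Texture : Reflectable { int width = 0, height = 0; };
static AutoRegister<Texture> gRegisterTexture(TID_Texture);
```

A deserializer built on something like this only needs the type ID stored in the data to construct the right object, which is why a parameterless constructor matters.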

RTTI fields

In the previous example you saw the use of addPlainField method for registering a field with the RTTI system. This is just one of three available field types:

Plain fields - used when you just want to (or need to) use a memory copy for serializing. This is used for all primitive types like int, float, bool and similar, but may also be used for complex types in case you don't want any advanced serialization. We'll touch on how to define serialization for such types later.
Reflectable fields - used when referencing another object that implements IReflectable. Object will be serialized inline by value (if multiple objects reference it, each will have its own copy).
Reflectable pointer fields - used when referencing another object that implements IReflectable via a pointer. If multiple objects share pointers to the same IReflectable that connection will be preserved when serializing and restored when deserializing. This allows you to serialize complex hierarchies.

Each method for registering fields follows the same basic format, so let's use addPlainField as an example; we will visit the specific methods later.

    void addPlainField(String name, UINT32 uniqueId, CallbackGet getter, CallbackSet setter, UINT64 flags);

The first parameter is the field name, normally corresponding to the name of the member the field describes. It doesn't have to be unique.

The second parameter is a unique ID for this field. Each field must have its own unique ID and you will be warned at runtime in case that is not true. Field IDs allow the versioning system to work - they let you serialize a certain set of data, modify the class, and still be able to deserialize and read the original fields. Each newly added field should have a unique ID, and removed field IDs should never be reused. Additionally, if you change the data type of a certain field you should update the ID (think of it as adding a brand new field and removing the old one). Both the name and ID are used with all field types (addReflectableField, addReflectablePtrField, etc.) so I will not be mentioning them again when we cover those fields.

The next two parameters are function callbacks that actually retrieve and assign data from/to the member variable the field represents. Those callbacks need to follow a certain signature which differs based on the field type they're used with, so I will describe them in their own separate sections.

The final parameter is an optional set of flags that you can use for custom data. It is not used by the RTTI system directly, but rather may provide additional information to the serializer or other systems using RTTI.

All field types except for managed data fields also come in array variants. Array variants are similar to normal fields but also allow you to specify an index in the getter/setter, plus provide a getter/setter for the array size.

 void addPlainArrayField(String name, UINT32 uniqueId, CallbackGet getter, CallbackGetSize getSize, CallbackSet setter, CallbackSetSize setSize, UINT64 flags);

All array field types accept common setSize and getSize callbacks which follow this format:

    UINT32 getSize(SourceType* obj);
    void setSize(SourceType* obj, UINT32 size);

Where SourceType is the class of the object we are creating RTTI for (e.g. Texture), and size is the size of the array.

Plain fields

Plain fields contain data types that don't implement IReflectable. This includes primitives like int, bool or float and complex types that you either cannot modify, like Vector or String, or whose structure you are sure will not change during development. Data in plain fields gets serialized using memcpy, which means if you are using plain fields for complex types like classes you lose the versioning feature of the RTTI system. That is, if you modify that class later your saved data will most likely be broken. Therefore you should only use it for types that you are sure will be static, like the ones mentioned above, or when the data doesn't have to persist for a long period of time (e.g. network transfer).

Plain field getter and setter callbacks must follow this format:

    DataType& getter(SourceType* obj);
    void setter(SourceType* obj, DataType& value);

And for arrays:

    DataType& getter(SourceType* obj, UINT32 idx);
    void setter(SourceType* obj, UINT32 idx, DataType& value);

The rest of the parameters follow the same format as described in the previous section. DataType above must specialize RTTIPlainType, which tells the RTTI system how the internal data is to be serialized. The default version uses a memcpy for the entire type, but you might need something more complex (e.g. when serializing std::string). I will talk more about RTTIPlainType in a bit.

Reflectable fields

These fields contain types that implement the IReflectable interface. The objects are stored by value, which means each time the field is serialized and deserialized a brand new copy is made. This is in contrast with the IReflectable pointer fields described in the next section. You do not need to do anything special with reflectable fields as long as the type properly implements the IReflectable interface as described earlier.

You may use the addReflectableField and addReflectableArrayField methods in RTTIType to register a new reflectable field. Getter and setter callbacks for those methods follow this format:

    DataType& getter(SourceType* obj);
    void setter(SourceType* obj, DataType& value);

And for arrays:

    DataType& getter(SourceType* obj, UINT32 idx);
    void setter(SourceType* obj, UINT32 idx, DataType& value);

Essentially these are the same parameters as with plain fields, except that DataType must implement IReflectable, otherwise you will get a compiler error.

Reflectable pointer fields

These fields also contain types that implement the IReflectable interface. However, unlike the previous field type, the reference is held not by value but by pointer. When such an object is serialized and deserialized it will remain a single object, and the pointer references will be properly saved and restored in all objects that reference it. This allows you to serialize complex hierarchies or webs of objects while ensuring all the connections remain intact.

You may use the addReflectablePtrField and addReflectablePtrArrayField methods in RTTIType to register a new reflectable pointer field. Getter and setter callbacks for those methods follow this format:

    std::shared_ptr<DataType> getter(SourceType* obj);
    void setter(SourceType* obj, std::shared_ptr<DataType> value);

And for arrays:

    SPtr<DataType> getter(SourceType* obj, UINT32 idx);
    void setter(SourceType* obj, UINT32 idx, SPtr<DataType> value);

Where DataType must implement the IReflectable interface and SPtr is just a Banshee shorthand for a shared pointer. This is the final field type, but before continuing I want to focus on the plain field type specialization I mentioned earlier.

Plain type specializations

Plain type specializations allow you to control how a plain data type is serialized. This is used for types that cannot implement the IReflectable interface (e.g. primitive types or types from the standard library) or for types you know will not change. Plain type serialization can also offer you more complete control over the serialization of your object and is more lightweight than using the IReflectable interface. The downside is that all the advanced features like field versioning or pointer saving/restoring offered by IReflectable will not be available.

Plain types are implemented by specializing the RTTIPlainType template. This template allows you to perform serialization in a more traditional way by using memory copies. Banshee comes with RTTIPlainType specializations for all basic types and many standard library containers, but you may also create your own specializations. See below for a very basic specialization of RTTIPlainType for the String type.

    template<> struct RTTIPlainType<String>
    {
        enum { id = 20 }; // Unique plain type ID
        enum { hasDynamicSize = 1 }; // Size differs per instance

        // Writes a 4-byte total size followed by the raw characters
        static void toMemory(const String& data, char* memory)
        {
            UINT32 size = getDynamicSize(data);

            memcpy(memory, &size, sizeof(UINT32));
            memory += sizeof(UINT32);
            size -= sizeof(UINT32);
            memcpy(memory, data.data(), size);
        }

        // Reads the size header, then the characters. Returns total bytes read.
        static UINT32 fromMemory(String& data, char* memory)
        {
            UINT32 size;
            memcpy(&size, memory, sizeof(UINT32));
            memory += sizeof(UINT32);

            UINT32 stringSize = size - sizeof(UINT32);
            char* buffer = (char*)bs_alloc(stringSize + 1);
            memcpy(buffer, memory, stringSize);
            buffer[stringSize] = '\0';
            data = String(buffer);

            bs_free(buffer);

            return size;
        }

        // Character data plus the 4-byte size header
        static UINT32 getDynamicSize(const String& data)
        {
            UINT64 dataSize = data.size() * sizeof(String::value_type) + sizeof(UINT32);
            return (UINT32)dataSize;
        }
    };

Each specialization must provide a unique type ID, similar to IReflectable. Optionally you can control whether the type's size can be calculated via sizeof or is dynamic per instance via the hasDynamicSize property. Types without dynamic size take up less space in serialized form as their size doesn't need to be written in a separate block, while types with dynamic size need to implement getDynamicSize, which calculates the number of bytes taken up by that specific type instance. For example, the String type has dynamic size because each instance can have a different size, while float has static size.

The provided methods should be self-explanatory: toMemory writes the object instance into a stream of bytes, fromMemory restores the object from a stream of bytes and getDynamicSize returns the number of bytes the object will require in the memory buffer when serialized. Once an object has an RTTIPlainType specialization implemented you can use it in plain fields as described earlier.

In case your object is really simple and you want it to be entirely serialized by a single memcpy (e.g. Vector3, Matrix4) you can use the shorthand macro:

    BS_ALLOW_MEMCPY_SERIALIZATION(DataType);

This will generate the default RTTIPlainType specialization. Aside from allowing you to use data types for plain fields in RTTIType, this type of specialization also allows you to use a set of helper methods:

    char* rttiWriteElem(const DataType& object, char* buffer);
    char* rttiReadElem(DataType& object, char* buffer);
    UINT32 rttiGetElemSize(const DataType& object);

These methods can be used by other more complex RTTIPlainType specializations. For example, if you were to specialize Vector data type in its toMemory method it could use rttiWriteElem to easily write any child objects in the container to the memory buffer. This type of serialization is very fast and as lightweight as you want it to be (depends on your RTTIPlainType specialization) which makes it perfect for performance-intensive scenarios. It is also very useful for serializing data types for network transfers as these transfers generally do not require any advanced features provided by IReflectable. As Banshee already comes with many of these specializations built in, you can serialize fairly complex containers with no problem:

    Map> myMap;
    ... fill myMap with data ...

    UINT32 size = rttiGetElemSize(myMap);
    UINT8* buffer = bs_alloc(size);
    rttiWriteElem(myMap, buffer);
    ...
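The way container specializations build on the element helpers can be shown with a small standalone sketch. It mirrors the idea only: PlainType, elemSize, writeElem and readElem are hypothetical names chosen for this example, and the byte layout (a 4-byte length or count prefix) is an assumption rather than Banshee's exact format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Default case: fixed-size types are copied byte for byte.
template <class T>
struct PlainType
{
    static_assert(std::is_trivially_copyable<T>::value, "specialize PlainType for this type");
    static std::uint32_t size(const T&) { return sizeof(T); }
    static char* write(const T& v, char* out) { std::memcpy(out, &v, sizeof(T)); return out + sizeof(T); }
    static const char* read(T& v, const char* in) { std::memcpy(&v, in, sizeof(T)); return in + sizeof(T); }
};

// Free helpers that forward to whichever specialization matches.
template <class T> std::uint32_t elemSize(const T& v) { return PlainType<T>::size(v); }
template <class T> char* writeElem(const T& v, char* out) { return PlainType<T>::write(v, out); }
template <class T> const char* readElem(T& v, const char* in) { return PlainType<T>::read(v, in); }

// Strings: 4-byte length followed by the characters.
template <>
struct PlainType<std::string>
{
    static std::uint32_t size(const std::string& s)
    {
        return sizeof(std::uint32_t) + static_cast<std::uint32_t>(s.size());
    }
    static char* write(const std::string& s, char* out)
    {
        out = writeElem(static_cast<std::uint32_t>(s.size()), out);
        std::memcpy(out, s.data(), s.size());
        return out + s.size();
    }
    static const char* read(std::string& s, const char* in)
    {
        std::uint32_t len = 0;
        in = readElem(len, in);
        s.assign(in, len);
        return in + len;
    }
};

// Vectors: 4-byte element count, then each element via the helpers.
template <class T>
struct PlainType<std::vector<T>>
{
    static std::uint32_t size(const std::vector<T>& v)
    {
        std::uint32_t total = sizeof(std::uint32_t);
        for (const T& e : v) total += elemSize(e);
        return total;
    }
    static char* write(const std::vector<T>& v, char* out)
    {
        out = writeElem(static_cast<std::uint32_t>(v.size()), out);
        for (const T& e : v) out = writeElem(e, out);
        return out;
    }
    static const char* read(std::vector<T>& v, const char* in)
    {
        std::uint32_t count = 0;
        in = readElem(count, in);
        v.resize(count);
        for (T& e : v) in = readElem(e, in);
        return in;
    }
};
```

Because the vector specialization only talks to its elements through the helpers, nested containers such as a vector of strings work without any extra code.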

Binary serializer implementation

Now that RTTI type is defined we will focus on a serializer class that uses that RTTI information for saving and loading objects. Banshee currently only has a binary serializer but different types of serializer (text, xml, json) can be relatively easily implemented following the same example as the binary one. When using binary serialization you have an option to output data to memory using MemorySerializer or to a file using FileSerializer. Both of those are just very simple wrappers around the BinarySerializer class. As an example, if you wanted to save/load a class that implements the necessary RTTI interfaces to/from a file you would do:

    FileSerializer fs;
    fs.encode(myObject, "C:\\myObject.obj");
    std::shared_ptr<IReflectable> myRestoredObject = fs.decode("C:\\myObject.obj");

This will save a previously loaded object into a binary format, and then immediately load that same object. This can be used for saving entire levels, user save games, cloning objects, undo/redo functionality and other uses, and all you need to write is those few lines.

Encoding

Encoding involves parsing an existing object instance and encoding its data to a stream of bytes which can later be output to a memory buffer or a file. It is performed by calling encode method on BinarySerializer. Its signature looks like this:

 void BinarySerializer::encode(IReflectable* object, UINT8* buffer, UINT32 bufferLength, int* bytesWritten, std::function flushBufferCallback);

FileSerializer and MemorySerializer wrap the complexities of BinarySerializer so you usually don't need to worry about most of these parameters but in this section we will explain them. First off you have an object you wish to serialize, followed by a pointer to a block of memory where the encoded object data will be written to and a size of that block (buffer and bufferLength). bytesWritten is an output parameter that will hold the number of bytes used to encode the entire object once the process is complete. Finally you have flushBufferCallback that will be called whenever buffer gets full. In that callback you usually want to save the contents of the buffer and return a new pointer to a free block of memory, or terminate the encoding. Normally when this is called you would write the buffer data to a file or a larger block of memory and then return the start of the buffer to be re-used for further encoding. The process of encoding involves these steps:

The top level object starts getting encoded
All RTTI types of an object are retrieved and iterated over (there will be only one if object is not polymorphic)
For each RTTI type we retrieve all fields
Plain and IReflectable fields are serialized directly into the output buffer
- Plain fields are encoded by directly accessing their RTTIPlainType implementation
- IReflectable fields are encoded by accessing their RTTI types and fields by calling encode recursively
IReflectable pointer fields are marked for later and given a unique ID. The ID is encoded in the output buffer.
After we loop through the top level object and all of its IReflectable children we start serializing IReflectable objects that were referenced by pointers. Their serialization proceeds by recursively calling encode essentially repeating the whole procedure.
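The pointer handling in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the BinarySerializer itself: Node, EncodedNode and encodeGraph are hypothetical, the "output buffer" is a vector of structs instead of bytes, and ID 0 is used here to mean a null pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object: one plain value plus pointer references to others.
struct Node
{
    int value = 0;
    std::vector<std::shared_ptr<Node>> children;
};

// What ends up in the output for each object: its own ID, its plain data
// and, in place of each pointer, the ID of the object pointed to.
struct EncodedNode
{
    std::uint32_t id;
    int value;
    std::vector<std::uint32_t> childIds;
};

std::vector<EncodedNode> encodeGraph(const std::shared_ptr<Node>& root)
{
    std::map<const Node*, std::uint32_t> ids; // objects already given an ID
    std::deque<const Node*> pending;          // objects marked for later encoding
    std::vector<EncodedNode> out;
    std::uint32_t nextId = 1;                 // 0 is reserved for null here

    // Returns the ID for an object, scheduling it for encoding on first sight.
    auto idFor = [&](const Node* n) -> std::uint32_t {
        if (n == nullptr) return 0;
        auto it = ids.find(n);
        if (it != ids.end()) return it->second;
        ids[n] = nextId;
        pending.push_back(n);
        return nextId++;
    };

    if (!root) return out;
    idFor(root.get());

    while (!pending.empty())
    {
        const Node* n = pending.front();
        pending.pop_front();

        EncodedNode enc{ids[n], n->value, {}};
        for (const auto& child : n->children)
            enc.childIds.push_back(idFor(child.get())); // only the ID is written inline
        out.push_back(std::move(enc));
    }
    return out;
}
```

An object referenced from several places is written once and every reference stores the same ID, which is what lets the decoder restore shared pointers (and even cycles) as a final step.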

Whenever serialization of a particular sub-type starts and ends we call RTTIType::onSerializationStarted and RTTIType::onSerializationEnded. You may override those in your RTTIType implementations in case you need to prepare some data on serialization start, or perform cleanup once it ends. Objects that implement IReflectable by default also have an mRTTIData field of type Any. As the name implies you can use that field to store any kind of data. Usually it is used for temporary data created in RTTIType::onSerializationStarted and freed in RTTIType::onSerializationEnded. Most RTTI types will not require these methods, but they can prove useful with complex types that require special handling.

Decoding

The decoding process involves reading the binary data and creating object instances from that data. It is performed by calling the decode method on the BinarySerializer:

    std::shared_ptr<IReflectable> BinarySerializer::decode(UINT8* data, UINT32 dataLength);

The decoding algorithm will iterate through the provided buffer, detecting objects, their fields and their data. Each time a new field or a new object is reached its type is looked up in the RTTI system. If the field or type cannot be found (we could have removed it since) it is skipped. Otherwise a new instance of that type is created, either by using the IReflectable interface or the RTTIPlainType template. After all objects are decoded the pointer references are restored as a final step. The method returns a pointer to the top level object that was decoded, or null if no object was decoded.

Similar to encoding, whenever deserialization of a particular sub-type starts and ends we call RTTIType::onDeserializationStarted and RTTIType::onDeserializationEnded. You can use those methods together with IReflectable::mRTTIData for pre- and post-processing operations. You will get better insight into how decoding works by taking a look at the binary structure of an encoded object.
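The skipping behaviour that makes versioning work can be illustrated with a deliberately simplified record format. Everything here is an assumption made for the sketch: each record is a 2-byte field ID, a 1-byte size and the payload, and writeField, int32Setter and decode are hypothetical helpers, not part of Banshee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <vector>

// Appends one record: 2-byte field ID, 1-byte payload size, payload bytes.
void writeField(std::vector<std::uint8_t>& out, std::uint16_t id, const void* data, std::uint8_t size)
{
    out.push_back(static_cast<std::uint8_t>(id & 0xFF));
    out.push_back(static_cast<std::uint8_t>(id >> 8));
    out.push_back(size);
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
    out.insert(out.end(), bytes, bytes + size);
}

struct Texture { std::int32_t width = 0; std::int32_t height = 0; std::int32_t depth = 1; };

using Setter = std::function<void(Texture&, const std::uint8_t*, std::uint8_t)>;

// Builds a setter that copies a 4-byte payload into the given member.
Setter int32Setter(std::int32_t Texture::*member)
{
    return [member](Texture& t, const std::uint8_t* data, std::uint8_t size) {
        if (size == sizeof(std::int32_t))
            std::memcpy(&(t.*member), data, sizeof(std::int32_t));
    };
}

// Applies every record whose ID is registered and skips the rest.
// Returns the number of fields applied.
int decode(Texture& obj, const std::vector<std::uint8_t>& in, const std::map<std::uint16_t, Setter>& fields)
{
    int applied = 0;
    std::size_t pos = 0;
    while (pos + 3 <= in.size())
    {
        std::uint16_t id = static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
        std::uint8_t size = in[pos + 2];
        pos += 3;
        if (pos + size > in.size()) break; // truncated record, stop reading

        auto it = fields.find(id);
        if (it != fields.end())
        {
            it->second(obj, in.data() + pos, size);
            ++applied;
        }
        pos += size; // unknown IDs are stepped over using the stored size
    }
    return applied;
}
```

Fields that were saved but are no longer registered are stepped over, and fields that are registered but absent from the data simply keep their default values.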

Binary data structure

The binary structure determines how encoded data is laid out in memory. It tries to be relatively compact while being robust enough to handle all the features provided by the RTTI system, like versioning and reference saving. On the highest level it is laid out as such:

Top level object
- List of one or more types (More than one in case object derived from another RTTI type)
  - Type meta-data
  - Type fields
    - Field meta-data
    - Field data
Zero or more objects referenced by IReflectable pointers
- Same structure as top level object

In that list there are three basic components to be aware of: type meta-data, field meta-data and field data, so let's continue by describing each.

Type meta-data

This is an 8-byte structure that describes the type that is to follow. It is also the first block you will encounter for each serialized object. This means it is the first block in the file for the top level object, and there will be at least one for every referenced IReflectable object. It corresponds to a single RTTIType implementation. A single object can have multiple types if it derives from a type that also has RTTI of its own, but in a lot of cases it will be just one. Types are laid out starting with the most specialized, followed by more general ones.

In its 8 bytes it encodes the RTTI type ID, a unique instance ID for the object and a couple of flags. Exact encoding looks like this (each character is a bit):

    SSSS SSSS SSSS SSSS xxxx xxxx xxxx xxBO IIII IIII IIII IIII IIII IIII IIII IIII

S - Unique instance identifier for an object instance. This is used when resolving references to objects (IReflectable pointers).
I - Unique RTTI type identifier that was specified when implementing an RTTIType.
B - 0 if the current RTTI type is the actual polymorphic type of the object we're encoding and 1 if it is a base class. Used internally to signify if we have reached a brand new object or are just parsing a base class of the current object.
O - Type of meta-data. 0 for field meta-data and 1 for type meta-data. Lets us know if we have reached a new field or a new object when parsing.
x - Unused

Each type meta-data block is followed by a list of fields, if it has any. Fields consist of field meta-data and field data, described below.

Field meta-data

A 4-byte structure that describes a field that is to follow. It contains the unique field ID we provided to one of the add*Field methods when implementing RTTIType, the field type (plain, IReflectable, IReflectable pointer), the size of the data to follow and a few other flags. Exact encoding looks like this:

    IIII IIII IIII IIII SSSS SSSS xxYP xCAO

I - Unique field identifier we specified when registering the field with RTTIType. This is used for versioning - if fields are added or removed after an object was encoded we can detect it using unique field IDs. Then we can skip encoded fields that no longer exist in the class, and leave untouched the class fields that do not exist in the encoded data.
S - Size of the field data. This is used for plain types smaller than 255 bytes. If you expect the size to be larger than 255 then you must set the dynamic size flag in your RTTIPlainType implementation. This is used as an optimization to avoid the 4-byte size overhead for really small types.
Y - Specified for plain fields that do not fit in 255 bytes. Signifies that an additional 4 bytes will be allocated at the start of field data.
A - Signifies that a field contains an array of values. In that case this meta-data structure will immediately be followed by a 4-byte integer containing the number of array elements. Each array element is encoded as a separate field data entry.
C - Field contains an IReflectable value type.
P - Field contains an IReflectable pointer type.
O - Type of meta-data. 0 for field meta-data and 1 for type meta-data.
x - Unused

Field meta-data is followed either by a single field data block (in case of non-array fields) or by a 4-byte array size, which is then followed by a list of field data blocks (for array fields).

Field data

This block contains the actual field data and its layout differs depending on the parent field type:

For plain types this is just raw data. If field has dynamic size first 4 bytes specify the size.
For IReflectable value types it is the entire IReflectable object encoded recursively.
For IReflectable pointer types it will be just an identifier signifying which object we are pointing to. The actual pointed to object will be encoded after the top level object as mentioned earlier.
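
As a rough illustration of the plain-type case in the list above, the following sketch reads one plain field's data from a byte buffer. It is not Banshee's code: readPlainFieldData and the Bytes alias are hypothetical names, and it assumes the 4-byte size prefix is stored in native byte order and counts only the payload bytes that follow it, neither of which is stated in this article.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>; // hypothetical alias for this sketch

// Reads the field data of a plain field starting at `offset`.
// metaSize is the 8-bit S value from field meta-data; hasDynamicSize is the Y flag.
// Assumption: the 4-byte prefix uses native byte order and counts only the payload.
std::optional<Bytes> readPlainFieldData(const Bytes& buffer, std::size_t& offset,
                                        std::uint8_t metaSize, bool hasDynamicSize) {
    std::size_t cursor = offset;
    std::size_t size = metaSize;

    if (hasDynamicSize) {
        std::uint32_t prefix = 0;
        if (cursor > buffer.size() || buffer.size() - cursor < sizeof(prefix))
            return std::nullopt; // truncated size prefix
        std::memcpy(&prefix, buffer.data() + cursor, sizeof(prefix));
        cursor += sizeof(prefix);
        size = prefix;
    }

    if (cursor > buffer.size() || buffer.size() - cursor < size)
        return std::nullopt; // truncated payload

    const auto first = buffer.begin() + static_cast<std::ptrdiff_t>(cursor);
    Bytes payload(first, first + static_cast<std::ptrdiff_t>(size));
    offset = cursor + size; // advance only after a successful read
    return payload;
}
```

Because the size is known before the payload is interpreted, a reader built this way could also step over a plain field whose ID it does not recognize, which is what makes the per-field versioning described earlier practical.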

Field data is the last of the block types in the binary structure. The entire structure is formed using the three types of blocks described above.
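
To make the bit layouts described above more concrete, here is a minimal sketch of how the two meta-data structures could be packed into and unpacked from 32-bit words. All names (TypeMeta, FieldMeta, packFieldMeta and so on) are hypothetical and not taken from Banshee's source. The sketch assumes each diagram is read from the most significant bit on the left to the least significant bit on the right, and that the 8-byte type meta-data is handled as two 32-bit words, with the RTTI type ID as the second word. Byte order in the actual file is not covered.

```cpp
#include <cassert>
#include <cstdint>

// Flag bits in the low byte of the first 32-bit word (assumed positions).
constexpr std::uint32_t kObjectDescriptor = 0x01; // O: 1 = type meta-data, 0 = field meta-data
constexpr std::uint32_t kBaseClass        = 0x02; // B (type meta-data only)
constexpr std::uint32_t kArray            = 0x02; // A (field meta-data only)
constexpr std::uint32_t kReflectable      = 0x04; // C
constexpr std::uint32_t kReflectablePtr   = 0x10; // P
constexpr std::uint32_t kDynamicSize      = 0x20; // Y
constexpr std::uint32_t kFieldFlags =
    kArray | kReflectable | kReflectablePtr | kDynamicSize;

struct TypeMeta {
    std::uint16_t instanceId; // S bits
    bool isBaseClass;         // B bit
    std::uint32_t typeId;     // I bits (second word)
};

struct FieldMeta {
    std::uint16_t fieldId; // I bits
    std::uint8_t size;     // S bits
    std::uint32_t flags;   // any combination of the field flags above
};

// The low bit tells a reader whether a new type block or another field follows.
bool isTypeMeta(std::uint32_t word) {
    return (word & kObjectDescriptor) != 0;
}

// First word of type meta-data: SSSS SSSS SSSS SSSS xxxx xxxx xxxx xxBO.
std::uint32_t packTypeMetaHeader(const TypeMeta& meta) {
    return (static_cast<std::uint32_t>(meta.instanceId) << 16)
         | (meta.isBaseClass ? kBaseClass : 0u)
         | kObjectDescriptor;
}

TypeMeta unpackTypeMeta(std::uint32_t header, std::uint32_t typeId) {
    assert(isTypeMeta(header)); // caller should have checked the O bit first
    return TypeMeta{static_cast<std::uint16_t>(header >> 16),
                    (header & kBaseClass) != 0, typeId};
}

// Field meta-data: IIII IIII IIII IIII SSSS SSSS xxYP xCAO, with O left at 0.
std::uint32_t packFieldMeta(const FieldMeta& meta) {
    return (static_cast<std::uint32_t>(meta.fieldId) << 16)
         | (static_cast<std::uint32_t>(meta.size) << 8)
         | (meta.flags & kFieldFlags);
}

FieldMeta unpackFieldMeta(std::uint32_t bits) {
    return FieldMeta{static_cast<std::uint16_t>(bits >> 16),
                     static_cast<std::uint8_t>((bits >> 8) & 0xFF),
                     bits & kFieldFlags};
}
```

Under these assumptions, a field with ID 12, size 4 and the array and pointer flags set packs to 0x000C0412, and an object with instance ID 7 that is being written as a base class produces the header word 0x00070003. Since the O bit sits in the same position in both layouts, a parser only needs to peek at one 32-bit word to know which kind of block comes next.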

Source code

If you are interested in the actual source code of the described systems, check out Banshee at https://github.com/BearishSun/BansheeEngine. All the code is contained in the following files:

RTTI

BsIReflectable - Base interface all types with RTTI support need to implement.
BsRTTIType - Base class that contains RTTI data for a specific type
BsRTTIField - Base class used for representing a field in a type
BsRTTIPlainField - Type field implementation for plain types
BsRTTIReflectableField - Type field implementation for complex value types
BsRTTIReflectablePtrField - Type field implementation for pointers to complex types

Serialization

BinarySerializer - Converts an IReflectable into a stream of bytes, and the other way around
FileSerializer - Uses BinarySerializer to serialize/deserialize directly to/from a file
MemorySerializer - Uses BinarySerializer to serialize/deserialize to/from a memory buffer

Conclusion

This concludes the article. It has shown you how to create custom RTTI types, use the RTTI data for various needs and how to easily serialize and deserialize objects that supply RTTI data. One part I haven't touched on is how Banshee handles serialization of script objects, as the scripting system is still in development. However it suffices to say that internally this same system is used, but scripting classes and fields do not require manual definitions of RTTI data. RTTI data is provided automatically by the scripting runtime, while the users may use various attributes to control RTTI information if needed. Compact representation is less of a requirement with the scripting system and focus is instead placed on simplicity. I will touch more upon this in a separate article. Hopefully you found the article informative, and join me next time when I'll talk about the design of multi-threading in Banshee, primarily focusing on multi-threaded rendering.


Comments

snake5

Interesting, though, what puzzles me most - you've created this huge system that still requires a lot of user input and yet have named essentially only one use for it - iterating over all fields of a class. Would it not be more productive to simply create template functions for serialization / dumping / iteration and let the compiler take care of the rest?


template< class T > void VisitAllFields( T& callable )
{
    callable( m_position, "position" );
    callable( m_somethingelse, "something else" );
    ...
}

February 09, 2015 12:44 PM

BearishSun

Primary uses for this system are versioning, reference restoring and serialization, not just iterating over fields :)

I could have implemented it the way you describe and while it would probably change the structure of the system quite a bit, 95% of it would remain essentially unchanged, just referenced in different places. All that it would really change is that I would no longer hold my field data in a container and instead they would be available implicitly, which feels like a trivial difference to me.

Couple of minor downsides to your approach I can think of are:

- More code generated due to templates since I no longer need one class generated per type, but I'd need NxM classes where N is number of types and M is number of operations I need to operate on the data.

- I can no longer perform quick lookup if a field exists, which is used in versioning a lot. Instead I'd need to iterate all over them to find out.

- Data transformations become hard to do as I'd additionally need to specialize "callable" per parameter name, since some require special operations. This means I need to define the parameter name in multiple places without a way to enforce that.

February 10, 2015 08:17 AM

snake5

Oh, versioning. Didn't quite notice that. Sure, it's a useful feature.

Doesn't seem to take much effort though to be done with templates. Simply serializing an extra byte for the version number generally does the trick.

"More code generated due to templates since I no longer need one class generated per type, but I'd need NxM classes where N is number of types and M is number of operations I need to operate on the data."

It sounds a bit scary until it becomes apparent that N<100, M<5 and the average optimized function size is <200 bytes. Probably less than 1% of game code (unlike array templates that are almost omnipresent and take a lot more space).

"I can no longer perform quick lookup if a field exists, which is used in versioning a lot. Instead I'd need to iterate all over them to find out."

That's true, however you can index your fields once, at application startup.

"Data transformations become hard to do as I'd additionally need to specialize "callable" per parameter name, since some require special operations. This means I need to define the parameter name in multiple places without a way to enforce that."

You can pass additional info (like flags) to callable, same as with your addPlainField function. There's no difference here between the systems. Besides, templates allow operator overloading so you could call the callable with more arguments than usually, or with enum arguments to select the necessary overload.

P.S. Serialization essentially is the same thing as iterating over fields, just with a different callable passed. Minor differences exist in implementations but the general idea is the same.

February 10, 2015 01:49 PM

BearishSun

The system does versioning on a per-field basis, not with a single version number, so serialization ends up being a bit more complex than just iterating over the fields. Additionally the system saves/restores references and handles arrays which are all an additional layer of complexity.

These are all the features your "callable" would need to support - and with data transformations it appears I would need many different versions of "callable", each calling the high level code that handles these features, which is messy even if I encapsulate that code as much as possible since I'm relying on the child implementation to do something that should be transparent.

And if I index the fields at application start up then I end up with pretty much the same field layout I have now

You should take a look at "Plain type specializations" chapter. Since you were unaware of the high level features of the RTTI system like versioning in your original post, that is pretty much the system you are describing using templates. It doesn't support the high level features but it's light weight and more customizable. (You can add versioning by adding an extra byte as you say, although I am not a fan of that approach)

February 10, 2015 02:51 PM

snake5

"The system does versioning on a per-field basis, not with a single version number, so serialization ends up being a bit more complex than just iterating over the fields."

Precisely, it's complex. And I'm not sure for what reason. Fields can be added, removed, their meanings can change as well. A single byte is capable of recording of up to 255 meaningful changes. Your approach uses more bytes and requires manual assignment of field IDs. More than one integer to manage.

Well, I won't go into much detail here since my position is straightforward - I support the approach that greatly reduces engine code at the cost of some extra user code. I'm aware it's not exactly popular.

P.S. Your 'fromMemory' function appears to have one unnecessary allocation, if the String class copies the string to its own buffer.

February 10, 2015 03:59 PM

BearishSun

It's the same versioning system that Google protobuf uses and from experience it is very simple and easy to work with compared to more traditional methods. Main advantage of it is that it doesn't require code that specially handles older versions - I wanted a very general purpose system that requires only the base definitions from the user and handles the rest, sacrificing a byte per field definition was an acceptable price to pay. Those that do not wish to pay it can use the plain type specializations.

Thanks for the heads up on the string allocation.

February 10, 2015 04:24 PM
