C++ Metadata Reflection System

Started by June 15, 2021 11:27 PM
6 comments, last by MagForceSeven 3 years, 6 months ago

I've been mulling over the best way to implement a C++ reflection system that exposes metadata about classes, their properties and their methods to an editor interface. The goal is to make it flexible and easy to specify which properties or objects the editor shows when an object is clicked, similar to the inspector panels in engines like Unity or Unreal. I'm currently using an ECS, so I'm looking for a flexible solution. Any ideas on good implementations for this?

I really love this topic because I think we had a really cool one at a place I used to work. It invoked a clang compiler and a bit of Python to generate the reflection information and output it to a file. That file was then loaded by tools or the game to be leveraged for whatever was needed. I liked it because it didn't require any additional effort (it was all just based on the abstract syntax tree generated by clang) and you could do interesting things when you didn't have to have your game code compiling within your editor or other tools. I wasn't directly involved in its development, so I unfortunately don't have many additional details about how to do this, other than knowing that it was all set up within the Visual Studio project settings to be invoked automatically as part of any build.

That process is pretty similar to how Unreal works, except that Epic has written a very specific tool for it (Unreal Header Tool) and sticks to dealing with their own macro markup. They do their own parsing of the source, which has its pros and cons. They get exactly the information they want, but it can be inflexible by not supporting perfectly valid C++. The markup isn't even that invasive.

The version that I've personally implemented in a hobby project involved macros, templates and pre-main initialization shenanigans. The upside was that it all happened within the single compilation pass, could easily support complex types like standard library containers and was never out of date (sort of). The downside was that it involved a separate collection of macros that had to be updated to match. It's sort of like if the UE solution split the C++ declaration from the UPROPERTY/UFUNCTION macros. The primary thing that I used it for was data loading: converting parsed JSON data (using a 3rd party library) into instances of data structures that are easier to work with. I was able to write a fairly generic loader that could work for any reflected data structure.

So for a structure that looked like this:

struct model_data : public data_definition
{
     std::string   display_name;
     model_size_type  model_size;
     faction_type  faction = faction_type::NEUTRAL;
     std::string   model_variant;
};

I would have to add something like

STAR_REFL_STRUCTURE_START( model_data )

	STAR_REFL_STRUCTURE_MEMBER( "display_name", std::string, display_name )
	STAR_REFL_STRUCTURE_ENUM_MEMBER( "model_size", model_size_type, model_size )
	STAR_REFL_STRUCTURE_ENUM_MEMBER( "faction", faction_type, faction )
	STAR_REFL_STRUCTURE_MEMBER( "model_variant", std::string, model_variant )

STAR_REFL_STRUCTURE_END( model_data )

in a cpp file.

It was alright for a one-man project. I could also reflect enumerations. I didn't get as far as doing anything with member functions as I switched over to Unreal 4 for work related reasons.
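For readers wondering what macros of this shape could expand to, here is a minimal sketch. It is not the STAR_REFL_* implementation (that isn't shown in this thread): the names REFL_BEGIN, REFL_MEMBER, REFL_END, member_info, reflect and set_member are made up for illustration, and only std::string members are handled to keep it short.

```cpp
#include <string>
#include <vector>

// Hypothetical descriptor: a JSON-style key plus a setter for one member.
template <typename T>
struct member_info {
    const char* name;
    void (*set)(T&, const std::string&);
};

// Primary template; each reflected struct specializes it through the macros.
template <typename T>
const std::vector<member_info<T>>& reflect();

#define REFL_BEGIN(Type)                                               \
    template <>                                                        \
    const std::vector<member_info<Type>>& reflect<Type>() {            \
        using self = Type;                                             \
        static const std::vector<member_info<self>> members = {

#define REFL_MEMBER(Name, Field)                                       \
            { Name, [](self& obj, const std::string& v) { obj.Field = v; } },

#define REFL_END()                                                     \
        };                                                             \
        return members;                                                \
    }

// Simplified stand-in for the struct above (string members only).
struct model_data {
    std::string display_name;
    std::string model_variant;
};

// This part would live in a cpp file, next to the other reflection markup.
REFL_BEGIN(model_data)
    REFL_MEMBER("display_name", display_name)
    REFL_MEMBER("model_variant", model_variant)
REFL_END()

// Generic "loader" step: assigns a value to whichever member matches the key.
template <typename T>
bool set_member(T& obj, const std::string& key, const std::string& value) {
    for (const auto& m : reflect<T>()) {
        if (key == m.name) { m.set(obj, value); return true; }
    }
    return false;
}
```

A loader built on this would walk the parsed JSON object and call set_member for each key, which is roughly why one generic function can serve every reflected structure.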

--Russell Aasland
--Principal Engineer
--Midsummer Studios

I've gone over several iterations of this and a workable approach has been to implement the following logic (the code is unchecked):

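class MyClass : public CKnownObject {
	public:
		// Must be static: it is called without an instance, both below and from GetClass().
		static const CClass& StaticClass() {
			// Function-local static: constructed on first call, whichever caller gets here first.
			static const CClass thisClass = ::detail::MakeClass<MyClass>();
			return thisClass;
		}
		const CClass& GetClass() const override { return StaticClass(); }
	private:
		// Declared after StaticClass() so the initializer can name it;
		// its dynamic initialization forces the registration to run before main().
		static inline const CClass& _Class = StaticClass();
};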

Once you hide this in a macro, it becomes quite invisible, and you get several basic things out of it, from the name of the class to its size and type traits. You can use something like ctti to help with this to some extent. The above structure basically enforces one thing: that the static fields get initialized in the correct order. _Class forces the invocation of StaticClass(), which forces the invocation of MakeClass(), which in turn needs MyClass to be a complete type by the time it is invoked. GetClass() is a virtual function overridden from some common base class (I use the name CKnownObject) and always needs to be present.

Code for MakeClass() gets a bit verbose but by using static storage, you can create a full map of inheritance even before main() is called.

By providing your wrapper macro with base class information you can actually update your CClass object to retain a statically linked list of CClass instances, which gives you a respectable overview of the derivation trees. The weakest link here becomes the programmer who's responsible for providing the correct macros with the correct arguments to begin with. Just remember - if you do it right, at no point do you need to use new.
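A rough sketch of that idea, under stated assumptions: ClassInfo stands in for CClass (whose real definition isn't shown here), and the REFLECT_ROOT / REFLECT_DERIVED macros, head(), derives_from() and find_class() are hypothetical names. Every descriptor lives in static storage and links itself into an intrusive list, so nothing is allocated with new.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical class descriptor, linked into a global intrusive list.
struct ClassInfo {
    const char*      name;
    std::size_t      size;
    const ClassInfo* base;   // nullptr for root classes
    const ClassInfo* next;   // previously registered class

    ClassInfo(const char* n, std::size_t s, const ClassInfo* b)
        : name(n), size(s), base(b), next(head()) {
        head() = this;       // push onto the list, no heap allocation
    }

    // Function-local static so the list head exists before any registration runs.
    static const ClassInfo*& head() {
        static const ClassInfo* h = nullptr;
        return h;
    }

    // Walks the base chain to answer "is this class derived from other?".
    bool derives_from(const ClassInfo& other) const {
        for (const ClassInfo* c = this; c; c = c->base)
            if (c == &other) return true;
        return false;
    }
};

// Looks a class up by name by walking the registration list.
inline const ClassInfo* find_class(const char* name) {
    for (const ClassInfo* c = ClassInfo::head(); c; c = c->next)
        if (std::strcmp(c->name, name) == 0) return c;
    return nullptr;
}

#define REFLECT_IMPL(Type, BasePtr)                                   \
    static const ClassInfo& StaticClass() {                           \
        static const ClassInfo info(#Type, sizeof(Type), BasePtr);    \
        return info;                                                  \
    }                                                                 \
    static inline const ClassInfo& class_registration_ = StaticClass();

#define REFLECT_ROOT(Type)          REFLECT_IMPL(Type, nullptr)
#define REFLECT_DERIVED(Type, Base) REFLECT_IMPL(Type, &Base::StaticClass())

struct Entity           { REFLECT_ROOT(Entity)              int id = 0;  };
struct Monster : Entity { REFLECT_DERIVED(Monster, Entity)  int hp = 10; };
```

Because a derived class's descriptor asks for its base's StaticClass() while it is being constructed, the base is always registered first, regardless of the order in which the static members happen to be initialized. The macro arguments are still only as correct as whoever typed them.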

Going more in-depth: the big engines presumably do full reflection with external tools that run before compilation and parse and rewrite the code after it leaves the user's eyes. In addition to transparently creating getters and setters for variables, this might mean analyzing inheritance trees and adding base classes without the user's knowledge to enable much fancier reflection mechanisms. As I see it, this is overkill unless you're writing a large commercially relevant product that people with limited programming skills and/or time need to interact with. For a homebrew project you can still do a lot with no special tools.

For instance, if you need information about certain member fields that you want to expose to your editor, you might require the programmer to expose them with a special macro. The macro registers the offset of the variable (using something like offsetof) in a presumably statically managed list. You can then use that list to get raw addresses of the variables, which can be updated either via a specialized cast lambda or a simple memcpy. If you look closely, you get all of this information out of something like REFLECT_FIELD(int, cats). I use a similar mechanism for more complex “properties” that are actually allocated per class and provide additional features like thread-safety.
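As a minimal sketch of the offset-based approach, assuming a hypothetical FieldInfo record and a REFLECT_FIELD macro that takes the owning type and the field name (a different signature from the REFLECT_FIELD(int, cats) mentioned above, whose definition isn't shown). Pet, Fields() and set_field() are made-up names, and a std::vector is used for brevity even though a fixed array would avoid the allocation.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical field record: name, byte offset and size inside the owning object.
struct FieldInfo {
    std::string_view name;
    std::size_t      offset;
    std::size_t      size;
};

struct Pet {
    int   cats  = 0;
    float speed = 1.0f;

    static const std::vector<FieldInfo>& Fields();
};

// offsetof is only guaranteed to work for standard-layout types such as Pet.
#define REFLECT_FIELD(Owner, Field) \
    FieldInfo{ #Field, offsetof(Owner, Field), sizeof(Owner::Field) }

const std::vector<FieldInfo>& Pet::Fields() {
    static const std::vector<FieldInfo> fields = {
        REFLECT_FIELD(Pet, cats),
        REFLECT_FIELD(Pet, speed),
    };
    return fields;
}

// Copies raw bytes into the named field. Only sensible for trivially copyable
// members, and the size check is the only type safety there is.
template <typename Owner, typename Value>
bool set_field(Owner& obj, std::string_view name, const Value& value) {
    for (const FieldInfo& f : Owner::Fields()) {
        if (f.name == name && f.size == sizeof(Value)) {
            std::memcpy(reinterpret_cast<char*>(&obj) + f.offset, &value, sizeof(Value));
            return true;
        }
    }
    return false;
}
```

An editor panel could iterate Pet::Fields() to build its widgets and call set_field when a value changes. Storing a type tag next to the offset would make the size check far less fragile.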

I shed some light on how I do it in this thread.

Writing code that creates a one-time static layout of a class's members indicated by the programmer takes a bit of work, but is not that involved. For instance, I allow the programmer to use the CSEARCHABLE_ATTRIBS(…) macro to provide the names of the member fields that need reflection. The macro simply defines a new static function, which expands each field to a name and offset (I explain in the linked thread how I go about naming). I force the function to be called before main() by creating a static inline boolean just like _Class in the above code.

After that you need a mechanism for interacting with each type. The details here depend on whether you need your fields for serialization, metadata or access from an editor.

I don't know - this is a big topic, but I hope this helped you at least a little bit!

Overall, if you want true reflection, then C++ is probably one of the silliest languages to go with. It simply wasn't designed with that in mind, so whatever you come up with is technically a hack.

@MagForceSeven Yeah my first thoughts when fishing for implementations were somewhere along the same lines, using macros and templates to basically mirror the structure of the class/struct. Obvious issue with that is it becomes less maintainable the more you use it, since you're basically duplicating effort when writing classes, and you can (but probably shouldn't) have pretty large classes that can become a nightmare with this type of system.

I'm ultimately looking for something that's more akin to what Unreal Header Tool does or what Unity property attributes do, where you simply mark a method/field/class with certain metadata attributes. I think you're right that I'd need an additional syntax analysis tool or step in the build process to generate this information into a separate file that I can read in at startup when building with, say, a --metadata flag of some sort. After a few days of research this seems like the most standardized way to do it, but I was really wondering if there are any solutions out there that cut out the middleman and generate all this information on the fly with some clever template meta-programming or something similar, without the need for a separate metadata output file. It seems unreasonable to expect this from modern compilers, though, unless they themselves had built-in options to output metadata based on language attributes, which the C++ standards committee never seemed to prioritize.

@irreversible That's definitely an interesting way to go about it. Some sort of controlled static storage initialization code does seem like a clever solution, coupled with readable macros. I'll look into this for sure.

I tinkered on something similar for our project and ended up combining the above solutions into one. First I added a template helper function which makes use of __PRETTY_FUNCTION__ and similar macros (to support MSVC, GCC and clang) to get the function signature as a string. I then have some other macros to strip all the unnecessary parts in order to get the pure (compiler-dependent) type name of whatever the template type is.

#if defined(__clang__)
#define SIGNATURE __PRETTY_FUNCTION__
#elif defined(__GNUC__)
#define SIGNATURE __PRETTY_FUNCTION__
#elif defined(_MSC_VER)
#define SIGNATURE __FUNCSIG__
#endif

#if defined(__clang__)
#define SIGNATURE_PREFIX "char *si() [T = "
#define SIGNATURE_SUFFIX "]"
#elif defined(__GNUC__)
#define SIGNATURE_PREFIX "char* si() [with T = "
#define SIGNATURE_SUFFIX "]"
#elif defined(_MSC_VER)
#define SIGNATURE_PREFIX "char *__cdecl si<"
#define SIGNATURE_SUFFIX ">(void)"
#endif

#define SIGNATURE_LEFT (sizeof(SIGNATURE_PREFIX) - 1)
#define SIGNATURE_RIGHT (sizeof(SIGNATURE_SUFFIX) - 1)

template<int Size> struct TypeIdentifier
{
    public:
        char value[Size];
        inline TypeIdentifier(const char* identifier)
        {
            Runtime::memcpy(value, identifier, Size);
            value[Size - 1] = 0;
        }
};
template<typename T> inline char* si()
{
    static TypeIdentifier<sizeof(SIGNATURE) - SIGNATURE_LEFT - SIGNATURE_RIGHT> identifier = TypeIdentifier<sizeof(SIGNATURE) - SIGNATURE_LEFT - SIGNATURE_RIGHT>(SIGNATURE + SIGNATURE_LEFT);
    return identifier.value;
}

The downside is that all those strings are added to the final assembly and can be scanned for, so use for production code is a potential security risk if you rely on DRM. However, our solution is extended to another function that FNV hashes the type name to get a somewhat unique type ID for every type you want.
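The post doesn't say which FNV variant is used, so as an assumption here is the common 32-bit FNV-1a, written as a constexpr function so the ID can be computed at compile time. The name fnv1a_32 is made up for this sketch.

```cpp
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
constexpr std::uint32_t fnv1a_32(std::string_view text) {
    std::uint32_t hash = 2166136261u;            // FNV offset basis
    for (char c : text) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;                       // FNV prime (wraps modulo 2^32)
    }
    return hash;
}
```

With 32 bits, two different type names can hash to the same ID, which is why "somewhat unique" is the right description. A debug-build check that detects duplicate IDs at registration time is a cheap safeguard.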

We've also added a parser to our build tool, which processes the code and generates type information for the classes we want to have it. That information is tied to an attribute similar to those C# offers. It is a simple comment

//[AttributeName(ParameterList)]

which our tool scans for to determine which types to generate information for, how to expose them, and whether a serializer should be generated as well.
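A minimal sketch of recognizing such a comment line, assuming one attribute per line and using std::regex. The Attribute struct and parse_attribute function are hypothetical, and a real build tool would also have to associate the attribute with the declaration that follows it.

```cpp
#include <optional>
#include <regex>
#include <string>

struct Attribute {
    std::string name;
    std::string parameters;   // raw text between the parentheses
};

// Matches a whole line of the form  //[Name(params)]  with optional whitespace.
inline std::optional<Attribute> parse_attribute(const std::string& line) {
    static const std::regex pattern(R"(^\s*//\s*\[\s*(\w+)\s*\((.*)\)\s*\]\s*$)");
    std::smatch match;
    if (!std::regex_match(line, match, pattern)) return std::nullopt;
    return Attribute{ match[1].str(), match[2].str() };
}
```

The parameter list is returned unparsed. Splitting it into key/value pairs is left out because its syntax isn't described in the post.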

Finally, all that information is put into code and compiled along with the other files of the project, using some macros and a compressed, preinitialized byte array describing whatever constructors, fields, properties or methods the type has. This allows a call to a template meta function “GetType”, which then returns the generated information or an empty type object.
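One way a lookup like that can fall back to an empty object is a primary template plus generated specializations. This is a sketch under assumptions: only the name GetType comes from the post, while TypeInfo, TypeOf and Player are hypothetical, and the compressed byte array is left out.

```cpp
#include <cstddef>
#include <string_view>

struct TypeInfo {
    std::string_view name;       // empty for types without generated metadata
    std::size_t      size = 0;
    constexpr bool valid() const { return !name.empty(); }
};

// Fallback: any type the generator did not process yields an empty TypeInfo.
template <typename T>
struct TypeOf {
    static constexpr TypeInfo value{};
};

template <typename T>
constexpr const TypeInfo& GetType() { return TypeOf<T>::value; }

// --- what a generated file might contain for one annotated class ---
struct Player { int health; float speed; };

template <>
struct TypeOf<Player> {
    static constexpr TypeInfo value{ "Player", sizeof(Player) };
};
```

Hand-written specializations work the same way as generated ones, so type information can still be added manually for types the generator never sees.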

You can of course also add type information by hand, without using the code generator.

We are then able to iterate through all the type information, get constructor calls in order to create a type instance, or have methods called dynamically. We also implemented a flex data type, which is something like an any (or object in C#) but can have methods called or fields/properties accessed, and even allows for dynamically implemented interfaces.

@irreversible That looks a lot like mine (insert quote about great minds, right?). My only real issue is:

GetClass() is a virtual function overridden from some common base class

This is something that I was actively attempting to avoid when working on a solution. Not that I'm saying you're wrong for going that route; it doesn't really bother me in Unreal Engine as much as I thought it could. I was also looking for a solution that would a) keep all the reflection markup in one place and b) look the same for other setups like enumerations.

From your description it sounds like you'd add one macro to the class and then have the other macros, like REFLECT_FIELD or the attribute one, someplace else. Or am I wrong? Do you have some mechanism whereby those macros can also be placed within the class declaration and be invoked by that main call?

--Russell Aasland
--Principal Engineer
--Midsummer Studios
