Static Reflection and Serialization in C++

While C++ is extremely powerful, it lacks many of the creature comforts that higher-level languages provide. One feature that the standard does not yet support (as of C++20) is “reflection”.

Reflection is defined as “the ability of a process to examine, introspect, and modify its own structure and behavior.” [1] Here’s an example of reflection in C#:

// Using GetType to obtain type information:
int i = 42;
Type type = i.GetType();
Console.WriteLine(type);

With reflection, you’re able to pull data such as type and name information at runtime! C++ keeps very little of this after compilation: beyond the limited type identity exposed by typeid, details such as member names, member types, and offsets are not available to a programmer at runtime.
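For contrast, here is a minimal sketch of what standard C++ does offer at runtime through typeid. The helper name SameType is made up for illustration. It can tell whether two values have the same type, but it gives no access to member names or layout, and the string returned by type_info::name() is implementation-defined.

```cpp
#include <typeinfo>

// Hypothetical helper: RTTI can compare the types of two values at runtime.
template <typename A, typename B>
bool SameType(const A& a, const B& b) {
  return typeid(a) == typeid(b);
}

// typeid(x).name() also exists, but the text it returns is
// implementation-defined (often a mangled name), so it is not a portable
// replacement for the readable names that C#'s GetType() prints.
```

Calling SameType(1, 2) yields true and SameType(1, 2.0) yields false, which is about as far as the built-in facilities go.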

Reflection also enables useful functionality such as generic serialization. To serialize an arbitrary object, the types and names of its members must be known at runtime, and since C++ doesn’t keep this information after compilation, we have to record it ourselves with the help of some macro magic beforehand.

The following work is based on an excellent video from CppCon 2018 about the subject:

While the presenter is more focused on making an automated reflection solution, this post will focus more on integrating reflection with serialization to a JSON string in C++. Let’s explore what it takes to make that happen!

SETUP

Before we can write a serialization function, we first need to make the necessary data available during execution of our program. Let’s start with type information, which we can store in a basic struct:

struct Type {
  std::string stringName; 
  TypeName enumName; 
  size_t size;
};

In our struct, we save the readable string name of our type along with its size. I also added a custom “TypeName” enum, whose enumerators are named after the types themselves, so that we can more easily determine what type we’re looking at.

Once this is done, we can make a macro that will allow us to quickly instantiate new types as needed.

// Requires a primary template declared beforehand:
//   template<typename T> Type* GetType();
#define DEFINE_TYPE(TYPE) \
template<> \
Type* GetType<TYPE>() { \
  static Type type; \
  type.stringName = #TYPE; \
  type.size = sizeof(TYPE); \
  type.enumName = TypeName::TYPE; \
  return &type; \
}

Here’s how it’s used for a few primitive types:

DEFINE_TYPE(int8_t)
DEFINE_TYPE(int16_t)
DEFINE_TYPE(int32_t)
DEFINE_TYPE(uint8_t)
DEFINE_TYPE(uint16_t)
DEFINE_TYPE(uint32_t)
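The snippets so far leave out the TypeName enum and the primary GetType template. Below is a self-contained sketch of how the pieces might fit together. The enum definition and the declared-but-undefined primary template are assumptions on my part; the only thing the macro actually requires is that each enumerator has exactly the same name as its type.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: enumerator names match the type names exactly, because the
// macro builds the enumerator with TypeName::TYPE.
enum class TypeName { int8_t, int16_t, int32_t, uint8_t, uint16_t, uint32_t };

struct Type {
  std::string stringName;
  TypeName enumName;
  std::size_t size;
};

// Assumption: the primary template is declared but never defined, so asking
// for a type that was not registered with DEFINE_TYPE fails at link time.
template <typename T>
Type* GetType();

#define DEFINE_TYPE(TYPE)           \
  template <>                       \
  Type* GetType<TYPE>() {           \
    static Type type;               \
    type.stringName = #TYPE;        \
    type.size = sizeof(TYPE);       \
    type.enumName = TypeName::TYPE; \
    return &type;                   \
  }

DEFINE_TYPE(int16_t)
DEFINE_TYPE(uint32_t)
```

With this in place, GetType<int16_t>()->stringName is "int16_t" and GetType<int16_t>()->size is 2. If the specializations live in a header included from several source files, they would also need to be marked inline to avoid duplicate definitions.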

Now that we have some types defined, we can create another struct for member variables whose metadata we require access to at runtime: 

struct Field {
  Type* type;
  std::string name;
  size_t offset; 
};

// MAX_NUMBER_OF_FIELDS is arbitrarily large
struct Class {
  std::array<Field, MAX_NUMBER_OF_FIELDS> fields;
};

And some macros to quickly instantiate the required data:

// Opens the GetClass<CLASS>() specialization and records the counter's
// starting value for this class.
#define BEGIN_ATTRIBUTES_FOR(CLASS) \
template<> \
Class* GetClass<CLASS>() { \
  using ClassType = CLASS; \
  static Class localClass; \
  enum { BASE = __COUNTER__ };

// Registers one member: its type, its name, and its byte offset.
#define DEFINE_MEMBER(NAME) \
  enum { NAME##Index = __COUNTER__ - BASE - 1 }; \
  localClass.fields[NAME##Index].type = GetType<decltype(ClassType::NAME)>(); \
  localClass.fields[NAME##Index].name = #NAME; \
  localClass.fields[NAME##Index].offset = offsetof(ClassType, NAME);

// Closes the function opened by BEGIN_ATTRIBUTES_FOR.
#define END_ATTRIBUTES \
  return &localClass; \
}

There are a few details here that might require further explanation.

Concatenation and stringification with the “##” and “#” operators are basic preprocessor functionality, but __COUNTER__ and offsetof() are less commonly seen.

The __COUNTER__ macro is a compiler extension (supported by GCC, Clang, and MSVC) that expands to an integer starting from zero and increasing by one each time it appears in a translation unit. The counter itself never resets, so each class records its starting value in BASE and subtracts it. This gives every member an index that is unique within its class, increments with each new member, and starts again at 0 for every new class being defined.
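A small sketch of the indexing trick in isolation may help. The struct and enumerator names here are made up for illustration, and the code relies on the non-standard __COUNTER__ extension. It shows that the counter keeps running across classes and that subtracting a per-class base value is what makes each class's indices start at zero.

```cpp
// Each expansion of __COUNTER__ yields the next integer for the whole
// translation unit; it does not restart on its own.
struct FirstClassIndices {
  enum { BASE = __COUNTER__ };
  enum { a = __COUNTER__ - BASE - 1 };
  enum { b = __COUNTER__ - BASE - 1 };
};

struct SecondClassIndices {
  enum { BASE = __COUNTER__ };  // larger than FirstClassIndices::BASE
  enum { x = __COUNTER__ - BASE - 1 };
  enum { y = __COUNTER__ - BASE - 1 };
};

static_assert(FirstClassIndices::a == 0 && FirstClassIndices::b == 1,
              "indices count up from zero");
static_assert(SecondClassIndices::x == 0 && SecondClassIndices::y == 1,
              "subtracting BASE restarts the numbering");
static_assert(SecondClassIndices::BASE == FirstClassIndices::BASE + 3,
              "the underlying counter kept increasing");
```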

The offsetof() macro returns an integral constant equal to the offset in bytes of a member from the beginning of the object that contains it. For example, calling offsetof on the first int32_t member of a class will return 0, and on a second int32_t member declared directly below the first it will return 4.
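The example in the paragraph above can be checked at compile time. The two structs below are hypothetical and exist only for this illustration; the second one shows that offsets are not always the running sum of member sizes, because the compiler may insert padding to satisfy alignment.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

struct TwoInts {
  int32_t first;
  int32_t second;
};

struct Mixed {
  int8_t small;
  int32_t large;  // padding may be inserted before this member
};

static_assert(offsetof(TwoInts, first) == 0, "first member starts the object");
static_assert(offsetof(TwoInts, second) == 4, "second follows the 4-byte first");
static_assert(offsetof(Mixed, large) >= sizeof(int8_t),
              "large comes after small, possibly after padding");
```

This is why the reflection data stores the value of offsetof() for every member instead of trying to compute positions from the sizes alone.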

To summarize, we’ve created a few macros that will define a specialization for our template function that returns all the reflection data we need to serialize data contained in that specific type.

Usage will look something like this:

// Struct to reflect
struct TestStruct {
  int32_t field1;
  int16_t field2;
  int8_t field3;
  uint32_t field4;
  uint16_t field5;
  uint8_t field6;
};

// Reflection macro usage
BEGIN_ATTRIBUTES_FOR(TestStruct)
DEFINE_MEMBER(field1);
DEFINE_MEMBER(field2);
DEFINE_MEMBER(field3);
DEFINE_MEMBER(field4);
DEFINE_MEMBER(field5);
DEFINE_MEMBER(field6);
END_ATTRIBUTES

With these macros, we’re simply defining a template specialization of GetClass that a later function can call to get all the information it needs about our type to properly serialize/de-serialize it.
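To see what the registered data looks like before any JSON is involved, here is a self-contained sketch that walks the field table. The Point struct, the DescribeFields helper, the value of MAX_NUMBER_OF_FIELDS, and the declared-only primary templates are assumptions made for this example; the macros follow the ones described above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class TypeName { int16_t, int32_t };
struct Type { std::string stringName; TypeName enumName; std::size_t size; };
template <typename T> Type* GetType();  // assumed primary template

#define DEFINE_TYPE(TYPE)                               \
  template <> Type* GetType<TYPE>() {                   \
    static Type type;                                   \
    type.stringName = #TYPE;                            \
    type.size = sizeof(TYPE);                           \
    type.enumName = TypeName::TYPE;                     \
    return &type;                                       \
  }
DEFINE_TYPE(int16_t)
DEFINE_TYPE(int32_t)

constexpr std::size_t MAX_NUMBER_OF_FIELDS = 16;  // assumed value
struct Field { Type* type; std::string name; std::size_t offset; };
struct Class { std::array<Field, MAX_NUMBER_OF_FIELDS> fields; };
template <typename T> Class* GetClass();  // assumed primary template

#define BEGIN_ATTRIBUTES_FOR(CLASS)                     \
  template <> Class* GetClass<CLASS>() {                \
    using ClassType = CLASS;                            \
    static Class localClass;                            \
    enum { BASE = __COUNTER__ };

#define DEFINE_MEMBER(NAME)                                                     \
    enum { NAME##Index = __COUNTER__ - BASE - 1 };                              \
    localClass.fields[NAME##Index].type = GetType<decltype(ClassType::NAME)>(); \
    localClass.fields[NAME##Index].name = #NAME;                                \
    localClass.fields[NAME##Index].offset = offsetof(ClassType, NAME);

#define END_ATTRIBUTES                                  \
    return &localClass;                                 \
  }

struct Point { int32_t x; int16_t y; };  // hypothetical struct to reflect
BEGIN_ATTRIBUTES_FOR(Point)
DEFINE_MEMBER(x)
DEFINE_MEMBER(y)
END_ATTRIBUTES

// Builds one "name:type@offset" string per registered field.
std::vector<std::string> DescribeFields(const Class& info) {
  std::vector<std::string> out;
  for (const Field& field : info.fields) {
    if (field.type == nullptr) break;  // unused slots keep a null type
    out.push_back(field.name + ":" + field.type->stringName + "@" +
                  std::to_string(field.offset));
  }
  return out;
}
```

Calling DescribeFields(*GetClass<Point>()) produces two entries, "x:int32_t@0" and "y:int16_t@4". The loop stops at the first unused slot because localClass has static storage, so the type pointers of slots that were never filled in remain null.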

Now we need to create another template function that can use the data we’ve defined to create a JSON string. To do this for any type, we can make some generic statements that just need the result of the offsetof() macro to find the location in memory where the data is being stored.

template<typename T>
std::string SerializeObject(T& arg) {
  const Class* objectInfo = GetClass<T>();
  rapidjson::Document document;
  rapidjson::Value key; 
  rapidjson::Value value; 

  document.SetObject(); 

  for (const auto& field : objectInfo->fields) {
    if (field.type == nullptr) break;

    key.SetString(field.name.c_str(), static_cast<rapidjson::SizeType>(field.name.size()), document.GetAllocator());
    int8_t* source = reinterpret_cast<int8_t*>(&arg) + field.offset;

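    // Read each field into a variable of its exact type. Copying a narrow
    // field into a zeroed 32-bit integer would lose the sign of negative
    // values and would depend on the platform's byte order.
    switch (field.type->enumName) {
      case TypeName::int8_t:   { int8_t v;   memcpy(&v, source, sizeof(v)); value.SetInt(v);  break; }
      case TypeName::int16_t:  { int16_t v;  memcpy(&v, source, sizeof(v)); value.SetInt(v);  break; }
      case TypeName::int32_t:  { int32_t v;  memcpy(&v, source, sizeof(v)); value.SetInt(v);  break; }
      case TypeName::uint8_t:  { uint8_t v;  memcpy(&v, source, sizeof(v)); value.SetUint(v); break; }
      case TypeName::uint16_t: { uint16_t v; memcpy(&v, source, sizeof(v)); value.SetUint(v); break; }
      case TypeName::uint32_t: { uint32_t v; memcpy(&v, source, sizeof(v)); value.SetUint(v); break; }

      default:
        assert(false);
        break;
    }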
    
    document.AddMember(key, value, document.GetAllocator());
  }

  rapidjson::StringBuffer buffer;
  rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
  document.Accept(writer);

  return buffer.GetString();
}

I’m using RapidJSON for JSON creation in this example. The for loop visits every populated entry in the fields array (it stops at the first entry whose type is null), which should correspond to the primitive member variables of our struct that we want to serialize. More types than this can be supported, but for this example I’ve only added support for some basic signed and unsigned integer types.
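One detail worth illustrating when reading narrow fields by offset is sign preservation. The sketch below is independent of RapidJSON, and ReadSigned and ReadRawInto32 are hypothetical helper names. It compares copying a field into a variable of its exact type with copying its bytes into a zeroed 32-bit integer, which leaves the remaining bytes at zero and therefore loses the sign of negative values.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy sizeof(U) bytes into a U, then let the normal integer conversion
// widen it to 32 bits. The conversion keeps the sign.
template <typename U>
int32_t ReadSigned(const void* source) {
  U value;
  std::memcpy(&value, source, sizeof(U));
  return static_cast<int32_t>(value);
}

// For comparison: copy the raw bytes into a zeroed int32_t. The bytes that
// are not overwritten stay zero, so a negative narrow value is not
// reproduced, and the result also depends on byte order.
inline int32_t ReadRawInto32(const void* source, std::size_t size) {
  int32_t value = 0;
  std::memcpy(&value, source, size);
  return value;
}
```

For an int8_t holding -3, ReadSigned<int8_t> returns -3, while ReadRawInto32 with a size of 1 returns some other value. The unit test in this post only uses small positive numbers, so it would not notice the difference on a little-endian machine.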

In order to access our data, we must first find the exact location in memory our variable exists in. This statement achieves that:

int8_t* source = reinterpret_cast<int8_t*>(&arg) + field.offset;

The reinterpret_cast converts the address of our argument to a pointer to a 1-byte type. This guarantees that adding the result of the offsetof() macro advances byte by byte from the base address of our argument. Pointer arithmetic on a T* is scaled by sizeof(T), so without the cast the addition would skip whole objects instead of bytes:

(base address + offsetof()) instead of (base address + offsetof() * sizeof(arg))
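The effect of the cast on pointer arithmetic can be shown with a short sketch. The Sample struct and the function names are hypothetical; the point is only that adding 1 to a pointer moves it by the size of the pointed-to type.

```cpp
#include <cstddef>
#include <cstdint>

struct Sample { int32_t a; int32_t b; };  // hypothetical 8-byte struct

// Distance in bytes between two addresses inside the same array.
inline std::ptrdiff_t ByteDistance(const void* from, const void* to) {
  return static_cast<const char*>(to) - static_cast<const char*>(from);
}

inline std::ptrdiff_t StepAsObjectPointer() {
  Sample pair[2] = {};
  Sample* p = &pair[0];
  return ByteDistance(p, p + 1);  // moves by sizeof(Sample) bytes
}

inline std::ptrdiff_t StepAsBytePointer() {
  Sample pair[2] = {};
  char* p = reinterpret_cast<char*>(&pair[0]);
  return ByteDistance(p, p + 1);  // moves by exactly one byte
}
```

StepAsObjectPointer() returns sizeof(Sample) and StepAsBytePointer() returns 1, which is why the offset must be added to a byte-sized pointer.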

Once this is done for every defined member, the resulting JSON string is returned and just like that, we’ve achieved basic reflection and serialization!

// Successful unit test of "SerializeObject" function
TEST(SerializationTest, SerializeTest) {
  TestStruct test { 1, 2, 3, 4, 5, 6 };

  auto result = SerializeObject(test);

  EXPECT_EQ(result, "{\"field1\":1,\"field2\":2,\"field3\":3,\"field4\":4,\"field5\":5,\"field6\":6}");
}

DESERIALIZATION

For deserialization, the process is a little different but the idea is the same. Instead of taking an object and returning a string, the function takes a JSON string as its argument and returns an object of the reflected type.

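template<typename T>
T DeserializeObject(const std::string& json) {
  const Class* objectInfo = GetClass<T>();
  T result{};  // value-initialize so fields missing from the JSON are zero
  rapidjson::Document document;
  document.Parse(json.c_str());

  // Malformed JSON or a non-object root: there is nothing to copy.
  if (document.HasParseError() || !document.IsObject()) return result;

  for (const auto& field : objectInfo->fields) {
    if (field.type == nullptr) break;
    if (!document.HasMember(field.name.c_str())) continue;

    const auto& jsonValue = document[field.name.c_str()];
    if (!jsonValue.IsInt() && !jsonValue.IsUint()) continue;

    auto* destination = reinterpret_cast<int8_t*>(&result) + field.offset;
    // GetInt() is only valid when IsInt() is true, so unsigned values that
    // do not fit in an int are read with GetUint() instead.
    const int64_t source = jsonValue.IsInt() ? jsonValue.GetInt()
                                             : static_cast<int64_t>(jsonValue.GetUint());

    // Copy the low-order bytes of the parsed value (assumes little-endian).
    memcpy(destination, &source, field.type->size);
  }

  return result;
}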

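In the function body, we go through all the fields defined in our reflection data for the template type and, for each one present in the JSON string, copy its value into the result. Since we’re just copying primitive integers for this example, we don’t need to switch based on the type being copied as we can just copy the low-order bytes directly. Note that this shortcut assumes a little-endian platform, where the low-order bytes of the parsed integer come first in memory.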
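If byte-order independence matters, one alternative is to narrow the parsed value to the field's exact type before copying it. This is a sketch of that idea and not the approach used in this post; WriteAs and WriteField are hypothetical names, and the enum is a cut-down stand-in for TypeName.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical subset of the TypeName enum.
enum class TypeName { int8_t, int16_t, int32_t };

// Narrow the value to U first, then copy exactly sizeof(U) bytes.
template <typename U>
void WriteAs(void* destination, int64_t source) {
  const U narrowed = static_cast<U>(source);
  std::memcpy(destination, &narrowed, sizeof(U));
}

// Dispatch on the recorded type; returns false for unsupported types.
inline bool WriteField(void* destination, TypeName type, int64_t source) {
  switch (type) {
    case TypeName::int8_t:  WriteAs<int8_t>(destination, source);  return true;
    case TypeName::int16_t: WriteAs<int16_t>(destination, source); return true;
    case TypeName::int32_t: WriteAs<int32_t>(destination, source); return true;
  }
  return false;
}
```

The cost is a switch that has to grow with every supported type, which is the trade-off the direct byte copy avoids.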

// Successful unit test for "DeserializeObject" function 
TEST(SerializationTest, DeserializeTest) {
  std::string testJsonData = "{\"field1\":1,\"field2\":2,\"field3\":3,\"field4\":4,\"field5\":5,\"field6\":6}";

  TestStruct result = DeserializeObject<TestStruct>(testJsonData);

  EXPECT_EQ(result.field1, 1);
  EXPECT_EQ(result.field2, 2);
  EXPECT_EQ(result.field3, 3);
  EXPECT_EQ(result.field4, 4);
  EXPECT_EQ(result.field5, 5);
  EXPECT_EQ(result.field6, 6);
}

Unfortunately, since the template parameter appears only in the return type and the function argument is just a string, the compiler can’t deduce which version of the template to invoke, so the type must be stated explicitly.
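The deduction rule can be seen with two tiny function templates. ParseAs and ParseInto are hypothetical stand-ins and not part of the reflection code; the second one shows an alternative signature where the type can be deduced because it appears in a parameter.

```cpp
#include <string>

// T appears only in the return type, so it cannot be deduced from the call
// and must be written out: ParseAs<short>("12").
template <typename T>
T ParseAs(const std::string& text) {
  return static_cast<T>(std::stoi(text));
}

// T appears in a parameter, so the compiler deduces it from the argument:
// ParseInto("34", value).
template <typename T>
void ParseInto(const std::string& text, T& out) {
  out = static_cast<T>(std::stoi(text));
}
```

An out-parameter version of DeserializeObject would therefore not need the explicit template argument, at the cost of requiring the caller to create the object first.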

In conclusion, we’ve made a flexible reflection system that can be used to serialize/de-serialize a struct or class with some primitive types in C++. It can be expanded to other primitive types such as floats and even custom types with some modifications to the type setup and the serialization/de-serialization functions.
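As a sketch of what such a modification to the type setup might look like for floating-point types: the DEFINE_TYPE macro builds the enumerator from the type name, and that cannot work for float because a keyword cannot be used as an enumerator name. One option, shown here with the hypothetical macro DEFINE_TYPE_AS and assumed enumerator names, is to pass the enumerator separately.

```cpp
#include <cstddef>
#include <string>

enum class TypeName { Int32, Float, Double };  // assumed enumerator names
struct Type { std::string stringName; TypeName enumName; std::size_t size; };
template <typename T> Type* GetType();  // assumed primary template

// Variant of DEFINE_TYPE that takes the enumerator as a second argument.
#define DEFINE_TYPE_AS(TYPE, ENUM_NAME)                            \
  template <> Type* GetType<TYPE>() {                              \
    static Type type{#TYPE, TypeName::ENUM_NAME, sizeof(TYPE)};    \
    return &type;                                                  \
  }

DEFINE_TYPE_AS(float, Float)
DEFINE_TYPE_AS(double, Double)
```

The serialization and de-serialization functions would then each need a matching case that reads or writes the field as a floating-point value.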

While not as powerful as the solution that comes built into languages such as C#, this is relatively lightweight and is more than enough to get a program written entirely in C++ serializing and de-serializing its data to remote destinations quickly and easily with JSON strings!

As always, source code can be found on my GitHub, linked here and in the sidebar.

Sources:

[1] – https://en.wikipedia.org/wiki/Reflective_programming

1 Comment


Thanks for the article, this implementation is very simple to understand and incredibly functional. I made a small modification to allow reflecting private members. I encapsulated GetClass (only the dummy method without specialization) inside a “Reflector” class as a static method, and made a macro:
#define ALLOW_REFLECT_PRIVATE friend class Reflector;
By defining the macro inside the class to be reflected, the private members are accessible to the GetClass method.
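A rough sketch of how this suggestion might look, reconstructed from the description above: only Reflector and ALLOW_REFLECT_PRIVATE come from the comment. The Account class and the PrivateFieldOffset function (a simplified stand-in for GetClass) are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

class Reflector {
 public:
  // Declared here without a definition; specialized per reflected class.
  template <typename T>
  static std::size_t PrivateFieldOffset();
};

#define ALLOW_REFLECT_PRIVATE friend class Reflector;

class Account {
  ALLOW_REFLECT_PRIVATE
 public:
  explicit Account(int32_t balance) : balance_(balance) {}
  int32_t id() const { return id_; }

 private:
  int32_t id_ = 0;
  int32_t balance_;
};

// Because Reflector is a friend of Account, this specialization may name
// the private member inside offsetof().
template <>
std::size_t Reflector::PrivateFieldOffset<Account>() {
  return offsetof(Account, balance_);
}
```

Without the friend declaration, the use of balance_ in the specialization would be rejected as an access violation.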
