The Unreal Engine is a game engine developed by Epic Games, first showcased in the 1998 first-person shooter game Unreal. Although it was primarily developed for first-person shooters, it has been used successfully in a variety of other genres, including stealth games, MMORPGs, and other RPGs. Its code is written in C++, and it is used by many game developers today. Its source code is available for free on GitHub. Many impressive games have been developed with this engine, which allows developers to produce very realistic renderings.
What source code runs behind the scenes to produce such realistic renderings? It's very interesting to look inside this powerful game engine and discover how it's designed and implemented, and C++ developers can learn many good practices from its code base. Let's X-ray its source code using CppDepend and CQLinq to explore some of the design and implementation choices made by its development team.
1- Namespaces
Unreal Engine uses namespaces widely for three main reasons:
- Scoping enums: many namespaces contain only enums, as shown by the following CQLinq query, which lists them.
In a large project, there is no guarantee that two distinct enums won't define enumerators with the same name, because the values of a plain enum are injected into the enclosing scope; wrapping each enum in its own namespace avoids such clashes. C++11 solved this issue with enum class, which implicitly scopes the enum values within the enum's name.
- Anonymous namespaces: a namespace with no name avoids having to declare global variables static. An anonymous namespace is only accessible within the file in which it is declared. Here is the list of all anonymous namespaces used:
- Modularizing the code base: let's search for all the other namespaces, i.e. those that are neither anonymous nor enum-only:
Namespaces are a good way to modularize the application: Unreal Engine defines more than 250 namespaces to enforce its modularity, which makes the code more readable and maintainable.
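To make these three uses concrete, here is a minimal C++ sketch. All names are hypothetical (the E prefix for enums only follows Unreal's naming convention), and the code illustrates the patterns rather than reproducing engine code: two plain enums wrapped in namespaces so that their Red enumerators don't clash, the C++11 enum class alternative, an anonymous namespace holding file-local state, and a named namespace grouping related helpers.

```cpp
#include <cassert>

// Pre-C++11 style: each plain enum lives in its own namespace, so both
// EColor::Red and ETrafficLight::Red can exist without a name clash.
namespace EColor
{
    enum Type { Red, Green, Blue };
}

namespace ETrafficLight
{
    enum Type { Red, Amber, Green };
}

// C++11 style: enum class scopes its enumerators by itself (EWeather::Rainy).
enum class EWeather { Sunny, Rainy };

// Anonymous namespace: these names are visible only in this source file,
// which replaces declaring the variable and the function static.
namespace
{
    int GNextId = 0;

    int MakeId()
    {
        return ++GNextId;
    }
}

// Named namespace used as a module that groups related helpers.
namespace RenderUtils
{
    inline bool IsWarmColor(EColor::Type Color)
    {
        return Color == EColor::Red;
    }
}
```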
2- Paradigms used
C++ is not just an object-oriented language. As Bjarne Stroustrup points out, "C++ is a multi-paradigmed language." It supports many different styles of programs, or paradigms, and object-oriented programming is only one of these. Some of the others are procedural programming and generic programming.
2-1 Procedural Paradigm
2-1-1 Global functions
Let's search for all global functions defined in the Unreal Engine source code:
We can classify these functions into three categories:
1 - Utility functions: for example, 6344 of them are Z_Construct_UXXX functions, which are used to create instances needed by the engine.
2 - Operators: many operators are defined, as shown by the result of this CQLinq query:
Almost all kinds of operators are implemented in the Unreal Engine source code.
3 - Functions related to the engine logic: many global functions implement parts of the engine's logic. These functions could perhaps be grouped by category, either as static methods of classes or in namespaces.
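Since operators make up a visible share of these global functions, here is a small sketch of why they are often written as free functions. FVec2 is a hypothetical type, not an engine class: a non-member operator*(float, const FVec2&) is the only way to support an expression like 2.0f * V, because a member operator always takes the class type as its left operand.

```cpp
#include <cassert>

// Hypothetical 2D vector type used only for illustration.
struct FVec2
{
    float X;
    float Y;
};

// Global (non-member) operators defined at namespace scope.
inline FVec2 operator+(const FVec2& A, const FVec2& B)
{
    return { A.X + B.X, A.Y + B.Y };
}

// The scalar is the left operand here, which a member operator cannot express.
inline FVec2 operator*(float S, const FVec2& V)
{
    return { S * V.X, S * V.Y };
}

inline bool operator==(const FVec2& A, const FVec2& B)
{
    return A.X == B.X && A.Y == B.Y;
}
```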
2-1-2 Static global functions:
It's a best practice to declare a global function as static unless you have a specific need to call it from another source file.
Many global functions are declared static and, as mentioned before, other global functions are defined inside anonymous namespaces.
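As a rough sketch of these two techniques (the function names are hypothetical), both a static function and a function inside an anonymous namespace have internal linkage: they cannot be called from other source files and cannot clash with same-named functions defined there. Only the last function is meant to be part of the file's public interface. Refactoring one of the candidates discussed next usually amounts to adding static or moving the function into an anonymous namespace.

```cpp
#include <cassert>
#include <string>

// 'static' gives the function internal linkage: it is private to this file.
static int ClampPercent(int Value)
{
    return Value < 0 ? 0 : (Value > 100 ? 100 : Value);
}

// Same effect with an anonymous namespace (which also works for types).
namespace
{
    std::string FormatPercent(int Value)
    {
        return std::to_string(ClampPercent(Value)) + "%";
    }
}

// Only this function is meant to be visible to other files
// (it would normally also be declared in a header).
std::string DescribeLoadProgress(int Value)
{
    return "Loading: " + FormatPercent(Value);
}
```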
2-1-3 Global functions that are candidates to be static
Global functions that are not exported, not defined in an anonymous namespace, and not used by any method outside the file where they are defined are good candidates to be refactored as static.
As we can observe, some global functions are candidates to be refactored as static.
2-2 Object Oriented paradigm
2-2-1 Inheritance
In object-oriented programming (OOP), inheritance is a way to establish an is-a relationship between objects. It is often misused as a way to reuse existing code, which is not a good practice, because inheritance for implementation reuse leads to tight coupling. Code reuse is better achieved through composition (composition over inheritance). Let's search for all classes having at least one base class:
And to have a better idea of the classes concerned by this query, we can use the Metric View.
In the Metric View, the code base is represented through a Treemap. Treemapping is a method for displaying tree-structured data by using nested rectangles. The tree structure used in a CppDepend treemap is the usual code hierarchy:
- Projects contain namespaces.
- Namespaces contain types.
- Types contain methods and fields.
The treemap view provides a useful way to represent the result of a CQLinq query: the blue rectangles represent this result, so we can visually see the types concerned by the query.
As we can observe, inheritance is widely used in the Unreal Engine source code.
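Given how widely inheritance is used, a minimal sketch of the distinction drawn above may help. The types are hypothetical: public inheritance where the is-a relationship genuinely holds, and composition where a class only wants to reuse another class's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Is-a: a point light really is a light, so public inheritance fits.
class FLight
{
public:
    virtual ~FLight() = default;
    virtual float Intensity() const = 0;
};

class FPointLight : public FLight
{
public:
    float Intensity() const override { return Power; }
    float Power = 100.0f;
};

// Has-a: a message log is not a kind of vector. Instead of inheriting from
// std::vector just to reuse push_back() and size(), it contains one, so the
// storage can be swapped later without affecting users of FMessageLog.
class FMessageLog
{
public:
    void Add(const std::string& Message) { Messages.push_back(Message); }
    std::size_t Num() const { return Messages.size(); }

private:
    std::vector<std::string> Messages;
};
```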
Multiple inheritance: let's search for classes inheriting from more than one concrete class.
Multiple inheritance is not widely used; only a few classes inherit from more than one class.
2-2-2 Virtual methods
Let's search for all virtual methods defined in the Unreal Engine source code:
Many methods are virtual, and some of them are pure virtual:
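For reference, this hypothetical sketch shows a pure virtual method (declared with = 0, which makes the class abstract), its overrides in derived classes, and calls dispatched at runtime through base-class pointers.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical interface: the pure virtual method makes IShape abstract,
// so it cannot be instantiated directly.
class IShape
{
public:
    virtual ~IShape() = default;     // virtual destructor: safe deletion through a base pointer
    virtual double Area() const = 0; // pure virtual: every concrete shape must override it
};

class FRectangle : public IShape
{
public:
    FRectangle(double InWidth, double InHeight) : Width(InWidth), Height(InHeight) {}
    double Area() const override { return Width * Height; }

private:
    double Width;
    double Height;
};

class FRightTriangle : public IShape
{
public:
    FRightTriangle(double InBase, double InHeight) : Base(InBase), Height(InHeight) {}
    double Area() const override { return 0.5 * Base * Height; }

private:
    double Base;
    double Height;
};

// Each call to Area() is dispatched at runtime to the override of the actual object type.
double TotalArea(const std::vector<std::unique_ptr<IShape>>& Shapes)
{
    double Sum = 0.0;
    for (const std::unique_ptr<IShape>& Shape : Shapes)
        Sum += Shape->Area();
    return Sum;
}
```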
As with the procedural paradigm, the OOP paradigm is also widely used in the Unreal Engine source code. What about the generic programming paradigm?
2-3 Generic Programming
C++ provides unique abilities to express the ideas of generic programming through templates. Templates provide a form of parametric polymorphism that allows the expression of generic algorithms and data structures. The instantiation mechanism of C++ templates ensures that when a generic algorithm or data structure is used, a fully optimized, specialized version is created and tailored to that particular use, allowing generic algorithms to be as efficient as their non-generic counterparts.
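As an illustration of the two kinds of generic code counted in the next queries, here is a sketch with a function template and a class template. The names are hypothetical (the T prefix mimics Unreal's convention for template classes); each use with a new type argument makes the compiler generate a separate, specialized instantiation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function template: one generic algorithm, instantiated for every element type used.
// Precondition: Items is not empty.
template <typename T>
const T& MaxElement(const std::vector<T>& Items)
{
    const T* Best = &Items[0];
    for (const T& Item : Items)
        if (*Best < Item)
            Best = &Item;
    return *Best;
}

// Class template: a generic fixed-capacity stack parameterized by type and size.
template <typename T, int N>
class TFixedStack
{
public:
    bool Push(const T& Value)
    {
        if (Count == N)
            return false; // full
        Data[Count++] = Value;
        return true;
    }

    T Pop() { return Data[--Count]; } // Precondition: not empty
    int Num() const { return Count; }

private:
    T Data[N] = {};
    int Count = 0;
};
```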
2-3-1 Generic types:
Let's search for all generic types defined in the engine source code:
Only a few types are defined as generic. Let's search for generic methods:
More than 40000 methods are generic; they represent more than 25% of the methods implemented. To sum up, the Unreal Engine source code mixes the three paradigms.
3- PODs to define the data model
In object-oriented programming, plain old data (POD) is a data structure that is represented only as a passive collection of field values (instance variables), without using object-oriented features. In computer science, this is known as a passive data structure. Let's search for the POD types in the Unreal Engine source code:
More than 2000 types are defined as POD types, many of them are used to define the engine data model.
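Here is a minimal sketch of what such a data-model type can look like, using a hypothetical FVertexData struct. Since std::is_pod is deprecated in C++20, the check is written with std::is_trivial and std::is_standard_layout, which together express the same requirement. Being trivially copyable, such a type can safely be copied byte by byte.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical data-model record: only public data members, no constructors,
// no virtual functions and no base classes.
struct FVertexData
{
    float Position[3];
    float UV[2];
    unsigned int Color;
};

// POD = trivial + standard-layout (std::is_pod itself is deprecated in C++20).
static_assert(std::is_trivial<FVertexData>::value, "FVertexData must be trivial");
static_assert(std::is_standard_layout<FVertexData>::value, "FVertexData must be standard-layout");

// A byte-wise copy is well defined for such types, e.g. when filling a raw buffer.
inline void CopyVertex(FVertexData& Dest, const FVertexData& Src)
{
    std::memcpy(&Dest, &Src, sizeof(FVertexData));
}
```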
4- Gang Of Four design patterns
Design patterns are a software engineering concept describing recurring solutions to common problems in software design. The Gang of Four patterns are the most popular ones. Let's discover some of them used in the Unreal Engine source code.
4-1 Singleton
The singleton is the most popular and most widely used one. Here are some singleton classes defined in the source code:
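As a rough illustration (hypothetical classes, not engine code), here is a classic singleton based on a function-local static, whose initialization is thread-safe since C++11, and a per-thread variant built on thread_local. The second one is similar in spirit to the TThreadSingleton described next, but it is not its actual implementation.

```cpp
#include <cassert>

// Classic singleton: created on first call to Get(), never copied.
class FConfigRegistry
{
public:
    static FConfigRegistry& Get()
    {
        static FConfigRegistry Instance; // thread-safe initialization since C++11
        return Instance;
    }

    FConfigRegistry(const FConfigRegistry&) = delete;
    FConfigRegistry& operator=(const FConfigRegistry&) = delete;

    int Value = 0;

private:
    FConfigRegistry() = default;
};

// Per-thread variant: each thread gets its own instance, so no locking is needed.
class FPerThreadCounter
{
public:
    static FPerThreadCounter& Get()
    {
        thread_local FPerThreadCounter Instance; // one instance per thread, created on first use
        return Instance;
    }

    int Count = 0;

private:
    FPerThreadCounter() = default;
};
```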
TThreadSingleton is a special version of the singleton: only one instance is created per thread, and calling its Get() method is thread-safe.
4-2 Factory
Using factories isolates the instantiation logic and enforces cohesion; here is the list of factories defined in the source code:
And here's the list of the abstract ones:
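A minimal sketch of the idea with hypothetical asset classes: client code only sees the abstract IAssetFactory and IAsset interfaces, while each concrete factory owns the instantiation logic for one product.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical product hierarchy.
class IAsset
{
public:
    virtual ~IAsset() = default;
    virtual std::string Kind() const = 0;
};

class FTextureAsset : public IAsset
{
public:
    std::string Kind() const override { return "Texture"; }
};

class FSoundAsset : public IAsset
{
public:
    std::string Kind() const override { return "Sound"; }
};

// Abstract factory: clients depend only on this interface.
class IAssetFactory
{
public:
    virtual ~IAssetFactory() = default;
    virtual std::unique_ptr<IAsset> Create() const = 0;
};

// Concrete factories: each one knows how to build one concrete product.
class FTextureFactory : public IAssetFactory
{
public:
    std::unique_ptr<IAsset> Create() const override { return std::make_unique<FTextureAsset>(); }
};

class FSoundFactory : public IAssetFactory
{
public:
    std::unique_ptr<IAsset> Create() const override { return std::make_unique<FSoundAsset>(); }
};

// Client code: knows nothing about the concrete asset classes.
std::string ImportAsset(const IAssetFactory& Factory)
{
    std::unique_ptr<IAsset> Asset = Factory.Create();
    return Asset->Kind();
}
```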
4-3 Observer
The observer pattern is a software design pattern in which an object maintains a list of its dependents, called observers, and notifies them automatically of any state change, usually by calling one of their methods. There are some observers implemented in the source code, and FAIMessageObserver is one of them. Here's a dependency graph showing the calls to the OnMessage method of this observer:
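As a generic sketch of the observer pattern (the classes are hypothetical; only the OnMessage name is borrowed from the observer mentioned above, and this is not FAIMessageObserver's API), a subject keeps non-owning pointers to its observers and calls OnMessage on each of them when a message is sent.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Observer interface.
class IMessageObserver
{
public:
    virtual ~IMessageObserver() = default;
    virtual void OnMessage(const std::string& Message) = 0;
};

// Subject: maintains its observers and notifies all of them.
class FMessageHub
{
public:
    void AddObserver(IMessageObserver* Observer) { Observers.push_back(Observer); }

    void RemoveObserver(IMessageObserver* Observer)
    {
        Observers.erase(std::remove(Observers.begin(), Observers.end(), Observer), Observers.end());
    }

    // Observers must not add or remove themselves while Send() is iterating.
    void Send(const std::string& Message)
    {
        for (IMessageObserver* Observer : Observers)
            Observer->OnMessage(Message);
    }

private:
    std::vector<IMessageObserver*> Observers; // not owned
};

// Concrete observer that records what it receives.
class FLastMessageRecorder : public IMessageObserver
{
public:
    void OnMessage(const std::string& Message) override
    {
        Last = Message;
        ++Received;
    }

    std::string Last;
    int Received = 0;
};
```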
4-4 Command
The command pattern is a behavioral design pattern in which an object is used to represent and encapsulate all the information needed to call a method at a later time. Four terms are always associated with the command pattern: command, receiver, invoker and client. A command object has a receiver object and invokes a method of the receiver in a way that is specific to that receiver's class. Here, for example, are all the commands inheriting from IAutomationLatentCommand:
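Here is a hypothetical sketch of the four roles. The command interface is loosely modelled on the idea of a latent command that is updated once per tick until it reports completion; ICommand, FOpenDoorCommand and FCommandQueue are made-up names and do not reproduce the IAutomationLatentCommand API. The door is the receiver, the queue is the invoker, and the code that creates and enqueues the command is the client.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Receiver: the object that actually performs the work.
class FDoor
{
public:
    int Openness = 0; // 0 = closed, 100 = fully open
};

// Command interface: Update() is called once per tick and returns true when done.
class ICommand
{
public:
    virtual ~ICommand() = default;
    virtual bool Update() = 0;
};

// Concrete command: stores its receiver and the parameters of the deferred call.
class FOpenDoorCommand : public ICommand
{
public:
    FOpenDoorCommand(FDoor& InDoor, int InStep) : Door(InDoor), Step(InStep) {}

    bool Update() override
    {
        Door.Openness = std::min(100, Door.Openness + Step);
        return Door.Openness == 100;
    }

private:
    FDoor& Door;
    int Step;
};

// Invoker: runs queued commands later without knowing what they do.
class FCommandQueue
{
public:
    void Enqueue(std::unique_ptr<ICommand> Command) { Commands.push_back(std::move(Command)); }

    // Updates the front command, drops it once finished, returns how many remain.
    std::size_t Tick()
    {
        if (!Commands.empty() && Commands.front()->Update())
            Commands.erase(Commands.begin());
        return Commands.size();
    }

private:
    std::vector<std::unique_ptr<ICommand>> Commands;
};
```

With a step of 40, a client that enqueues std::make_unique<FOpenDoorCommand>(Door, 40) and calls Tick() once per frame sees the command finish on the third tick, when the door reaches 100.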
5- Coupling and Cohesion
5-1 Coupling
Low coupling is desirable because a change in one area of an application will require fewer changes throughout the entire application. In the long run, this can save a lot of the time, effort, and cost associated with modifying and adding new features to an application.
Low coupling can be achieved by using abstract classes or generic types and methods.
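To compare the two approaches, here is a sketch with hypothetical types. ReportFrameTime depends only on the abstract ILogSink, so the concrete sink is chosen at run time; ReportFrameTimeGeneric depends only on a template parameter, so any type with a compatible Write member works, even one unrelated to ILogSink, and the choice is made at compile time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Option 1: depend on an abstract class (run-time polymorphism).
class ILogSink
{
public:
    virtual ~ILogSink() = default;
    virtual void Write(const std::string& Line) = 0;
};

inline void ReportFrameTime(ILogSink& Sink, int Milliseconds)
{
    Sink.Write("Frame: " + std::to_string(Milliseconds) + " ms");
}

// Option 2: depend on a template parameter (compile-time polymorphism).
// Any type with a Write(const std::string&) member is accepted; no base class needed.
template <typename SinkType>
void ReportFrameTimeGeneric(SinkType& Sink, int Milliseconds)
{
    Sink.Write("Frame: " + std::to_string(Milliseconds) + " ms");
}

// Hypothetical sinks.
class FMemorySink : public ILogSink
{
public:
    void Write(const std::string& Line) override { Lines.push_back(Line); }
    std::vector<std::string> Lines;
};

struct FCountingSink // unrelated to ILogSink
{
    void Write(const std::string&) { ++Count; }
    int Count = 0;
};
```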
Let's search for all abstract classes defined in the Unreal Engine source code:
Only a few types are declared abstract; low coupling is enforced more through generic types and generic methods. Here, for example, are the methods using at least one generic method:
As we can observe, many methods use generic ones, so low coupling is enforced by the function template parameters: the actual types of these parameters can change without changing the source code of the called method.
5-2 Cohesion
The single responsibility principle states that a class should not have more than one reason to change. Such a class is said to be cohesive. A high LCOM value generally pinpoints a poorly cohesive class. There are several LCOM metrics. LCOM takes its values in the range [0-1], while LCOM HS (HS stands for Henderson-Sellers) takes its values in the range [0-2]. An LCOM HS value higher than 1 should be considered alarming. Here is how to compute the LCOM metrics:
$$LCOM = 1 - \frac{\sum MF}{M \times F}$$
$$LCOM_{HS} = \frac{M - \frac{\sum MF}{F}}{M - 1}$$
Where:
- M is the number of methods in class (both static and instance methods are counted, it includes also constructors, properties getters/setters, events add/remove methods).
- F is the number of instance fields in the class.
- MF is the number of methods of the class accessing a particular instance field.
- Sum(MF) is the sum of MF over all instance fields of the class.
The underlying idea behind these formulas can be stated as follows: a class is utterly cohesive if all its methods use all its instance fields, which means that sum(MF) = M*F, and then LCOM = 0 and LCOM HS = 0.
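The formulas translate directly into code. This sketch assumes the counts M and MF have already been extracted from the code base (which is the analysis tool's job) and only evaluates both metrics.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct FCohesion
{
    double Lcom;   // in [0, 1]
    double LcomHS; // in [0, 2], values above 1 are alarming
};

// M: number of methods; FieldAccessCounts[i]: MF for instance field i,
// i.e. how many methods access that field. F is the number of fields.
// Precondition: M > 1 and at least one field (otherwise the formulas divide by zero).
inline FCohesion ComputeLcom(int M, const std::vector<int>& FieldAccessCounts)
{
    const double F = static_cast<double>(FieldAccessCounts.size());
    const double SumMF = std::accumulate(FieldAccessCounts.begin(), FieldAccessCounts.end(), 0.0);

    FCohesion Result;
    Result.Lcom = 1.0 - SumMF / (M * F);
    Result.LcomHS = (M - SumMF / F) / (M - 1);
    return Result;
}
```

For example, a class with 4 methods and 3 fields where every field is accessed by all 4 methods gives LCOM = 0 and LCOM HS = 0, while the same class with each field accessed by a single method gives LCOM = 1 - 3/12 = 0.75 and LCOM HS = (4 - 1)/3 = 1, right at the alarm threshold.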
LCOMHS values higher than 1 should be considered alarming.
Only some types are considered as not cohesive.
6- Immutability, purity and side effects
6-1 Immutable types
Basically, an object is immutable if its state doesn't change once the object has been created. Consequently, a class is immutable if its instances are immutable.
There is one important argument in favor of using immutable objects: they dramatically simplify concurrent programming. Think about it: why is writing correct multithreaded code hard? Because it is hard to synchronize threads' access to resources (objects or other OS resources). Why is it hard to synchronize these accesses? Because it is hard to guarantee that there won't be race conditions between the multiple write and read accesses done by multiple threads on multiple objects. What if there are no more write accesses? In other words, what if the state of the objects accessed by threads doesn't change? Then there is no more need for synchronization!
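As a minimal sketch of such a type (FPlayerSnapshot is hypothetical), all state is set in the constructor and never modified afterwards; a changed value is produced as a new object, so threads sharing an instance only ever read it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Immutable value type: no setters, and all members are const.
class FPlayerSnapshot
{
public:
    FPlayerSnapshot(std::string InName, int InScore)
        : Name(std::move(InName)), Score(InScore) {}

    const std::string& GetName() const { return Name; }
    int GetScore() const { return Score; }

    // Returns a modified copy instead of mutating this object.
    FPlayerSnapshot WithScore(int NewScore) const { return FPlayerSnapshot(Name, NewScore); }

private:
    const std::string Name;
    const int Score;
};
```

Declaring the members const also disables copy assignment; a common alternative is to keep the members private and non-const and simply provide no mutating methods.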
Another benefit of immutable classes is that they can never violate the LSP (Liskov Substitution Principle). Here's a definition of the LSP quoted from its Wikipedia page:
Liskov's notion of a behavioral subtype defines a notion of substitutability for mutable objects; that is, if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program (e.g., correctness).
Here's the list of immutable types defined in the source code:
6-2 Purity and side effects
The primary benefit of immutable types comes from the fact that they eliminate side effects. I couldn't say it better than Wes Dyer, so I quote him:
"We all know that generally it is not a good idea to use global variables. This is basically the extreme of exposing side-effects (the global scope). Many of the programmers who don't use global variables don't realize that the same principles apply to fields, properties, parameters, and variables on a more limited scale: don't mutate them unless you have a good reason. (...) One way to increase the reliability of a unit is to eliminate the side-effects. This makes composing and integrating units together much easier and more robust. Since they are side-effect free, they always work the same no matter the environment. This is called referential transparency."
Writing your functions/methods without side effects, so that they are pure functions that don't mutate the object, makes it easier to reason about the correctness of your program. Here's the list of all methods without side effects:
More than 125 000 methods are pure.
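The difference can be sketched with a hypothetical FHealth class: AfterDamage only reads state and returns a value, while ApplyDamage mutates the object. Note that const alone does not guarantee purity in C++, since a const method can still modify mutable members or global state, and the exact criteria CppDepend uses to classify a method as pure are not detailed here.

```cpp
#include <cassert>

class FHealth
{
public:
    explicit FHealth(int InValue) : Value(InValue) {}

    // Side-effect free: reads the object, writes nothing, and returns
    // the same result every time it is called with the same argument.
    int AfterDamage(int Damage) const { return Value > Damage ? Value - Damage : 0; }

    // Has a side effect: it mutates the object's state.
    void ApplyDamage(int Damage) { Value = AfterDamage(Damage); }

    int Get() const { return Value; }

private:
    int Value;
};
```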
7- Implementation quality
7-1 Too big methods
Methods with many lines of code are hard to maintain and understand. Let's search for methods with more than 60 lines:
The Unreal Engine source code contains more than 150 000 methods, so less than 1% of them could be considered too big.
7-2 Methods with many parameters
Few methods have more than 8 parameters. Most of them are generic and are defined this way to avoid variadic functions, as in the case of the TCString<T>::Snprintf methods.
7-3 Methods with many local variables
Less than 1% have many local variables.
7-4 Methods too complex
Many metrics exist to detect complex functions; NBLinesOfCode, the number of parameters and the number of local variables are the basic ones.
There are other interesting metrics to detect complex functions:
- Cyclomatic complexity is a popular procedural software metric equal to 1 plus the number of decisions that can be taken in a procedure.
- Nesting depth is a metric defined on methods that is equal to the depth of the most nested scope in a method body.
- Max nested loop is equal to the maximum level of loop nesting in a function.
The maximum values tolerated for these metrics depend on the team's choices; there are no standard values.
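To make these definitions concrete, here is a small hypothetical function annotated with its metric values under the usual definitions; tools may count slightly differently (for example, some also count && and || as decision points).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the pairs (I, J), I < J, whose elements add up to Target.
//  - cyclomatic complexity: 3 decision points (two 'for', one 'if') + 1 = 4
//  - nesting depth: 3 (the 'if' body sits inside two loop bodies)
//  - max nested loop: 2
int CountPairsWithSum(const std::vector<int>& Values, int Target)
{
    int Count = 0;
    for (std::size_t I = 0; I < Values.size(); ++I)         // decision 1, loop level 1
    {
        for (std::size_t J = I + 1; J < Values.size(); ++J) // decision 2, loop level 2
        {
            if (Values[I] + Values[J] == Target)            // decision 3
            {
                ++Count;
            }
        }
    }
    return Count;
}
```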
Let's search for methods that could be considered as complex in the Unreal Engine code base.
Only 1.5% of the methods are candidates to be refactored to reduce their complexity.
7-5 Halstead complexity
Halstead complexity measures are software metrics introduced by Maurice Howard Halstead in 1977. Halstead observed that software metrics should reflect the implementation or expression of algorithms in different languages, but be independent of their execution on a specific platform. These metrics are therefore computed statically from the code. Halstead introduced many metrics; let's take the TimeToImplement one as an example, which represents the time required to program a method, in seconds.
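The classic Halstead formulas behind this kind of metric can be sketched as follows. The sketch assumes the counts of distinct and total operators and operands are already known; how C++ tokens are classified is tool-specific, so this is not a reimplementation of CppDepend's TimeToImplement, only the textbook definitions (T = E / 18 seconds).

```cpp
#include <cassert>
#include <cmath>

// n1/n2: distinct operators/operands, N1/N2: total operator/operand occurrences.
struct FHalstead
{
    double Volume;      // V = N * log2(n), with n = n1 + n2 and N = N1 + N2
    double Difficulty;  // D = (n1 / 2) * (N2 / n2)
    double Effort;      // E = D * V
    double TimeSeconds; // T = E / 18
};

inline FHalstead ComputeHalstead(int n1, int n2, int N1, int N2)
{
    const double Vocabulary = n1 + n2; // n
    const double Length = N1 + N2;     // N

    FHalstead H;
    H.Volume = Length * std::log2(Vocabulary);
    H.Difficulty = (n1 / 2.0) * (static_cast<double>(N2) / n2);
    H.Effort = H.Difficulty * H.Volume;
    H.TimeSeconds = H.Effort / 18.0;
    return H;
}
```

For instance, with 10 distinct operators, 8 distinct operands, 30 operator occurrences and 20 operand occurrences, D = 12.5 and T is about 145 seconds; by these formulas, a method needs more than one hour only when its effort E exceeds 18 * 3600 = 64 800.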
According to this metric, 1748 methods require more than one hour to implement.
8- RTTI
RTTI refers to the ability of the system to report the dynamic type of an object and to provide information about that type at runtime (as opposed to at compile time). However, RTTI has become controversial within the C++ community, and many C++ developers choose not to use this mechanism. What about the Unreal Engine development team?
No method uses dynamic_cast: the Unreal Engine team chose not to use the RTTI mechanism.
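Code bases that avoid RTTI still need safe downcasts. One common technique is a hand-written type tag that is checked before a static_cast; the sketch below uses hypothetical classes and is a generic illustration, not Unreal Engine's implementation.

```cpp
#include <cassert>

enum class EActorKind { Base, Pawn };

class FActorBase
{
public:
    explicit FActorBase(EActorKind InKind = EActorKind::Base) : Kind(InKind) {}
    virtual ~FActorBase() = default;
    EActorKind GetKind() const { return Kind; }

private:
    EActorKind Kind; // set once by the most derived constructor
};

class FPawnActor : public FActorBase
{
public:
    static constexpr EActorKind StaticKind = EActorKind::Pawn;
    FPawnActor() : FActorBase(StaticKind) {}
    int Health = 100;
};

// Returns nullptr when the object is not of the requested type, like dynamic_cast.
// Only exact matches are handled; deeper hierarchies would need more bookkeeping.
template <typename To>
To* CastTo(FActorBase* Object)
{
    return (Object && Object->GetKind() == To::StaticKind) ? static_cast<To*>(Object) : nullptr;
}
```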
9- Exceptions
Exception handling is another controversial C++ feature, and many well-known open source C++ projects do not use it. Let's check whether any exception is thrown in the Unreal Engine source code.
Exceptions are thrown in some methods; let's take the RaiseException method as an example:
As stated in its comments, the exception can be generated for the header tool, but exception handling is not supported in normal runtime code.
10- Some final statistics
10-1 Most popular types
It's interesting to know the most used types in a project, since these types must be well designed, implemented and tested, and any change to them could impact the whole project. We can find them using the TypesUsingMe metric:
However, there's another interesting metric to search for popular types: TypeRank. TypeRank values are computed by applying the Google PageRank algorithm to the graph of type dependencies. A homothety of center 0.15 is applied so that the average TypeRank is 1. Types with a high TypeRank should be tested more carefully, because bugs in such types are likely to be more catastrophic. Here are the most popular types according to the TypeRank metric:
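The idea can be sketched as a PageRank-style iteration on a small dependency graph. Following the description above, each score is 0.15 plus 0.85 times the rank flowing in from the types that use it, which keeps the average at 1. How types with no dependencies are handled is an assumption here (their rank is spread evenly over all types, a common PageRank convention); CppDepend's exact implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Uses[A] lists the types that type A depends on; rank flows from a user
// to the types it uses, so heavily used types end up with high scores.
std::vector<double> ComputeTypeRank(const std::vector<std::vector<int>>& Uses, int Iterations = 100)
{
    const std::size_t N = Uses.size();
    std::vector<double> Rank(N, 1.0); // start at the target average of 1

    for (int Iter = 0; Iter < Iterations; ++Iter)
    {
        std::vector<double> Incoming(N, 0.0);
        double DanglingShare = 0.0; // rank of types that use nothing, spread over all types

        for (std::size_t A = 0; A < N; ++A)
        {
            if (Uses[A].empty())
                DanglingShare += Rank[A] / N;
            else
                for (int B : Uses[A])
                    Incoming[B] += Rank[A] / Uses[A].size();
        }

        for (std::size_t T = 0; T < N; ++T)
            Rank[T] = 0.15 + 0.85 * (Incoming[T] + DanglingShare);
    }
    return Rank;
}
```

For example, if three types all use a fourth type that itself uses nothing, the fourth type ends up with a rank of about 2.17 and each of the others with about 0.61, and the four values still average 1.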
10-2 Most popular methods
10-3 Methods calling many other methods
It's interesting to know which methods use many other ones, since this could reveal a design problem in these methods; in some cases, a refactoring is needed to make them more readable and maintainable.