For this article, I'll expect you to know C++ classes, virtual methods and two design patterns known as abstract factory and observer. I also have a working, feature-complete implementation of the design discussed in this article that you can download near the end.
Game developers are often confronted with the necessity of supporting multiple graphics APIs. Whereas in the past, it was just a cool feature that at best helped offset bad OpenGL drivers for Windows users by giving them the choice to run on Direct3D, nowadays there are several systems with a significant market share that require you to use one specific API:
Platform | Graphics API
--- | ---
Xbox 360 | Direct3D 9 (special)
Win32 (classic Windows) | Direct3D 9, Direct3D 11 or OpenGL
WinRT (Metro) | Direct3D 11.1
Linux | OpenGL
PlayStation 3 | OpenGL ES 1.0 or libgcm
Android | OpenGL ES 1.1 or 2.0 (via NDK)
iOS (iPhone, iPad) | OpenGL ES 1.0 or 2.0 (3rd gen)

Windows 8 and Microsoft's tablet PCs are not released yet, of course, but I included WinRT because it is likely to gain a significant market share before long (polls [1], [2], [3], [4] suggest an adoption rate of around 30%). Given about 1 billion active Windows users [5], [5], [6] and conservatively writing off 90% of that number as either business PCs or living-room-corner PCs, you're still left with 10% of 30% of 1 billion, equalling 30 million users able to visit the Windows Store.
Big budget games are forced to support two graphics APIs by the presence of the Xbox 360 and the PlayStation 3, whereas for indie game developers it's the iOS, Android and likely soon Windows 8 platforms that are important due to their outreach to casual gamers.
So how do you design a game with support for multiple graphics APIs and platforms?
Platform Independence via Abstract Factories
The typical approach is to create a set of platform-neutral interfaces (abstract classes for us C++ programmers) containing the required functionality in a way that is not specific to any API. As an example, such an interface for an index buffer might look like this:
#include <cstddef>
#include <cstdint>

// Platform-neutral index buffer interface
class IndexBuffer {
  public: virtual ~IndexBuffer() {}
  // Pure virtual methods must be declared virtual
  public: virtual std::size_t GetCapacity() const = 0;
  public: virtual void Write(const std::uint16_t *indices, std::size_t count) = 0;
};
The game code will then work with these platform-neutral interfaces, not knowing which implementation (= derived class) it is actually controlling. To create new instances of the classes implementing these interfaces, an abstract factory needs to be used, often integrated into a bigger Renderer interface that is also responsible for rendering polygons based on the graphics resources it creates.
Note: I use the term "graphics resource" for any objects the renderer needs in order to draw something, like index buffers, vertex buffers, textures and shaders.
A simple renderer interface using the abstract factory design might look like this:
// Platform-neutral renderer interface, also acts as resource factory
class Renderer {
  public: virtual ~Renderer() {}
  // Factory method: the caller never sees the concrete buffer class
  public: virtual IndexBuffer *CreateIndexBuffer() = 0;
  public: virtual void SelectIndexBuffer(IndexBuffer *indexBuffer) = 0;
  public: virtual void DrawIndexed(std::size_t startIndex, std::size_t count) = 0;
};
This happens to be exactly what Ogre3D, Irrlicht and many of the ludicrously expensive big-budget engines do. Here are a few real-world examples of such interfaces from the aforementioned open source engines:
It's a valid design that completely hides the platform-specific code and also results in generally good performance.
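To make the abstract factory approach concrete, here is a minimal sketch of game code driving a renderer purely through the two interfaces shown earlier. `FakeIndexBuffer`, `FakeRenderer`, the fixed capacity of 6 and `DrawQuad` are hypothetical names invented for this illustration; a real backend would wrap Direct3D or OpenGL objects instead of a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Platform-neutral interfaces, as described in the article
class IndexBuffer {
  public: virtual ~IndexBuffer() {}
  public: virtual std::size_t GetCapacity() const = 0;
  public: virtual void Write(const std::uint16_t *indices, std::size_t count) = 0;
};

class Renderer {
  public: virtual ~Renderer() {}
  public: virtual IndexBuffer *CreateIndexBuffer() = 0;
  public: virtual void SelectIndexBuffer(IndexBuffer *indexBuffer) = 0;
  public: virtual void DrawIndexed(std::size_t startIndex, std::size_t count) = 0;
};

// Hypothetical backend (assumption): keeps indices in a std::vector
// where a real implementation would hold a graphics API buffer object
class FakeIndexBuffer : public IndexBuffer {
  public: explicit FakeIndexBuffer(std::size_t capacity) : storage(capacity) {}
  public: std::size_t GetCapacity() const override { return this->storage.size(); }
  public: void Write(const std::uint16_t *indices, std::size_t count) override {
    assert(count <= this->storage.size());
    std::copy_n(indices, count, this->storage.begin());
  }
  private: std::vector<std::uint16_t> storage;
};

// Hypothetical backend renderer that only counts the indices it was asked to draw
class FakeRenderer : public Renderer {
  public: FakeRenderer() : drawnIndexCount(0), selected(nullptr) {}
  public: IndexBuffer *CreateIndexBuffer() override { return new FakeIndexBuffer(6); }
  public: void SelectIndexBuffer(IndexBuffer *indexBuffer) override {
    this->selected = indexBuffer;
  }
  public: void DrawIndexed(std::size_t startIndex, std::size_t count) override {
    assert(this->selected != nullptr);
    assert(startIndex + count <= this->selected->GetCapacity());
    this->drawnIndexCount += count;
  }
  public: std::size_t drawnIndexCount;
  private: IndexBuffer *selected;
};

// Game code: depends on the interfaces only, never on a concrete backend
std::size_t DrawQuad(Renderer &renderer) {
  const std::uint16_t quad[6] = { 0, 1, 2, 2, 1, 3 };
  std::unique_ptr<IndexBuffer> indexBuffer(renderer.CreateIndexBuffer());
  indexBuffer->Write(quad, 6);
  renderer.SelectIndexBuffer(indexBuffer.get());
  renderer.DrawIndexed(0, 6);
  renderer.SelectIndexBuffer(nullptr); // unselect before the buffer is destroyed
  return indexBuffer->GetCapacity();
}
```

Note that `DrawQuad` cannot create its index buffer without being handed a renderer, and it has to unselect the buffer before the buffer goes out of scope.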
For the performance aficionados: given that vtable calls are a staple of any kind of software, even CPUs are optimized for the resulting memory access patterns. If you google for virtual method call benchmarks, you will likely come across a paper titled "The Direct Cost of Virtual Function Calls in C++" that estimates the overhead at an average of 2.8 CPU cycles per virtual method call compared to a direct method call. Unity developer Aras Pranckevičius has also put some thought into this topic and even came up with a few tricks to reduce the number of memory accesses by replacing the vtable with function pointers stored as part of the class: The Virtual and No-Virtual.
The abstract factory based design is likely the most efficient option available for abstracting a platform-specific graphics API, at least next to writing different renderers with exchangeable APIs and using preprocessor macros to switch between classes. But there are some drawbacks, too. Let's analyze the problems.
- Reference Passing: Any utility class that wishes to work with graphics resources now needs a reference to the renderer.

  Your image loading code has to use the renderer to create a texture instance to decode the image into (or use an intermediate image class involving a redundant megabyte-sized memory copy operation). The same goes for model loading, any kind of dynamic geometry creation, font caching and so on.

  It's no wonder the Singleton pattern is often found in renderers following this design.

- Destruction Order: All graphics resources need to be destroyed before the renderer terminates.

  Failure to do so might result in crashes during shutdown (if the resource tries to activate a context in order to destroy its graphics API objects) or in the graphics API not shutting down completely (if reference counting or another form of shared ownership is used).

- Resource Replacing: When the renderer is replaced or the renderer's internal graphics device needs to be reset, your code has to seek out any and all graphics resource owners and tell them to first destroy and then rebuild any graphics resources they were referencing.

  This pushes a lot of additional complexity into any class that accesses graphics resources instead of hiding it away. Games often go for a compromise, letting the graphics resource classes (Direct3DIndexBuffer and Direct3DVertexBuffer in the above diagram) handle a device reset internally, but this of course doesn't help if you're switching from Direct3D 9 to Direct3D 11 or OpenGL.

  In case you wondered why changing detail settings is commonly a non-issue for games but switching to a different graphics API requires a restart, now you know.

- Renderer Mismatches: This is not a common issue, luckily, but from a design perspective, nothing prevents you from attempting to select an IndexBuffer created by the Direct3DRendererAndFactory into an OpenGLRendererAndFactory.

  A more likely environment for this to happen in is a level editor: you have 4 different views. Depending on the graphics API you use and whether your editor is based on an in-game GUI or a windowing toolkit, you will have to create 4 separate renderers. The resources created by these renderers have to be kept strictly separated.

  Catching such mismatches early on already requires 2 separate checks: checking if a selected graphics resource actually is of the required type (is the downcast safe?) and checking if it actually belongs to the renderer it is being selected into.

  And then you have cool features like graphics resource sharing in Direct3D 11. With the abstract factory design, even the complexity of deciding whether to share resources or create separate ones for each view is pushed into the classes accessing the renderer.
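The two mismatch checks mentioned in the list could look roughly like the following sketch. It assumes that each buffer remembers the renderer that created it; the `GetOwner()` accessor, the exception type and the `TrySelect()` helper are assumptions made for this illustration, not part of any engine named in the article.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

class IndexBuffer {
  public: virtual ~IndexBuffer() {}
};

class Direct3DRendererAndFactory;

// Hypothetical Direct3D buffer that remembers which renderer created it
class Direct3DIndexBuffer : public IndexBuffer {
  public: explicit Direct3DIndexBuffer(const Direct3DRendererAndFactory *owner) :
    owner(owner) {}
  public: const Direct3DRendererAndFactory *GetOwner() const { return this->owner; }
  private: const Direct3DRendererAndFactory *owner;
};

// Stand-in for a buffer created by a renderer using another graphics API
class OpenGLIndexBuffer : public IndexBuffer {};

class Direct3DRendererAndFactory {
  public: Direct3DRendererAndFactory() : selected(nullptr) {}
  public: IndexBuffer *CreateIndexBuffer() { return new Direct3DIndexBuffer(this); }
  public: void SelectIndexBuffer(IndexBuffer *indexBuffer) {
    // Check 1: is the downcast safe? (a null pointer is rejected here as well)
    Direct3DIndexBuffer *direct3DBuffer =
      dynamic_cast<Direct3DIndexBuffer *>(indexBuffer);
    if(direct3DBuffer == nullptr) {
      throw std::invalid_argument("index buffer belongs to a different graphics API");
    }
    // Check 2: was the buffer created by this very renderer instance?
    if(direct3DBuffer->GetOwner() != this) {
      throw std::invalid_argument("index buffer belongs to a different renderer");
    }
    this->selected = direct3DBuffer;
  }
  public: const Direct3DIndexBuffer *GetSelected() const { return this->selected; }
  private: Direct3DIndexBuffer *selected;
};

// Helper for trying out the checks: reports whether the selection was accepted
bool TrySelect(Direct3DRendererAndFactory &renderer, IndexBuffer *indexBuffer) {
  try {
    renderer.SelectIndexBuffer(indexBuffer);
    return true;
  }
  catch(const std::invalid_argument &) {
    return false;
  }
}
```

Both checks run on every selection, so they cost a `dynamic_cast` and a pointer comparison per call in addition to the virtual dispatch.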
What if there was another design that solved all these problems with just a minimal amount of additional overhead?
There is!
Platform Independence via Observers
In an observer-based design, graphics resources are not created via an abstract factory. Instead, they are concrete classes with plain constructors that do not depend on any renderer:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Concrete index buffer that keeps its contents in system memory
class IndexBuffer {
  public: explicit IndexBuffer(std::size_t capacity) :
    capacity(capacity),
    indices(new std::uint16_t[capacity]) {}
  public: std::size_t GetCapacity() const { return this->capacity; }
  public: void Write(const std::uint16_t *indices, std::size_t count) {
    assert(count <= this->capacity);
    std::copy_n(indices, count, this->indices.get());
  }
  // Not copyable: each instance is a distinct graphics resource
  private: IndexBuffer(const IndexBuffer &) = delete;
  private: IndexBuffer &operator =(const IndexBuffer &) = delete;
  private: std::size_t capacity;
  private: std::unique_ptr<std::uint16_t[]> indices;
};
When a renderer encounters a resource like the index buffer above for the first time, it will create the appropriate Direct3D ID3D11Buffer or OpenGL IBO to be able to render polygons using the index buffer or other resource.
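The "first encounter" step can be sketched with a plain lookup table. This is only an illustration under assumptions: `ApiBuffer` is a hypothetical placeholder for an `ID3D11Buffer` or an OpenGL IBO handle, `Prepare()` is an invented method name, and the `IndexBuffer` here is reduced to its capacity. Nothing in this sketch removes a table entry when the resource is destroyed, which is exactly the gap that notifications from the resource to the renderer have to close.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Reduced stand-in for the renderer-independent IndexBuffer class
class IndexBuffer {
  public: explicit IndexBuffer(std::size_t capacity) : capacity(capacity) {}
  public: std::size_t GetCapacity() const { return this->capacity; }
  private: std::size_t capacity;
};

// Hypothetical placeholder for the object a graphics API would hand out
struct ApiBuffer {
  std::size_t sizeInBytes;
};

class Renderer {
  public: Renderer() : createdCount(0) {}

  // Returns the API object backing the resource, creating it on first use
  public: const ApiBuffer &Prepare(const IndexBuffer &indexBuffer) {
    auto iterator = this->apiBuffers.find(&indexBuffer);
    if(iterator == this->apiBuffers.end()) {
      ApiBuffer created = { indexBuffer.GetCapacity() * sizeof(std::uint16_t) };
      iterator = this->apiBuffers.insert(std::make_pair(&indexBuffer, created)).first;
      ++this->createdCount;
    }
    return iterator->second;
  }

  public: std::size_t createdCount;
  private: std::unordered_map<const IndexBuffer *, ApiBuffer> apiBuffers;
};
```

Calling `Prepare()` twice with the same index buffer creates only one API object; the second call is a pure lookup.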
A design that's usable in a real-world application requires a few more things, of course. For one, the renderer should be informed when a graphics resource is destroyed so it can release the objects it created through its graphics API. The renderer should also be informed when the observed graphics resources change, like when new data is copied into the index buffer. Finally, it should be possible to explicitly let the renderer create an observer from a different thread, since some graphics APIs (Direct3D 11, for example) support free-threaded resource creation.
So what you need is a way to let graphics resources notify the renderer when they're changed or destroyed. This is where the observer pattern comes into play. Similar to a signal/slot system, which works on the method level, an observable class allows observers to be attached. These observers implement a specific interface through which they are notified of changes occurring to the observed objects.
The common usage pattern would be to have a very low number of observers per graphics resource. I found no clear basis for a decision between optimizing for fast attach/detach or fast notifications (most graphics resources only ever send one notification: "Do something, I'm being destroyed", except for constant buffers, which are regularly updated). Thus I believe an std::vector is the best choice to store the observers: it has low overhead, and a hash table or red/black tree (like std::map) would offer no advantages.
class IndexBuffer {
  // Interface through which the renderer is notified of changes
  public: class Observer {
    public: virtual ~Observer() {}
    public: virtual void Destroying() = 0;
    public: virtual void Written(std::size_t offset, std::size_t count) = 0;
  };
  // ...Existing IndexBuffer methods...
  public: void AttachObserver(Observer *observer) {
    this->observers.push_back(observer);
  }
  public: void DetachObserver(Observer *observer) {
    for(std::size_t index = 0; index < this->observers.size(); ++index) {
      if(this->observers[index] == observer) {
        this->observers.erase(this->observers.begin() + index);
        break;
      }
    }
  }
  // Called when the index buffer is about to be destroyed
  protected: void OnDestroying() {
    for(std::size_t index = 0; index < this->observers.size(); ++index) {
      this->observers[index]->Destroying();
    }
  }
  // Called after new indices have been written into the buffer
  protected: void OnWritten(std::size_t offset, std::size_t count) {
    for(std::size_t index = 0; index < this->observers.size(); ++index) {
      this->observers[index]->Written(offset, count);
    }
  }
  // ...Existing IndexBuffer fields...
  private: std::vector<Observer *> observers;
};
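To show the notifications in action, the following sketch wires a reduced version of the index buffer to an observer that records what a renderer would have to react to. `DirtyRangeObserver` and its fields are hypothetical; the assumption here is that `Write()` always starts at offset 0 and that the destructor is the place where `OnDestroying()` gets called.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class IndexBuffer {
  public: class Observer {
    public: virtual ~Observer() {}
    public: virtual void Destroying() = 0;
    public: virtual void Written(std::size_t offset, std::size_t count) = 0;
  };
  public: explicit IndexBuffer(std::size_t capacity) : indices(capacity) {}
  // Assumption: the destructor is where the Destroying() notification is sent
  public: ~IndexBuffer() { OnDestroying(); }
  public: void Write(const std::uint16_t *source, std::size_t count) {
    assert(count <= this->indices.size());
    std::copy_n(source, count, this->indices.begin());
    OnWritten(0, count); // this simplified Write() always starts at offset 0
  }
  public: void AttachObserver(Observer *observer) {
    this->observers.push_back(observer);
  }
  private: void OnDestroying() {
    for(std::size_t index = 0; index < this->observers.size(); ++index) {
      this->observers[index]->Destroying();
    }
  }
  private: void OnWritten(std::size_t offset, std::size_t count) {
    for(std::size_t index = 0; index < this->observers.size(); ++index) {
      this->observers[index]->Written(offset, count);
    }
  }
  private: IndexBuffer(const IndexBuffer &) = delete;
  private: IndexBuffer &operator =(const IndexBuffer &) = delete;
  private: std::vector<std::uint16_t> indices;
  private: std::vector<Observer *> observers;
};

// Hypothetical renderer-side observer: remembers which part of the buffer
// would need to be uploaded to the GPU before the next draw call
class DirtyRangeObserver : public IndexBuffer::Observer {
  public: DirtyRangeObserver() : dirtyBegin(0), dirtyEnd(0), destroyed(false) {}
  public: void Destroying() override { this->destroyed = true; }
  public: void Written(std::size_t offset, std::size_t count) override {
    if(count == 0) { return; }
    if(this->dirtyBegin == this->dirtyEnd) { // nothing dirty yet
      this->dirtyBegin = offset;
      this->dirtyEnd = offset + count;
    } else { // grow the dirty range to cover both writes
      this->dirtyBegin = std::min(this->dirtyBegin, offset);
      this->dirtyEnd = std::max(this->dirtyEnd, offset + count);
    }
  }
  public: std::size_t dirtyBegin;
  public: std::size_t dirtyEnd;
  public: bool destroyed;
};
```

A real observer would release its graphics API object in `Destroying()` and schedule an upload of the dirty range in `Written()` instead of just recording the events.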
For thread-safety, the std::vector of observers either needs to be protected with an std::mutex or replaced by a lock-free list implementation (this avoids having one mutex per graphics resource and would be an ideal match since lock-free algorithms perform best in low-concurrency situations like this).
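The mutex variant might be sketched as follows. The class name `ObservableBuffer`, the `NotifyWritten()` method and the `CountingObserver` helper are assumptions for this illustration. Notifications are sent to a snapshot of the list taken under the lock, so observers may attach or detach from inside a callback without deadlocking; the price is that an observer detached by another thread can still receive one last notification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class ObservableBuffer {
  public: class Observer {
    public: virtual ~Observer() {}
    public: virtual void Written(std::size_t offset, std::size_t count) = 0;
  };

  public: void AttachObserver(Observer *observer) {
    std::lock_guard<std::mutex> lock(this->observersMutex);
    this->observers.push_back(observer);
  }

  public: void DetachObserver(Observer *observer) {
    std::lock_guard<std::mutex> lock(this->observersMutex);
    auto iterator = std::find(this->observers.begin(), this->observers.end(), observer);
    if(iterator != this->observers.end()) {
      this->observers.erase(iterator);
    }
  }

  public: void NotifyWritten(std::size_t offset, std::size_t count) {
    // Copy the list under the lock, then notify without holding the lock
    std::vector<Observer *> snapshot;
    {
      std::lock_guard<std::mutex> lock(this->observersMutex);
      snapshot = this->observers;
    }
    for(std::size_t index = 0; index < snapshot.size(); ++index) {
      snapshot[index]->Written(offset, count);
    }
  }

  private: std::mutex observersMutex;
  private: std::vector<Observer *> observers;
};

// Hypothetical observer that sums up how many indices were reported as written
class CountingObserver : public ObservableBuffer::Observer {
  public: CountingObserver() : total(0) {}
  public: void Written(std::size_t, std::size_t count) override { this->total += count; }
  public: std::size_t total;
};
```

This costs one `std::mutex` per graphics resource, which is the overhead the lock-free alternative would avoid.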
Let's see where we are now.
- Simplified Interface: The Renderer interface has fewer methods and graphics resources can be created directly.

  This means the renderer no longer needs to be passed to your image and model loading code, unit tests can easily be written for graphics resources and for algorithms working with graphics resources, and there's no need for singletons to be created.

- Independent Destruction: Graphics resources are no longer owned by the renderer and destruction order is no longer important.

  If a graphics resource is destroyed while the renderer is still active, its Destroying() notification will be sent to the observer, allowing its graphics API objects to be destroyed as well.

  If the renderer is destroyed while graphics resources are still active, the renderer will simply remove and destroy its observers and those graphics resources can be destroyed in their own time.

- Resource Replacing: Switching to a different renderer has also become a non-issue since the game can keep using the same graphics resources even though the observers backing them might change.

  The game code will at no time enter an invalid state (since all graphics resources are still valid and usable) and working with graphics resources has become as easy as working with, say, an std::string.

- Multiple Renderers: There is no problem attaching multiple renderers to a graphics resource, so the complexity of maintaining separate resources is removed from the game or editor code.

  If a graphics API supports resource sharing, the renderer can transparently support this by using a reference counted observer that simply decrements its reference count upon receiving a Destroying() notification.

  If a graphics API does not support resource sharing, using the same graphics resource in multiple renderers simply results in each renderer attaching its own observer to the resource. Thus, any combination of resource-sharing and non-resource-sharing renderers will work.
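The independent destruction and multiple renderer points can be illustrated with a small sketch in which either side may be destroyed first. It is a simplified model, not the implementation from the download: `ApiObject`, `Draw()`, `GetObserverCount()` and `GetApiObjectCount()` are hypothetical names, and each renderer simply attaches its own observer (the case without resource sharing).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class IndexBuffer {
  public: class Observer {
    public: virtual ~Observer() {}
    public: virtual void Destroying() = 0;
  };
  public: IndexBuffer() {}
  public: ~IndexBuffer() {
    // Resource goes first: every attached renderer releases its API object
    for(std::size_t index = 0; index < this->observers.size(); ++index) {
      this->observers[index]->Destroying();
    }
  }
  public: void AttachObserver(Observer *observer) { this->observers.push_back(observer); }
  public: void DetachObserver(Observer *observer) {
    this->observers.erase(
      std::remove(this->observers.begin(), this->observers.end(), observer),
      this->observers.end()
    );
  }
  public: std::size_t GetObserverCount() const { return this->observers.size(); }
  private: IndexBuffer(const IndexBuffer &) = delete;
  private: IndexBuffer &operator =(const IndexBuffer &) = delete;
  private: std::vector<Observer *> observers;
};

class Renderer {
  public: Renderer() {}
  public: ~Renderer() {
    // Renderer goes first: detach from all resources that are still alive
    for(std::size_t index = 0; index < this->apiObjects.size(); ++index) {
      ApiObject *apiObject = this->apiObjects[index].get();
      apiObject->resource.DetachObserver(apiObject);
    }
  }
  public: void Draw(IndexBuffer &indexBuffer) {
    if(!IsKnown(indexBuffer)) { // first encounter: create the backing object
      this->apiObjects.push_back(
        std::unique_ptr<ApiObject>(new ApiObject(*this, indexBuffer))
      );
      indexBuffer.AttachObserver(this->apiObjects.back().get());
    }
    // ...the actual draw call would be issued here...
  }
  public: std::size_t GetApiObjectCount() const { return this->apiObjects.size(); }

  // Hypothetical stand-in for a graphics API object bound to one resource
  private: struct ApiObject : public IndexBuffer::Observer {
    ApiObject(Renderer &owner, IndexBuffer &resource) :
      owner(owner), resource(resource) {}
    void Destroying() override { this->owner.Release(this); } // deletes this object
    Renderer &owner;
    IndexBuffer &resource;
  };
  private: bool IsKnown(const IndexBuffer &indexBuffer) const {
    for(std::size_t index = 0; index < this->apiObjects.size(); ++index) {
      if(&this->apiObjects[index]->resource == &indexBuffer) { return true; }
    }
    return false;
  }
  private: void Release(ApiObject *apiObject) {
    for(std::size_t index = 0; index < this->apiObjects.size(); ++index) {
      if(this->apiObjects[index].get() == apiObject) {
        this->apiObjects.erase(this->apiObjects.begin() + index);
        return;
      }
    }
  }
  private: std::vector<std::unique_ptr<ApiObject>> apiObjects;
};
```

A renderer that supports resource sharing could replace the per-renderer `ApiObject` with a reference counted one, as described in the list above; that variant is not shown here.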
There is one concept that does not map well to an observer-based renderer design: render targets and other resources the GPU writes to. If the game calls Read() on a render target (maybe to save a screenshot of the current view in a saved game file), which observer should be used and what if no observers are attached?
Since all attached observers should come to the same result, the answer is simply to add a Read() method to observers and then pick any attached observer to read from.
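A sketch of the "pick any attached observer" rule might look like this. The pixel format, the `SolidColorObserver` helper and, in particular, the fallback to the initial system-memory contents when no observer is attached are assumptions of this example; the article leaves the no-observer case open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class RenderTarget {
  public: class Observer {
    public: virtual ~Observer() {}
    // Copies pixels rendered by the GPU back into system memory
    public: virtual void Read(std::uint32_t *pixels, std::size_t count) const = 0;
  };

  public: explicit RenderTarget(std::size_t pixelCount) :
    initialContents(pixelCount, 0) {}

  public: void AttachObserver(Observer *observer) {
    this->observers.push_back(observer);
  }

  public: void Read(std::uint32_t *pixels, std::size_t count) const {
    assert(count <= this->initialContents.size());
    if(this->observers.empty()) {
      // Assumption: nothing has rendered into the target, return its initial state
      std::copy_n(this->initialContents.begin(), count, pixels);
    } else {
      // All observers should hold the same image, so the first one will do
      this->observers.front()->Read(pixels, count);
    }
  }

  private: std::vector<std::uint32_t> initialContents;
  private: std::vector<Observer *> observers;
};

// Hypothetical observer pretending the GPU filled the target with one color
class SolidColorObserver : public RenderTarget::Observer {
  public: explicit SolidColorObserver(std::uint32_t color) : color(color) {}
  public: void Read(std::uint32_t *pixels, std::size_t count) const override {
    std::fill_n(pixels, count, this->color);
  }
  private: std::uint32_t color;
};
```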
Benchmarks
Only one question remains: what price do you have to pay to reap these advantages?
Memory-wise, it depends. A graphics library that can function without any observers attached needs to keep a system-memory copy of all resources. This is not an issue on desktop systems, but it is less than ideal on mobile devices. It is conceivable to drop the system-memory copy as soon as an observer is attached and to reclaim it when the observer detaches. This does not necessarily mean reading the resource back from the observer: you could provide a functor that initializes the graphics resource, allowing you to reload a texture from flash memory or to recreate a font character lookup when necessary.
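The functor idea could be sketched like this. `Texture`, `Loader`, `DropSystemCopy()` and `HasSystemCopy()` are hypothetical names; the assumption is that a renderer calls `DropSystemCopy()` once its observer has uploaded the pixels, and that the loader can reproduce them on demand.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

class Texture {
  // Functor that (re)creates the pixel data, for example by decoding a file
  public: typedef std::function<std::vector<std::uint8_t>()> Loader;

  public: explicit Texture(Loader loader) : loader(std::move(loader)) {}

  // Returns the system-memory copy, reloading it through the functor if needed
  public: const std::vector<std::uint8_t> &GetPixels() {
    if(this->pixels.empty()) {
      this->pixels = this->loader();
    }
    return this->pixels;
  }

  // Assumption: called by a renderer after its observer has uploaded the pixels
  public: void DropSystemCopy() {
    std::vector<std::uint8_t>().swap(this->pixels); // actually frees the memory
  }

  public: bool HasSystemCopy() const { return !this->pixels.empty(); }

  private: Loader loader;
  private: std::vector<std::uint8_t> pixels;
};
```

The trade-off is that reclaiming the copy costs a reload, so this fits resources that are cheap to recreate or rarely read back.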
CPU-wise, there is a small amount of overhead due to the additional work involved in managing and notifying observers. I wrote a small benchmark that you can download below which renders 100 times 1000 frames, each using 25 dynamic and 75 static buffers to simulate the rendering of 4096 vertices (100 * 1000 * 100 * 4096 = 40,960,000,000 vertices drawn in 10,000,000 draw calls and downcasts / observer lookups). Here are the results:
Renderer | Time (x86) | Time (x64)
--- | --- | ---
Abstract Factory | 60,403 ms | 59,389 ms
Observer | 72,790 ms (x1.205) | 61,963 ms (x1.043)

The overhead of an observer-based renderer seems to be in the range of 4% - 21%. Naturally, this is a comparison of the pure call dispatch, which in a game only processes between a few hundred and a few thousand calls per frame, so to apply this number to the overall performance of a game (e.g. frames per second), it would need to be scaled to the percentage of time the game spends in the method call dispatch code of the renderer. If this was 1%, the game would run between 0.04% and 0.21% slower with an observer-based renderer compared to a factory-based renderer.
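The scaling argument can be checked with a few lines of arithmetic. The two helper functions below are hypothetical; they only restate the calculation: if a fraction s of the frame time is spent in dispatch code and that code becomes slower by a factor of 1 + o, the frame takes (1 - s) + s * (1 + o) = 1 + s * o times as long.

```cpp
#include <cassert>
#include <cmath>

// Relative overhead of the observer renderer, e.g. 0.205 for 20.5% slower
double RelativeOverhead(double factoryMilliseconds, double observerMilliseconds) {
  return observerMilliseconds / factoryMilliseconds - 1.0;
}

// Slowdown of a whole frame if only 'dispatchShare' of it is dispatch code
double FrameSlowdown(double dispatchOverhead, double dispatchShare) {
  return dispatchOverhead * dispatchShare;
}
```

With the measured times, `RelativeOverhead(60403.0, 72790.0)` comes out at about 0.205 and `RelativeOverhead(59389.0, 61963.0)` at about 0.043; scaled by a 1% dispatch share, this gives roughly 0.2% and 0.04% per frame.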
Download
I have written a complete observer-based graphics library with constant buffers, index buffers, vertex buffers, textures, render targets, vertex shaders and pixel shaders. It currently offers only a Direct3D 11 / 11.1 renderer that works on Windows and WinRT.
There are a few rough edges left (pixel format conversion, lock-free observer management, an OpenGL implementation, more testing), but it is an efficient and complete implementation of the design explained in this article that should demonstrate the viability of this design.
Important: the code uses C++11 features, so you need at least Visual Studio 2012 RC or GCC 4.7.x to compile it (the Direct3D 11 implementation obviously will not compile with GCC, only the platform-neutral classes, which are ISO C++). Compiling with Visual Studio 2010 might be possible with some changes, likely involving a hand-written header and replacing some auto keywords and std::unique_ptrs. I decided for myself to not support Visual Studio 2010 for personal projects anymore because I don't want to litter my code with #ifdefs to switch between C++11 threads, Boost.Thread, TBB, POCO or pthreads.
Download
Benchmark.7z (source code missing)
ObserverBasedRendererTest.7z (source code missing)