Managing Decoupling Part 4: The ID Lookup Table

Published April 24, 2013 by Niklas Frykholm, posted by gdarchive
Do you see issues with this article? Let us know.
Advertisement
Previous Articles Today I am going to dig deeper into an important and versatile data structure that pops up all the time in the BitSquid engine -- the ID lookup table. I have already talked about the advantages of using IDs to refer to objects owned by other systems, but let me just quickly recap. IDs are better than direct pointers because we don't get dangling references if the other system decides that the object needs to be destroyed. IDs are better than shared_ptr and weak_ptr because it allows the other system to reorganize its objects in memory, delete them at will and doesn't require thread synchronization to maintain a reference count. They are also POD (plain old data) structures, so they can be copied and moved in memory freely, passed back and forth between C++ and Lua, etc. By an ID I simply mean an opaque data structure of n bits. It has no particular meaning to us, we just use it to refer to an object. The system provides the mechanism for looking up an object based on it. Since we seldom create more than 4 billion objects, 32 bits is usually enough for the ID, so we can just use a standard integer. If a system needs a lot of objects, we can go to 64 bits. In this post I'm going to look at what data structures a system might use to do the lookup from ID to system object. There are some requirements that such data structures need to fulfill:
  • There should be a 1-1 mapping between live objects and IDs.
  • If the system is supplied with an ID to an old object, it should be able to detect that the object is no longer alive.
  • Lookup from ID to object should be very fast (this is the most common operation).
  • Adding and removing objects should be fast.
Let's look at three different ways of implementing this data structure, with increasing degrees of sophistication.

The STL Method

The by-the-book object oriented approach is to allocate objects on the heap and use a std::map to map from ID to object. typedef unsigned ID; struct System { ID _next_id; std::map _objects; System() {_next_id = 0;} inline bool has(ID id) { return _objects.count(id) > 0; } inline Object &lookup(ID id) { return *_objects[id]; } inline ID add() { ID id = _next_id++; Object *o = new Object(); o->id = id; _objects[id] = o; return id; } inline void remove(ID id) { Object &o = lookup(id); _objects.erase(id); delete &o } }; Note that if we create more than four billion objects, the _next_id counter will wrap around and we risk getting two objects with the same ID. Apart from that, the only problem with this solution is that it is really inefficient. All objects are allocated individually on the heap, which gives bad cache behavior and the map lookup results in tree walking which is also bad for the cache. We can switch the map to a hash_map for slightly better performance, but that still leaves a lot of unnecessary pointer chasing.

Array With Holes

What we really want to do is to store our objects linearly in memory, because that will give us the best possible cache behavior. We can either use a fixed size array Object[MAX_SIZE] if we know the maximum number of objects that will ever be used, or we can be more flexible and use a std::vector.
If you care about performance and use std::vector you should make a variant of it (call it array for example) that doesn't call constructors or initializes memory. Use that for simple types, when you don't care about initialization. A dynamic vector buffer that grows and shrinks a lot can spend a huge amount of time doing completely unnecessary constructor calls.
To find an object in the array, we need to know its index. But just using the index as ID is not enough, because the object might have been destroyed and a new object might have been created at the same index. To check for that, we also need an id value, as before. So we make the ID type a combination of both: struct ID { unsigned index; unsigned inner_id; }; Now we can use the index to quickly look up the object and the inner_id to verify its identity. Since the object index is stored in the ID which is exposed externally, once an object has been created it cannot move. When objects are deleted they will leave holes in the array. id_lookup_1.png When we create new objects we don't just want to add them to the end of the array. We want to make sure that we fill the holes in the array first. The standard way of doing that is with a free list. We store a pointer to the first hole in a variable. In each hole we store a pointer to the next hole. These pointers thus form a linked list that enumerates all the holes. id_lookup_2.png An interesting thing to note is that we usually don't need to allocate any memory for these pointers. Since the pointers are only used for holes (i. e. dead objects) we can reuse the objects' own memory for storing them. The objects don't need that memory, since they are dead. Here is an implementation. For clarity, I have used an explicit member next in the object for the free list rather than reusing the object's memory: struct System { unsigned _next_inner_id; std::vector _objects; unsigned _freelist; System() { _next_inner_id = 0; _freelist = UINT_MAX; } inline bool has(ID id) { return _objects[id.index].id.inner_id == id.inner_id; } inline Object &lookup(ID id) { return _objects[id.index]; } inline ID add() { ID id; id.inner_id = _next_inner_id++; if (_freelist == UINT_MAX) { Object o; id.index = _objects.size(); o.id = id; _objects.push_back(o); } else { id.index = _freelist; _freelist = _objects[_freelist].next; } return id; } inline void remove(ID id) { Object &o = lookup(id); o.id.inner_id = UINT_MAX; o.next = _freelist; _freelist = id.index; } }; This is a lot better than the STL solution. Insertion and removal is O(1). Lookup is just array indexing, which means it is very fast. In a quick-and-dirty-don't-take-it-too-seriously test this was 40 times faster than the STL solution. In real-life it all depends on the actual usage patterns, of course. The only part of this solution that is not an improvement over the STL version is that our ID structs have increased from 32 to 64 bits. There are things that can be done about that. For example, if you never have more than 64 K objects live at the same time, you can get by with 16 bits for the index, which leaves 16 bits for the inner_id. Note that the inner_id doesn't have to be globally unique, it is enough if it is unique for that index slot. So a 16 bit inner_id is fine if we never create more than 64 K objects in the same index slot. If you want to go down that road you probably want to change the implementation of the free list slightly. The code above uses a standard free list implementation that acts as a LIFO stack. This means that if you create and delete objects in quick succession they will all be assigned to the same index slot which means you quickly run out of inner_ids for that slot. To prevent that, you want to make sure that you always have a certain number of elements in the free list (allocate more if you run low) and rewrite it as a FIFO. If you always have N free objects and use a FIFO free list, then you are guaranteed that you won't see an inner_id collision until you have created at least N * 64 K objects. Of course you can slice and dice the 32 bits in other ways if you hare different limits on the maximum number of objects. You have to crunch the numbers for your particular case to see if you can get by with a 32 bit ID.

Packed Array

One drawback with the approach sketched above is that since the index is exposed externally, the system cannot reorganize its objects in memory for maximum performance. The holes are especially troubling. At some point the system probably wants to loop over all its objects and update them. If the object array is nearly full, no problem, But if the array has 50 % objects and 50 % holes, the loop will touch twice as much memory as necessary. That seems suboptimal. We can get rid of that by introducing an extra level of indirection, where the IDs point to an array of indices that points to the objects themselves: id_lookup_3.png This means that we pay the cost of an extra array lookup whenever we resolve the ID. On the other hand, the system objects are packed tight in memory which means that they can be updated more efficiently. Note that the system update doesn't have to touch or care about the index array. Whether this is a net win depends on how the system is used, but my guess is that in most cases more items are touched internally than are referenced externally. To remove an object with this solution we use the standard trick of swapping it with the last item in the array. Then we update the index so that it points to the new location of the swapped object. Here is an implementation. To keep things interesting, this time with a fixed array size, a 32 bit ID and a FIFO free list. typedef unsigned ID; #define MAX_OBJECTS 64*1024 #define INDEX_MASK 0xffff #define NEW_OBJECT_ID_ADD 0x10000 struct Index { ID id; unsigned short index; unsigned short next; }; struct System { unsigned _num_objects; Object _objects[MAX_OBJECTS]; Index _indices[MAX_OBJECTS]; unsigned short _freelist_enqueue; unsigned short _freelist_dequeue; System() { _num_objects = 0; for (unsigned i=0; i.id = i; _indices.next = i+1; } _freelist_dequeue = 0; _freelist_enqueue = MAX_OBJECTS-1; } inline bool has(ID id) { Index ∈ = _indices[id & INDEX_MASK]; return in.id == id && in.index != USHRT_MAX; } inline Object &lookup(ID id) { return _objects[_indices[id & INDEX_MASK].index]; } inline ID add() { Index ∈ = _indices[_freelist_dequeue]; _freelist_dequeue = in.next; in.id += NEW_OBJECT_ID_ADD; in.index = _num_objects++; Object &o = _objects[in.index]; o.id = in.id; return o.id; } inline void remove(ID id) { Index ∈ = _indices[id & INDEX_MASK]; Object &o = _objects[in.index]; o = _objects[--_num_objects]; _indices[o.id & INDEX_MASK].index = in.index; in.index = USHRT_MAX; _indices[_freelist_enqueue].next = id & INDEX_MASK; _freelist_enqueue = id & INDEX_MASK; } }; Reprinted with permission from The Bitsquid blog.
Cancel Save
0 Likes 6 Comments

Comments

belte-shazzar

Interesting read, though I assume your "Object" type is not a base root class in a class hierarchy as this would not allocate the correct storage for all descendant "Object" based classes that added new fields.

It means you would be limited to just some sort of non-inheritable C struct type Object unless you make the System class a template that accepted other types and used multiple lookup tables on a per-type basis.

Is that you are using it in your game system, i.e multiple lookup tables on a per-object type basis?

April 24, 2013 01:48 PM
metsfan

Interesting read, though I assume your "Object" type is not a base root class in a class hierarchy as this would not allocate the correct storage for all descendant "Object" based classes that added new fields.

It means you would be limited to just some sort of non-inheritable C struct type Object unless you make the System class a template that accepted other types and used multiple lookup tables on a per-type basis.

Is that you are using it in your game system, i.e multiple lookup tables on a per-object type basis?

It is not explicitly stated in this article, but I have implemented such systems before. I always did this sort of thing, as you suggested, on a "per object type" basis. This avoids the needs to dynamic_cast every object coming out of the system, and also just seems to make a bit more sense to me (for instance, in a relational database, you have a separate ID sequence for each table).

I will say though, that if he had used pointers in that vector, it would alleviate any allocation errors for derived classes.

April 24, 2013 02:57 PM
tivolo

Given the background that this is used in the BitSquid Engine and follows a data-oriented design approach, you can safely assume that "Object" in this implementation is simply meant as a placeholder for a Mesh, Shader, Effect, etc. class.

If you were to put generic "Object"s (or pointers-to-Objects) into the packed array, you would also have to either call a virtual function, or cast to different structs based on the type of an instance. In either case, you would loose all the benefits of storing them tightly packed in a cache-friendly manner - so why do it in the first place?

Subsystems usually should be responsible for dealing with a certain kind of data (e.g. meshes, effects, ...), and are the only ones working on that data. Therefore the cache-friendly packed array, and multiple lookup-tables.

I would also advise against storing pointers to instances in the vector, because that leads to an unnecessary indirection when accessing that instance's data. Additionally, you would have to make sure those allocations themselves come from a pool allocator or similar, and are not stored all over the heap. Otherwise you'd suffer even more cache-misses.

April 25, 2013 10:40 AM
Koradrell

Hi,

What happens if you try to get an object that has previously been deleted by another object? Do you have to check previously if the system has the object through the "has" function and then look up it? What would be a typical use of a system such the one you proposed?

Thank you,

Arnau.

April 30, 2013 07:33 AM
rpiller

I get all this and have read some about data-oriented design, but given the above, what's the point of using C++ then and not just plain C? If the powerful features of C++ add slowness that games don't need then why use it? There seems to be a push to go back to the C days of game development because of trying to squeeze out every last bit of cpu. I sometimes feel like it's forgotten why C++ has become the main game programming language, and it's surely not because of speed. There are other benefits and advantages of using all that C++ has to offer. Data-oriented design seems to "trash" those ideas because they are slower. I think we all know it's slower but we get other benefits.

May 06, 2013 01:39 PM
Koradrell

I get all this and have read some about data-oriented design, but given the above, what's the point of using C++ then and not just plain C? If the powerful features of C++ add slowness that games don't need then why use it? There seems to be a push to go back to the C days of game development because of trying to squeeze out every last bit of cpu. I sometimes feel like it's forgotten why C++ has become the main game programming language, and it's surely not because of speed. There are other benefits and advantages of using all that C++ has to offer. Data-oriented design seems to "trash" those ideas because they are slower. I think we all know it's slower but we get other benefits.

IMHO, I think it depends on the layer you are working on and what are the requirements of that layer. For low level subsystems, where performance is a critical point, you might sacrifice some of the benefits of using C++ and get the others of using pure C, because that layer will be accessed only internally.

On the other hand, if you are programming the interface that will be used by the users of your tool, you might prefer using a more object oriented design, which might be safer although you sacrifice some performance. I think that, in the end, one have to use the most proper tools in every situation depending on the needs.

Arnau

May 11, 2013 12:50 PM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!

Looks deeper into the advantages of using an ID lookup table and what data structures a system might use

Advertisement
Advertisement
Advertisement