Advertisement

Efficient methods for finding component within object pool using entity ID?

Started by July 04, 2021 02:35 PM
9 comments, last by Gnollrunner 3 years, 4 months ago

Hello there,

I'm currently working on an ECS engine in C. Specifically, I'm implementing a “component pool” that enables components to quickly be created / destroyed. Each component holds data along with an “entity ID” such that the component can be associated with an entity.

The requirement of my pool is that the components are in contiguous memory. However, there is no guarantee on order of the components with regard to entity ID.

For example, if I did the following:

insert_into_pool(new_component(), ENTITY_0);
insert_into_pool(new_component(), ENTITY_1);
insert_into_pool(new_component(), ENTITY_2);
insert_into_pool(new_component(), ENTITY_3);

// component_order
// Entity 0 Component
// Entity 1 Component
// Entity 2 Component
// Entity 3 Component

remove_from_pool(ENTITY_1);

// component_order
// Entity 0 Component
// Entity 3 Component
// Entity 2 Component

Note that whenever a component is removed, the last component is moved to fill the vacant slot such that active components remain contiguous.

However, this creates an issue that now I need some sort of method to retrieve a component using a provided entity ID. That is, I will want to get components out of the pool by only specifying the entity ID.

My current method is to use a simple map and would something like the following

int32_t entity_map[MAX_ENTITIES]; // gets updated in "insert" function

Component* get_from_pool(entity_id)
{
	location = entity_map[entity_id];
	
	return pool_data + location * component_size;
}

So this is a pretty simple method and is relatively fast since it's just a look up table. However, it seems to very memory inefficient.

Let's say the max entities I support is 16384. But then let's say the component type in question is not super common, and would likely only ever be used by 100 entities. It seems very wasteful to allocate a 16834 look-up array when there would only be 100 entities that use the component.

Are there any other methods for tracking entity IDs within component pools?

I mean, if you want to save memory, I think you have two options:

  1. Keep a compact array (=no emtpy spaces) in sorted order, and then do a binary search (requires storing a pair of entity_id and location and is O(log n), but worse for cache)
  2. Use a map/hash/unordered_map. Those range from O(log n) to O(1), but also typically have bad performance due to complexity of the operations/cache locality compared to a direct access. I do use this approach for my internal component access though, and its fast enough for my use-case.

Though if you want optimal performance for access, if you access components by ID frequently, using an array like in your case doesn't seem that bad. For 16384 maximum entities, you only use up 64kb of storage - thats next to nothing. Also keep of note that the map-type containers I mention, can also internally lots of memory. Its hard to find out how much exactly, and I don't know any numbers myself, but I'd be willing to say that for 100 entities, you might not save that much compared to your raw array.

Advertisement

Why do you even need the entity_map? Seems like another thing you have to manage when you remove a component. If every component is only used by a single entity, you can have an index into the component table in your entity, and a back pointer in your component. When you move the component you follow the back-pointer and update its index in the entity. You would have to do some update for your entity_map anyway.

As for your maximum component issue, you can make them grow on demand, just copy them to a lager area. Your indexes in your entities would still be good so you don't need to touch them. And your back pointers would still be good also. Alternatively you can use something fancy like VirtualAlloc on the PC (or some equivalent) and just reserve chunks of address space for each kind of component. Then you page in new memory when you need it. This is what I do for my heap implementation. Since in your case everything in contiguous you can even release memory off the end if your component number shrinks.

Ive read into robin hood hash maps and found them useful in case of performance, so we're using them every time we need something to be associated with something else. For example our reactive properties are on-demand properties whos state is unclear for as long as they're not querried. Because we use a property ID, this is somehow similar to your component system.

Access time is quite low as it uses a hash positioning algorythm paired with a limited iteration loop over a contiguous array of items in order to find the spot we're looking for

@Gnollrunner I'm not sure if I follow. My entities are simply integer IDs. They are not structs, thus I don't have a way for them to hold onto a pointer to individual components

So just to understand this, In order to find all components of a given entirely, you have to search through many tables?

Edit: ok nevermind, you just go to the positions in the tables. However to me that seems like a lot of opportunities for cache misses. In any case I think I would just organize the component list for an entity in one chunk of memory, rather than spreading it out over arrays.

Advertisement

@Gnollrunner Correct.

The thought is that all components of the same type will be in contiguous memory, which will make systems that iterate over a list of common component “fast”.

By organizing components that belong to the same entity in contiguous memory seems like it would be difficult to add / remove components as necessary. i.e. what happens if I want to add a new component but there isn't any room left? Would I need to push everything back in memory?

I think it's premature for me to be concerned about performance, but I like thinking about this stuff.

Thomas Izzo said:

@Gnollrunner Correct.

The thought is that all components of the same type will be in contiguous memory, which will make systems that iterate over a list of common component “fast”.

Yes that part I think is fine. However if you want to find all components of a single entity, it means you have to go to the components index arrays and then the components themselves, so that's twice as many possible cache misses.

Say for instance you simply put all the component indexes for an entity in one block. Then you only go to the components + 1 (for the entity itself), number of distinct locations in memory. The down side is you either have to have a max number of components in your entity, or you have to let the entity grow on demand, but that's not so hard to implement.

You could do something like have a starting number of possible components for an entity, and then if you go over that it moves it and resizes it up, kind of like how std::vector works. For keeping track of the components themselves you can have a pointer array. You can make it reusable with a free list such that if an entity goes away that entity ID is now available for the next one to be created. The entity ID is basically the index into your pointer array. If you use a separate custom heap for your entities. It will keep them roughly together in memory too.

For resizing component or entity arrays, I would either simply do a copy, or as I said before, page in memory at the end of each list using low level memory routines. However this last idea is only good on a 64 bit machine since you have a lot of address space available. To do this you reserve a max amount of address space you want to be able to use for each array. This does not allocate memory. It just reserves address space. You only have some starting memory allocated at the front of the address space. As you need more, you can ask the OS to tack it on the end for you and only then are you actually using memory.

@Gnollrunner Thank you for the feedback. I will likely be exploring this as I progress with my engine. It's still super early and I'm very much in my “proof of concept” stage.

@undefined Sure, good luck with it. It's actually a good thought exercise for me too since I've been considering if I want to add ECS to my stuff. In my case with the camera centric rendering, I'm not sure it makes sense since I don't traverse things linearly that much.

This topic is closed to new replies.

Advertisement