Greg4581 said:
Namely, what data structures to use?
It depends to a great extent on your use case. Is this a container you're going to be mainly iterating through repeatedly? Or that exists to hold info for infrequent use? Do you need random access? How often will you iterate vs remove? Do you have an upper-bound on container size? If cache misses are important, contiguous containers are a good starting point. If Big-O is important then hashmaps are a good start.
You will want to ask these questions per-container instance. There is no “best way to process entities," no “I win the performance game” button you can push that covers every possible use case, which is what I took your post as implicitly seeking. There may be a “best way to process these entities," where “these entities” are some specific thing like conveyer belts in Factorio - if you're aiming for perf I can just about guarantee you will have many different “entity” types, because trying to stuff everything in one framework will lead you into inefficiencies. Just because one entity type has a particular access pattern doesn't mean all of them will. Same goes for insert/remove patterns and so on.
tl;dr It depends.
Greg4581 said:
The downside however is that removing elements from the middle can be costly
For truly contiguous containers like vectors, that depends on when you remove it. If you're calling vector::erase (= shuffle all elements above the element to be removed down one slot) on every element when you do it, then yeah that's going to be slow, because if you're going to remove multiple elements you're in a situation bounded by O(n^2). But if you can contrive a situation in which you'd want to remove multiple elements at a particular time and you don't know which elements specifically(eg. “cull all entities whose health is below zero”), then there exists an O(n) algorithm to remove multiple elements at once, which can be done with the C++ standard library algorithms - it's called the “erase-remove” idiom. You can find the documentation for the algorithm part here: https://en.cppreference.com/w/cpp/algorithm/remove. The gist of how it works is that the find and erase “shuffle” operations are combined - for each element if we get to an element that is to be removed, we search for the next element that isn't to be removed after it in the container and move that one into the slot where the removed element is.
If the perf hit of iterating a sparse array is acceptable, you could use a “slot map” or “colony” data structure, which is sometimes used to implement a memory pool - essentially a vector, but when you remove an element you instead mark its index as free (and there are multiple ways to do that, each with their own tradeoffs…). Your iterator type would skip the free slots internally and the next time an element is allocated, it would go into the first free slot.
Greg4581 said:
I should note that the kind of engine I'm developing should be very efficient.
Everyone wants their engine to be efficient. That's not a directly actionable design goal, it's more a “value statement.”
Greg4581 said:
Think of games like Factorio.
This is much more helpful spec-wise. Never played Factorio, but I've seen enough of it that this tells me you want large collections of simple things that can interact with each other in specific ways and those interactions are player-controlled.
Greg4581 said:
This is nice because I need not process entities in a specific order or priority. Speed alone is simply the name of the game.
If speed is what you want, then processing entities in a specific order or priority IS what you likely will end up wanting. There will likely come a point where you exceed your frame time budget and start wanting to time-slice behaviours where you can, for example. If you support taking two objects in the world and having a parent-child relationship between them (as in a scene graph), that tends to require a particular order of operations that may vary between specific thing being updated (eg. you'll want to update the parent transform by the child, but maybe tick child physics before parent…). Totally unordered updates have their place, but I would not expect to be able to use them for everything and the likelihood that you start needing to sort entities for perf goes up the more complex your game gets. In fact, your “chunking” idea IS an example of processing entities in a specific priority order.
It's hard to give more specific advice without knowing the precise details of your game and the data we are dealing with, but without that knowledge my gut says you're making some unwarranted assumptions here. Including the notion that all games have “entities” in the sense I'm reading you as meaning. ?