Rewaz said:
The only issue I see is that I might have around 5,000 entities loaded on the map at once (my idea is to make a game similar to WoW), so I wonder whether running through 5,000 entities and copying each transform to its mesh might become a performance bottleneck.
As others have pointed out, this should not be an issue. I regularly work with more than 100,000 entities in an ECS and have tested variations of this problem many times. It typically comes down to three options:
1. If most things in the scene are moving at all times, just iterate and copy the data. It's cheap and quick, and I've never seen it actually show up in profiles, so I really don't recommend the alternatives below.
2. If you can guarantee that most things in the scene are not moving, you could add a “dirty” flag to the transforms and skip those which have not changed. In most cases where even 5% of entities move, this is more expensive than #1. There are many ways to approach this, but typically the combination of branching code, random cache misses and potentially wasted memory puts the point of diminishing returns at a very low fraction of moving entities.
3. Like #2, if you can guarantee that most things are not moving, bypass the ECS and use a queue to build a list of the entities which moved. Then iterate that list, doing a single entity lookup and copy per entry. This is inherently against the ECS style but is sometimes the right thing to do. In my findings this is almost always faster than #2, but it is slower than #1 until entity counts get pretty high with few moving items.
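To make the trade-offs concrete, here is a minimal sketch of the three options side by side. Everything in it is an assumption for illustration: `Mat4`, `Transform`, `MeshInstance` and `MovedQueue` are hypothetical types, not from any particular ECS or renderer, and it assumes an entity's index is the same in both arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical POD components; names and layout are assumptions.
struct Mat4 { float m[16]; };
struct Transform { Mat4 world; bool dirty; };
struct MeshInstance { Mat4 world; };  // stand-in for a renderer-owned mesh record

// #1: copy every transform every frame. Linear walk, no branches in the loop.
inline std::size_t syncAll(const std::vector<Transform>& transforms,
                           std::vector<MeshInstance>& meshes) {
    assert(meshes.size() >= transforms.size());
    for (std::size_t i = 0; i < transforms.size(); ++i)
        meshes[i].world = transforms[i].world;
    return transforms.size();
}

// #2: still walks every transform, but only copies the ones flagged dirty.
inline std::size_t syncDirty(std::vector<Transform>& transforms,
                             std::vector<MeshInstance>& meshes) {
    assert(meshes.size() >= transforms.size());
    std::size_t copied = 0;
    for (std::size_t i = 0; i < transforms.size(); ++i) {
        if (!transforms[i].dirty) continue;  // the branch that #1 avoids
        meshes[i].world = transforms[i].world;
        transforms[i].dirty = false;
        ++copied;
    }
    return copied;
}

// #3: movement code pushes entity indices here; sync only visits those.
struct MovedQueue {
    std::vector<std::uint32_t> indices;
    void push(std::uint32_t entity) { indices.push_back(entity); }
};

inline std::size_t syncMoved(const std::vector<Transform>& transforms,
                             std::vector<MeshInstance>& meshes,
                             MovedQueue& moved) {
    const std::size_t copied = moved.indices.size();
    for (std::uint32_t e : moved.indices) {
        assert(e < transforms.size() && e < meshes.size());
        meshes[e].world = transforms[e].world;  // one lookup and one copy per entry
    }
    moved.indices.clear();
    return copied;
}
```

Note that `syncDirty` still touches every transform just to read the flag, while `syncMoved` jumps around memory in whatever order entities were queued. Which one wins depends on your data, so profile before picking anything other than `syncAll`.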
Now, having said the above, there are a few more things to consider:
First, if you have a transform and a mesh component in the ECS and they are both flat POD structures containing matrices, why copy at all? Remove the matrix from the mesh and, when you go to render, just iterate (Transform, Mesh), splicing the data from the two components into the result you need. This is typically how you want to use the ECS when possible. Of course, in many cases the Mesh structure is supplied by a rendering engine that you may not be able to, or at least don't want to, modify. You could instead reverse this and use the matrix in the Mesh component as the only copy, dropping the one in the Transform. The downside there depends on the size of the Mesh component: if it is large, iterating to update that matrix could end up touching considerably more memory and slow things down enough that #1 makes more sense again. It's a balancing act and depends on what you are using.
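As a rough sketch of the "why copy at all" idea: the mesh component below carries no matrix, and the render step walks both component arrays in lockstep to build what the device needs. The types and the `buildDrawList` function are hypothetical, and the handle fields are placeholders rather than any real renderer's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical POD components; the Mesh deliberately holds no matrix.
struct Mat4 { float m[16]; };
struct Transform { Mat4 world; };
struct Mesh { std::uint32_t vertexBuffer; std::uint32_t indexCount; };

// What the backend consumes: data spliced together from both components.
struct DrawCommand {
    Mat4 world;
    std::uint32_t vertexBuffer;
    std::uint32_t indexCount;
};

// Iterate (Transform, Mesh) pairs; assumes index i refers to the same entity
// in both arrays, as a real ECS view over the two components would provide.
inline std::vector<DrawCommand> buildDrawList(const std::vector<Transform>& transforms,
                                              const std::vector<Mesh>& meshes) {
    assert(transforms.size() == meshes.size());
    std::vector<DrawCommand> out;
    out.reserve(meshes.size());
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        out.push_back(DrawCommand{transforms[i].world,
                                  meshes[i].vertexBuffer,
                                  meshes[i].indexCount});
    }
    return out;
}
```

The matrix is read once, straight from the Transform, at the point where it is needed, so there is no per-frame sync step and no second copy to keep consistent.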
Going off the deep end is always possible. Most rendering engines put an “easy to use” retained scene graph between you and the rendering device, and it is often implemented very much like an ECS. That duplicates functionality and data already contained in the ECS, and it is generally not something you want sitting between you and the underlying rendering device. My solution bypasses that layer, and the difference is notable when it comes to scale.
A bad comparison would be the debug visualizer I wrote against a retained mode renderer versus the current solution, which bypasses the retained mode and uses the same backend device code. With 5,000 entities (a low count because this had no culling), the retained mode version rendered at 10-15 FPS, where the current code renders at over 120 FPS with a quarter of the memory bandwidth. The results look identical but render at least 8 times faster, simply from bypassing code that is not necessary.
Let me be clear: this is a totally unfair and questionable comparison, as there are two huge differences: a) the renderer managed the mesh instances and they were pointers to random memory, so all contiguous memory access went out the door when pushing the transforms (aka: cache miss palooza!) and b) the ECS was executing in parallel while each of those meshes had to be protected by a mutex. Those are pretty huge differences but, unfortunately, they are likely to be standard fare unless a given renderer is written specifically with ECS and multicore execution in mind. All said and done, if you want to go off the deep end for performance, this is a pretty big one to consider.
Now, the reality check: keep in mind that different needs play a role in where you should be spending your time. If you are making a game with fewer than 5,000 entities, ignore everything above except #1. Seriously, you have better things to do with your time than optimize the crap out of something that is likely good enough. The primary reason for covering all of the above is that you will run into the same problem again if and when you integrate a physics engine, a sound system, networking or any of the many other things where middleware is probably desired. In a lot of those cases solution #3 starts to look very good…