JoeJ said:
So you could use that to model an army of soldiers. And your spells, rushing through the army, would cause a trail of corpses, forming nice persistent patterns of red blood from the magic bullet hell projectiles. Wouldn't that be a nice new experience? I'm a bit joking, and personally I prefer quality over quantity, but there were some games which used this to get attention and, finally, success from such features. Likely those NPCs could not run an individual script for each, but still.
Oh, definitely. There are many cool examples, depending on the game. If I ever finish my 2D RPG, I want to do something either like an RTS or 3D horror, so maybe my stance changes then. And also, as I said, for the case where I do have multiple millions of tiles I already have to come up with more "creative" solutions. That's just my takeaway: which optimization you choose depends not only on the type of problem but also, in large part, on the type of game/application you create in a broader sense.
JoeJ said:
What I mean is: you can either use modern HW brute force power to do the same old things with less effort, or you can use it to try new things. Both are fine depending on your goals, but due to HW progress slowing down, the latter now becomes more difficult to achieve. And because brute force methods have been exhausted for everything, we won't discover new things anymore this way. So we need to change our minds, even if brute force seemingly is good enough at the current moment.
Yeah, that was one thing I wanted to say about that mindset. I find it similar to the discussion that sometimes pops up about using premade engines/libraries vs. writing your own. If your sentiment here were true, it would mean that because people use Unity/Unreal, no new engines could be made anymore, since nobody is doing it anymore. I don't think that's really a valid point. You pick the tool for the job: if the tool is easy/simple, so be it; if it's complicated and hard, then that's that. But I think I've already said that, so let me put it another way: maybe instead of forcing innovation by using complicated methods for simple tasks that don't need them, we get innovation from people like you choosing tasks where "complicated" solutions are required. I think that is a better way to look at it, no? This would kind of be a middle ground where both our views can coexist.
On second thought, I'd say that this is probably the "right way" to go about it. Applying complex algorithms to simple problems won't really give you any insight into how those complex algorithms actually perform, nor does it put you in a position where you really have to push the limits. If my frame time is twice as long as it is allowed to be, I do have to come up with something, at all costs. If I save a few ns on some occasions but lose some on others, and it's all within the "noise" level, then why bother? I could even end up with something worse, just because there is so little to be gained and so little room to measure accurately.
JoeJ said:
Likely I should express myself differently, avoiding that 'brute force' word, which is not really the point. It's rather my desperate search for new things that I mean, which has no real connection to compute power at all.
Well, we did maybe mix up things in the discussion here and there :D Don't be mistaken, I'm also driven to innovate, especially on the tooling front and on the performance of that tooling and of the engine itself. It just requires different solutions/approaches, since it's a different problem than what we were actually talking about here. To have a performant game engine overall, including the editor, it's more about having a large-scale system that doesn't make operations unnecessarily slow, without being able to "simply" improve the O-complexity - because a lot of tasks are already O(1) (doing a complex task for a single asset) or O(N) (doing an operation for a cluster of assets). That's what it often feels like with Unity: everything just takes 10x as long as it should.
JoeJ said:
I thought about that, but then I removed the 'static' because I had to clear the whole grid at the start. And if I clear the outer vector, the inner ones would just be deallocated, losing the potential advantage? So instead, I'd need to loop over the inner vectors, clearing each of them individually, I guess. But then I thought that's probably pointless, and hoped the reallocation likely can get the same memory somehow quicker than my manual fiddling. Would be interesting to try.
You shouldn't clear the whole outer vector, but every cell. This might sound counter-intuitive at first, but it is really no overhead compared to your method. With the local vector, you have to do two things:
- Create the vector. This allocates memory for (dim*dim) inner vectors, iterating that many times and constructing a new vector each time, which results in additional allocations (due to std::vector) - or at the very least in ctors being called.
- At the end of the frame/function, the vector is destroyed. This again requires iterating over the whole structure, destroying each inner vector and deallocating all the memory.
If you made it static, this would change to (after the first frame):
- Iterate over the entire outer vector and clear every sub-vector, which for pointer elements should just mean setting the "size" to 0.
See? This is way less work, with no additional strings attached, and it also removes all the allocations. That's why I harp so much on measuring and on how performance isn't always intuitive, at least not at first glance.
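To make it concrete, here's a minimal sketch of what I mean (assuming a flat grid of dim*dim cells and a per-object cell index - the names BuildGrid and cellIndex are just for illustration, not your actual code):

#include <vector>

struct Object { int cellIndex; };

void BuildGrid(const std::vector<Object*>& objects, int dim)
{
    // The grid persists across calls; the outer allocation happens only once
    // (sized from the first call's dim in this simplified sketch).
    static std::vector<std::vector<Object*>> grid(dim * dim);

    // Per-cell clear: each cell's size goes to 0 but its capacity stays,
    // so after a few frames the push_backs below stop allocating entirely.
    for (auto& cell : grid)
        cell.clear();

    for (Object* object : objects)
        grid[object->cellIndex].push_back(object);
}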
JoeJ said:
Seems you know better, but I do not understand. Can you make a code example?
When you access static variables, there is a certain overhead. Mainly, since C++11 added thread-safe statics, a function-local static has to be checked (in a thread-safe way) for whether it has already been initialized before you can access it. The access itself is only marginally slower, but you still have to touch some global memory, possibly adding another dereference/worse caching.
To apply my solution, here's what you'd conceptually do:
static std::vector<std::vector<Object*>> grid;
auto& localGrid = grid; // pay the static-access cost once, then only touch the local reference
for (auto* object : objects)
{
    if (...) // whatever condition decides whether/where the object goes
        localGrid[coordinate].push_back(object);
}
This optimization only makes sense when you access a static variable multiple times within the same function, preferably in a large loop, but it does provide a noticeable speedup. And as far as I recall, this is not an optimization that most compilers can/will do on their own, unfortunately :( But I'd have to double-check.
JoeJ said:
I've been questioning for a long time whether using std::vector for performance-critical code is a valid option nowadays. Seems to be the case, and I don't want to implement my own containers. But I'm still doubtful. I could try to compare it against the containers of the physics engine…
I do think it is a valid option, especially if you have algorithms where you don't need complex spatial structures, but maybe just collect a bunch of objects in one container. There are things that could be improved - for example, I added an "uninitialized" resize mode that speeds up the case where you just want to memcpy primitive data into the vector, and a "push_back_reserved", which can be used when you have reserved the appropriate amount of capacity up front and then push back (it acts both as an optimization and as an assert - it gives improvements of about 2x in debug and 5-10% in release). That case is very common in my codebase.
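Just to illustrate the push_back_reserved idea, here's a rough standalone sketch on top of std::vector (not my actual implementation - a custom vector would additionally skip the growth branch entirely):

#include <cassert>
#include <vector>

// Sketch: the caller promises that enough capacity was reserved up front,
// so this push never needs to grow; the assert catches a broken promise
// in debug builds.
template <typename T>
void push_back_reserved(std::vector<T>& v, const T& value)
{
    assert(v.size() < v.capacity() && "reserve() the full count before pushing");
    v.push_back(value);
}

// Usage:
// std::vector<int> out;
// out.reserve(n);
// for (int i = 0; i < n; ++i)
//     push_back_reserved(out, i);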
The worst part about std::vector is its bool specialization, which was the main reason I came up with my own vector. I don't really regret it; since I don't often use (stateful) allocators, it also wasn't that hard.
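For anyone who hasn't been bitten by it yet, this is roughly the kind of friction the bool specialization causes:

#include <vector>

int main()
{
    std::vector<bool> flags(8, false); // packed bit container, not an array of bool

    // bool& ref = flags[0];     // doesn't compile: operator[] returns a proxy object, not bool&
    // bool* raw = flags.data(); // doesn't compile: no contiguous bool storage is exposed

    flags[3] = true;       // works, but goes through the proxy
    bool copy = flags[3];  // converting the proxy back to a plain bool
    return copy ? 0 : 1;
}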