Juliean said:
Not in my tests. In the third reply, I implemented exactly your structure, and it did show the same characteristics that I measured for std::list, albeit faster by a small margin comparatively. So, if you are not also measuring those slowdowns with 1000 elements that I did (which already came close to 10x for just iteration), then IDK. Or maybe it's because you are still measuring overall speed, probably?
I think I missed that third reply. I only remember your recent pastebin comparing std::list vs. std::vector.
But anyway, all my measurements cover (at least) the whole collision detection: iteration, checking whether the pair has already been processed, calculating the intersection, and appending the pair data to draw a line (into a preallocated, fixed-size buffer for my visuals).
That's why I won't see a large difference between various iteration methods. But in practice it's always like this: you do some work on the objects you iterate over, and this work causes its own good or bad memory access patterns. And almost always that work costs far more than iterating over the work items.
Thus, optimizing for memory access is quite difficult, because the feedback is bad. I have no profiler to tell me bandwidth or the percentage of cache misses, so I can only guess.
Isolated, synthetic tests like yours give good feedback, but they don't tell much about practice. It may well be that in practice the prefetcher has enough time to catch up and compensate while you do the actual work.
And finally, even if some bottleneck shows up, rearranging data structures to try out alternative access patterns simply isn't practical in a C-based language. The changes you'd need to make are always too heavy.
So you can only hope the initial design of your structs gives good performance rather than bad. Otherwise you'll never know anyway.
The few times I did such experiments, the outcome mostly showed no real difference, which is very good, considering we can't do this often due to the effort.
On the GPU the situation is very different, though. There it is totally worth experimenting with different data layouts. The difference is big, but unfortunately it also varies widely across architectures, so you're fucked again.
Regarding lists vs. vectors: it's clear that a list forms just a sequence, so of course we'll prefer to reproduce this sequence in memory too, preferring the vector. If we can.
But once we need trees, graphs, or higher dimensions, sequential access is no longer possible, no matter what.
And that's where the real question comes up: do we accept the cache misses, or do we choose a different algorithm with worse time complexity but better access patterns?
That's a really good question. Obviously the answer always depends. And obviously some pointless flame wars may come up, driven by general preferences and perceptions of what other people often get wrong.
But I do think we can give a general answer, which is to find a good mix of both. Usually that means complexity in the outer loops and brute force in the inner loops.
That's the one and only general performance advice I can give.
Juliean said:
So perhaps there is something about std::vector being bad here as well.
Yes. Allocations will always happen, and you have little control over when. You can try to minimize the problem, but you can't solve it. There is a price for convenience and ease of use.
So outside of my little applet code given here, I would not use it for the collision detection system of a game engine, because I don't know which sorts of games and requirements will come up using it.
I would also optimize for large N and accept slight disadvantages for small N, for the same reasons, plus I want to achieve scaling, so I don't need to rework the system after some years just because games have become bigger again.
In this sense I see your current solution more as addressing specific game logic, not as a general collision / spatial query system.
But I don't think that's a mistake, because it works. On the contrary: a general system takes more effort, likely adds features you don't currently need, and worst of all, it still very likely turns out limited and specific in the future, e.g. once you want to expand to infinite worlds.
Juliean said:
I'm afraid I will resist being “cured”
You ignore the fact that sometimes you just need a list and have no choice. It remains a useful tool. There was a free algorithms / data structures book linked on the wiki page I looked up for the definition, and the very first topic of the book was linked lists.
Personally, I used doubly linked lists a lot for tools in the early days. (And I did allocate each object with new.)
Nowadays I don't use them at all anymore, with some rare exceptions. But a tiled 2D game would be such an exception. I would use the proposed algorithm here without worries, and I'll keep recommending it. It's the simplest after brute force, and the fastest.
My reason to neglect lists for the most part was not iteration performance, but rather the need to expand to trees for larger N, or to graphs for higher complexity.
I won't submit to hardware dictatorship. :D