
What kind of coding standard do you prefer

Started by Ivica Kolic, December 16, 2022 11:35 AM
82 comments, last by JoeJ 1 year, 11 months ago

So, other than the official answer “whatever standard the client prescribes”, what kind of coding standard do you use at home in your personal projects?

Here is mine:

The standard is optimized for speed, simplicity and readability, and it fits on one page. To get the best cache performance it is wise to keep your memory as sequential and compact as possible (DOP instead of OOP, at least at the lower levels). ECS is the main paradigm.

  • using vectors instead of object pools. I want sequential and pre-allocated memory, and even if the memory isn't preallocated (empty vector), it will become so after a few frames (once enough objects have been created). Maybe add another vector to track free elements (or wrap those two in a template called ConservativeVector - but that is practically an object pool if you preallocate enough memory; see the first sketch after this list). Be careful with references. If you have a lot of dynamic objects this might not be the best solution.
  • using indexes instead of pointers (this goes together with the point above). In most cases an unsigned short will be enough, and it is 4 times smaller than a pointer, so your data will fit in cache better. This also means that you absolutely don't have to write copy constructors or do any kind of memory management (or very little) - literally entire engines and editors can be written without a single new, delete, malloc or free call (at least for static data like meshes and textures). Another benefit is that the code can run on GPU and CPU all the same (if you wrap recursion). I've implemented ray-tracing octrees this way and did all the debugging on the CPU. My BVH hierarchy copies to the GPU the same way. If you have some extremely complex hierarchy, debugging with indices is much easier than reading pointers (it's easier to draw the entire hierarchy on paper and to set breakpoints).
  • using a MemoryContext class to store all those vectors that would otherwise be created as local variables in functions. Of course, these MemoryContext objects should be reused (and passed by reference) so memory allocations won't happen on every use but only on the first few usages (see the second sketch after this list). If you have a thread pool, have one of these objects created for every thread. Be careful with recursive functions.
  • don't use sets or lists - they are 10-20 times slower than vectors. Migrate all your helper libraries to use vectors for internal storage (RadixTries, PatriciaTries, Dictionaries, Octrees, Radix Sort functions, geometry processing functions…)
  • SortForEfficiency is a common function in my helper libraries.
  • Always pass by reference. Don't return vectors; pass them as input-output parameters. That way memory pre-allocation is maintained.
  • use JSON for storage and clipboard copy/paste. It is simple, it will maintain compatibility between different versions of the application, and anyone can code plugins in any language they want. Nobody has to waste time writing a custom DSL parser every time. If you are very lazy, you can even just dump those vectors to a file and hope the format won't change, but I wouldn't rely on that.
  • greedy thread pool. If it isn't faster than OpenMP, you haven't created a fast thread pool. In all my use cases I only needed a thread pool where all the threads run the same task, so I've actually omitted the single-threaded type of task in the latest implementation. It is kind of like async, but more. There are 3 levels of task priority and only the first one is non-blocking. If a bottleneck happens, all the free threads can help the ones that are still working by performing sub-tasks (there are macros I use in all of my code that will do parallel processing if a thread pool is present, or just single-core processing if not). One thing: if some thread submits a task to the queue and manages to finish it before anyone helped, that is OK - there won't be any penalty like with OpenMP.
  • I rarely use inheritance and virtual functions, but a good example of them is the GUI system (it is highly efficient and packs one integer per graphical element instead of the entire geometry of 4 vertices, which is 28 times bigger, as in some other libraries). So, in some cases OOD is more elegant than DOD, but I mostly prefer DOD. An OOP example is using virtual functions for OnCreate and OnUpdate on your engine's objects. On the other hand, if you use DOD and composition, these functions can just be properties of the object: if they are set, call them; otherwise do the default functionality (see the last sketch after this list).
  • not using a single file per class. I like to keep things that make sense together grouped. I also prefer header-only libraries for smaller things that won't change much in the project (many times these classes are templated so there isn't much choice). Bigger classes that are updated frequently should have both H and CPP files.
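
Here is a minimal sketch of the ConservativeVector idea from the first point (the name and the details are just illustrative, not a finished implementation): a vector of elements plus a vector of free slots, so indices stay stable and freed slots get reused instead of compacting the array.

#include <cstdint>
#include <vector>

// Sketch only: elements are addressed by index (stable even if the vector
// reallocates), and removed slots are recycled instead of erased.
template <typename T, typename INDEX = uint16_t>
class ConservativeVector
{
public:
	INDEX Add(const T& value)
	{
		if (!_freeSlots.empty())
		{
			INDEX index = _freeSlots.back(); // reuse a previously freed slot
			_freeSlots.pop_back();
			_elements[index] = value;
			return index;
		}
		_elements.push_back(value);
		return static_cast<INDEX>(_elements.size() - 1);
	}

	void Remove(INDEX index) { _freeSlots.push_back(index); } // slot stays allocated for reuse
	T& operator[](INDEX index) { return _elements[index]; }

private:
	std::vector<T> _elements;      // sequential, eventually pre-allocated
	std::vector<INDEX> _freeSlots; // indices of removed elements
};

As said, this is practically an object pool once enough memory is reserved.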
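
And a rough idea of the MemoryContext point - again just a sketch with made-up members; the point is that the scratch vectors live in a reusable object (one per thread) instead of being local variables:

#include <cstdint>
#include <vector>

// Sketch: reusable per-thread scratch storage, so functions don't allocate
// their temporary vectors on every call.
struct MemoryContext
{
	std::vector<uint32_t> tmpIndices;
	std::vector<float> tmpWeights;
};

void ProcessSomething(MemoryContext& mem)
{
	mem.tmpIndices.clear(); // clear() keeps the capacity, so after the first
	mem.tmpWeights.clear(); // few calls no allocations happen here
	// ... fill and use the scratch vectors ...
}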
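
For the OnCreate/OnUpdate point, the composition variant could look roughly like this (hypothetical names, just to show the idea):

#include <functional>

// Sketch: instead of virtual functions, optional callbacks stored as properties.
struct EngineObject
{
	std::function<void(EngineObject&)> onCreate;        // may be empty
	std::function<void(EngineObject&, float)> onUpdate; // may be empty
};

void UpdateObject(EngineObject& obj, float dt)
{
	if (obj.onUpdate)
		obj.onUpdate(obj, dt); // property is set - call it
	else
	{
		// default update behaviour
	}
}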

That is it. Almost no memory management and very fast code (for certain types of projects where most of the objects are mostly static).

Edit: edited for clarity.

I use the same practices for almost all points. At least for my current work on tools.

For real time and highest performance, i have some concerns:

Ivica Kolic said:
using vectors instead of object pools. I want sequential and pre-allocated memory

Isn't pre-allocated memory the primary reason to use memory pools? Vectors don't give that, if object count isn't constant over time.
And if object count isn't constant, using vectors also isn't sequential. Because if you delete objects, the memory they used in the vector remains but is unused, so you can't iterate sequentially without a compaction step and updating all your indices.

So i would say vectors are no easy solution to those memory management problems in general.

Ivica Kolic said:
using indexes instead of pointers

The problem is that with indices you now need to fetch the index and the base address, so chances of a cache miss are much higher than with using pointers.

Ivica Kolic said:
Another benefit is the code can be run on GPU and CPU all the same

That's usually not the fastest option for GPU.

Let's say we use a node data structure like this on CPU:

struct NodeBVH8
{
	vec3 boundMin;
	vec3 boundMax;
	int children[8];
};

On GPU this might be much faster (just some hypothetical example to make a point):

struct NodeBVH64
{
	vec3 childBoundsMin[64];
	vec3 childBoundsMax[64];
	int children[64];
};

Besides using a different branching factor, the important difference is the memory layout of the child boxes. We now have 64 min box coordinates sequentially.
This should be a big advantage if we use a workgroup size of 64 (or more). Because GPU threads run in parallel, the access pattern becomes horizontal in code, no longer vertical. So we should optimize for sequential horizontal access, ideally.
The difference can be like 10 times faster in my experience, so it's worth experimenting with various data layouts. (A bit similar to the SoA vs. AoS topic with CPU SIMD.)
Sadly, to me this means porting CPU code to GPU requires different data structures, and even different algorithms on the low level. The effort is always big if we want high GPU performance.


JoeJ said:
Isn't pre-allocated memory the primary reason to use memory pools? Vectors don't give that, if object count isn't constant over time. And if object count isn't constant, using vectors also isn't sequential. Because if you delete objects, the memory they use in the vector remains but is unused, so you can't iterate sequentially without an compaction step and updating all your indices. So i would say vectors are no easy solution to those memory management problems in general. Ivica Kolic said: using indexes instead of pointers The problem is that with indices you now need to fetch the index and the base address, so chances of a cache miss are much higher than with using pointers.

JoeJ, you are right on every point. What I was trying to say is that I don't really do any kind of memory management, for the simplicity of things. Eventually things will sort themselves out (after a few frames, when the vectors reach a big enough size), and maybe from time to time I'll do a SortForEfficiency.

So if my scene class has objects and meshes it will look like this:

class CMesh
{
	std::vector<VERTEX> _vertices;
	std::vector<INDEX> _indices;
};
class CObject
{
	int _meshID; // So I won't have a pointer to the mesh, only an index to it. Meshes will rarely change and if they do I'll use 
	int _childObjectID = -1;
	int _nextObjectID = -1; 
	float4x3 _positionMatrix; // This might go into a separate vector (if needed on GPU, and maybe so that we can send another vector of total position matrices and not these ones in parent space)...
};
class CScene
{
	std::vector<CMesh> _meshes;
	std::vector<CObject> _objects; // Using all indices..
};

So, I won't bother with memory/resource/object pools. In the case that meshes are being changed by the system, they will block the rendering thread (the only one using the mesh) from using them for a while while I update the structure or resize things. In fact, the rendering thread only rarely accesses the _meshes vector, because that happens only when it is creating a RenderingMesh for some new mesh. So only the render mesh creation code has to be blocked via a crit section, and not the entire rendering thread (if you prefer to do that in the rendering thread and not some other thread). The ECS system is in control of both meshes and objects, and it'll block other threads that might be using them.

The same goes for objects - since this is a tool, objects and meshes are added rarely. If the vector memory resizes I don't care, because all the IDs (meshID, childObjectID, nextObjectID) will stay the same, and I didn't need to write any kind of move constructors or any kind of memory manager (only a couple of crit sections or spin locks for communicating with the rendering thread).

The thing is, OOP bloat is terrible and I can't stand those solutions that have thousands of small files when everything could fit in maybe 20-30 easily readable ones.

The original name for the thread was supposed to be “How do you handle OOP bloat?”, but it seemed too harsh and I figured it would be best if everyone just writes their own coding standard so we can all pick the best features.

As for GPU-optimized or CPU-optimized, you are right - different packing might make a difference and I am fully open to experimenting with what the best structure is (array of objects, object of arrays, or a mix) - I'm not OOP SOLID anal.

I also have some of the points in common, but not all.

JoeJ said:
Isn't pre-allocated memory the primary reason to use memory pools? Vectors don't give that, if object count isn't constant over time. And if object count isn't constant, using vectors also isn't sequential. Because if you delete objects, the memory they use in the vector remains but is unused, so you can't iterate sequentially without an compaction step and updating all your indices. So i would say vectors are no easy solution to those memory management problems in general.

Vectors can still work in those scenarios, depending on how you use them. Preallocating memory can be done by reserve()-ing a sufficient number of elements. Also, even if you delete elements, you can do one of two things:

  1. Simply delete the element out of the vector and move everything after that down, as you put it with a “compacting step”. This is a good option if you don't delete elements very often. If you delete a few entities per second/every few seconds, this won't matter much.
  2. Use “swap & pop”, where instead of moving all elements after the deleted one down, you move the last element of the vector into the spot that is now free and reduce the size of the vector by 1 (see the sketch below). This is a good option if you don't need the guarantee that certain operations always occur in the same order. You might still need to update external indices/pointers to that one element, but not as many as with the other method.
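
A minimal sketch of option 2, assuming the element order doesn't matter:

#include <cstddef>
#include <utility>
#include <vector>

// Sketch: remove the element at 'index' without shifting everything after it.
// The order of the remaining elements is not preserved.
template <typename T>
void SwapAndPop(std::vector<T>& v, std::size_t index)
{
	v[index] = std::move(v.back()); // overwrite the hole with the last element
	v.pop_back();                   // shrink by one; only references to the old
	                                // last element need their index updated
}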

Though I also have to say, the performance benefit from placing elements in a linear block of memory didn't seem to be that huge anymore the last few times I checked. So it might actually not be worth it in many scenarios. We are talking like 5-10% better performance for iterating over the elements, at the cost of often having to complicate a system (as you are no longer able to point to the elements directly). It might be worth it for internal loops where you have full control, but TBH I don't think I would bother doing it for things like an ECS anymore, unless it easily integrates with your existing design.

JoeJ said:
The problem is that with indices you now need to fetch the index and the base address, so chances of a cache miss are much higher than with using pointers.

It can be worth it if you apply it to multiple elements at once. My renderer compiles a 32-byte struct that contains all the necessary elements to execute a draw, using differently sized integer types to achieve that tight packing. If I were to use pointers, it would be more like 128 bytes, probably. Also, having the render items reference things via index does have one practical advantage: you can replace the stored objects without the items being affected. I'm just now implementing handling for device-removed resets in DX11, and I can do so (mostly) inside the device implementation. It again does depend on the use case, I would say. If you have many elements where you can change from pointers to small types (optimally 1 byte) that you can pack, and you execute a lot of those elements in a loop (so that the storage for the actual objects is easily cached and reused), and you don't need to access the object pointer manually very often, then using indices can be a win for both performance and usability.
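
The kind of packing I mean looks roughly like this - a made-up example, not my actual struct:

#include <cstdint>

// Hypothetical packed draw item: indices into external arrays instead of
// pointers, with integer widths chosen per field. With pointers, most of
// these fields would be 8 bytes each on a 64-bit target.
struct DrawItem
{
	uint16_t meshIndex;      // index into the mesh array
	uint16_t materialIndex;  // index into the material array
	uint8_t pipelineIndex;   // index into a small pipeline/state table
	uint8_t passMask;        // which passes this item participates in
	uint16_t instanceCount;
	uint32_t firstInstance;
	uint32_t transformIndex; // index into a transform buffer
};
static_assert(sizeof(DrawItem) <= 32, "keep draw items tightly packed");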

Ivica Kolic said:
the only place where I use inheritance and virtual functions is GUI system. It is highly efficient and packs one integer per graphical element (instead of entire geometry of 4 vertices which is 28 times bigger like with some other libraries).

I find inheritance useful for gameplay, or at least easier to handle. Having a “PlayerInteractible” with “CanInteract” and “OnInteract” virtual methods that I can just override when I want an object with some specific behaviour - I prefer that over the alternatives. In general, I would say that the more objects of the same type exist and are used at the same time, the more sense it makes to go with data-driven approaches. For example, handling all the “sprites” in a scene makes sense to me to do with an ECS. However, if you have single objects that are only interacted with once at a time, like an NPC with a specific dialog script, data-driven clearly loses to OOP - because in those cases you mainly need to be able to implement and iterate quickly, and performance for single objects is really no concern (and wouldn't benefit from DDD anyways). Note, though, that I'm coming from a place of making a game that is very content-driven, with lots of custom interactions and behaviours. So for a different type of game, one that is purely mechanics-driven, you might not need such a solution.
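
Roughly what that looks like - a simplified sketch with a made-up Door example:

// Sketch: gameplay-side inheritance with overridable hooks.
class Player;

class PlayerInteractible
{
public:
	virtual ~PlayerInteractible() = default;
	virtual bool CanInteract(const Player& player) const { return true; }
	virtual void OnInteract(Player& player) {}
};

class Door : public PlayerInteractible
{
public:
	bool CanInteract(const Player& player) const override { return !m_locked; }
	void OnInteract(Player& player) override { m_open = !m_open; }
private:
	bool m_locked = false;
	bool m_open = false;
};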

Ivica Kolic said:
not using single file per class. I like to keep things that make sense together grouped. I also prefer header only libraries and no, it won't increase compile time. It takes in average 20 - 25s to fully recompile even my most complex projects.

That's probably a bit of a matter of taste. When using external libraries, having them be header-only can obviously be a bit easier to set up, though this can be avoided by regular libraries with good build scripts or, even better, by just supplying ready-to-compile projects for the most common IDEs.
I personally do prefer to split implementations into cpps. And yes, being header-only for sufficiently large projects can indeed increase the compile time to an unacceptable level - sorry, but I have to completely disagree with your premise.

First, even a 30-second recompile is way too much. Making a change in a cpp-file usually takes less than a few seconds.

Second, I do not have any idea how large/complex your project really is, but I'm willing to bet that if you have that stance about compile time, it's probably not that complex compared to what's out there. Unreal Engine 5, for example, has several million LoC. If it were header-only, a full recompile for every change could take up to, if not more than, an hour.

My own game engine's core has 330k LoC and takes about 3 minutes for a clean recompile on my new i9-13900K. The Chromium codebase, which is used as a CPU benchmark, can also take from 10 to 60 minutes to compile.

Like I said with the other points, it probably depends on the scope of the project. But for projects that are complex enough, not having to fully recompile just because you changed one line in some random method is a very important, if not vital, factor. Modules might shake things up a bit. The private module fragment might allow header-only libraries to skip full recompilation in the same way that split C++ files do now.

Juliean said:

Ivica Kolic said:
not using single file per class. I like to keep things that make sense together grouped. I also prefer header only libraries and no, it won't increase compile time. It takes in average 20 - 25s to fully recompile even my most complex projects.

Thats probably a bit of a matter of taste. When using external libraries, having it be header-only can obviously be a bit easier to setup, though this can be avoided by regular libraries with good build-scripts, or even better, just supplying ready-to-compile projects for the most common IDEs.
I personally do prefer to split implemenetations into cpps. And yes, being header only for sufficiently large projects can indeed increase the compile-time to an unacceptable level - sorry, but I have to completely disagree with your premise.
First, even a 30-second recompile is way too much. Making a change in a cpp-file usually takes less than a few seconds.

I'm only using header-only for my utility libraries that never change (like 3DMath.h, Dictionary.h). These are templated, so header-only is the only option in the first place. Studio, Scene, Mesh, Manafoild and other bigger objects are H and CPP because (like you've said) you don't want to touch any H file that is used by many other libraries, since the compile time will be more than a few seconds.

My engine and editors are somewhere between 100k and 150k lines of code (counting all the utility libraries).

Ivica Kolic said:
I'm only using header only for my utility libraries that never change (like 3DMath.h, Dictionary.h). These are templated so header only is the only option in the first place. Studio, Scene, Mesh, Manafoild and other bigger objects are H and CPP because (like you've said) you don't want to touch any H file that is used by many other libaries because compile time will be more than a few seconds. My engine and editors are somewhere between 100k and 150k in lines of code (counting all the utility libraries).

Ok, that sounds much more like what I would also do, thanks for clarifying :)

Templates obviously are mostly constrained to headers - though I have found a few cases where you can put template implementations into a cpp, not sure if you've seen those. For example, if you know all the possible specializations of a template, you can sort of forward-declare them in a header (with a weird syntax for which I cannot find an example every time I look for it :D) and put the methods into the cpp. Or, a more common case, a private template method in a non-template class can also be put into the cpp, if so desired.
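
Something along these lines, if I recall the syntax right - a rough sketch with a hypothetical Vec3 type:

// Vec3.h
template <typename T>
struct Vec3
{
	T x, y, z;
	T Dot(const Vec3& other) const; // declared here, defined in the cpp
};

// tell other translation units that these instantiations exist elsewhere
extern template struct Vec3<float>;
extern template struct Vec3<double>;

// Vec3.cpp
#include "Vec3.h"

template <typename T>
T Vec3<T>::Dot(const Vec3& other) const
{
	return x * other.x + y * other.y + z * other.z;
}

// explicitly instantiate the only specializations we support
template struct Vec3<float>;
template struct Vec3<double>;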

But I do agree that it doesn't make much difference for files that are rarely changed. You would probably use precompiled headers for such files to speed up general compilation anyway, and then you are forced to recompile completely when they do change. Though even here, I'm waiting for full module support in the hope that it will improve compilation for the cases where you have to touch those files.


Ivica Kolic said:
I'm not OOP SOLID anal.

I like your attitude. : )

You know what i dislike about OOP?
Although it forms tree-like dependencies, which one would think gives a well-organized structure that is easy to observe,
whenever i have to work with OOP code, in practice it's always much more like a spider web from the perspective of a bug.
Crawling around and seeking a way out, just to find there is always an obstacle somewhere preventing me from doing what i want to do.

Personally i use OOP very rarely. Sometimes it's nice. But i'll never get why so many people use this concept for almost everything.

Ivica Kolic said:
not using single file per class. I like to keep things that make sense together grouped. I also prefer header only libraries and no, it won't increase compile time. It takes in average 20 - 25s to fully recompile even my most complex projects.

Well, in this regard i'm really exaggerating. I tend to throw too many things into the same header file during early research, and i'm even too lazy to put large functions into the cpp. Multiple classes which are related, but should be separated. Then i stick with that and postpone the clean-up, accepting long compile times. I only fix the mess once i'm done with it, so it finally compiles fast, but by then i no longer benefit from that. My largest header file currently open is 30k lines of code.

That's really a bad habit of mine. But creating files is so tedious, and writing function definitions twice in h+cpp sucks. So i can't be helped.

Juliean said:
Vectors can still work in those scenarios, depending on how you use it. Preallocating memory can be done by reserve()-ing a sufficient amount of elements. Also, even if you delete elements, you can do one of two things:

Meh - both those options suck. There is no good solution to deal with this imo. But we can accept it all sucks. \:D/

JoeJ said:
Meh - both those options suck. There is no good solution to deal with this imo. But we can accept it all sucks. \:D/

Depends on your point of view :D Even if you don't place the objects themselves in a vector, even using a vector<Object*> still requires O(N) deletion time. You only save some constant factor if the objects themselves are way larger than a pointer, and even more so if the object doesn't have cheap move semantics.

But it's IMHO still better than using a list - right? Please say you agree, because I'm with OP that lists pretty much suck ass for scenarios where you are not deleting elements more often than you are iterating (in which case, what the actual fuck are you doing there :D), and that would pretty much be the only way to have O(1) deletion. Or you use a dictionary, but those also have way worse iteration speed.

Juliean said:
But it's IMHO still better than using a list - right? Please say you agree, because I'm with OP that lists pretty much suck

Because it sucks so hard, even lists can be the best option. I mean, that's how object pools, freelists, etc., have to work. Which is then likely still better than careless dynamic allocations.

It really depends on what you do most and at what cost: iteration, traversal, allocation, etc. Ideally, one option is as bad as the other. Then we can at least do what we like best without feeling guilty : )

The only general concept which helps me in this regard is tuning for granularity. E.g., if my octree causes too much pointer chasing, i might increase the block size from 2^3 to 4^3, trading better memory access patterns for inferior work efficiency.
This also reduces the frequency of node allocations, so the node pool management costs may become negligible, while dynamic allocation using a vector would still be the same problem.

This certainly works, but i only bother with such things when performance really shows a problem, which is rare. Much more likely, we see no clear bottlenecks, and we can't optimize just everything, so we'll leave a lot on the table, and software becomes more inefficient than it could be. Because we're just humans.

Hmmm… maybe there will be some AI assisted programming language, trying such optimization options for us, while we still have enough control to prevent black boxes. I could imagine this makes sense…

JoeJ said:
Because it sucks so hard, even lists can be the best option. I mean, that's how object pools, freelists, etc., have to work. Which then likely is still better then careless dynamic allocations.

As for object pools - I did implement pooling recently for my ECS, but not using lists or anything. The objects are still there in memory, only being treated as if deactivated. For scenarios with lots of spawned objects that have to be destroyed often, this saves a lot of overhead from having to manipulate the vectors all the time, but it still means that accessing entities/components is not based on a list. If I made all those things lists, the performance would degenerate in pretty much every scenario. The way it is, pooling can decrease overhead in certain scenarios by 2-3x (even though the objects are not fully removed and “holes” in the memory layout can appear), but when it's not used the performance is still optimal for iteration.
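
Simplified, the idea is something like this (made-up names, not my actual ECS code):

#include <cstddef>
#include <vector>

// Sketch: "destroyed" entities are only deactivated, so the vector layout is
// never touched on despawn; iteration skips inactive slots and spawning reuses them.
struct Entity
{
	bool active = true;
	// ... components / data ...
};

struct EntityPool
{
	std::vector<Entity> entities;
	std::vector<std::size_t> inactive; // indices of deactivated slots

	std::size_t Spawn()
	{
		if (!inactive.empty())
		{
			std::size_t index = inactive.back();
			inactive.pop_back();
			entities[index].active = true;
			return index;
		}
		entities.push_back({});
		return entities.size() - 1;
	}

	void Despawn(std::size_t index)
	{
		entities[index].active = false; // no vector manipulation at all
		inactive.push_back(index);
	}
};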

That's also what I was saying, btw - in my mind, most of the time when you have a container, you want to iterate/traverse it more often than you modify it. Otherwise, what's the point of the container? If you have a container that you have to recreate 10x per frame but only iterate every few frames, I'm pretty convinced that it would just be better to do the iteration over the raw underlying collection once, instead of maintaining the container in the first place. I'm open to somebody actually showing me a real-world example where a list (which is really bad when you are concerned about dynamic allocations, because every new node is its own allocation) is really the best choice, but I really have not seen anything to convince me yet.

Even for octrees, I'm not convinced that they are really a good solution all the time, or as often as people use them. Obviously, I think that for systems with many millions of elements, like voxels, you'll need those structures, but for anything else… going back to the tests I did on this forum with linear vs. binary search: linear won up until about 3000 elements, even though you would be doing vastly more steps. In the same vein, even a simple O(N) culling without an octree would probably outperform the tree up to that number of elements, even if you end up discarding many elements towards the root (remember, binary search discards half the elements at each jump).
Even an O(N^2) collision detection would likely outperform more optimal structures up to a certain point - even though you'll reach the turning point much quicker. And if you add the additional overhead of maintaining the tree as well, it pushes the break-even point further towards the “dumb” approach.

Not to mention, the overhead that maintaining such structures creates in terms of what needs to be programmed can easily outweigh any benefits you get from it. For example, as I alluded to before, I can now auto-test my 13h game in about 90 seconds at 500x speed, and it uses O(N^2) collision (with the maximum number of objects that need to be checked being about ~100) - which essentially means that the game will barely have any noticeable impact on system load on any configuration you could imagine. What would I gain by adding a quadtree, or any other structure? Even if it makes the max-load case a bit faster, I'd have more headaches and the low-load scenarios would perform worse, so I don't see it.

That's kind of the point I wanted to make. Performance is important, but even “dumb” approaches can outperform smarter ones on today's hardware. And as long as performance is stable, choosing a more optimized approach could end up just wasting your time. In that regard, I do consider vector the best go-to unless you really have one of those rare cases (that I personally cannot imagine) that I referred to before. You could opt to choose a list if it makes your code easier to maintain, though. Similarly, you shouldn't start your renderer with an octree until you both have a large number of objects and notice that linear culling becomes way too slow under the load that you have. And even then, you should measure before and after. At least, unlike the list case, I do believe that there are general cases where structures like (oc)trees are really vital :)

EDIT: Oh yeah, to shit even more on lists: I totally forgot, but this makes lists even worse. Even if removal from a list is technically O(1), you still have to find the node. And that is again O(N), now with a much worse constant factor due to the fragmented memory and indirections. Yeah, you can alleviate that in some circumstances by storing the node pointer in the original object, but obviously only if this list is part of that object's core interface, and not if you use a list somewhere else in the code as a “consumer”.

EDIT2: To add to that, in a vector<Object> you can actually find the iterator/location of an object in O(1) by subtracting the “data”-member's address from the object's address. So now, with all that, removal from a list is an O(N) cache-unfriendly lookup followed by an O(1) removal, while the vector can remove with an O(1) lookup followed by an O(N) move of elements (if using simple erase), or O(1) if using swap & pop. Which means that the downsides of removal from a vector can be completely alleviated.
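
In code, the whole thing boils down to something like this (sketch):

#include <cstddef>
#include <utility>
#include <vector>

// Sketch: O(1) lookup of the index from an element pointer via pointer
// arithmetic against data(), then O(1) swap & pop removal (order not preserved).
template <typename T>
void RemoveByPointer(std::vector<T>& v, T* element)
{
	std::size_t index = static_cast<std::size_t>(element - v.data());
	v[index] = std::move(v.back()); // overwrite the removed slot with the last element
	v.pop_back();
}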

This topic is closed to new replies.
