Back to General and Gameplay Programming

What kind of coding standard do you prefer

iko iko · 2022-12-20T20:57:39

So, other than the official answer “what ever standard client prescribes”, what kind of coding standard do you use at home in your personal projects? Here is mine: Standard is optimized for speed, simplicity and readability and fits on one page. To get the best cache performance it is wise to keep your memory sequential and compact as possible (DOP instead of OOP at least at the lower levels). ECS is main paradigm. using vectors instead of object pools. I want sequential and pre-allocated memory and even if memory isn't preallocated (empty vector) it will become so after a few frames (when enough objects get created). Maybe add another vector to track free elements (or wrap those two ones in a template called ConservativeVector - but that is practically object pool if you preallocate enough memory). Be careful with references. If you have a lot of dynamic object this might not be the best solution. using indexes instead of pointers (this goes together with a point above). In most cases unsigned short will be enough and it will be 4 times smaller than pointer so your data will fit in cache better. This also means that you absolutely don't have to write copy constructors and have any kind of memory management (or very little) - literally entire engines and editors can be written without a single new, delete, malloc and free command (at least for static data like meshes and textures). Another benefit is the code can be run on GPU and CPU all the same (if you wrap recursion). I've implemented ray-tracing octrees this way and did all the debugging on the CPU. My BVH hierarchy copies to GPU the same way. If you have some extremely complex hierarchy, debugging with indices is much easier than reading pointers (easier to draw entire hierarchy on paper and setting break points). using Memory Context class to store all those vectors that would otherwise be created as a local variables in the functions. Of course, these memoryContext objects should be reused (and passed by reference) so memory allocations won't happen on every use but only on first few usages. If you have a thread pool, have one of these objects created for every thread. Be careful with recursive functions. don't use sets or lists - they are 10-20 times slower than vectors. Migrate all your helper libraries to use vectors for internal storage (RadixTries, PatriciaTries, Dictionaries, Octrees, Radix Sort functions, geometry processing functions…) SortForEfficiency is a common function in my helper libraries. Always pass by reference. Don't return vectors, but pass them as input-output parameter. That way memory pre-allocation will be maintained. use JSON for storage and clipboard copy paste. It is simple and it will maintain compatibility within different versions of application and anyone can code plugins in any language they want. Nobody has to waste time with writing custom DSL parser every time. If you are very lazy, you can even just dump those vectors in a file and hope that format won't change but I wouldn't rely on that. greedy thread pool. If it isn't faster than OpenMP you haven't created a fast thread pool. In all my use cases I only needed a thread pool where all the threads run the same task so I've actually omitted single threaded type of task in the latest implementation. It is kind of like async but more. There are 3 levels of task priority and only first one is non blocking. If bottleneck happens all the free threads can help the ones that are still working by preforming sub tasks (there are macros that I use in all of my code that will do parallel processing if thread pool is present or just single core if not). One thing, if some thread submits a task to queue and manages to finish it before anyone helped - that is OK - there won't be any penalty like with OpenMp. I rarely use inheritance and virtual functions but good example of it is the GUI system (it is highly efficient and packs one integer per graphical element instead of entire geometry of 4 vertices which is 28 times bigger like with some other libraries). So, in some cases OOD is more elegant than DOD but I mostly prefer DOD. OOP example is using virtual functions for OnCreate and OnUpdate on your engine's objects. On the other hand, if you use DOD and composition, these function can just be properties of the object. If they are set then call them, otherwise do the default functionality. not using single file per class. I like to keep things that make sense together grouped. I also prefer header only libraries for smaller things that won't change much in the project (many times these classes are templated so there isn't much choice). Bigger classes that are updated frequently should have both H and CPP files. That is it. Almost no memory management and very fast code (for certain types of project where most of the objects is mostly static). Edit: edited for clarity.

General and Gameplay Programming Programming

Started by iko iko December 16, 2022 11:35 AM

82 comments, last by JoeJ 1 year, 11 months ago

Juliean

7,377

December 18, 2022 02:32 PM

Ok, I do have to correct the results a bit.

It wasn't 35x to 72x, but 70x to 72x, in debug. While this also shows a clear impact on the applications overall performance, its not as good as it originally seemed :(
I also forgot one important step: In the old brute-force algorithm, I could still check for the masks early on to skip some potential matched, and even remove some reported collisions, which would result in (empty) script-calls. Doing that, now the brute-force is catching up at halfway, reaching 71x.

So at those numbers, it would be important to actually measure the timings of just those two algorithms under different Ns, to get a better feeling for how much the binning actually wins. Which I'll probably do later, also have to work on the game in between

JoeJ

4,359

December 18, 2022 03:52 PM

Juliean said:
Wuut? Are you sure that you didn't introduce some suble error in the algorithm for eigther case? I mean, it does find the same number of elements, but even I would not have expected this to happen for that many objects.

No. didn't change anything.

Juliean said:
Alternatively, maybe its because with that many elements, the cells all become “satured”

No, shouldn't be.

But likely yes. I will do some statistics on the grid to see if there is some bug.
Likely there is a bug.
I was already confused about having written all the stuff (excluding BVH) in short time and one go, and there was no little problem i had to mess around with.

JoeJ

4,359

December 18, 2022 04:03 PM

Yes! All 5000 objects go into one single cell. Single digit counts in two other cells. So my math is wrong. All my grid results were bullshit too.

I'll edit posts, fix the bug, and post new results later…

/:-\

JoeJ

4,359

December 18, 2022 09:41 PM

Juliean said:
The code is here: https://pastebin.com/gJRJ3M0V

static constexpr bool MASK[NUM_TYPES][NUM_TYPES] =
                {
                    {true, true, true, true, true},     // ALL

um, does this give the same performance as int all = 0b11111, and old school bit math?

Your code really makes me feel old. I can't read it. It's too fancy modern. <: )

My bug was easy to find. I have transformed the point to grid space, but then accidentally used the original point to get grid coordinates.
After that, the usual suspects of one off errors and boundary conditions came up. So i felt at home with damn grids. But world is sane again. Perf. is now as expected.

But now i have the other problem i forgot about: With grid methods it's no longer guaranteed A finds B, and B also finds A. So i can't simply compare address to find out if the collision is already processed.
I will add an index per object, and use N^2 bits for a look up. This sucks, but i never came up with a better solution for this issue.

And i have added a nice histogram to visualize timing results with increasing N : ) Currently grids do all very similar, 4-5 * faster than BVH, and brute force sky rockets. Still need to add the sorting method.

JoeJ

4,359

December 19, 2022 12:24 AM

Ok, so here the bugfixed results:

Sorry for the confusion from before. But the histogram is a nice compensation for that, i hope. ; ) It shows the timings from 0 - 10000 objects from left to right.
I have disabled brute force and sorting graphs after 5000 objects, because frame rate became too bad to fill histogram while dragging the slider.

The only problem is that the results are now pretty boring, without surprises. List per grid cell wins, followed by vector of objects per cell, then vector of pointers.
BVH isn't that bad, if you can't afford huge grids. Quadtree should be pretty similar i guess.

@ivica kolic Please take a look at the sorting code and propose improvement. I do not understand how to benefit from sorting by size additionally, or having both start and end point sorted. I did just a trivial search around the current object, and assume there's a some way?

Btw, in the past i have tried to sort by morton code, to get some multi-dimension improvement. But that's bad. Depending on query position, you might need to go through all objects.

@juliean I did not mind to make a local references on other static variables. Usually i never use them, only in such small test snippets.
Now i need to ask again: Does each access to static variables include a check if they are already initialized? Even every time in a loop, for example?
If so, i should for sure change this, as most data here is static vectors.

Edit: code updated with faster tiled BVH approach

		static bool testSimpleBVH = 1; ImGui::Checkbox("testSimpleBVH", &testSimpleBVH);
		if (testSimpleBVH)
		{
			// make some random objects

			static bool init = true;
			static float maxRadius = 0.025f;
			static int objectCount = 0; 
			if (ImGui::SliderInt ("objectCount", &objectCount, 0, 10000)) init = true;
			if (ImGui::SliderFloat ("maxRadius", &maxRadius, 0.001f, 0.1f)) init = true;

			static vec2 anim(0.f, 0.f); ImGui::SliderFloat2("anim", (float*)&anim, -1.f,1.f);
			
			struct Object
			{
				vec2 pos;
				float radius;
				Object *next = 0; // used only for the grid list
				int index = 0;

				Object()
				{
					pos = vec2( float(std::rand()) / float(RAND_MAX), float(std::rand()) / float(RAND_MAX));
					pos += anim * maxRadius;
					pos[0] = max(maxRadius*1.01f, min(1.f-maxRadius*1.1f, pos[0]));
					pos[1] = max(maxRadius*1.01f, min(1.f-maxRadius*1.1f, pos[1]));
					radius = (1.f + 3.f * float(std::rand()) / float(RAND_MAX)) * .25f * maxRadius;
				}
			};

			static std::vector<Object> objects;
			static std::vector<bool> pairMask;
			if (init) 
			{
				objects.resize(objectCount);
				pairMask.resize(objectCount*objectCount);
			}
			
			std::srand(1);
			for (int i=0; i<objects.size(); i++) 
			{
				auto &obj = objects[i];
				obj = Object();
				obj.index = i;
				RenderCircle(obj.radius, obj.pos, vec(0,0,1), 1,1,1);
			}



			auto Intersects = [](const Object &o0, const Object &o1)
			{
				float sqDist = vec2(o0.pos - o1.pos).SqL();
				float dist = sqrt(sqDist);
				return (dist < o0.radius + o1.radius);
			};

			auto IntersectsG = [&](const Object &o0, const Object &o1)
			{
				int i0 = o0.index;
				int i1 = o1.index;
				if (i0 == i1) 
					return false;
				if (i0 > i1) std::swap(i0, i1);
				int bitIndex = i0*objectCount + i1;
				if (pairMask[bitIndex])
					return false; // already processed
				pairMask[bitIndex] = true;

				float sqDist = vec2(o0.pos - o1.pos).SqL();
				float dist = sqrt(sqDist);
				return (dist < o0.radius + o1.radius);
			};





			// brute force
			static bool doBruteForce = 0; ImGui::Checkbox("doBruteForce", &doBruteForce);
			uint64_t timeBruteForce = ULLONG_MAX;
			if (doBruteForce && objects.size()<5000) // too slow to fill histogram with larger number
			{
				uint64_t time = SystemTools::GetTimeNS();
				timeBruteForce = time;

				const int count = objects.size();
				int collCount = 0;
				for (int i=0; i<count-1; i++)
				for (int j=i+1; j<count; j++)
				{
					const auto &o0 = objects[i];
					const auto &o1 = objects[j];
					if (Intersects(o0, o1))
						collCount++, RenderLine(o0.pos, o1.pos, 1,0,0); // appends vertices and color to a buffer
				}

				ImGui::TextColored(ImVec4(1,0,0,1), "Brute force collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeBruteForce = time-timeBruteForce;
			}





			// grid...

			int gridDim = (int)ceil(.5f / maxRadius);
			float gridRes = float(gridDim);
			float cellSize = 1.f / gridRes;

			ImGui::Text("\nGrid resolution: %i^2 cells", gridDim); 

			auto PointToGridCoords = [&](int &x, int &y, const vec2 &p)
			{
				vec2 pg = p * gridRes;// + vec2(.5f);
				x = (int) pg[0];
				y = (int) pg[1];
				assert (x>=0 && x<gridDim &&
						y>=0 && y<gridDim);						
			};

			if (1) // debug grid
			{
				auto VisCell = [=](int x, int y)
				{
					vec2 p0 (float(x) / gridRes, float(y) / gridRes);
					vec2 p1 (float(x+1) / gridRes, float(y+1) / gridRes);
					RenderAABBox (p0, p1, 1,1,1);
				};
				RenderAABBox (vec2(0,0), vec2(1,1), 1,1,1);
				static vec2 point(0.5f, 0.5f); ImGui::SliderFloat2("point", (float*)&point, maxRadius*1.1f, 1.f-maxRadius*1.1f);
																									 
				int x, y; PointToGridCoords (x, y, point);
				VisCell (x,y);
				vec2 bmin = point - vec2(maxRadius, maxRadius);
				vec2 bmax = point + vec2(maxRadius, maxRadius);
				int minX, minY; PointToGridCoords (minX, minY, bmin);
				int maxX, maxY; PointToGridCoords (maxX, maxY, bmax);
				for (int y=minY; y<=maxY; y++)
				for (int x=minX; x<=maxX; x++)
				{
					VisCell (x,y);
				}
				RenderPoint(point, 1,0,0);
				RenderAABBox (bmin, bmax, 1,0,0);
			}



			// grid of std::vector, adding each object to multiple cells 
			static bool doGridVector = 0; ImGui::Checkbox("doGridVector", &doGridVector);
			uint64_t timeGridV = ULLONG_MAX;
			if (doGridVector)
			{
#if 0
				uint64_t time = SystemTools::GetTimeNS();
				std::vector<std::vector<Object*>> grid;
				grid.resize(gridDim*gridDim);
#else
				static std::vector< std::vector<Object*> > gridGlobal;
				if (init || !gridGlobal.size()) gridGlobal.resize(gridDim*gridDim);
				std::fill(pairMask.begin(), pairMask.end(), false); // excluding from measurement, becasue that's soemtimes ridiculously slow and should be replaced

				uint64_t time = SystemTools::GetTimeNS();
				timeGridV = time;

				auto& grid = gridGlobal;

				for (auto &list : grid) list.clear();
#endif

				for (auto &obj : objects)
				{
					int minX, minY; PointToGridCoords (minX, minY, obj.pos - vec2(obj.radius, obj.radius));
					int maxX, maxY; PointToGridCoords (maxX, maxY, obj.pos + vec2(obj.radius, obj.radius));
					assert (maxX-minX<2 && maxY-minY<2);

					for (int y=minY; y<=maxY; y++)
					for (int x=minX; x<=maxX; x++)
					{
						grid[x + y*gridDim].push_back(&obj);
						//VisCell (x, y);
					}
				}

				ImGui::TextColored(ImVec4(0,1,0,1), "Build grid of vectors time: %s", SystemTools::GetDurationStringAndRestart(time).c_str());
				
				int collCount = 0;
				for (const auto &o0 : objects)
				{
					int x, y; PointToGridCoords (x, y, o0.pos);
					auto &list = grid[x + y*gridDim];
					for (const auto *o1 : list)
					{
						if (IntersectsG(o0, *o1))
							collCount++, RenderLine(o0.pos, o1->pos, 0,1,0);
					}
				}

				ImGui::TextColored(ImVec4(0,1,0,1), "Found collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeGridV = time - timeGridV;
			}


			// grid of std::vector, adding a copy of each object to multiple cells 
			static bool doGridVectorCopy = 0; ImGui::Checkbox("doGridVectorCopy", &doGridVectorCopy);
			uint64_t timeGridVC = ULLONG_MAX;
			if (doGridVectorCopy)
			{

				static std::vector< std::vector<Object> > gridGlobal;
				if (init || !gridGlobal.size()) gridGlobal.resize(gridDim*gridDim);
				std::fill(pairMask.begin(), pairMask.end(), false);

				uint64_t time = SystemTools::GetTimeNS();
				timeGridVC = time;

				auto& grid = gridGlobal;

				for (auto &list : grid) list.clear();


				for (auto &obj : objects)
				{
					int minX, minY; PointToGridCoords (minX, minY, obj.pos - vec2(obj.radius, obj.radius));
					int maxX, maxY; PointToGridCoords (maxX, maxY, obj.pos + vec2(obj.radius, obj.radius));
					assert (maxX-minX<2 && maxY-minY<2);

					for (int y=minY; y<=maxY; y++)
					for (int x=minX; x<=maxX; x++)
					{
						grid[x + y*gridDim].push_back(obj);
					}
				}

				ImGui::TextColored(ImVec4(0,0,1,1), "Build grid of copy vectors time: %s", SystemTools::GetDurationStringAndRestart(time).c_str());

				if (0) // grid statistics
				{
					int sum = 0;
					int maxCount = 0;
					for (auto &list : grid)
					{
						sum += list.size();
						maxCount = max(maxCount, list.size());
						//ImGui::Text("  size: %i", list.size());

					}
					ImGui::Text("  sum of all counts: %i", sum);
					ImGui::Text("  average count per cell: %f", float(sum) / float(grid.size()));
					ImGui::Text("  max count: %i", maxCount);

					uint64_t time = SystemTools::GetTimeNS();
					timeGridVC = time - timeGridVC;
				}

				
				int collCount = 0;
				for (const auto &o0 : objects)
				{
					int x, y; PointToGridCoords (x, y, o0.pos);
					auto &list = grid[x + y*gridDim];
					for (const auto &o1 : list)
					{
						if (IntersectsG(o0, o1))
							collCount++, RenderLine(o0.pos, o1.pos, 1,1,1);
					}
				}

				ImGui::TextColored(ImVec4(0,0,1,1), "Found collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeGridVC = time - timeGridVC;
			}


			// grid of lists, adding each object to only one cell 
			static bool doGridList = 0; ImGui::Checkbox("doGridList", &doGridList);
			uint64_t timeGridL = ULLONG_MAX;
			if (doGridList)
			{
				static std::vector<Object*> grid;
				if (init || !grid.size()) grid.resize(gridDim*gridDim);
				std::fill(pairMask.begin(), pairMask.end(), false);

				uint64_t time = SystemTools::GetTimeNS();
				timeGridL = time;

				std::fill (grid.begin(), grid.end(), nullptr);

				for (auto &obj : objects)
				{
					int x, y; PointToGridCoords (x, y, obj.pos);
					Object* &cell = grid[x + y*gridDim];

					obj.next = cell;
					cell = &obj;					
				}

				ImGui::TextColored(ImVec4(1,0,1,1), "Build grid of lists time: %s", SystemTools::GetDurationStringAndRestart(time).c_str());
				
				int collCount = 0;
				for (const auto &o0 : objects)
				{
					int minX, minY; PointToGridCoords (minX, minY, o0.pos - vec2(o0.radius, o0.radius));
					int maxX, maxY; PointToGridCoords (maxX, maxY, o0.pos + vec2(o0.radius, o0.radius));

					for (int y=minY; y<=maxY; y++)
					for (int x=minX; x<=maxX; x++)
					{
						Object *o1 = grid[x + y*gridDim];
						while (o1)
						{
							if (IntersectsG(o0, *o1))
								collCount++, RenderLine(o0.pos, o1->pos, 1,1,1);
							o1 = o1->next;
						}
					}
				}

				ImGui::TextColored(ImVec4(1,0,1,1), "Found collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeGridL = time - timeGridL;
			}


			
			// BVH
			static bool doBVH = 1; ImGui::Checkbox("doBVH", &doBVH);
			uint64_t timeBVH = ULLONG_MAX;
			if (doBVH)
			{
				std::fill(pairMask.begin(), pairMask.end(), false); // needed only for the tiling approach

				// we use an axis aligned bounding box for our bounding volume
			
				struct AABox 
				{
					vec2 minmax[2];
	
					void Init ()
					{
						minmax[0] = vec2 (FLT_MAX, FLT_MAX);
						minmax[1] = vec2 (-FLT_MAX, -FLT_MAX);
					}

					void Extend (const AABox &b)
					{
						minmax[0][0] = min (minmax[0][0], b.minmax[0][0]);
						minmax[0][1] = min (minmax[0][1], b.minmax[0][1]);

						minmax[1][0] = max (minmax[1][0], b.minmax[1][0]);
						minmax[1][1] = max (minmax[1][1], b.minmax[1][1]);
					}

					vec2 Center () const
					{
						return (minmax[0] + minmax[1]) * 0.5f;
					}

					bool BoundsPoint (const vec2 point) const
					{
						for (int i=0; i<2; i++)
						{
							if (point[i] < minmax[0][i] || point[i] > minmax[1][i])
								return false;
						}
						return true;
					}

					bool IntersectsBox (const AABox &box) const
					{
						for (int i=0; i<2; i++)
						{
							if (box.minmax[1][i] < minmax[0][i] || box.minmax[0][i] > minmax[1][i])
								return false; // disjoint
						}
						return true; // intersecting
					}

					static AABox& Empty()
					{
						static AABox empty = {vec2 (FLT_MAX, FLT_MAX), vec2 (-FLT_MAX, -FLT_MAX)};
						return empty;
					}
				};



				// BVH

				struct BVHNode
				{
					AABox bound;
					int firstChild; // if negative, it's an index to a triangle
					int childCount;
				};

				// build BVH

				uint64_t time = SystemTools::GetTimeNS();
				timeBVH = time;

				static int levels = 5; ImGui::SliderInt ("levels", &levels, 1, 10);

				// create leaf per object, and make them all children of the root

				std::vector<BVHNode> leafNodes (objects.size());
				std::vector<BVHNode> tree (1);

				constexpr int rootIndex = 0;
				BVHNode &root = tree[rootIndex];
				root.bound.Init();
				for (int i=0; i<(int)objects.size(); i++)
				{
					Object &obj = objects[i];
					BVHNode &leaf = leafNodes[i];
					leaf.bound.minmax[0] = obj.pos - vec2(obj.radius, obj.radius);
					leaf.bound.minmax[1] = obj.pos + vec2(obj.radius, obj.radius);
					leaf.firstChild = -1 - i;
					leaf.childCount = 0;

					// extend root box
					root.bound.Extend(leaf.bound);
				}
				root.firstChild = 0;
				root.childCount = objects.size();
				//RenderAABBox (root.bound.minmax[0], root.bound.minmax[1], 1,0,0);

				// build internal hierarchy

				std::vector<int> stack;
				stack.push_back(rootIndex); // add root to process
				int head = 0;
				int tail = 1;

				std::vector<BVHNode> reorderedLeafs (objects.size()); // temporary buffer storing leafs in sorted order

				for (int level=0; level<levels; level++) // for each iteration adding one new level of hierarchy
				{
					int sortedIndex = 0; // we use this to reorder leafs to the newly sorted state after adding the level
					for (int i=head; i<tail; i++) // process all nodes of the current level, now becoming parents to subdivide
					{
						int parentIndex = stack[i];
						BVHNode &parent = tree[parentIndex];
						vec2 parentCenter = parent.bound.Center();

						// we make a simple BVH4, so subdividing children into quadrants from the center of the parent bound
						int qCount[4] = {0,0,0,0}; // for each quadrant, count how many leafs are in
						AABox qBox[4] = {AABox::Empty(), AABox::Empty(), AABox::Empty(), AABox::Empty()}; // for each quadrant make a new bound

						int firstChild = parent.firstChild;
						int childCount = parent.childCount;
						for (int n=0; n<childCount; n++)
						{
							int leafIndex = firstChild + n;
							BVHNode &leaf = leafNodes[leafIndex];

							vec2 diff = leaf.bound.Center() - parentCenter;
							int quadrant = (diff[0] > 0) | ((diff[1] > 0)<<1);
						
							leaf.childCount = (qCount[quadrant]<<2) | quadrant; // we alias this variable to store the leafs new subindex + quadrant

							qCount[quadrant]++;
							qBox[quadrant].Extend(leaf.bound);
						}

						// create new children from quadrants
						int firstQuadrantIndex[4];
						int newFirstChild = -1;
						int newChildCount = 0;
						for (int q=0; q<4; q++) if (qCount[q])
						{
							int newChildIndex = (int)tree.size();
							if (newFirstChild == -1) 
								newFirstChild = newChildIndex;
							newChildCount++;

							tree.push_back(BVHNode());
							BVHNode &newChild = tree.back();

							newChild.bound = qBox[q];
							newChild.childCount = qCount[q];
							newChild.firstChild = sortedIndex;

							firstQuadrantIndex[q] = sortedIndex;
							sortedIndex += qCount[q];

							stack.push_back(newChildIndex); // refine further
						}
						tree[parentIndex].firstChild = newFirstChild;
						tree[parentIndex].childCount = newChildCount;

						// iterate children again, to calculate their final newly sorted index and reordering them accordingly
						for (int n=0; n<childCount; n++)
						{
							int leafIndex = firstChild + n;
							BVHNode &leaf = leafNodes[leafIndex];

							int quadrant = (leaf.childCount & 3);
							int subIndex = (leaf.childCount)>>2;
							int newIndex = firstQuadrantIndex[quadrant] + subIndex;
							assert(newIndex < (int)leafNodes.size());
							//leaf.childCount = newIndex; // alias again to store the leafs newly sorted index
							reorderedLeafs[newIndex] = leaf;
						}
					}

					// reorder the leafs
					/*for (int i=0; i<(int)leafNodes.size(); i++)
					{
						BVHNode &leaf = leafNodes[i];
						int newIndex = leaf.childCount;
						assert(newIndex < (int)reorderedLeafs.size());
						reorderedLeafs[newIndex] = leaf;
					}*/

					std::swap(reorderedLeafs, leafNodes);

					head = tail;
					tail = (int)stack.size();
					//ImGui::Text("build level %i completed.  head %i  tail %i", level, head, tail);
				}

				// add leafs to tree

				int offset = (int)tree.size();
				std::copy(leafNodes.begin(), leafNodes.end(), std::back_inserter(tree));
				for (int i=head; i<tail; i++)
				{
					int parentIndex = stack[i];
					BVHNode &parent = tree[parentIndex];
					parent.firstChild += offset;
				}

				ImGui::TextColored(ImVec4(1,1,0,1), "Build BVH time: %s", SystemTools::GetDurationStringAndRestart(time).c_str());

				// visually proof sorted order from color pattern
				vec2 prevCenter;
				if (0) for (int i=0; i<(int)leafNodes.size(); i++)
				{
					BVHNode &leaf = leafNodes[i]; 
					float t = float(i) / float(leafNodes.size()) * 0.66f;
					vec2 center = leaf.bound.Center();
					RenderPoint(center, Vis::RainbowR(t), Vis::RainbowG(t), Vis::RainbowB(t));
					if (i) RenderLine(center, prevCenter, Vis::RainbowR(t), Vis::RainbowG(t), Vis::RainbowB(t));
					prevCenter = center;
				}


#if 0			// collision detection - traversal per object

				int collCount = 0;
				for (const auto &o0 : objects)
				{
					AABox query;
					query.minmax[0] = o0.pos - vec2(o0.radius, o0.radius);
					query.minmax[1] = o0.pos + vec2(o0.radius, o0.radius);

					stack.clear();
					stack.push_back(rootIndex);
					head = 0;
					tail = 1;

					for (int level=0; level<=levels; level++) // each iteration traverses one level of the hierarchy in breadth first order
					{
						for (int i=head; i<tail; i++) // traverse all nodes of the current level
						{
							int parentIndex = stack[i];
							const BVHNode &parent = tree[parentIndex];
					
							int firstChild = parent.firstChild;
							int childCount = parent.childCount;
							for (int n=0; n<childCount; n++)
							{
								int childIndex = firstChild + n;
								const BVHNode &child = tree[childIndex];
								if(!child.bound.IntersectsBox(query))
									continue;

								//float t = float(level) / float(levels) * 0.66f;
								//RenderAABBox (child.bound.minmax[0], child.bound.minmax[1], Vis::RainbowR(t), Vis::RainbowG(t), Vis::RainbowB(t));

								if (level<levels)
									stack.push_back(childIndex);
								else
								{
									int objectIndex = -child.firstChild -1;
									const Object &o1 = objects[objectIndex];
									if (&o0 < &o1 && Intersects(o0, o1))
										collCount++, RenderLine(o0.pos, o1.pos, 1,1,1);
								}
							}
						}

						head = tail;
						tail = (int)stack.size();
					}
				}

				ImGui::TextColored(ImVec4(1,1,0,1), "Found collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeBVH = time - timeBVH;

#else			// collision detection - traversal per brute force tile

				std::vector<Object> objectRange;
				static int tiles = 8; ImGui::SliderInt("tile res", &tiles, 4, 20);
				int collCount = 0;

				for (int y=0; y<tiles; y++)
				for (int x=0; x<tiles; x++)
				{
					AABox query;
					query.minmax[0] = vec2(float(x) / float(tiles), float(y) / float(tiles));
					query.minmax[1] = vec2(float(x+1) / float(tiles), float(y+1) / float(tiles));
					//RenderAABBox(query.minmax[0],query.minmax[1], 0,1,0);					
					objectRange.clear();

					stack.clear();
					stack.push_back(rootIndex);
					head = 0;
					tail = 1;

					for (int level=0; level<=levels; level++) // each iteration traverses one level of the hierarchy in breadth first order
					{
						for (int i=head; i<tail; i++) // traverse all nodes of the current level
						{
							int parentIndex = stack[i];
							const BVHNode &parent = tree[parentIndex];
					
							int firstChild = parent.firstChild;
							int childCount = parent.childCount;
							for (int n=0; n<childCount; n++)
							{
								int childIndex = firstChild + n;
								const BVHNode &child = tree[childIndex];
								if(!child.bound.IntersectsBox(query))
									continue;

								//float t = float(level) / float(levels) * 0.66f;
								//RenderAABBox (child.bound.minmax[0], child.bound.minmax[1], Vis::RainbowR(t), Vis::RainbowG(t), Vis::RainbowB(t));

								if (level<levels)
									stack.push_back(childIndex);
								else
								{
									int objectIndex = -child.firstChild -1;
									objectRange.push_back(objects[objectIndex]);
								}
							}
						}

						head = tail;
						tail = (int)stack.size();
					}

					//ImGui::Text("o: %i", objectRange.size());

					const int count = objectRange.size();
					for (int i=0; i<count-1; i++)
					for (int j=i+1; j<count; j++)
					{
						const auto &o0 = objectRange[i];
						const auto &o1 = objectRange[j];
						if (IntersectsG(o0, o1))
							collCount++, RenderLine(o0.pos, o1.pos, 1,1,0); // appends vertices and color to a buffer
					}
					
				}

				ImGui::TextColored(ImVec4(1,1,0,1), "Found collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeBVH = time - timeBVH;
#endif
			}



			// Sorting per dimension
			static bool doSort = 0; ImGui::Checkbox("doSort", &doSort);
			uint64_t timeSort = ULLONG_MAX;
			if (doSort && objects.size()<5000) // Ivica Kolic, https://www.gamedev.net/forums/topic/713512-what-kind-of-coding-standard-do-you-prefer/5453647/?page=4
			{ 
				std::fill(pairMask.begin(), pairMask.end(), false);

				std::vector<int> _sortedObjects[2]; // For each dimension we have object indices sorted by start position. This is our acceleration structure 
				// Tmp stuff to be persisted in a permanent memory context object just so we don't allocate these local vectors all the time
				std::vector<float> objectKeys; // tmp vector (but reuse vector for every time) - I should ommit this and just use objects with lambda for specifying sorting key (which can be start position, end position or size).
				std::vector<int> swapBuffer; // Nice to have persisted so that radix sort doesn't have to use its own memory allocations on each use

				_sortedObjects[0].resize(objects.size());
				_sortedObjects[1].resize(objects.size());

				// Let's update accelleration structure:

				objectKeys.reserve(objects.size());
				swapBuffer.resize(objects.size()); // Usefull preallocated tmp memory


				uint64_t time = SystemTools::GetTimeNS();
				timeSort = time;

				// For each dimension we will first sort by size (or maybe end position)
				for (int dim = 0; dim < 2; ++dim)
				{
					// // Sort by size:
					// objectKeys.clear();
					// //for (auto& o : objects) objectKeys.push_back(o.GetMax()[dim] - o.getMin()[dim]); // This should be lambda but...
					// for (auto& o : objects) objectKeys.push_back(o.radius * 2.f); // This should be lambda but...
					// RadixSortIndexed<float, int>(_sortedObjects[dim].data(), objectKeys.data(), objects.size(), true, swapBuffer.data());
	
					// Sort by start pos:
					objectKeys.clear();
					//for (auto& o : objects) objectKeys.push_back(o.GetMin()[dim]); // This should be lambda but...
					for (auto& o : objects) objectKeys.push_back(o.pos[dim]/* - o.radius*/); // This should be lambda but...
					RadixSortIndexed<float, int>(_sortedObjects[dim].data(), objectKeys.data(), objects.size(), true, swapBuffer.data());
				}

				ImGui::TextColored(ImVec4(0,1,1,1), "Sorting time: %s", SystemTools::GetDurationStringAndRestart(time).c_str());

				//int D=1;
				//for (int i=1; i<_sortedObjects[D].size(); i++)
				//{
				//	int i0 = _sortedObjects[D][i-1];
				//	int i1 = _sortedObjects[D][i];
				//	RenderLine(objects[i0].pos, objects[i1].pos, 1,0,1);
				//}

				int collCount = 0;
				for (int D=0; D<2; D++) // instead proposed binary search to find an onject in selscted dimension sort sequence, iterate per dimansion to spare the search
				{
					const auto &seq = _sortedObjects[D];
					const int seqSize = seq.size();
					for (int i=0; i<seqSize; i++)
					{
						int sortedI0 = seq[i];
						const auto &o0 = objects[sortedI0];

						// select dimension
						int d = ((12829 * o0.index)>>3) & 1; // Should choose the shorter dimension of object bounding box, but since using spheres, we use a random dim. 
															// Density histogram might be better than object box, but does not make sense here either.
						if (d!=D) 
							continue; // process only if choice matches

						// search for potential collision range in sorted sequence

						float rangeMin = o0.pos[D] - o0.radius - maxRadius;					
						int rangeI = i;
						while (--rangeI >= 0)
						{
							int sortedI1 = seq[rangeI];
							const auto &o1 = objects[sortedI1];
							if (o1.pos[D] < rangeMin)
								break;

							if (IntersectsG(o0, o1))
								collCount++, RenderLine(o0.pos, o1.pos, 0,1,1);
						}

						float rangeMax = o0.pos[D] + o0.radius + maxRadius;
						rangeI = i;
						while (++rangeI < seqSize)
						{
							int sortedI1 = seq[rangeI];
							const auto &o1 = objects[sortedI1];
							if (o1.pos[D] > rangeMax)
								break;

							if (IntersectsG(o0, o1))
								collCount++, RenderLine(o0.pos, o1.pos, 0,1,1);
						}
					}
				}

				ImGui::TextColored(ImVec4(0,1,1,1), "Found collisions: %i time: %s", collCount, SystemTools::GetDurationStringAndRestart(time).c_str());
				timeSort = time - timeSort;
			}

			// histogram of timings
			{
				constexpr int R = 200;
				static std::vector< std::array<uint64_t,6> > histogram(R, std::array<uint64_t,6>({ULLONG_MAX,ULLONG_MAX,ULLONG_MAX,ULLONG_MAX,ULLONG_MAX,ULLONG_MAX}));
				int i = min(R-1, objects.size() * (R-1) / 10000);
				ImGui::Text("hbin: %i", i);
				histogram[i][0] = min(histogram[i][0], timeBruteForce);
				histogram[i][1] = min(histogram[i][1], timeGridV);
				histogram[i][2] = min(histogram[i][2], timeGridVC);
				histogram[i][3] = min(histogram[i][3], timeGridL);
				histogram[i][4] = min(histogram[i][4], timeBVH);
				histogram[i][5] = min(histogram[i][5], timeSort);

				static float scale = 1.f; ImGui::DragFloat("histogram scale", &scale, 0.01f);
				for (int j=1; j<R; j++) 
				{
					float t0 = float(j-1) / (R-1);
					float t1 = float(j) / (R-1);
					for (int i=0; i<6; i++)
					{
						float k0 = float(histogram[j-1][i]) * 0.00000002f * scale;
						float k1 = float(histogram[j][i]) * 0.00000002f * scale;
						RenderLine (vec2(t0+1, k0), vec2(t1+1, k1), i==0||i==3||i==4, i==1||i==4||i==5, i==2||i==5||i==3);
					}
				}
			}

			//if (objectCount<10000) objectCount+=10;
			init = 0;
		}

JoeJ

4,359

December 19, 2022 12:45 AM

Oh, i forgot to adjust BVH tree levels with higher object counts. This isn't calculated automatically here.

So it does a bit better:

iko iko

Author

104

December 19, 2022 08:03 AM

JoeJ said:
@ivica kolic Please take a look at the sorting code and propose improvement. I do not understand how to benefit from sorting by size additionally, or having both start and end point sorted. I did just a trivial search around the current object, and assume there's a some way?

Sorting by size makes no sense here. I've used when sorting triangles of a mesh where many triangles have the same start position (start vertex) and if I exclude the largest one (left search) I can skip all the triangles with the same starting position since I know that they are shorter. The problem with the Radix code that I've given you is that it resets indices at every sort (to 0, 1 , 2, 3, 4) so there is no pointing in double sorting.

In any case, I think you've proven that this isn't good data structure for physic engine. It is good enough for triangle meshes for finding all triangles within some bounding box (it handles long triangles better than octrees (except if you decimate triangles which takes too long).

Juliean

7,377

December 19, 2022 08:50 AM

JoeJ said:
@juliean I did not mind to make a local references on other static variables. Usually i never use them, only in such small test snippets. Now i need to ask again: Does each access to static variables include a check if they are already initialized? Even every time in a loop, for example?

Well I did just test it in godbolt, and in my simple examples I couldn't reproduce a difference (in generated ASM). So eigther compiler got smarter about this, or you need a more complicated examples for this to make a difference.

JoeJ said:
um, does this give the same performance as int all = 0b11111, and old school bit math?

Interesting question. I did do another quick comparison in godbolt, using std::bitset for the integer-math (this is ought to get optimized to the same as integer-math you suggested, but feel free to add this yourself in the link, I just don't feel doing that :P):

https://godbolt.org/z/P54vbW5s1

There

JoeJ

4,359

December 19, 2022 08:54 AM

I have added a tiled approach for the BVH. Because grids are rarely an option in 3D, i want to show BVH can do nearly as good.

It's a very simple apporach: Instead doing a traversal for each object, i tile the space and do one traversal per tile. Copies of found objects are then processed with brute force efficiently.
This could be improved in many ways, but actually i think a sparse grid would be generally better for the problem.

The actual timings on the right are for 10000 objects on a single core, so we can estimate what's about practical. Ofc. the scene is much too dense to make sense, as collision count is 16x higher than object count, but still.

But now i must stop this, and go back to real work… :D

Will update the code in the former post.

Juliean

7,377

December 19, 2022 12:57 PM

JoeJ said:
The actual timings on the right are for 10000 objects on a single core, so we can estimate what's about practical. Ofc. the scene is much too dense to make sense, as collision count is 16x higher than object count, but still. But now i must stop this, and go back to real work… :D Will update the code in the former post.

If you ever end up having time to revisit this again, I'd be interested in one last thing. If you removed the “list” linking inside the objects themself and place them in a separate structure (to simulate the example that I mentioned where you can't store such a link in the objects themselves), how much does that affect performance for the list-case? No hurry, just something I'm curious about.

----------------------------------------------

I have now finished my own implementation of the bucket-sorted collision masks, and well, its so-and-so.

I did end up changing a few things about it: I did cache the buckets, using my already existing entity-cache dirty notification mechanism. This still creates it a few times too often then really need be, because there are cases where the cache gets flagged dirty and those buckets wouldn't need to. But this is still something that could be improved in the future. In general, this seems to be faster than recreating the buckets each frame, even if I now have to calculate some data more times during the actual collision check. Also, this change was sorta necessary for the integration of this system for actual collision, as there are multiple points in a frame where the actual content of the buckets could need to change.
I also ended up creating an array<vector> for seeing which buckets need to be checked after all - it seems to be at least as fast, but requires less if/nestings. Also somelike this would be required if I started to un-hardcode the settings and allow it to be configured via the editor, as for what can collide with what.

Debug-performance for running the tests did improve to something like 70x→75x for complicated recordings, which is definately a nice thing. cuts the time needed to run a 17 minute recording down by a few seconds, which is neat I guess (no, really, this does matter in practice as I run the editor mainly in debug, and run those tests often).

Now for release, after everything is done, I cannot really say that anything has improved. Differences in timings for executing the full release-recording between the old brute-force and the new version are all in a range with enough fluctuation between runs, that I cannot really say eigther one is faster. At least the new system is not noticable slower, but it doesn't seem to improve much for this case. Probably because bottlenecks are on other places. At a relative speed of 10 seconds per frame(!), there are lot of switches between scenes happening each frame, meaning I might be locked by those. Would need to profile to find out more.

Unfortunately I introduced a bug somewhere in the legacy-code, so I cannot easily test the case of “mask, but no buckets” anymore. But I'd guess its somewhere in the middle. Certain parts of the code already didn't really execute in O(N^2), due to most entities being flagged “inactive” and thus not running their own pass through the array at all.

One case that did noticably improve is the “how many player-characters can I have in a scene?"-benchmark I run every few months. That one is a bit more interesting (albeit for now purely hypothetical in nature for now).
With the new system, I can have about 4096 “player” objects in the scene while running at 14ms (in release), without any additional modifications. Previously, I could have around the same, but I had to deactivate collision/triggers on all of them, otherwise I'd end up with 400ms. This is a little bit "cheated", because well, by the way the new system works, no player needs to collide with another (this is by design), and since no buckets have to be recreated, the runtime-overhead of collision/triggers are now, well, 0. Though it is also a nice application of this feature, and kind of reflects what this system (in practice) does for my game (even though speedups are not that apparent). In the game, most things only need to check collision with a few things. Player needs to trigger enemies (and vice versa), projectiles only need to trigger eigther player or enemy, depending on who shoots them, etc… So for my max-load object counts, with ~100 projectiles and one player, this means that I only need to do one check for each projectile.
Additional clarifiction: The player-objects are complex, in fact the most complex object of the game. They have a dynamic ARPG-controller with lots of different actions/interactions which, in this implementation at least, require lots of additional checks/queries to be executed each frame, via custom scripts.

So, maybe a grid could bring this even further down, but since we already went from O(N^2) to a realistic O(N) with minimal updates to the buckets, with no real-world improvement… I don't think I'm going to go further. Grid would be more complicated, we would need to mark it dirty many more times when entities changes their properties or transform. I'm still kind of on the fence about if it was “worth” it. I guess I can now judge the performance of my engine with my player-object overload-test without having to turn off specific systems, which is nice. And in a few years down the line, those seconds I save on running tests might make up for the time I had to spend changing this ? And maybe I'll be doing a 2d-game with more objects in eve more years, who knows. At least I now got a (theoretically configurable) mask for excluding collisions for specific types, which is something that I was lacking.
Though I still stand that in terms of if it was really worth to implement the buckets per se, no - even though they should yield a reduction of complexity for this specific case that is similar to what a grid would do in other scenarios. The gains are to marginal to really justify investing any time at all, if I have to be blunt.
Because even if the individual collision-systems now were infinitely more effective, its still just a small part of a complex system. Sure, the cap of how many objects can be had (at least assuming they don't need to collide with each other) has now increased by a large margin, as been shown in my synthetical test. But even then, doubling the objects would still barely run the required fromrate due to other overhead in scripts, etc… So its not like the gigant difference of x32 in execution speed that I got from going bytecode, or the x2-5 additional speed I got from my JIT- though those also took way more than just two half-days of work to complete ?

What kind of coding standard do you prefer

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

What kind of coding standard do you prefer

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines