How do you manage yourself and your time as years pass on?

Started by March 31, 2024 11:29 PM
48 comments, last by taby 8 months, 2 weeks ago

JoeJ said:
A decade ago, STL for gamedev was considered a no-go due to bad performance. Has this changed? And if so, why? Is current HW so powerful we can just waste it? Seems not, as performance issues are seemingly the main concern and critique on recent games.

I don't work directly in the games industry, only adjacent to it, but I'd guess that still holds true today. The main issues with the STL are design-related and unlikely to be fixed. The main problems I've seen are:

  • Exceptions are used everywhere, which bloats the binary and has other downsides (code can get slower because less fits in the instruction cache).
  • Support for custom allocators for containers is not ideal. It's not easy to have e.g. a vector that uses a custom generic allocator that allocates 32-byte aligned memory for AVX, primarily because the allocator type must be templated by the type it allocates (terrible design choice), rather than a void* interface similar to malloc()/free().
  • Inconsistency in the implementation and performance on different platforms. You can't say for sure you are getting an optimal implementation, you have to trust the platform SDK to do the right thing, which is often not the case. Example: std::vector growth factor may be 2 or 1.5, depending on platform.
  • Bad specifications, e.g. std::deque which is hamstrung by the spec to be inefficient.
  • wchar_t/char nonsense depending on platform. They should have mandated UTF8 everywhere as the default, and provided built-in conversion to UTF16, UTF32, ASCII.
  • std::atomic is non-copyable for no good reason. This drives me nuts because it requires you to write custom copy constructors everywhere, e.g. just to use a class containing std::atomic within a std::vector. It's cancerous.
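
To illustrate the std::atomic point from the list above, here is a minimal sketch of the usual workaround. `CopyableAtomic`, `Counter` and `demoCopyableAtomic` are hypothetical names, not from the post or from any library. The wrapper copies by loading from the source and storing into the destination, so the copy as a whole is not atomic, which is presumably why the standard deletes it.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical helper: gives std::atomic<T> copy semantics by doing a load
// from the source followed by a store. The copy itself is NOT one atomic step.
template <typename T>
struct CopyableAtomic
{
    std::atomic<T> value;

    CopyableAtomic() noexcept : value(T{}) {}
    explicit CopyableAtomic(T v) noexcept : value(v) {}
    CopyableAtomic(const CopyableAtomic& other) noexcept : value(other.value.load()) {}
    CopyableAtomic& operator=(const CopyableAtomic& other) noexcept
    {
        value.store(other.value.load());
        return *this;
    }
};

// A class holding the wrapper keeps its implicitly generated copy operations,
// so it can live in a std::vector that reallocates.
struct Counter
{
    int id;
    CopyableAtomic<int> hits;
};

inline int demoCopyableAtomic()
{
    std::vector<Counter> counters;
    for (int i = 0; i < 100; ++i) // enough pushes to force several reallocations
    {
        counters.push_back(Counter{i, CopyableAtomic<int>(i * 2)});
    }
    counters[7].hits.value.fetch_add(1);
    return counters[7].hits.value.load(); // 7 * 2 + 1
}
```

Copying such an object while another thread writes to it can still observe a value that is immediately stale, so this only removes the boilerplate, not the need to think about who copies when.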

For these reasons and others I have written my own “standard library” over the years which avoids these issues. I have the benefit that I don't have to stick to specification, and can make improvements to the design at will, without compatibility worries. This iteration process has produced a very nice code base, since I can learn from STL mistakes.

It saddens me to see programmers become lazy and write code in less optimal ways as they become more disconnected from what the hardware is doing. There is definitely a trend in software towards bloat (games are probably not affected so much due to tight constraints). Compare a program (e.g. the built-in calculator on macOS). On my 2013 personal machine with a 10-year-old OS, the calculator opens in about 0.3 seconds, while on my newer, faster work laptop it takes 1 second, and then it's not even responsive for another second after that. There definitely seems to be a tendency for programmers to become lazy and add unnecessary abstractions, layers and bloat. Hardware becomes faster, and programmers loosen their belts some more. At least this creates an opening for high-performance software which is many times more responsive than the majority.

Aressera said:
Support for custom allocators for containers is not ideal. It's not easy to have e.g. a vector that uses a custom generic allocator that allocates 32-byte aligned memory for AVX, primarily because the allocator type must be templated by the type it allocates (terrible design choice), rather than a void* interface similar to malloc()/free().

I originally wanted to state to @joej that you can have custom allocators for STL containers - but as you wrote, it is a pain.
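
For reference, a minimal sketch of what such an allocator can look like, assuming nothing beyond C++11. `AlignedAllocator`, `AvxFloatVector` and `isAligned` are hypothetical names. It over-allocates with malloc and stores the original pointer just in front of the aligned block; with C++17 the aligned form of operator new could replace that manual bookkeeping. Note the explicit `rebind`: it is needed because of the non-type template parameter, which is part of the pain described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator that returns memory aligned to 'Alignment' bytes.
template <typename T, std::size_t Alignment>
struct AlignedAllocator
{
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    static_assert(Alignment >= alignof(void*), "Alignment must fit the stored pointer");

    using value_type = T;

    // Required explicitly: the default rebind only handles type-only parameters.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n)
    {
        // Room for the payload, worst-case padding and the original pointer.
        // (Overflow check on n * sizeof(T) omitted for brevity.)
        void* raw = std::malloc(n * sizeof(T) + Alignment + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t aligned = (start + Alignment - 1) & ~(std::uintptr_t(Alignment) - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw; // remember what to free
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        std::free(reinterpret_cast<void**>(p)[-1]);
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// A float vector whose storage is usable with 32-byte aligned AVX loads.
using AvxFloatVector = std::vector<float, AlignedAllocator<float, 32>>;

inline bool isAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The catch is that `AvxFloatVector` is now a different type from `std::vector<float>`, so every interface that takes a plain vector has to be templated or changed.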

In one of the projects we had to use memory arenas (memory fragmentation could become a problem due to how that system was designed), and implementing that on top of STL was a pain (especially as it also had to interact with OpenCV and supply custom allocators there too).

On paper it looks simple, but there are edge-case scenarios that make this an absolute nightmare. It worked in the end though.

Aressera said:
wchar_t/char nonsense depending on platform. They should have mandated UTF8 everywhere as the default, and provided built-in conversion to UTF16, UTF32, ASCII.

This is one of the most annoying things out there. It's not just about mandating it, but also about the conversion, which has even changed over time and ends in #ifdef/#elif/#endif bloat (and there is another change coming with C++26).

One of the things that wasn't mentioned here is the ABI and the portability of C++ parameters in functions exported from a library. I.e. how do you call an exported function from, let's say, C# when it has a signature like this:

__declspec(dllexport) std::map<std::string, std::vector<int>> foo(const std::vector<float>& a, std::function<void()> callback) 
{
    ...
}

The proper answer is a f*** … or a different curse word. Rather use something that makes sense, like:

__declspec(dllexport) map_t* foo(float* a, size_t aSize, void (*callback)())
{
    ....
}

Where of course you will have to define some new types (like map_t). But everything is passed around as either value or address - and definitions of those are clear and obvious (and not platform/compiler/version dependent).
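
A rough sketch of how that boundary could look in full. Only `foo` and `map_t` come from the signature above; the function body, the `DEMO_API` macro and the accessors `map_value_count`, `map_value_at` and `map_destroy` are made up for illustration. The caller only ever sees an opaque pointer, plain values and plain addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical export macro: C linkage everywhere, dllexport on Windows.
#if defined(_WIN32)
#  define DEMO_API extern "C" __declspec(dllexport)
#else
#  define DEMO_API extern "C"
#endif

// Opaque to callers: they only pass the pointer around. STL stays inside.
struct map_t
{
    std::map<std::string, std::vector<int>> data;
};

// Illustrative body: truncates the input floats to ints under one key.
DEMO_API map_t* foo(const float* a, std::size_t aSize, void (*callback)())
{
    try
    {
        std::unique_ptr<map_t> result(new map_t);
        std::vector<int>& rounded = result->data["rounded"];
        for (std::size_t i = 0; i < aSize; ++i) rounded.push_back(static_cast<int>(a[i]));
        if (callback) callback();
        return result.release();
    }
    catch (...)
    {
        return nullptr; // exceptions must not cross the C boundary
    }
}

// Accessors so that non-C++ callers can read the result.
DEMO_API std::size_t map_value_count(const map_t* m, const char* key)
{
    if (!m || !key) return 0;
    const auto it = m->data.find(key);
    return it == m->data.end() ? 0 : it->second.size();
}

DEMO_API int map_value_at(const map_t* m, const char* key, std::size_t index)
{
    if (!m || !key) return 0;
    const auto it = m->data.find(key);
    if (it == m->data.end() || index >= it->second.size()) return 0;
    return it->second[index];
}

// The library that allocated the object is also the one that frees it.
DEMO_API void map_destroy(map_t* m)
{
    delete m;
}
```

From C# this maps onto P/Invoke with an IntPtr for `map_t*`, which is the whole point: nothing in the signatures depends on a particular standard library implementation.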

Using STL in exported functions makes sense if and only if the only language you're calling the library from is C++ and under the same ABI (which is a valid case).

STL does make sense, has proper use cases - but shouldn't be used literally everywhere just because it exists.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Aressera said:
Exceptions are used everywhere, which bloats the binary and has other downsides (code can get slower because less fits in the instruction cache).

Code getting slower with exceptions is absolutely outdated information. x64 exception handling deals with exceptions in terms of unwind-information tables, which are stored outside of the actual function. Not a single additional instruction needs to be executed unless an exception is actually thrown. On the contrary, having the return registers free and not having to if-check return values all the time reduces code size and the number of conditional jumps all over the place.

Binary-size comparisons are mostly biased, because people will just write exception-free code and then turn off the feature, claiming X code size being gained, which is just due to turning off an unused feature, instead of comparing code that uses exceptions against code that doesn't. But sure, code using exceptions could generally be bigger, though not to the point where it's worth disregarding the feature in most cases, since the extra code is mostly code that is rarely touched and thus does not matter to the CPU at all.

Juliean said:
Code getting slower with exceptions is absolutely outdated information.

I understand what you're getting at but I don't think it's that outdated. Here is a paper from 2022, a quote from section 3.2:

When enabling exceptions the fib case needs 29ms, even though not a single exception is thrown, which illustrates that exceptions are not truly zero overhead. They cause overhead by pessimizing other code.

The other test cases in that document show that the overhead when exceptions are thrown is significant.

Aside from overhead, there are other reasons to avoid exceptions. With exceptions, any function that calls a function that could throw (almost anything since allocation can throw) needs to be exception-safe, which is hard to guarantee in real-world code. You have to be constantly on alert and write code in a certain way using RAII to ensure resources are released properly. It creates great potential for bugs to occur, since error handling is non-local and hard to reason about.
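
A small sketch of the exception-safety problem described above. Everything here (`FakeLock`, `LockGuard`, `mayThrow`, `fragileUpdate`, `safeUpdate`) is a hypothetical stand-in rather than real project code: a counter plays the role of a lock or any other resource that must be released.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource: 'held' counts acquisitions that were never released.
struct FakeLock
{
    int held = 0;
    void acquire() { ++held; }
    void release() { --held; }
};

// Stand-in for any call that might throw (allocation, I/O, ...).
inline void mayThrow(bool fail)
{
    if (fail) throw std::runtime_error("simulated failure");
}

// Not exception-safe: if mayThrow throws, release() is skipped.
inline void fragileUpdate(FakeLock& lock, bool fail)
{
    lock.acquire();
    mayThrow(fail);
    lock.release();
}

// RAII guard: the destructor runs during stack unwinding as well.
struct LockGuard
{
    FakeLock& lock;
    explicit LockGuard(FakeLock& l) : lock(l) { lock.acquire(); }
    ~LockGuard() { lock.release(); }
    LockGuard(const LockGuard&) = delete;
    LockGuard& operator=(const LockGuard&) = delete;
};

// Exception-safe: the guard releases on both the normal and the throwing path.
inline void safeUpdate(FakeLock& lock, bool fail)
{
    LockGuard guard(lock);
    mayThrow(fail);
}
```

The two functions look almost identical at the call site, which is the non-local part: nothing in `fragileUpdate` hints that the middle line can skip the last one.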

Thanks guys, my uncertainties are resolved.

Juliean said:
STL “list” (if that's what you mean) is slow because it's a linked list, which in itself is slow on current hardware. You cannot practically implement a linked list that is considerably faster than what STL does. If you use an array, that's not a list, thus a different container, so you cannot compare them.

We talked about this before.
To me a list simply means pointing from one node to the next, independent of how those nodes are stored in memory. I do not require that each node be its own allocation to call it a list.

And i rather assume we usually store our nodes in some std::vector or array, and the next pointer (or index) is just there to use the logic which lists allow us to do.
Deleting a node may or may not require to move the last node in the array to its position to avoid gaps. Such things are related to ‘memory management’, but not to 'lists'.
A container usually implements or inherits memory management functionality, but a list node is not necessarily derived from a container base object.

That said, idk how STL implementations manage memory for its list, but to me it feels very slow.
Granted, chasing pointers is slower than reading linear memory, but we usually do it to reduce work, so the concept of a list - following my definition of it - is not slow but usually fast.
That's what i meant.
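
A minimal sketch of that definition of a list: nodes stored in one std::vector and linked by index. `IndexList` and `kNoNode` are hypothetical names, and instead of moving the last node into a gap this variant recycles freed slots through a second chain, which is just one possible way to handle the memory management side.

```cpp
#include <cassert>
#include <vector>

constexpr int kNoNode = -1; // marks the end of a chain

// Hypothetical singly linked list whose nodes live in contiguous storage.
struct IndexList
{
    struct Node { int value; int next; };

    std::vector<Node> nodes;  // storage for live and recycled nodes
    int head = kNoNode;       // first live node
    int freeHead = kNoNode;   // first recycled slot

    // Reuses a recycled slot if there is one, otherwise grows the vector.
    int allocateNode(int value, int next)
    {
        if (freeHead != kNoNode)
        {
            const int index = freeHead;
            freeHead = nodes[index].next;
            nodes[index] = Node{value, next};
            return index;
        }
        nodes.push_back(Node{value, next});
        return static_cast<int>(nodes.size()) - 1;
    }

    void pushFront(int value) { head = allocateNode(value, head); }

    // Unlinks the first node holding 'value'; returns false if it is absent.
    bool remove(int value)
    {
        int prev = kNoNode;
        for (int i = head; i != kNoNode; prev = i, i = nodes[i].next)
        {
            if (nodes[i].value != value) continue;
            if (prev == kNoNode) head = nodes[i].next;
            else nodes[prev].next = nodes[i].next;
            nodes[i].next = freeHead; // put the slot on the free chain
            freeHead = i;
            return true;
        }
        return false;
    }

    // Walks the links and returns the values in list order.
    std::vector<int> toVector() const
    {
        std::vector<int> out;
        for (int i = head; i != kNoNode; i = nodes[i].next) out.push_back(nodes[i].value);
        return out;
    }
};
```

Insertion and removal never touch the heap once the vector has grown, and indices stay valid when the vector reallocates, which pointers would not.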

Juliean said:
So for all intents and purposes, the std::vector is as fast as the C array.

That's clear, since it's both the same once the memory is allocated.

To make a more interesting example, we often do not know how many objects we need in advance, but we might be able to use an upper bound, which allows us to use a constant-sized buffer.

Let's say we want to calculate vertex normals of a mesh, the two code examples would be:

for (int vertex : mesh.vertices)
{
	std::vector<int> indices = mesh.FindAdjacentPolygonIndices(vertex);
	
	vec3 n(0.f);
	for (int poly : indices) n += polygonNormals[poly];
	vertexNormals[vertex] = n.Normalized();
}

Or:

std::array<int, 16> indices;
for (int vertex : mesh.vertices)
{
	int count = mesh.FindAdjacentPolygonIndices(indices, vertex);
	
	vec3 n(0.f);
	for (int i = 0; i < count; i++) n += polygonNormals[indices[i]];
	vertexNormals[vertex] = n.Normalized();
}

(please ignore we would not need to store adjacency indices at all - i do so just to make some example)

Now i would assume the second example should be faster because it avoids allocation in inner loop.

But that's often not the case. It varies.
So i assume the vector of indices is not freed but reused, although it goes out of scope for each iteration.
Maybe compilers are smart enough to notice the loop and only clear the vector, or OS is just automatically reallocating the same memory, idk.

I'm often puzzled by such things, but um… seems i'm too lazy to debug for memory address changes, or looking at compiled machine instructions. <:]

JoeJ said:
So i assume the vector of indices is not freed but reused, although it goes out of scope for each iteration. Maybe compilers are smart enough to notice the loop and only clear the vector, or OS is just automatically reallocating the same memory, idk.

I don't think the compiler would be smart enough to do that. Maybe the allocator could give you the same memory block back, but you still need to heap allocate and free every iteration, which will be slow. I'd suggest rewriting your example to explicitly reuse the vector and pass it by reference to the helper function:

std::vector<int> indices;
for (int vertex : mesh.vertices)
{
    indices.clear();
    mesh.FindAdjacentPolygonIndices(vertex, indices);
    
    vec3 n(0.f);
    for (int poly : indices) n += polygonNormals[poly];
    vertexNormals[vertex] = n.Normalized();
}

That eliminates any uncertainty about whether you are getting fast code.

Aressera said:
I'd suggest rewriting your example to explicitly reuse the vector and pass it by reference to the helper function:

That's what i actually do (or would do - actually i just sum up while traversing adjacency).

But i often see people conveniently generating vectors as return type for example.
Hell, i have even seen vec3 implemented using std::vector for x,y,z, and the guy who did it was no idiot (otherwise).
And in other languages, people seemingly have no related worries at all.

It surely depends on what you do, but there is so much waste for no reason… ; )

JoeJ said:
But i often see people conveniently generating vectors as return type for example.

I would never do such a thing, but it's not as bad as it used to be due to move constructors and NRVO which reduce the cost. You still pay the allocation cost which is significant, so I'd always call that out in a code review and suggest more efficient alternatives. If you only call the function once it's not bad at all, but repeated calls can reuse the allocation for a big speedup.
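
To put numbers on the allocation cost, here is a sketch that counts allocator calls for both styles. `CountingAllocator`, `allocationCount`, `makeIndices`, `fillIndices` and `countAllocations` are hypothetical names used only for this illustration. The counts assume a standard library that allocates nothing for an empty vector, which is typical for release builds; debug-mode libraries may add bookkeeping allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical instrumentation: a global counter of allocate() calls.
inline std::size_t& allocationCount()
{
    static std::size_t count = 0;
    return count;
}

template <typename T>
struct CountingAllocator
{
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        ++allocationCount();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using IntVector = std::vector<int, CountingAllocator<int>>;

// Convenient style: a fresh vector per call. NRVO or a move avoids copying
// the elements, but the buffer is still allocated once per call.
inline IntVector makeIndices(int count)
{
    IntVector result;
    result.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) result.push_back(i);
    return result;
}

// Reuse style: clear() keeps the capacity, so later calls allocate nothing
// as long as 'count' does not exceed what is already reserved.
inline void fillIndices(int count, IntVector& out)
{
    out.clear();
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) out.push_back(i);
}

// Runs 'calls' iterations in one of the two styles and reports allocations.
inline std::size_t countAllocations(bool reuse, int calls)
{
    const std::size_t before = allocationCount();
    IntVector scratch;
    long long sum = 0;
    for (int c = 0; c < calls; ++c)
    {
        if (reuse) { fillIndices(8, scratch); sum += scratch.back(); }
        else { IntVector fresh = makeIndices(8); sum += fresh.back(); }
    }
    (void)sum;
    return allocationCount() - before;
}
```

Under the stated assumption, 100 calls in the return-by-value style cost 100 allocations, while the reuse style costs a single one.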

JoeJ said:
Hell, i have even seen vec3 implemented using std::vector for x,y,z, and the guy who did it was no idiot (otherwise).

That's pure madness.

JoeJ said:

Aybe One said:
After 30 years of service, my CRT resigned… Haven't been able to fix it and these servicing shops are no more… I don't like that I'm going to have to throw it away because it's still in pretty good shape otherwise…

Maybe you would meet a guy with necessary skills in the future. My basement is full of working and broken CRTs. ; )

I really wonder why CRTs seem to be no longer made. So many people whine about how much better they were.
Afaik, currently they sell more vinyl records than compact discs, so here the retro tech has survived.
Maybe those CRT tubes are really hard to manufacture.

taby said:
nothing more than vector, string, and map.

But reducing to those does not stop them from using modern syntax i can not read. : )

It's a monster CRT actually, 16/9, +30Kg. Issue is the sound, once it warms up you can hear the case contracting and then it goes off. Serviced it a few times, it stopped, I then changed a lot more components but it didn't hold up. I think the PCB might have some cracks, that would be out of reach for me.

Also have a bunch of vinyl, music equipment, video games, magazines, most likely to be trashed too.
That ends up being much simpler than waiting for someone to come and pick them up.

You know, things that had high sentimental value to you but when you look at eBay they sell for pennies…

But there's one thing I'm not going to trash for sure, my Korg EMX-1, such a fantastic device :)

Seems like the original thread topic has slowly deviated to “my C++ is bigger than yours”! 🤣

In fact, all one really needs is char and a memory allocator, you can then do your own int, string, map… 😎

This topic is closed to new replies.