
Custom memory allocator for STL - dead end due to constructors?

Started by June 24, 2024 02:33 PM
36 comments, last by JoeJ 5 months, 2 weeks ago

If you can, try to make your allocators stateless by making the internal storage part of the type. For example:

template<class T, my_allocator *InnerAlloc> class my_std_allocator {
public:
  using value_type = T;

  T* allocate(std::size_t n) {
    return static_cast<T *>(InnerAlloc->allocate(n * sizeof(T)));
  }
  // ...and so on...
};

my_allocator my_alloc;
std::vector<int, my_std_allocator<int, &my_alloc>> my_vector;

Of course, this requires that all the my_allocator objects have a static lifetime.
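
For reference, a complete minimal version that compiles with std::vector might look like the following sketch (my_allocator here is a stand-in inner allocator with a malloc/free-style interface; real code would add alignment and error handling):

#include <cstddef>
#include <vector>

// Stand-in inner allocator.
struct my_allocator {
    void* allocate(std::size_t bytes) { return ::operator new(bytes); }
    void deallocate(void* p) { ::operator delete(p); }
};

template<class T, my_allocator *InnerAlloc>
struct my_std_allocator {
    using value_type = T;

    // Needed because InnerAlloc is a non-type template parameter, so
    // allocator_traits cannot rebind this allocator automatically.
    template<class U> struct rebind { using other = my_std_allocator<U, InnerAlloc>; };

    my_std_allocator() = default;
    template<class U>
    my_std_allocator(const my_std_allocator<U, InnerAlloc>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T *>(InnerAlloc->allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { InnerAlloc->deallocate(p); }
};

// All instances sharing the same inner allocator are interchangeable.
template<class T, class U, my_allocator *A>
bool operator==(const my_std_allocator<T, A>&, const my_std_allocator<U, A>&) noexcept { return true; }
template<class T, class U, my_allocator *A>
bool operator!=(const my_std_allocator<T, A>&, const my_std_allocator<U, A>&) noexcept { return false; }

my_allocator my_alloc; // static lifetime, as required for the template argument
std::vector<int, my_std_allocator<int, &my_alloc>> my_vector;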

JoeJ said:
Sure, solving a memory fragmentation problem by introducing even more memory fragmentation. :D

You are using vectors, which are terrible and will re-allocate at will, even if you use your own allocator. If you truly wrote your own allocator, then you should know how to allocate 10 std::vectors right next to each other in that allocator's memory. You either new/place them in memory as you like, or you let the compiler do it on the stack as you have. Making them 10 pointers has nothing to do with memory fragmentation.

NBA2K, Madden, Maneater, Killing Floor, Sims


dpadam450 said:
If you truly wrote your own allocator, then you should know how to allocate 10 std::vectors right next to each other in that allocator's memory.

Yes, initially. But as the data is processed, the sizes of the vectors change, and so do their memory addresses. That's exactly how memory fragmentation arises.

dpadam450 said:
Making them 10 pointers has nothing to do with memory fragmentation.

But i have thousands of objects, each having dozens of vectors. The objects are loaded from disk, processed and changed, saved back. After a few minutes of processing, CPU utilization completely flatlines. Probably all those cores do is wait for the one core chasing pointers to find some memory to allocate.

Using an array of pointers instead of an array of objects would add one level of indirection everywhere for no reason, making the problem noticeably worse.

Just saying. I know your proposal is the standard ‘solution’ to this problem, since a proper solution is not possible.
Personally i rarely use constructors for this reason. This is also the primary reason why i think ‘proper OOP’ is not possible, at least not with C++. It's a nice idea for Bjarne's books, but it does not work in practice. The promise of automated object life-cycle management causes many more limitations, constraints and problems than it solves.

I'm going to have to recommend EASTL as another poster did, mostly because it's already built, debugged, and freely available with a generous license. GitHub source.

I'd go with an eastl::vector and a pool allocator based on what you described for your scenario.

JoeJ said:
Yes, initially. But as the data is processed, the sizes of the vectors change, and so do their memory addresses. That's exactly how memory fragmentation arises.

While memory fragmentation is a real concern, it's an extremely rare issue to encounter in the 64-bit world, even in long-running games. The system randomizes 47 bits of address space, giving 128TB of virtual pages that can be shuffled around when the system does reallocations. You may be in a 32-bit environment without virtual memory where address-space fragmentation is a bigger concern, or have other issues. So it's rare, but still exists.

The typical approach is generally allocating from a known pool. Consider boost's pool_allocator and fast_pool_allocator implementations. Like the EASTL recommendation above, they're already built, debugged, and freely available, with an even more generous license than EASTL.
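
If it helps, a minimal sketch of plugging them in (both live in <boost/pool/pool_alloc.hpp>; pool_allocator suits contiguous containers, fast_pool_allocator suits node-based ones):

#include <boost/pool/pool_alloc.hpp>
#include <list>
#include <vector>

int main() {
    std::vector<int, boost::pool_allocator<int>> v;    // pooled contiguous storage
    std::list<int, boost::fast_pool_allocator<int>> l; // pooled per-node storage

    for (int i = 0; i < 1000; ++i) { v.push_back(i); l.push_back(i); }

    // The underlying singleton pools keep their memory until explicitly released:
    // boost::singleton_pool<boost::pool_allocator_tag, sizeof(int)>::release_memory();
}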

“Thousands of objects” isn't a problem, unless those objects are themselves obscenely huge or you're on limited hardware. Very often programmers are pessimistic about memory performance. Games these days throw around 4K textures, each object with 6 or more of them applied. Detailed game objects these days can have 100+ megabytes of textures as they compose complex material properties, plus another 40+ megabytes of audio data. Yes, save bytes where it makes sense, just sanity-check that you're looking at saving bytes in the right areas. Far too many programmers balk over kilobytes while artists and audio consume gigabytes.

Through development you should reasonably understand the high-water marks of all your containers, and you should be able to reserve the space for them in situations where you need it. If you know your containers are going to hold a few hundred megabytes, reserve the space early, do it once, and get it over with. On most modern hardware with virtual memory it's effectively free: all you're doing is reserving the address space rather than the physical memory until it page faults.

frob said:
I'd go with an eastl::vector and a pool allocator based on what you described for your scenario.

Yeah, working on that currently. Still need to figure out how to use custom allocators, but looks all good and i'm very optimistic.
Seems a godsend. Much better than Doom source code. \:D/

frob said:
While memory fragmentation is a real concern, it's an extremely rare issue to encounter in the 64-bit world, even in long-running games.

I'm surprised myself it hits me so hard, but it does. Though, i have millions of allocations live at a time, all quite small and just temporary. It's no wonder fragmentation grows without end.
I have 64GB ram, and for the current test case the total usage from the app remains at <20 GB and does not change over time. It does not grow, so there is no bad leak. Everything seems to work correctly, but CPU utilization constantly goes down until only one core seems still working.
Some of my tasks do heavy CPU work, like solver iterations. If this happens, utilization goes up again to 95%.
For most other tasks, e.g. building meshes which is little work but requires temporary memory, it quickly drops to 10% again.
The profiler also lists memory allocation on top, if i read it right.
So at this point i'm sure allocation is my bottleneck, not disk access as assumed earlier.

It should be easy to fix. I'll use many allocators organized spatially, allocating large equally sized blocks on demand and suballocating from that. I know i never need vectors beyond a certain size, and memory requirements correlate with spatial scene density.
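
Something along the lines of this minimal sketch, perhaps (the names, the 1 MiB block size, and single-threaded use are all illustrative; individual frees are not supported, the whole pool resets at once):

#include <cstddef>
#include <cstdlib>
#include <vector>

class BlockPool {
    static constexpr std::size_t kBlockSize = 1 << 20; // 1 MiB blocks, grabbed on demand
    std::vector<void*> blocks;
    std::size_t offset = kBlockSize;                   // forces a block grab on first use

public:
    // Bump-allocate within the current block; assumes bytes <= kBlockSize.
    void* allocate(std::size_t bytes) {
        bytes = (bytes + alignof(std::max_align_t) - 1)
              & ~(alignof(std::max_align_t) - 1);      // round up for alignment
        if (offset + bytes > kBlockSize) {             // current block exhausted
            blocks.push_back(std::malloc(kBlockSize)); // grab a new large block
            offset = 0;
        }
        void* p = static_cast<char*>(blocks.back()) + offset;
        offset += bytes;
        return p;
    }

    // Release everything at once between tasks.
    void reset() {
        for (void* b : blocks) std::free(b);
        blocks.clear();
        offset = kBlockSize;
    }

    ~BlockPool() { reset(); }
};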

Well, i was thinking STL is good enough for tools development. And mostly it surely is.
But for my stuff there seems no way around custom memory management. I assume the virtual memory advantage gets lost because my allocations are small and just too many, so the suballocation from a page does too much searching within a critical section, or something like that.
I guess the OS does a good job in the general case, but once you stress it beyond a certain limit, it can't catch up and it all feels like a 486 on Windows 3.11 again. : )

Look at boost::static_vector and boost::small_vector. The former is a fixed-capacity version of std::vector that avoids dynamic allocation entirely. You could use it to replace your array of ten vectors. The latter is similar except that it switches to dynamic allocation when its fixed capacity is exhausted instead of crashing and burning.
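
For reference, a minimal sketch of both (they live in the boost::container namespace):

#include <boost/container/small_vector.hpp>
#include <boost/container/static_vector.hpp>

int main() {
    // Fixed capacity, storage lives inside the object itself; pushing
    // past the capacity is undefined behavior (asserts in debug builds).
    boost::container::static_vector<int, 16> sv;
    sv.push_back(42);

    // Inline storage for 16 elements, falls back to the heap beyond that.
    boost::container::small_vector<int, 16> smv;
    for (int i = 0; i < 100; ++i)
        smv.push_back(i);  // spills to dynamic allocation, still works
}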


It still seems strange, though, to experience so many reallocations.

How much profiling have you done regarding memory use? Are you reserving blocks properly? Have you initialized containers to their expected size? Have you looked at the code triggering reallocations to verify it makes sense?

It seems like it would show up quickly in profiling because system allocation is slow. Fixing your systems to allocate the memory once, then using the space inside the container, would help performance as well as memory.

You can also look at memory techniques used in older games consoles and limited memory systems. Treating permanent allocations and temporary allocations differently. Reserving all blocks at startup. Monitoring high water marks. Lots of pools. Replacing the default new/delete in your program to better control use. Lots of memory use tracking.
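
As an illustration of that last point, a minimal sketch of replacing the global new/delete to track live allocation counts (the counter name is made up; a real implementation would also override the array and aligned forms):

#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <new>

static std::atomic<std::size_t> g_liveAllocs{0};  // hypothetical tracking counter

void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // malloc(0) may legally return nullptr
    if (void* p = std::malloc(size)) { ++g_liveAllocs; return p; }
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept {
    if (p) { --g_liveAllocs; std::free(p); }
}

// Sized delete (C++14+) forwards to the unsized form.
void operator delete(void* p, std::size_t) noexcept { operator delete(p); }

int main() {
    int* x = new int(7);
    std::printf("live allocations: %zu\n", g_liveAllocs.load());
    delete x;
}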

Constructors generally are not the problem. Misuse might be.

JoeJ said:
Using an array of pointers instead of an array of objects would add one level of indirection everywhere for no reason, making the problem noticeably worse.

Not sure what you are doing or what you are trying to solve, but caches are pretty big and put to good use. Since it's a vector that re-allocates, its backing memory is not going to live near your vector object anyway, so Object::Update() won't pre-cache any of this vector data. If you want the vector object and vector data to live at the same spot, I suggest you just use an array and manually resize it, or construct/destruct the std::vector object in place so that it lives that close; otherwise the indirection has no hit on anything. I'm not sure why you think the indirection is bad. Once you have the pointer and fetch the first element, your array is caching all the next elements; there is no further indirection. A pointer to an array of 1000 objects vs an array of 1000 objects will perform the same. If you want the data to live closer in some manner then you need to do something else.

More important is what you are actually doing to solve the problems you are seeing. A pointer that gets dereferenced once per frame, per object (no matter how many elements are in it), is nothing: 1000 indirect loads are not even a noticeable thing you would find in a performance test.

NBA2K, Madden, Maneater, Killing Floor, Sims

a light breeze said:
Look at boost::static_vector and boost::small_vector. The former is a fixed-capacity version of std::vector that avoids dynamic allocation entirely. You could use it to replace your array of ten vectors.

I might accept some workaround for the array problem. But the array is just an example. What i really want is to decouple object creation and initialization. Eventually i want to create the objects from a single thread which loads them from disk, and i may want to initialize them later from parallel jobs.

Objects which require a non-default constructor to work are thus bad design in most cases.
At least for something as general as std::vector, that's bad design for sure.
They tried to fix this. Only since C++11 can STL allocators have state, e.g. to assign different vectors to different memory pools. And technically everything would indeed work now.
But we are forced to construct every object, meaning creation and initialization have to happen at the same time.
This constraint is not acceptable. They failed, and now they have to try again.

The flaw is already rooted in the C++ language.
Why does making an array of vectors call their default constructors, while there is no way to call parameterized constructors?
Placement new addresses the latter, but the former is still a problem requiring a fix.
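
For illustration, a minimal sketch of that placement-new route: raw storage first, parameterized construction later, at the cost of manual lifetime management (N and the element counts are made up):

#include <new>
#include <vector>

int main() {
    constexpr int N = 10;
    // Raw, uninitialized storage for N vectors; no constructors run here.
    alignas(std::vector<int>) unsigned char storage[N * sizeof(std::vector<int>)];
    std::vector<int>* vecs[N];

    // Construct each one later, with whatever constructor arguments we like.
    for (int i = 0; i < N; ++i)
        vecs[i] = new (storage + i * sizeof(std::vector<int>))
                      std::vector<int>(i + 1, -1);  // i+1 elements, all -1

    // ...use *vecs[0..N-1]...

    // The price: we must destroy them ourselves.
    for (int i = 0; i < N; ++i)
        vecs[i]->~vector();
}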

frob said:
How much profiling have you done regarding memory use? Are you reserving blocks properly? Have you initialized containers to their expected size? Have you looked at the code triggering reallocations to verify it makes sense?

I did what i can. Always reserving, minimizing allocations in any way, profiling, etc.
Still it took years to come to the final conclusion that allocation is the culprit.
I can't be 100% sure yet, but i bet a beer that custom memory management will prevent the performance degradation over time completely. Might be a week of work to port everything to EASTL and add the management, but i'll report the outcome…

frob said:
Constructors generally are not the problem. Misuse might be.

Hehe, well - it took me all of yesterday to figure out how to attach my allocator to EASTL vectors.
My mistake was to copy-paste empty copy constructors from the given stateless code examples.
So yeah, mistakes on my side are quite likely. :D

dpadam450 said:
Not sure what you are doing or what you are trying to solve, but caches are pretty big and put to good use. Since it's a vector that re-allocates, its backing memory is not going to live near your vector object anyway. …

Yes, yes. Workarounds are possible, and they may be acceptable.
But we should not defend the broken status quo by discussing which workaround is the least evil.
We should complain about the broken status quo, so they will fix it in the future.

With EASTL i can do exactly what i want:

eastl::vector<int, eastlAlloc> vecCustomN[10];

for (auto &a : vecCustomN) a.set_allocator(myCustomAlloc);

vecCustomN[5].resize(3, -1);
vecCustomN[5][1] = 1;
vecCustomN[5].push_back(3);
for (auto &v : vecCustomN[5]) ImGui::Text("vecCustomN[5]: %i", v);
vecCustomN[5].clear();
vecCustomN[5].set_capacity(0);

Creating the array of empty vectors does not create a bogus first element just to make the implementation work. The vectors do not allocate anything until we say so.

I can (thus) set the allocator after the object creation, at any time when i'm ready to do so.

They did all this work because they were not happy with the status quo either.

And i can even read and learn from the code. STL is just cryptic random letters to my eyes.

So that's the proper solution for me i guess. Will report my experience…

If you really want to separate memory allocation from initialization, there's always std::optional. But really that's a non-issue in most code, because almost all classes have cheap default constructors. std::vector certainly does, unless you deliberately subvert this by using an allocator type without a cheap default constructor.
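
A minimal sketch of that std::optional route (MyObject is a stand-in type):

#include <optional>
#include <vector>

struct MyObject {
    std::vector<int> data;
    explicit MyObject(std::size_t n) : data(n, -1) {}  // non-default constructor
};

int main() {
    // The storage for the object exists here, but nothing is constructed yet.
    std::optional<MyObject> slot;

    // ...later, when the real arguments are known, construct in place:
    slot.emplace(1024);

    // Destruction can also be decoupled:
    slot.reset();
}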

This topic is closed to new replies.
