What is the memory layout for complex objects in languages like C# or C++?

Started by July 30, 2021 04:48 PM
5 comments, last by Shaarigan 3 years, 4 months ago

hi.

In OOP we have aggregate types like structs and classes that can hold primitive types like int, string… or objects of other aggregate types. I need to know how they are kept in raw memory, and whether many nested aggregate types, like a hierarchy, have a bad influence on performance or not.

Probably with C++ you could take a pretty good guess at it. However, it's going to change with the machine and compiler. For example, some machines are little endian and some are big endian. There is also data alignment and padding. From my experiments, Visual C++ always aligns between parent and child classes on 8-byte boundaries while GCC can align on 4 (but don't take that as gospel). The standard library may also be implemented differently, so things like std::string might be hard to pin down if you want to make this portable. Finally, there is the vtable pointer. For C#, I have no clue really, as it's more of a black box. What do you need this for? Perhaps there is another way to do what you want.
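A minimal C++ sketch of two of the points above, byte order and the hidden vtable pointer. The names Plain, WithVirtual and is_little_endian are made up for illustration, and the exact sizes depend on the machine and compiler, so treat it as a way to inspect your own platform rather than a statement about all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical types, for illustration only.
struct Plain {
    std::int32_t value;  // just the data, no hidden members
};

struct WithVirtual {
    // A virtual function makes typical implementations add a hidden
    // pointer to the vtable, so the object grows beyond its data.
    virtual ~WithVirtual() = default;
    std::int32_t value = 0;
};

// Returns true when the least significant byte is stored at the lowest address.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // read the byte at the lowest address
    return first_byte == 1;
}
```

Printing sizeof(Plain) and sizeof(WithVirtual) on your own compiler shows how much the virtual function costs per object there.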

moeen k said:
I need to know in raw memory

In C you can assume data is in memory in the same order you write your struct, no matter whether this data consists of built-in types like int or float, or of other structs containing those same types.

Padding, as mentioned above, happens, so some bytes remain unused. There are compiler settings that affect padding, and you can also get the byte offset of a given variable relative to the struct, e.g. using the offsetof macro, though C++ has newer options here. Otherwise it's the same for classes and C++, but I don't know about C#.
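To make the declaration order and padding visible, here is a small sketch using offsetof, sizeof and alignof. The structs Inner and Outer and the helper outer_padding_bytes are hypothetical. Picking alignof as one of the "newer options" is my assumption, not something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical nested aggregate, for illustration only.
struct Inner {
    std::uint8_t  tag;    // 1 byte
    std::uint32_t count;  // 4 bytes, normally preceded by padding
};

struct Outer {
    std::uint8_t flag;    // 1 byte
    Inner        inner;   // embedded directly, in declaration order
    double       weight;  // often aligned to 8 bytes
};

// offsetof is only guaranteed to work on standard-layout types.
static_assert(std::is_standard_layout<Outer>::value, "Outer must be standard layout");

// Bytes in Outer that hold no member of Outer itself (padding inside Inner
// is counted as part of Inner here).
constexpr std::size_t outer_padding_bytes() {
    return sizeof(Outer) - (sizeof(std::uint8_t) + sizeof(Inner) + sizeof(double));
}
```

Printing offsetof(Outer, inner), offsetof(Outer, weight) and alignof(Outer) shows where the compiler put each member. On common x86-64 compilers inner usually starts at offset 4 rather than 1, but that is platform dependent.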

Objects in C# don't access the memory itself but go through a register which then points to the “real” memory location where the class/struct is allocated. This enables the GC to move memory around without invalidating object pointers. The register also contains some info about the object, like a type ID flag and probably other flags such as the monitor state (used when calling lock(myObjectInstance), for example).

The memory itself is usually padded, and possibly reordered, to optimize the layout, but I don't know to which size. You can see the padding/reordering when you try to P/Invoke between C# and C/C++ code via the DllImportAttribute. The marshaller passes the current C# memory address to the native function ‘as is’. This causes errors if you don't remove the padding/reordering with the StructLayoutAttribute.

Except for that, it follows the same rules as C/C++ do. An object instance member is a pointer to the object's register while a struct member is most likely embedded into the surrounding type.

How nested types behave depends (regardless of the language) on whether the member is a pointer or a value type. Value types (like structs) are most likely put into the same memory as the surrounding type. This can make access faster, but it also makes the surrounding type larger, which can stall the CPU when it has to load another portion of the type's memory. A pointer reference, on the other hand, makes the CPU load the referenced memory into cache separately, which will likely have an impact on access performance.
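The difference between an embedded value and a pointer member can be checked directly in C++. This is only a sketch: Vec3, EmbeddedTransform, IndirectTransform and lives_inside are hypothetical names, and std::unique_ptr stands in for any heap-allocated reference.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

struct Vec3 { float x, y, z; };

// Members are stored inline: one contiguous block, no extra indirection.
struct EmbeddedTransform {
    Vec3 position;
    Vec3 scale;
};

// Members are pointers: the object only holds addresses, and the
// data itself lives in separate heap allocations.
struct IndirectTransform {
    std::unique_ptr<Vec3> position;
    std::unique_ptr<Vec3> scale;
};

// Returns true when `member` is located inside the bytes occupied by `owner`.
template <typename Owner, typename Member>
bool lives_inside(const Owner& owner, const Member& member) {
    const auto* begin = reinterpret_cast<const unsigned char*>(&owner);
    const auto* end = begin + sizeof(Owner);
    const auto* p = reinterpret_cast<const unsigned char*>(&member);
    std::less<const unsigned char*> before;  // total order over pointers
    return !before(p, begin) && before(p, end);
}
```

For an EmbeddedTransform e, lives_inside(e, e.position) holds. For an IndirectTransform i, only the pointer i.position is inside the object, while *i.position is somewhere else on the heap, which is the extra cache load mentioned above.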

Another element is that although some people and teachers have the concept of objects being tiny entities, they are not necessarily that way.

As taught by some groups, “objects” are terrible for performance. They operate on a single element at a time, they contain a single element, etc. They are terrible for the cache, terrible for SIMD processing, terrible for computing.

But if you look at languages, perhaps the biggest core language library features are collections and containers, and bulk processing tasks.

Ultimately the implementation is whatever you make of it. You can make decisions that give great performance, or decisions that give terrible performance. Use your brain and think rather than mindlessly applying dogmas about what objects should be, or data should be, or operations should be.
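As one possible illustration of an object that is not a tiny single-element entity, here is a sketch where a single object owns many elements in contiguous arrays and processes them in bulk. ParticleSystem and its members are invented for this example, not taken from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: one object, many elements, bulk update.
class ParticleSystem {
public:
    // Adds one particle and returns its index.
    std::size_t add(float position, float velocity) {
        positions_.push_back(position);
        velocities_.push_back(velocity);
        return positions_.size() - 1;
    }

    // Updates every particle in one pass over contiguous memory,
    // which is friendly to the cache and to auto-vectorization.
    void step(float dt) {
        for (std::size_t i = 0; i < positions_.size(); ++i) {
            positions_[i] += velocities_[i] * dt;
        }
    }

    float position(std::size_t index) const { return positions_[index]; }
    std::size_t size() const { return positions_.size(); }

private:
    std::vector<float> positions_;   // all positions packed together
    std::vector<float> velocities_;  // all velocities packed together
};
```

The design choice is that callers talk to one object, while the data inside is laid out for bulk processing instead of one heap object per particle.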

Good old stack vs heap discussion incoming?

In my sources, I'm almost exclusively using short-lived instances, which are allocated on the stack for as long as I'm using them. Some static instances are allocated in static memory and only a few really need to be heap allocated. This may however change at some point. It is also preferable to manage your own memory/allocation mechanisms in order to keep memory aligned and close together. This also enables bulk operations on instances that were each allocated on their own, once they are placed in a linear block of memory.
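A minimal sketch of what such a custom allocation mechanism could look like: a fixed-size bump allocator that hands out aligned storage from one linear block. LinearArena and Particle are hypothetical names. The sketch assumes trivially destructible types, power-of-two alignments, and no thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>
#include <vector>

// Hypothetical bump allocator over one linear block of memory.
class LinearArena {
public:
    explicit LinearArena(std::size_t capacity) : buffer_(capacity), used_(0) {}

    // Returns aligned storage or nullptr when the block is exhausted.
    // Assumes `alignment` is a power of two.
    void* allocate(std::size_t size, std::size_t alignment) {
        const auto base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + used_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;  // round up
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset + size > buffer_.size()) return nullptr;
        used_ = offset + size;
        return buffer_.data() + offset;
    }

    // Constructs a T in place. Destructors are never run, so this sketch
    // is only suitable for trivially destructible types.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        void* storage = allocate(sizeof(T), alignof(T));
        return storage ? new (storage) T{std::forward<Args>(args)...} : nullptr;
    }

    void reset() { used_ = 0; }  // releases everything at once
    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;  // the linear block
    std::size_t used_;                   // bytes handed out so far
};

struct Particle { float x, y, z, life; };
```

Two Particle instances created back to back end up adjacent in memory, so a loop can walk them like an array even though each was allocated on its own.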

In C# the case is much worse. For a long time there was only ‘new’ to have objects allocated. This changed with the Span data type, but in the end you still don't have the control you have in C/C++ over where the CLR allocates the requested memory. And the GC moving memory around is also a thing, which makes everything unreliable in terms of CPU caching and performance.
