
Memory

Started by September 21, 2020 08:30 PM
6 comments, last by Calin 4 years, 2 months ago

I'm trying to understand how memory works. Physically speaking, there is a chunk of memory on the CPU (fast memory), and another that comes attached to motherboard slots, called RAM. But as a programmer, when I code there is only RAM; the fast CPU memory is just a bridge to physical RAM.

My project's Facebook page is “DreamLand Page”

FFS, you were all about system programming with assembly last year and you come asking questions like this?

EDIT: my bad, you haven't asked any questions. Why are you posting in the forums?

🙂🙂🙂🙂🙂 ← The tone posse, ready for action.


Virtual memory is the term you're looking for. Basically, there are several layers of indirection implemented in hardware (the MMU) and the kernel. These layers help protect programs from unintentionally or maliciously modifying the memory of other processes.

Virtual memory is important to understand, but beyond that there are generally two types of storage of interest actually on the CPU die.

Basically, on a modern system there are several layers of storage. Some are controllable by the programmer (to varying extents depending on the language), some mostly by the CPU itself (cache: generally L1, L2, and L3), and others by the operating system (virtual memory, and its relation to physical memory and page/swap files).

  1. Registers. These are on the CPU, and you can generally think of them as a fixed set with specific names, sizes, and purposes depending on the architecture (e.g. EAX, EBX, ECX, EDX are some of the 32-bit general-purpose registers on x86).

    The program's machine code specifies exactly which registers it uses as part of each instruction. For the most part this is your compiler's job (or the JIT compiler for Java etc.); as a programmer you will probably only see them directly in assembly (be it source or disassembly of a compiled executable).

    They do not correlate to any RAM/"pointer" address.

    Some processor designs may do register renaming, in which case the register name does not correlate directly to a specific bit of storage on the CPU die for each core, but this should be invisible to the programmer.
  2. Cache. Most CPUs have multiple levels of cache, often 3 levels referred to as L1, L2 and L3 and sometimes L4. L1 is the fastest, and may be split into the instruction and the data cache, and is generally private to each core. L2 is also often private, while L3 is shared across the entire CPU. While the lower levels are faster, the higher levels have more storage space.

    These represent actual physical storage on the die, and allow the CPU to avoid actually reading or writing RAM in many cases; programs generally have little direct control over their operation.

    On x64 I believe the cache commonly operates in cache lines (blocks) of 64 bytes, and DDR4 memory can read/write 8 bytes at a time in bursts of 8 (8*8 giving the 64-byte size). So once your thread has read some 64-byte cache line, further reads from that same block should be faster by avoiding accessing RAM. Writes can also be cached.

    It is in this manner you can optimise your program, and why operating on a flat array is generally much faster than, say, a linked list, which may jump around in memory, each next pointer landing in a different part of memory.

    Writes in multi-threaded code also need to account for this, to make sure all cores/threads see the correct latest data, which can have a performance cost.

So basically the CPU socket only connects L1, L2, etc. to the rest of the motherboard through a data highway. The CPU itself won't talk to the outside world by means other than L1/L2. The fact that the RAM chip is physically a separate component is just an artifice made for better hardware management.

So when it comes to memory management everyone is playing nice. The apps take turns using the processor; the kernel decides what memory is to be used and who gets a turn at using the processor.

When an ASM/C++ program makes a request for memory allocation, what will the resulting binary code talk to? Are there memory chunks available for grabbing, with their addresses saved somewhere my binary can read? Is there an “app template” the compiler is using, as in, will the compiler add code/instructions to every app which is supposed to run on this or that OS?

My project's Facebook page is “DreamLand Page”

On modern systems the CPU does access RAM directly using the Integrated Memory Controller; in fact many things are now on the CPU die.

I can't recall all the exact details, but I think a fair simplification would be to say that when the CPU core fetches an instruction, or has an instruction to read/write memory, it asks the L1 cache if it has it. If not, the L1 asks the L2. If not, the L2 asks the L3.

If not, the L3 asks the IMC to fetch it from physical RAM. If the page is in RAM but not currently mapped for the process, fixing up that mapping is known as a soft fault.

If the data isn't in physical RAM at all, the OS is responsible for loading it first, known as a hard fault. Most memory allocations will use the page or swap file, but executables, dynamic libraries, and memory-mapped files may use their own file. Operating systems might also use compressed memory; I believe that is handled the same way, with the OS decompressing the data at this point.

I believe the translation between virtual and physical address happens between the L1 and L2 layers on most x86 designs, and the L1 is effectively invalidated by a context switch to another process. Not 100% sure on this and might vary between microarchitectures.

Calin said:
When an ASM/C++ program makes a request for memory allocation, what will the resulting binary code talk to? Are there memory chunks available for grabbing, with their addresses saved somewhere my binary can read?

So on a modern system a “request for memory allocation” always goes to the operating system (e.g. VirtualAlloc), which will set up the requested virtual memory space and return its address. Fixed physical memory addresses with special purposes are generally not used anymore by user programs.

The OS itself can of course see physical RAM directly (after all, it needs to manage the swap/page file etc.), but even it doesn't do much directly with the CPU cache.

Calin said:
Is there an “app template” the compiler is using, as in, will the compiler add code/instructions to every app which is supposed to run on this or that OS?

I guess you can call it a “template” of sorts. Executables (e.g. .exe) and shared libraries (e.g. .dll or .so) have defined formats that the operating system understands. The OS will load them into memory (possibly at a fixed virtual memory address, but now more commonly a random one, for security and to avoid shared libraries clashing), load any dll/so dependencies, and apply fixes to the machine code etc. to accommodate that base address (e.g. any function or other global pointers the program stores in variables, maybe jmps/calls/etc. to other libraries). It then creates all the OS-side data for the main thread, with a stack in virtual memory, determines initial register values (e.g. the address of that new stack and the address of the first instruction of the “entry point” function), and starts it running.


Thanks for sharing what you guys think

My project's Facebook page is “DreamLand Page”

This topic is closed to new replies.
