Calin said:
It doesn't confuse me, I think. It's useful to have a broad idea about what's hiding behind the `pipelining` word (introduction-level type of knowledge).
It's been about 7 years since I put it together, but this article should explain much of it.
From a high-level viewpoint you can treat the x86 processors of today exactly the same as they were treated in 1978. Conceptually the CPU reads one instruction at a time, steps to the next instruction, and completes them one after another.
What happens inside the box has advanced tremendously over the past 40 years. A single CPU core presents two faces, called Hyper-Threading or Simultaneous Multithreading depending on whether you use Intel's or AMD's branding, and can decode up to 8 instructions per cycle, four per interface. Internal pipelines have grown tremendously long, so an instruction can sit inside the box for ages.
Some minor updates from that article: internally, the current round of processors can start up to 6 micro-ops per clock cycle, shared between the two HT/SMT front ends, and retire up to 4 micro-ops per cycle. Sometimes instructions finish early, so in theory, after a bottleneck clears, it can appear as though up to 14 assembly instructions all complete in a single cycle.
From a black-box perspective, that means up to 4 instructions consumed in a single cycle and up to 14 instructions completed in a cycle. The black box always treats instructions sequentially, preserving the same observable ordering the instructions had four decades ago.
Clock cycle timings today mean something very different from what they meant 40 years ago, and the meaning has shifted gradually over time. Right now about 5GHz is the practical physical maximum. When the timer signals a clock tick, the electric signal barely has time to spread across the chip before the next tick arrives. If you could watch it as a wave spreading across the chip, the wave would only be about ⅔ of the way across a big multicore die before the next signal is sent. Some instructions take several CPU cycles simply because that is how long it takes the signal to physically cross the chip. Coupled with the effects of the out-of-order core, instruction timings and pipeline flows aren't nearly as useful to understand as they once were.
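The back-of-the-envelope numbers are easy to check. Here's a rough sketch in Python; the 5GHz figure is from above, while the signal-speed fraction is an illustrative assumption (real on-chip propagation is dominated by RC wire delays, so this is optimistic):

```python
# How far can a signal travel in one clock cycle at ~5 GHz?
# The 0.5 fraction of light speed is an assumed, optimistic figure.

SPEED_OF_LIGHT_M_S = 3.0e8   # speed of light in vacuum, m/s
CLOCK_HZ = 5.0e9             # ~5 GHz, the rough ceiling cited above
SIGNAL_FRACTION = 0.5        # assumed fraction of c for on-chip signals

cycle_seconds = 1.0 / CLOCK_HZ                                  # 0.2 ns
distance_mm = SPEED_OF_LIGHT_M_S * SIGNAL_FRACTION * cycle_seconds * 1000

print(f"One cycle lasts {cycle_seconds * 1e12:.0f} ps")
print(f"A signal covers roughly {distance_mm:.0f} mm per cycle")
```

With these assumptions a signal covers on the order of 30 mm per cycle, which is in the same ballpark as the width of a large multicore die, so the "wave partway across the chip" picture is plausible.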
Two and three decades ago tremendous effort was spent organizing code to minimize pipeline bubbles, and CPU instruction timings were important to optimizers. These days the amortized cost of most instructions is effectively zero across the pipeline; instead of waiting for instructions to finish, the big bottleneck today is keeping the CPU fed with data and instructions. The biggest step between CPU generations for about 15 years has not been clock speed but the size of on-die caches: L1, L2, and now massive L3. Some of today's premium chips have 64MB of L3 cache on die to help keep the CPU fed with data.
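You can see the "keep the caches fed" effect even from a high-level language. A minimal sketch (names and sizes are just for illustration): both loops do identical arithmetic over the same matrix, but one walks memory sequentially while the other strides across rows, defeating spatial locality. In CPython the gap is muted by interpreter overhead; in C or Rust the same pattern is far more dramatic.

```python
# Same arithmetic, different memory access order: the cache-friendly
# traversal tends to run faster even though the instruction count is equal.
import time

N = 1024
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    # consecutive elements of a row sit next to each other in memory
    total = 0
    for row in m:
        for value in row:
            total += value
    return total

def sum_column_major(m):
    # jumps to a different row on every access
    total = 0
    for col in range(N):
        for row in range(N):
            total += m[row][col]
    return total

for fn in (sum_row_major, sum_column_major):
    start = time.perf_counter()
    result = fn(matrix)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {result} in {elapsed * 1000:.1f} ms")
```

Both functions return the same sum; only the order in which memory is touched changes, which is exactly the kind of difference that dominates performance on modern hardware.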