
How a CPU Works? The Nitty Gritty

Started by March 21, 2007 03:02 AM
25 comments, last by python_regious 17 years, 7 months ago
You might find OpenCores.org interesting - lots of VHDL and Verilog source for processors and subsystems of various types.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote: Original post by Endar
That's just not fair. I did both the Architecture and Operating Systems subjects (unfortunately I didn't pay attention in Architecture, but I was able to pass), but we didn't do anything approaching implementing a processor in Verilog.

That's too bad--we built a simple processor in Verilog in a 200-level class at my university (required course for CS and EE students). It was really an eye-opening project for me. To pass the class, you had to run an undisclosed program provided by the professor on your processor (flashed onto an FPGA) and get the correct output.
Zmurf: To understand this, you'll want to study digital logic. There are two broad categories: combinational and sequential. Combinational logic produces outputs that depend only on the current inputs (think AND, OR, NOT, XOR, etc.). Sequential logic retains state, so its output can also depend on previous inputs. Sequential circuits are built from combinational elements plus components that have "memory": latches and flip-flops.
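A minimal Verilog sketch of the two categories (module and signal names are mine, just for illustration):

    // Combinational: the output is a pure function of the current inputs.
    module majority3 (
        input  wire a, b, c,
        output wire y
    );
        assign y = (a & b) | (a & c) | (b & c);
    endmodule

    // Sequential: a D flip-flop stores its input on each clock edge,
    // so the output depends on history, not just the present input.
    module dff (
        input  wire clk, rst, d,
        output reg  q
    );
        always @(posedge clk) begin
            if (rst) q <= 1'b0;
            else     q <= d;
        end
    endmodule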

From there on out, it's just a matter of more abstraction. If you're in college, take some digital logic courses and learn an HDL.
----Bart
Here's another reference: Intel® 64 and IA-32 Architectures Software Developer's Manuals
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Has anyone used or read this book (Computer Architecture: A Quantitative Approach)? Amazon recommends it in conjunction with Computer Organization and Design. How much is there in the way of overlap - and how do they compare?
[TheUnbeliever]
Quote: Has anyone used or read this book (Computer Architecture: A Quantitative Approach)? Amazon recommends it in conjunction with Computer Organization and Design. How much is there in the way of overlap - and how do they compare?


My Computer Design supervisor informs me that it's got pretty much everything Computer Organization and Design has, plus more.

I've read a bit of Computer Architecture, and it means it when it says 'A Quantitative Approach': there's a fair amount of (simple) maths comparing various design choices. It pretty much assumes you know the basics (e.g. the basics of pipelining are covered in an appendix), so Computer Organization probably gives more in-depth coverage of those.

Though really, if you want to know the nitty gritty of how a CPU works, I recommend (as others have in this thread) grabbing some free tools from Xilinx or Altera, plus a hardware board, and making a CPU. Books generally don't go into precisely how these things work: you may get lots of detail about how to deal with pipeline hazards, but when it comes to, say, stalling or flushing the pipeline, they don't say how this is actually done in hardware (probably because it varies quite a bit depending on how you've constructed the thing).
Quote: Original post by Extrarius
While it's possible to make an N-bit adder from a 1-bit full adder, it's generally not the best way to do it (the latency is too high).
Anyways, another proper name for this is "Digital Logic". Once you understand transistors, it's not too big a leap to logic gates and flip-flops (the simplest unit of memory), and once you have those, it's almost trivial to make an ALU (arithmetic logic unit, which does adding etc). For something simple, you'd just have every possible type of calculation done (addition, shift, etc.) and then have a multiplexer that picks which value to output based on the instruction bits in the opcode/microcode.
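A sketch of the multiplexer-selected ALU described in that quote; the 2-bit opcode encoding is made up for illustration:

    // Toy 8-bit ALU: every operation is computed in parallel and a
    // multiplexer picks the one named by the opcode bits.
    module alu8 (
        input  wire [7:0] a, b,
        input  wire [1:0] op,   // hypothetical: 00=add, 01=and, 10=or, 11=shift left
        output reg  [7:0] y
    );
        always @(*) begin
            case (op)
                2'b00: y = a + b;
                2'b01: y = a & b;
                2'b10: y = a | b;
                2'b11: y = a << 1;
            endcase
        end
    endmodule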


I was thinking about this recently while some people in a group I'm in were taking a digital circuitry class that involved making a 4-bit ALU (exactly as you describe). I knew that a, say, 64-bit adder built as just a series of 1-bit full adders would have too long a propagation delay. Just now, I came up with a method for adding that acts somewhat like a Fast Fourier Transform algorithm, and I wanted to know if it is similar to what is done in real processors.

Say you have two 64-bit numbers A and B and you wish to generate the sum C. Break A, B, and C in half into A1, A2, B1, B2, C1, and C2, where X1 is the lower-order bits and X2 is the higher-order bits.

A1 + B1 = C1, with or without a carry out
A2 + B2 + 1 = C2a (assuming a carry)
A2 + B2 = C2b (assuming no carry)

Once C2a and C2b are calculated, it simply becomes a matter of outputting the version of C2 that corresponds to there being a carry or not.

This method can be broken down further. Divide each half in half again and perform the same technique. Continue until you reach a size where the propagation delay is negligible.
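For what it's worth, this scheme is what textbooks call a carry-select adder. One level of it as a Verilog sketch (module and signal names are illustrative):

    // Add the high halves twice, once assuming carry-in 0 and once
    // assuming 1, then let the low half's real carry-out pick the winner.
    module carry_select64 (
        input  wire [63:0] a, b,
        output wire [63:0] c
    );
        wire        carry_lo;
        wire [31:0] sum_lo, sum_hi0, sum_hi1;

        assign {carry_lo, sum_lo} = a[31:0]  + b[31:0];          // A1 + B1 = C1
        assign sum_hi0            = a[63:32] + b[63:32];          // A2 + B2 = C2b
        assign sum_hi1            = a[63:32] + b[63:32] + 32'd1;  // A2 + B2 + 1 = C2a

        assign c = {carry_lo ? sum_hi1 : sum_hi0, sum_lo};        // mux on the carry
    endmodule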

My question is: is this how it's done, or is there a better trick?

C++: A Dialog | C++0x Features: Part1 (lambdas, auto, static_assert) , Part 2 (rvalue references) , Part 3 (decltype) | Write Games | Fix Your Timestep!

It's generally done using a carry look-ahead adder. As I don't remember precisely how it works and I'm feeling too lazy to remind myself, I don't know how similar it is to your method.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
Yep, that's pretty much how it's done.
Quote: Original post by nobodynews
Say you have two 64-bit numbers A and B and you wish to generate the sum C. Break A, B, and C in half into A1, A2, B1, B2, C1, and C2, where X1 is the lower-order bits and X2 is the higher-order bits.

A1 + B1 = C1, with or without a carry out
A2 + B2 + 1 = C2a (assuming a carry)
A2 + B2 = C2b (assuming no carry)

Once C2a and C2b are calculated, it simply becomes a matter of outputting the version of C2 that corresponds to there being a carry or not.

This method can be broken down further. Divide each half in half again and perform the same technique. Continue until you reach a size where the propagation delay is negligible.

My question is: is this how it's done, or is there a better trick?


Your explanation is so high level that it's hard for me to tell which implementation details you have in mind that would lessen propagation delay. In fact, the series of 1-bit adders is exactly your description taken all the way down to 1-bit pieces.

Break all the additions down into 1-bit slices, then have the carry out of each slice feed the carry in of the next. That carry determines whether C2a or C2b is chosen: it is either 1 (implicitly choosing C2a) or 0 (choosing C2b).
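That chain is the classic ripple-carry structure; a parameterized Verilog sketch (module name and parameter are my own):

    // Ripple-carry adder: 1-bit full-adder slices in series, each slice's
    // carry-out feeding the next slice's carry-in. Small, but the longest
    // path grows linearly with the width N.
    module ripple_adder #(parameter N = 64) (
        input  wire [N-1:0] a, b,
        input  wire         cin,
        output wire [N-1:0] sum,
        output wire         cout
    );
        wire [N:0] c;
        assign c[0] = cin;
        genvar i;
        generate
            for (i = 0; i < N; i = i + 1) begin : slice
                assign sum[i] = a[i] ^ b[i] ^ c[i];
                assign c[i+1] = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));
            end
        endgenerate
        assign cout = c[N];
    endmodule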

If you are going to stop breaking it down at some point, for example at 8 blocks of 8 bits, then you have to explain how you are going to make the 8-bit addition in each block fast. This is not really a problem. However, whatever circuit you use to implement the 8-bit blocks does not automatically make the design faster than the series of 1-bit slices; you would have to calculate the propagation delay of those circuits and see whether the total is actually less than that of the 1-bit adders in series.
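As a rough sanity check (my own numbers, assuming about two gate delays per full-adder carry stage and one per mux): a 64-bit ripple chain costs on the order of 2 * 64 = 128 gate delays, while 8 select blocks of 8 bits cost about 2 * 8 = 16 delays for the first block plus 7 chained mux delays, around 23 total. The price is roughly doubling the adder logic in every block after the first.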

Carry look-ahead adders are modifications of the ripple-carry adder that provably lessen propagation delay. Read the Wikipedia article, and also follow the Look ahead carry unit link at the bottom, and you can see them build up a 64-bit adder.

It's too bad they show only block diagrams and not the gates within. If they did, it would be easy to compare and see that the longest path is much shorter with carry look-ahead than with ripple carry.
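The gate level is short enough to write out, though. A 4-bit look-ahead slice in Verilog, following the usual generate/propagate formulation:

    // Each bit computes generate (g) and propagate (p); every carry is then
    // two-level logic in g, p, and c0 instead of a chain through the slices.
    module cla4 (
        input  wire [3:0] a, b,
        input  wire       c0,
        output wire [3:0] sum,
        output wire       c4
    );
        wire [3:0] g = a & b;   // bit i generates a carry by itself
        wire [3:0] p = a ^ b;   // bit i passes an incoming carry along
        wire c1 = g[0] | (p[0] & c0);
        wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
        wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                       | (p[2] & p[1] & p[0] & c0);
        assign c4  = g[3] | (p[3] & c3);  // could be flattened the same way
        assign sum = p ^ {c3, c2, c1, c0};
    endmodule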

Don't think ripple-carry adders are never used, though. If you need a design that takes up little physical area and the speed of a ripple-carry adder is not a problem, it's a fine choice.

This topic is closed to new replies.
