Inside the CPU – Computerphile

In a previous video, we looked at how CPUs can use caches to speed up accesses to memory. So, the CPU has to fetch things from memory; it might be a bit of data, it might be an instruction. And it goes through the cache to try and access it. And the cache keeps a local copy in fast memory to try and speed up the accesses. But what we didn’t talk about is: what does a CPU do with what it’s fetched from memory? What is it actually doing, and how does it process it?
So the CPU is fetching values from memory. We’ll ignore the cache for now, because it doesn’t matter if the CPU has a cache or not; it’s still gonna do roughly the same things. And we’re also gonna look at very old CPUs, the sort of things that are in 8-bit machines, purely because they’re simpler to deal with and it’s simpler to see what’s going on. The same idea still applies to an ARM CPU today, or an x86 chip, or whatever it is you’ve got in your machine.
Modern CPUs use what’s called the von Neumann architecture, and what this basically means is that you have a CPU and you have a block of memory. And that memory is connected to the CPU by two buses. Each is just a collection of several wires connecting the two. And again, we’re looking at old-fashioned machines.
On a modern machine it gets a bit more complicated, but the idea, the principle, is the same. So we have an address bus, and the idea is that the CPU can generate a number on it, in binary, to access any particular value in memory. So we say that the first one is at address 0 and, as we’re gonna use a 6502 as an example, we’ll say that the last one is at address 65535 in decimal, or FFFF in hexadecimal. So we can generate any of these numbers on the 16 bits of this address bus to access any of the individual bytes in this memory. How do we get the data between the two?
Well, we have another bus, which is called the data bus, which connects the two together. Now, the reason why this is a von Neumann machine is because this memory can contain both the program, i.e. the bytes that make up the instructions that the CPU can execute, and the data. So the same block of memory contains some bytes which hold program instructions and some bytes which hold data. And the CPU, if you wanted it to, could treat the program as data or treat the data as program. Well, if you do that then it would probably crash.
So what we’ve got here is an old BBC Micro using a 6502 CPU, and we’re gonna just write a very, very simple machine code program that uses, well, the operating system just to print out the letter C, for Computerphile. So if you assemble it (we’re using hexadecimal), we’ve started our program at 084C. So that’s the address where our program is being created. And our program is very simple. It loads one of the CPU’s registers, which is just basically a temporary data store that you can use, and this one is called the accumulator, with the ASCII code 67, which represents a capital C. And then it says: jump to the subroutine at this address, which will print out that particular character. And then we tell it we want to stop, so we gotta return from subroutine.
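As a rough illustration of the arrangement described so far (one block of memory, a 16-bit address bus, and a data bus carrying one byte at a time), here is a minimal Python sketch. The names `memory`, `read_byte` and `write_byte`, and the address 0x0900, are made up for this example; real hardware does this with wires rather than function calls.

```python
ADDRESS_BITS = 16                     # width of the 6502 address bus
MEMORY_SIZE = 2 ** ADDRESS_BITS       # 65536 bytes, addresses 0x0000 to 0xFFFF

# One flat block of bytes: program and data share it (von Neumann style).
memory = bytearray(MEMORY_SIZE)


def write_byte(address: int, value: int) -> None:
    """Pretend the CPU drives the address bus and sends `value` on the data bus."""
    memory[address & 0xFFFF] = value & 0xFF


def read_byte(address: int) -> int:
    """Pretend the CPU drives the address bus and memory answers on the data bus."""
    return memory[address & 0xFFFF]


# 0x0900 is an arbitrary address chosen for the example.
write_byte(0x0900, 67)                # 67 is the ASCII code for a capital C

print(MEMORY_SIZE - 1, hex(MEMORY_SIZE - 1))        # 65535 0xffff
print(read_byte(0x0900), chr(read_byte(0x0900)))    # 67 C
```

Nothing in the memory itself marks the byte 67 as data rather than as part of a program; what it means depends entirely on how the CPU comes to read it.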
And if we run this and type in the address, so we’re at … 84C, then you’ll see that it prints out the letter C and then we get a prompt to carry on doing things. So our program, we write it in assembly language, which we can understand as humans, -ish. LDA: Load Accumulator. JSR: Jump to Subroutine. RTS: Return from Subroutine. You get the idea once you’ve done it a few times. And the computer converts this into a series of numbers, in binary. The CPU is working in binary, but to make it easier to read we display it as hexadecimal. So our program becomes: A9 43, 20 EE FF, 60. That’s the program we’ve written.
And the CPU, when it runs it, needs to fetch those bytes from memory into the CPU. Now, how does it do that? To get the first byte we need to put the address, 084C, on the address bus, and a bit later on the memory will send back the byte that represents the instruction: A9. Now, how does the CPU know where to get these instructions from? Well, it’s quite simple.
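To make the byte sequence concrete, the following Python sketch places the six program bytes at address 084C in a pretend 64 KB memory and reads the first one back. The names `ORIGIN`, `PROGRAM` and `fetch` are illustrative only, and the comments give the usual 6502 reading of each byte.

```python
ORIGIN = 0x084C                        # where the program was assembled

PROGRAM = bytes([
    0xA9, 0x43,                        # LDA #67   (0x43 == 67 == 'C')
    0x20, 0xEE, 0xFF,                  # JSR &FFEE (address stored low byte first)
    0x60,                              # RTS
])

memory = bytearray(65536)              # pretend main memory, all zeros
memory[ORIGIN:ORIGIN + len(PROGRAM)] = PROGRAM


def fetch(address: int) -> int:
    """Stand-in for putting `address` on the address bus and reading the data bus."""
    return memory[address]


dump = " ".join(f"{byte:02X}" for byte in PROGRAM)
print(dump)                                     # A9 43 20 EE FF 60
print(f"{ORIGIN:04X}: {fetch(ORIGIN):02X}")     # 084C: A9
```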
Inside the CPU there is a register, which we call the program counter, or PC, on a 6502; on something like an x86 machine it’s known as the instruction pointer. And all that does is store the address of the next instruction to execute. So when we were starting up here, it would have 084C in it. That’s the address of the instruction we want to execute. So when the CPU wants to fetch the instruction it’s gonna execute, it puts that address on the address bus, and the memory then sends the instruction back to the CPU.
So the first thing the CPU is gonna do to run our program is to fetch the instruction, and the way it does that is by putting the address from the program counter onto the address bus and then fetching the actual instruction. So the memory provides it, but the CPU then reads that in on its input on the data bus. Now, it needs to fetch the whole instruction that the CPU is gonna execute, and in the example we saw there it was relatively straightforward, because the instruction was only a byte long. Not all CPUs are that simple. Some CPUs will vary these things, so this hardware can actually be quite complicated, because it needs to work out how long the instruction is. So it could be as short as one byte, it could be as long, on some CPUs, as 15 bytes, and you sometimes don’t know how long it’s gonna be until you’ve read a few of the bytes.
But this hardware can also be relatively trivial. So an ARM CPU makes it very, very simple; it says: all instructions are 32 bits long. So the Archimedes over there can fetch the instruction very, very simply: 32 bits. On something like an x86, it can be any length up to 15 bytes or so, and so this becomes more complicated; you have to sort of work out what it is until you’ve got it.
But we fetch the instruction. So in the example we’ve got, we’ve got A9 here. So we now need to work out what A9 does. Well, we need to decode it into what we want the CPU to actually do. So we need to have another bit of our CPU’s hardware which we’re dedicating to decoding the instruction. So we have a part of the CPU which is fetching it, and part of the CPU which is then decoding it. So it gets A9 into it.
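The point about working out how long an instruction is can be sketched in a few lines of Python. On the 6502 the opcode is always a single byte, and it determines how many operand bytes follow, so a small table is enough for the three opcodes in the example program. The table and the function name `split_instructions` are illustrative; a real 6502 knows many more opcodes than this.

```python
# Total length in bytes (opcode plus operand) for the opcodes in the example.
INSTRUCTION_LENGTH = {
    0xA9: 2,   # LDA immediate: opcode + 1-byte value
    0x20: 3,   # JSR absolute:  opcode + 2-byte address
    0x60: 1,   # RTS:           opcode only
}


def split_instructions(code: bytes) -> list:
    """Walk the byte stream, using each opcode to decide how many bytes to take."""
    instructions = []
    position = 0
    while position < len(code):
        opcode = code[position]
        length = INSTRUCTION_LENGTH[opcode]   # KeyError for opcodes not in the table
        instructions.append(code[position:position + length])
        position += length
    return instructions


program = bytes([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])
for instruction in split_instructions(program):
    print(instruction.hex(" ").upper())
```

The length of each instruction is only known once its first byte has been read, which is the small-scale version of the problem the fetch hardware has on a variable-length instruction set.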
So the A9 comes into the decode, and it says: well, okay, that’s a load instruction, so I need to fetch a value from memory, which was the 43, the ASCII code for the capital letter C that we saw earlier. So we need to fetch something else from memory. We need to access memory again, and we need to work out what address that’s gonna be. We also then need to, once we’ve got that value, update the right register to store that value. So we’ve gotta do things in sequence. So part of the decode logic is to take the single instruction byte, or however long it is, and work out what’s the sequence that we need to drive the other bits of the CPU through. And so that also means that we have another bit of the CPU, which is the actual bit that does things, which is gonna be all the logic which actually executes instructions.
So we start off by fetching it, and then once we’ve fetched it we can start decoding it, and then we can execute it. And the decode logic is responsible for saying: put out the address of where you want to get the value from, load it from memory, and then store it once it’s been loaded into the CPU. So you’re doing things in order: we have to fetch it first, and we can’t decode it until we’ve fetched it, and we can’t execute things until we’ve decoded it.
So, at any one time, we’ll probably find on a simple CPU that quite a few of the bits of the CPU wouldn’t actually be doing anything. So, while we’re fetching the value from memory to work out how we’re gonna decode it, the decode and the execute logic aren’t doing anything. They’re just sitting there, waiting for their turn. And then, when we decode it, it’s not fetching anything and it’s not executing anything. So we’re sort of moving through these different states one after the other. And that takes different amounts of time. If we’re fetching 15 bytes it’s gonna take longer than if we’re fetching one. Decoding it might well be shorter than if we’re fetching something from memory, ’cos this is all inside the CPU. And the execution depends on what’s actually happening. So your CPU will work like this: it will go through each phase, then, once it’s done that, it’ll start on the next clock tick. All CPUs are synchronized to a clock, which just keeps things moving in sequence, and you can build a CPU this way.
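The fetch, decode and execute sequence can be sketched as a tiny interpreter for the three opcodes in the example program. This is a deliberately simplified model, not an accurate 6502: the return stack is a plain Python list rather than the 6502’s stack in memory, the call to FFEE (the routine the example uses to print a character) is replaced by a stand-in that collects output, and the function name `run` is made up for this sketch.

```python
PRINT_CHAR = 0xFFEE   # routine the example program calls to print the accumulator


def run(memory: bytearray, start: int) -> str:
    """Run LDA #, JSR and RTS until the outermost RTS; return the printed text."""
    pc = start            # program counter: address of the next byte to fetch
    accumulator = 0
    return_stack = []     # simplification: not the real 6502 stack
    output = []

    while True:
        opcode = memory[pc]                    # FETCH the opcode at PC
        pc += 1
        if opcode == 0xA9:                     # DECODE: LDA immediate
            accumulator = memory[pc]           # EXECUTE: fetch operand, update A
            pc += 1
        elif opcode == 0x20:                   # DECODE: JSR absolute
            target = memory[pc] | (memory[pc + 1] << 8)   # low byte first
            pc += 2
            if target == PRINT_CHAR:
                output.append(chr(accumulator))  # stand-in for the OS routine
            else:
                return_stack.append(pc)
                pc = target
        elif opcode == 0x60:                   # DECODE: RTS
            if not return_stack:
                break                          # nothing to return to: stop
            pc = return_stack.pop()
        else:
            raise ValueError(f"unhandled opcode {opcode:02X}")
    return "".join(output)


memory = bytearray(65536)
memory[0x084C:0x084C + 6] = bytes([0xA9, 0x43, 0x20, 0xEE, 0xFF, 0x60])
print(run(memory, 0x084C))   # C
```

Each pass round the loop does one thing after another, which mirrors the point above: while one part of the work is happening, the other parts are idle.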
Something like the 6502 worked like that. But, as we said, lots of the CPU isn’t actually doing anything at any one time, which is a bit wasteful of the resources. So is there another way you can do this? And the answer is yes!
You can do what’s called a sort of pipelined model of a CPU. So what you do here is, you still have the same three bits of the CPU. But you say: okay, so we’ve gotta fetch (F) instruction one. In the next bit of time, I’m gonna start decoding this one. So, I’m gonna start decoding instruction one. But I’m gonna say: I’m not using the fetch logic here, so I’m gonna have this start to get things ready and start to do things ahead of schedule. I’m also, at the same time, gonna fetch instruction two. So now I’m doing two things; two bits of my CPU are in use at the same time. I’m fetching the next instruction while decoding the first one. And once we’ve done decoding, I can start executing the first instruction. So I execute that. But at the same time, I can start decoding instruction two and, hopefully, I can start fetching instruction three.
So what? It is still taking the same amount of time to execute that first instruction. So the beauty is, when it comes to executing instruction two, it completes exactly one cycle after the other. Rather than having to wait for it to go through the fetch and decode and execute cycles, we can just execute it as soon as we’ve finished instruction one. So each instruction still takes the same amount of time; it’s gonna take, say, three clock cycles to go through the CPU. But because we’ve sort of pipelined them together, they actually appear to execute one clock cycle after each other. And we could do this again.
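The overlap described above can be laid out as a timetable. The Python sketch below assumes an idealised pipeline in which each of the three stages takes exactly one clock cycle and nothing ever has to wait; the function names are made up for the illustration.

```python
STAGES = ["F", "D", "E"]   # fetch, decode, execute


def pipeline_schedule(instruction_count: int) -> dict:
    """Map each clock cycle to the stage/instruction pairs active in it."""
    schedule = {}
    for instruction in range(1, instruction_count + 1):
        for offset, stage in enumerate(STAGES):
            cycle = instruction + offset       # instruction n is fetched in cycle n
            schedule.setdefault(cycle, []).append(f"{stage}{instruction}")
    return schedule


def sequential_cycles(instruction_count: int) -> int:
    """One instruction at a time: every instruction pays for all three stages."""
    return instruction_count * len(STAGES)


def pipelined_cycles(instruction_count: int) -> int:
    """Fill the pipeline once, then one instruction completes per cycle."""
    return len(STAGES) + (instruction_count - 1)


for cycle, active in sorted(pipeline_schedule(4).items()):
    print(cycle, " ".join(active))

print(sequential_cycles(4), pipelined_cycles(4))   # 12 6
```

In cycle 3 the printout shows `E1 D2 F3`: all three parts of the CPU are busy at once, each on a different instruction.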
So we could start decoding instruction three here at the same time as we’re executing instruction two. Now, there can be problems. This works for some instructions, but say this instruction said “store this value in memory”. Now you’ve got a problem. You’ve only got one address bus and one data bus, so you can only access or store one thing in memory at a time. You can’t execute a store instruction and fetch an instruction from memory at the same time. So you wouldn’t be able to fetch it until the next clock cycle. So we fetch instruction four there while executing instruction three. But we can’t decode anything here. So in this clock cycle, we can decode instruction four and fetch instruction five, but we can’t execute anything. We’ve got what’s called a “bubble” in our pipeline, or a pipeline stall, because at this point the design of the CPU doesn’t let us fetch an instruction and execute an instruction at the same time. It’s one of what are called “pipeline hazards” that you can get when designing a pipelined CPU, because the design of the CPU doesn’t let you do the things you need to do at the same time.
So you have to delay things, which means that you get a bubble. So, you can’t quite get up to one-instruction-per-cycle efficiency. But you can certainly get closer than you could if you just had everything do one instruction at a time.
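The bubble can be reproduced with a small cycle-by-cycle simulation. This Python sketch assumes a three-stage pipeline with a single memory port, where a store in the execute stage blocks the fetch stage for that cycle and every other instruction executes without touching memory; the program, the labels "op" and "store", and the function name `simulate` are all made up for the example.

```python
def simulate(program: list) -> list:
    """Return one (cycle, fetching, decoding, executing) row per clock cycle.

    Instructions are numbered from 1; None means that stage is idle.
    """
    rows = []
    fetched = None        # fetched last cycle, so it is decoded this cycle
    decoded = None        # decoded last cycle, so it is executed this cycle
    next_to_fetch = 1
    completed = 0
    cycle = 0
    while completed < len(program):
        cycle += 1
        executing = decoded
        decoding = fetched
        # A store uses the one address/data bus, so nothing can be fetched.
        bus_busy = executing is not None and program[executing - 1] == "store"
        if next_to_fetch <= len(program) and not bus_busy:
            fetching = next_to_fetch
            next_to_fetch += 1
        else:
            fetching = None
        rows.append((cycle, fetching, decoding, executing))
        if executing is not None:
            completed += 1
        fetched, decoded = fetching, decoding
    return rows


# Instruction two is the store, as in the walkthrough above.
rows = simulate(["op", "store", "op", "op", "op"])
for row in rows:
    print(row)
```

With instruction two as the store, cycle 4 has no fetch, cycle 5 has nothing to decode, and cycle 6 has nothing to execute: the bubble moves down the pipeline, and the five instructions take 8 cycles instead of the ideal 7.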

100 thoughts to “Inside the CPU – Computerphile”

  1. What's up with that into? Looking away and then at the camera? Is that some kind of cinematography trick? It just looks kinda awkward to be honest.

  2. I programmed a 65816 emulator… I already know how this stuff works… Why do I still watch this video…

  3. wow. The revenge of the Nerds! But not even FFFF 00100000 views yet. We need more L2 cache notification to increase and register views

  4. Cool how I already learned pipelining basics before this video. Too bad he didn't go into branching, although that's a bit much for one video.

  5. "How can I focus on what he say with this nail 2:38 ?

    Damn, 2:48

    cut it pleaaaase 3:05
    Hey, he cut right hand nails, but not the left 4:10

    Maybe he play guitar? He write with his right hand, ok, I must talk about it to my friend on skype
    … (5 minutes later)
    Ok, so he don't play guitar… fu, I must restart this video now to focus on the talk
    (restart the video)

    NAIIIIIIIL >< (2:38)"

  6. Generally a cool video, but I think you should've split it into one about pipelining (there are other hazards as well like conditions). And then you (or Steve actually) could have expanded about the actual steps inside the CPU. I saw a video recently of someone building a CPU (+ Memory) on breadboards using separate chips for the registers and so on. And I found that video really helpful to see what actually means "fetching", "decoding" and "executing".

  7. I don't really get what the decoding is supposed to do, can't you just execute a piece of code after you fetched it?

  8. cant the bubbles be removed if the instruction memory and data memory were seperate? the structural hazards can be avoided that way since we can access both at the same time

  9. Why would you need 8^15 bit long instructions? I knew x86 was somehow suboptimal by todays standards, but that is just crazy!

  11. I wish I didn't have such a problem understanding these logic videos. Must be a learning disability or something. I'm an IT professional(5 years in) and I still have no clue.

  12. CPU is way more complicated than the video entails. Modern processors do all types of things to keep the pipeline filled such as pulling more data than required. For example: rather than fetching one instruction at a time, it could fetch an arbitrary volume of memory (let's say 64 bytes). It would then decode as many instructions as it could from that block and prefetch the next block when the fetch bus isn't in use. Then, it can cache the decoded instructions to prevent fetching and decoding recent code. The CPU can also make notes about the instructions prefetched to predetermine holes in pipeline and try to fill them by reorganizing instructions. This also helps with branch prediction and register renaming.

  13. Is there an animation of this anywhere? The mechanism that fetches the bits and bytes, how it corrals them and brings them back. What is moving, electrons? If they move, how does a copy stay behind? Or is it like Morse code where a signal is sent by the storage using some sort of transmitter that reads the info and sends out what it reads. I'm trying to imagine this tiny world where nothing moves, but a lot happens.

  14. Interesting that you give an example of a system where most of the parts of the CPU are idle, then compare it to a 6502…
    Which does instruction decoding and execution in parallel. (it's like a short, 2 stage pipeline, but not quite.)
    compared to some other processors from the era the instruction times for 6502 code were very short and consistent.

    I miss The 65x family. But it died out because it's entire design is built around having RAM that is faster than the processor.
    And since the mid 90's it's pretty much guaranteed that the processor is faster than RAM.
    That's why cache memory exists. If your main memory was fast enough you wouldn't bother implementing a cache, because it would be redundant. But… When main memory is slow… Cache helps keep the CPU busy…

  15. Next you introduce memory interleave architectures.

    Enjoyable video – takes me back – wrote my first program in '69

  16. so if the pipeline infrastructure cant fetch a command and execute a command at the same time,

    doesnt that just mean you need another data bus?

    It seems to me that if you need one data bus for fetching instructions and another for accessing memory, that you should have every possible part necessary to execute a command redundant and in parallel, basically one bus for accessing memory, one for fetching instructions, a decoder for both, and then no matter what the instruction says, you always have a bus ready for it to be used on the next tick, so you always have an incoming pipeline and a parallel pipeline for things required in the actual instruction.

    If you have pipeline flow issues, make the pipe bigger or in parallel 😛

  17. Great video! Could you guys also cover on different instruction sets for processors? Since I always found the difference between x64, x86 , RISC to be confusing. Would be awesome if you could make a video on it 🙂

  18. Every time he touched and left a fingerprint on the monitor my soul hurt 🙂 In all seriousness though, great video!

  19. oh boy I hope there is an extended or part two, so many interesting options in CPU functioning, also a vid for GPUs and openCL (GPU for non graphics computation) to contrast with the CPU.

  20. Before CLUs. There was nothing. I want to know how they ŵent rom nothing to something as complicated as a cpu.

  21. Can you explain the "on an ARM processor all instruments are 32 bits long"? I'm going to take that to mean "the same bit length" vs 32 bits. But besides that, I remember doing ASM on a NXP chip and some instructions take a few cycles. But I could've swarn some of the Java and thumb 2 stuff had a different instruction length…?

  22. Quick note about "von Neumann Architecture". John von Neumann didn't actually invent it. It was named after him because he wrote a report for some government agency discussing various computer architectures used in computers of that time and one of those architectures was the one we call "von Neumann Architecture". He had nothing to do with its invention. It's one of the most badly-named terms ever.

  23. But you almost always have to fetch data from memory, cause there are just a few assember commands which works without further data (like RTS, INC, DEC, etc.). So in your short assembler source the CPU fetches A9 and decodes that as Load a given value to the accumulator. Cause A9 means that load get its value as an absolute value (opposite to an relative value which is stated as a pointer to an address – which will use some other value then A9 as opcode of course 😉 )) it has to fetch the next byte from memory. So in most cases you get this pipeline bubble anyway.

  24. Id like to know how the design effects performance, and why AMD has trouble competing with Intel, why has the improvement slowed down in the last 5 years,  what are the challenges in making a better processor etc.  Is the whole approach with the way the CPU designed wrong?  Not wrong but are there different ways not explored yet?  Does functional programming have anything to with it?

  25. 3:13
    "we write it in assembly language which we can understand as humans * pauses * -ish"
    FF/FF Makes me laugh every time.

  26. I always love this low level computer stuff. I would like to see things go even a little bit lower, like what exactly executing a command looks like in mathematical terms though

  27. Does anyone know how many reviews does it take for subs to get approved? Or how it even works?
    I got some ppl to review my Spanish subs, but they don't appear yet (in this video nor the MegaProcessor one).

  28. I have a Computer Architecture exam tomorrow. I am so glad that youtube recommended me to watch this. Thanks Computerphile <3

  29. Because of you, I made a computer! Check out the video on my page. I made it control servers, leds, sounds and more! I just had to say thank you.

  30. I still have a big question: how does this translate into transistors? the piece piece I am missing of the puzzle is how adding more transistors increases the speed, specially knowing that there are tasks that require to be sequential.

  31. You folks should insert a link to Ben Eater's "Building an 8-bit Breadboard Computer" series right here on YT. He's brilliant at simplifying the complications of a CPU to a level which the ordinary person can understand.

    The Breadboard Computer which Ben Eater builds and explains over the course of the series can be built by anyone. The only really big complication is finding all of the parts because some of them have become quite scarce since the book which Ben used as his guide was written.
