[Code listing; the opcode column was lost in extraction, only trailing operands and comments survive]
    ... #0        // R1 = 0
    ... #1        // R2 = 1
    ... #5, L     // Stop when R2 > 5
    ... R1, R2    // R1 = R1 + R2
    ... R2, #1    // R2++
    ...           // Go back to the loop test
    ... R1, R1    // R3 = R1 + R1

cs420: speed with complexity

What happens at every clock cycle

[Figure: block diagram of the control unit (PC, instruction RAM, instruction decoder, branch control) and the datapath (register file, Mux B, ALU, data RAM, Mux D), with control signals DA, AA, BA, MB, FS, MD, WR, MW and status bits V, C, N, Z]

In our simple processor:
1.  Instruction fetch
2.  Instruction decode
3.  Register fetch
4.  ALU operation
5.  Store results to register

Figure annotations for the other instruction types:
•  Branch: 3. Conditional; 4. Change PC
•  Load: 3. Fetch address; 4. Memory load; 5. Store to register
•  Store: 3. Fetch address; 4. Memory store

On each clock cycle
•  Use Program Counter (PC) output to fetch next instruction
•  Decode instruction to generate control signals
•  Depending on the type of instruction:
   –  ALU, Load or Store
This is the so-called “von Neumann” architecture.

A computer like that can be functional
•  But we want to make it even faster.
•  What are the obstacles to speed?
   –  Long chain of gate delays
   –  “Floating point” computations
   –  Slow memory
   –  Virtual memory and paging
•  The theme for today:
   –  The quest for speed to overcome these causes a significant increase in complexity, and makes performance difficult to predict and control

Latency vs bandwidth, and pipelining
•  Imagine you are putting a fire out
   –  Only buckets, no hose
   –  100 seconds to walk with a bucket from water to fire (and 100 to walk back)
   –  But if you form a bucket brigade
      •  (needs people and buckets)
   –  You can deliver a bucket every 10 seconds
•  So, latency is 100 or 200 seconds, but bandwidth is...
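The bucket example can be worked through numerically. The following is a minimal sketch in Python, not part of the original slides: the function names are hypothetical, and it assumes that a lone carrier walks back after every delivery except the last, and that the first bucket of a brigade still needs the full 100 seconds to reach the fire. Only the 100-second walk and the 10-second brigade interval come from the slide.

```python
# Figures from the slide.
WALK_ONE_WAY_S = 100      # seconds to carry a bucket from the water to the fire
BRIGADE_INTERVAL_S = 10   # seconds between bucket arrivals once a brigade is running


def single_carrier_time(buckets: int) -> int:
    """Seconds for one person to deliver `buckets` buckets.

    Assumption: the carrier walks back (another 100 s) after every
    delivery except the last one.
    """
    if buckets <= 0:
        return 0
    return buckets * WALK_ONE_WAY_S + (buckets - 1) * WALK_ONE_WAY_S


def brigade_time(buckets: int) -> int:
    """Seconds for a bucket brigade to deliver `buckets` buckets.

    Assumption: the first bucket still takes the full one-way latency;
    each later bucket arrives BRIGADE_INTERVAL_S after the previous one.
    """
    if buckets <= 0:
        return 0
    return WALK_ONE_WAY_S + (buckets - 1) * BRIGADE_INTERVAL_S


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:4d} buckets: alone {single_carrier_time(n):6d} s, "
              f"brigade {brigade_time(n):5d} s")
```

Under these assumptions the time for a single bucket is the same in both cases, while the time for many buckets is dominated by the delivery interval rather than the walk. That is the distinction between latency and bandwidth that pipelining exploits.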
CS 420, taught by Professor L. Kale during the Fall '08 term at the University of Illinois, Urbana-Champaign.

