all_slides

Course: ICS 152, Fall 2009
School: CSU Channel Islands
Lectures for 3rd Edition

Note: these lectures are often supplemented with other materials and also problems from the text worked out on the blackboard. You'll want to customize these lectures for your class. The student audience for these lectures have had exposure to logic design and attend a hands-on assembly language programming lab that does not follow a typical lecture format.

2004 Morgan Kaufmann Publishers

Chapter 1

Introduction
  This course is all about how computers work
  But what do we mean by a computer?
    Different types: desktop, servers, embedded devices
    Different uses: automobiles, graphics, finance, genomics...
    Different manufacturers: Intel, Apple, IBM, Microsoft, Sun...
    Different underlying technologies and different costs!
  Analogy: Consider a course on "automotive vehicles"
    Many similarities from vehicle to vehicle (e.g., wheels)
    Huge differences from vehicle to vehicle (e.g., gas vs. electric)
  Best way to learn:
    Focus on a specific instance and learn how it works
    While learning general principles and historical perspectives

Why learn this stuff?
  You want to call yourself a "computer scientist"
  You want to build software people use (need performance)
  You need to make a purchasing decision or offer "expert" advice
  Both Hardware and Software affect performance:
    Algorithm determines number of source-level statements
    Language/Compiler/Architecture determine machine instructions (Chapter 2 and 3)
    Processor/Memory determine how fast instructions are executed (Chapter 5, 6, and 7)
  Assessing and Understanding Performance in Chapter 4

What is a computer?
  Components:
    input (mouse, keyboard)
    output (display, printer)
    memory (disk drives, DRAM, SRAM, CD)
    network
  Our primary focus: the processor (datapath and control)
    implemented using millions of transistors
    Impossible to understand by looking at each transistor
    We need...

Abstraction
  Delving into the depths reveals more information
  An abstraction omits unneeded detail, helps us cope with complexity
  What are some of the details that appear in these familiar abstractions?

How do computers work?
  Need to understand abstractions such as:
    Applications software
    Systems software
    Assembly Language
    Machine Language
    Architectural Issues: i.e., Caches, Virtual Memory, Pipelining
    Sequential logic, finite state machines
    Combinational logic, arithmetic circuits
    Boolean logic, 1s and 0s
    Transistors used to build logic gates (CMOS)
    Semiconductors/Silicon used to build transistors
    Properties of atoms, electrons, and quantum dynamics
  So much to learn!

Instruction Set Architecture
  A very important abstraction
    interface between hardware and low-level software
    standardizes instructions, machine language bit patterns, etc.
    advantage: different implementations of the same architecture
    disadvantage: sometimes prevents using new innovations
  True or False: Binary compatibility is extraordinarily important?
  Modern instruction set architectures: IA-32, PowerPC, MIPS, SPARC, ARM, and others

Historical Perspective
  ENIAC built in World War II was the first general purpose computer
    Used for computing artillery firing tables
    80 feet long by 8.5 feet high and several feet wide
    Each of the twenty 10 digit registers was 2 feet long
    Used 18,000 vacuum tubes
    Performed 1900 additions per second
  Since then: Moore's Law: transistor capacity doubles every 18-24 months

Chapter 2

Instructions: Language of the Machine
  We'll be working with the MIPS instruction set architecture
    similar to other architectures developed since the 1980's
    Almost 100 million MIPS processors manufactured in 2002
    used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, ...
  [Figure: chart for the years 1998-2002, vertical axis 0 to 1400, series: Other, SPARC, Hitachi SH, PowerPC, Motorola 68K, MIPS, IA-32, ARM]

MIPS arithmetic
  All instructions have 3 operands
  Operand order is fixed (destination first)
  Example:
    C code:     a = b + c
    MIPS code:  add a, b, c
  (we'll talk about registers in a bit)
  "The natural number of operands for an operation like addition is three...requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple"

MIPS arithmetic
  Design Principle: simplicity favors regularity.
  Of course this complicates some things...
    C code:     a = b + c + d;
    MIPS code:  add a, b, c
                add a, a, d
  Operands must be registers, only 32 registers provided
  Each register contains 32 bits
  Design Principle: smaller is faster. Why?
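
The two-instruction sequence for a = b + c + d can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the helper names (add, sum_three) and the register numbers picked for a, b, c and d are hypothetical, and the model covers nothing beyond a 32-entry register file holding 32-bit values.

```python
MASK32 = 0xFFFFFFFF  # each register contains 32 bits


def add(regs, rd, rs, rt):
    """Three-operand add: regs[rd] = regs[rs] + regs[rt], wrapped to 32 bits."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32


def sum_three(b, c, d):
    """Evaluate a = b + c + d using two three-operand adds."""
    regs = [0] * 32                 # only 32 registers are provided
    A, B, C, D = 8, 9, 10, 11       # register numbers chosen arbitrarily for this sketch
    regs[B], regs[C], regs[D] = b & MASK32, c & MASK32, d & MASK32
    add(regs, A, B, C)              # add a, b, c
    add(regs, A, A, D)              # add a, a, d
    return regs[A]
```

Because every operand lives in a fixed-width register, a sum that does not fit in 32 bits simply wraps around in this model.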
Registers vs. Memory
  Arithmetic instruction operands must be registers -- only 32 registers provided
  Compiler associates variables with registers
  What about programs with lots of variables?
  [Figure: Processor (Control, Datapath), Memory, and I/O (Input, Output)]

Memory Organization
  Viewed as a large, single-dimension array, with an address.
  A memory address is an index into the array
  "Byte addressing" means that the index points to a byte of memory.
    0    8 bits of data
    1    8 bits of data
    2    8 bits of data
    3    8 bits of data
    4    8 bits of data
    5    8 bits of data
    6    8 bits of data
    ...

Memory Organization
  Bytes are nice, but most data items use larger "words"
  For MIPS, a word is 32 bits or 4 bytes.
     0    32 bits of data
     4    32 bits of data
     8    32 bits of data
    12    32 bits of data
    ...
  Registers hold 32 bits of data
  2^32 bytes with byte addresses from 0 to 2^32-1
  2^30 words with byte addresses 0, 4, 8, ... 2^32-4
  Words are aligned
    i.e., what are the least 2 significant bits of a word address?

Instructions
  Load and store instructions
  Example:
    C code:     A[12] = h + A[8];
    MIPS code:  lw  $t0, 32($s3)
                add $t0, $s2, $t0
                sw  $t0, 48($s3)
  Can refer to registers by name (e.g., $s2, $t2) instead of number
  Store word has destination last
  Remember arithmetic operands are registers, not memory!
    Can't write: add 48($s3), $s2, 32($s3)

Our First Example
  Can we figure out the code?
    void swap(int v[], int k)
    {
      int temp;
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
    }

    swap: sll $2, $5, 2      # $2 = k * 4 (byte offset of v[k])
          add $2, $4, $2     # $2 = address of v[k]
          lw  $15, 0($2)     # $15 = v[k]
          lw  $16, 4($2)     # $16 = v[k+1]
          sw  $16, 0($2)     # v[k] = old v[k+1]
          sw  $15, 4($2)     # v[k+1] = old v[k]
          jr  $31            # return to caller

So far we've learned:
  MIPS
    -- loading words but addressing bytes
    -- arithmetic on registers only
  Instruction            Meaning
  add $s1, $s2, $s3      $s1 = $s2 + $s3
  sub $s1, $s2, $s3      $s1 = $s2 - $s3
  lw  $s1, 100($s2)      $s1 = Memory[$s2+100]
  sw  $s1, 100($s2)      Memory[$s2+100] = $s1

Machine Language
  Instructions, like registers and words of data, are also 32 bits long
    Example: add $t1, $s1, $s2
    registers have numbers, $t1=9, $s1=17, $s2=18
  Instruction Format:
    000000  10001  10010  01001  00000  100000
    op      rs     rt     rd     shamt  funct
  Can you guess what the field names stand for?

Machine Language
  Consider the load-word and store-word instructions,
    What would the regularity principle have us do?
    New principle: Good design demands a compromise
  Introduce a new type of instruction format
    I-type for data transfer instructions
    other format was R-type for register
  Example: lw $t0, 32($s2)
    35    18    8     32
    op    rs    rt    16 bit number
  Where's the compromise?

Stored Program Concept
  Instructions are bits
  Programs are stored in memory -- to be read or written just like data
  [Figure: Processor and Memory; memory for data, programs, compilers, editors, etc.]
  Fetch & Execute Cycle
    Instructions are fetched and put into a special register
    Bits in the register "control" the subsequent actions
    Fetch the "next" instruction and continue

Control
  Decision making instructions
    alter the control flow,
    i.e., change the "next" instruction to be executed
  MIPS conditional branch instructions:
    bne $t0, $t1, Label
    beq $t0, $t1, Label
  Example: if (i==j) h = i + j;
            bne $s0, $s1, Label
            add $s3, $s0, $s1
    Label:  ....

Control
  MIPS unconditional branch instructions:
    j label
  Example:
    if (i!=j)              beq $s4, $s5, Lab1
      h=i+j;               add $s3, $s4, $s5
    else                   j Lab2
      h=i-j;         Lab1: sub $s3, $s4, $s5
                     Lab2: ...
  Can you build a simple for loop?

So far:
  Instruction            Meaning
  add $s1,$s2,$s3        $s1 = $s2 + $s3
  sub $s1,$s2,$s3        $s1 = $s2 - $s3
  lw $s1,100($s2)        $s1 = Memory[$s2+100]
  sw $s1,100($s2)        Memory[$s2+100] = $s1
  bne $s4,$s5,L          Next instr. is at Label if $s4 != $s5
  beq $s4,$s5,L          Next instr. is at Label if $s4 = $s5
  j Label                Next instr. is at Label
  Formats:
    R    op  rs  rt  rd  shamt  funct
    I    op  rs  rt  16 bit address
    J    op  26 bit address

Control Flow
  We have: beq, bne, what about Branch-if-less-than?
  New instruction:
    slt $t0, $s1, $s2      if $s1 < $s2 then $t0 = 1 else $t0 = 0
  Can use this instruction to build "blt $s1, $s2, Label" -- can now build general control structures
  Note that the assembler needs a register to do this, -- there are policy of use conventions for registers

Policy of Use Conventions
  Name         Register number    Usage
  $zero        0                  the constant value 0
  $v0-$v1      2-3                values for results and expression evaluation
  $a0-$a3      4-7                arguments
  $t0-$t7      8-15               temporaries
  $s0-$s7      16-23              saved
  $t8-$t9      24-25              more temporaries
  $gp          28                 global pointer
  $sp          29                 stack pointer
  $fp          30                 frame pointer
  $ra          31                 return address
  Register 1 ($at) reserved for assembler, 26-27 for operating system

Constants
  Small constants are used quite frequently (50% of operands)
    e.g., A = A + 5; B = B + 1; C = C - 18;
  Solutions? Why not?
    put 'typical constants' in memory and load them.
    create hard-wired registers (like $zero) for constants like one.
  MIPS Instructions:
    addi $29, $29, 4
    slti $8, $18, 10
    andi $29, $29, 6
    ori  $29, $29, 4
  Design Principle: Make the common case fast. Which format?

How about larger constants?
  We'd like to be able to load a 32 bit constant into a register
  Must use two instructions, new "load upper immediate" instruction
    lui $t0, 1010101010101010
          1010101010101010 0000000000000000    (lower half filled with zeros)
  Then must get the lower order bits right, i.e.,
    ori $t0, $t0, 1010101010101010
          1010101010101010 0000000000000000
      ori 0000000000000000 1010101010101010
          ---------------------------------
          1010101010101010 1010101010101010

Assembly Language vs. Machine Language
  Assembly provides convenient symbolic representation
    much easier than writing down numbers
    e.g., destination first
  Machine language is the underlying reality
    e.g., destination is no longer first
  Assembly can provide 'pseudoinstructions'
    e.g., "move $t0, $t1" exists only in Assembly
    would be implemented using "add $t0,$t1,$zero"
  When considering performance you should count real instructions

Other Issues
  Discussed in your assembly language programming lab:
    support for procedures
    linkers, loaders, memory layout
    stacks, frames, recursion
    manipulating strings and pointers
    interrupts and exceptions
    system calls and conventions
  Some of these we'll talk more about later
  We'll talk about compiler optimizations when we hit chapter 4.

Overview of MIPS
  simple instructions all 32 bits wide
  very structured, no unnecessary baggage
  only three instruction formats
    R    op  rs  rt  rd  shamt  funct
    I    op  rs  rt  16 bit address
    J    op  26 bit address
  rely on compiler to achieve performance -- what are the compiler's goals?
  help compiler where we can

Addresses in Branches and Jumps
  Instructions:
    bne $t4,$t5,Label      Next instruction is at Label if $t4 != $t5
    beq $t4,$t5,Label      Next instruction is at Label if $t4 = $t5
    j Label                Next instruction is at Label
  Formats:
    I    op  rs  rt  16 bit address
    J    op  26 bit address
  Addresses are not 32 bits -- How do we handle this with load and store instructions?
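
The three formats recapped above can be packed into 32-bit words with shifts and ORs. The sketch below is illustrative only: the function names are hypothetical, the field widths (6/5/5/5/5/6 for R, 6/5/5/16 for I, 6/26 for J) follow the format diagrams, and the register numbers come from the policy-of-use table ($t0 = 8, $t1 = 9, $s1 = 17, $s2 = 18).

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-type: op(6) rs(5) rt(5) 16-bit immediate or address."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, addr):
    """J-type: op(6) 26-bit address."""
    return (op << 26) | (addr & 0x3FFFFFF)


# add $t1, $s1, $s2  ->  op=0, rs=17, rt=18, rd=9, shamt=0, funct=32
add_word = encode_r(0, 17, 18, 9, 0, 32)

# lw $t0, 32($s2)    ->  op=35, rs=18, rt=8, offset=32
lw_word = encode_i(35, 18, 8, 32)

print(format(add_word, "032b"))
print(format(lw_word, "032b"))
```

Note how the destination register ends up in the middle of the R-type word (rd) but in the rt field of the I-type word, which is one place where machine language stops being "destination first".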
Addresses in Branches
  Instructions:
    bne $t4,$t5,Label      Next instruction is at Label if $t4 != $t5
    beq $t4,$t5,Label      Next instruction is at Label if $t4 = $t5
  Formats:
    I    op  rs  rt  16 bit address
  Could specify a register (like lw and sw) and add it to address
    use Instruction Address Register (PC = program counter)
    most branches are local (principle of locality)
  Jump instructions just use high order bits of PC
    address boundaries of 256 MB

To summarize:
  MIPS operands
  Name               Example                                       Comments
  32 registers       $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1,    Fast locations for data. In MIPS, data must be in registers to perform
                     $gp, $fp, $sp, $ra, $at                       arithmetic. MIPS register $zero always equals 0. Register $at is
                                                                   reserved for the assembler to handle large constants.
  2^30 memory words  Memory[0], Memory[4], ...,                    Accessed only by data transfer instructions. MIPS uses byte addresses,
                     Memory[4294967292]                            so sequential words differ by 4. Memory holds data structures, such as
                                                                   arrays, and spilled registers, such as those saved on procedure calls.

  MIPS assembly language
  Category            Instruction              Example              Meaning                                  Comments
  Arithmetic          add                      add $s1, $s2, $s3    $s1 = $s2 + $s3                          Three operands; data in registers
                      subtract                 sub $s1, $s2, $s3    $s1 = $s2 - $s3                          Three operands; data in registers
                      add immediate            addi $s1, $s2, 100   $s1 = $s2 + 100                          Used to add constants
  Data transfer       load word                lw $s1, 100($s2)     $s1 = Memory[$s2 + 100]                  Word from memory to register
                      store word               sw $s1, 100($s2)     Memory[$s2 + 100] = $s1                  Word from register to memory
                      load byte                lb $s1, 100($s2)     $s1 = Memory[$s2 + 100]                  Byte from memory to register
                      store byte               sb $s1, 100($s2)     Memory[$s2 + 100] = $s1                  Byte from register to memory
                      load upper immediate     lui $s1, 100         $s1 = 100 * 2^16                         Loads constant in upper 16 bits
  Conditional branch  branch on equal          beq $s1, $s2, 25     if ($s1 == $s2) go to PC + 4 + 100       Equal test; PC-relative branch
                      branch on not equal      bne $s1, $s2, 25     if ($s1 != $s2) go to PC + 4 + 100       Not equal test; PC-relative
                      set on less than         slt $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0     Compare less than; for beq, bne
                      set less than immediate  slti $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0     Compare less than constant
  Unconditional jump  jump                     j 2500               go to 10000                              Jump to target address
                      jump register            jr $ra               go to $ra                                For switch, procedure return
                      jump and link            jal 2500             $ra = PC + 4; go to 10000                For procedure call

[Figure: the five MIPS addressing modes]
  1. Immediate addressing: fields op, rs, rt, Immediate
  2. Register addressing: fields op, rs, rt, rd, ..., funct; the operand is in a register
  3. Base addressing: fields op, rs, rt, Address; Register + Address selects a byte, halfword, or word in memory
  4. PC-relative addressing: fields op, rs, rt, Address; PC + Address selects a word in memory
  5. Pseudodirect addressing: fields op, Address; Address combined with the PC selects a word in memory

Alternative Architectures
  Design alternative:
    provide more powerful operations
    goal is to reduce number of instructions executed
    danger is a slower cycle time and/or a higher CPI
  "The path toward operation complexity is thus fraught with peril. To avoid these problems, designers have moved toward simpler instructions"
  Let's look (briefly) at IA-32

IA-32
  1978: The Intel 8086 is announced (16 bit architecture)
  1980: The 8087 floating point coprocessor is added
  1982: The 80286 increases address space to 24 bits, +instructions
  1985: The 80386 extends to 32 bits, new addressing modes
  1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
  1997: 57 new "MMX" instructions are added, Pentium II
  1999: The Pentium III added another 70 instructions (SSE)
  2001: Another 144 instructions (SSE2)
  2003: AMD extends the architecture to increase address space to 64 bits, widens all registers to 64 bits and other changes (AMD64)
  2004: Intel capitulates and embraces AMD64 (calls it EM64T) and adds more media extensions
  "This history illustrates the impact of the "golden handcuffs" of compatibility"
  "adding new features as someone might add clothing to a packed bag"
  "an architecture that is difficult to explain and impossible to love"

IA-32 Overview
  Complexity:
    Instructions from 1 to 17 bytes long
    one operand must act as both a source and destination
    one operand can come from memory
    complex addressing modes, e.g., "base or scaled index with 8 or 32 bit displacement"
  Saving grace:
    the most frequently used instructions are not too difficult to build
    compilers avoid the portions of the architecture that are slow
  "what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective"

IA-32 Registers and Data Addressing
  Registers in the 32-bit subset that originated with 80386 (bits 31 to 0)
  Name      Use
  EAX       GPR 0
  ECX       GPR 1
  EDX       GPR 2
  EBX       GPR 3
  ESP       GPR 4
  EBP       GPR 5
  ESI       GPR 6
  EDI       GPR 7
  CS        Code segment pointer
  SS        Stack segment pointer (top of stack)
  DS        Data segment pointer 0
  ES        Data segment pointer 1
  FS        Data segment pointer 2
  GS        Data segment pointer 3
  EIP       Instruction pointer (PC)
  EFLAGS    Condition codes

IA-32 Register Restrictions
  Registers are not "general purpose" -- note the restrictions below
  [Figure: table of register restrictions not recoverable]

IA-32 Typical Instructions
  Four major types of integer instructions:
    Data movement including move, push, pop
    Arithmetic and logical (destination register or memory)
    Control flow (use of condition codes / flags)
    String instructions, including string move and string compare

IA-32 instruction Formats
  Typical formats: (notice the different lengths; field widths in bits)
    a. JE EIP + displacement:  JE (4), Condition (4), Displacement (8)
    b. CALL:                   CALL (8), Offset (32)
    c. MOV EBX, [EDI + 45]:    MOV (6), d (1), w (1), r/m Postbyte (8), Displacement (8)
    d. PUSH ESI:               PUSH (5), Reg (3)
    e. ADD EAX, #6765:         ADD (4), Reg (3), w (1), Immediate (32)
    f. TEST EDX, #42:          TEST (7), w (1), Postbyte (8), Immediate (32)

Summary
  Instruction complexity is only one variable
    lower instruction count vs. higher CPI / lower clock rate
  Design Principles:
    simplicity favors regularity
    smaller is faster
    good design demands compromise
    make the common case fast
  Instruction set architecture
    a very important abstraction indeed!
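
Looking back at the PC-relative and pseudodirect addressing modes covered earlier in this chapter, the target addresses can be computed as below. This is a sketch with hypothetical function names; it assumes the usual MIPS convention that the 16-bit and 26-bit fields count words rather than bytes and that a jump keeps the upper 4 bits of PC + 4, which is what produces the 256 MB boundaries mentioned on the branch slide.

```python
def branch_target(pc, imm16):
    """PC-relative: target = PC + 4 + sign_extend(imm16) * 4."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16  # sign-extend 16 bits
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def jump_target(pc, addr26):
    """Pseudodirect: upper 4 bits of PC + 4 joined with addr26 * 4."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x3FFFFFF) << 2)
```

With these definitions, a branch field of 25 lands at PC + 4 + 100 and a jump field of 2500 lands at byte address 10000 when the PC is in the lowest 256 MB region, matching the summary table.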
Chapter Three

Numbers
  Bits are just bits (no inherent meaning) -- conventions define relationship between bits and numbers
  Binary numbers (base 2)
    0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...
    decimal: 0...2^n-1
  Of course it gets more complicated:
    numbers are finite (overflow)
    fractions and real numbers
    negative numbers, e.g., no MIPS subi instruction; addi can add a negative number
  How do we represent negative numbers? i.e., which bit patterns will represent which numbers?

Possible Representations
  Sign Magnitude    One's Complement    Two's Complement
  000 = +0          000 = +0            000 = +0
  001 = +1          001 = +1            001 = +1
  010 = +2          010 = +2            010 = +2
  011 = +3          011 = +3            011 = +3
  100 = -0          100 = -3            100 = -4
  101 = -1          101 = -2            101 = -3
  110 = -2          110 = -1            110 = -2
  111 = -3          111 = -0            111 = -1
  Issues: balance, number of zeros, ease of operations
  Which one is best? Why?

MIPS
  32 bit signed numbers:
    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
    0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten    maxint
    1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten    minint
    1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0010two = -2,147,483,646ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
    1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = -1ten

Two's Complement Operations
  Negating a two's complement number: invert all bits and add 1
    remember: "negate" and "invert" are quite different!
  Converting n bit numbers into numbers with more than n bits:
    MIPS 16 bit immediate gets converted to 32 bits for arithmetic
    copy the most significant bit (the sign bit) into the other bits
      0010 -> 0000 0010
      1010 -> 1111 1010
    "sign extension" (lbu vs. lb)

Addition & Subtraction
  Just like in grade school (carry/borrow 1s)
      0111        0111        0110
    + 0110      - 0110      - 0101
  Two's complement operations easy
    subtraction using addition of negative numbers
      0111
    + 1010
  Overflow (result too large for finite computer word):
    e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001
      1000
    note that overflow term is somewhat misleading, it does not mean a carry "overflowed"

Detecting Overflow
  No overflow when adding a positive and a negative number
  No overflow when signs are the same for subtraction
  Overflow occurs when the value affects the sign:
    overflow when adding two positives yields a negative
    or, adding two negatives gives a positive
    or, subtract a negative from a positive and get a negative
    or, subtract a positive from a negative and get a positive
  Consider the operations A + B, and A - B
    Can overflow occur if B is 0 ?
    Can overflow occur if A is 0 ?

Effects of Overflow
  An exception (interrupt) occurs
    Control jumps to predefined address for exception
    Interrupted address is saved for possible resumption
  Details based on software system / language
    example: flight control vs. homework assignment
  Don't always want to detect overflow -- new MIPS instructions: addu, addiu, subu
    note: addiu still sign-extends!
    note: sltu, sltiu for unsigned comparisons

Multiplication
  More complicated than addition
    accomplished via shifting and addition
  More time and more area
  Let's look at 3 versions based on a gradeschool algorithm
        0010    (multiplicand)
    x   1011    (multiplier)
  Negative numbers: convert and multiply
    there are better techniques, we won't look at them

Multiplication: Implementation
  Datapath: Multiplicand register (64 bits, shift left), 64-bit ALU, Multiplier register (32 bits, shift right), Product register (64 bits, write), Control test
  Control:
    Start
    1. Test Multiplier0
       Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
       Multiplier0 = 0: continue with step 2
    2. Shift the Multiplicand register left 1 bit
    3. Shift the Multiplier register right 1 bit
    32nd repetition?
       No: < 32 repetitions, go back to step 1
       Yes: 32 repetitions, Done

Final Version
  Multiplier starts in right half of product
  Datapath: Multiplicand (32 bits), 32-bit ALU, Product (64 bits; shift right, write), Control test
  Control:
    Start
    1. Test Product0
       Product0 = 1: What goes here?
       Product0 = 0: continue with step 3
    3. Shift the Product register right 1 bit
    32nd repetition?
       No: < 32 repetitions, go back to step 1
       Yes: 32 repetitions, Done
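
One way to read the "final version" flowchart is as the loop below. It is a sketch, not the slide's own answer key: the function name is hypothetical, the operands are treated as unsigned 32-bit values, and the step filled in for "What goes here?" (add the multiplicand to the left half of the product) is an assumption carried over from the first version's step 1a.

```python
def multiply(multiplicand, multiplier):
    """Unsigned 32 x 32 -> 64-bit multiply using one shared Product register."""
    multiplicand &= 0xFFFFFFFF
    product = multiplier & 0xFFFFFFFF     # multiplier starts in right half of product
    for _ in range(32):                   # 32 repetitions
        if product & 1:                   # 1. Test Product0
            # Assumed step 1a: add multiplicand to the left half of the product.
            # A carry out of the 32-bit add is kept as an extra top bit.
            product += multiplicand << 32
        product >>= 1                     # 3. Shift the Product register right 1 bit
    return product
```

Sharing one register works because each right shift discards a multiplier bit that has already been tested, freeing that position for a product bit.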
Floating Point (a brief look)
  We need a way to represent
    numbers with fractions, e.g., 3.1416
    very small numbers, e.g., .000000001
    very large numbers, e.g., 3.15576 x 10^9
  Representation:
    sign, exponent, significand: (-1)^sign x significand x 2^exponent
    more bits for significand gives more accuracy
    more bits for exponent increases range
  IEEE 754 floating point standard:
    single precision: 8 bit exponent, 23 bit significand
    double precision: 11 bit exponent, 52 bit significand

IEEE 754 floating-point standard
  Leading "1" bit of significand is implicit
  Exponent is "biased" to make sorting easier
    all 0s is smallest exponent, all 1s is largest
    bias of 127 for single precision and 1023 for double precision
    summary: (-1)^sign x (1 + significand) x 2^(exponent - bias)
  Example:
    decimal: -.75 = -(1/2 + 1/4)
    binary: -.11 = -1.1 x 2^-1
    floating point: exponent = 126 = 01111110
    IEEE single precision: 10111111010000000000000000000000

Floating point addition
  Datapath: Sign, Exponent, and Fraction fields of both operands; Small ALU (exponent difference); Shift right; Big ALU; Increment or decrement; Shift left or right; Rounding hardware; Control
  Control:
    Start
    1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent
    2. Add the significands
    3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent
    Overflow or underflow?
       Yes: Exception
       No: continue with step 4
    4. Round the significand to the appropriate number of bits
    Still normalized?
       No: go back to step 3
       Yes: Done
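
The -.75 example from the IEEE 754 slide can be checked against Python's standard struct module, which packs a float into its IEEE single precision bit pattern. The helper names here are hypothetical, and decode only handles normalized numbers (it ignores zero, denormals, infinity, and NaN).

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE 754 single precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def fields(bits):
    """Split a 32-bit pattern into (sign, biased exponent, 23-bit fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def decode(bits):
    """(-1)^sign x (1 + significand) x 2^(exponent - bias), normalized numbers only."""
    sign, exponent, fraction = fields(bits)
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


bits = float_to_bits(-0.75)
print(format(bits, "032b"))   # sign | 8-bit exponent | 23-bit fraction
print(fields(bits))
print(decode(bits))
```

The biased exponent field reads 126 because the true exponent is -1 and the single precision bias is 127.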
Floating Point Complexities
  Operations are somewhat more complicated (see text)
  In addition to overflow we can have "underflow"
  Accuracy can be a big problem
    IEEE 754 keeps two extra bits, guard and round
    four rounding modes
    positive divided by zero yields "infinity"
    zero divided by zero yields "not a number"
    other complexities
  Implementing the standard can be tricky
  Not using the standard can be even worse
    see text for description of 80x86 and Pentium bug!

Chapter Three Summary
  Computer arithmetic is constrained by limited precision
  Bit patterns have no inherent meaning but standards do exist
    two's complement
    IEEE 754 floating point
  Computer instructions determine "meaning" of the bit patterns
  Performance and accuracy are important so there are many complexities in real machines
  Algorithm choice is important and may lead to hardware optimizations for both space and time (e.g., multiplication)
  You may want to look back (Section 3.10 is great reading!)

Chapter 4

Performance
  Measure, Report, and Summarize
  Make intelligent choices
  See through the marketing hype
  Key to understanding underlying organizational motivation
    Why is some hardware better than others for different programs?
    What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)
    How does the machine's instruction set affect performance?

Which of these airplanes has the best performance?
Airplane Passengers 101 470 132 146 Range (mi) Speed (mph) 630 4150 4000 8720 598 610 1350 544 Boeing 737-100 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 How much faster is the Concorde compared to the 747? How much bigger is the 747 than the Douglas DC-8? 2004 <a href="/keyword/morgan-kaufmann/" >morgan kaufmann</a> Publishers 63 Computer Performance: TIME, TIME, TIME Response Time (latency) -- How long does it take for my job to run? -- How long does it take to execute a job? -- How long must I wait for the database query? Throughput -- How many jobs can the machine run at once? -- What is the average execution rate? -- How much work is getting done? If we upgrade a machine with a new processor what do we increase? If we add a new machine to the lab what do we increase? 2004 <a href="/keyword/morgan-kaufmann/" >morgan kaufmann</a> Publishers 64 Execution Time Elapsed Time counts everything (disk and memory accesses, I/O , etc.) a useful number, but often not good for comparison purposes CPU time doesn't count I/O or time spent running other programs can be broken up into system time, and user time Our focus: user CPU time time spent executing the lines of code that are &quot;in&quot; our program 2004 <a href="/keyword/morgan-kaufmann/" >morgan kaufmann</a> Publishers 65 Book's Definition of Performance For some program running on machine X, PerformanceX = 1 / Execution timeX &quot;X is n times faster than Y&quot; PerformanceX / PerformanceY = n Problem: machine A runs a program in 20 seconds machine B runs the same program in 25 seconds 2004 <a href="/keyword/morgan-kaufmann/" >morgan kaufmann</a> Publishers 66 Clock Cycles Instead of reporting execution time in seconds, we often use cycles seconds = program Clock &quot;ticks&quot; indicate when to start activities (one abstraction): time cycle time = time between ticks = seconds per cycle clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec) A 4 Ghz. 
clock has a $\frac{1}{4 \times 10^{9}} \times 10^{12} = 250$ picoseconds (ps) cycle time

How to Improve Performance
  $\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$
So, to improve performance (everything else being equal) you can either (increase or decrease?)
  ________ the # of required cycles for a program, or
  ________ the clock cycle time or, said another way,
  ________ the clock rate.

How many cycles are required for a program?
Could assume that number of cycles equals number of instructions
[Figure: timeline with the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions each occupying one cycle]
This assumption is incorrect: different instructions take different amounts of time on different machines.
Why? Hint: remember that these are machine instructions, not lines of C code

Different numbers of cycles for different instructions
[Figure: timeline with instructions occupying different numbers of cycles]
Multiplication takes more time than addition
Floating point operations take longer than integer ones
Accessing memory takes more time than accessing registers
Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)

Example
Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program.
What clock rate should we tell the designer to target?
Don't Panic, can easily work this out from basic principles

Now that we understand cycles
A given program will require
  some number of instructions (machine instructions)
  some number of cycles
  some number of seconds
We have a vocabulary that relates these quantities:
  cycle time (seconds per cycle)
  clock rate (cycles per second)
  CPI (cycles per instruction)
    a floating point intensive application might have a higher CPI
  MIPS (millions of instructions per second)
    this would be higher for a program using simple instructions

Performance
Performance is determined by execution time
Do any of the other variables equal performance?
  # of cycles to execute program?
  # of instructions in program?
  # of cycles per second?
  average # of cycles per instruction?
  average # of instructions per second?
Common pitfall: thinking one of the variables is indicative of performance when it really isn't.

CPI Example
Suppose we have two implementations of the same instruction set architecture (ISA).
For some program,
  Machine A has a clock cycle time of 250 ps and a CPI of 2.0
  Machine B has a clock cycle time of 500 ps and a CPI of 1.2
What machine is faster for this program, and by how much?
If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

# of Instructions Example
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? How much? What is the CPI for each sequence?

MIPS example
Two different compilers are being tested for a 4 GHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.
The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
Which sequence will be faster according to MIPS?
Which sequence will be faster according to execution time?

Benchmarks
Performance best determined by running a real application
  Use programs typical of expected workload
  Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarks
  nice for architects and designers
  easy to standardize
  can be abused
SPEC (System Performance Evaluation Cooperative)
  companies have agreed on a set of real programs and inputs
  valuable indicator of performance (and compiler technology)
  can still be abused

Benchmark Games
An embarrassed Intel Corp. acknowledged Friday that a bug in a software program known as a compiler had led the company to overstate the speed of its microprocessor chips on an industry benchmark by 10 percent.
However, industry analysts said the coding error...was a sad commentary on a common industry practice of "cheating" on standardized performance tests...The error was pointed out to Intel two days ago by a competitor, Motorola ... came in a test known as SPECint92...Intel acknowledged that it had "optimized" its compiler to improve its test scores. The company had also said that it did not like the practice but felt compelled to make the optimizations because its competitors were doing the same thing...At the heart of Intel's problem is the practice of "tuning" compiler programs to recognize certain computing problems in the test and then substituting special handwritten pieces of code...
Saturday, January 6, 1996 New York Times

SPEC '89
Compiler "enhancements" and performance
[Figure: bar chart of SPEC performance ratio (0 to 800) for the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv, comparing "Compiler" with "Enhanced compiler"]

SPEC CPU2000

SPEC 2000
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
[Figure: CINT2000 and CFP2000 ratings (0 to 1400) versus clock rate in MHz (500 to 3500) for the Pentium III and Pentium 4]
[Figure: relative SPECINT2000 and SPECFP2000 performance (0.0 to 1.6) of the Pentium M @ 1.6/0.6 GHz, Pentium 4-M @ 2.4/1.2 GHz, and Pentium III-M @ 1.2/0.8 GHz in three power modes: always on/maximum clock, laptop mode/adaptive clock, minimum power/minimum clock]

Experiment
Phone a major computer retailer and tell them you are having trouble deciding between two different computers, specifically you are confused about the processors' strengths and weaknesses (e.g., Pentium 4 at 2 GHz vs. Celeron M at 1.4 GHz)
What kind of response are you likely to get?
What kind of response could you give a friend with the same question?

Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
How about making it 5 times faster?
Principle: Make the common case fast

Example
Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?
We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3.
One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?

Remember
Performance is specific to a particular program/s
  Total execution time is a consistent summary of performance
For a given architecture performance increases come from:
  increases in clock rate (without adverse CPI effects)
  improvements in processor organization that lower CPI
  compiler enhancements that lower CPI and/or instruction count
  Algorithm/Language choices that affect instruction count
Pitfall: expecting improvement in one aspect of a machine's performance to affect the total performance

Let's Build a Processor
Almost ready to move into chapter 5 and start building a processor
First, let's review Boolean Logic and build the ALU we'll need (Material from Appendix B)
[Figure: ALU block with 32-bit inputs a and b, an operation control input, and a 32-bit result]

Review: Boolean Algebra & Gates
Problem: Consider a logic function with three inputs: A, B, and C.
  Output D is true if at least one input is true
  Output E is true if exactly two inputs are true
  Output F is true only if all three inputs are true
Show the truth table for these three functions.
Show the Boolean equations for these three functions.
Show an implementation consisting of inverters, AND, and OR gates.
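
One way to check a hand-drawn truth table for the D, E, F problem is to enumerate it. The sketch below is only an illustration: the function names `outputs` and `outputs_gates` are made up here, and the sum-of-products form shown for E is one possible equation among several equivalent ones.

```python
from itertools import product

def outputs(a, b, c):
    """Return (D, E, F) straight from the problem statement; inputs are 0 or 1."""
    ones = a + b + c
    d = int(ones >= 1)   # at least one input is true
    e = int(ones == 2)   # exactly two inputs are true
    f = int(ones == 3)   # all three inputs are true
    return d, e, f

def outputs_gates(a, b, c):
    """The same functions using only NOT, AND, and OR (one possible form)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c          # inverters
    d = a | b | c
    e = (a & b & nc) | (a & nb & c) | (na & b & c)
    f = a & b & c
    return d, e, f

if __name__ == "__main__":
    print("A B C | D E F")
    for a, b, c in product((0, 1), repeat=3):
        d, e, f = outputs(a, b, c)
        print(f"{a} {b} {c} | {d} {e} {f}")
```

The second function corresponds directly to an inverter/AND/OR implementation: each `&` term is one AND gate and each `|` is an OR gate.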
An ALU (arithmetic logic unit)
Let's build an ALU to support the andi and ori instructions
  we'll just build a 1 bit ALU, and use 32 of them
[Figure: 1-bit ALU with inputs a and b, an operation input, and a result output, next to an empty truth table with columns op, a, b, res]
Possible Implementation (sum-of-products):

Review: The Multiplexor
Selects one of the inputs to be the output, based on a control input S
  note: we call this a 2-input mux even though it has 3 inputs!
[Figure: mux with data inputs A (0) and B (1), select S, and output C]
Let's build our ALU using a MUX:

Different Implementations
Not easy to decide the "best" way to build something
  Don't want too many inputs to a single gate
  Don't want to have to go through too many gates
  for our purposes, ease of comprehension is important
Let's look at a 1-bit ALU for addition:
[Figure: 1-bit adder with inputs a, b, CarryIn and outputs Sum, CarryOut]
  cout = a b + a cin + b cin
  sum = a xor b xor cin
How could we build a 1-bit ALU for add, and, and or?
How could we build a 32-bit ALU?

Building a 32 bit ALU
[Figure: 1-bit ALU with a 3-input mux (and, or, add) selected by Operation, and 32 copies ALU0 to ALU31 chained so that each CarryOut feeds the next CarryIn, taking a0/b0 ... a31/b31 and producing Result0 ... Result31]

What about subtraction (a - b)?
Two's complement approach: just negate b and add.
How do we negate?
A very clever solution:
[Figure: 1-bit ALU with a Binvert control that selects b or its complement ahead of the adder]

Adding a NOR function
Can also choose to invert a. How do we get "a NOR b" ?
[Figure: 1-bit ALU with Ainvert and Binvert controls ahead of the AND gate, OR gate, and adder]

Tailoring the ALU to the MIPS
Need to support the set-on-less-than instruction (slt)
  remember: slt is an arithmetic instruction
  produces a 1 if rs < rt and 0 otherwise
  use subtraction: (a-b) < 0 implies a < b
Need to support test for equality (beq $t5, $t6, Label)
  use subtraction: (a-b) = 0 implies a = b

Supporting slt
Can we figure out the idea?
[Figure: two 1-bit ALUs, each with a Less input as mux input 3; the one used for the most significant bit also produces Set and has overflow detection, the other is used for all other bits]

Supporting slt
[Figure: 32-bit ALU built from ALU0 to ALU31; the Less input is 0 for ALU1 to ALU31, and the Set output of ALU31 feeds the Less input of ALU0; ALU31 also produces Overflow]

Test for equality
Notice control lines:
  0000 = and
  0001 = or
  0010 = add
  0110 = subtract
  0111 = slt
  1100 = NOR
Note: zero is a 1 when the result is zero!
[Figure: 32-bit ALU with Ainvert, Bnegate, and Operation controls; Result0 to Result31 are combined to produce the Zero output; ALU31 produces Set and Overflow]
Conclusion
We can build an ALU to support the MIPS instruction set
  key idea: use multiplexor to select the output we want
  we can efficiently perform subtraction using two's complement
  we can replicate a 1-bit ALU to produce a 32-bit ALU
Important points about hardware
  all of the gates are always working
  the speed of a gate is affected by the number of inputs to the gate
  the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
Our primary focus: comprehension, however,
  Clever changes to organization can improve performance (similar to using better algorithms in software)
  We saw this in multiplication, let's look at addition now

Problem: ripple carry adder is slow
Is a 32-bit ALU as fast as a 1-bit ALU?
Is there more than one way to do addition?
  two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
  $c_1 = b_0 c_0 + a_0 c_0 + a_0 b_0$
  $c_2 = b_1 c_1 + a_1 c_1 + a_1 b_1$      $c_2 =$ ____
  $c_3 = b_2 c_2 + a_2 c_2 + a_2 b_2$      $c_3 =$ ____
  $c_4 = b_3 c_3 + a_3 c_3 + a_3 b_3$      $c_4 =$ ____
Not feasible! Why?

Carry-lookahead adder
An approach in-between our two extremes
Motivation:
  If we didn't know the value of carry-in, what could we do?
  When would we always generate a carry?   $g_i = a_i b_i$
  When would we propagate the carry?       $p_i = a_i + b_i$
Did we get rid of the ripple?
  $c_1 = g_0 + p_0 c_0$
  $c_2 = g_1 + p_1 c_1$      $c_2 =$ ____
  $c_3 = g_2 + p_2 c_2$      $c_3 =$ ____
  $c_4 = g_3 + p_3 c_3$      $c_4 =$ ____
Feasible! Why?
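
The blanks in the carry equations can be filled by substituting each carry into the next. The sketch below does that for a 4-bit adder and compares the result with the rippled carries; the function names are illustrative only and not taken from the text.

```python
def lookahead_carries(a, b, c0):
    """Carries c1..c4 of a 4-bit add, each written in terms of g, p and c0 only."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate:  g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: p_i = a_i OR b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def ripple_carries(a, b, c0):
    """Carries c1..c4 computed one after another, as in a ripple carry adder."""
    carries, c = [], c0
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (bi & c) | (ai & c) | (ai & bi)   # c_{i+1} = b_i c_i + a_i c_i + a_i b_i
        carries.append(c)
    return carries

if __name__ == "__main__":
    print(lookahead_carries(0b1011, 0b0110, 0))
    print(ripple_carries(0b1011, 0b0110, 0))
```

Every lookahead carry is a two-level expression in the g and p signals, so no carry waits on an earlier carry; the price is that the terms grow with each bit position.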
Use principle to build bigger adders
[Figure: 16-bit adder built from four 4-bit carry-lookahead ALUs (ALU0 to ALU3) producing Result0-3, Result4-7, Result8-11, and Result12-15; each ALU sends its P and G signals to a shared carry-lookahead unit, which supplies carries C1 to C4 (C4 is CarryOut)]
Can't build a 16 bit adder this way... (too big)
Could use ripple carry of 4-bit CLA adders
Better: use the CLA principle again!

ALU Summary
We can build an ALU to support MIPS addition
Our focus is on comprehension, not performance
Real processors use more sophisticated techniques for arithmetic
Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!

Chapter Five

The Processor: Datapath & Control
We're ready to look at an implementation of the MIPS
Simplified to contain only:
  memory-reference instructions: lw, sw
  arithmetic-logical instructions: add, sub, and, or, slt
  control flow instructions: beq, j
Generic Implementation:
  use the program counter (PC) to supply instruction address
  get the instruction from memory
  read registers
  use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers
Why? memory-reference? arithmetic? control flow?
More Implementation Details
Abstract / Simplified View:
[Figure: PC, instruction memory, register file, ALU, and data memory, with one adder adding 4 to the PC and a second adder]
Two types of functional units:
  elements that operate on data values (combinational)
  elements that contain state (sequential)

State Elements
Unclocked vs. Clocked
Clocks used in synchronous logic
  when should an element that contains state be updated?
[Figure: clock waveform showing the falling edge, the rising edge, and the clock period (cycle time)]

An unclocked state element
The set-reset latch
  output depends on present inputs and also on past inputs
[Figure: set-reset latch with inputs R and S and outputs Q and its complement]

Latches and Flip-flops
Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
Change of state (value) is based on the clock
Latches: whenever the inputs change, and the clock is asserted ("logically true", which could mean electrically low)
Flip-flop: state changes only on a clock edge (edge-triggered methodology)
A clocking methodology defines when signals can be read and written
  -- wouldn't want to read a signal at the same time it was being written

D-latch
Two inputs:
  the data value to be stored (D)
  the clock signal (C) indicating when to read & store D
Two outputs:
  the value of the internal state (Q) and its complement
[Figure: D latch with inputs D and C and outputs Q and its complement]

D flip-flop
Output changes only on the clock edge
[Figure: D flip-flop built from two D latches in series]

Our Implementation
An edge triggered methodology
Typical
execution:
  read contents of some state elements,
  send values through some combinational logic
  write results to one or more state elements
[Figure: state element 1, combinational logic, and state element 2 within one clock cycle]

Register File
Built using D flip-flops
[Figure: register file with inputs Read register number 1, Read register number 2, Write register, Write data, and Write; each read port is a mux that selects among Register 0, Register 1, ..., Register n-2, Register n-1 to produce Read data 1 or Read data 2]
Do you understand? What is the "Mux" above?

Abstraction
Make sure you understand the abstractions!
Sometimes it is easy to think you do, when you don't
[Figure: a 32-bit mux with inputs A and B, output C, and one Select line, drawn as 32 one-bit muxes for A31/B31/C31 down to A0/B0/C0]

Register File
Note: we still use the real clock to determine when to write
[Figure: write port; the register number drives an n-to-2^n decoder whose outputs, combined with Write, drive the C input of Register 0, Register 1, ..., Register n-2, Register n-1; Register data drives every D input]

Simple Implementation
Include the functional units we need for each instruction
[Figure: functional units. Instruction memory, program counter, and adder; data memory unit (Address, Write data, Read data, MemWrite, MemRead) and sign-extension unit (16 to 32 bits); registers (three 5-bit register numbers, Write data, Read data 1 and 2, RegWrite) and ALU (4-bit ALU operation, Zero, ALU result)]
Why do we need this stuff?
Building the Datapath
Use multiplexors to stitch them together
[Figure: single-cycle datapath with PC, instruction memory, registers, sign extend, shift left 2, ALU, two adders, and data memory, joined by muxes controlled by PCSrc, ALUSrc, and MemtoReg; other control signals are RegWrite, ALU operation, MemWrite, and MemRead]

Control
Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexor inputs)
Information comes from the 32 bits of the instruction
Example: add $8, $17, $18
Instruction Format:
  000000   10001   10010   01000   00000   100000
  op       rs      rt      rd      shamt   funct
ALU's operation based on instruction type and function code

Control
e.g., what should the ALU do with this instruction
Example: lw $1, 100($2)
  35   2    1    100
  op   rs   rt   16 bit offset
ALU control input
  0000   AND
  0001   OR
  0010   add
  0110   subtract
  0111   set-on-less-than
  1100   NOR
Why is the code for subtract 0110 and not 0011?
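
One way to think about the question is to read the 4-bit ALU control value as three fields: Ainvert, Bnegate, and a 2-bit Operation. That bit order is an assumption here, though it is consistent with the six codes listed (subtract is "add" with Bnegate set, NOR is "AND" with both inputs inverted). The sketch below is a behavioral model only, not the gate-level design, and its slt ignores overflow for simplicity.

```python
MASK = 0xFFFFFFFF  # model a 32-bit datapath

def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU control value and 32-bit inputs."""
    ainvert = (control >> 3) & 1      # assumed bit 3
    bnegate = (control >> 2) & 1      # assumed bit 2
    operation = control & 0b11        # assumed bits 1..0: mux select
    if ainvert:
        a = ~a & MASK
    if bnegate:
        b = ~b & MASK
    # Bnegate also supplies CarryIn to bit 0, so inverted b plus 1 gives -b.
    total = (a + b + bnegate) & MASK
    if operation == 0b00:
        result = a & b                # AND (NOR when both inputs are inverted)
    elif operation == 0b01:
        result = a | b                # OR
    elif operation == 0b10:
        result = total                # add, or subtract when Bnegate = 1
    else:
        result = (total >> 31) & 1    # slt: sign bit of a - b (overflow ignored)
    return result, int(result == 0)

if __name__ == "__main__":
    for name, code in [("AND", 0b0000), ("OR", 0b0001), ("add", 0b0010),
                       ("subtract", 0b0110), ("slt", 0b0111), ("NOR", 0b1100)]:
        print(name, alu(code, 7, 5))
```

Under this reading, 0110 is simply Bnegate = 1 with Operation = 10 (add), whereas 0011 would select the Less input of the mux without negating b.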
Control
Must describe hardware to compute 4-bit ALU control input
  given instruction type (ALUOp, computed from instruction type)
    00 = lw, sw
    01 = beq
    10 = arithmetic
  function code for arithmetic
Describe it using a truth table (can turn into gates):

[Figure: single-cycle datapath with the Control unit; Instruction [31-26] feeds Control, which produces RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction [25-21], [20-16], and [15-11] select registers, Instruction [15-0] is sign extended, and Instruction [5-0] feeds ALU control]

Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format      1        0        0          1          0         0          0        1        0
lw            0        1        1          1          1         0          0        0        0
sw            X        1        X          0          0         1          0        0        0
beq           X        0        X          0          0         0          1        0        1

Control
Simple combinational logic (truth tables)
[Figure: ALU control block with inputs ALUOp1, ALUOp0, and F5-F0 (function field bits F3, F2, F1, F0 are used) and outputs Operation2, Operation1, Operation0; main control logic with inputs Op5 to Op0, decoding R-format, lw, sw, and beq, and outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]

Our Simple Control Structure
All of the logic is combinational
We wait for everything to settle down, and the right thing to be done
  ALU might not produce "right answer" right away
  we use write signals along with clock to determine when to write
Cycle time determined by length of the longest path
[Figure: state element 1, combinational logic, and state element 2 within one clock cycle]
We are ignoring some details like setup and hold times

Single Cycle Implementation
Calculate cycle time assuming
negligible delays except:
  memory (200ps), ALU and adders (100ps), register file access (50ps)
[Figure: single-cycle datapath with PC, instruction memory, registers, sign extend, shift left 2, ALU, two adders, and data memory; control signals PCSrc, ALUSrc, ALU operation, MemWrite, MemRead, MemtoReg, and RegWrite]

Where we are headed
Single Cycle Problems:
  what if we had a more complicated instruction like floating point?
  wasteful of area
One Solution:
  use a "smaller" cycle time
  have different instructions take different numbers of cycles
  a "multicycle" datapath:
[Figure: multicycle datapath with PC, a single memory for instructions and data, instruction register, memory data register, registers, A and B registers, ALU, and ALUOut]

Multicycle Approach
We will be reusing functional units
  ALU used to compute address and to increment PC
  Memory used for instruction and data
Our control signals will not be determined directly by instruction
  e.g., what should the ALU do for a "subtract" instruction?
We'll use a finite state machine for control

Multicycle Approach
Break up the instructions into steps, each step takes a cycle
  balance the amount of work to be done
  restrict each cycle to use only one major functional unit
At the end of a cycle
  store values for use in later cycles (easiest thing to do)
  introduce additional "internal" registers
[Figure: multicycle datapath with PC, memory, instruction register (Instruction [25-21], [20-16], [15-11], [15-0]), memory data register, registers, A, B, ALU, ALUOut, sign extend, shift left 2, and muxes on the memory address, write register, write data, and both ALU inputs]

Instructions from ISA perspective
Consider each instruction from perspective of ISA.
Example:
  The add instruction changes a register.
  Register specified by bits 15:11 of instruction.
  Instruction specified by the PC.
  New value is the sum ("op") of two registers.
  Registers specified by bits 25:21 and 20:16 of the instruction
    Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
  In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming)

Breaking down an instruction
ISA definition of arithmetic:
  Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
Could break down to:
  IR <= Memory[PC]
  A <= Reg[IR[25:21]]
  B <= Reg[IR[20:16]]
  ALUOut <= A op B
  Reg[IR[15:11]] <= ALUOut
We forgot an important part of the definition of arithmetic!
  PC <= PC + 4

Idea behind multicycle approach
We define each instruction from the ISA perspective (do this!)
Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)
Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)
Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution)
Result: Our book's multicycle Implementation!

Five Execution Steps
Instruction Fetch
Instruction Decode and Register Fetch
Execution, Memory Address Computation, or Branch Completion
Memory Access or R-type instruction completion
Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Step 1: Instruction Fetch
Use PC to get instruction and put it in the Instruction Register.
Increment the PC by 4 and put the result back in the PC.
Can be described succinctly using RTL "Register-Transfer Language"
  IR <= Memory[PC];
  PC <= PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
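
The two RTL transfers of step 1 happen in the same clock cycle, so both right-hand sides use the values held before the clock edge. A minimal sketch of that behavior, assuming a dictionary-based model (the `state` and `memory` dictionaries and the function name `fetch` are illustrative, not from the text):

```python
def fetch(state, memory):
    """Model one clock cycle of step 1: IR <= Memory[PC]; PC <= PC + 4."""
    old_pc = state["PC"]                 # value held before the clock edge
    new_state = dict(state)
    new_state["IR"] = memory[old_pc]     # IR <= Memory[PC]
    new_state["PC"] = old_pc + 4         # PC <= PC + 4
    return new_state

if __name__ == "__main__":
    # Word at address 0 encodes add $8, $17, $18:
    # 000000 10001 10010 01000 00000 100000
    memory = {0: 0x02324020}
    state = fetch({"PC": 0, "IR": None}, memory)
    print(hex(state["IR"]), state["PC"])
```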
Step 2: Instruction Decode and Register Fetch
Read registers rs and rt in case we need them
Compute the branch address in case the instruction is a branch
RTL:
  A <= Reg[IR[25:21]];
  B <= Reg[IR[20:16]];
  ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)

Step 3 (instruction dependent)
ALU is performing one of three functions, based on instruction type
Memory Reference:
  ALUOut <= A + sign-extend(IR[15:0]);
R-type:
  ALUOut <= A op B;
Branch:
  if (A==B) PC <= ALUOut;

Step 4 (R-type or memory-access)
Loads and stores access memory
  MDR <= Memory[ALUOut];
  or
  Memory[ALUOut] <= B;
R-type instructions finish
  Reg[IR[15:11]] <= ALUOut;
The write actually takes place at the end of the cycle on the edge

Write-back step
  Reg[IR[20:16]] <= MDR;
Which instruction needs this?

Summary:

Simple Questions
How many cycles will it take to execute this code?
         lw $t2, 0($t3)
         lw $t3, 4($t3)
         beq $t2, $t3, Label    #assume not
         add $t5, $t2, $t3
         sw $t5, 8($t3)
  Label: ...
What is going on during the 8th cycle of execution?
In what cycle does the actual addition of $t2 and $t3 take place?
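
These questions can be worked through by laying the instructions out cycle by cycle. The sketch below assumes the step counts implied by the five execution steps above (loads use all five steps, stores and R-type instructions finish in step 4, branches finish in step 3); the table `CYCLES` and the function `schedule` are names made up for this illustration.

```python
# Steps used by each instruction class in the multicycle design described above.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

STEP_NAMES = [
    "instruction fetch",
    "instruction decode / register fetch",
    "execution, address computation, or branch completion",
    "memory access or R-type completion",
    "write-back",
]

# The code from the slide; the beq is assumed not taken.
PROGRAM = ["lw", "lw", "beq", "add", "sw"]

def schedule(instrs):
    """Return a list of (cycle, instruction index, mnemonic, step name)."""
    timeline, cycle = [], 0
    for idx, op in enumerate(instrs):
        for step in range(CYCLES[op]):
            cycle += 1
            timeline.append((cycle, idx, op, STEP_NAMES[step]))
    return timeline

if __name__ == "__main__":
    for entry in schedule(PROGRAM):
        print(entry)
```

Printing the timeline lists one line per cycle, so the total cycle count, the activity in cycle 8, and the cycle in which the add reaches its execution step can all be read off directly.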
[Figure: complete multicycle datapath with its control unit; Op [5-0] (Instruction [31-26]) feeds Control, whose outputs are PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst; the jump address [31-0] is formed from Instruction [25-0] shifted left 2 and PC [31-28]; Instruction [5-0] feeds ALU control]

Review: finite state machines
Finite state machines:
  a set of states and
  next state function (determined by current state and the input)
  output function (determined by current state and possibly input)
[Figure: current state and inputs feed the next-state function, which produces the next state on each clock; the output function produces the outputs]
We'll use a Moore machine (output based only on current state)

Review: finite state machines
Example: B.37
A friend would like you to build an "electronic eye" for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light "moves" from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye's movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs.
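
One possible state machine for the electronic eye can be written down as two tables, a next-state function and a Moore output function. This is a sketch under assumptions: the state names and the choice of LEFT as the starting state are invented here, and other equivalent encodings exist.

```python
# Four states are needed because the middle light is visited in both directions.
NEXT_STATE = {
    "LEFT": "MID_RIGHTWARD",
    "MID_RIGHTWARD": "RIGHT",
    "RIGHT": "MID_LEFTWARD",
    "MID_LEFTWARD": "LEFT",
}

# Moore machine: outputs (Left, Middle, Right) depend only on the current state.
OUTPUT = {
    "LEFT": (1, 0, 0),
    "MID_RIGHTWARD": (0, 1, 0),
    "RIGHT": (0, 0, 1),
    "MID_LEFTWARD": (0, 1, 0),
}

def run(ticks, state="LEFT"):
    """Return the (Left, Middle, Right) outputs for `ticks` clock cycles."""
    trace = []
    for _ in range(ticks):
        trace.append(OUTPUT[state])
        state = NEXT_STATE[state]   # no inputs: the clock alone advances the state
    return trace

if __name__ == "__main__":
    for lights in run(8):
        print(lights)
```

With four states, two state bits are enough; the graphical form is a single cycle of four circles, each labeled with the one output it asserts.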
Implementing the Control
- Value of control signals is dependent upon:
  - what instruction is being executed
  - which step is being performed
- Use the information we've accumulated to specify a finite state machine
  - specify the finite state machine graphically, or
  - use microprogramming
- Implementation can be derived from specification

Graphical Specification of FSM
- Note: don't care if not mentioned; asserted if name only; otherwise exact value
    State 0 (start), Instruction fetch: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
    State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    State 3, Memory access (load): MemRead, IorD = 1
    State 4, Memory read completion step: RegDst = 0, RegWrite, MemtoReg = 1
    State 5, Memory access (store): MemWrite, IorD = 1
    State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
    State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
    State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
    State 9, Jump completion: PCWrite, PCSource = 10
- How many state bits will we need?

Finite State Machine for Control
- Implementation:
  [Figure: control logic block. Inputs: Op5-Op0 from the instruction register opcode field and S3-S0 from the state register. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3-NS0.]

PLA Implementation
- If I picked a horizontal or vertical line could you explain it?
  [Figure: PLA. Inputs: Op5-Op0 and S3-S0. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0.]

ROM Implementation
- ROM = "Read Only Memory"
  - values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
  - if the address is m bits, we can address 2^m entries in the ROM
  - our outputs are the bits of data that the address points to

      m address bits    n output bits
      0 0 0             0 0 1 1
      0 0 1             1 1 0 0
      0 1 0             1 1 0 0
      0 1 1             1 0 0 0
      1 0 0             0 0 0 0
      1 0 1             0 0 0 1
      1 1 0             0 1 1 0
      1 1 1             0 1 1 1

  - m is the "height", and n is the "width"

ROM Implementation
- How many inputs are there?
  6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there?
  16 datapath-control outputs, 4 state bits = 20 outputs
- ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
- Rather wasteful, since for lots of the entries, the outputs are the same
  - i.e., opcode is often ignored

ROM vs PLA
- Break up the table into two parts
  - 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
  - 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
  - Total: 4.3K bits of ROM
- PLA is much smaller
  - can share product terms
  - only need entries that produce an active output
  - can take into account don't cares
- Size is (#inputs x #product-terms) + (#outputs x #product-terms)
  For this example = (10 x 17) + (20 x 17) = 510 PLA cells
- PLA cells usually about the size of a ROM cell (slightly bigger)

Another Implementation Style
- Complex instructions: the "next state" is often current state + 1
  [Figure: control unit. A PLA or ROM produces the outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and AddrCtl. A state register, an adder (state + 1), and address select logic fed by the instruction register opcode field Op[5-0] choose the next state.]

Details

    Dispatch ROM 1
    Op       Opcode name   Value
    000000   R-format      0110
    000010   jmp           1001
    000100   beq           1000
    100011   lw            0010
    101011   sw            0010

    Dispatch ROM 2
    Op       Opcode name   Value
    100011   lw            0011
    101011   sw            0101

  [Figure: address select logic. A mux controlled by AddrCtl selects among the constant 0 (input 0), dispatch ROM 1 (input 1), dispatch ROM 2 (input 2), and the adder output state + 1 (input 3); the ROMs are addressed by the instruction register opcode field.]

    State number   Address-control action       Value of AddrCtl
    0              Use incremented state        3
    1              Use dispatch ROM 1           1
    2              Use dispatch ROM 2           2
    3              Use incremented state        3
    4              Replace state number by 0    0
    5              Replace state number by 0    0
    6              Use incremented state        3
    7              Replace state number by 0    0
    8              Replace state number by 0    0
    9              Replace state number by 0    0

Microprogramming
  [Figure: control unit built from a microcode memory whose outputs (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) go to the datapath and whose AddrCtl output goes to the address select logic; a microprogram counter, an adder (+1), and the instruction register opcode field supply the next address.]
- What are the "microinstructions"?

Microprogramming
- A specification methodology
  - appropriate if hundreds of opcodes, modes, cycles, etc.
  - signals specified symbolically using microinstructions

    (- means the field is left blank)
    Label      ALU control   SRC1   SRC2      Register control   Memory      PCWrite control   Sequencing
    Fetch      Add           PC     4         -                  Read PC     ALU               Seq
    -          Add           PC     Extshft   Read               -           -                 Dispatch 1
    Mem1       Add           A      Extend    -                  -           -                 Dispatch 2
    LW2        -             -      -         -                  Read ALU    -                 Seq
    -          -             -      -         Write MDR          -           -                 Fetch
    SW2        -             -      -         -                  Write ALU   -                 Fetch
    Rformat1   Func code     A      B         -                  -           -                 Seq
    -          -             -      -         Write ALU          -           -                 Fetch
    BEQ1       Subt          A      B         -                  -           ALUOut-cond       Fetch
    JUMP1      -             -      -         -                  -           Jump address      Fetch

- Will two implementations of the same architecture have the same microcode?
- What would a microassembler do?
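The address select logic described under "Details" can be mimicked in a few lines. The sketch below copies the two dispatch ROMs and the AddrCtl values from the tables above; the function names next_state and trace are hypothetical, and the code models only the sequencing, not the datapath control outputs.

```python
# Dispatch ROM contents, keyed by the 6-bit opcode (values from the slide).
DISPATCH_ROM_1 = {
    0b000000: 6,  # R-format
    0b000010: 9,  # jmp
    0b000100: 8,  # beq
    0b100011: 2,  # lw
    0b101011: 2,  # sw
}
DISPATCH_ROM_2 = {
    0b100011: 3,  # lw
    0b101011: 5,  # sw
}

# AddrCtl value for states 0..9 (index = state number).
ADDR_CTL = [3, 1, 2, 3, 0, 0, 3, 0, 0, 0]


def next_state(state, opcode):
    """Select the next state the way the AddrCtl mux does."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                       # start a new instruction
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]  # dispatch on opcode after decode
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]  # dispatch lw vs. sw
    return state + 1                   # ctl == 3: use incremented state


def trace(opcode):
    """List the states one instruction visits, starting from state 0."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)


lw_states = trace(0b100011)
sw_states = trace(0b101011)
rtype_states = trace(0b000000)
beq_states = trace(0b000100)
jmp_states = trace(0b000010)
print(lw_states, sw_states, rtype_states, beq_states, jmp_states)
```

The length of each trace is the number of cycles the instruction takes, so lw visits five states while beq and jmp visit three.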
Microinstruction format
(each entry lists value: signals active. Comment)

  ALU control
    Add: ALUOp = 00. Cause the ALU to add.
    Subt: ALUOp = 01. Cause the ALU to subtract; this implements the compare for branches.
    Func code: ALUOp = 10. Use the instruction's function code to determine ALU control.
  SRC1
    PC: ALUSrcA = 0. Use the PC as the first ALU input.
    A: ALUSrcA = 1. Register A is the first ALU input.
  SRC2
    B: ALUSrcB = 00. Register B is the second ALU input.
    4: ALUSrcB = 01. Use 4 as the second ALU input.
    Extend: ALUSrcB = 10. Use output of the sign extension unit as the second ALU input.
    Extshft: ALUSrcB = 11. Use the output of the shift-by-two unit as the second ALU input.
  Register control
    Read: Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
    Write ALU: RegWrite, RegDst = 1, MemtoReg = 0. Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
    Write MDR: RegWrite, RegDst = 0, MemtoReg = 1. Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
  Memory
    Read PC: MemRead, IorD = 0. Read memory using the PC as address; write result into IR (and the MDR).
    Read ALU: MemRead, IorD = 1. Read memory using the ALUOut as address; write result into MDR.
    Write ALU: MemWrite, IorD = 1. Write memory using the ALUOut as address, contents of B as the data.
  PC write control
    ALU: PCSource = 00, PCWrite. Write the output of the ALU into the PC.
    ALUOut-cond: PCSource = 01, PCWriteCond. If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
    jump address: PCSource = 10, PCWrite. Write the PC with the jump address from the instruction.
  Sequencing
    Seq: AddrCtl = 11. Choose the next microinstruction sequentially.
    Fetch: AddrCtl = 00. Go to the first microinstruction to begin a new instruction.
    Dispatch 1: AddrCtl = 01. Dispatch using the ROM 1.
    Dispatch 2: AddrCtl = 10. Dispatch using the ROM 2.

Maximally vs. Minimally Encoded
- No encoding:
  - 1 bit for each datapath operation
  - faster, requires more memory (logic)
  - used for VAX 780 -- an astonishing 400K of memory!
- Lots of encoding:
  - send the microinstructions through logic to get control signals
  - uses less memory, slower
- Historical context of CISC:
  - Too much logic to put on a single chip with everything else
  - Use a ROM (or even RAM) to hold the microcode
  - It's easy to add new instructions

Microcode: Trade-offs
- Distinction between specification and implementation is sometimes blurred
- Specification advantages:
  - Easy to design and write
  - Design architecture and microcode in parallel
- Implementation (off-chip ROM) advantages:
  - Easy to change since values are in memory
  - Can emulate other architectures
  - Can make use of internal registers
- Implementation disadvantages, SLOWER now that:
  - Control is implemented on same chip as processor
  - ROM is no longer faster than RAM
  - No need to go back and make changes

Historical Perspective
- In the '60s and '70s microprogramming was very important for implementing machines
- This led to more sophisticated ISAs and the VAX
- In the '80s RISC processors based on pipelining became popular
- Pipelining the microinstructions is also possible!
- Implementations of IA-32 architecture processors since 486 use:
  - "hardwired control" for simpler instructions (few cycles, FSM control implemented using PLA or random logic)
  - "microcoded control" for more complex instructions (large numbers of cycles, central control store)
- The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store

Pentium 4
- Pipelining is important (last IA-32 without it was 80386 in 1985)
  [Figure: Pentium 4 layout. Labeled regions: control (four areas), I/O interface, instruction cache, data cache, enhanced floating point and multimedia, integer datapath, secondary cache and memory interface, advanced pipelining and hyperthreading support; annotations point to Chapter 6 and Chapter 7.]
- Pipelining is used for the simple instructions favored by compilers
  "Simply put, a high performance implementation needs to ensure that the simple instructions execute quickly, and that the burden of the complexities of the instruction set penalize the complex, less frequently used, instructions"

Pentium 4
- Somewhere in all that "control" we must handle complex instructions
  [Figure: the same Pentium 4 layout, with the control regions highlighted.]
- Processor executes simple microinstructions, 70 bits wide (hardwired)
- 120 control lines for integer datapath (400 for floating point)
- If an instruction requires more than 4 microinstructions to implement, control from microcode ROM (8000 microinstructions)
- It's complicated!

Chapter 5 Summary
- If we understand the instructions... We can build a simple processor!
- If instructions take different amounts of time, multi-cycle is better
- Datapath implemented using:
  - Combinational logic for arithmetic
  - State holding elements to remember bits
- Control implemented using:
  - Combinational logic for single-cycle implementation
  - Finite state machine for multi-cycle implementation

Chapter Six

Pipelining
- Improve performance by increasing instruction throughput
  [Figure: execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each passing through instruction fetch, Reg, ALU, data access, Reg. Without pipelining a new instruction starts every 800 ps. With pipelining a new instruction starts every 200 ps.]
  Note: timing assumptions changed for this example
- Ideal speedup is number of stages in the pipeline. Do we achieve this?

Pipelining
- What makes it easy?
  - all instructions are the same length
  - just a few instruction formats
  - memory operands appear only in loads and stores
- What makes it hard?
  - structural hazards: suppose we had only one memory
  - control hazards: need to worry about branch instructions
  - data hazards: an instruction depends on a previous instruction
- We'll build a simple pipeline and look at these issues
- We'll talk about modern processors and what really makes it hard:
  - exception handling
  - trying to improve performance with out-of-order execution, etc.
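The question "Do we achieve this?" can be explored numerically with the 800 ps and 200 ps figures from the Pipelining timing example above. The sketch assumes a five-stage pipeline with no hazards or stalls; the constant and function names are made up for illustration.

```python
SINGLE_CYCLE_PS = 800    # time per instruction without pipelining (from the figure)
PIPELINE_CLOCK_PS = 200  # time per pipeline stage (from the figure)
STAGES = 5               # assumed pipeline depth


def unpipelined_time(n):
    """Total time in ps for n instructions executed one after another."""
    return n * SINGLE_CYCLE_PS


def pipelined_time(n):
    """Total time in ps: fill the pipeline once, then finish one instruction per clock."""
    return (STAGES + n - 1) * PIPELINE_CLOCK_PS


def speedup(n):
    return unpipelined_time(n) / pipelined_time(n)


print(unpipelined_time(3), pipelined_time(3), round(speedup(3), 2))
print(round(speedup(1_000_000), 3))
```

For the three loads in the figure this gives 2400 ps against 1400 ps, a speedup of about 1.7. For long instruction streams the ratio approaches 800/200 = 4 rather than the stage count of 5, because with these timings the unpipelined instruction takes only four pipeline clocks.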
Basic Idea
  [Figure: single-cycle datapath divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back.]
- What do we need to add to actually split the datapath into stages?

Pipelined Datapath
  [Figure: datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB placed between the stages.]
- Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Corrected Datapath
  [Figure: corrected pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]

Graphically Representing Pipelines
  [Figure: multiple-clock-cycle diagram, CC 1 to CC 7. Each of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) passes through IM, Reg, ALU, DM, Reg.]
- Can help with answering questions like:
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
- use this representation to help understand datapaths

Pipeline Control
  [Figure: pipelined datapath annotated with the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg, and with the instruction fields Instruction [15-0], Instruction [20-16], and Instruction [15-11].]

Pipeline control
- We have 5 stages. What needs to be controlled in each stage?
  - Instruction Fetch and PC Increment
  - Instruction Decode / Register Fetch
  - Execution
  - Memory Stage
  - Write Back
- How would control be handled in an automobile plant?
  - a fancy control center telling everyone what to do?
  - should we use a finite state machine?

Pipeline Control
- Pass control signals along just like the data

                  Execution/address calculation     Memory access                Write-back
                  stage control lines               stage control lines          stage control lines
    Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite    RegWrite  MemtoReg
    R-format      1       1       0       0         0       0        0           1         0
    lw            0       0       0       1         0       1        0           1         1
    sw            X       0       0       1         0       0        1           0         X
    beq           X       0       1       0         1       0        0           0         X

  [Figure: the control unit produces the WB, M, and EX control fields from the instruction; they are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers.]

Datapath with Control
  [Figure: pipelined datapath with the control unit added. The WB, M, and EX fields are stored in ID/EX, EX/MEM, and MEM/WB; labeled signals include PCSrc, ALUSrc, Branch, ALUOp, MemRead, and RegDst, with instruction fields Instruction [15-0], Instruction [20-16], and Instruction [15-11].]

Dependencies
- Problem with starting next instruction before first is finished
  - dependencies that "go backward in time" are data hazards
  [Figure: pipeline diagram, CC 1 to CC 9, for
      sub $2, $1, $3
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
   Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9.]

Software Solution
- Have compiler guarantee no hazards
- Where do we insert the "nops"?
      sub $2, $1, $3
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
- Problem: this really slows us down!

Forwarding
- Use temporary results, don't wait for them to be written
  - register file forwarding to handle read/write to same register
  - ALU forwarding
  [Figure: pipeline diagram, CC 1 to CC 9, for the same sequence (sub, and, or, add, sw).
   Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9.
   Value of EX/MEM: -20 in CC 4, X in all other cycles.
   Value of MEM/WB: -20 in CC 5, X in all other cycles.]
- what if this $2 was $13?

Forwarding
- The main idea (some details not shown)
  [Figure: forwarding unit. It takes the Rs and Rt register numbers from ID/EX together with EX/MEM.RegisterRd and MEM/WB.RegisterRd, and drives ForwardA and ForwardB, the selects of the muxes on the two ALU inputs.]

Can't always forward
- Load word can still cause a hazard:
  - an instruction tries to read a register following a load instruction that writes to the same register.
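The forwarding decision from the preceding slides can be sketched as a small function. The comparison rules used here (forward from EX/MEM first, then from MEM/WB, and never for register $0) follow the usual textbook formulation and are an assumption as far as these slides go; the function name, the RegWrite arguments, and the string return values are invented for the example. The sketch does not cover the load case just described, where the loaded data is not yet available when the next instruction needs it.

```python
def forward_source(src_reg, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Pick where one ALU operand should come from.

    src_reg is the Rs or Rt number of the instruction in EX; the other
    arguments describe the two older instructions held in the EX/MEM and
    MEM/WB pipeline registers.
    """
    # Most recent result wins: the instruction one ahead (now in MEM).
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    # Otherwise the instruction two ahead (now in WB).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    # No hazard: use the value read from the register file.
    return "REGFILE"


# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX.
and_rs = forward_source(2, True, 2, False, 0)
and_rt = forward_source(5, True, 2, False, 0)

# One cycle later: and (rd = 12) is in MEM, sub (rd = 2) is in WB,
# and or $13, $6, $2 is in EX.
or_rs = forward_source(6, True, 12, True, 2)
or_rt = forward_source(2, True, 12, True, 2)

print(and_rs, and_rt, or_rs, or_rt)
```

In this trace the and instruction takes $2 from EX/MEM, and the or instruction takes it from MEM/WB, matching the cycles in which -20 appears in those registers in the forwarding figure.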
  [Figure: pipeline diagram, CC 1 to CC 9, for
      lw  $2, 20($1)
      and $4, $2, $5
      or  $8, $2, $6
      add $9, $4, $2
      slt $1, $6, $7 ]
- Thus, we need a hazard detection unit to "stall" the instruction that follows the load

Stalling
- We can stall the pipeline by keeping an instruction in the same stage
  [Figure: pipeline diagram, CC 1 to CC 10: lw $2, 20($1); a bubble (the and becomes a nop); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2.]

Hazard Detection Unit
- Stall by letting an instruction that won't write anything go forward
  [Figure: pipelined datapath with a hazard detection unit and a forwarding unit. The hazard detection unit reads ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt, and a mux can replace the control values entering ID/EX with 0.]

Branch Hazards
- When we decide to branch, other instructions are in the pipeline!
  [Figure: pipeline diagram, CC 1 to CC 9, for
      40 beq $1, $3, 28
      44 and $12, $2, $5
      48 or  $13, $6, $2
      52 add $14, $2, $2
      72 lw  $4, 50($7) ]
- We are predicting "branch not taken"
  - need to add hardware for flushing instructions if we are wrong

Flushing Instructions
  [Figure: pipelined datapath with an IF.Flush control line, the hazard detection unit, the forwarding unit, and an equality comparator (=) on the register file outputs.]
- Note: we've also moved branch decision to ID stage

Branches
- If the branch is taken, we have a penalty of one cycle
- For our simple design, this is reasonable
- With deeper pipelines, penalty increases and static branch prediction drastically hurts performance
- Solution: dynamic branch prediction
  Taken Not taken Predict taken Ta...

Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more. Course Hero has millions of course specific materials providing students with the best way to expand their education.

Below is a small sample set of documents:

Arizona - BME - 517
Keypoints, noise lecture 1. Know ten different types of noise, explain whether they are fundamental or added (external) noise. 2. Know the types of noise most likely to affect your analog ECG signal. 3. Know what factors white noise depends upon. 4.
Arizona - BME - 517
Keypoints, sampling theory lecture1. Know each of the steps involved in going from a continuous time, continuous amplitude signal to a digital output. 2. Know each of the steps in reconstruction and know what errors occur in the process of sampling
Washington - POLS - 398
Empire and Multilateralism: Maintaining Client States During Imperial DeclineDavid Sylvan Graduate Institute of International Studies, Geneva sylvan@hei.unige.ch Stephen Majeski University of Washington majeski@u.washington.eduPaper prepared for p
Washington - MENGR - 559
ME 559 Intro to Fracture MechanicsCourse Coordinator: Paul E. Labossire, Department of Mechanical Engineering, MEB 209, (206) 5435710, labossie@u.washington.edu, Office Hours: see website Course website: http:/courses.washington.edu/mengr559/labo
Washington - EHUF - 331
BOT/ESRM 331, Spring 2008 Landscape Plant Recognition Instructors: John Wott Katie Barndt Teaching Assistant: Patrick Schwartzkopf jwott@u.washington.edu, (206) 543-8602 35 Merrill Hall, Center for Urban Horticulture kbarndt@u.washington.edu schwap@u
East Los Angeles College - GEOG - 5041
Geog5041M Advanced Proprietary GIS Unit 2 PracticalArcCatalog IntroductionThis practical is designed to give you some practical experience in using ArcCatalog.ObjectivesBy the end of this practical session you should be able to use ArcCatalog to
Washington - STAT - 593
Lecture 1Markowitz / Mean-Variance Optimal Portfolios and Associated Computation Portfolio Investment Returns Risk Versus Return Trade-Offs Portfolio Mean and Variance The Optimality Problem05/16/09 Copyright R. Douglas Martin 1The Investment
Washington - STAT - 593
LECTURE 13 DISCRETE PORTFOLIO OPTIMIZATION PROBLEMS PORTFOLIO RESAMPLING (Michaud method) RESAMPLED EFFICIENCY (also due to Michaud) PROPER BOOTSTRAP RESAMPLING05/16/09Copyright R. Douglas Martin1DISCRETE PORTFOLIO OPT. PROBLEMS(Scherer,
Washington - STAT - 593
LECTURE 8 REVIEW MCTR/PCTR AND S-PLUS CODE IMPLIED RETURNS AND S-PLUS CODE ADVANCED PORTFOLIO OPTIMIZATION OVERVIEW INTRO. TO USING SIMPLE FOR ADVANCED PORTFOLIO OPTIMIZATION05/16/09Copyright R. Douglas Martin1MCTR/PCTR AND S-PLUS CODEDi
Washington - STAT - 593
LECTURE 7 PORTFOLIO RISK AND COVARIANCES GENERAL BETA'S MARGINAL CONTRIBUTION TO RISK IMPLIED RETURNS05/16/09Copyright R. Douglas Martin1PORTFOLIO RISK AND COVARIANCESDecomposition of portfolio variance:2 P = ww=i, jwi w j cov(ri
Washington - STAT - 593
STAT 593BMay 8, 2003MIDTERM EXAMClosed Book1. Sketch the trajectories of the efficient frontiers for the case of two risky assets that have the following values of correlations between the returns of the two assets: (a) = +1 , (b) = 0 , (c)
Washington - STAT - 593
STAT 593BSpring 2003HOMEWORK #3Due Thursday, May 1 Reading Lecture Notes for weeks three and four Scherer (2002) - Chapter 5 (skip page 139 through end of section 5.1, and skip 5.2.1) - Chapter 6.1 and 6.2 - Chapter 1.1.3. Scherer (2003), Nuop
Washington - STAT - 593
STAT 593BSpring 2003HOMEWORK #2Due Thursday, April 16 Reading Lecture Notes handouts Scherer (2002), sections 1.2.1, 1.3, 1.5, Appendix A The following sections of Scherer (2003), Nuopt for S-PLUS (1st Draft) - 1.2.1 (Classical Markowitz) - 1
Washington - STAT - 593
STAT 593BSpring 2003HOMEWORK #4Reading (Due Wed., June 4) Lecture Notes as needed Scherer (2002): Chapter 4.1 and 4.2, re-read Chapter 5.5 Scherer (2003): Chapter 2.1, 2.2 and 2.4 (skip 2.4.3), Chapter 3.1.1 and 3.6. Recommend skimming 3.4-3
Washington - STAT - 593
STAT 593BSpring 2003HOMEWORK #3Due Thursday, May 1 Reading Lecture Notes for weeks three and four Scherer (2002) - Chapter 5 (skip page 139 through end of section 5.1, and skip 5.2.1) - Chapter 6.1 and 6.2 - Chapter 1.1.3. Scherer (2003), Nuop
Arizona - ECOL - 320
134Budding Yeast (Saccharomyces cerevisiae) as a Model (Pg739-753. Om 747-749, &quot;m it ating typeswitching&quot;) Budding ye ast Hum ce an lls.~25% of human disease genes have ortholog in yeastOrthologs in ye XP, RAS C r ge s ast: - ance ne Orthologs
Illinois State - ITK - 327
Chomsky HierarchyScanner Parser Regular Expressions (type-3) Context-free languages (type-2) Context-sensitive languages (type-1)We don't need them all for PLComputable (formal) languages (type-0) Type-3 Type-2 Type-1 Type-0 The inclusions ar
Washington - ANTH - 313
Anthropology 313 Autumn 2006 Instructor: Megan Styles Midterm Essay Assignment: Representations of Conflict &amp; Child Soldiering in Africa In The Social and Cultural Context of Child Soldiering in Sierra Leone, ethnographer Susan Shepler argues that it
Washington - PBAF - 531
Douglass C. NorthThe Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 1993 Prize Lecture*Lecture to the memory of Alfred Nobel, December 9, 1993 Economic Performance through TimeIEconomic history is about the performa
Washington - POLS - 361
1Making Sense of Casey and Its Impact. Court says it is &quot;reaffirming&quot; Roe. But Court upholds every provision of the Pennsylvania Law except spousal consent. Every provision that the Court upheld would have been unconstitutional under the Roe sta
Washington - ART - 131
Art 131Project #1The Comic Strip Design a comic strip depicting a funny or annoying event that has happened to you. Specifications:-Use any number of frames between 5 and 10 to tell this event. The size, shape and layout of these frames are y
Arizona - Q - 208
LBT Q2 2008 Engineering ReviewSwing Arms (SWA)2008-Jul-21LBT Q2 2008 Engineering ReviewAll swing arms now commissioned with action itemsHighlights (Q1/2008)2008-Jul-21LBT Q2 2008 Engineering ReviewProgress (Q2/2008)From Plannin
Arizona - Q - 208
LBT Q2 2008 Eng/SW ReviewCIN/NIN -Computer and Network InfrastructureProgressBackup tape system for the mountain Complete for Windows. Linux backups are taking too long and interfere with observing. We are working on a solution for this
Arizona - Q - 307
LBT Q3 2007 Engineering ReviewHBSC Hydrostatic Bearing SystemHighlights (Q3/2007)WinCC (engineering diagnostic) can log data All 8 lateral pads modified to reduce oil consumption, and installed26-Oct-2007LBT Q3 2007 Engineering Review
UWO - CS - 9843
Map ReduceFunctional Programming Review Functional operations do not modify data structures: They always create new ones Original data still exists in unmodified form Data flows are implicit in program design Order of operations does not ma
Arizona - P - 20011102
Ground Control Points:GCP #13541.473983-2322.54385603989.99893548099.9981279Control0.135569132-0.0931921740.1645107020.408362489GCP #23449.491717-2216.486064601229.84843551280.1781295Control-0.131276598-0.0981664740.1639213280.
Arizona - SIE - 554
University of Arizona Department of Systems and Industrial Engineering Tucson, Arizona Dr. Terry BahillDesigned for SIE554a in the Fall of 2006 by: Isis RocheRos Mamadou Barro Jennifer LyallWilson Mark Fischer December 6, 2006INTELLIGENT BATTING
Arizona - MIS - 696
Question One Module 1: Data Base M1.1 It is claimed that object-oriented databases allow the inclusion of semantic information, which cannot be represented in relational databases. It is also claimed that the Extended Relational database model allows
McGill - C - 212
Depositional EnvironmentsFrancis, 2009Deposition occurs when fluid velocity decreasesBedforms also change systematically with the velocity of fluid flowFacies Models for Depositional EnvironmentsSedimentary Facies: Lithology or group of litho
W. Kentucky - CSC - 362
CSC 362 Programming Assignment #3 Due Date: Thursday, March 19 In this assignment, you are to create a program that will encrypt and decrypt messages. The encryption code is a simple rotation algorithm, adding or subtracting an int value to each lett
W. Kentucky - CSC - 425
CSC 425/525 Homework #5 (Chapter 8 and 9) Due: Monday, March 23 Word process all answers. Figures may be hand drawn. Undergraduates answer four of the five questions, all five questions for extra credit. Graduate students answer all five questions. 1
Arizona - MIS - 440
MIS 440 Midterm examinationDecember 20th, 2002Name: _ You have the remainder of the class period to complete the exam. You may refer to your books, papers and notes but no other sources. You may not, however, confer with anyone else regarding the
W. Kentucky - CIT - 470
CIT 470: Advanced Network and System AdministrationDocumentationCIT 470: Advanced Network and System AdministrationSlide #1Topics1. 2. 3. 4. Why document How to document External documentation Man pagesCIT 470: Advanced Network and System A
W. Kentucky - CIT - 470
CIT 470: Advanced Network and System AdministrationAccounts and NamespacesCIT 470: Advanced Network and System AdministrationSlide #1Topics1. 2. 3. 4. 5. Namespaces Policies: selection, lifetime, scope, security User Accounts Directories LDAP
ASU - CSE - 520
Reducing DRAM Latencies with an Integrated Memory Hierarchy DesignAuthors Wei-fen Lin and Steven K. Reinhardt, University of Michigan Doug Burger, University of Texas at AustinPresentation byPravin Dalale1OUTLINE Motivation Main idea in t
ASU - CSE - 520
Enforcing Performance Isolation Across Virtual Machines in XenAuthors: Diwakar Gupta, Ludmila Cherkasova, Rob Gardner, and Amin Vahdat Middleware 2006, LNCS 4290, pp. 342-362, 2006 Presenter: Venkatraghavan RameshAgenda Motivation XenMo
ASU - CSE - 434
CSE434/598 SP2006Homework AssignmentArizona State UniversityCSE 434/598 Computer Networks Homework Assignment 2 (10 Points) Due on 01/31/2006 at the start of the class. No late submissions will be accepted. No plagiarism. Submit a hardcopy of y
ASU - CSE - 434
CSE434/598 SP2006Homework AssignmentArizona State UniversityCSE 434/598 Computer Networks Homework Assignment 3 (10 Points) Due on 02/07/2006 at the start of the class. No late submissions will be accepted. No plagiarism. Submit a hardcopy of y
ASU - APH - 294
Culture of PlaceSummer I 2009Professor: Email: Office: Office Hours: Kim Steele kim.steele@asu.edu CDS 311 By AppointmentCOURSE OUTLINECulture of Place explores contemporary conditions in the built environment as they are shaped by cultural phen
ASU - FMS - 468
FMS 468 Crime and Violence in American FilmFall, Session 1, 2008Professor: Email: Office: Office Hours: Dr. Aaron Baker Aaron.Baker@asu.edu LL-645 By AppointmentCourse Description: Crime and violence have been central elements of American cinema
ASU - HST - 315
American National Biography Online. Colquitt, Alfred Holt (20 Apr. 1824-26 Mar. 1894), Confederate military officer and politician, was born in Walton County, Georgia, the son of Walter T. Colquitt,
ASU - CSE - 536
CSE 536 ADVANCED OPERATING SYSTEMS. PRELIMINARY COURSE OUTLINE. Revised March 22, 2005. SPRING 2005. WK: 1; 1,2; 2,3. DATES: Jan 19, Jan 19, Jan 26, Jan 26, Feb 2, Feb 9. TOPICS: Course Introduction; File-System Interface: File concept, access methods, directory
ASU - CSE - 591
Announcements: Turn in HW#1. Reading: Today 2.1, 2.2 (parallel application example); Next class: 4.1. CSE 591 (lect 4). Application Example - Weather: Typical of many scientific codes; computes results for three dimensional space; compute re
ASU - EEE - 241
ELECTROSTATICS, Test 1. Outline: Electric Force; Electric fields; Electric Flux and Gauss's law; Electric potential; Capacitors and dielectric (Electric storage). Coulomb law. The physics of charged objects: study of electricity aim to unders
ASU - CSE - 591
Final Exam Review. CSE 591 (Advanced Topics on Parallel and Distributed Systems). I. Time: 12/17 (Tue): 10:00-11:50 am, Closed Book. II. Covered Materials: Papers; Homework and Projects; Midterm Exam Questions. III. Review: 1. Parallel System Architectu
ASU - CSE - 574
Homework 1. Assigned [Sep 9, 2004]. Due [Sep 20, 2004]. Qn I. Consider the following "artificial" planning domain, which contains the (artificial) operators described below: operator O1, prec: P, Eff: R, ~S; operator O2, prec: Q, Eff: S; operator O
ASU - CSE - 494
9/7 Agenda: Project 1 discussion; Correlation Analysis; PCA (LSI). The first rule of the social fabric, that in times of crisis you protect the vulnerable, was trampled. David Brook
ASU - CSE - 565
CSE565 Fall 2003, Oct 1st, 03. Announcements: Mid term exam-1: October 8, 2003. Homework distribution for the rest of the semester: Implementation: 2 weeks; Verification patterns (black box testing): 2 weeks; Safety testing: 2 weeks; Reliability testin
Colorado - SYST - 4050
Supply Chain Management, Lecture 13. Outline: Today: Homework 3; Chapter 8 (up to Section 8.3); Midterm review. Thursday: Midterm. Homework 3, Q1: Who should Unipart consider? Parts4U (5% commission); AllMRO ($10 million contract and 1% commission) t
Colorado - SYST - 4050
Supply Chain Management, Lecture 15. Outline: Today: Chapter 8. Next week: Chapter 9; Chapter 10 (10.1, 10.2, 10.6); Homework 4. Inputs of an Aggregate Plan: Demand forecast in each period; Production costs: labor costs, regular time ($/hr) and ov
Colorado - SYST - 4080
Work Breakdown Structure and Activity List Workshops. Inspiring Your Next Success! Company Confidential - Copyright 2008 Hitachi Consulting. Today's Agenda: Introductions & Project Charter Review (15 minutes); Converting Project Charters into Work
Colorado - ACCT - 5240
Business Week O
Colorado - BCOR - 4001
Chapter 2 (continued): Accounting Under Ideal Conditions (Continued). Power point lesson #3. Copyright 2009 by Pearson Education Canada. Chapter 2: Accounting Under Ideal Conditions. Relevance v.
Colorado - BCOR - 4001
Chapter 2: Accounting Under Ideal Conditions. Power point lesson #2. Copyright 2009 by Pearson Education Canada. Chapter 2: Accounting Under Ideal Conditions. 2.2 Ideal Conditions of Certainty As
Colorado - BCOR - 4001
Financial Accounting Theory, Fifth Edition. William R. Scott. Purpose: To create an awareness and understanding of the financial reporting environment in a market economy. Copyright 2009 by Pearson Education Canada. Chapter 1: Introduction
Colorado - CSCI - 5582
CSCI 5582 Artificial Intelligence, Lecture 7. Jim Martin. 05/16/09, CSCI 5582 Fall 2006. Today 9/19: Review (and finish) search; Break; Game Playing Search. Review: Optimization/Local Search; Constraint Satisfaction
Colorado - ASEN - 5190
Results From Our Antenna vs. Garmin Antenna
Colorado - PHYS - 2130
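Who I am: Dan Dessau, Dessau@Colorado.edu. Prof. Shepard will be back on Wed. Chapter 8. The 3D Schrödinger Equation. In 1D: $-\frac{\hbar^2}{2m}\frac{\partial^2 \psi(x)}{\partial x^2} + V(x)\,\psi(x) = E\,\psi(x)$. In 2D: $-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\psi(x,y) + V(x,y)\,\psi(x,y) = E\,\psi(x,y)$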
ASU - EEE - 241
Today's agenda: Electric Current. Current Density. You must know the definition of current, and be able to use it in solving problems. You must understand the difference between current and current density, and be able to use current density in sol
ASU - EEE - 598
Multi-Grid Method. Dragica Vasileska. Complexity of linear solvers: Time to solve model problem (Poisson's equation) on regular mesh. Sparse Cholesky: CG, exact arithmetic: CG, no precond: CG, modified IC: CG, support trees: n^(1/2) n^(1/3) 2D O(n^1.5) O(n^2
ASU - MAT - 119
Arizona State University, Department of Mathematics & Statistics. Spring 2005, MAT 119 Exam 3. MAT 119, Ryan Melendez, Exam #3. Attach name label here. Directions: Your exam consists of 13 numbered questions totaling 100