Chapter 8: Registers and MIPS Assembly Language (MAL)
EEC 70, Fall 2010
Professor Wilken

Instructions
Each machine instruction is represented by some number of bits. A field of bits is used to represent each part of an instruction such as add A,B,C.
• Different binary codes, called 'opcodes', represent each operation (add, sll, beq, etc.). An opcode is much like an ASCII code: the bits don't have a numeric value, they just act as a symbol.
• Opcode size >= log2(number of operation types). There are somewhat more than 32 SAL/MIPS operations, so a 6-bit opcode is needed, which fits in about 1 byte.
• The instruction includes the addresses of the operands (B, C) and the address of the result (A).

Instructions (cont.)
• Each memory location is one byte (8 bits).
• An instruction memory layout might be:
    100        add opcode
    101-104    address of result A
    105-108    address of operand B
    109-112    address of operand C

Instruction Execution
The processor executes an instruction in 5 phases:
• Fetch instruction (read opcode from memory)
• Decode instruction (what opcode? what operation to do?)
• Read operands (read operand bytes from memory)
• Perform operation (e.g., add)
• Store result (write result bytes to memory)
How long does instruction execution take? Let's assume:
• a 3 GHz processor (3 billion clock cycles/sec, or 1/3 ns per clock cycle)
• 33 ns memory access time (100 clock cycles)
• the processor can read/write at most 4 bytes at a time

Instruction Execution (cont.)
• Fetch instruction: the 13-byte instruction requires 4 memory accesses = 4 x 100 = 400 clock cycles.
• Decode: easily done in 1 clock cycle (see EEC170).
• Read operands: requires 2 memory accesses = 2 x 100 = 200 clock cycles.
• Operation: can be done in 1 clock cycle (see EEC180A, EEC170).
• Store result: requires 1 memory access = 100 clock cycles.
Net result: the instruction takes 400 + 1 + 200 + 1 + 100 = 702 clock cycles. Much too slow!
What's the problem? Too many slow memory accesses!

Faster Instruction Execution
How can we make instruction execution faster?
• Use faster memory.
• Reduce the size of memory addresses (less to transfer from memory during instruction fetch).
• Transfer 2 or more words from memory at once.

Registers
Nearly all processors use a register file. The register file is a small, fast memory that sits between main memory (Random Access Memory, RAM) and the Arithmetic and Logic Unit (ALU):
    [Figure: processor chip containing the ALU and the register file, connected to main memory]

Registers (cont.)
The register file is an integer array that has its own address space:
    add r[4],r[3],r[6]
• Processors use <= 256 registers, so a register address is 8 bits or less.
• This allows instructions to be compact:
    100    add opcode
    101    r[4] address
    102    r[3] address
    103    r[6] address
• It reduces the example instruction fetch cycle count to 100, down from 400.
Because the register file is small and near the ALU, it can be accessed in 1 clock cycle.
• This suggests the execution time for add r[4],r[3],r[6] is 100 cycles for instruction access, 1 for decode, 2 for operand fetch, 1 for the operation and 1 for storing the result = 105 cycles total, much better!

Dual Ported Register File
Register files are designed to read both operands at once, which saves one cycle, down to 104:
    [Figure: processor chip with ALU and register file; port labels 2 and 1]

Caches
The processor chip includes a fast on-chip memory called a cache.
• The cache holds copies of recently used instructions or data for fast access.
• Memory access is slow only for the first access, e.g., the first loop iteration for an instruction in a loop.
    [Figure: processor chip containing the ALU, register file, instruction cache and data cache, connected to main memory]

Performance with Cache
Assume Icache access is 1/3 ns (1 clock cycle) and the instruction the processor wants is always in the cache.
• Total execution time is now down to 5 cycles for each instruction, one cycle for each phase.
By overlapping instruction execution (pipelining), execution time can get down to one cycle per instruction (see EEC170).
• A pipeline is an assembly line for instruction execution.

Load/Store Architecture
It is necessary to move variables between memory and the register file. MIPS Assembly Language does this with Load Word (lw) and Store Word (sw) instructions:
    lw r[7], M[12345]
    sw r[3], M[67890]
MIPS only allows memory access using lw and sw. All other instructions (e.g., add, sll, beq, etc.) must access variables from registers.

Load/Store (cont.)
Thus the SAL instruction add A,B,C must be synthesized with a sequence of MIPS Assembly instructions:
    lw  r[8], B
    lw  r[9], C
    add r[4],r[8],r[9]
    sw  r[4], A
But this is slow: many instruction fetches, many memory accesses for data!

Register Assignment
We would like to assign variables to registers so that a variable never goes to memory:
    add A,B,C
    add E,D,A
By assigning variable A to $5 (MIPS syntax for r[5]), A is born and spends its entire life in the register file. Thus A is never involved in a (slow) memory access:
    add $5,B,C
    add E,D,$5

Register Spilling
What if there are more variables than there are registers?
• Some variables must be 'spilled' to a home location in memory, then reloaded into the register file when needed later.
Assume a processor with only two registers:
    Source code      Two-register code
    A = __           $1 = __
                     sw $1, home_A
    B = __           $1 = __
    C = __           $2 = __
    __ = B           __ = $1
    __ = C           __ = $2
    __ = A           lw $2, home_A
                     __ = $2
Assign B to $1 and C to $2 => A must spill. We cannot fit all three variables!

MAL Instructions: Load
All MAL instructions are 1 word (32 bits).
• This makes it easy and fast to determine the location of the next instruction.
How can we fit lw $2,A into 32 bits, when the memory address alone requires 32 bits?!
• Register indirect addressing: store the memory address in a register.
    la $1, A       # put the address of A into $1
    lw $2, ($1)    # ($1) means use the contents of $1 as an address;
                   # load data from that address
• Often, we have an address and want to load various data in the neighborhood of that address:
    [Figure: consecutive memory words int1, int2, int3, int4]

MAL Instructions: Load (cont.)
• Displacement addressing: the memory address is computed by adding a small constant within the instruction to the contents of a register. The data at the computed address is loaded.
    la $1, int1
    lw $2, 0($1)     # load M[$1+0], i.e., load int1
    lw $3, 4($1)     # load M[$1+4], i.e., load int2
    lw $4, 8($1)     # load M[$1+8], i.e., load int3
    lw $5, 12($1)    # load M[$1+12], i.e., load int4

MAL Instructions: Load (cont.)
The MIPS architecture uses 6-bit opcodes.
The MIPS architecture has 32 registers => 5 bits for a register specifier.
lw instruction format: 6 bits for the opcode, 5 bits for the base address register specifier, 5 bits for the destination register specifier and the remaining 16 bits for the address offset:
    bits:  31-26    25-21    20-16    15-0
           opcode   rbase    rd       address offset
The offset is two's complement, so we can access from -2^15 to +2^15-1 bytes relative to the address in rbase.

Integer Register Usage
MIPS has 32 registers. Some (by software convention) have special purposes.
Avoid using them in your MIPS programs for now:
• $0 always contains the value 0, the most often used value. This allows various instructions to be synthesized. TAL has no move instruction; instead we can use add:
    add $10,$11,$0    # move $11 into $10
• $1 is used by the assembler (for us, by SPIMSAL).
• $2-$7 are used for subroutine calls and returns when functions are assembled or compiled separately.
• $26-$27 are used by the operating system.
• $29 is the stack pointer for the "system stack", which is created and managed by the compiler (used for subroutine calls, among other things).
• $31 is used for subroutine calls (it holds the return address).

MAL Instruction Types
There are 4 basic instruction types:
• Load and store
• Integer arithmetic and logical
• Branches
• Floating point arithmetic
There are two separate register files:
• integer registers ($0-$31)
• floating point registers ($f0-$f31)
Data is typeless in memory; it becomes typed only when loaded into one of the register files.
Separate instructions specify integer and FP operations:
    add   $10,$11,$12
    add.s $f10,$f12,$f14

Load/Store Types
MIPS uses separate instructions for loading and storing words (lw, sw), bytes (lb, sb) and floating point values (lwc1, swc1).
• Integer loads/stores go to/from an integer register:
    lw $10,4($11)
    sw $10,8($11)
• Byte loads/stores also go to/from an integer register:
    lb $12,1($13)
    sb $12,2($13)
For lb, the byte is placed in the register's lower 8 bits. The byte's MSB is sign-extended, so the byte is converted into a 32-bit integer. The usual integer arithmetic/logical instructions can then operate on bytes.
For sb, the register's lower 8 bits are copied to the designated memory location.

Load/Store Types (cont.)
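As a rough model of the lb and sb behavior described above, the Python sketch below treats memory as a bytearray and a register as an unsigned 32-bit bit pattern. The functions lb, sb and as_signed32 are hypothetical helpers that mirror the instruction names; they are not part of any library.

```python
MASK32 = 0xFFFFFFFF


def lb(mem: bytearray, addr: int) -> int:
    """Load byte: sign-extend mem[addr] into a 32-bit register pattern."""
    byte = mem[addr]
    value = byte - 0x100 if byte & 0x80 else byte  # MSB set => negative value
    return value & MASK32                          # bit pattern held by the register


def sb(mem: bytearray, addr: int, reg: int) -> None:
    """Store byte: copy only the register's lower 8 bits to mem[addr]."""
    mem[addr] = reg & 0xFF


def as_signed32(reg: int) -> int:
    """Interpret a 32-bit register pattern as a two's complement integer."""
    return reg - (1 << 32) if reg & 0x80000000 else reg


mem = bytearray([0x7F, 0xFF, 0x00, 0x00])
print(hex(lb(mem, 0)))            # 0x7f
print(hex(lb(mem, 1)))            # 0xffffffff
print(as_signed32(lb(mem, 1)))    # -1
sb(mem, 2, 0x12345678)
print(hex(mem[2]))                # 0x78
```

Because lb already produces a proper 32-bit integer, ordinary integer add/sub can follow it directly. This is the point of sign-extending at load time.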
• Floating point loads/stores go to/from a floating point register; however, the base address is an integer and comes from an integer register.
MIPS considers the floating point unit a coprocessor (helper processor), designated as coprocessor #1:
    lwc1 $f1,4($12)
    swc1 $f2,8($13)
• A separate instruction is used to move words between floating point registers and integer registers:
  Move From Coprocessor 1 to integer register:
    mfc1 $12, $f1
  Move To Coprocessor 1 from integer register:
    mtc1 $f1, $12

Load/Store Assembly Instructions
There are similar MAL instructions for lw/sw, lb/sb and lwc1/swc1. Only lw is shown here:
• rt = target register; rb = base-address register
• lw rt, label     # put word at addr. label in register rt
• lw rt, (rb)      # put word at addr. (rb) in register rt
• lw rt, x(rb)     # put word at addr. x+(rb) in register rt
Note that lw rt, (rb) is the same as lw rt, 0(rb).
Note that lw rt, label is the same as:
    la rb, label
    lw rt, 0(rb)

MAL Branch Instructions
All branch instructions for MAL are just like those in SAL, except that the operands are registers rather than variables:
• SAL:
    beq a, b, loop
• MAL:
    lw  $10, a
    lw  $11, b
    ...
    beq $10, $11, loop
The branch instruction format includes an opcode, registers and an address offset field.
• The offset is the distance from the current program counter (PC) to the branch target. It is calculated by the assembler.
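The offset arithmetic can be sketched in Python. This sketch assumes the real MIPS convention, which the slide simplifies: the 16-bit offset counts 4-byte instructions and is measured from the instruction after the branch (PC + 4). The helper names branch_offset and branch_target are hypothetical, and the addresses are only illustrative.

```python
def branch_offset(branch_addr: int, target_addr: int) -> int:
    """Offset an assembler would encode in beq/bne (assumed PC+4, word-based)."""
    delta = target_addr - (branch_addr + 4)    # distance from the next instruction
    if delta % 4 != 0:
        raise ValueError("branch target must be word aligned")
    offset = delta // 4                        # counted in instructions, not bytes
    if not -2**15 <= offset <= 2**15 - 1:
        raise ValueError("target out of 16-bit range; use bne + j instead")
    return offset


def branch_target(branch_addr: int, offset: int) -> int:
    """Inverse of branch_offset: where execution goes if the branch is taken."""
    return branch_addr + 4 + offset * 4


# A loop-closing beq at 0x00400010 that branches back to 0x00400000:
off = branch_offset(0x00400010, 0x00400000)
print(off)                                   # -5
print(hex(off & 0xFFFF))                     # 0xfffb, the 16-bit field contents
print(hex(branch_target(0x00400010, off)))   # 0x400000
```

A backward branch gets a negative offset, which is stored in the field as its 16-bit two's complement pattern.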
Calculated by the assembler assembler 31 26 25 21 20 16 15 beq rs rt 0 branch offset 24 Jump Instruction jump instruction, j , is similar to is unconditional branch, b , but provides but more bits to specify the target more j 6 target address 26 25 Larger Conditional Branch Range Conditional branch address offset is two’s Conditional complement 16-bit field allows only -215 to complement bit to 215-1 branch range Longer conditional branch can be synthesized using 2 instructions: short: short: beq $16,$17, branch_target <next> long: bne $16,$17 next j branch_target <next> 26 MAL Arithmetic/Logical Instructions Much like SAL equivalents, except all operands are in registers operands • SAL: add c, a, b • MAL: add $10, $11, $12 • MAL has separate instructions for MAL arithmetic/logic with a constant arithmetic/logic constant is called “immediate data”, is part of the is 32-bit instruction 32 addi $10, $11, 24 constant is a 16-bit integer in two’s complement constant complement representation representation 31 26 25 21 20 16 15 addi rs rd 0 immediate data 27 16-Bit Immediate Operands 16-bit or smaller immediate data are the most 16 bit common common • Architecture concept: make the common case fast 60 GCC Spice TeX 50 40 30 20 10 0 0 4 8 12 16 20 24 Bits Needed for Immediate Value 28 32 What about constants that are larger than 16 bits? bits? 
Synthesizing Large Constants
RISC fixed-length instructions do not allow large immediate constants (e.g., 32 bits).
MIPS uses a special instruction in a two-instruction sequence to create constants > 16 bits (the uncommon case).
• Load Upper Immediate (lui):
    fields: lui    N/A    rd    immediate data
    width:  6      5      5     16
• lui sets the upper 16 bits of rd to the immediate data and the lower 16 bits to 0s.
• A 32-bit constant:
    lui  $15, 1234H          # $15 = 0x12340000
    addi $15, $15, 5678H     # $15 = 0x12345678
  Because addi sign-extends its immediate, this works as written only when bit 15 of the lower half is 0 (as with 5678H). Otherwise, use ori, which zero-extends its immediate, or add 1 to the upper half given to lui.

Load Address (la) Instruction
The load address instruction is an example of creating a 32-bit constant:
• la $10, label => put the 32-bit constant address of label into $10
The assembler translates la into two instructions:
    lui  $10, <upper 16 bits of label>
    addi $10, $10, <lower 16 bits of label>
The same sign-extension caveat applies to the lower 16 bits here.

MIPS Instruction Format
Only three instruction formats (small is fast).
All instructions are 32 bits (simplicity).
Most field types are at fixed positions for fast processing.
Immediate type: includes addi, lw, branches
    op    rs1    rs2/rd    immediate
    6     5      5         16
Jump type:
    op    displacement
    6     26
Register type: ALU instructions
    op    rs1    rs2    rd    (shift amount)    (function)
    6     5      5      5     5                 6

MAL Example
swap: integers v[k] and v[k+1]
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

    lw   $10, k           # get the array index
    sll  $12, $10, 2      # convert to byte offset: reg $12 = k * 4
    la   $14, v           # get base address of array
    add  $12, $14, $12    # reg $12 = v + k*4, the addr. of v[k]
    lw   $15, 0($12)      # reg $15 (temp) = v[k]
    lw   $16, 4($12)      # reg $16 = v[k+1]; 4($12) refers to the next element of v
    sw   $16, 0($12)      # v[k] = reg $16
    sw   $15, 4($12)      # v[k+1] = reg $15
...
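To trace the swap example, the Python sketch below steps through the same eight instructions on a hypothetical machine state. Registers are a list of 32 integers, and memory is a dict mapping word-aligned byte addresses to 32-bit words. The la step is modeled as the lui/addi pair from the Load Address slide. Because addi sign-extends its immediate, the sketch assumes the assembler adds 1 to the upper half whenever bit 15 of the lower half is set. The addresses of v and k, and the helper names, are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def sext16(x: int) -> int:
    """Sign-extend a 16-bit immediate, as addi does."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def la(addr: int) -> int:
    """Model la as lui + addi, adjusting the upper half for addi's sign extension."""
    lo = sext16(addr)                    # value addi will actually add
    hi = ((addr - lo) >> 16) & 0xFFFF    # upper half handed to lui
    reg = (hi << 16) & MASK32            # lui: upper 16 bits = hi, lower 16 = 0
    return (reg + lo) & MASK32           # addi


def swap(mem: dict, v: int, k_addr: int) -> None:
    """Swap v[k] and v[k+1], mirroring the MAL instruction sequence."""
    r = [0] * 32
    r[10] = mem[k_addr]                  # lw   $10, k
    r[12] = (r[10] << 2) & MASK32        # sll  $12, $10, 2
    r[14] = la(v)                        # la   $14, v
    r[12] = (r[14] + r[12]) & MASK32     # add  $12, $14, $12
    r[15] = mem[r[12] + 0]               # lw   $15, 0($12)
    r[16] = mem[r[12] + 4]               # lw   $16, 4($12)
    mem[r[12] + 0] = r[16]               # sw   $16, 0($12)
    mem[r[12] + 4] = r[15]               # sw   $15, 4($12)


V, K = 0x10010000, 0x10010100            # assumed addresses of v and k
mem = {V + 4 * i: x for i, x in enumerate([10, 20, 30, 40])}
mem[K] = 1                               # k = 1
swap(mem, V, K)
print([mem[V + 4 * i] for i in range(4)])   # [10, 30, 20, 40]
print(hex(la(0x10018000)))                  # 0x10018000
```

The sll by 2 is what turns the element index k into a byte offset, since each word occupies 4 byte addresses.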