HW SLNS

Solutions Manual - Introduction to Digital Design - December 29, 1999

Chapter 15

Exercise 15.1

[Figure 15.1: Single memory read - Exercise 15.1 (signals Addr, Enable, Rd, Data, Rdy; marked intervals 5 ns, 30 ns, 50 ns, 10 ns; t = 0 to t = 65 ns)]

Rdy becomes active at time t = 55 ns. The enable signal may become low at time t = 65 ns.

For a sequence of read operations, one read may follow the other, keeping Enable = 1 and issuing one Rd pulse for each new access. Addr may change after Rdy = 1, so the timing diagram presented in Figure 15.2 is possible. The cycle time for this case is 55 ns, and the number of memory accesses per second is:

$\frac{1}{55 \times 10^{-9}} \approx 18.2 \times 10^{6}$ memory accesses/s

[Figure 15.2: Multiple read access - Exercise 15.1 (signals Addr, Rd, Data, Rdy; marked intervals 5 ns, 30 ns, 50 ns)]

Exercise 15.2

The format for the three instructions used in this exercise is shown in Figure 15.3.

Figure 15.3: Instruction format - Exercise 15.2
  add:  Opcode (31-26) | RT (25-21)     | RA (20-16)     | RB (15-11) | unused (10-0)
  ldw:  Opcode (31-26) | RT (25-21)     | RA (20-16)     | D (15-0)
  brn:  Opcode (31-26) | unused (25-21) | unused (20-16) | D (15-0)

add R5,R7,R9 (format add RT,RA,RB; add opcode = 010000). Using the instruction format in Figure 15.3 we obtain:
  010000 00101 00111 01001 00000000000

ldw R7,1200(R8) (format ldw RT,D(RA); ldw opcode = 100001). Using the instruction format in Figure 15.3 we obtain:
  100001 00111 01000 0000010010110000

brn 1000 (format brn D; brn opcode = 110001). Using the instruction format in Figure 15.3 we obtain:
  110001 00000 00000 0000001111101000

Exercise 15.3

Consider the following register contents for the execution of the instructions below: PC = 5321, R1 = 10, R5 = -1250, R7 = -110.

(a) stw R5,1300(R7): store word, format stw RS,D(RA), which performs the operation MEM(RA+D) := RS(31 downto 0). The effective address computed is:
  RA + D = R7 + 1300 = -110 + 1300 = 1190

(b) brn 1250: branch on negative, format brn D, which performs the operation PC := PC + D + 4 when the branch is taken (N = 1), changing the program counter value to:
  PC := 5321 + 1250 + 4 = 6575
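The arithmetic in parts (a) and (b), and the 24-bit truncation used in part (c), which follows, can be checked with a short Python sketch. It assumes 32-bit two's-complement registers, a 16-bit displacement field that is sign-extended before the addition, and a 24-bit program counter; the helper names are illustrative and are not part of the XMC definition.

```python
# Sketch: XMC address arithmetic for Exercise 15.3 (helper names are illustrative).
# Assumptions: 32-bit two's-complement registers, 16-bit signed displacement D,
# and a 24-bit program counter.

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def effective_address(ra: int, d: int) -> int:
    """stw RS,D(RA): the memory address is RA + sign-extended D."""
    return ra + sign_extend(d, 16)


def branch_target(pc: int, d: int) -> int:
    """brn D (when taken): PC := PC + D + 4."""
    return pc + sign_extend(d, 16) + 4


def pc24(value: int) -> int:
    """Keep only the 24 bits that fit in the program counter."""
    return value & 0xFFFFFF


PC, R7 = 5321, -110
print(effective_address(R7, 1300))  # (a) 1190
print(branch_target(PC, 1250))      # (b) 6575
print(pc24(R7))                     # (c) bri R7: 2**24 - 110 = 16777106
```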
(c) bri R7: branch indirect, format bri RA, which performs the operation PC := RA. The number -110 is represented in two's complement using 32 bits. However, the PC precision is only 24 bits, so the value actually stored in the PC is $2^{24} - 110 = 16777106$.

Exercise 15.4

The sequence of branches needed to branch on negative to a location at a distance $2^{20}$ from the present instruction is:

  Address A:                  brn D
  ...
  Address A' = A + D + 4:     br D
  ...
  Address A'' = A' + D + 4:   br D
  ...

After the instruction brn D, the PC value is adjusted to PC := PC + D + 4. After the execution of n branches with the same displacement value D, the PC value is:

$PC = A + n(D + 4)$

For this exercise it is required that the last PC value be $A + 2^{20}$, thus:

$A + n(D + 4) = A + 2^{20} \;\Rightarrow\; D = \frac{2^{20} - 4n}{n}$

The maximum positive value for D (a 16-bit signed integer) is $2^{15} - 1$, so D + 4 can be at most $2^{15} + 3$ and n must be at least $2^{20}/(2^{15} + 3) \approx 31.997$. The number of branch instructions is therefore $n = 2^{20}/2^{15} = 2^{5} = 32$. As a result:

$D = \frac{2^{20} - 2^{7}}{32} = 2^{15} - 4 = 32764$

Thus, a set of 32 branch instructions with the same displacement D = 32764 moves the program counter $2^{20}$ locations from the address of the first branch instruction. This solution does not depend on the location A.

Exercise 15.5

Assembly code to add an array of 105 integers stored consecutively starting at location 1000. The result is stored in location 2000.
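Exercise 15.4 and the bnz and stw displacements of the Exercise 15.5 program both rest on the update rule PC := PC + D + 4. The Python sketch below (function names are hypothetical) reproduces those numbers: it searches for the smallest n such that n(D + 4) = 2^20 with an integer D no larger than 2^15 - 1, solves A + 16 + 4 + D1 = A for the loop branch, and computes D2 from the final pointer value.

```python
# Sketch: branch-displacement arithmetic under PC := PC + D + 4
# (Exercises 15.4 and 15.5). Function names are hypothetical.

D_MAX = 2**15 - 1  # largest positive 16-bit signed displacement


def branch_chain(distance: int, d_max: int = D_MAX) -> tuple[int, int]:
    """Smallest n (and its D) such that n branches of displacement D
    advance the PC by exactly `distance`, i.e. n * (D + 4) == distance."""
    n = 1
    while True:
        if distance % n == 0 and distance // n - 4 <= d_max:
            return n, distance // n - 4
        n += 1


def displacement(branch_addr: int, target: int) -> int:
    """D such that target == branch_addr + 4 + D."""
    return target - (branch_addr + 4)


n, D = branch_chain(2**20)
print(n, D)                  # 32 32764

A = 0                        # start of the loop; any value gives the same D1
D1 = displacement(A + 16, A)
D2 = 2000 - 105 * 4          # R0 = 420 when the loop ends
print(D1, D2)                # -20 1580
```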
Register usage:
  R0 - pointer to the elements in the array
  R1 - number of elements still to be added
  R2 - accumulator
  R4 - value to add in each program iteration

        xor R0,R0,R0      -- R0 = 0
        adi R1,R0,105     -- R1 = 105
        xor R2,R2,R2      -- R2 = 0
  A:    ldw R4,1000(R0)   -- load R4 with the integer located at R0+1000
        add R2,R2,R4      -- add R4 to the accumulator (R2)
        adi R0,R0,4       -- adjust pointer to reference the next integer
        sbi R1,R1,1       -- decrement the number of integers to be added
  A+16: bnz -20           -- branch to location A if there are more integers
        stw R2,1580(R0)   -- store the accumulator in location 2000

The value of the displacement (D1) for the bnz instruction was obtained using the relation A + 16 + 4 + D1 = A, which results in D1 = -20. The displacement (D2) for the stw instruction was calculated using the fact that, after the loop, R0 = 105 × 4 = 420; then, to reach location 2000, D2 = 2000 - 420 = 1580.

The binary code for this program is:

        31-26   25-21  20-16  15-11  10-0
  xor   011000  00000  00000  00000  00000000000
  adi   010001  00001  00000  00000  00001101001
  xor   011000  00010  00010  00010  00000000000
  ldw   100001  00100  00000  00000  01111101000
  add   010000  00010  00010  00100  00000000000
  adi   010001  00000  00000  00000  00000000100
  sbi   010011  00001  00001  00000  00000000001
  bnz   110010  00000  00000  11111  11111101100
  stw   100011  00010  00000  00000  11000101100

Exercise 15.6

Simulation of XMC instructions. Figures 15.4 and 15.5 show the simulation results of the instructions given in Exercise 15.2.
[Figure 15.4: Simulation of instructions using VHDL - 0-500 ns (entity computer, architecture structural)]

The contents of the memory were initialized with the binary code for the three instructions, by the following modification in the memory.vhd file:

  VARIABLE Mem : MemArrayT  -- memory contents, used for Exercise 15.6
    := ( -- program
         -- add R5,R7,R9
         3 => "01000000", 2 => "10100111", 1 => "01001000", 0 => "00000000",
         -- ldw R7,1200(R8)
         7 => "10000100", 6 => "11101000", 5 => "00000100", 4 => "10110000",
         -- brn 1000
         11 => "11000100", 10 => "00000000", 9 => "00000011", 8 => "11101000",
         OTHERS => "00000000");

For the simulation the clock cycle time was set to 100 ns to make it easier to read the memory data information.
Recall that a minimum cycle time of 17.5 ns would be possible, based on the calculations done in the text (Example 15.1).

[Figure 15.5: Simulation of instructions using VHDL - 500-1000 ns (entity computer, architecture structural)]

The values in registers 8 and 9 were forced to $R(8) = (17)_{16}$ and $R(9) = (A5)_{16}$. After the instruction add R5,R7,R9 is executed, the value in R(5) becomes a copy of R(9) (time 350 ns), since R(7) = 0. The next instruction, ldw R7,1200(R8) at address 4, accesses the location $R8 + (4B0)_{16} = (4C7)_{16}$ to read the value to be loaded into R7. The contents of this memory location are zero, so there is no change in R7.

The instruction at location 8 (brn 1000) should test the condition for a negative result (signal ng), which is zero, indicating that the last operation in the ALU resulted in a positive value. However, the branch was taken, showing that there is a mistake in the processor's VHDL description. The target address for the branch is correct: $(3E8)_{16} + (008)_{16} + (004)_{16} = (3F4)_{16}$, where the first value is the displacement in the branch instruction (D), the second is the location of the brn instruction, and the last is the automatic increment of the PC value during the fetch state.
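The instruction words used in this simulation (40A74800, 84E804B0 and C40003E8 in hexadecimal) and the byte-by-byte memory.vhd initialization can both be regenerated from the Figure 15.3 field layout. A minimal Python sketch follows; the helper names are hypothetical, and the byte order (least significant byte at the lowest address) is read off the memory.vhd listing rather than taken from a definition in the text.

```python
# Sketch: encode the Exercise 15.2 instructions and lay them out in memory
# the way the memory.vhd initialization does (helper names are hypothetical).

def encode_rrr(opcode: int, rt: int, ra: int, rb: int) -> int:
    """opcode[31:26] RT[25:21] RA[20:16] RB[15:11], bits 10..0 unused."""
    return (opcode << 26) | (rt << 21) | (ra << 16) | (rb << 11)


def encode_imm(opcode: int, rt: int, ra: int, d: int) -> int:
    """opcode[31:26] RT[25:21] RA[20:16] D[15:0] (16-bit two's complement)."""
    return (opcode << 26) | (rt << 21) | (ra << 16) | (d & 0xFFFF)


def byte_layout(word: int, base: int) -> dict[int, str]:
    """Least significant byte at the lowest address, as in memory.vhd."""
    return {base + i: format((word >> (8 * i)) & 0xFF, "08b") for i in range(4)}


program = [
    encode_rrr(0b010000, 5, 7, 9),     # add R5,R7,R9
    encode_imm(0b100001, 7, 8, 1200),  # ldw R7,1200(R8)
    encode_imm(0b110001, 0, 0, 1000),  # brn 1000 (unused fields are zero)
]
for address, word in zip(range(0, 12, 4), program):
    print(f"{address:2d}: {word:08X}", byte_layout(word, address))
```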
Exercise 15.7

The new instruction ldwa automatically increments by 4 the index register used to compute the effective address. Let us define the instruction assembly as:

  ldwa RT,D(RA)

Its execution results in RT(31 downto 0) := Mem(RA + D, 4) (a 4-byte word) and RA := RA + 4.

The modification of the VHDL code given on page 453 of the text adds the following condition to the Opcode test:

  CASE Opcode IS
    ...
    WHEN "ldwa opcode" =>                -- ldwa
      Phase        <= MemOp;
      Status       <= MemOp;
      tMemAddr     <= RA_data + D;        -- memory address
      GPR(RA_addr) <= RA_data + 4;        -- increment index register
      WAIT UNTIL MemRdy = '1';
    WHEN ...
    ...

Exercise 15.8

Figure 15.10 in the text shows a 16MB memory; this exercise asks for a memory four times bigger (64MB). Assuming that 4MB modules are available (four of these modules were used to design the 16MB memory in the text), 16 of them are required for this memory design, as shown in Figure 15.6. Each row of four modules provides 16MB. A row is selected by the two most significant bits of the address vector, Adr(25:24). The two least significant address bits, Adr(1:0), are used to select one of the bytes that compose a word. The remaining bits, Adr(23:2), are used to address a byte inside a 4MB module.

[Figure 15.6: 64MB Memory - Exercise 15.8 (four rows of 4MB modules, one module per byte lane Byte 3 to Byte 0, all addressed by Adr(23:2); row selection by Adr(25:24); Adr(1:0) drives the Selector/Distributor to the 32-bit data bus; Rd, Wr, Enable and MRdy handled by the controller)]

Exercise 15.9

PC = 1500 for all cases.

(a) add R7,R5,R8

Instruction fetch: the instruction is read from memory and the program counter is incremented by 4. The instruction value (signal instr) is sent to the control subsystem for decoding.
  - path is set: Sin_Sout = 0, ALU_PC = 1, PC_RA = 0
  - PC = PC + 4: ALUop = 1110 (no condition coming from the ALU is stored)
  - values are stored in the destination registers: WrPC = 1, WrIR = 1

Execution: after decoding, the signals that control the data subsystem are activated to implement the requested operation (addition):
  - source and destination of the operation: AddrA = R5, AddrB = R8, AddrC = R7
  - path control: Mem_ALU = 0, Sin_Sout = 0, PC_RA = 1, IR_RB = 1
  - operation: ALUop = 0001 (addition)
  - results and conditions are stored in registers: WrC = 1, WrCR = 1

(b) stw R3,11300(R2): Mem(R2 + 11300) := R3

The instruction fetch is the same as presented for part (a). The instruction is then decoded, and the address used for the memory write is computed by combining the value in register R2 with the displacement 11300 in the instruction. R3 is sent to the data bus.
  - register access: AddrA = R2, AddrB = R3
  - path setup: PC_RA = 1, IR_RB = 0, ALU_PC = 0, Sin_Sout = 1
  - operation: ALUop = 0001 (addition), ZE_SE = 1 (sign extension)

(c) brn 10000: PC := PC + 4 + 10000

The instruction fetch is the same as presented for part (a). The instruction is then decoded, the branch target address is computed, the condition is tested, and the target value is written into the program counter if the condition is true.
  - register control: WrPC = 1 if N = 1; WrPC = 0 otherwise
  - path setup: PC_RA = 0, IR_RB = 0, ALU_PC = 1
  - operation: ALUop = 0001 (addition of PC + 4 and 10000), ZE_SE = 1 (sign extension)

Exercise 15.10

The signals that are active during the execution (no fetch cycle) of both instructions are shown in Figure 15.7.
[Figure 15.7: Timing diagram for signals in data subsystem - Exercise 15.10; panel (a) ldw RT,D(RA): instruction decode, register read, ALU operation and memory access delays; panel (b) br D: instruction decode, sign extension, ALU delay to compute the next PC value, and PC register setup time]

Exercise 15.11

Based on the VHDL description on page 474 of the text, we obtain the state-transition diagram and one-hot implementation of the control subsystem shown in Figure 15.8. The state change from the {fetch, execute, memop} states to p_reset is not synchronous; for this reason, we indicate that transition with dashed lines.

[Figure 15.8: Control subsystem for Exercise 15.11 (without the output): states P_reset, Fetch, Execute and Memop, one D flip-flop per state; Execute goes to Memop for load/store instructions (Opcode[5:2] = 1000) and to Fetch otherwise]

The generation of the control signals (control subsystem outputs) is based on the VHDL description on pages 476 and 477 of the text, and on the control signal table Ctrl_Table shown on page 475. The table is implemented in hardware by a ROM. A row of the table, addressed by the opcode value, is referenced in this exercise as ROMline = ROM(Opcode). Each line is composed of several control signals, and each signal in it is represented as ROMline[<control signal>].
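The next-state logic of the one-hot controller in Figure 15.8 can be sketched in Python with one boolean per flip-flop. This is a simplified model under stated assumptions: every state lasts a single clock (the wait for MemRdy in memop is ignored), reset is modelled by forcing the P_reset flip-flop rather than as an asynchronous input, and the same opcode is applied on every cycle; the function names are illustrative.

```python
# Sketch of the one-hot state register of Figure 15.8 (simplified model).
# Assumptions: one clock per state (MemRdy waits ignored), reset forces
# P_reset, and the opcode stays fixed for the whole trace.

STATES = ("P_reset", "Fetch", "Execute", "Memop")


def is_load_store(opcode: int) -> bool:
    """Opcode[5:2] = 1000 identifies ldw/stw."""
    return (opcode >> 2) == 0b1000


def next_state(q: dict[str, int], opcode: int, reset: int = 0) -> dict[str, int]:
    """D input of each flip-flop, one boolean per state."""
    if reset:
        return {"P_reset": 1, "Fetch": 0, "Execute": 0, "Memop": 0}
    ls = int(is_load_store(opcode))
    return {
        "P_reset": 0,
        "Fetch": q["P_reset"] | q["Memop"] | (q["Execute"] & (1 - ls)),
        "Execute": q["Fetch"],
        "Memop": q["Execute"] & ls,
    }


def trace(opcode: int, cycles: int) -> list[str]:
    """Names of the active state for `cycles` clocks after a reset."""
    q = next_state({s: 0 for s in STATES}, opcode, reset=1)
    names = []
    for _ in range(cycles):
        assert sum(q.values()) == 1, "exactly one flip-flop must be set"
        names.append(next(s for s in STATES if q[s]))
        q = next_state(q, opcode)
    return names


print(trace(0b100001, 5))  # ldw: load/store path through Memop
print(trace(0b010000, 5))  # add: Execute returns directly to Fetch
```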
The expressions for the control signals are (' denotes complement, · denotes AND, + denotes OR):

  IR_RB     = ROMline[IR_RB]
  WrCR      = ROMline[WrCR] · fetch'
  WrC       = ROMline[WrC] · (ROMline[Memop]' · execute + memop)
  ALUop     = "1110" · fetch + ROMline[ALUop] · fetch'
  MemRd     = fetch + ROMline[WrMem]' · memop        (footnote 1)
  MemWr     = ROMline[WrMem] · memop
  MemEnable = fetch + memop
  MemLength = fetch + Opcode(0) · memop
  IORd      = 0
  IOWr      = 0
  IOEnable  = 0
  IOLength  = 0
  ALU_PC    = fetch + execute
  Sin_Sout  = ROMline[WrMem] · MemRdy' · memop
  PC_RA     = ROMline[WrPC]' · fetch'
  WrIR      = fetch
  WrPC      = fetch + ROMline[WrPC] · condition      (*)
  Mem_ALU   = ROMline[WrMem]
  ZE_SE     = ROMline[ZE_SE]
  AddrA     = instr(20:16)
  AddrB     = instr(25:21) · (RS_RB = '0') + instr(15:11) · (RS_RB = '1')
  AddrC     = instr(25:21)

(*) Not done in the VHDL description. There is an error on page 477 of the text, where the condition value is computed from the condition signals from the ALU and the value to be tested, which is specified in the conditional branch opcode.

The network that implements the generation of the control signals is shown in Figure 15.9. The values of the control signals for the instructions in Exercise 15.2 are shown in the following table, for each processor state.
Recall that the binary code for these instructions is:

  add = 01000000101001110100100000000000
  ldw = 10000100111010000000010010110000
  brn = 11000100000000000000001111101000

  (- = d.c.)
  Signal       add           ldw                   brn
               Fetch  Exec   Fetch  Exec   Memop   Fetch  Exec
  IR_RB        -      1      -      0      0       -      0
  WrCR         0      1      0      0      0       0      0
  WrC          0      1      0      0      1       0      0
  ALUop        1110   0001   1110   0001   0001    1110   0001
  MemRd (1)    1      0      1      0      1       1      0
  MemWr (1)    0      0      0      0      0       0      0
  MemEnable    1      0      1      0      1       1      0
  MemLength    1      0      1      0      1       1      0
  IORd         0      0      0      0      0       0      0
  IOWr         0      0      0      0      0       0      0
  IOEnable     0      0      0      0      0       0      0
  IOLength     0      0      0      0      0       0      0
  ALU_PC       1      1      1      1      0       1      1
  Sin_Sout     0      0      0      0      0       0      0
  PC_RA        0      1      0      1      1       0      0
  WrIR         1      0      1      0      0       1      0
  WrPC         1      0      1      0      0       1      condition
  Mem_ALU      -      0      -      0      0       -      0
  ZE_SE        -      0      -      1      1       -      1
  AddrA        -      00111  -      01000  01000   -      00000
  AddrB        -      01001  -      00111  00111   -      00000
  AddrC        -      00101  -      00111  00111   -      00000

(1) Both the MemRd and MemWr signals should be implemented as pulses, according to the VHDL description. The solution for this exercise simplifies this aspect, making the signals active for a complete clock cycle.

[Figure 15.9: Network used for control signal generation - Exercise 15.11 (ROM addressed by the 6-bit Opcode, producing the ROMline fields Memop, WrMem, RS_RB, IR_RB, WrC, WrPC, WrCR, ZE_SE and the 4-bit ALUop, combined with the fetch, execute and memop state signals)]

The d.c.'s in the table represent information that is based on the instruction bits, and is valid only after the fetch state. The condition included in the table (signal WrPC) is given by the test of the ALU condition signals, based on the conditional branch opcode. For example, for the branch on zero instruction (brz): condition = 1 if (Opcode = 110011 and Z = 1).
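Several rows of the table above can be regenerated by evaluating the control equations directly. The Python sketch below covers only a subset of the signals; the per-instruction ROM bits are hypothetical, inferred from the table rather than copied from Ctrl_Table in the text, and MemRd is treated as a level signal held for the whole cycle, following the simplification described in footnote 1.

```python
# Sketch: evaluate some Exercise 15.11 control equations per state.
# ROM bits are hypothetical, inferred from the table (not from Ctrl_Table).

ROM = {
    "add": {"WrC": 1, "WrCR": 1, "Memop": 0, "WrMem": 0, "WrPC": 0},
    "ldw": {"WrC": 1, "WrCR": 0, "Memop": 1, "WrMem": 0, "WrPC": 0},
    "brn": {"WrC": 0, "WrCR": 0, "Memop": 0, "WrMem": 0, "WrPC": 1},
}
OPCODE = {"add": 0b010000, "ldw": 0b100001, "brn": 0b110001}


def controls(instr: str, state: str, condition: int = 0) -> dict[str, int]:
    """Control signal values for one instruction in one processor state."""
    r = ROM[instr]
    fetch, execute, memop = (int(state == s) for s in ("fetch", "execute", "memop"))
    not_fetch = 1 - fetch
    opcode0 = OPCODE[instr] & 1  # Opcode(0)
    return {
        "WrCR": r["WrCR"] & not_fetch,
        "WrC": r["WrC"] & (((1 - r["Memop"]) & execute) | memop),
        "MemRd": fetch | ((1 - r["WrMem"]) & memop),
        "MemEnable": fetch | memop,
        "MemLength": fetch | (opcode0 & memop),
        "ALU_PC": fetch | execute,
        "PC_RA": (1 - r["WrPC"]) & not_fetch,
        "WrIR": fetch,
        "WrPC": fetch | (r["WrPC"] & condition),
    }


for state in ("fetch", "execute", "memop"):
    print("ldw", state, controls("ldw", state))
```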
Exercise 15.12

Cycle time (based on Example 15.1 in the text), with the following delays:

$t_R = 2$ ns, $t_{RF} = 4$ ns, $t_{ALU} = 4$ ns, $t_{mux} = 0.5$ ns, $t_{ZSE} = 0.5$ ns, $t_{sw} = 0.5$ ns, $t_{ctl} = 0.5$ ns, $t_{dec} = 3$ ns, $t_{mem} = 10$ ns

$t_{fetch} = t_R + t_{ctl} + t_{mux} + t_{mem} + t_{sw} = 2 + 0.5 + 0.5 + 10 + 0.5 = 13.5$ ns
$t_{execute} = t_R + t_{ctl} + t_{RF} + t_{mux} + t_{ALU} + t_{mux} + t_{RF} = 2 + 0.5 + 4 + 0.5 + 4 + 0.5 + 4 = 15.5$ ns
$t_{memop} = t_R + t_{ctl} + t_{mem} + t_{sw} + t_{mux} + t_{RF} = 2 + 0.5 + 10 + 0.5 + 0.5 + 4 = 17.5$ ns

Based on these values, the critical path is in the memop state. The clock cycle is 17.5 ns, limiting the clock frequency to about 57 MHz, the same as for the system in Example 15.1 in the text.

Exercise 15.13

Modification of the XMC architecture to have only two instruction formats, with 2 bytes (F1) and 4 bytes (F2).

(a) The formats for the instruction groups are:

  Group              Operation type       Number of bits (actual)   Format
  Unary              RT := op(RA)         16                        F1
  Binary             RT := RA op RB       21                        F2
                     RT := RA op SI       32                        F2
                     RT := RA op UI       32                        F2
  Memory             RT := M[RA + D]      32                        F2
                     M[RA + D] := RS      32                        F2
  I/O                RT := IO[PN]         27                        F2
                     IO[PN] := RS         27                        F2
  Branch             PC := PC + 4 + D     22                        F2
  Branch (indirect)  PC := RA             11                        F1
  NOP                -                    6                         F1

(b) One possible solution is to modify the opcodes so that those with msbit = 1 are used for the 2-byte instructions and those with msbit = 0 for the 4-byte instructions. The opcodes used for the load/store/I/O operations and for the unary/nop/branch-indirect operations could be exchanged, such that their opcode values would be:

  load/store = 00xxxx
  I/O = 00xxx
  unary/nop/branch-indirect = 10xxxx

The instruction format would be detected by testing the most significant bit of the opcode.

(c) With two instruction formats there is a problem with instruction alignment. The memory is byte addressed, but the processor reads words (4 bytes). Thus, an instruction may start in the middle of a word and continue into the next word.
This problem is difficult to solve; to simplify the solution of this exercise, let us consider that instructions in the F2 format are always aligned. Thus, it is the responsibility of the compiler to insert NOP instructions when necessary to create groups with an even number of F1 format instructions, this way aligning the instructions in the F2 format.

It is also the case that the processor does not need to read an instruction from memory in every fetch cycle: the instruction may be available in a word already read from memory in the previous cycle. The processor doesn't read another instruction word from memory when the present instruction is in the F1 format and bit 1 of its address is 0 (that is, the half-word instruction is aligned with the word address).

The modification in the IR register part is shown in Figure 15.10. The 16-shift register enables the data section to send to the control section the first or the second half word in the instruction register.

[Figure 15.10: New Instruction Register - Exercise 15.13 (32-bit register loaded from memory under WrIR, a 16-shift register for the most and least significant half words, and multiplexers controlled by selhalfword feeding the control section)]

Since the adjustment of the PC value now depends on the instruction format, it is not possible to update it at the end of the fetch cycle. During the execute state, it is not always possible to use the ALU for this purpose; thus, the solution is to include a dedicated adder in the data section, only to enable the addition of 2, 4, or D (the displacement in the instruction register) to the PC value during the instruction execution phase. Figure 15.11 shows the inclusion of the adder in the PC area of the data section. Observe that relieving the ALU from this task removes the need for Mux2: the path from the PC to the ALU input is not necessary. The PC may be loaded with a value coming from the register file (indirect branch); for this reason, Mux4 was used to create an alternative path to load the PC with the ALU output.
Mux5 is used to select one of the three possible values added to the PC. The selection is controlled by signal sel24D. To avoid a path through the ALU, the D value is obtained directly from the ZE_SE module.

[Figure 15.11: Modifications to the PC update network - Exercise 15.13 (Mux5, controlled by sel24D, selects 2, 4 or D from the extender module for the dedicated adder; Mux4, controlled by selPC, selects between the adder output and ALUdata for the PC register, written under WrPC)]

Exercise 15.14

If it is not possible to read the register file (RF) during the execute state in the way assumed in Example 15.1 of the text, the delay computed there is slightly modified: instruction decoding and RF reading can no longer be done in parallel, and the RF read needs to wait for the decoding phase to finish. Thus, the delay for the execute state is:

$t_{exec} = t_R + \max\{t_{ctl}, t_{dec}\} + t_{RF} + t_{mux} + t_{ALU} + t_{mux} + t_{RF} = 2 + 3 + 4 + 0.5 + 6 + 0.5 + 4 = 20$ ns

The computation of $t_{memop}$ doesn't change. The execution time gets worse in this situation, compared to the case analyzed in Example 15.1. Again, execute is the critical state: it imposes a minimum clock cycle time of 20 ns, that is, a maximum clock frequency of 50 MHz.

Exercise 15.15

Introduce a register at the ALU output. To avoid also increasing the number of cycles for the PC update, we introduce the register in the path between the ALU and Mux1, after the PC input. This way, the PC value continues to be updated in one clock cycle. Now there are two execute states:

  execute1: terminates when the ALU output (result) is ready.
  execute2: the result generated by the ALU is stored back into the register file.

(a) Cycle time:
$t_{exec1} = t_R + t_{ctl} + t_{RF} + t_{mux} + t_{ALU} = 2 + 0.5 + 4 + 0.5 + 6 = 13$ ns
$t_{exec2} = t_R + t_{mux} + t_{RF} = 2 + 0.5 + 4 = 6.5$ ns

The times to perform an instruction fetch and a memory operation are unchanged and are given in Example 15.1 as $t_{fetch} = 11.5$ ns and $t_{memop} = 15$ ns. The memory operation limits the minimum clock cycle time to 15 ns.

(b) Number of cycles per instruction type:

  Instruction                          Cycles
  Unary (one operand - register)       3
  Binary (two operands - registers)    3
  Memory                               3
  I/O                                  3
  Branch                               2
  NOP                                  2

(c) Following the same calculation done in Example 15.1, the execution time for register and memory operations is 3 × 15 = 45 ns. There was an improvement for the memory operations, but the register operations were degraded. The average execution time is 45 ns, which is worse than the case shown in Example 15.1.

Exercise 15.16

Splitting the memory operation cycle.

(a) Insert a register after the switch, before Mux1, isolating the access delay to the register file. The new memory cycles are:

$t_{memop1} = t_R + t_{ctl} + t_{mem} + t_{sw} = 2 + 0.5 + 8 + 0.5 = 11$ ns
$t_{memop2} = t_R + t_{mux} + t_{RF} = 2 + 0.5 + 4 = 6.5$ ns

For this modification, the minimum clock cycle is given by the execute1 cycle time (13 ns, computed in Exercise 15.15).

(b) Number of cycles per instruction type:

  Instruction                          Cycles
  Unary (one operand - register)       3
  Binary (two operands - registers)    3
  Memory                               4/3
  I/O                                  4/3
  Branch                               2
  NOP                                  2

where the pair x/y represents the number of cycles for reading data into a register (x) and the number of cycles for writing data to memory (y).

(c) Execution time: following the same calculation done in Example 15.1, the execution time for register and memory write operations is 3 × 13 = 39 ns, and for memory read operations it is 4 × 13 = 52 ns. There was an improvement in the register and memory write operations compared to Exercise 15.15. However, memory read operations present a worse execution time.
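The clock periods and per-instruction times of Exercises 15.15 and 15.16 follow a single rule: each state delay is a sum of component delays, the clock must accommodate the slowest state, and an instruction takes (number of cycles) × (clock period). A Python sketch of that bookkeeping follows; the component values are the ones used in these two exercises, while $t_{fetch} = 11.5$ ns and $t_{memop} = 15$ ns are taken as quoted from Example 15.1 rather than recomputed, and the helper names are illustrative.

```python
# Sketch: clock period and instruction times for Exercises 15.15 and 15.16.
# Component delays in ns, as used in these exercises (t_ALU = 6, t_mem = 8).

T = {"R": 2.0, "ctl": 0.5, "RF": 4.0, "mux": 0.5, "ALU": 6.0, "mem": 8.0, "sw": 0.5}


def delay(*components: str) -> float:
    """Delay of a state: sum of the components on its critical path."""
    return sum(T[c] for c in components)


# Exercise 15.15: a register at the ALU output splits execute in two.
ex15 = {
    "fetch": 11.5,  # quoted from Example 15.1
    "memop": 15.0,  # quoted from Example 15.1
    "execute1": delay("R", "ctl", "RF", "mux", "ALU"),
    "execute2": delay("R", "mux", "RF"),
}

# Exercise 15.16: the memory operation is split as well.
ex16 = {k: v for k, v in ex15.items() if k != "memop"}
ex16["memop1"] = delay("R", "ctl", "mem", "sw")
ex16["memop2"] = delay("R", "mux", "RF")


def clock_period(states: dict[str, float]) -> float:
    """The clock must accommodate the slowest state."""
    return max(states.values())


clk15, clk16 = clock_period(ex15), clock_period(ex16)
print("15.15:", clk15, "ns clock; 3-cycle instruction:", 3 * clk15, "ns")
print("15.16:", clk16, "ns clock; 3 cycles:", 3 * clk16, "ns; 4 cycles:", 4 * clk16, "ns")
```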
Keeping the same proportion of memory and register operations as in the program mix used in Example 15.1, and considering that half of the memory operations are reads (the weight 0.1 below corresponds to the memory reads), we obtain an average instruction execution time of:

$0.9 \times 39 + 0.1 \times 52 = 40.3$ ns

Thus, the average instruction execution time for this alternative is worse than the one presented in Example 15.1, and better than the alternative analyzed in Exercise 15.15.

Exercise 15.17

Using instruction and data caches. The access time to the caches depends on the availability of the requested data; for this exercise, we assume that the data being accessed is in the cache. The time to read an instruction from the instruction cache is $t_{im} = 6$ ns, and the time to read data from the data cache is $t_{dm} = 8$ ns. Since only the time to fetch an instruction is affected (the access time for data stays the same as $t_{mem}$), we get:

$t_{fetch} = t_R + t_{ctl} + t_{mux} + t_{im} + t_{sw} = 2 + 0.5 + 0.5 + 6 + 0.5 = 9.5$ ns

Thus, although this alternative reduces the instruction fetch time, it does not improve the clock cycle time, since the clock period is determined by the memop state time (provided in Example 15.1 and Exercise 15.15) and by the execute state time (evaluated in Exercise 15.16).

...