chapter3-m2-ziavras

S. Ziavras: Loop Processing/Unrolling


...                ...r. using result    cycles
FP ALU op          FP ALU op             3
FP ALU op          Store double          2
Load double        FP ALU op             1
Load double        Store double          0

S. Ziavras

Loop Processing/Unrolling

Example: Adding a scalar to a vector

    for (i=1000; i>0; i=i-1)
        x[i] = x[i] + s;

Straightforward MIPS code implementation

    Loop: L.D    F0,0(R1)      ;F0=array element
          ADD.D  F4,F0,F2      ;add scalar in F2
          S.D    F4,0(R1)      ;store result
          DADDUI R1,R1,#-8     ;decr. pointer (DW)
          BNE    R1,R2,Loop    ;branch R1!=R2

Initial assumptions: R1=(highest addr. of array elem.); R2=(precomputed: 8(R2) is last elem.); F2=scalar

Loop Processing/Unrolling (2)

Without any scheduling (simple MIPS pipeline)

                                 Clock cycle
    Loop: L.D    F0,0(R1)        1
          stall                  2
          ADD.D  F4,F0,F2        3
          stall                  4
          stall                  5
          S.D    F4,0(R1)        6
          DADDUI R1,R1,#-8       7
          stall                  8
          BNE    R1,R2,Loop      9
          stall                  10

10 clock cycles per iteration

Loop Processing/Unrolling (3)

With loop scheduling

    Loop: L.D    F0,0(R1)
          DADDUI R1,R1,#-8
          ADD.D  F4,F0,F2
          stall
          BNE    R1,R2,Loop    ;delayed branch
          S.D    F4,8(R1)      ;altered & interchanged with DADDUI

Compil...
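The effect of scheduling can be checked with a small Python sketch. It is not part of the slides. The slot lists are transcribed from the two MIPS listings, and `run_loop` is a hypothetical functional model (memory as a dictionary from byte address to value) used only to confirm that changing the store offset to `8(R1)` keeps the result the same once DADDUI is moved above the store. The 4-element array and its addresses are assumed test values.

```python
# Issue slots for one iteration, transcribed from the two listings.
UNSCHEDULED = [
    "L.D F0,0(R1)", "stall",
    "ADD.D F4,F0,F2", "stall", "stall",
    "S.D F4,0(R1)",
    "DADDUI R1,R1,#-8", "stall",
    "BNE R1,R2,Loop", "stall",
]
SCHEDULED = [
    "L.D F0,0(R1)",
    "DADDUI R1,R1,#-8",
    "ADD.D F4,F0,F2", "stall",
    "BNE R1,R2,Loop",
    "S.D F4,8(R1)",
]


def cycles_and_stalls(slots):
    """Return (clock cycles per iteration, stall cycles) for a loop body."""
    return len(slots), slots.count("stall")


def run_loop(mem, r1, r2, s, scheduled):
    """Hypothetical functional model of the loop; returns the updated memory.

    mem maps byte address -> value, r1 starts at the highest element address,
    and r2 is 8 bytes below the last element to process (so 8(R2) is last).
    """
    mem = dict(mem)  # work on a copy
    while True:
        f0 = mem[r1]              # L.D   F0,0(R1)
        if scheduled:
            r1 -= 8               # DADDUI moved above the store
            f4 = f0 + s           # ADD.D F4,F0,F2
            taken = r1 != r2      # BNE   R1,R2,Loop
            mem[r1 + 8] = f4      # S.D   F4,8(R1) in the branch delay slot
        else:
            f4 = f0 + s           # ADD.D F4,F0,F2
            mem[r1] = f4          # S.D   F4,0(R1)
            r1 -= 8               # DADDUI R1,R1,#-8
            taken = r1 != r2      # BNE   R1,R2,Loop
        if not taken:
            return mem


if __name__ == "__main__":
    for name, slots in (("unscheduled", UNSCHEDULED), ("scheduled", SCHEDULED)):
        cycles, stalls = cycles_and_stalls(slots)
        # 1000 iterations as in the example loop; pipeline fill is ignored.
        print(name, cycles, "cycles/iteration,", stalls, "stalls,",
              cycles * 1000, "cycles total")

    # Assumed test data: 4 elements at byte addresses 8, 16, 24, 32.
    memory = {8: 1.0, 16: 2.0, 24: 3.0, 32: 4.0}
    a = run_loop(memory, r1=32, r2=0, s=0.5, scheduled=False)
    b = run_loop(memory, r1=32, r2=0, s=0.5, scheduled=True)
    print("same result:", a == b)
```

Counting slots this way only restates the listings. It does not derive the stalls from the latency table, so any change to the instruction order has to be re-timed by hand.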