MidtermSpring2011Solution

MidtermSpring2011Solution - EEL 4930(5934 Reconfigurable...


EEL 4930/5934 Reconfigurable Computing, Midterm Exam, Spring Semester 2011

1. Scalable Systolic Array paper (14 pts.)

The figure on the right is reproduced from the Scalable Systolic Array paper. It shows a scoring matrix for the Needleman-Wunsch algorithm in which the query sequence of length "m" on top (ATA...A in this example) is matched against a sequence database of length "n" on the left (AGG...C in this example) based on a dynamic programming method. Recall that, in hardware, a systolic array architecture was used in which one processing element (PE) is used to process one column of the scoring matrix, using a wavefront approach.

[Figure: scoring matrix S(i,j) defined as a max over S(i-1,j-1) + Sub(i,j) and gap-penalty terms built from S(i-1,j), S(i,j-1), H, V, og, and eg, for 1 <= i <= imax, 1 <= j <= jmax. Legible boundary conditions: S(0,i) = S(i,0) = og + i*eg; H(0,i) = V(i,0) = og; Sub(i,j) = penalty matrix.]

(a) [question text and handwritten answer illegible in this preview]

(b) How many PEs are used? (2 pts)
    [handwritten answer illegible in this preview]

(c) At the first clock cycle, which cell(s) of the array is (are) processed?
    At the second clock cycle, which cell(s) of the array is (are) processed?
    At the third clock cycle, which cell(s) of the array is (are) processed?
    At which clock cycle will all PEs be processing?
    (4 pts for (c))
    [handwritten answers illegible in this preview]

DIMEtalk questions:

(d) In terms of the C code, what is the difference between a DIMEtalk BRAM and a BRAM produced by hand-written VHDL (or CoreGen)? (3 pts)
    DIMEtalk BRAM: [handwritten answer illegible in this preview]
    VHDL BRAM: [handwritten answer illegible in this preview]

(e) Briefly compare (i.e., similarity and difference) a DIMEtalk BRAM and a memory map. (3 pts)
    [handwritten answer illegible in this preview]

2. Smart Buffer

Design a smart buffer to implement the following algorithm (pseudo-code).
The datapath will unroll 4 loops at a time, thus requiring a(i), a(i-1), ..., a(i-7) for every clock cycle to generate y(i), y(i+1), ..., y(i+3).

    for (i=4; i < MAX; i++) {
        y[i] = (a[i] + a[i-1]) + (a[i-2] + a[i-3] + a[i-4]);
    }

Assumptions:
- Input memory bandwidth is 32 bits.
- Data items are 8 bits.
- However, bandwidth into the datapath needs to be 64 bits (enough for a(i) ... a(i-7)).
- You don't have to worry about the output memory bandwidth.

[Figure: Input BRAM (addr, 32-bit data) feeding the Smart Buffer, which feeds the Datapath]

(a) Fill in the BRAM appropriately with a(0), a(1), ..., a(19).
    [handwritten answer illegible in this preview]

(b) Specify the contents of the smart buffer after each of the following clock cycles: after clock cycle 1, after clock cycle 2, after clock cycle 3, [one further cycle label illegible].
    [handwritten answer illegible in this preview]

3. Systolic Architecture (20 pts.)

(a) Given the following algorithm in pseudo-code, draw one iteration of the datapath that is fully pipelined.
(8 pts)

    for (i=0; i < 10000; i++) {
        if (a[i] < a[i+1])
            z[i] = avg(a[i], a[i+1], a[i+2], a[i+3]);
        else
            z[i] = (a[i] + a[i+1]) * (a[i+2] + a[i+3]);
    }

    [handwritten datapath drawing not recoverable from this preview]

(b) Assume the input data items are 8 bits and input memory bandwidth is 64 bits; output data items are 16 bits and output memory bandwidth is 48 bits; all operators (+, *, /) have the same latency. What is the maximum number of loop unrollings? (For credit, please show work.) (2 pts)
    [handwritten answer illegible in this preview]

(c) With the above assumptions, calculate the speedup of the fully pipelined, maximally unrolled circuit (assuming the FPGA clock rate of 200 MHz) as compared to the corresponding software executing on a microprocessor (assume 25 instructions for each iteration, a CPI of 2, and a clock frequency of [value illegible in this preview]). (4 pts)
    [handwritten answer illegible in this preview]

(d) Assume the division (avg) op requires 15 clock cycles but is fully pipelined; calculate the new speedup. (3 pts)
    [handwritten answer illegible in this preview]

(e) Assume the division (avg) op requires 15 clock cycles but is not pipelined; calculate the new speedup. (3 pts)
    [handwritten answer illegible in this preview]

Figure 1 (to be used for Problems 4 and 5): Given below is a block diagram of a DIMEtalk memory map node and how it is interfaced to the Glue Logic.

[Figure 1, block diagram. Recoverable labels: "Specified in VHDL", "Memory map assignments", Address Generator, to inRAM]
[Figure 1, recoverable labels: Memory Map (to/from PCIX bus), Glue Logic (to/from TopLevel VHDL module), Datapath, Addr Generator, outRAM. Signals: reset, dt_clk, en, wen, size, done, data_in[31..0], data_out[31..0], inRAM_waddr, inRAM_wdata, inRAM_wen]

Assumptions:
- DIMEtalk BRAM node id (to be used in the C code) for outRAM = 2
- Memory map node id = 3

4. C Code (16 pts.)

Based on the block diagram and information shown in Figure 1 (page 4), complete the following C code as specified.

    /* Remarks: (same remarks from DIMEtalk tutorial in Lab 2)
       FPGA_write and FPGA_read take a DWORD pointer. A DWORD on delta is
       64 bits. However, for some unknown reason, the API only transfers
       32 bits instead of 64. Therefore, the easiest solution is to use
       integer pointers (which are 32 bits on this machine) and then cast
       the integer pointer to a DWORD pointer. NORMALLY, THIS WOULD BE A
       VERY BAD IDEA, but it is a useful workaround for the API. You can
       alternatively leave out the cast, but doing so will result in a
       bunch of warnings.

       Parameters:
         dat1:  pointer to the data to write to the FPGA
         N:     the number of 32 bit words to transfer
         0:     the starting address to write to
         BRAM1: the id of the node in the FPGA you are writing to
         1000:  timeout in milliseconds

       Example: FPGA_write( (DWORD*) dat1, N, 0, BRAM1, 1000 );
    */
    unsigned int *go;
    unsigned int *size;    /* Assume appropriate malloc has been     */
    unsigned int *done;    /* performed for each of these variables */
    unsigned int *input;   /* 16 elements */
    unsigned int *output;  /* 16 elements */
    unsigned int i;

    // pack each input array word with four 8-bit values (similar to Lab 4)
    *N = 15;
    for (i=0; i < *size; i++) {
        input[i] = ((i*4) & 0xff) << 24 | ((i*4+1) & 0xff) << 16 |
                   ((i*4+2) & 0xff) << 8 | ((i*4+3) & 0xff);
    }
    // transfer size and input array
    // Fill in the blanks below. [handwritten values illegible in this preview]
    FPGA_write((DWORD*) size, ____, ____, ____, 1000);
    FPGA_write((DWORD*) input, ____, ____, ____, 1000);
    /* Note this FPGA_write above is through the memory map, not a DIMEtalk BRAM */

    *go = 1;
    FPGA_write((DWORD*) go, ____, ____, ____, 1000);

    // write a while loop to wait for completion, using variable "done"

    // read the results
    FPGA_read((DWORD*) output, ____, ____, ____, 1000);
    /* Note this FPGA_read is through a DIMEtalk BRAM */

Points for the five blanks, in order: (3 pts.) (3 pts.) (3 pts.) (4 pts.) (3 pts.)

5. Assume that the PORT statement has been defined and use the signal names and assumptions shown in Figure 1 (page 4). Complete the following VHDL code (a part of the Glue Logic module) that handles the "go", "size", "done", and the inRAM signals (i.e., inRAM_waddr, inRAM_wdata, inRAM_wen).

    ARCHITECTURE Behavioral OF GlueLogic IS
        SIGNAL ____  -- [handwritten declarations illegible in this preview]
    BEGIN
        PROCESS (____)  -- [handwritten sensitivity list illegible]
        BEGIN
            IF (reset = '1') THEN
                -- [handwritten resets illegible]
                size <= (OTHERS => '0');
                inRAM_wen <= '0';
                inRAM_waddr <= (OTHERS => '0');
                inRAM_wdata <= (OTHERS => '0');
            ELSIF (dt_clk'event AND dt_clk = '1') THEN
                IF (en = '1' AND wen = '0') THEN
                    -- read "done" (3 pts)
                    -- [handwritten answer illegible]
                ELSIF (en = '1' AND wen = '1') THEN
                    -- write to input memory if address is in appropriate range (6 pts)
                    IF (unsigned(addr) >= 0 AND unsigned(addr) <= 15) THEN
                        -- [handwritten answer illegible]
                    -- write to "go" and "size" if appropriate (6 pts)
                    -- [handwritten answer illegible]
                    END IF;
                END IF;
            END IF;
        END PROCESS;
    END Behavioral;

19 pts.

6. Analysis of a testbench

Given below is a simplified version of the Lab 4 testbench.
Analyze it and provide an explanation for the indicated sections of the code.

    entity tb is
    end tb;

    architecture behavior of tb is
        constant TEST_WIDTH : positive := 32;
        constant TEST_SIZE  : integer  := 256;
        constant MAX_CYCLES : integer  := TEST_SIZE*4;

        signal clk  : std_logic := '0';
        signal rst  : std_logic := '1';
        signal addr : std_logic_vector(TEST_WIDTH-1 downto 0) := (others => '0');
        signal en   : std_logic := '0';
        signal wen  : std_logic := '0';
        signal din  : std_logic_vector(TEST_WIDTH-1 downto 0) := (others => '0');
        signal dout : std_logic_vector(TEST_WIDTH-1 downto 0);
    begin
        -- Explain the function of this statement.
        -- [handwritten answer illegible; port-map actuals not recoverable]
        UUT : entity work.pipeline_h101
            port map (
                dt_clk => ____,
                reset  => ____,
                addr   => ____,
                en     => ____,
                wen    => ____,
                din    => ____,
                dout   => ____);

        -- Draw the corresponding waveform and accurately label it. (2 pts)
        clk <= not clk after 5 ns;

        process
            function check (i : integer) return integer is
            begin
                return ((i*4) mod 256)*((i*4+1) mod 256) +
                       ((i*4+2) mod 256)*((i*4+3) mod 256);
            end check;

            variable result : std_logic_vector(TEST_WIDTH-1 downto 0);
            variable done   : std_logic;
        begin
            -- Draw the corresponding waveforms accurately. You can use
            -- notation, but label how many clock cycles. (3 pts)
            rst <= '1';
            wait for 200 ns;
            rst <= '0';
            wait until clk'event and clk = '1';
            wait until clk'event and clk = '1';

            for i in 0 to TEST_SIZE-1 loop
                addr <= std_logic_vector(unsigned(C_MEM_IN_START_ADDR)+i);
                en   <= '1';
                wen  <= '1';
                -- Explain the function of these statements. (2 pts)
                -- [handwritten answer illegible]
                din <= std_logic_vector(to_unsigned((i*4) mod 256, 8) &
                                        to_unsigned((i*4+1) mod 256, 8) &
                                        to_unsigned((i*4+2) mod 256, 8) &
                                        to_unsigned((i*4+3) mod 256, 8));
                wait until clk'event and clk = '1';
            end loop;  -- i

            addr <= C_SIZE_ADDR;
            en   <= '1';
            wen  <= '1';
            din  <= std_logic_vector(to_unsigned(TEST_SIZE, 32));
            wait until clk'event and clk = '1';

            addr <= C_GO_ADDR;
            en   <= '1';
            wen  <= '1';
            din  <= std_logic_vector(to_unsigned(1, 32));
            wait until clk'event and clk = '1';

            done := '0';
            while done = '0' loop
                addr <= C_DONE_ADDR;
                en   <= '1';
                wen  <= '0';
                wait until clk'event and clk = '1';
                -- give entity one cycle to respond
                wait until clk'event and clk = '1';
                done := dout(0);
            end loop;

            for i in 0 to TEST_SIZE-1 loop
                addr <= std_logic_vector(unsigned(C_MEM_OUT_START_ADDR)+i);
                en   <= '1';
                wen  <= '0';
                wait until clk'event and clk = '1';
                -- give entity one cycle to respond
                wait until clk'event and clk = '1';
                result := dout;
                if (unsigned(result) /= check(i)) then
                    errors := errors + 1;
                    report "Result for " & integer'image(i) &
                           " is incorrect. The output is " &
                           integer'image(to_integer(unsigned(result))) &
                           " but should be " & integer'image(check(i));
                end if;
            end loop;  -- i

            report "SIMULATION FINISHED!!!";
            wait;
        end process;
    end;

VHDL syntax templates:

    ENTITY __entity_name IS
        PORT(__input_name, __input_name   : IN    STD_LOGIC;
             __input_vector_name          : IN    STD_LOGIC_VECTOR(__high downto __low);
             __bidir_name, __bidir_name   : INOUT STD_LOGIC;
             __output_name, __output_name : OUT   STD_LOGIC);
    END __entity_name;

    ARCHITECTURE a OF __entity_name IS
        SIGNAL __signal_name : STD_LOGIC;
        SIGNAL __signal_name : STD_LOGIC;
    BEGIN
        -- Process Statement
        -- Concurrent Signal Assignment
        -- Conditional Signal Assignment
        -- Selected Signal Assignment
        -- Component Instantiation Statement
    END a;

    __instance_name: __component_name
        PORT MAP (__component_port => __connect_port,
                  __component_port => __connect_port);

    WITH __expression SELECT
        __signal <= __expression WHEN __constant_value,
                    __expression WHEN __constant_value,
                    __expression WHEN __constant_value,
                    __expression WHEN __constant_value;

    __signal <= __expression WHEN __boolean_expression ELSE
                __expression WHEN __boolean_expression ELSE
                __expression;

    IF __expression THEN
        __statement;
        __statement;
    ELSIF __expression THEN
        __statement;
        __statement;
    ELSE
        __statement;
        __statement;
    END IF;

    <generate_label>: FOR <loop_id> IN <range> GENERATE
        -- Concurrent Statement(s)
    END GENERATE;

    CASE __expression IS
        WHEN __constant_value =>
            __statement;
            __statement;
        WHEN __constant_value =>
            __statement;
            __statement;
        WHEN OTHERS =>
            __statement;
            __statement;
    END CASE;

    WAIT UNTIL __expression;
    ...
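The input pattern shared by Problems 4 and 6 can be cross-checked in plain C. The sketch below mirrors the packing loop from the Problem 4 code and the check function from the Problem 6 testbench. The function names pack_word, expected_output, and self_check are hypothetical, and the idea that the expected result is the product of the two upper bytes plus the product of the two lower bytes is inferred from check, not stated elsewhere in the exam.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the four 8-bit values 4i, 4i+1, 4i+2, 4i+3
   (each taken mod 256) into one 32-bit word, most significant byte first.
   This mirrors the packing loop in the Problem 4 C code. */
uint32_t pack_word(uint32_t i)
{
    return (((i * 4u)      & 0xffu) << 24) |
           (((i * 4u + 1u) & 0xffu) << 16) |
           (((i * 4u + 2u) & 0xffu) << 8)  |
            ((i * 4u + 3u) & 0xffu);
}

/* Hypothetical helper: value the Problem 6 testbench's check(i) expects
   for a packed word, computed as byte3*byte2 + byte1*byte0. */
uint32_t expected_output(uint32_t word)
{
    uint32_t b3 = (word >> 24) & 0xffu;
    uint32_t b2 = (word >> 16) & 0xffu;
    uint32_t b1 = (word >> 8)  & 0xffu;
    uint32_t b0 =  word        & 0xffu;
    return b3 * b2 + b1 * b0;
}

/* Spot-check the first two words by hand: bytes 0,1,2,3 and 4,5,6,7. */
void self_check(void)
{
    assert(pack_word(0) == 0x00010203u);
    assert(expected_output(pack_word(0)) == 6u);   /* 0*1 + 2*3 */
    assert(pack_word(1) == 0x04050607u);
    assert(expected_output(pack_word(1)) == 62u);  /* 4*5 + 6*7 */
}
```

Because each byte is reduced mod 256, the pattern repeats every 64 words, which is why the testbench's 256-word run sees the same four-byte groups four times.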