elements that
are zero
(3) Reduction tree:
• In parallel, have each odd-numbered processor Px send its
count to its neighbor P(x-1), and have the neighbor compute a
subtotal.
• In parallel, have P2 send its count to P0 and P6 to P4; have P0
and P4 compute subtotals.
• Have P4 send its count to P0, and have P0 compute the final
total.
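The three reduction steps above can be sketched as a small simulation. This is only an illustration, not part of the original solution: the function name tree_reduce and the sample counts are made up, and the sketch assumes the number of processors is a power of two (8 in this problem).

```python
def tree_reduce(counts):
    """Simulate the reduction tree on per-processor counts.

    Returns (total, steps), where steps[i] lists the (sender, receiver)
    pairs that communicate in parallel during step i.
    Assumes len(counts) is a power of two.
    """
    totals = list(counts)      # totals[x] is the subtotal held by Px
    p = len(totals)
    steps = []
    stride = 1
    while stride < p:
        pairs = []
        # Each processor at an odd multiple of `stride` sends its subtotal
        # to the processor `stride` positions below it.
        for sender in range(stride, p, 2 * stride):
            receiver = sender - stride
            totals[receiver] += totals[sender]
            pairs.append((sender, receiver))
        steps.append(pairs)
        stride *= 2
    return totals[0], steps


# Hypothetical per-processor zero counts for P0..P7.
total, steps = tree_reduce([3, 0, 2, 5, 1, 1, 0, 4])
print(total)       # 16
for pairs in steps:
    print(pairs)   # step 1: P1->P0, P3->P2, P5->P4, P7->P6; then P2->P0, P6->P4; then P4->P0
```

With 8 processors the loop runs three times, matching the three bullets: the final total ends up in P0.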
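(b) How much time will each step take?
(1) We need to send N/8 elements from P0 to each of the other 7 processors. It
takes X cycles to send up to A elements, so each processor's share needs
ceil(N/(8A)) transfers, and P0 performs the 7 sends one after another:
7 * X * ceil(N/(8A))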
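As a quick check of the distribution formula, here is a minimal sketch that evaluates it for sample values. The function name distribute_cycles and the numbers used are hypothetical; the sketch assumes N is split evenly across 8 processors and that P0 sends to the other 7 one at a time.

```python
import math


def distribute_cycles(n, a, x, processors=8):
    """Cycles for P0 to send n/processors elements to every other processor.

    Each transfer moves up to `a` elements and costs `x` cycles.
    """
    transfers_per_processor = math.ceil(n / (processors * a))
    return (processors - 1) * x * transfers_per_processor


# Hypothetical values: N = 1000 elements, A = 16 elements per transfer, X = 10 cycles.
# ceil(1000 / 128) = 8 transfers per processor, so 7 * 10 * 8 = 560 cycles.
print(distribute_cycles(1000, 16, 10))  # 560
```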
(2) Each processor must compare each of its N/8 elements to zero. As
stated in the problem, we do not need to worry about the time to
increment the counter. All processors do this step in parallel, so we
only need the time for a single processor to do this step:
Y * (N/8)
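The counting step can be illustrated with a short sketch that splits an array into 8 equal chunks and counts the zeros in each, alongside the Y * (N/8) cost. The names local_zero_counts and compare_cycles and the sample data are made up for illustration, and the sketch assumes N is a multiple of the processor count.

```python
def local_zero_counts(data, processors=8):
    """Return the number of zeros in each processor's chunk of `data`.

    Assumes len(data) is a multiple of `processors`.
    """
    chunk = len(data) // processors
    return [
        sum(1 for v in data[p * chunk:(p + 1) * chunk] if v == 0)
        for p in range(processors)
    ]


def compare_cycles(n, y, processors=8):
    """Cycles for one processor to compare its n/processors elements to zero."""
    return y * (n // processors)


# Hypothetical 16-element array, so each of the 8 processors gets 2 elements.
data = [0, 1, 0, 2, 3, 0, 0, 0, 5, 5, 0, 7, 8, 9, 0, 0]
print(local_zero_counts(data))   # [1, 1, 1, 2, 0, 1, 0, 2]
print(compare_cycles(16, 3))     # 6
```

Because all processors work in parallel, only the single-processor cost appears in the time estimate, even though every chunk is scanned.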
(3) For the reduction tree, the amount of time to compute a subtotal is
not specified; full credit was given for either Y or zero cycles. In any
event, each of the 3 steps of the reduction tree requires one value to
be communicated, which takes X cycles (= X * ceil(1/A)), so the answer
is either 3X or 3(X+Y), depending on your assumptions.
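Both accepted answers can be reproduced with a small sketch. The function name reduction_cycles and the charge_add flag are hypothetical; the flag selects between the two assumptions (a subtotal costs Y cycles or zero cycles), and the processor count is assumed to be a power of two.

```python
import math


def reduction_cycles(x, y, a=1, processors=8, charge_add=False):
    """Cycles for the reduction tree under either assumption.

    Each level sends one value (x * ceil(1/a) cycles) and, if charge_add
    is True, also spends y cycles computing the subtotal.
    """
    levels = processors.bit_length() - 1   # log2(processors) for a power of two
    per_level = x * math.ceil(1 / a) + (y if charge_add else 0)
    return levels * per_level


# Hypothetical values: X = 10 cycles, Y = 2 cycles.
print(reduction_cycles(10, 2))                   # 30, i.e. 3X
print(reduction_cycles(10, 2, charge_add=True))  # 36, i.e. 3(X+Y)
```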
[Figure: cache state transition diagram from Section 9.3, "Multiprocessors Connected by a Single Bus". States: Invalid (not valid cache block), Shared (clean), Modified (dirty). Transition labels: processor read miss; processor write miss; processor miss (write dirty block to memory); processor write hit (send invalidate); processor read hit; processor read hit or write hit. Caption: "a. Cache state..."]
This note was uploaded on 02/08/2014 for the course CS 351 taught by Dr. Suzanne Rivoire during the Fall '13 term at Sonoma.