CSE721 HW2
S M Faisal
January 26, 2011

1 Answer to question number 1 (Problem 4.3)

(a) The standard ring algorithm with $p$ processors runs in $p - 1$ steps, where in each step every processor sends a message to its neighbor in a fixed direction (clockwise or anticlockwise). Each step takes time $(t_s + m t_w)$, where $m$ is the size of the message. So,

$$T_{\text{standard ring}} = (t_s + m t_w)(p - 1)$$

(b) The hypercube algorithm runs in $\log p$ steps, with each processor exchanging messages of increasing size. For a ring of 8 processors numbered 0..7, the communication pattern is as follows:

Step   Message Size   Between                        Path Length
1      m              <0,1>, <2,3>, <4,5>, <6,7>     1
2      2m             <0,2>, <1,3>, <4,6>, <5,7>     2
3      4m             <0,4>, <1,5>, <2,6>, <3,7>     4

Assuming a store-and-forward (S&F) network, there is a useful symmetry in the way these messages are sent: within each step, all messages have the same size and the same path length. Hence, for instance, in step 3 each individual link is used by only one node at a time for sending a message of size $4m$, so there are never simultaneous messages on the same link from two different nodes.

Counting steps from $i = 0$, step $i$ sends a message of size $2^i m$ over a path of length $2^i$, which costs $t_s + 2^i \cdot 2^i m t_w = t_s + 4^i m t_w$. Summing over all steps, we get

$$T_{\text{hypercube}} = \sum_{i=0}^{\log p - 1} (t_s + 4^i m t_w) = t_s \log p + \sum_{i=0}^{\log p - 1} 4^i m t_w = t_s \log p + \frac{m t_w}{3}(p^2 - 1)$$

Given $t_s = 100 \times t_w$:

(1) If the message size $m$ is very large, model (a) is better, because there $m$ is multiplied by $(p - 1)$, whereas in (b) it is multiplied by a $p^2$ term. Hence, as long as $p > 2$, (a) is cheaper.

(2) If $m$ is very small ($m = 1$), assuming $t_s$ is significant and $p$ is moderate, model (b) may be better. But we need to remember that model (b) is still asymptotically $\Theta(p^2)$, so it can become costly even when $m$ is very small.

2 Answer to question number 2 (Problem 4.5)

Given the conditions, we can devise a $\log p$-step algorithm (where $p$ is the number of processors) that satisfies the given time bound.
The idea is that at each step we split each part from the previous step in half, and corresponding processors of the two halves exchange messages with each other. In the first step, we divide the $p$ processors at the root of the tree, so each part contains $p/2$ processors, each holding a message of size $m$. These $p/2$ processors exchange messages in the first step. In the second step we divide each part into halves, yielding $p/4$ processors in each half but with messages of size $2m$, and so forth. The steps are shown below for an example balanced binary tree with $p = 8$.

Figure 1: All-to-all broadcast pattern on a tree

Here black lines are communication in step 1, blue in step 2, and red in step 3.

In step 1, a total of $p/2$ processors exchange messages of size $m$ with the other $p/2$ processors, all sharing the link through the root. Hence, according to the given cost model, the time is:

Step 1: $t_s + m t_w \frac{p}{2}$
Step 2: $t_s + 2 m t_w \frac{p}{2^2}$
....
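The halving pattern and the per-step costs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the leaves are taken to be numbered 0..p-1 from left to right, $p$ is a power of two, and the cost of step $k$ is extrapolated from the two steps listed above as $t_s + 2^{k-1} m \, t_w \, p/2^k$. The function names `exchange_pairs` and `step_cost` are hypothetical and are not part of the original solution.

```python
def exchange_pairs(p, step):
    """Pairs (i, j) of leaves that exchange messages in a 1-based step.

    Assumption: p is a power of two and leaves are numbered 0..p-1.
    In step k the leaves form blocks of size p / 2**(k-1); each block is
    split in half and leaf i of the lower half pairs with leaf i + half.
    """
    block = p >> (step - 1)
    half = block // 2
    return [(start + off, start + off + half)
            for start in range(0, p, block)
            for off in range(half)]


def step_cost(p, step, m, t_s, t_w):
    """Assumed cost of one step, extrapolated from steps 1 and 2 in the text.

    The message size doubles every step (m, 2m, ...), while the number of
    messages sharing the top link of each part halves (p/2, p/4, ...).
    """
    msg_size = m * 2 ** (step - 1)
    sharing = p // 2 ** step
    return t_s + msg_size * t_w * sharing


if __name__ == "__main__":
    p, m, t_s, t_w = 8, 1, 100, 1      # illustrative values only
    steps = p.bit_length() - 1         # log2(p) for a power of two
    for k in range(1, steps + 1):
        print(k, exchange_pairs(p, k), step_cost(p, k, m, t_s, t_w))
```

Under these assumptions the doubling of the message size and the halving of the sharing factor cancel, so every step evaluates to the same value, $t_s + m t_w \, p/2$.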
This note was uploaded on 03/08/2012 for the course CSE 721 taught by Professor Saday during the Winter '11 term at Ohio State.
