MP = $181k + $20k * P (cost of a multiprocessor with P processors; the uniprocessor baseline is $138k)
What speedup is needed for constant performance/cost?
• 8 proc = $341k; Cost(8/1) = $341k/$138k ≈ 2.5 (cost increase)
• 16 proc = $501k; Cost(16/1) = $501k/$138k ≈ 3.6
• Cost(16/8) = $501k/$341k ≈ 1.47 (≈ 1.44 using the rounded ratios 3.6/2.5): doubling from 8 to 16 processors needs only about 47% more speedup (< 100%)
• ONLY LESS THAN LINEAR SPEEDUP IS NEEDED
• Even if the MP needs some extra memory, the required speedup is still not linear

Fallacy: Scalability is almost free
• "Build scalability into a multiprocessor and then simply offer the multiprocessor at any point on the scale from a small number of processors to a large number"
• Cray T3E (uses the Alpha microprocessor) scales to 2048 CPUs vs. a 4-CPU Alpha server
  – At 128 CPUs, the T3E delivers a peak bisection BW of 38.4 GB/s, or (38400 MB/s)/128 CPUs = 300 MB/s per CPU
  – Compaq AlphaServer ES40 supports up to 4 CPUs and has 5.6 GB/s of interconnect BW, or 1400 MB/s per CPU
  – So the scalable machine offers less than a quarter of the small server's per-CPU bandwidth
• Building apps that scale requires significantly more attention to:
  – load balance
  – locality
  – potential contention
  – serial (or partly parallel) portions of the program (10X is very hard)

Pitfall: Not developing SW to take advantage of (or optimize for) multiprocessor architecture...
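
As a check on the arithmetic in the two slides above, the following Python sketch recomputes the cost ratios and the per-CPU bandwidth figures. It assumes the $138k uniprocessor baseline implied by the slide's divisions and takes 1 GB = 1000 MB, as the slide does; the function and constant names are illustrative only.

```python
# Sketch: re-derive the slide numbers (all costs in $k).
# Assumption: uniprocessor baseline = $138k, inferred from the slide's ratios.
UNI_COST_K = 138      # uniprocessor cost, $k
MP_BASE_K = 181       # fixed multiprocessor cost, $k
MP_PER_PROC_K = 20    # incremental cost per processor, $k


def mp_cost(p: int) -> int:
    """Cost in $k of a multiprocessor with p processors: $181k + $20k * p."""
    return MP_BASE_K + MP_PER_PROC_K * p


def cost_ratio(p: int) -> float:
    """Cost of a p-processor MP relative to the uniprocessor.

    To keep performance/cost constant, the MP needs at least this speedup.
    """
    return mp_cost(p) / UNI_COST_K


def per_cpu_bw_mb_s(total_gb_s: float, cpus: int) -> float:
    """Split an aggregate bandwidth in GB/s evenly across CPUs, in MB/s (1 GB = 1000 MB)."""
    return total_gb_s * 1000 / cpus


if __name__ == "__main__":
    for p in (8, 16):
        print(f"{p:2d} proc: ${mp_cost(p)}k, needs speedup >= {cost_ratio(p):.2f}")
    print(f"16 vs 8 proc cost ratio: {mp_cost(16) / mp_cost(8):.2f}")
    print(f"T3E at 128 CPUs: {per_cpu_bw_mb_s(38.4, 128):.0f} MB/s per CPU")
    print(f"ES40 at 4 CPUs: {per_cpu_bw_mb_s(5.6, 4):.0f} MB/s per CPU")
```

Running it gives cost ratios of 2.47 and 3.63 (8 and 16 processors), a 16-vs-8 ratio of 1.47, and 300 vs. 1400 MB/s per CPU, matching the slides up to rounding.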

This document was uploaded on 02/09/2014.

Ask a homework question - tutors are online