cse8803-pna-sp08-13 - Interactions between parallelism and...

Interactions between parallelism and numerical stability, accuracy
Prof. Richard Vuduc, Georgia Institute of Technology
CSE/CS 8803 PNA: Parallel Numerical Algorithms [L.13]
Tuesday, February 19, 2008
Source: Dubey, et al., of Intel (2005)
Problem: Seamless image cloning. (Source: Pérez, et al., SIGGRAPH 2003)
… then reconstruct. (Source: Pérez, et al., SIGGRAPH 2003)
Review: Multigrid
Exploiting structure to obtain fast algorithms for 2-D Poisson
Dense LU: assume no structure. O(n^6)
Sparse LU: sparsity. O(n^3), needs extra memory, hard to parallelize
CG: symmetric positive definite. O(n^3), a little extra memory
RB SOR: fixed sparsity pattern. O(n^3), no extra memory, easy to parallelize
FFT: eigendecomposition. O(n^2 log n)
Multigrid: eigendecomposition. O(n^2) [Optimal!]
Problem: Slow convergence
[Figure panels: RHS; True solution; Best possible 5-step; 5 steps of Jacobi]
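Error “frequencies”
\begin{align*}
\epsilon^{(t)} &= R_w^t \, \epsilon^{(0)} = \left(I - \tfrac{w}{2} Z \Lambda Z^T\right)^t \epsilon^{(0)} = Z \left(I - \tfrac{w}{2} \Lambda\right)^t Z^T \epsilon^{(0)} \\
Z^T \epsilon^{(t)} &= \left(I - \tfrac{w}{2} \Lambda\right)^t Z^T \epsilon^{(0)} \\
\left(Z^T \epsilon^{(t)}\right)_j &= \left(I - \tfrac{w}{2} \Lambda\right)^t_{jj} \left(Z^T \epsilon^{(0)}\right)_j
\end{align*}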
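The last identity says that each eigencomponent j of the error is scaled by 1 - (w/2)·λ_j per weighted Jacobi step. A minimal sketch of what that implies, assuming the 1-D model matrix tridiag(-1, 2, -1) with eigenvalues λ_j = 2(1 - cos(jπ/(n+1))) and the weight w = 2/3; neither choice is stated on the slide, and the function name is hypothetical:

```python
import math

def jacobi_damping_factors(n, w):
    """Per-step factors 1 - (w/2)*lambda_j for j = 1..n, where lambda_j are
    the eigenvalues of the assumed 1-D model matrix tridiag(-1, 2, -1)."""
    factors = []
    for j in range(1, n + 1):
        lam = 2.0 * (1.0 - math.cos(j * math.pi / (n + 1)))
        factors.append(1.0 - (w / 2.0) * lam)
    return factors

# Assumed parameters: n = 31 unknowns, weight w = 2/3.
factors = jacobi_damping_factors(n=31, w=2.0 / 3.0)

# Smoothest mode (j = 1): factor close to 1, so it barely decays.
low_freq_factor = abs(factors[0])

# Upper half of the spectrum (j = 16..31): largest factor in magnitude.
high_freq_worst = max(abs(f) for f in factors[len(factors) // 2:])

# After t steps component j is scaled by factors[j-1] ** t.
steps = 5
low_after = low_freq_factor ** steps
high_after = high_freq_worst ** steps
```

Under these assumptions the high-frequency components shrink by at least a factor of three per step while the smoothest component is almost untouched, which is consistent with the slow convergence of plain Jacobi and motivates moving the smooth remainder to a coarser grid.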
Initial error: “Rough”, lots of high-frequency components. Norm = 1.65
Error after 1 weighted Jacobi step: “Smoother”, fewer high-frequency components. Norm = 1.06
Error after 2 weighted Jacobi steps: “Smooth”, few high-frequency components. Norm = 0.92, won’t decrease much more
“Multigrids” in 2-D
P^{(i)} = problem on a (2^i + 1) × (2^i + 1) grid: T^{(i)} x^{(i)} = b^{(i)}
[Figure: grids for P^{(1)}, P^{(2)}, P^{(3)}]
Full multigrid algorithm
FMG(b^{(k)}, x^{(k)}):
    x^{(1)} ← exact solution to P^{(1)}
    for i = 2 to k do
        x^{(i)} ← MGV(b^{(i)}, L^{(i-1)}(x^{(i-1)}))
[Figure: FMG cycles for k = 2, 3, 4, 5]
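The FMG loop can be sketched in a few dozen lines. To stay short, this sketch makes several assumptions that are not on the slide: it solves the 1-D problem -u'' = f on [0, 1] with zero boundary values rather than the 2-D one, it uses a textbook V-cycle for MGV (two weighted Jacobi sweeps before and after the coarse correction, w = 2/3), full-weighting restriction, and linear interpolation for L. All function names are hypothetical.

```python
import math

def smooth(u, f, h, w=2.0 / 3.0, sweeps=2):
    """Weighted Jacobi sweeps for (-u[j-1] + 2u[j] - u[j+1]) / h^2 = f[j]."""
    for _ in range(sweeps):
        new = u[:]
        for j in range(1, len(u) - 1):
            jacobi = 0.5 * (u[j - 1] + u[j + 1] + h * h * f[j])
            new[j] = (1.0 - w) * u[j] + w * jacobi
        u = new
    return u

def residual(u, f, h):
    """r = f - A u at interior points; boundary entries stay zero."""
    r = [0.0] * len(u)
    for j in range(1, len(u) - 1):
        r[j] = f[j] - (-u[j - 1] + 2.0 * u[j] - u[j + 1]) / (h * h)
    return r

def restrict(r):
    """Full weighting from 2m+1 fine points to m+1 coarse points."""
    m = (len(r) - 1) // 2
    coarse = [0.0] * (m + 1)
    for j in range(1, m):
        coarse[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return coarse

def interpolate(c):
    """Linear interpolation from m+1 coarse points to 2m+1 fine points."""
    m = len(c) - 1
    fine = [0.0] * (2 * m + 1)
    for j in range(m + 1):
        fine[2 * j] = c[j]
    for j in range(m):
        fine[2 * j + 1] = 0.5 * (c[j] + c[j + 1])
    return fine

def mgv(f, u, h):
    """One V-cycle starting from the guess u (assumed standard form)."""
    if len(u) == 3:  # single interior unknown: solve exactly
        return [0.0, 0.5 * h * h * f[1], 0.0]
    u = smooth(u, f, h)
    coarse_rhs = restrict(residual(u, f, h))
    coarse_err = mgv(coarse_rhs, [0.0] * len(coarse_rhs), 2.0 * h)
    u = [a + b for a, b in zip(u, interpolate(coarse_err))]
    return smooth(u, f, h)

def fmg(rhs, k):
    """Full multigrid: solve P(1) exactly, then interpolate up and run one
    V-cycle per level until the grid with 2^k + 1 points is reached."""
    h = 0.5
    x = [0.0, 0.5 * h * h * rhs(0.5), 0.0]  # exact solution to P(1)
    for i in range(2, k + 1):
        h /= 2.0
        b = [rhs(j * h) for j in range(2 ** i + 1)]
        x = mgv(b, interpolate(x), h)
    return x

# Usage: -u'' = pi^2 sin(pi t) has the exact solution u = sin(pi t).
solution = fmg(lambda t: math.pi ** 2 * math.sin(math.pi * t), k=6)
max_err = max(abs(v - math.sin(math.pi * j / 64.0))
              for j, v in enumerate(solution))
```

Each level is visited a bounded number of times and the grids shrink geometrically, which is where the linear-in-unknowns cost comes from. A 2-D version would swap in 2-D smoothing, restriction, and interpolation but keep the same control flow.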
Example 1: When single-precision is faster than double
On STI Cell: SPEED(single) = 14x SPEED(double), i.e., 204.8 GFlop/s vs. 14.6 GFlop/s
    SPEs are fully IEEE-compliant for double, but only support round-to-zero in single
On regular CPUs with SIMD units: SPEED(single) ~ 2x SPEED(double)
    SSE2: S(single) = 4 Flops/cycle vs. S(double) = 2 Flops/cycle
    PowerPC Altivec: S(single) = 8 Flops/cycle; no double (4 Flops/cycle)
On a GPU, might not have double-precision support
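The speed advantage of single precision comes with an accuracy cost. A minimal sketch of that cost, assuming we emulate single precision in pure Python by rounding every intermediate result to IEEE binary32 with the `struct` module (round-to-nearest, not the round-to-zero mode of the Cell SPEs). The helper names and the test sum are hypothetical choices, not from the slides:

```python
import struct

def to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def running_sum(term, count, single):
    """Add `term` to an accumulator `count` times. With single=True the
    term and every partial sum are rounded to binary32 (emulation)."""
    t = to_single(term) if single else term
    total = 0.0
    for _ in range(count):
        total = total + t
        if single:
            total = to_single(total)
    return total

# Sum 0.1 one hundred thousand times; the exact answer is 10000.
exact = 10000.0
err_single = abs(running_sum(0.1, 100000, single=True) - exact)
err_double = abs(running_sum(0.1, 100000, single=False) - exact)
```

The emulated single-precision error is many orders of magnitude larger than the double-precision one, which is why algorithms that run mostly in single precision usually need some correction step in higher precision.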
Example 2: Parallelism and floating-point semantics: Bisection on GPUs
Bisection algorithm computes eigenvalues of a symmetric tridiagonal matrix
Inner kernel is a routine, Count(x), which counts the number of eigenvalues less than x
Correctness 1: Count(x) must be “monotonic”
Correctness 2: (Some approaches) IEEE FP-compliance
ATI Radeon X1900 XT GPU does not strictly adhere to the IEEE floating-point standard, causing errors in some cases
But a workaround is possible
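One common way to implement Count(x) is the Sturm-sequence (LDL^T pivot) recurrence, sketched below together with a bisection driver. This is an assumed textbook formulation, not necessarily the variant used in the GPU study. The function names, the tiny-pivot guard, and the Gershgorin starting interval are illustrative choices. The guard replaces a zero pivot with a tiny negative number because Python raises on division by zero instead of returning an IEEE infinity.

```python
def count_below(diag, off, x, tiny=1e-300):
    """Number of eigenvalues less than x for the symmetric tridiagonal
    matrix with diagonal `diag` and off-diagonal `off`, counted as the
    number of negative pivots in the recurrence d = a_i - x - b_{i-1}^2/d."""
    count = 0
    d = 1.0
    for i in range(len(diag)):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = diag[i] - x - b2 / d
        if d == 0.0:
            d = -tiny  # assumed guard against an exactly zero pivot
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (k = 1, 2, ...),
    starting from the Gershgorin interval."""
    n = len(diag)
    radius = [(abs(off[i - 1]) if i > 0 else 0.0) +
              (abs(off[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(a - r for a, r in zip(diag, radius))
    hi = max(a + r for a, r in zip(diag, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) >= k:
            hi = mid  # at least k eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Usage on the 4x4 matrix tridiag(-1, 2, -1).
diag, off = [2.0] * 4, [-1.0] * 3
smallest = kth_eigenvalue(diag, off, 1)
counts = [count_below(diag, off, 0.05 * s) for s in range(81)]
```

Bisection only converges to the right eigenvalue if the computed counts never decrease as x grows, which is the monotonicity requirement above. Each shift x can be evaluated independently, which makes the kernel attractive for GPUs but also exposes it to whatever the hardware does with division and rounding.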
The impact of parallelism on numerical algorithms
Larger problems magnify errors: round-off, ill-conditioning, instabilities
Reproducibility: a + (b + c) ≠ (a + b) + c
A fast parallel algorithm may be much less stable than a fast serial algorithm
Flops are cheaper than communication
Speeds at different precisions may vary significantly [e.g., SSEk, Cell]
Perils of arithmetic heterogeneity, e.g., CPU vs. GPU support of IEEE
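The reproducibility point is easy to see in IEEE double precision. A small sketch, with values chosen for illustration (they are not from the slides), compares a left-to-right serial sum with a pairwise reduction, which is the order a parallel tree reduction would typically use:

```python
import math

# Floating-point addition is not associative.
a, b, c = 1e16, -1e16, 1.0
left_first = (a + b) + c    # 0.0 + 1.0 = 1.0
right_first = a + (b + c)   # b + c rounds back to -1e16, so the sum is 0.0

def serial_sum(values):
    """Left-to-right accumulation, as in a simple serial loop."""
    total = 0.0
    for v in values:
        total += v
    return total

def tree_sum(values):
    """Pairwise (tree) reduction, as a parallel reduction might order it."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return tree_sum(values[:mid]) + tree_sum(values[mid:])

data = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2
serial = serial_sum(data)         # 1.0
tree = tree_sum(data)             # 0.0
reference = math.fsum(data)       # 2.0, correctly rounded
```

Neither order gets the exact answer here, and the two disagree with each other, so changing the number of processors (and therefore the reduction shape) can change the result from run to run.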

This note was uploaded on 10/06/2010 for the course CS 8803 taught by Professor Staff during the Spring '08 term at Georgia Institute of Technology.
