# cse8803-pna-sp08-13 - Interactions between parallelism and...

Interactions between parallelism and numerical stability, accuracy

Prof. Richard Vuduc, Georgia Institute of Technology

CSE/CS 8803 PNA: Parallel Numerical Algorithms [L.13]

Tuesday, February 19, 2008

Source: Dubey et al., of Intel (2005)

Problem: Seamless image cloning. (Source: Pérez et al., SIGGRAPH 2003)

… then reconstruct. (Source: Pérez et al., SIGGRAPH 2003)

Review: Multigrid

Exploiting structure to obtain fast algorithms for 2-D Poisson

- Dense LU: assume no structure; $O(n^6)$
- Sparse LU: sparsity; $O(n^3)$, needs extra memory, hard to parallelize
- CG: symmetric positive definite; $O(n^3)$, a little extra memory
- RB SOR: fixed sparsity pattern; $O(n^3)$, no extra memory, easy to parallelize
- FFT: eigendecomposition; $O(n^2 \log n)$
- Multigrid: eigendecomposition; $O(n^2)$ [Optimal!]
Problem: Slow convergence

[Figure panels: RHS; true solution; best possible 5-step; 5 steps of Jacobi]

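Error “frequencies”

$$\epsilon^{(t)} = R_w^t \cdot \epsilon^{(0)} = \left(I - \frac{w}{2} Z \Lambda Z^T\right)^t \cdot \epsilon^{(0)} = Z \left(I - \frac{w}{2} \Lambda\right)^t Z^T \cdot \epsilon^{(0)}$$

$$Z^T \cdot \epsilon^{(t)} = \left(I - \frac{w}{2} \Lambda\right)^t Z^T \cdot \epsilon^{(0)}$$

$$\left(Z^T \cdot \epsilon^{(t)}\right)_j = \left(I - \frac{w}{2} \Lambda\right)^t_{jj} \left(Z^T \cdot \epsilon^{(0)}\right)_j$$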
- Initial error: “Rough”, lots of high-frequency components. Norm = 1.65
- Error after 1 weighted Jacobi step: “Smoother”, less high-frequency component. Norm = 1.06
- Error after 2 weighted Jacobi steps: “Smooth”, little high-frequency component. Norm = .92, won’t decrease much more
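The damping of high-frequency error components can be checked numerically. The sketch below is not from the slides: it assumes the 1-D model problem with T = tridiag(-1, 2, -1), whose eigenvectors are discrete sine modes, and the common weight w = 2/3. The helper names (`weighted_jacobi_step`, `sine_mode`, `mode_amplitude`) are hypothetical.

```python
import math

def weighted_jacobi_step(x, b, w=2.0 / 3.0):
    """One weighted Jacobi sweep for T x = b, with T = tridiag(-1, 2, -1).

    Equivalent to x <- x + (w/2) * (b - T x), i.e. R_w = I - (w/2) T.
    """
    n = len(x)
    new = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        residual = b[i] - (2.0 * x[i] - left - right)
        new[i] = x[i] + (w / 2.0) * residual
    return new

def sine_mode(n, j):
    """j-th eigenvector of T (unnormalized): sin(i * j * pi / (n + 1))."""
    return [math.sin((i + 1) * j * math.pi / (n + 1)) for i in range(n)]

def mode_amplitude(e, j):
    """Coefficient of sine mode j in e (the modes are mutually orthogonal)."""
    n = len(e)
    z = sine_mode(n, j)
    return 2.0 / (n + 1) * sum(ei * zi for ei, zi in zip(e, z))

# Assumed setup: b = 0, so the exact solution is 0 and the iterate IS the error.
n = 63
error = [lo + hi for lo, hi in zip(sine_mode(n, 1), sine_mode(n, n))]
b = [0.0] * n
for _ in range(5):
    error = weighted_jacobi_step(error, b)

low_amp = abs(mode_amplitude(error, 1))   # smoothest mode
high_amp = abs(mode_amplitude(error, n))  # roughest mode
print(f"low-frequency amplitude after 5 steps:  {low_amp:.4f}")
print(f"high-frequency amplitude after 5 steps: {high_amp:.4f}")
```

Both modes start with amplitude 1. After five sweeps the roughest mode is nearly gone while the smoothest is almost untouched, which is the motivation for moving the remaining smooth error to a coarser grid.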

“Multigrids” in 2-D

- $P^{(i)}$ = problem on a $(2^i+1) \times (2^i+1)$ grid: $T^{(i)} x^{(i)} = b^{(i)}$

[Figure: grids for $P^{(1)}$, $P^{(2)}$, $P^{(3)}$]
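Full multigrid algorithm

$\mathrm{FMG}\left(b^{(k)}, x^{(k)}\right)$:

- $x^{(1)} \leftarrow$ exact solution to $P^{(1)}$
- for $i = 2$ to $k$ do: $x^{(i)} \leftarrow \mathrm{MGV}\left(b^{(i)}, L^{(i-1)}\left(x^{(i-1)}\right)\right)$

[Figure: FMG schedules for k = 2, 3, 4, 5]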
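The FMG loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's implementation: it uses the 1-D Poisson problem instead of 2-D to stay short, reads $L^{(i-1)}$ as linear interpolation from the coarser grid, and picks weighted Jacobi smoothing with full-weighting restriction. All function names are hypothetical.

```python
import math

def apply_T(x, h):
    """Apply T = (1/h^2) * tridiag(-1, 2, -1) with zero boundary values."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * x[i] - left - right) / (h * h))
    return out

def smooth(x, b, h, w=2.0 / 3.0, sweeps=2):
    """Weighted Jacobi sweeps: x <- x + w * D^{-1} (b - T x), D = (2/h^2) I."""
    for _ in range(sweeps):
        Tx = apply_T(x, h)
        x = [xi + w * (h * h / 2.0) * (bi - ti) for xi, bi, ti in zip(x, b, Tx)]
    return x

def restrict(r):
    """Full weighting: coarse point j sits at fine index 2j + 1."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]

def interpolate(xc):
    """Linear interpolation from m coarse unknowns to 2m + 1 fine unknowns."""
    n = 2 * len(xc) + 1
    xf = [0.0] * n
    for j, v in enumerate(xc):
        xf[2 * j + 1] = v
    for i in range(0, n, 2):
        left = xf[i - 1] if i > 0 else 0.0
        right = xf[i + 1] if i < n - 1 else 0.0
        xf[i] = 0.5 * (left + right)
    return xf

def mgv(b, x, h):
    """One multigrid V-cycle starting from the guess x."""
    if len(x) == 1:
        return [b[0] * h * h / 2.0]  # exact solve of the 1-unknown problem
    x = smooth(x, b, h)
    r = [bi - ti for bi, ti in zip(b, apply_T(x, h))]
    d = mgv(restrict(r), [0.0] * ((len(x) - 1) // 2), 2.0 * h)
    x = [xi + di for xi, di in zip(x, interpolate(d))]
    return smooth(x, b, h)

def fmg(b_fine):
    """Full multigrid: solve the coarsest problem, then interpolate and V-cycle."""
    bs = [b_fine]
    while len(bs[-1]) > 1:        # assumption: b^(i) obtained by restricting b^(k)
        bs.append(restrict(bs[-1]))
    bs.reverse()                  # bs[0] is b^(1), bs[-1] is b^(k)
    h = 0.5
    x = [bs[0][0] * h * h / 2.0]  # x^(1): exact solution to P^(1)
    for b in bs[1:]:
        h /= 2.0
        x = mgv(b, interpolate(x), h)
    return x

# Usage: -u'' = pi^2 sin(pi t) on (0, 1), whose solution is u = sin(pi t).
k = 6
n, h = 2 ** k - 1, 1.0 / 2 ** k
b = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
x = fmg(b)
err = max(abs(xi - math.sin(math.pi * (i + 1) * h)) for i, xi in enumerate(x))
print(f"max error vs. continuous solution: {err:.2e}")
```

Each level does a constant number of sweeps over its own grid, so the total work is a geometric sum dominated by the finest grid. That is the sense in which multigrid is listed as optimal.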

Example 1: When single precision is faster than double

- On the STI Cell: SPEED(single) = 14x SPEED(double), i.e., 204.8 GFlop/s vs. 14.6 GFlop/s
  - SPEs are fully IEEE-compliant for double, but support only round-to-zero in single
- On regular CPUs with SIMD units: SPEED(single) ~ 2x SPEED(double)
  - SSE2: S(single) = 4 flops/cycle vs. S(double) = 2 flops/cycle
  - PowerPC AltiVec: S(single) = 8 flops/cycle; no double (4 flops/cycle)
- On a GPU, double-precision support might not be available
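The speed advantage of single precision comes at a cost in accuracy, which a small experiment can make concrete. The sketch below is an illustration only: it emulates IEEE single precision in Python by rounding every intermediate result through `struct`, and it uses round-to-nearest, not the round-to-zero mode of the Cell SPEs. The helpers `to_single` and `running_sum` are hypothetical names.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def running_sum(value, count, single):
    """Add `value` to an accumulator `count` times in the chosen precision."""
    v = to_single(value) if single else value
    total = 0.0
    for _ in range(count):
        total += v
        if single:
            # Rounding the double-precision sum of two singles back to single
            # matches a correctly rounded single-precision addition.
            total = to_single(total)
    return total

count = 1_000_000
exact = 0.1 * count  # reference value 100000.0
double_err = abs(running_sum(0.1, count, single=False) - exact)
single_err = abs(running_sum(0.1, count, single=True) - exact)
print(f"double-precision error: {double_err:.3e}")
print(f"single-precision error: {single_err:.3e}")
```

The single-precision accumulator drifts by far more than the double-precision one, because each addition is rounded to about 7 decimal digits once the running total is large.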

Example 2: Parallelism and floating-point semantics: Bisection on GPUs

- The bisection algorithm computes eigenvalues of a symmetric tridiagonal matrix
- The inner kernel is a routine, Count(x), which counts the number of eigenvalues less than x
- Correctness 1: Count(x) must be “monotonic”
- Correctness 2: (Some approaches) IEEE FP-compliance
- The ATI Radeon X1900 XT GPU does not strictly adhere to the IEEE floating-point standard, causing errors in some cases
- But a workaround is possible
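A minimal sketch of Count(x) and the bisection loop built on it follows. It uses the standard Sturm-sequence recurrence for a symmetric tridiagonal matrix; the function names, the zero-pivot safeguard, and the tolerance are assumptions for illustration, not the GPU implementation the slide refers to.

```python
import math

def count_below(diag, off, x):
    """Count eigenvalues less than x of the symmetric tridiagonal matrix
    with diagonal `diag` and off-diagonal `off` (len(off) == len(diag) - 1).

    Uses d_1 = a_1 - x, d_i = a_i - x - b_{i-1}^2 / d_{i-1}; the number of
    negative d_i equals the number of eigenvalues below x.
    """
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - b2 / d
        if d == 0.0:
            d = -1e-300  # assumed safeguard: nudge an exact zero pivot
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, lo, hi, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (k = 1, 2, ...),
    assuming it lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) >= k:
            hi = mid  # at least k eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Usage: tridiag(-1, 2, -1) of order 5 has eigenvalues 2 - 2 cos(j pi / 6).
diag = [2.0] * 5
off = [-1.0] * 4
smallest = kth_eigenvalue(diag, off, 1, 0.0, 4.0)
second = kth_eigenvalue(diag, off, 2, 0.0, 4.0)
print(smallest, second)
```

Bisection is only correct if `count_below` never decreases as x increases. In exact arithmetic that always holds; in floating point it depends on how the hardware rounds, which is why non-IEEE arithmetic on a GPU can break the algorithm.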
The impact of parallelism on numerical algorithms

- Larger problems magnify errors: round-off, ill-conditioning, instabilities
- Reproducibility: $a + (b + c) \neq (a + b) + c$
- A fast parallel algorithm may be much less stable than a fast serial algorithm
- Flops are cheaper than communication
- Speeds at different precisions may vary significantly [e.g., SSE$k$, Cell]
- Perils of arithmetic heterogeneity, e.g., CPU vs. GPU support of IEEE
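The reproducibility point can be demonstrated directly: floating-point addition is not associative, so a sum split across a different number of processors can give a different answer. The sketch below is an illustration with hand-picked values; `serial_sum` and `chunked_sum` are hypothetical helpers that simulate per-processor partial sums followed by a final combine.

```python
import math

def serial_sum(values):
    """Left-to-right summation, so the order of additions is explicit."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, num_chunks):
    """Simulate num_chunks 'processors': local partial sums, then combine."""
    size = -(-len(values) // num_chunks)  # ceiling division
    partials = [serial_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return serial_sum(partials)

# Non-associativity with three numbers: the 1.0 is lost when added to 1e17.
a, b, c = 1e17, -1e17, 1.0
left = (a + b) + c
right = a + (b + c)
print(left, right)

# The same data summed with 1, 2, and 4 simulated processors.
values = [1e17, 1.0, -1e17, 1.0]
results = {p: chunked_sum(values, p) for p in (1, 2, 4)}
exact = math.fsum(values)  # correctly rounded reference sum
print(results, exact)
```

None of the three simulated runs matches the correctly rounded sum, and the two-processor run disagrees with the others. A parallel reduction whose tree shape depends on the processor count or on scheduling is therefore not bitwise reproducible in general.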

## This note was uploaded on 10/06/2010 for the course CS 8803 taught by Professor Staff during the Spring '08 term at Georgia Institute of Technology.
