Interactions between parallelism and numerical stability, accuracy
Prof. Richard Vuduc
Georgia Institute of Technology
CSE/CS 8803 PNA: Parallel Numerical Algorithms [L.13]
Tuesday, February 19, 2008
Source: Dubey et al., Intel (2005)
Problem: Seamless image cloning. (Source: Pérez et al., SIGGRAPH 2003)
… then reconstruct. (Source: Pérez et al., SIGGRAPH 2003)
Review: Multigrid
Exploiting structure to obtain fast algorithms for 2-D Poisson
Dense LU: assume no structure; $O(n^6)$
Sparse LU: sparsity; $O(n^3)$, needs extra memory, hard to parallelize
CG: symmetric positive definite; $O(n^3)$, a little extra memory
RB SOR: fixed sparsity pattern; $O(n^3)$, no extra memory, easy to parallelize
FFT: eigendecomposition; $O(n^2 \log n)$
Multigrid: eigendecomposition; $O(n^2)$ [Optimal!]
Problem: Slow convergence
[Figure panels: RHS; true solution; best possible 5-step; 5 steps of Jacobi]
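Error “frequencies”
$$e^{(t)} = R_w^t\, e^{(0)} = \left(I - \tfrac{w}{2} Z \Lambda Z^T\right)^t e^{(0)} = Z \left(I - \tfrac{w}{2} \Lambda\right)^t Z^T e^{(0)}$$
$$Z^T e^{(t)} = \left(I - \tfrac{w}{2} \Lambda\right)^t Z^T e^{(0)}$$
$$\left(Z^T e^{(t)}\right)_j = \left(I - \tfrac{w}{2} \Lambda\right)^t_{jj} \left(Z^T e^{(0)}\right)_j$$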
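The per-frequency damping in the last relation can be tabulated directly. The sketch below is only illustrative and makes assumptions the slide does not state: it uses the 1-D model matrix T = tridiag(-1, 2, -1), whose eigenvalues are 2 - 2cos(jπ/(n+1)), and the weight w = 2/3. The function name `damping_factors` is hypothetical.

```python
import math

def damping_factors(n, w):
    """Diagonal entries 1 - (w/2) * lambda_j of I - (w/2) * Lambda.

    Assumes the 1-D model problem T = tridiag(-1, 2, -1) of size n,
    whose eigenvalues are lambda_j = 2 - 2*cos(j*pi/(n+1)), j = 1..n.
    """
    return [1.0 - 0.5 * w * (2.0 - 2.0 * math.cos(j * math.pi / (n + 1)))
            for j in range(1, n + 1)]

n, w, t = 15, 2.0 / 3.0, 2          # assumed sizes and weight
factors = damping_factors(n, w)

# Component j of the error (in the eigenvector basis) is scaled by factor**t.
for j in (1, n // 2 + 1, n):
    print(f"j={j:2d}  factor={factors[j - 1]: .4f}  after {t} steps: {abs(factors[j - 1]) ** t:.4f}")
```

With this weight, every component in the upper half of the spectrum shrinks by at least a factor of 3 per step, while the smoothest component is barely touched, which is why the error norm stalls once the error is smooth.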
Initial error: “rough”, lots of high-frequency components, norm = 1.65
Error after 1 weighted Jacobi step: “smoother”, less high-frequency component, norm = 1.06
Error after 2 weighted Jacobi steps: “smooth”, little high-frequency component, norm = 0.92, won’t decrease much more
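The same behavior can be reproduced qualitatively with a few lines of code. This is a minimal sketch, not the experiment behind the figures: it assumes the 1-D model matrix T = tridiag(-1, 2, -1), weight w = 2/3, and a made-up initial error consisting of the smoothest mode plus the roughest mode, so the norms it prints differ from the ones quoted on the slide.

```python
import math

def jacobi_error_step(e, w):
    """Apply one weighted Jacobi step to the error: e <- (I - (w/2) T) e,
    with T = tridiag(-1, 2, -1) and zero boundary values (assumed model)."""
    n = len(e)
    out = []
    for i in range(n):
        left = e[i - 1] if i > 0 else 0.0
        right = e[i + 1] if i < n - 1 else 0.0
        t_e = 2.0 * e[i] - left - right
        out.append(e[i] - 0.5 * w * t_e)
    return out

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

n, w = 31, 2.0 / 3.0
h = math.pi / (n + 1)
# Hypothetical initial error: smoothest mode (j = 1) plus roughest mode (j = n).
e = [math.sin((i + 1) * h) + math.sin(n * (i + 1) * h) for i in range(n)]

norms = [norm2(e)]
for _ in range(3):
    e = jacobi_error_step(e, w)
    norms.append(norm2(e))

print([round(v, 3) for v in norms])
```

The first step removes most of the rough component, and later steps barely change the norm because only the smooth component is left.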
“Multigrids” in 2-D
$P^{(i)}$ = problem on a $(2^i + 1) \times (2^i + 1)$ grid: $T^{(i)} x^{(i)} = b^{(i)}$
[Figure: grids for $P^{(1)}$, $P^{(2)}$, $P^{(3)}$]
Full multigrid algorithm
FMG($b^{(k)}$, $x^{(k)}$):
  $x^{(1)} \leftarrow$ exact solution to $P^{(1)}$
  for $i = 2$ to $k$ do
    $x^{(i)} \leftarrow$ MGV($b^{(i)}$, $L^{(i-1)} x^{(i-1)}$)
[Figure: FMG schedule for k = 2, 3, 4, 5]
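A compact way to see the FMG loop in action is a 1-D analogue. The sketch below is an assumption-laden illustration, not the lecture's implementation: it solves -u'' = f on [0, 1] with zero boundary values on grids of 2^i + 1 points (the slides use 2-D grids), uses weighted Jacobi with w = 2/3 as the smoother, full weighting for restriction, linear interpolation for L, and obtains each b^(i) by sampling f. All function names are hypothetical.

```python
import math

def smooth(u, f, h, sweeps=2, w=2.0 / 3.0):
    """Weighted Jacobi sweeps for (2u[j] - u[j-1] - u[j+1]) / h^2 = f[j]."""
    for _ in range(sweeps):
        old = u[:]
        for j in range(1, len(u) - 1):
            jacobi = 0.5 * (h * h * f[j] + old[j - 1] + old[j + 1])
            u[j] = (1.0 - w) * old[j] + w * jacobi
    return u

def residual(u, f, h):
    r = [0.0] * len(u)
    for j in range(1, len(u) - 1):
        r[j] = f[j] - (2.0 * u[j] - u[j - 1] - u[j + 1]) / (h * h)
    return r

def restrict(r):
    """Full weighting from 2m+1 points down to m+1 points."""
    m = (len(r) - 1) // 2
    rc = [0.0] * (m + 1)
    for j in range(1, m):
        rc[j] = 0.25 * (r[2 * j - 1] + 2.0 * r[2 * j] + r[2 * j + 1])
    return rc

def interpolate(xc):
    """Linear interpolation from m+1 points up to 2m+1 points (plays the role of L)."""
    m = len(xc) - 1
    x = [0.0] * (2 * m + 1)
    for j in range(m + 1):
        x[2 * j] = xc[j]
    for j in range(m):
        x[2 * j + 1] = 0.5 * (xc[j] + xc[j + 1])
    return x

def mgv(f, u):
    """One V-cycle on the grid implied by len(u), starting from the guess u."""
    n = len(u) - 1
    h = 1.0 / n
    if n == 2:                                   # P(1): one unknown, solve exactly
        return [0.0, 0.5 * h * h * f[1], 0.0]
    u = smooth(u, f, h)
    ec = mgv(restrict(residual(u, f, h)), [0.0] * (n // 2 + 1))
    u = [a + b for a, b in zip(u, interpolate(ec))]
    return smooth(u, f, h)

def sample(func, i):
    """Right-hand side b(i): func sampled on the grid with 2^i + 1 points."""
    n = 2 ** i
    return [func(j / n) for j in range(n + 1)]

def fmg(func, k):
    x = mgv(sample(func, 1), [0.0, 0.0, 0.0])    # exact solution to P(1)
    for i in range(2, k + 1):
        x = mgv(sample(func, i), interpolate(x)) # x(i) <- MGV(b(i), L x(i-1))
    return x

# Usage: f = pi^2 sin(pi x) has the exact solution u = sin(pi x).
k = 6
x = fmg(lambda s: math.pi ** 2 * math.sin(math.pi * s), k)
err = max(abs(x[j] - math.sin(math.pi * j / 2 ** k)) for j in range(2 ** k + 1))
print("max error on finest grid:", err)
```

Each level is visited with a single V-cycle, so the total work is a constant multiple of the work on the finest grid; the interpolated coarse solution supplies the starting guess that makes one cycle sufficient in this example.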
Example 1: When single-precision is faster than double
On STI Cell
  SPEED(single) = 14x SPEED(double): 204.8 Gflop/s vs. 14.6 Gflop/s
  SPEs fully IEEE-compliant for double, but only support round-to-zero in single
On regular CPUs with SIMD units
  SPEED(single) ~ 2x SPEED(double)
  SSE2: S(single) = 4 flops/cycle vs. S(double) = 2 flops/cycle
  PowerPC Altivec: S(single) = 8 flops/cycle; no double (4 flops/cycle)
On a GPU, might not have double-precision support
Example 2: Parallelism and floating-point semantics: Bisection on GPUs
Bisection algorithm computes eigenvalues of a symmetric tridiagonal matrix
Inner kernel is a routine, Count(x), which counts the number of eigenvalues less than x
Correctness 1: Count(x) must be “monotonic”
Correctness 2: (Some approaches) IEEE FP-compliance
ATI Radeon X1900 XT GPU does not strictly adhere to the IEEE floating-point standard, causing errors in some cases
But a workaround is possible
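To make the role of Count(x) concrete, here is a minimal serial sketch. It uses a commonly taught Sturm-count recurrence (counting negative pivots of the LDL^T factorization of T - xI) and Gershgorin bounds for the starting interval; whether the GPU implementation discussed in the lecture uses exactly this form is not stated, and the names `count_less_than` and `kth_eigenvalue` are hypothetical.

```python
import math

def count_less_than(diag, off, x):
    """Count eigenvalues of the symmetric tridiagonal matrix that are < x.

    diag: diagonal entries (length n); off: off-diagonal entries (length n-1).
    Counts negative pivots d_i = (a_i - x) - b_{i-1}^2 / d_{i-1}.
    """
    count = 0
    d = 1.0
    for i in range(len(diag)):
        b = off[i - 1] if i > 0 else 0.0
        d = (diag[i] - x) - b * b / d
        if d == 0.0:
            d = -1e-300      # assumption: treat an exact zero pivot as tiny negative
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (k = 0 is the smallest)."""
    n = len(diag)
    radius = [(abs(off[i - 1]) if i > 0 else 0.0) +
              (abs(off[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(a - r for a, r in zip(diag, radius))   # Gershgorin lower bound
    hi = max(a + r for a, r in zip(diag, radius))   # Gershgorin upper bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if not (lo < mid < hi):                     # interval cannot shrink further
            break
        if count_less_than(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Usage on tridiag(-1, 2, -1) of size 4, whose eigenvalues are 2 - 2cos(j*pi/5).
diag, off = [2.0] * 4, [-1.0] * 3
computed = [kth_eigenvalue(diag, off, k) for k in range(4)]
expected = [2.0 - 2.0 * math.cos(j * math.pi / 5) for j in range(1, 5)]
print(computed)
```

Bisection only makes sense if the count never decreases as x increases; a count that dips because of non-IEEE rounding can send the search into the wrong half-interval, which is the correctness concern the slide raises.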
The impact of parallelism on numerical algorithms
Larger problems magnify errors: round-off, ill-conditioning, instabilities
Reproducibility: a + (b + c) ≠ (a + b) + c
A fast parallel algorithm may be much less stable than a fast serial algorithm
Flops are cheaper than communication
Speeds at different precisions may vary significantly [e.g., SSEk, Cell]
Perils of arithmetic heterogeneity, e.g., CPU vs. GPU support of IEEE
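The reproducibility point is easy to demonstrate in IEEE double precision. In the sketch below, the values are chosen for illustration, and `tree_sum` is a hypothetical stand-in for the pairwise order a parallel reduction might use; the same four numbers give different sums depending on the order of additions.

```python
import math

a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c     # 0.0 + 1.0 -> 1.0
right = a + (b + c)    # b + c rounds back to -1e16, so the 1.0 is lost -> 0.0
print(left, right)

def serial_sum(xs):
    """Left-to-right accumulation, as in a simple serial loop."""
    total = 0.0
    for x in xs:
        total += x
    return total

def tree_sum(xs):
    """Pairwise (tree) reduction; assumes xs is non-empty."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

data = [1e16, 1.0, -1e16, 1.0]
print(serial_sum(data))   # 1.0
print(tree_sum(data))     # 0.0
print(math.fsum(data))    # 2.0 (correctly rounded sum)
```

Neither ordering is "the" right one here; both lose information, just different information, so a parallel run with a different reduction tree can legitimately return a different answer than the serial code.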
A computational paradigm
Use a fast algorithm that may be unstable (or “less” stable)
Check the result at the end
If needed, re-run or fix up using a slow-but-safe algorithm
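One possible instance of this paradigm, chosen here purely for illustration, is solving a linear system with Gaussian elimination without pivoting (no row searches or swaps, but potentially unstable), checking a scaled residual, and falling back to partial pivoting only when the check fails. The function names, the tolerance, and the choice of algorithms are assumptions, not taken from the lecture.

```python
def gauss_solve(A, b, pivot):
    """Dense Gaussian elimination; partial pivoting is used only if pivot=True."""
    n = len(b)
    A = [row[:] for row in A]
    b = list(b)
    for k in range(n):
        if pivot:
            p = max(range(k, n), key=lambda i: abs(A[i][k]))
            A[k], A[p] = A[p], A[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

def relative_residual(A, x, b):
    """Scaled residual ||Ax - b|| / (||A|| ||x|| + ||b||) in the infinity norm.
    Assumes A and b are not both zero."""
    r = max(abs(sum(a * v for a, v in zip(row, x)) - bi) for row, bi in zip(A, b))
    norm_a = max(sum(abs(a) for a in row) for row in A)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return r / (norm_a * norm_x + norm_b)

def solve_checked(A, b, tol=1e-10):
    """Try the fast path, check the result, fall back to the safe path if needed."""
    try:
        x = gauss_solve(A, b, pivot=False)
        if relative_residual(A, x, b) <= tol:   # NaN compares False, so it falls through
            return x, "fast"
    except ZeroDivisionError:                   # exact zero pivot on the fast path
        pass
    return gauss_solve(A, b, pivot=True), "safe"

# A tiny pivot makes the unpivoted solve return a badly wrong answer.
x_bad, how_bad = solve_checked([[1e-20, 1.0], [1.0, 1.0]], [1.0, 2.0])
x_ok, how_ok = solve_checked([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(how_bad, x_bad)
print(how_ok, x_ok)
```

The check costs one matrix-vector product, which is cheap relative to the factorization, so the safe path is paid for only on the inputs that need it.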