Lecture 8 Notes


…ce) For problems with fixed structure, communication and computation can be optimized at compile time. The key parallelization problems here are finding the optimal granularity, balancing computation against communication, and reducing synchronization overhead. The optimal UE granularity can be determined by autotuning for the register/cache sizes and memory-prefetch capabilities of a particular platform. Ghost cells can be used around a partition to bundle communications between UEs and amortize overhead. A regular structure also allows interchanging phases of load-balanced computation and communication, such that results of sub-problems can be pulled by their parents after a global barrier. The techniques used are very similar to those of the Structured Grid pattern.

iii. Fixed problem structure (large fan-in, inter-dependent sub-problems, e.g. the Viterbi algorithm on a trellis)

Sometimes the algorithm requires entire levels of sub-problems to be solved, where all sub-problems at each level depend on the solutions of the previous level. The Viterbi algorithm for finding the most likely sequence in a symbol space is an example. In this case, the problem constraints naturally imply the use of a barrier between iterations for synchronization. The key parallelization challenge is to efficiently handle the special all-to-all reduction in each iteration, where the transition edges may carry different weights. Typically there will be some default edge weights, but a significant portion of the edges will have unique weights. The solution often calls for a sink-indexed adjacency-list sparse representation for the transition-edge data layout. Sink-indexing allows the current iteration to efficiently pull results from the previous iteration, and the adjacency-list sparse representation allows the use of Sparse Matrix pattern techniques for parallelization.

Invariant

1. Precondition.
The problem has a provable optimal substructure, which can be used to obtain a globally optimal solution.

2. Invariant. Locally optimal solutions are compu...
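The precondition and invariant above can be illustrated with a minimal sketch (a hypothetical shortest-path-on-a-DAG example; the graph and names are mine, not from the notes): each node's locally optimal distance is final once all of its predecessors have been relaxed, so local optima compose into the global optimum.

```python
# Hypothetical example of optimal substructure: single-source shortest
# paths in a DAG. dist[v] is the locally optimal solution for v; once all
# predecessors of v have been relaxed, dist[v] is final and can safely be
# reused by every later sub-problem -- the invariant behind DP.
edges = {                       # adjacency: node -> [(successor, weight)]
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 3)],
    "t": [],
}
topo_order = ["s", "a", "b", "t"]   # a topological order of the DAG

dist = {v: float("inf") for v in topo_order}
dist["s"] = 0.0
for u in topo_order:                # relax edges in topological order
    for v, w in edges[u]:
        dist[v] = min(dist[v], dist[u] + w)
```

Here the globally optimal path s -> a -> b -> t (cost 6) is assembled entirely from the locally optimal sub-solutions dist["a"] and dist["b"].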
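The ghost-cell technique described earlier can be sketched in one dimension (a hypothetical toy: two partitions, made-up values, plain Python standing in for per-UE memory). Each partition stores its interior cells plus one ghost cell per side, so a single bundled exchange per iteration replaces many fine-grained neighbor reads.

```python
# Minimal 1D ghost-cell sketch (hypothetical sizes and values).
left  = [0.0, 1.0, 2.0, 0.0]   # [fixed boundary, x0, x1, ghost]
right = [0.0, 3.0, 4.0, 0.0]   # [ghost, x2, x3, fixed boundary]

def exchange_ghosts(left, right):
    """Communication phase: copy boundary interiors into neighbor ghosts."""
    left[3]  = right[1]   # left's right ghost  <- right's first interior
    right[0] = left[2]    # right's left ghost  <- left's last interior

def step(part):
    """Computation phase: Jacobi-style 3-point average over interior cells."""
    part[1:3] = [(part[i - 1] + part[i] + part[i + 1]) / 3.0
                 for i in range(1, 3)]

# One interchange of communication and computation; on real UEs a global
# barrier would separate the two phases.
exchange_ghosts(left, right)
step(left)
step(right)
```

After the exchange, each partition computes independently until the next barrier, which is exactly the load-balanced communicate/compute interchange the notes describe.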
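The sink-indexed adjacency-list layout described above for the Viterbi case can be sketched as follows (a toy trellis with made-up transition weights and emission scores; all names are illustrative, not from the notes). Because each destination state only pulls from the previous level's scores, the loop over sinks is the parallel region, and the per-level return corresponds to the barrier between iterations.

```python
import math

# Toy trellis, stored sink-indexed: for each destination state j we keep
# the list of (source state, transition log-weight) pairs feeding into it.
# (Weights and trellis size are made up for illustration.)
incoming = {
    0: [(0, math.log(0.7)), (1, math.log(0.3))],
    1: [(0, math.log(0.2)), (1, math.log(0.5)), (2, math.log(0.4))],
    2: [(1, math.log(0.2)), (2, math.log(0.6))],
}

def viterbi_step(prev_scores, incoming, emission):
    """One trellis level: every sink pulls its best incoming extension.

    Each sink j reads only prev_scores, so the loop over sinks can run
    in parallel; returning corresponds to the per-level barrier.
    """
    next_scores, back = {}, {}
    for j, edges in incoming.items():
        src, score = max(((i, prev_scores[i] + w) for i, w in edges),
                         key=lambda t: t[1])
        next_scores[j] = score + emission[j]
        back[j] = src
    return next_scores, back

# Usage: two levels with made-up emission log-probabilities.
scores = {0: 0.0, 1: 0.0, 2: 0.0}
backpointers = []
for emission in [{0: -0.1, 1: -1.0, 2: -2.0},
                 {0: -1.5, 1: -0.2, 2: -0.7}]:
    scores, back = viterbi_step(scores, incoming, emission)
    backpointers.append(back)

best_final = max(scores, key=scores.get)
```

A sparse-matrix view of the same data would store `incoming` in CSR form indexed by row (sink), which is what lets Sparse Matrix pattern techniques apply directly.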