Lecture 8 Notes

inthecircuittimingexamplethelongestpathseensofarateac

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: our three versions of code. (7) Parallelize. Even you did up to (6) and it takes too long to compute the global optimal solution, you have two choices. 1) Find a more efficient recursive relation in terms of time complexity which starts from step (1), or 2) parallelize your program. There are three types of Solve and Reduce problems that require difference emphasis on solution approaches: i. Input specific problem structure (e.g. circuit timing analysis) In problems such as circuit timing analysis, the structure of the problem is problem instance specific. For each circuit, the structure of the sub‐problem corresponds to the structure of the circuit. The key parallelization challenge is to discover parallelism in the structure, partition and load balance the Units of Execution (UEs) at runtime. The execution sequence constraints are usually trivially derived from the problem structure. To increase computation granularity, blocks of sub‐ problems are computed in serial in UE. These blocks could be discovered by lookahead of a few levels sub‐problems or by global partitioning on the entire set of the sub‐problems. Since the problem defines the number of in‐ degree of the sub‐problems, one can allocate distinct memory for storing sub‐problem solutions, such that each child can push its result to its parent without memory conflict. (Note: care must be taken with memory allocation of the result container, as memory location in the same cache line may still experience false sharing.) In the circuit‐timing example, the longest path seen so far at each gate, including gate and wire delays can be accumulated, and pushed onto the fan‐ in of the next gate. The reduction can occur at the granularity of individual blocks of execution. Natural data layouts for this type of problems usually involve a graph container with adjacency list representation storing the problem structure. Parallel graph partitioning techniques discussed in the Graph Traversal pattern can be used to increase the amount of parallelism in problem. ii. Fixed problem structure (small fan‐in, independent local sub‐problems, e.g. string edit distan...
View Full Document

This document was uploaded on 03/17/2014 for the course CS 4800 at Northeastern.

Ask a homework question - tutors are online