SUBJECT CODE NO: E-241
FACULTY OF ENGINEERING AND TECHNOLOGY
B.E. (CSE) Examination Nov/Dec 2017
Parallel & Distributed Computing (REVISED)
PAPER SOLUTION

Q.1 a) Explain the task graph model.

ANS: The computations in any parallel algorithm can be viewed as a task-dependency graph. The task-dependency graph may be either trivial, as in the case of matrix multiplication, or nontrivial. However, in certain parallel algorithms, the task-dependency graph is explicitly used in mapping. In the task graph model, the interrelationships among the tasks are utilized to promote locality or to reduce interaction costs. This model is typically employed to solve problems in which the amount of data associated with the tasks is large relative to the amount of computation associated with them. Usually, tasks are mapped statically to help optimize the cost of data movement among tasks. Sometimes a decentralized dynamic mapping may be used, but even then the mapping uses information about the task-dependency graph structure and the interaction pattern of tasks to minimize interaction overhead. Work is more easily shared in paradigms with a globally addressable space, but mechanisms are also available to share work across disjoint address spaces.

Typical interaction-reducing techniques applicable to this model include reducing the volume and frequency of interaction by promoting locality while mapping tasks according to their interaction pattern, and using asynchronous interaction methods to overlap interaction with computation. Examples of algorithms based on the task graph model include parallel quicksort, sparse matrix factorization, and many parallel algorithms derived via divide-and-conquer decomposition. The parallelism that is naturally expressed by independent tasks in a task-dependency graph is called task parallelism (a short code sketch of task parallelism is given at the end of this solution).

b) Draw & explain CUDA device memory types.

ANS: DEVICE MEMORIES AND DATA TRANSFER

In CUDA, the host and devices have separate memory spaces. This reflects the reality that devices are typically hardware cards that come with their own dynamic random access memory (DRAM). For example, the NVIDIA T10 processor comes with up to 4 GB (billion bytes, or gigabytes) of DRAM. In order to execute a kernel on a device, the programmer needs to allocate memory on the device and transfer the pertinent data from the host memory to the allocated device memory. This corresponds to Part 1 of Figure 3.6. Similarly, after device execution, the programmer needs to transfer the result data from the device memory back to the host memory and free up the device memory that is no longer needed. This corresponds to Part 3 of Figure 3.6. The CUDA runtime system provides application programming interface (API) functions to perform these activities on behalf of the programmer. From this point on, we will simply say that a piece of data is transferred from host to device as shorthand for saying that the piece of data is transferred from the host memory to the device memory; the same holds for the opposite data transfer direction. Figure 3.7 shows an overview of the CUDA device memory model.
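The allocate, transfer, compute, transfer-back, and free pattern described above can be summarized in host code. The following is a minimal sketch using the CUDA runtime API; the vector-addition kernel vecAddKernel, the array size N, and the launch configuration are illustrative assumptions and are not part of the original paper solution.

#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

/* Hypothetical kernel, included only to make the sketch complete. */
__global__ void vecAddKernel(const float *A, const float *B, float *C, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] + B[i];
}

int main(void)
{
    size_t size = N * sizeof(float);
    float h_A[N], h_B[N], h_C[N];
    for (int i = 0; i < N; ++i) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

    /* Part 1: allocate device memory and transfer input data host -> device. */
    float *d_A, *d_B, *d_C;
    cudaMalloc((void **)&d_A, size);
    cudaMalloc((void **)&d_B, size);
    cudaMalloc((void **)&d_C, size);
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    /* Part 2: execute the kernel on the device. */
    vecAddKernel<<<(N + 255) / 256, 256>>>(d_A, d_B, d_C, N);

    /* Part 3: transfer the result device -> host and free device memory. */
    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    printf("h_C[10] = %f\n", h_C[10]);
    return 0;
}

Here cudaMalloc, cudaMemcpy (with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost), and cudaFree are the runtime API functions that perform the allocation, transfer, and deallocation activities referred to in the answer.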
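For Q.1 a), the following sketch illustrates how the parallel quicksort example maps onto a task-dependency graph: after partitioning, the two recursive calls are independent tasks that can execute concurrently. OpenMP tasks are used here only as one possible way to express such tasks in C; this mechanism is an assumption of the sketch, not something prescribed by the original answer.

#include <stdio.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Standard Lomuto partition around the last element. */
static int partition(int *a, int lo, int hi)
{
    int pivot = a[hi], i = lo - 1;
    for (int j = lo; j < hi; ++j)
        if (a[j] <= pivot)
            swap(&a[++i], &a[j]);
    swap(&a[i + 1], &a[hi]);
    return i + 1;
}

static void quicksort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;
    int p = partition(a, lo, hi);
    /* The two recursive calls are independent nodes in the
       task-dependency graph; each may run on a different thread. */
    #pragma omp task
    quicksort(a, lo, p - 1);
    #pragma omp task
    quicksort(a, p + 1, hi);
    /* taskwait corresponds to the dependency edge that joins the subtasks. */
    #pragma omp taskwait
}

int main(void)
{
    int a[] = { 9, 3, 7, 1, 8, 2, 5, 4 };
    int n = (int)(sizeof a / sizeof a[0]);
    #pragma omp parallel
    #pragma omp single
    quicksort(a, 0, n - 1);
    for (int i = 0; i < n; ++i)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}

Compiled with an OpenMP-capable compiler (for example, gcc -fopenmp), each spawned task can be scheduled on an idle thread; this is the task parallelism that the answer describes as naturally expressed by independent tasks in a task-dependency graph.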