Performance Analysis
Introduction
Analysis of execution time for a parallel algorithm to determine whether it is worth the effort to code and debug in parallel
Understanding barriers to high performance and predicting improvement
Goal: to figure out whether a progr
CS 5740
Parallel and Distributed Computing
Assignment # 3
Due Date: October 25, 2011
Cooperative Communication
You will be required to simulate heat transfer across a rod of uniform material and use message broadcasting to communicate
between processes.
Y
CS 5740
Parallel and Distributed Computing
Assignment # 4
Due Date: November 08, 2011
Sieve of Eratosthenes
Modify the parallel Sieve of Eratosthenes program to incorporate the following three improvements.
1. Delete even integers. Make sure that you do n
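A minimal sequential sketch of the first improvement, storing odd integers only, is given below. It is not the assignment's parallel program: the function name count_primes_odd_sieve and the index mapping (array index i stands for the odd number 2*i + 3) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: count the primes <= n using a sieve that
   stores odd integers only. Index i represents the number 2*i + 3.
   Returns -1 if memory cannot be allocated. */
long count_primes_odd_sieve(long n)
{
    assert(n >= 0);
    if (n < 2)
        return 0;
    if (n < 3)
        return 1;                        /* only the prime 2 */

    long size = (n - 3) / 2 + 1;         /* odd numbers 3, 5, ..., <= n */
    char *composite = calloc((size_t)size, 1);
    if (composite == NULL)
        return -1;

    long count = 1;                      /* account for the prime 2 */
    for (long i = 0; i < size; i++) {
        if (composite[i])
            continue;
        long p = 2 * i + 3;
        count++;
        /* Start marking at p*p; the test p <= n / p avoids overflow.
           A step of 2*p skips the even multiples, which are not stored. */
        if (p <= n / p) {
            for (long m = p * p; m <= n; m += 2 * p)
                composite[(m - 3) / 2] = 1;
        }
    }
    free(composite);
    return count;
}
```

Dropping the even integers halves the storage and the number of marking steps compared with a sieve over every integer from 2 to n.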
CS 5740
Parallel and Distributed Computing
Assignment # 5
Due Date: December 08, 2011
CUDA
1. In-place swapping can be performed by using the following function:
void swap ( int * a, int * b )
{
    *a ^= *b ^= *a ^= *b;
}
Modify the code to reverse array o
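One possible host-side sketch of in-place reversal built on an XOR swap follows, in plain C rather than CUDA. The names xor_swap and reverse_in_place are hypothetical. The chained form of the swap modifies *a twice without an intervening sequence point, which is undefined behavior in C, so this sketch splits it into three statements and also guards against both pointers naming the same element.

```c
#include <assert.h>
#include <stddef.h>

/* XOR swap written as three separate statements so that each
   modification is sequenced. If a == b, XOR would zero the value,
   so that case is skipped. */
static void xor_swap(int *a, int *b)
{
    if (a == b)
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Reverse data[0..n-1] in place by swapping mirrored pairs.
   For odd n the middle element stays where it is. */
void reverse_in_place(int *data, size_t n)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n / 2; i++)
        xor_swap(&data[i], &data[n - 1 - i]);
}
```

Each pair (i, n - 1 - i) is independent of every other pair, so in a parallel version each swap could be assigned to its own thread.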
CS5740
High Performance Computing
Assignment # 1
Due Date: September 22, 2011
Important: Please do all assignments on stovokor
Starting with MPI
The goal of this homework is to become familiar with the MPI environment on the stovokor cluster.
Problem: Start w
Floyd's Algorithm
Introduction
Used to find shortest paths in a weighted graph
Travel maps containing driving distance from one point to another
Represented by tables
Shortest distance from point A to point B given by intersection of row and column
Rout
Document Classification
Introduction
Search engine on web
Search directories, subdirectories for documents
Search for documents with extensions .html, .txt, and .tex
Using a dictionary of key words, create a profile vector for each document
Store profile v
CUDA
GPU
vs Multicore computers
Multicore machines
Emphasize multiple full-blown processor cores, implementing the complete instruction set of the CPU
The cores are out-of-order implying that they could be doing different tasks
They may additionally s
Motivation and History
Introduction
Speed of computers
Faster machines mean faster computation
Running a machine ten times faster implies solving bigger problems in less time
Before 2002, the performance of microprocessors increased by about 50% per y
Parallel Hardware and Parallel Software
von Neumann Architecture
Describes a computer system as a CPU (or core) connected to the main memory through an interconnection network
Executes only one instruction at a time, with each instruction operating on o
Message-Passing Programming
Introduction
MPI
Message Passing Interface standard
Most popular message-passing specification to support parallel programming
Standardized and portable to function on a wide variety of parallel computers
Allowed for the deve
Monte Carlo Methods
Introduction
Solve a problem using statistical sampling
First important use in development of atomic bomb during WWII
Applications of Monte Carlo Methods
Evaluating integrals of arbitrary functions of 6+ dimensions
Predicting future va
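As a small illustration of statistical sampling, the sketch below estimates pi from the fraction of random points in the unit square that fall inside the quarter circle of radius 1. The example, the function names, and the simple 64-bit linear congruential generator are all assumptions chosen to keep the code self-contained and reproducible; they do not come from the notes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator: one step of a 64-bit linear congruential
   generator, returning a double in [0, 1) built from the top 53 bits. */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;   /* 2^53 */
}

/* Estimate pi by sampling points (x, y) uniformly in the unit square.
   The fraction with x*x + y*y <= 1 approximates pi / 4. */
double estimate_pi(long samples, uint64_t seed)
{
    assert(samples > 0);
    uint64_t state = seed;
    long inside = 0;
    for (long s = 0; s < samples; s++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

The error of such an estimate shrinks roughly as one over the square root of the number of samples, so each additional correct digit costs about a hundred times more samples.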
Matrix-Vector Multiplication
Multiplying a square matrix by a vector
Sequential algorithm
Simply a series of dot products
Input: Matrix mat[m][n]
Vector vec[n]
Output: out[m]
for ( i = 0; i < m; i++ )
{
    out[i] = 0;
    for ( j = 0; j < n; j++ )
        out[i] += ma
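Since the listing above is cut off, a complete sketch of the same series of dot products is given here. The function name mat_vec and the flattened row-major storage of the matrix are assumptions for illustration; the notes declare the matrix as mat[m][n].

```c
#include <assert.h>
#include <stddef.h>

/* Multiply an m x n matrix by a vector of length n.
   mat is stored row-major, so element (i, j) is mat[i*n + j].
   out must have room for m results. */
void mat_vec(size_t m, size_t n,
             const double *mat, const double *vec, double *out)
{
    assert(mat != NULL && vec != NULL && out != NULL);
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        /* Dot product of row i with the vector. */
        for (size_t j = 0; j < n; j++)
            sum += mat[i * n + j] * vec[j];
        out[i] = sum;
    }
}
```

Each output element depends on one row of the matrix and the whole vector, so the rows can be computed independently of one another.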
CS5740
Parallel and Distributed Computing
Assignment # 2
Due Date: October 06, 2010
Important: Please do all assignments on stovokor
Using MPI
The goal of this homework is to compute the value of π.
Problem: Consider a circle of diameter (2r = 1) embedded