CS 267
Unified Parallel C (UPC)
Kathy Yelick
http://upc.lbl.gov
http://upc.gwu.edu
12/27/11
CS267 Lecture: UPC
What's Wrong with MPI Everywhere
We can run 1 MPI process per core (flat MPI)
- This works now on dual and quad-core machines
- It will work on

Scalable Parallel Primitives for
Massive Graph Computation
Aydın Buluç
University of California, Santa Barbara
Sources of Massive Graphs
Graphs naturally arise
from the internet and
social interactions
(WWW snapshot, courtesy Y. Hyun)
Many scientific (bi

Twelve Ways to Fool the Masses When Giving
Performance Results on Parallel Computers
David H. Bailey
June 11, 1991
Ref: Supercomputing Review, Aug. 1991, pg. 54-55
Abstract
Many of us in the field of highly parallel scientific computing
recognize that it

Programming in the Partitioned Global Address Space Model
Bill Carlson, IDA
Tarek El-Ghazawi, GWU
Robert Numrich, U. Minnesota
Kathy Yelick, UC Berkeley
Table of Contents
Welcome and Introductions: 3-28
Programming with UPC: 29-121
Pro

SIAM J. COMPUT.
Vol. 1, No. 2, June 1972
DEPTH-FIRST SEARCH AND LINEAR GRAPH ALGORITHMS*
ROBERT TARJAN
Abstract. The value of depth-first search or "backtracking" as a technique for solving problems is
illustrated by two examples. An improved version of
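As a rough illustration of the depth-first search the abstract refers to, here is a minimal recursive sketch in C. It assumes a small graph stored as an adjacency matrix (Tarjan's paper works with adjacency lists, which give linear time; the matrix keeps the sketch short at O(V^2) cost). The names `dfs_number`, `dfs_visit`, and the capacity `MAX_V` are hypothetical. It assigns each vertex its preorder (discovery) number, the kind of vertex numbering that DFS-based algorithms build on.

```c
#include <assert.h>

#define MAX_V 64  /* hypothetical fixed capacity for this sketch */

/* Recursive depth-first search from v over an adjacency matrix.
   Gives each newly reached vertex the next preorder number (1, 2, 3, ...);
   number[w] == 0 means w has not been visited yet. */
static void dfs_visit(int n, int adj[][MAX_V], int v, int number[], int *counter) {
    number[v] = ++*counter;                      /* discover v */
    for (int w = 0; w < n; w++) {
        if (adj[v][w] && number[w] == 0) {
            dfs_visit(n, adj, w, number, counter);  /* tree edge v -> w */
        }
    }
}

/* Numbers every vertex, restarting the search from each still-unvisited
   vertex so that disconnected graphs are fully covered. */
void dfs_number(int n, int adj[][MAX_V], int number[]) {
    assert(n > 0 && n <= MAX_V);
    int counter = 0;
    for (int v = 0; v < n; v++) number[v] = 0;
    for (int v = 0; v < n; v++) {
        if (number[v] == 0) dfs_visit(n, adj, v, number, &counter);
    }
}
```

Recursion depth can reach n, so a large graph would need an explicit stack instead.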

Parallel Sparse Operations in Matlab:
Exploring Large Graphs
John R. Gilbert
University of California at Santa Barbara
Aydin Buluc (UCSB)
Brad McRae (NCEAS)
Steve Reinhardt (Interactive Supercomputing)
Viral Shah (ISC & UCSB)
with thanks to Alan Edelman (

Array Based Betweenness Centrality
Eric Robinson, Northeastern University and MIT Lincoln Labs; Jeremy Kepner, MIT Lincoln Labs
Vertex Betweenness Centrality: Which Vertices are Important?
Slower Communi

Enabling Rapid Development and
Execution of Advanced Graph Analysis
Algorithms on Very Large Graphs
Aydin Buluc, LBL (abuluc@lbl.gov)
John Gilbert and Adam Lugowski, UCSB ({gilbert,alugowski}@cs.ucsb.edu)
Steve Reinhardt, Mi

Introducing the Cray XMT
Petr Konecny
November 29th 2007
Agenda
Shared memory programming model
Benefits/challenges/solutions
Origins of the Cray XMT
Cray XMT system architecture
Cray XT infrastructure
Cray Threadstorm processor
Basic programming envir

CS 240A Assignment 4:
Betweenness Centrality in Graphs
Assigned April 26, 2011
Due by 11:59 pm Monday, May 9
In this assignment you'll parallelize a sequential program that explores a sparse graph. The
program computes a value called the betweenness centra
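For orientation, a sequential betweenness computation could look like the sketch below. It uses Brandes' two-phase method (a BFS from each source that counts shortest paths, then a reverse sweep that accumulates dependencies), which is a swapped-in choice and not necessarily what the assignment's sequential program does. The adjacency-matrix layout, `MAX_V`, and the function name `betweenness` are hypothetical.

```c
#include <assert.h>

#define MAX_V 64  /* hypothetical fixed capacity for this sketch */

/* Brandes-style betweenness centrality for an unweighted graph.
   bc[v] receives the sum over sources s and targets t (s != v != t) of
   sigma_st(v) / sigma_st; in an undirected graph each unordered pair
   {s, t} is therefore counted twice. */
void betweenness(int n, int adj[][MAX_V], double bc[]) {
    assert(n > 0 && n <= MAX_V);
    int dist[MAX_V], order[MAX_V];
    double sigma[MAX_V], delta[MAX_V];
    for (int v = 0; v < n; v++) bc[v] = 0.0;

    for (int s = 0; s < n; s++) {
        /* Phase 1: BFS from s counts shortest paths (sigma) and records
           vertices in nondecreasing distance order. */
        for (int v = 0; v < n; v++) { dist[v] = -1; sigma[v] = 0.0; delta[v] = 0.0; }
        dist[s] = 0;
        sigma[s] = 1.0;
        int head = 0, tail = 0;
        order[tail++] = s;
        while (head < tail) {
            int v = order[head++];
            for (int w = 0; w < n; w++) {
                if (!adj[v][w]) continue;
                if (dist[w] < 0) { dist[w] = dist[v] + 1; order[tail++] = w; }
                if (dist[w] == dist[v] + 1) sigma[w] += sigma[v];
            }
        }
        /* Phase 2: walk back from the farthest vertices; when w is reached,
           delta[w] is final, so pass its share to each predecessor v. */
        for (int i = tail - 1; i >= 0; i--) {
            int w = order[i];
            for (int v = 0; v < n; v++) {
                if (adj[v][w] && dist[v] >= 0 && dist[w] == dist[v] + 1)
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            }
            if (w != s) bc[w] += delta[w];
        }
    }
}
```

Each source s is processed independently apart from the final additions into `bc`, which makes the outer loop one natural place to look for parallelism.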

CS 240A: Applied Parallel Computing / Homework 3
Assigned April 12, 2010
Due by 11:59pm Monday, April 26
You may do this homework in groups of two; in fact, I prefer that you do so. You may
form groups however you want, but I encourage groups that have stud

CS 240A Assignment 2:
Solving Poisson's Equation by Conjugate Gradients
Assigned April 6, 2011
Due by 11:59 pm Monday, April 18
This assignment is to write a parallel program to implement the conjugate gradient algorithm
(CG for short), which solves a syst
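A minimal sequential CG sketch in C follows. It assumes the model problem is the 1D Poisson matrix tridiag(-1, 2, -1) applied matrix-free; the assignment's actual matrix, data layout, and stopping rule may differ. The names `cg_poisson1d` and `poisson1d_apply` and the criterion ||r|| <= tol * ||b|| are illustrative choices. The matrix-vector product and the dot products are the steps a parallel version would need to distribute.

```c
#include <assert.h>
#include <stdlib.h>

/* y = A x for the n-by-n 1D Poisson matrix tridiag(-1, 2, -1),
   applied without storing A (assumed model problem). */
static void poisson1d_apply(int n, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i < n - 1) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

/* Inner product of two length-n vectors. */
static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Conjugate gradients for A x = b starting from x = 0. Stops when
   ||r|| <= tol * ||b|| (squared norms are compared, so no sqrt is needed)
   or after max_iter steps; returns the number of iterations taken. */
int cg_poisson1d(int n, const double *b, double *x, double tol, int max_iter) {
    assert(n > 0);
    double *r  = malloc((size_t)n * sizeof *r);   /* residual b - A x */
    double *d  = malloc((size_t)n * sizeof *d);   /* search direction */
    double *Ad = malloc((size_t)n * sizeof *Ad);  /* A times d */
    assert(r != NULL && d != NULL && Ad != NULL);

    for (int i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; d[i] = b[i]; }
    double rr = dot(n, r, r);
    double stop = tol * tol * dot(n, b, b);
    int k = 0;
    while (k < max_iter && rr > stop) {
        poisson1d_apply(n, d, Ad);
        double alpha = rr / dot(n, d, Ad);        /* step length along d */
        for (int i = 0; i < n; i++) {
            x[i] += alpha * d[i];                 /* update approximate solution */
            r[i] -= alpha * Ad[i];                /* update residual */
        }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;                /* keeps directions A-conjugate */
        for (int i = 0; i < n; i++) d[i] = r[i] + beta * d[i];
        rr = rr_new;
        k++;
    }
    free(r);
    free(d);
    free(Ad);
    return k;
}
```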

Challenges in
Combinatorial Scientific Computing
John R. Gilbert
University of California, Santa Barbara
Georgia Tech CSE Colloquium
March 12, 2010
Support: NSF, DARPA, SGI
Combinatorial Scientific Computing
I observed that most of the
coefficients in our matri

% Matlab transcript from CS 240A, 20 April 2011
%
% You also need file CGmats.mat from the course web site
%
% "lugui" is part of the ncm package at
% http://www.mathworks.com/moler/ncmfilelist.html
A = randn(8)
lugui(A)
clc
clear
load CGmats
whos
Name S

load heatproblem
whos
Name Size Bytes Class Attributes
A 10000x10000 873608 double sparse
b 10000x1 80000 double
spy(A)
B = reshape(b,100,100);
surfc(B)
t = A \ b;
T = reshape(t,100,100);
surfc(T)

% Matlab diary for CS 240A Wednesday, May 4, 2011
%
% Routines and matrix files are from the meshpart toolbox:
% http://www.cerfacs.fr/algor/Softs/MESHPART/
% or from the CS 240A matlab directory:
% http://www.cs.ucsb.edu/~gilbert/cs240aSpr2011/matlab/
%
%

Breadth First Search
[Figure: example graph with source vertex s and vertices 1-9]
Breadth First Search
[Figure: shortest path from s; vertices labeled with their distance from s (0, 1, 2) and marked Undiscovered, Discovered, or Finished. Queue: s (top of queue marked)]
Breadth First Search
[Figure: next step of the same search. Queue: s 2 (top of queue marked)]
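The queue-driven search in these slides can be sketched in C as follows, assuming the graph is stored as an adjacency matrix; `bfs` and `MAX_V` are hypothetical names. Here `dist[v] == -1` plays the role of "Undiscovered", entering the queue corresponds to "Discovered", and a vertex is "Finished" once all of its neighbors have been examined.

```c
#include <assert.h>

#define MAX_V 64  /* hypothetical fixed capacity for this sketch */

/* Breadth-first search from source s over an adjacency matrix.
   dist[v] becomes the number of edges on a shortest path from s to v,
   or -1 if v is unreachable. Returns the number of vertices reached. */
int bfs(int n, int adj[][MAX_V], int s, int dist[]) {
    assert(n > 0 && n <= MAX_V && s >= 0 && s < n);
    int queue[MAX_V];
    int head = 0, tail = 0;
    for (int v = 0; v < n; v++) dist[v] = -1;   /* all undiscovered */
    dist[s] = 0;
    queue[tail++] = s;                          /* s is discovered */
    while (head < tail) {
        int v = queue[head++];                  /* take the top of the queue */
        for (int w = 0; w < n; w++) {
            if (adj[v][w] && dist[w] < 0) {
                dist[w] = dist[v] + 1;          /* w is one level farther than v */
                queue[tail++] = w;
            }
        }
        /* v is now finished: every neighbor has been discovered */
    }
    return tail;
}
```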

CS 240A: Solving Ax = b in parallel
Dense A: Gaussian elimination with partial pivoting (LU)
Same flavor as matrix * matrix, but more complicated
Sparse A: Gaussian elimination (Cholesky, LU, etc.)
Graph algorithms
Sparse A: Iterative methods Conjugate
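For the dense case, a sequential sketch of Gaussian elimination with partial pivoting in C is shown below. The fixed-size array bound `MAX_N` and the name `gepp_solve` are hypothetical, and production code would normally call a LAPACK routine such as `dgesv` instead.

```c
#include <assert.h>

#define MAX_N 16  /* hypothetical fixed capacity for this sketch */

static double absd(double v) { return v < 0 ? -v : v; }

/* Solves A x = b by Gaussian elimination with partial pivoting.
   A and b are overwritten: U ends up in the upper triangle of A and the
   multipliers of L below the diagonal. Returns 0 on success, or -1 if a
   zero pivot shows that A is singular. */
int gepp_solve(int n, double A[][MAX_N], double b[], double x[]) {
    assert(n > 0 && n <= MAX_N);
    for (int k = 0; k < n; k++) {
        /* Partial pivoting: pick the row with the largest |A[i][k]|. */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (absd(A[i][k]) > absd(A[p][k])) p = i;
        if (A[p][k] == 0.0) return -1;
        if (p != k) {                          /* swap rows k and p */
            for (int j = 0; j < n; j++) { double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t; }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* Eliminate entries below the pivot. */
        for (int i = k + 1; i < n; i++) {
            double m = A[i][k] / A[k][k];
            A[i][k] = m;                       /* store L(i,k) in place */
            for (int j = k + 1; j < n; j++) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }
    /* Back substitution with the upper triangle U. */
    for (int i = n - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < n; j++) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return 0;
}
```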

CS 240A:
Sources of Parallelism
in Physical Simulation
Based on slides from David Culler,
Jim Demmel, Kathy Yelick, et al., UCB CS267
Parallelism and Locality in Simulation
Real world problems have parallelism and locality:
Some objects may operate inde

Sources of Parallelism
in Physical Simulation
Based on slides from David Culler,
Jim Demmel, Kathy Yelick, et al., UCB CS267
Parallelism and Locality in Simulation
Real world problems have parallelism and locality:
Some objects may operate independently

CS 240A Final Project
What is it?
Significant parallel implementation / experiment by team of students
Performance / tuning will be part of grade
Can parallelize existing serial code (from research, public domain, etc.)
Can leverage your other classes or

CS 240A:
Parallel Prefix Algorithms
or
Tricks with Trees
Some slides from Jim Demmel,
Kathy Yelick, Alan Edelman,
and a cast of thousands
Parallel Vector Operations
Vector add: z = x + y
Embarrassingly parallel if vectors are aligned
DAXPY: z = a*x +
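A sketch of why aligned vector operations are embarrassingly parallel: if x, y, and z are split into the same contiguous blocks, each processor's slice of DAXPY touches only its own data and needs no communication. Here the p "processors" are simulated by a sequential loop, and `daxpy_blocked` and `daxpy_range` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* DAXPY on one processor's slice: z[i] = a*x[i] + y[i] for lo <= i < hi. */
static void daxpy_range(size_t lo, size_t hi, double a,
                        const double *x, const double *y, double *z) {
    for (size_t i = lo; i < hi; i++) z[i] = a * x[i] + y[i];
}

/* Splits indices 0..n-1 into p contiguous blocks, one per simulated
   processor, and runs them one after another. Because no element depends
   on any other, the blocks could run concurrently. */
void daxpy_blocked(size_t n, size_t p, double a,
                   const double *x, const double *y, double *z) {
    assert(p > 0);
    for (size_t k = 0; k < p; k++) {
        size_t lo = k * n / p, hi = (k + 1) * n / p;
        daxpy_range(lo, hi, a, x, y, z);
    }
}
```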

Parallel Prefix Algorithms,
or
Tricks with Trees
Some slides from Jim Demmel,
Kathy Yelick, Alan Edelman,
and a cast of thousands
Parallel Vector Operations
Vector add: z = x + y
Embarrassingly parallel if vectors are aligned
DAXPY: z = a*x + y (a is
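One way to see the log-depth idea behind parallel prefix is the recursive-doubling (Hillis-Steele style) scan sketched below in C. It is a swapped-in variant, not necessarily the algorithm these slides present, and the name `prefix_sum_doubling` is hypothetical. Each parallel round is simulated sequentially with a snapshot buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Inclusive prefix sum by recursive doubling. In round d = 1, 2, 4, ...,
   every element i >= d adds the value that was at position i - d before
   the round. All updates within a round are independent, so with n
   processors each round is one parallel step and the scan takes
   ceil(log2 n) rounds. */
void prefix_sum_doubling(int n, double *x) {
    assert(n >= 0);
    if (n == 0) return;                             /* nothing to scan */
    double *prev = malloc((size_t)n * sizeof *prev);
    assert(prev != NULL);
    for (int d = 1; d < n; d *= 2) {
        memcpy(prev, x, (size_t)n * sizeof *x);     /* snapshot before the round */
        for (int i = d; i < n; i++) x[i] = prev[i] + prev[i - d];
    }
    free(prev);
}
```

The doubling scan performs O(n log n) additions in total; tree-based up-sweep/down-sweep variants reduce the work to O(n) at the cost of roughly twice as many rounds.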