The Landscape of Parallel Computing Research: A View from Berkeley 2.0
Krste Asanovic, Ras Bodik, Jim Demmel, John Kubiatowicz, Kurt Keutzer, Edward Lee, George Necula, Dave Patterson, Koushik Sen, John Shalf, John Wawrzynek, and Kathy Yelick
June, 2007
Parallel Revolution, Ready or Not
Power Wall + Memory Wall + ILP Wall = Brick Wall
  End of uniprocessors
New Moore's Law is 2X cores per chip every technology generation, same clock
  Not a breakthrough but a retreat from even greater challenges that thwart efficient silicon implementation of traditional uniprocessor architectures
Sea change for HW & SW industries since changing programmer model, responsibilities
HW/SW industries bet their future that parallelism will succeed despite track record of failures
  Dead Computer Society: Illiac IV, Encore, Sequent, Transputer, Thinking Machines, MasPar, NCUBE, Kendall Square Research, Convex, (SGI)
Need a Fresh Approach
Berkeley researchers from many backgrounds meeting since Feb. 2005 to discuss parallelism
  Krste Asanovic, Ras Bodik, Jim Demmel, John Kubiatowicz, Kurt Keutzer, Edward Lee, George Necula, Dave Patterson, Koushik Sen, John Shalf, John Wawrzynek, Kathy Yelick
  Circuit design, computer architecture, massively parallel computing, computer-aided design, embedded hardware and software, programming languages, compilers, scientific programming, and numerical analysis
Tried to learn from successes in parallel embedded (BWRC) and high performance computing (LBNL)
Berkeley View Tech. Report: see view.eecs.berkeley.edu
This talk: next step in ideas, Berkeley View 2.0
  A. The computing platforms of the future
  B. Novel approach to design of future systems
  C. 7 questions to frame parallel research
Bell's Evolution Of Computer Classes
Technology enables two paths:
1. Increasing performance, same cost (and form factor)
[Figure: log price vs. time; constant price, increasing performance; classes shown: Mainframes, WS, PC]
2. Constant performance, decreasing cost (and form factor)
[Figure: log price vs. time; classes shown: Mainframe, Mini, WS, PC, Laptop, Handset, Ubiquitous; time axis label: 2010s]
A: Re-inventing Client/Server
The Datacenter is the Computer
  Building-sized computers: Google, MS
The Laptop/Handheld is the Computer
  Dell no. laptops > desktops? '06 Profit?
  1B cell phones/yr, increasing in function
  Apple iPhone
  Otellini demoed "Universal Communicator"
  '08: Combination cell phone, PC and video device
Laptop/Handheld as future client, Datacenter as future server
Laptop/Handheld Reality
Use with large displays at home or office
What % time disconnected? 10%? 30%? 50%?
Disconnectedness due to Economics
  Cell towers and support system expensive to maintain, so charge for use to recover costs, so it costs to communicate
  Policy varies, but most countries allow wireless investment where it makes most money
  Cities well covered; rural areas will never be covered
Disconnectedness due to Technology
  No coverage deep inside buildings
  Satellite communication uses up batteries
Need computation & storage in L/H
B: Try Application Driven Research?
Conventional Wisdom in CS Research
  Users don't know what they want
  Computer Scientists solve individual parallel problems with clever language feature (e.g., dataflow), new compiler pass, or novel hardware widget (e.g., SIMD)
  Approach: Push (foist?) CS nuggets/solutions on users
  Problem: Stupid users don't learn/use proper solution
Another Approach
  Work with domain experts developing compelling apps
  Provide HW/SW infrastructure necessary to build, compose, and understand parallel software written in multiple languages
  Research guided by commonly recurring patterns actually observed in compelling application codes
C: 7 Questions for Parallelism
Applications:
  1. What are the apps?
  2. What are kernels of apps?
Hardware:
  3. What are the HW building blocks?
  4. How to connect them?
Programming Model & Systems Software:
  5. How to describe apps and kernels?
  6. How to program the HW?
Evaluation:
  7. How to measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)
Apps and Kernels Tower: What are the problems?
Who needs 100 cores to run M/S Word?
  Failure of imagination? (CS education?)
  Need compelling apps that use 100s of cores
What about parallel benchmarks?
  Few examples (e.g., SPLASH, NAS)
  Optimized to old models, languages, architectures
How to invent parallel systems of future when tied to old code, programming models of the past?
Coronary Heart Disease
[Figure: Before / After]
Modeling to help patient compliance?
  400k deaths/year, 16M w. symptom, 72M BP
Massively parallel, Real-time variations
  CFD FE solid (non-linear), fluid (Newtonian), pulsatile
  Blood pressure, activity, habitus, cholesterol
Content Based Image Retrieval
[Figure labels: Query by example, Relevance Feedback, Image Database (1000s of images), Similarity Metric, Candidate Results, Final Result]
Built around Key Characteristics of personal databases
  Very large number of pictures (>5K)
  Non labeled images
  Many pictures of few people
  Complex pictures including people, events, places, and objects
Compelling Laptop/Handheld Apps
Health Coach
Since laptop/handheld always with you, record images of all meals, weigh plate before and after, analyze calories consumed so far
  What if I order a pizza for my next meal? A salad?
Since laptop/handheld always with you, record amount of exercise so far, show how body would look if maintain this exercise and diet pattern next 3 months
  What would I look like if I regularly ran 2 miles? 4 miles?
Face Recognizer/Name Whisperer
  Laptop/handheld scans faces, matches image database, whispers name in ear (relies on Content Based Image Retrieval)
Compelling Laptop/Handheld Apps
Meeting Diarist
  Laptops/Handhelds at meeting coordinate to create speaker identified, partially translated text diary of meeting
Teleconference speaker identifier, speech helper
  L/Hs used for teleconference, identifies who is speaking, closed caption hint of what being said
Compelling Laptop/Handheld Apps
Music Enhancer
  Enhanced sound delivery systems for home sound systems using large microphone and speaker arrays
  Laptop/Handheld recreate 3D sound over ear buds
Hearing Augmenter
  Laptop/Handheld as accelerator for hearing aid
Novel Instrument User Interface
  New composition and performance systems beyond keyboards
  New forms of bodily engagement when performing musical material
  Input device for Laptop/Handheld
Berkeley Center for New Music and Audio Technology (CNMAT) created a compact loudspeaker array in a single location with a radiation pattern controlled by onboard FPGA. 10-inch-diameter icosahedron incorporating 120 1.25-inch drivers developed by Meyer Sound, each driven from a separate audio channel, plus circuit boards for all of the control and class-D amplification functions. Laptop controls array via Gigabit Ethernet.
Apps and Kernels Tower: What are the problems?
Who needs 100 cores to run M/S Word?
  Failure of imagination? (CS education?)
  Need compelling apps that can use 100 cores
What about parallel benchmarks?
  Few examples (e.g., SPLASH, NAS)
  Optimized to old models, languages, architectures
How to invent parallel systems of future when tied to old code, programming models of the past?
Can find patterns widely used?
Look for common patterns of communication and computation
1. Embedded Computing (EEMBC benchmark)
2. Desktop/Server Computing (SPEC2006)
3. Data Base / Text Mining Software
   Advice from Jim Gray of Microsoft and Joe Hellerstein of UC
4. Games/Graphics/Vision
5. Machine Learning
   Advice from Mike Jordan and Dan Klein of UC Berkeley
6. High Performance Computing (Original 7 Dwarfs)
Result: 13 Dwarfs
Dwarf Popularity (Red Hot / Blue Cool)
[Figure: heat map of dwarf popularity by application area. Columns: Embed, SPEC, DB, Games, ML, HPC. Rows: 1 Finite State Mach., 2 Combinational, 3 Graph Traversal, 4 Structured Grid, 5 Dense Matrix, 6 Sparse Matrix, 7 Spectral (FFT), 8 Dynamic Prog, 9 N-Body, 10 MapReduce, 11 Backtrack/B&B, 12 Graphical Models, 13 Unstructured Grid]
13 Dwarfs (so far)
1. Finite State Machine
2. Combinational Logic
3. Graph Traversal
4. Structured Grids
5. Dense Linear Algebra
6. Sparse Linear Algebra
7. Spectral Methods (FFT)
8. Dynamic Programming
9. N-Body Methods
10. MapReduce
11. Back-track/Branch & Bound
12. Graphical Model Inference
13. Unstructured Grids
Claim: parallel arch., lang., compiler must do at least these well to do future parallel apps well
Note: MapReduce is embarrassingly parallel; perhaps FSM is embarrassingly sequential?
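The contrast in the note above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: `map_reduce`, `run_fsm`, and the parity machine are hypothetical names and workloads chosen for the example. The map calls do not depend on each other, while every FSM step needs the state produced by the previous step.

```python
from functools import reduce

def map_reduce(items, mapper, reducer, initial):
    """Apply mapper to each item independently, then fold the results."""
    # Each mapper call is independent, so in principle each could run on its own core.
    mapped = [mapper(x) for x in items]
    return reduce(reducer, mapped, initial)

def run_fsm(transitions, start, inputs):
    """Step a finite state machine; each step needs the state left by the previous one."""
    state = start
    for symbol in inputs:
        state = transitions[(state, symbol)]
    return state

# Hypothetical workloads, chosen only for illustration.
total_length = map_reduce(["dense", "sparse", "fft"], len, lambda a, b: a + b, 0)

# Parity machine: tracks whether an even or odd number of 1s has been seen.
PARITY = {("even", 0): "even", ("even", 1): "odd",
          ("odd", 0): "odd", ("odd", 1): "even"}
final_state = run_fsm(PARITY, "even", [1, 0, 1, 1])
```

The loop in `run_fsm` carries a dependence from one iteration to the next, which is what makes a direct parallelization hard.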
Dwarf Popularity (Red Hot / Blue Cool)
How do compelling apps relate to 13 dwarfs?
[Figure: heat map of dwarf popularity. Columns: Embed, SPEC, DB, Games, ML, HPC, Health, Image, Speech, Music. Rows: Finite State Mach., Combinational, Graph Traversal, Structured Grid, Dense Matrix, Sparse Matrix, Spectral (FFT), Dynamic Prog, N-Body, MapReduce, Backtrack/B&B, Graphical Models, Unstructured Grid]
Dwarf Popularity (Red Hot / Blue Cool)
Stanford Transactional Apps for Multi Processing? (http://stamp.stanford.edu)
[Figure: the same heat map (columns: Embed, SPEC, DB, Games, ML, HPC, Health, Image, Speech, Music; rows: the 13 dwarfs), annotated with applications: Kmeans Clustering, Vacation Reservation System, Gene Sequencing, Delaunay mesh generation]
4 Valuable Roles of Dwarfs
1. Anti-benchmarks: not tied to code or language artifacts, encourage innovation in algorithms, languages, data structures, and/or hardware
2. Universal, understandable vocabulary to talk across disciplinary boundaries
3. Define building blocks for creating libraries & frameworks that cut across app domains
4. They decouple research, allowing analysis of HW & SW engineering and programming without waiting years for full apps
Berkeley View 2.0: Apps Tower
Application-Driven Research vs. CS Solution-Driven Research
Drill down on 4 app areas to guide research agenda
Dwarfs to represent broader set of apps to guide research agenda
Dwarfs help break through traditional interfaces
  Benchmarking, multidisciplinary conversations, target for libraries, and parallelizing parallel research
7 Questions for Parallelism
Applications:
  1. What are the apps?
  2. What are kernels of apps?
Hardware:
  3. What are the HW building blocks?
  4. How to connect them?
Programming Model & Systems Software:
  5. How to describe apps and kernels?
  6. How to program the HW?
Evaluation:
  7. How to measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)
How do we describe apps and kernels?
Observation: Use Dwarfs. Dwarfs are of 2 types
Libraries
  Dense matrices
  Sparse matrices
  Spectral
  Combinational
  Finite state machines
Patterns/Frameworks
  MapReduce
  Graph traversal, graphical models
  Dynamic programming
  Backtracking/B&B
  N-Body
  (Un) Structured Grid
Algorithms in the dwarfs can either be implemented as:
  Compact parallel computations within a traditional library
  Compute/communicate pattern implemented as framework
Computations may be viewed at multiple levels: e.g., an FFT library may be built by instantiating a Map-Reduce framework, mapping 1D FFTs and then transposing (generalize reduce)
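As a rough sketch of the FFT example above, a 2D transform can be written as a map of 1D transforms over rows, a transpose, and a second map. The code below is an assumption-laden illustration: it uses a naive O(n^2) DFT in place of a tuned FFT, and the function names (`dft`, `transpose`, `dft2d`, `dft2d_direct`) are made up for this example.

```python
import cmath

def dft(xs):
    """Naive O(n^2) 1D discrete Fourier transform (stands in for a tuned FFT)."""
    n = len(xs)
    return [sum(xs[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]

def dft2d(grid):
    """2D transform: map 1D DFT over rows, transpose, map again, transpose back."""
    step1 = list(map(dft, grid))               # independent per row
    step2 = list(map(dft, transpose(step1)))   # independent per column
    return transpose(step2)

def dft2d_direct(grid):
    """Reference: the 2D DFT written straight from its definition."""
    r, c = len(grid), len(grid[0])
    return [[sum(grid[m][n] * cmath.exp(-2j * cmath.pi * (u * m / r + v * n / c))
                 for m in range(r) for n in range(c))
             for v in range(c)] for u in range(r)]
```

Each `map` call touches one row at a time, so the rows could be handed to different cores; the transpose is the only step that moves data between them.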
Composing dwarfs to build apps
Any parallel application of arbitrary complexity may be built by composing parallel and serial components
  Serial code invoking parallel libraries, e.g., FFT, matrix ops., ...
  Parallel patterns with serial plug-ins, e.g., MapReduce
Composition is hierarchical
Programming the Hardware
2 types of programmers, 2 layers
Productivity Layer (90% of today's programmers)
  Domain experts / Naive programmers productively build parallel apps using frameworks & libraries
  Frameworks & libraries composed to form app frameworks
Efficiency Layer (10% of today's programmers)
  Expert programmers build:
    Frameworks: software that supports general structural patterns of computation and communication: e.g., MapReduce
    Libraries: software that supports compact computational expressions: e.g., Sketch for Combinational or Grid computation
  Bare metal efficiency possible at Efficiency Layer
Effective composition techniques allow the efficiency programmers to be highly leveraged
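The split between the two layers might look like the following minimal sketch. It assumes Python's standard `concurrent.futures` thread pool as the parallel substrate; `parallel_map_reduce`, `count_words`, and `add` are hypothetical names, not part of any framework named in the talk. Note that CPython threads do not speed up CPU-bound work, so this shows the division of labor rather than a performance result.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Efficiency layer: written once by an expert, hides the thread pool from callers.
def parallel_map_reduce(mapper, reducer, items, initial, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, items))  # results come back in input order
    return reduce(reducer, mapped, initial)

# Productivity layer: a domain expert supplies only serial plug-ins.
def count_words(line):
    return len(line.split())

def add(a, b):
    return a + b

lines = ["the datacenter is the computer", "the laptop is the computer"]
word_total = parallel_map_reduce(count_words, add, lines, 0)
```

The productivity-layer code contains no threads, locks, or core counts; those choices stay inside the framework function.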
Content Based Image Retrieval: Extract Features
Summarize image meaningfully and distinctively
[Figure: RGB Histogram]
Global and Segment
  RGB and HSV Histograms
  Average gray value
  Color moments (mean, variance, kurtosis)
  DCT coefficients
  Edge direction histogram
  RGB and HSV Correlograms
  Wavelet transforms (tree structured and pyramid)
  Scale Invariant Feature Transform histogram
  Color Coherence Vector
  Geometry from extracted faces
Coordination & Composition in CBIR Application
Parallelism in CBIR is hierarchical
Mostly independent tasks/data with combining
[Figure: hierarchy of parallelism in the CBIR feature extraction stage]
  Input: stream of images
  stream parallel over images
  feature extraction: task parallel over extraction algorithms (Face Recog, DCT, DWT, ...)
  DCT extractor: data parallel map DCT over tiles; combine: reduction on histograms from each tile; output: one histogram (feature vector)
  combine: concatenate feature vectors
  Output: stream of feature vectors
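The tile-level part of this hierarchy can be sketched as follows. This is a simplified stand-in, not the talk's extractor: it histograms gray values per tile instead of computing a DCT, and `partition_tiles`, `tile_histogram`, `combine`, `image_feature`, and `extract_features` are hypothetical names. The structure (map over tiles, reduce histograms, outer loop over images) is what the sketch is meant to show.

```python
from functools import reduce

def partition_tiles(image, tile):
    """Split a 2D list of gray values into non-overlapping tile x tile blocks."""
    h, w = len(image), len(image[0])
    return [[row[c:c + tile] for row in image[r:r + tile]]
            for r in range(0, h, tile) for c in range(0, w, tile)]

def tile_histogram(block, bins=4, max_value=256):
    """Histogram of one tile; reads only its own tile, so tiles are independent."""
    hist = [0] * bins
    for row in block:
        for v in row:
            hist[v * bins // max_value] += 1
    return hist

def combine(h1, h2):
    """Reduction step: add two histograms bin by bin."""
    return [a + b for a, b in zip(h1, h2)]

def image_feature(image, tile=2):
    hists = map(tile_histogram, partition_tiles(image, tile))  # data parallel map over tiles
    return reduce(combine, hists)                               # reduction across tiles

def extract_features(images):
    return [image_feature(img) for img in images]               # stream parallel over images
```

Because `partition_tiles` hands each mapper a separate block, the independence that the framework needs is established by the partitioner rather than by the plug-in code.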
Coordination & Composition Language
Coordination & Composition language for productivity
2 key challenges
1. Correctness: ensuring independence using decomposition operators, copying and requirements specifications on frameworks
2. Efficiency: resource management during composition; domain-specific OS/runtime support
Language control features hide core resources, e.g.,
  Map DCT over tiles in language becomes set of DCTs/tiles per core
  Hierarchical parallelism managed using OS mechanisms
Data structures hide memory structures
  Partitioners on arrays, graphs, trees produce independent data
  Framework interfaces give independence requirements: e.g., map-reduce function must be independent, either by copying or application to partitioned data object (set of tiles from partitioner)
How do we program the HW? What are the problems?
For parallelism to succeed, must provide productivity, efficiency, and correctness simultaneously for scalable hardware
  Can't make SW productivity even worse!
  Why do in parallel if efficiency doesn't matter?
  Correctness usually considered orthogonal problem
  Productivity slows if code incorrect or inefficient
Most programmers not ready to produce correct parallel programs
  IBM SP customer escalations: concurrency bugs worst, can take months to fix
Ensuring Correctness
Productivity Layer:
  Enforce independence of tasks using decomposition (partitioning) and copying operators
  Goal: Remove concurrency errors (nondeterminism from execution order, not just low level data races)
  E.g., the race-free program "atomic delete + atomic insert" does not compose to an "atomic replace"; need higher level properties, not solved with transactions (or locks)
Efficiency Layer: Check for subtle concurrency bugs (races, deadlocks, and so on)
  Mixture of verification and automated directed testing
  Error detection on framework and libraries; some techniques applicable to third-party software
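The delete-plus-insert example can be made concrete with a small sketch. Everything here is illustrative: `AtomicSet` and `replace` are hypothetical, and the `between` hook is an assumption used to stand in, deterministically, for another thread that happens to run between the two atomic steps.

```python
import threading

class AtomicSet:
    """Each method is individually atomic (guarded by one lock)."""
    def __init__(self, items=()):
        self._lock = threading.Lock()
        self._items = set(items)

    def delete(self, x):
        with self._lock:
            self._items.discard(x)

    def insert(self, x):
        with self._lock:
            self._items.add(x)

    def snapshot(self):
        with self._lock:
            return frozenset(self._items)

def replace(s, old, new, between=lambda: None):
    """NOT atomic: another thread may run between the two atomic steps.
    `between` is a hypothetical hook that stands in for such a thread."""
    s.delete(old)
    between()
    s.insert(new)

s = AtomicSet({"v1"})
observed = []
replace(s, "v1", "v2", between=lambda: observed.append(s.snapshot()))
```

There is no data race here, since every access holds the lock, yet the observer sees a state that contains neither the old nor the new value. That intermediate state is the higher-level property the slide says must be ruled out.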
Support Software: What are the problems?
Compilers and Operating Systems are large, complex, resistant to innovation
  (Worst possible time to be resistant to innovation!)
Takes a decade for compiler innovations to show up in production compilers?
Time for idea in SOSP to appear in production OS?
Traditional OSes brittle, insecure, memory hogs
  Traditional monolithic OS image uses lots of precious memory, multiplied 100s to 1000s of times (e.g., AIX uses GBs of DRAM / CPU)
Deconstructing Operating Systems
Resurgence of interest in virtual machines
  Hypervisor: thin SW layer btw guest OS and HW
Future OS: libraries where only functions needed are linked into app, on top of thin hypervisor providing protection and sharing of resources
  Opportunity for OS innovation
Leverage HW partitioning support for very thin hypervisors, and to allow software full access to hardware within partition
21st Century Code Generation
Problem: generating optimal code is like searching for a needle in a haystack
New approach: Auto-tuners
  First run variations of program on computer to heuristically search for best combinations of optimizations (blocking, padding, ...) and data structures, then produce C code to be compiled for that computer
  E.g., PHiPAC (BLAS), Atlas (BLAS), Spiral (DSP), FFT-W
[Figure: search space for matmul block sizes; axes are block dimensions, temperature is speed]
Example: Sparse Matrix (SPMv) for 3 multicores
  Fastest SPMv: 2X OSKI/PETSc on Intel Clovertown, 4X on AMD Opteron
  Optimization space: register blocking, cache blocking, TLB blocking, prefetching/DMA options, NUMA, BCOO v. BCSR data structures, 16b v. 32b indices, ...
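A toy version of the auto-tuning loop described above might look like this. It is a sketch under stated assumptions: it searches only over one parameter (the block size of a tiled matrix multiply), it picks the fastest timing rather than emitting C code as the tools named on the slide do, and `blocked_matmul` and `autotune` are hypothetical names.

```python
import time

def blocked_matmul(a, b, n, block):
    """n x n matrix multiply over lists of rows, tiled with the given block size."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c

def autotune(n, candidates, repeats=3):
    """Time every candidate block size on this machine and return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings

best_block, timings = autotune(n=32, candidates=[4, 8, 16, 32])
```

The winning block size depends on the machine the search runs on, which is the point of tuning empirically instead of relying on a fixed compiler heuristic.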
Example: Sparse Matrix * Vector
Name              Clovertown    Opteron      Cell
Chips*Cores       2*4 = 8       2*2 = 4      1*8 = 8
Clock Rate        2.3 GHz       2.2 GHz      3.2 GHz
Peak MemBW        21.3 GB/s     21.3 GB/s    25.6 GB/s
Peak GFLOPS       74.6 GF       17.6 GF      14.6 GF (DP Fl. Pt.)
Naive SPMv        1.0 GF        0.6 GF       -
Efficiency %      1%            3%           -
Autotune SPMv     1.5 GF        1.9 GF       3.4 GF
Auto Speedup      1.5X          3.2X
Architecture (Clovertown/Opteron): 4-/3-issue, 2-/1-SSE3, OOO, caches, prefetch
Architecture (Cell): 2-VLIW, SIMD, local store, DMA
Naive SPMv is the median of many matrices.
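For readers who have not seen the kernel being tuned, here is a minimal sketch of a naive sparse matrix-vector multiply. It assumes the common compressed sparse row (CSR) layout, which is not one of the formats named in the talk (those are BCOO and BCSR); `spmv_csr` and `percent_of_peak` are hypothetical helper names. The second function only reproduces the efficiency arithmetic from the figures on the slide.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

def percent_of_peak(achieved_gflops, peak_gflops):
    """Efficiency as a percentage of peak floating-point rate."""
    return 100.0 * achieved_gflops / peak_gflops

# 3 x 3 example matrix (illustrative only):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The indirect access `x[col_idx[k]]` is what makes the kernel memory-bound and sensitive to blocking and prefetching choices. With the slide's numbers, 1.0 GF against a 74.6 GF peak is a little over 1% of peak, and 0.6 GF against 17.6 GF is a little over 3%.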
Software Span Summary
Greater productivity and efficiency for SPMv?
  Parallelizing compiler + multicore + multilevel caches + SW and HW prefetching
  Autotuner + multicore + local store + DMA
SPMv: Easier to autotune single local store + DMA than multilevel caches + HW and SW prefetching
Deconstructed OS relying on thin hypervisors
Productivity Layer & Efficiency Layer
C&C Language to compose Libraries/Frameworks
7 Questions for Parallelism
Applications:
  1. What are the apps?
  2. What are kernels of apps?
Hardware:
  3. What are the HW building blocks?
  4. How to connect them?
Programming Model & Systems Software:
  5. How to describe apps and kernels?
  6. How to program the HW?
Evaluation:
  7. How to measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)
Hardware Tower: What are the problems?
Multicore (2, 4) v. Manycore (64, 128)?
How can novel architectural support improve productivity, efficiency, and correctness for scalable hardware?
  Efficiency instead of performance, to capture energy as well as performance
  Also, power, design and verification costs, low yield, higher error rates
HW Solution: Small is Beautiful
Expect modestly pipelined (5- to 9-stage) CPUs, FPUs, vector, SIMD PEs
  Small cores not much slower than large cores
Parallel is energy efficient path to performance: CV^2F
  Lower threshold and supply voltages lowers energy per op
Redundant processors can improve chip yield
  Cisco Metro 188 CPUs + 4 spares; Sun Niagara sells 6 or 8 CPUs
Small, regular processing elements easier to verify
One size fits all?
  Amdahl's Law: Heterogeneous processors
  Special function units to accelerate popular functions
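The two formulas behind this slide can be checked with a few lines of arithmetic. The sketch below uses the standard dynamic power expression C * V^2 * F and the standard form of Amdahl's Law; the 0.8 voltage scaling factor and the 90% parallel fraction are assumed values picked for illustration, not figures from the talk.

```python
def dynamic_power(c, v, f):
    """Switching power of CMOS logic: capacitance * voltage^2 * frequency."""
    return c * v ** 2 * f

def amdahl_speedup(parallel_fraction, n):
    """Amdahl's Law: speedup on n processors when only part of the work is parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# One core at nominal voltage and clock (normalized units).
one_fast_core = dynamic_power(c=1.0, v=1.0, f=1.0)

# Two cores at half the clock each; the 0.8 voltage scaling is an assumed figure.
two_slow_cores = 2 * dynamic_power(c=1.0, v=0.8, f=0.5)

# With an assumed 90% parallel fraction, 100 cores give well under 10x.
speedup_100 = amdahl_speedup(0.9, 100)
```

Under these assumptions the two slower cores deliver the same aggregate clock cycles for about 64% of the power, while the Amdahl figure shows why the serial remainder motivates keeping some faster or specialized units on the chip.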
HW features supporting Parallel SW
Want Composable Primitives, Not Packaged Solutions
  Transactional Memory is usually a Packaged Solution
Partitions
Fast Barrier Synchronization & Atomic Fetch-and-Op
Active messages plus user-level event handling
  Used by parallel language runtimes to provide fast communication, synchronization, thread scheduling
Configurable Memory Hierarchy (Cell v. Clovertown)
  Can configure on-chip memory as cache or local store
  Programmable DMA to move data without occupying CPU
  Cache coherence: Mostly HW but SW handlers for complex cases
  Hardware logging of memory writes to allow rollback
Partitions
[Figure: InfiniCore chip with 16x16 tile array]
Partition: hardware-isolated group
  Chip divided into hardware-isolated partitions, under control of supervisor software
  User-level software has almost complete control of hardware inside partition
  Power-of-2 size, naturally aligned
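The "power-of-2 size, naturally aligned" rule can be expressed as a short check. This sketch assumes tiles are numbered linearly from 0 (so a 16x16 array has tiles 0 to 255) and that "naturally aligned" means the first tile index is a multiple of the partition size; both are assumptions for illustration, and `is_valid_partition` and `partition_bases` are hypothetical names.

```python
def is_valid_partition(base, size):
    """True if `size` is a power of two and `base` is a multiple of `size`."""
    power_of_two = size > 0 and (size & (size - 1)) == 0
    return power_of_two and base % size == 0

def partition_bases(size, total_tiles=256):
    """All legal starting tiles for partitions of a given size."""
    return [b for b in range(0, total_tiles, size) if is_valid_partition(b, size)]
```

One consequence of the rule, under these assumptions, is that a partition can be named by its base index and size alone, and two partitions of the same size never overlap partially.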
Multiple Fast Barrier Networks
Multiple parallel barrier networks (1 bit each) at each level
Signals propagate combinationally, with varying clock divisor (optimize latency not bandwidth)
Hypervisor sets taps saying where partition sees barrier
[Figure: distributed barrier logic across tiles, with a tap for an 8-way partition and a tap for a 4-way partition]
7 Questions for Parallelism
Applications:
  1. What are the apps?
  2. What are kernels of apps?
Hardware:
  3. What are the HW building blocks?
  4. How to connect them?
Programming Model & Systems Software:
  5. How to describe apps and kernels?
  6. How to program the HW?
Evaluation:
  7. How to measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)
Measuring Success: What are the problems?
1. Only companies can build HW, and it takes years
2. Software people don't start working hard until hardware arrives
   3 months after HW arrives, SW people list everything that must be fixed, then we all wait 4 years for next iteration of HW/SW
3. How to quickly get 100-1000 CPU systems in hands of researchers to let them innovate in algorithms, compilers, languages, OS, architectures, ASAP?
4. Can avoid waiting 4 years + 3 months between HW/SW iterations?
Build Academic Manycore from FPGAs
As 10 CPUs will fit in Field Programmable Gate Array (FPGA), 1000-CPU system from 100 FPGAs?
  8 32-bit simple soft core RISC at 100 MHz in 2004 (Virtex-II)
  FPGA generations every 1.5 yrs; 2X CPUs, 1.2X clock rate
HW research community does logic design (gate shareware) to create out-of-the-box Manycore
  E.g., 1000 processor, standard ISA binary-compatible, 64-bit, cache-coherent supercomputer @ 150 MHz/CPU in 2007
  Ideal for heterogeneous chip architectures
  RAMPants: 10 faculty at Berkeley, CMU, MIT, Stanford, Texas, and Washington
Research Accelerator for Multiple Processors as a vehicle to lure more researchers to parallel challenge and decrease time to parallel salvation
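The scaling rule quoted above (a generation every 1.5 years, 2X CPUs and 1.2X clock per generation) can be turned into a rough projection. The sketch below is back-of-the-envelope arithmetic only; `project_soft_cores` is a hypothetical helper, and it extrapolates the simple 32-bit soft cores from the 2004 data point rather than the 64-bit cache-coherent cores mentioned for the 2007 target.

```python
def project_soft_cores(year, base_year=2004, cores=8, mhz=100.0,
                       years_per_generation=1.5, core_growth=2.0, clock_growth=1.2):
    """Extrapolate cores per FPGA and clock rate from the 2004 data point."""
    generations = int((year - base_year) / years_per_generation)
    return cores * core_growth ** generations, mhz * clock_growth ** generations

cores_2007, mhz_2007 = project_soft_cores(2007)
```

Two generations after 2004 this gives 32 simple cores per FPGA at about 144 MHz, which is in the same range as the 150 MHz/CPU figure quoted for 2007.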
768 CPU RAMP Blue
(Wawrzynek, Krasnov, at Berkeley)
768 = 12 32-bit RISC cores / FPGA, 4 FPGAs/board, 16 boards, $10k/bd
  Simple MicroBlaze soft cores @ 90 MHz
Full star-connection between modules
  1008-node RAMP Blue soon (21 boards)
NASA Advanced Supercomputing (NAS) Parallel Benchmarks (all class S)
  UPC versions (C plus shared-memory abstraction): CG, EP, IS, MG
  DEMO?
RAMPants creating HW & SW for manycore community using next gen FPGAs
  Chuck Thacker & Microsoft designing next boards
  3rd party to manufacture and sell boards: 1H08
Universities help start 19 $1B+ Industries
$1B+ Industry
1. Timesharing
2. Client/server
3. Graphics
4. Entertainment
5. Internet
6. LANs
7. Workstations
8. GUI
9. VLSI design
10. RISC processors
11. Relational DB
12. Parallel DB
13. Data mining
14. Parallel computing
15. RAID disk arrays
16. Portable comm.
17. World Wide Web
18. Speech recognition
19. Broadband last mile
Total per university: Cal 7, Caltech 2, CERN 2, CMU 3, Illinois 2, MIT 5, Purdue 1, Rochester 1, Stanford 4, Tokyo 1, UCLA 2, Utah 1, Wisc. 2
Source: Innovation in Information Technology, National Research Council Press, 2003.
Change directions of research funding for parallelism?
Historically: Get leading experts per discipline (across US) working together to work on parallelism
[Figure: sites (Cal, UIUC, MIT, Stanford) against disciplines (Application, Language, Compiler, Libraries, Networks, Architecture, Hardware, CAD)]
Change directions of research funding for parallelism?
To increase cross-disciplinary bandwidth, get experts per site collaborating on (app-driven) parallelism
[Figure: sites (Cal, UIUC, MIT, Stanford) against disciplines (Application, Language, Compiler, Libraries, Networks, Architecture, Hardware, CAD)]
Summary: A Berkeley View 2.0
Our industry has bet its future on parallelism (!)
Goal: Productive, Efficient, Correct Parallel Programs while doubling number of cores every 2 years (!)
Try Apps-Driven vs. CS Solution-Driven Research
  Laptop-Handhelds/Datacenters as modern Client/Server
13 Dwarfs as lingua franca, anti-benchmarks
Composition is critical to parallel computing success
  Composition is open problem for Transactional Memory
  Use C&C Lang to reuse experts' code
  Create libraries, frameworks, for use in productivity layer
Productivity layer for 90% of today's programmers
Efficiency layer for 10% of today's programmers
Autotuners over Parallelizing Compilers
OS & HW: Composable Primitives over CS Solutions
Acknowledgments
Berkeley View material comes from discussions with:
  Profs Krste Asanovic, Ras Bodik, Jim Demmel, Kurt Keutzer, John Kubiatowicz, John Wawrzynek, Kathy Yelick
  UCB Grad students Bryan Catanzaro, Jike Chong, Joe Gebis, William Plishker, and Sam Williams
  LBNL: Parry Husbands, Bill Kramer, Lenny Oliker, John Shalf
See view.eecs.berkeley.edu
RAMP based on work of RAMP Developers:
  Krste Asanovic (MIT/Berkeley), Derek Chiou (Texas), James Hoe (CMU), Christos Kozyrakis (Stanford), Shih-Lien Lu (Intel), Mark Oskin (Washington), David Patterson (Berkeley, Co-PI), and John Wawrzynek (Berkeley, PI)
See ramp.eecs.berkeley.edu
USC - CS - 653
CSCI653: High Performance Computing & SimulationsAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of
USC - CS - 653
Molecular Dynamics I: PrinciplesBasics of the molecular-dynamics (MD) method1-3 are described, along with corresponding data structures in program, md.c.Newtons Second Law of MotionTRAJECTORY, COORDINATE, AND ACCELERATION Physical system = a set
USC - CS - 653
Minimal Complex Analysis Complex function: A mapping from a complex variable z = x + iy (i = f(z) C."1) to a complex numberDifferentiation: A complex function f(z) at z is differentiable if the quantity ! f (z + "z) # f (z)"zconverges to a
USC - CS - 653
JOURNALOF COMPUTATIONALPHYSICS73, 315-348 (1987)A Fast Algorithmfor ParticleANDSimulations*L. GREENCARDV. ROKHLINDepartment of Computer Science, Yale Lnipersiry, New Haven, Connecticut 06520 Received June 10. 1986; revised February
USC - CS - 653
ELSEYIER2s -/.-I@Computer Physics Communications Computer Physics Communications ( 1997)59-69 104Parallel multilevel preconditioned conjugate-gradient approach to variable-charge molecular dynamicsAiichiro Nakano Depurtment of Computer Scienc
USC - CS - 653
Computer Physics Communications 153 (2003) 445461 www.elsevier.com/locate/cpcScalable and portable implementation of the fast multipole method on parallel computers Shuji Ogata a , Timothy J. Campbell b , Rajiv K. Kalia c,d , Aiichiro Nakano c,d,
USC - CS - 653
Reversiblemultipletime scale moleculardynamicsM. Tuckermar?) G. J. Martynaand B. J. BerneDepartment of Chemistry, Columbia University, New York, New York 10027 Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvani
USC - CS - 653
Computer Physics CommunicationsELSEVIERComputer Physics Communications 83 (1994) 197214Multiresolution molecular dynamics algorithm for realistic materials modeling on parallel computersAiichiro Nakano, Rajiv K. Kalia, Priya VashishtaConcurrent
USC - CS - 653
Message Passing Interface (MPI) ProgrammingMPI (Message Passing Interface) is a standard message passing system that enables us to write and run applications on parallel computers. In 1992, MPI Forum was formed to develop a portable message passing
USC - CS - 653
OpenMP ProgrammingAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of Southern California Email: ana
USC - CS - 653
Hybrid MPI+OpenMP Parallel MDAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of Southern California
USC - CS - 653
Parallel Molecular DynamicsThis chapter explains the example parallel MD program, pmd.c, in detail.Spatial Decomposition Spatial decomposition: The physical system to be simulated is partitioned into subsystems of equal volume. Processors in a pa
USC - CS - 653
Parallel Molecular DynamicsAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of Southern California E
USC - CS - 653
Scalability Metrics for Parallel Molecular DynamicsParallel EfficiencyWe define the efficiency of a parallel program running on P processors to solve a problem of size W. Let T(W, P) be the execution time of this parallel program. Speed of the prog
USC - CS - 653
Anton, a Special-Purpose Machine for Molecular Dynamics SimulationDavid E. Shaw, Martin M. Deneroff, Ron O. Dror, Jeffrey S. Kuskin, Richard H. Larson, John K. Salmon, Cliff Young, Brannon Batson, Kevin J. Bowers, Jack C. Chao, Michael P. Eastwood,
USC - CS - 653
A Fast, Scalable Method for the Parallel Evaluation of Distance-Limited Pairwise Particle InteractionsDAVID E. SHAWD. E. Shaw Research and Development, LLC and Center for Computational Biology and Bioinformatics, Columbia University, 120 W. 45th S
USC - CS - 653
Divide-and-Conquer Parallelization ParadigmDivide-and-Conquer Simulation Algorithms Divide-and-conquer (DC) algorithms: Recursively partition a problem into subprogram of roughly equal size. If subprogram can be solved independently, there is a pos
USC - CS - 653
Divide-&-Conquer ParallelismAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of Southern California
USC - CS - 653
Computational Materials Science 38 (2007) 642652 www.elsevier.com/locate/commatsciA divide-and-conquer/cellular-decomposition framework for million-to-billion atom simulations of chemical reactionsAiichiro Nakano a,*, Rajiv K. Kalia a, Ken-ichi No
USC - CS - 653
DE NOVO ULTRASCALE ATOMISTIC SIMULATIONS ON HIGH-END PARALLEL SUPERCOMPUTERSAiichiro Nakano Rajiv K. Kalia1 Ken-ichi Nomura1 Ashish Sharma1, 2 Priya Vashishta1 Fuyuki Shimojo1,3 4 Adri C. T. van Duin 4 William A. Goddard, III 5 Rupak Biswas Deepak S
USC - CS - 653
Load BalancingAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of Southern California Email: anakano
USC - CS - 653
Lanczos Method for EigensystemsAiichiro NakanoCollaboratory for Advanced Computing & Simulations Department of Computer Science Department of Physics & Astronomy Department of Chemical Engineering & Materials Science University of Southern Californ
USC - CS - 653
Supplementary Derivations for the Lanczos-Algorithm LectureSpectral representation The eigenvalues and eigenvectors satisfyn n$ Aij q (" ) = #" qi(" ) = $ qi(" ) #%&%" , jj=1%=1()(1)where "#$ = 1 ($ = #); 0 ($ % #). Define an orthogona
USC - CS - 653
C3P 913June 1990Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh CalculationsRoy D. WilliamsConcurrent Supercomputing Facility California Institute of Technology Pasadena, CaliforniaAbstract If a nite element mesh has a
USC - CS - 653
SIAM J. SCI. COMPUT. Vol. 20, No. 1, pp. 359392c 1998 Society for Industrial and Applied MathematicsA FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHSGEORGE KARYPIS AND VIPIN KUMAR Abstract. Recently, a number of researc