le, and show how you would determine what fraction of the
original program is inherently sequential.
You should set up the math, but you do not need to actually do any arithmetic.

Problem 7: Cache coherence (15 points)
You have a dual-core processor with private, write-back L1 caches, a shared L2, and a shared
bus to the L2 cache.
The following references to some location X happen in order. There are no intervening references
to location X.
1. P1 reads the value 2 from location X in L2 (L1 miss, L2 hit)
2. P2 writes the value 3 to location X (L1 miss, L2 hit)
3. P1 writes the value 5 to location X (L1 hit)
4. A significant amount of time later, P2 reads location X (L1 hit)
5. P1 reads location X (L1 hit)
Part A (5 points)
If there is no hardware cache coherence mechanism, what values will P1, P2, and the L2 cache
have for location X after this sequence of references is complete?
Part B (10 points)
Assuming an MSI (modified-shared-invalid) cache coherence protocol, track the state of location
X’s block in each processor’s L1 cache using the table below. You do not need to write down
what happens on the shared bus to L2.
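
As a reminder of how MSI states change, the following is a minimal sketch of the usual textbook transitions for a single block, not a model of any particular processor. The names `State` and `msi_access` are hypothetical, and the sketch assumes that a write invalidates all other copies and that a read miss downgrades a Modified copy elsewhere to Shared after a write-back.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


def msi_access(states, who, is_write):
    """Apply one access by cache `who` and return the new list of states.

    `states` holds one State per private L1 cache for a single block.
    Bus traffic and data values are deliberately not modeled.
    """
    assert 0 <= who < len(states)
    new = list(states)
    if is_write:
        # A write needs exclusive ownership, so every other copy is invalidated.
        for i in range(len(new)):
            new[i] = State.MODIFIED if i == who else State.INVALID
    elif new[who] is State.INVALID:
        # Read miss: a Modified copy elsewhere is written back and downgraded.
        for i in range(len(new)):
            if i != who and new[i] is State.MODIFIED:
                new[i] = State.SHARED
        new[who] = State.SHARED
    # A read that hits in Shared or Modified changes no state.
    return new
```

For example, starting from `[State.INVALID, State.INVALID]`, a read by cache 0 followed by a write by cache 1 leaves cache 0 Invalid and cache 1 Modified. This trace is illustrative only and is not the reference sequence in this problem.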
Ref.            P1 cache’s state    P2 cache’s state
Initial state   Invalid             Invalid
After ref. 1
After ref. 2
After ref. 3
After ref. 4
After ref. 5

Problem 8: GPUs (10 points)
Part A (5 points)
Write a snippet of C-like code that is well suited to being rewritten in CUDA and run on a GPU,
and explain why.
Part B (3 points)
If you were to port this code to CUDA, how would you divide up the work among threads?
Part C (2 points)
If CUDA allows 1024 threads per block, how many threads and how many blocks would you
This note was uploaded on 02/08/2014 for the course CS 351 taught by Professor Dr. Suzanne Rivoire during the Fall '13 term at Sonoma.