false sharing. To avoid it, the compiler or programmer needs to analyze the sharing pattern and put each shared variable on a page with other variables that are used at the same time and in the same way.

Here's another example: granularity of sharing. Consider a mesh computation, where the shared data lies at the edges between each CPU's region. What if the mesh is large, so that each page of data holds an entire row? Then the entire mesh must be transferred on each time step, even though neighbors need only the edge values! (A possible solution, used in SVM systems: send only the diffs between versions of the page.)

Earlier I said RPC and cache coherence are duals: move the computation to the data versus move the data to the computation. Which is the more efficient way to implement the program above?

Other questions:

What state do we need to keep for the various levels of coherence? TTL (time-to-live): none!

Scalability of directory information: to be serializable, we need either a bit per processor or a list of the processors caching each block. (This is why there isn't cache coherence of web pages.) We could try to scale by distributing the callback state as a tree of c...
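To make the "send only the diffs" idea concrete, here is a minimal sketch, assuming a page is modeled as a byte buffer and that the writer saved an unmodified copy (a "twin") before its first write. The names make_diff, apply_diff, and PAGE_SIZE are hypothetical, chosen for illustration; real SVM systems differ in detail.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)

def make_diff(twin: bytes, page: bytes) -> List[Tuple[int, bytes]]:
    """Compare a dirty page with its twin (the copy saved before the
    first write) and return the runs of changed bytes as (offset, data)."""
    assert len(twin) == len(page)
    diff = []
    i, n = 0, len(page)
    while i < n:
        if twin[i] != page[i]:
            start = i
            # Extend the run while bytes keep differing.
            while i < n and twin[i] != page[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff

def apply_diff(page: bytearray, diff: List[Tuple[int, bytes]]) -> None:
    """Patch a stale copy of the page with the changed runs only."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data

if __name__ == "__main__":
    twin = bytes(PAGE_SIZE)          # page contents before the time step
    page = bytearray(twin)
    page[0:4] = b"edge"              # writer touches only a few bytes
    page[4092:4096] = b"tail"
    d = make_diff(twin, page)
    print(d)                         # [(0, b'edge'), (4092, b'tail')]
    remote = bytearray(twin)         # another node's stale copy
    apply_diff(remote, d)
    assert remote == page
```

Here only 8 bytes cross the network instead of the whole 4096-byte page; the price is keeping a twin per dirty page and scanning it at release time.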
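The "bit per processor" option for directory state can be sketched as a full-map directory, assuming each block's sharers are tracked as a bitmask. FullMapDirectory and its methods are hypothetical names; the point is that the per-block state grows with the number of processors, whereas TTL keeps none.

```python
from typing import Dict, List

class FullMapDirectory:
    """Full-map directory: one presence bit per processor for each block,
    so total state is (number of blocks) x (number of processors) bits."""

    def __init__(self, num_procs: int):
        self.num_procs = num_procs
        self.sharers: Dict[int, int] = {}  # block id -> bitmask of caching processors

    def read(self, proc: int, block: int) -> None:
        # Record that proc now caches a copy of block.
        self.sharers[block] = self.sharers.get(block, 0) | (1 << proc)

    def write(self, proc: int, block: int) -> List[int]:
        # Every other cached copy must be invalidated before proc writes;
        # return the processors that need an invalidation message.
        others = self.sharers.get(block, 0) & ~(1 << proc)
        victims = [p for p in range(self.num_procs) if (others >> p) & 1]
        self.sharers[block] = 1 << proc  # writer is now the only holder
        return victims

if __name__ == "__main__":
    d = FullMapDirectory(num_procs=4)
    for p in (0, 2, 3):
        d.read(p, block=7)
    print(d.write(2, block=7))  # [0, 3]
    print(bin(d.sharers[7]))    # 0b100
```

With 1,024 processors each directory entry needs 1,024 bits (128 bytes), twice the size of a 64-byte block; the list-of-processors alternative saves space when few processors share a block but makes entries variable-length.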
This document was uploaded on 04/04/2014.
- Spring '14