MEMORY CACHE CONTROL

e) then the processor will start to move the data to memory using “partial write” transactions on the system bus. There will be a maximum of 4 partial write transactions for one WC buffer of data sent to memory.

Once data in the WC buffer has started to be propagated to memory, the data is subject to the weak ordering semantics of its definition. Ordering is not maintained between the successive allocation/deallocation of WC buffers (for example, writes to WC buffer 1 followed by writes to WC buffer 2 may appear as buffer 2 followed by buffer 1 on the system bus). When a WC buffer is propagated to memory as partial writes, there is no guaranteed ordering between successive partial writes (for example, a partial write for chunk 2 may appear on the bus before the partial write for chunk 1 or vice versa). The only elements of WC propagation to the system bus that are guaranteed are those provided by transaction atomicity. For the P6 family processors, a completely full WC buffer will always be propagated as a single burst transaction using any of the chunk orders. In a WC buffer propagation where the data will be propagated as partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated simultaneously.

9.3.2. Choosing a Memory Type

The simplest system memory model does not use memory-mapped I/O with read or write side effects, does not include a frame buffer, and uses the write-back memory type for all memory. An I/O agent can perform direct memory access (DMA) to write-back memory and the cache protocol maintains cache coherency. A system can use uncacheable memory for other memory-mapped I/O, and should always use uncacheable memory for memory-mapped I/O with read side effects. Dual-ported memory can be considered a write side effect, making relatively prompt writes desirable, because those writes cannot be observed at the other port until they reach the memory agent.
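The guidance in this section can be condensed into a small decision function. The following C sketch is illustrative only: the `region_traits` structure, its field names, and the functions `choose_memory_type` and `mem_type_name` are hypothetical names introduced here and are not part of any processor or operating system interface. The order of the checks is also an assumption, chosen so that the one rule stated as mandatory (uncacheable for memory-mapped I/O with read side effects) takes priority.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory types discussed in this section. */
typedef enum {
    MT_UNCACHEABLE,
    MT_WRITE_COMBINING,
    MT_WRITE_THROUGH,   /* listed for completeness; never chosen below */
    MT_WRITE_BACK
} mem_type;

/* Hypothetical description of a memory region (illustrative fields only). */
typedef struct {
    bool is_mmio;                 /* memory-mapped I/O                     */
    bool has_read_side_effects;   /* reads change device state             */
    bool is_frame_buffer;         /* pixel data, written more than read    */
    bool read_once_or_write_only; /* gains nothing from write-back caching */
} region_traits;

/* Pick a memory type following the recommendations in this section. */
mem_type choose_memory_type(region_traits r)
{
    if (r.has_read_side_effects)
        return MT_UNCACHEABLE;      /* always uncacheable */
    if (r.is_frame_buffer)
        return MT_WRITE_COMBINING;  /* preferred over UC and WT */
    if (r.is_mmio)
        return MT_UNCACHEABLE;      /* other memory-mapped I/O */
    if (r.read_once_or_write_only)
        return MT_UNCACHEABLE;      /* avoid evicting useful lines */
    return MT_WRITE_BACK;           /* default for ordinary memory */
}

/* Human-readable name for a memory type. */
const char *mem_type_name(mem_type t)
{
    static const char *const names[] = {
        "uncacheable", "write-combining", "write-through", "write-back"
    };
    assert((unsigned)t < sizeof names / sizeof names[0]);
    return names[t];
}
```

For example, a region with only `is_frame_buffer` set maps to `MT_WRITE_COMBINING`, while a region with no flags set maps to `MT_WRITE_BACK`. The result is only a recommendation; the chosen type still has to be applied through the cache control mechanisms described in this chapter.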
A system can use uncacheable, write-through, or write-combining memory for frame buffers or dual-ported memory that contains pixel values displayed on a screen. Frame buffer memory is typically large (a few megabytes) and is usually written more than it is read by the processor. Using uncacheable memory for a frame buffer generates very large amounts of bus traffic, because operations on the entire buffer are implemented using partial writes rather than line writes. Using write-through memory for a frame buffer can displace almost all other useful cached lines in the processor’s L2 cache and L1 data cache. Therefore, systems should use write-combining memory for frame buffers whenever possible.

Software can use page-level cache control to assign appropriate effective memory types when software will not access data structures in ways that benefit from write-back caching. For example, software may read a large data structure once and not access the structure again until the structure is rewritten by another agent. Such a large data structure should be marked as uncacheable, or reading it will evict cached lines that the processor will be referencing again.

A similar example would be a write-only data structure that is written to (to export the data to another agent), but never read by software. Such a structure can be marked as uncacheable, because software never reads the values that it writes (though as uncacheable memory, it will be written using partial writes, while as write-back memory, it will be written using line writes, which may not occur until the other agent reads the structure and triggers implicit write-backs).

On the Pentium® III processor, new capabilities exist that may allow the programmer to perform similar functions with the prefetch and streaming store instructions. For more information on these instructions, see Section 3.2., “Instruction Reference” in Chapter 3, Instruction Set Reference.

9.4. CACHE CONTROL PROTOCOL

The following section describes the cache control protocol currently defined for the Intel Architecture processors. This protocol is used by the P6 family and Pentium® processors. The Intel486™ processor uses an implementation-defined protocol that does not support the MESI four-state protocol, but instead uses a two-state protocol with valid and invalid states defined.

In the L1 data cache and the P6 family processors’ L2 cache, the MESI (modified, exclusive, shared, invalid) cache protocol maintains consistency with caches of other processors. The L1 data cache and the L2 cache have two MESI status flags per cache line. Each line can thus be marked as being in one of the states defined in Table 9-3. In general, the operation of the MESI protocol is transparent to programs.

The L1 instruction cache implements only the “SI” part of the MESI protocol, because the instruction cache is not writable. The instruction cache monitors changes in the data cache to maintain consistency between the caches when instructions are modified. See Section 9.7., “Self-Modifying Code”, for more information on the implications of caching instructions.
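The meaning of the four states named above can be modeled in a few lines of C. This is a minimal sketch of the conventional MESI state meanings, not a description of the processor's implementation: the numeric values are an arbitrary two-bit encoding assumed here only to illustrate that two status flags suffice for four states, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The four MESI states. The values are a hypothetical two-bit encoding
   chosen for this sketch; the hardware encoding is not specified here. */
typedef enum {
    MESI_INVALID   = 0,
    MESI_SHARED    = 1,
    MESI_EXCLUSIVE = 2,
    MESI_MODIFIED  = 3
} mesi_state;

/* Decode two status flags into a state (assumed encoding above). */
mesi_state mesi_from_flags(unsigned flags)
{
    assert(flags < 4u);  /* two flags give exactly four combinations */
    return (mesi_state)flags;
}

/* Does the line hold usable data? */
bool mesi_line_valid(mesi_state s)
{
    return s != MESI_INVALID;
}

/* Is the copy in memory out of date with respect to this line? */
bool mesi_memory_stale(mesi_state s)
{
    return s == MESI_MODIFIED;
}

/* Might another processor's cache also hold this line? */
bool mesi_others_may_hold_copy(mesi_state s)
{
    return s == MESI_SHARED || s == MESI_INVALID;
}

/* The instruction cache is not writable, so it uses only S and I. */
bool mesi_allowed_in_icache(mesi_state s)
{
    return s == MESI_SHARED || s == MESI_INVALID;
}
```

Under this model, a modified line is the only case in which memory is stale, and a modified or exclusive line is known to be absent from other caches.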
Table 9-3. MESI Cache Line States
Cache Line State This cache line is valid? The memory copy is… Copies exist in caches of other processo...
This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at University of California, Berkeley.