See Section 9.3.1, “Buffering of Write Combining Memory Locations,” for more information about caching the WC memory type. The preferred method is to use the SFENCE (store fence) instruction, new in the Pentium® III processor. The SFENCE instruction ensures that weakly ordered writes are written to memory in order; that is, it serializes only the store operations.

• Write-through (WT): Writes and reads to and from system memory are cached. Reads come from cache lines on cache hits; read misses cause cache fills. Speculative reads are allowed. All writes are written to a cache line (when possible) and through to system memory. When writing through to memory, invalid cache lines are never filled, and valid cache lines are either filled or invalidated. Write combining is allowed. This type of cache control is appropriate for frame buffers, or when there are devices on the system bus that access system memory but do not snoop memory accesses. It enforces coherency between caches in the processors and system memory.

• Write-back (WB): Writes and reads to and from system memory are cached. Reads come from cache lines on cache hits; read misses cause cache fills. Speculative reads are allowed. Write misses cause cache line fills (in the P6 family processors), and writes are performed entirely in the cache, when possible. Write combining is allowed. The write-back memory type reduces bus traffic by eliminating many unnecessary writes to system memory. Writes to a cache line are not immediately forwarded to system memory; instead, they are accumulated in the cache. The modified cache lines are written to system memory later, when a write-back operation is performed. Write-back operations are triggered when cache lines need to be deallocated, such as when new cache lines are being allocated in a cache that is already full. They are also triggered by the mechanisms used to maintain cache consistency.
  This type of cache control provides the best performance, but it requires that all devices that access system memory on the system bus be able to snoop memory accesses to ensure coherency between system memory and the caches.

• Write protected (WP): Reads come from cache lines when possible, and read misses cause cache fills. Writes are propagated to the system bus and cause corresponding cache lines on all processors on the bus to be invalidated. Speculative reads are allowed. This caching option is available in the P6 family processors by programming the MTRRs (see Table 9-5).

9.3.1. Buffering of Write Combining Memory Locations

Writes to WC memory are not cached in the typical sense of the word. They are retained in an internal buffer that is separate from the internal L1 and L2 caches. The buffer is not snooped and thus does not provide data coherency. The write buffering gives software a small window of time to supply more modified data to the buffer while remaining as nonintrusive to software as possible.

The size of the buffer is not architecturally defined. However, the Pentium® Pro and Pentium® II processors implement a single concurrent 32-byte buffer, a size chosen for implementation convenience. The Pentium® III processor has 4 write combining buffers of the same 32-byte size. Buffer size and quantity may change in future generations of the P6 family processors, so software should not rely on the current 32-byte WC buffer size, on the existence of a single concurrent buffer, or on the 4 buffers in the Pentium® III processor.

The WC buffering of writes also causes data to be collapsed: for example, multiple writes to the same location leave only the last data written in that location, and the earlier writes are lost.
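To make the collapsing behavior concrete, the following Python sketch models one WC buffer as a 32-byte window with a per-byte valid flag. The class `WCBuffer`, its methods, and the byte-granular stores are hypothetical illustrations, not a processor or library interface; the 32-byte size is assumed from the Pentium® Pro and Pentium® II implementation and, as the text notes, is not architecturally guaranteed.

```python
# Hypothetical model of one write-combining (WC) buffer; the 32-byte size
# follows the Pentium Pro / Pentium II implementation and is an assumption.
WC_LINE = 32


class WCBuffer:
    def __init__(self, base: int, size: int = WC_LINE) -> None:
        if base % size:
            raise ValueError("base must be aligned to the buffer size")
        self.base = base
        self.data = bytearray(size)  # buffered bytes; never snooped
        self.valid = [False] * size  # True where software has written

    def covers(self, addr: int) -> bool:
        return self.base <= addr < self.base + len(self.data)

    def store_byte(self, addr: int, value: int) -> None:
        """Buffer a one-byte store. A repeat store to the same byte
        overwrites the earlier value, so only the last one survives."""
        if not self.covers(addr):
            raise ValueError("address outside this buffer's range")
        off = addr - self.base
        self.data[off] = value
        self.valid[off] = True


buf = WCBuffer(base=0x1000)
buf.store_byte(0x1004, 0xAA)
buf.store_byte(0x1004, 0xBB)  # collapses onto the previous store
buf.store_byte(0x1005, 0xCC)
print(hex(buf.data[4]), sum(buf.valid))  # 0xbb 2
```

Because the second store to 0x1004 overwrites the buffered byte in place, only 0xBB would ever reach memory, and since the buffer is not snooped, neither value is visible to other agents until the buffer is written out.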
For the Pentium® Pro and Pentium® II processors, once software writes to a region of memory that is addressed outside of the range of the current 32-byte buffer, the data in the buffer is automatically forwarded to the system bus and written to memory. Therefore, software that writes more than one 32-byte buffer's worth of data ensures that the data from the first buffer's address range is forwarded to memory. The last buffer written in the sequence may be delayed by the processor longer unless the buffers are deliberately emptied. Software developers should not rely on there being only one active WC buffer at a time. Software that is sensitive to data being delayed must deliberately empty the WC buffers rather than assume the hardware will.

Once the processor has started to move data into the WC buffer, it chooses the style of bus transaction based on how much of the buffer contains valid data. If the buffer is full (for example, all 32 bytes are valid), the processor executes a burst write transaction on the bus, transmitting all 32 bytes on the data bus in a single transaction. If one or more of the WC buffer’s bytes are invalid (for example, have not been written by softwar...
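The eviction rule and the burst decision described for the Pentium® Pro and Pentium® II can be sketched as a small Python model. It assumes a single 32-byte buffer (which, as stated, software should not rely on), treats the hypothetical `drain()` method as a stand-in for deliberately emptying the buffer (for example, with SFENCE on the Pentium® III), and records any flush with fewer than 32 valid bytes simply as "partial", because the source text describing that case is truncated. All names are illustrative only.

```python
# Hypothetical model of the Pentium Pro / Pentium II behavior described
# above: one 32-byte WC buffer, flushed when a store leaves its range.
LINE = 32  # assumed buffer size; not architecturally defined


class SingleWCBuffer:
    def __init__(self) -> None:
        self.base = None   # 32-byte-aligned address currently combined
        self.pending = {}  # offset within the line -> last value written
        self.bus_log = []  # one (kind, base, valid_byte_count) per flush

    def store(self, addr: int, value: int) -> None:
        base = addr - addr % LINE
        if self.base is not None and base != self.base:
            self.drain()   # leaving the range forwards the old contents
        self.base = base
        self.pending[addr - base] = value

    def drain(self) -> None:
        """Empty the buffer, as software must do deliberately for the
        last buffer in a sequence (e.g. SFENCE on a Pentium III)."""
        if self.base is None:
            return
        # All 32 bytes valid -> single burst; otherwise record a non-burst
        # write (the real partial-write sequence is not modeled here).
        kind = "burst" if len(self.pending) == LINE else "partial"
        self.bus_log.append((kind, self.base, len(self.pending)))
        self.base, self.pending = None, {}


wc = SingleWCBuffer()
for a in range(0x2000, 0x2020):  # fill one whole 32-byte line
    wc.store(a, a & 0xFF)
wc.store(0x2020, 1)  # next line: first buffer is forwarded as a burst
wc.store(0x2021, 2)
print(wc.bus_log)    # [('burst', 8192, 32)]; 0x2020 line still buffered
wc.drain()
print(wc.bus_log)    # [('burst', 8192, 32), ('partial', 8224, 2)]
```

The full first line goes out as one burst as soon as a store touches the next 32-byte range, while the two bytes written at 0x2020 stay buffered until `drain()` is called: the delayed-last-buffer case that software must handle explicitly.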