the instruction. In addition, the P6 family and Pentium® processors check whether a write to a code segment may modify an instruction that has been prefetched for execution. If the write affects a prefetched instruction, the prefetch queue is invalidated. This latter check is based on the linear address of the instruction. In practice, the check on linear addresses should not create compatibility problems among Intel Architecture processors. Applications that include self-modifying code use the same linear address for modifying and fetching the instruction. System software, such as a debugger, that might modify an instruction using a different linear address than that used to fetch the instruction will execute a serializing operation, such as a CPUID instruction, before the modified instruction is executed; the serializing operation automatically resynchronizes the instruction cache and prefetch queue. See Section 7.1.3., "Handling Self- and Cross-Modifying Code", in Chapter 7, Multiple-Processor Management, for more information about the use of self-modifying code.

For Intel486™ processors, a write to an instruction in the cache modifies the instruction in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction.

9.8. IMPLICIT CACHING (P6 FAMILY PROCESSORS)

Implicit caching occurs when a memory element is made potentially cacheable, although the element may never have been accessed in the normal von Neumann sequence. Implicit caching occurs on the P6 family processors due to aggressive prefetching, branch prediction, and TLB miss handling.
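The cross-modifying pattern described above can be sketched as follows. This is a minimal illustration, not a verbatim sequence from this manual; the patched_code label is hypothetical, and EDI is assumed to hold an aliased linear address that maps to the same physical instruction bytes as patched_code.

```asm
; Sketch: a debugger patches an instruction through a linear
; address (in EDI) that differs from the address used to fetch it.
mov byte ptr [edi], 90H ; write new opcode (90H = NOP) through
                        ; the aliased linear address
xor eax, eax            ; CPUID (any leaf) is a serializing
cpuid                   ; instruction; executing it resynchronizes
                        ; the instruction cache and prefetch queue
jmp patched_code        ; now safe to execute the modified code
```

On Intel486™ processors, the jump instruction alone serves to flush the prefetch unit, as described above; CPUID is the architecturally recommended serializing operation for later processors.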
Implicit caching is an extension of the behavior of existing Intel386™, Intel486™, and Pentium® processor systems, since software running on these processor families also has not been able to deterministically predict the behavior of instruction prefetch. To avoid problems related to implicit caching, the operating system must explicitly invalidate the cache when changes are made to cacheable data that the cache coherency mechanism does not automatically handle. This includes writes to dual-ported or physically aliased memory boards that are not detected by the snooping mechanisms of the processor, and changes to page-table entries in memory. The code in Example 9-1 shows the effect of implicit caching on page-table entries. Here, the linear address F000H points to physical location B000H (that is, the page-table entry for F000H contains the value B000H), and the page-table entry for linear address F000H is PTE_F000.
Example 9-1. Effect of Implicit Caching on Page-Table Entries

    mov EAX, CR3        ; Invalidate the TLB
    mov CR3, EAX        ; by copying CR3 to itself
    mov PTE_F000, A000H ; Change F000H to point to A000H
    mov EBX, [F000H]

Because of speculative execution in the P6 family processors, the last MOV instruction would place the value at physical location B000H into EBX, rather than the value at the new physical address A000H. This situation is remedied by placing a TLB invalidation between the store to the page-table entry and the subsequent load.

9.9. EXPLICIT CACHING

The Pentium® III processor introduced a new instruction designed to provide some control over the caching of data. The prefetch instruction is a "hint" to the processor that the data requested by the prefetch instruction should be read into the cache, even though it is not needed yet. The processor assumes it will be needed soon.

Explicit caching occurs when the application program executes a prefetch instruction. The programmer must be judicious in the use of the prefetch instruction: overuse can lead to resource conflicts and hence reduce the performance of an application. For more detailed information on the proper use of the prefetch instruction, refer to Chapter 6, "Optimizing Cache Utilization for Pentium® III Processors", in the Intel Architecture Optimization Reference Manual (Order Number 245127-001).

Prefetch can be used to read data into the cache before the application actually requires it. This helps to reduce the long latency typically associated with reading data from memory, which causes the processor to "stall". It is important to remember that prefetch is only a hint to the processor to fetch the data now, or as soon as possible, in anticipation that it will be used soon. The prefetch instruction has different variations that allow the programmer to control into which cache level the data will be read.
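One possible form of the remedy for Example 9-1 is sketched below, using the INVLPG instruction to invalidate the TLB entry for the affected page after the page-table entry is written and before the load. This is an illustrative sketch following the labels of Example 9-1, not a sequence given in this manual.

```asm
mov EAX, CR3        ; Invalidate the TLB
mov CR3, EAX        ; by copying CR3 to itself
mov PTE_F000, A000H ; Change F000H to point to A000H
invlpg [F000H]      ; Invalidate any (possibly implicitly cached)
                    ; TLB entry for the page containing F000H
mov EBX, [F000H]    ; Load now reads from physical address A000H
```

INVLPG invalidates only the TLB entry for the page containing its operand, which avoids the cost of flushing the entire TLB by reloading CR3.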
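The cache-level variations mentioned above can be sketched as follows. This is an illustrative summary of the SSE prefetch mnemonics introduced with the Pentium® III; ESI is assumed to hold the address of data the program will need shortly.

```asm
; Each instruction below is only a hint; the processor may
; ignore it, and no fault is taken for an invalid address.
prefetcht0  [esi]   ; temporal data: fetch into all cache levels
prefetcht1  [esi]   ; fetch into the second-level cache and higher
prefetcht2  [esi]   ; fetch into the second-level cache and higher
prefetchnta [esi]   ; non-temporal data: fetch close to the
                    ; processor while minimizing cache pollution
```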
For more information on the variations of the prefetch instruction, refer to the section "Cacheability Hint Instructions" in Chapter 9, Programming with the Streaming SIMD Extensions, of the Intel Architecture Software Developer's Manual, Volume 2.

9.10. INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS)
The processor updates its address translation caches (TLBs) transparentl...
This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at University of California, Berkeley.