MULTIPLE-PROCESSOR MANAGEMENT

... or weakening the memory-ordering model to handle special programming situations. These mechanisms include:

• The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions force stronger ordering on the processor.

• The memory type range registers (MTRRs) can be used to strengthen or weaken memory ordering for specific areas of physical memory (refer to Section 9.12., “Memory Type Range Registers (MTRRs)”, in Chapter 9, Memory Cache Control). MTRRs are available only in the P6 family processors.

These mechanisms can be used as follows.

Memory-mapped devices and other I/O devices on the bus are often sensitive to the order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions) can be used to impose strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the processor waits for all previous instructions in the program to complete and for all buffered writes to drain to memory. Only instruction fetches and page table walks can pass I/O instructions. Execution of subsequent instructions does not begin until the processor determines that the I/O instruction has been completed.

Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (refer to Section 7.1.2., “Bus Locking”).

Program synchronization can also be carried out with serializing instructions (refer to Section 7.4., “Serializing Instructions”).
These instructions are typically used at critical procedure or task boundaries to force completion of all previous instructions before a jump to a new section of code or a context switch occurs. Like the I/O and locking instructions, the processor waits until all previous instructions have been completed and all buffered writes have been drained to memory before executing the serializing instruction.

The MTRRs were introduced in the P6 family processors to define the cache characteristics for specified areas of physical memory. The following are two examples of how memory types set up with MTRRs can be used to strengthen or weaken memory ordering for the P6 family processors:

• The uncached (UC) memory type forces a strong-ordering model on memory accesses. Here, all reads and writes to the UC memory region appear on the bus, and out-of-order or speculative accesses are not performed. This memory type can be applied to an address range dedicated to memory-mapped I/O devices to force strong memory ordering.

• For areas of memory where weak ordering is acceptable, the write back (WB) memory type can be chosen. Here, reads can be performed speculatively and writes can be buffered and combined. For this type of memory, cache locking is performed on atomic (locked) operations that do not split across cache lines. This helps to reduce the performance penalty associated with the typical synchronization instructions, such as XCHG, that lock the bus during the entire read-modify-write operation. With the WB memory type, the XCHG instruction locks the cache instead of the bus if the memory access is contained within a cache line.

It is recommended that software written to run on P6 family processors assume the processor-ordering model or a weaker memory-ordering model. The P6 family processors do not implement a strong memory-ordering model, except when using the UC memory type.
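The locking approach described above, an atomic read-modify-write such as XCHG on a memory location, can be sketched in portable C as a simple lock. This is a minimal illustration rather than anything taken from the manual: the type and function names (xchg_lock, xchg_lock_try_acquire, and so on) are hypothetical, and the sketch assumes a C11 compiler with <stdatomic.h>. On x86 targets compilers typically implement atomic_exchange with the XCHG instruction, but that is a property of the compiler, not something the code guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} xchg_lock;

void xchg_lock_init(xchg_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->held, 0);
}

/* One atomic read-modify-write: store 1 and fetch the previous value.
 * The lock was obtained only if the previous value was 0. */
bool xchg_lock_try_acquire(xchg_lock *l)
{
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Spin until the exchange observes the lock as free. */
void xchg_lock_acquire(xchg_lock *l)
{
    while (!xchg_lock_try_acquire(l)) {
        /* busy-wait; a production lock would back off here */
    }
}

/* Release with a store that orders all earlier writes before it. */
void xchg_lock_release(xchg_lock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

A caller would bracket each access to the shared data with xchg_lock_acquire and xchg_lock_release. Because the code relies only on the atomic exchange and not on any particular processor's ordering behavior, it does not depend on processor ordering being available.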
Despite the fact that P6 family processors support processor ordering, Intel does not guarantee that future processors will support this model. To make software portable to future processors, it is recommended that operating systems provide critical-region and resource-control constructs and APIs (application program interfaces), based on I/O, locking, and/or serializing instructions, to be used to synchronize access to shared areas of memory in multiple-processor systems. Also, software should not depend on processor ordering in situations where the system hardware does not support this memory-ordering model.

7.3. PROPAGATION OF PAGE TABLE ENTRY CHANGES TO MULTIPLE PROCESSORS

In a multiprocessor system, when one processor changes a page table entry or mapping, the changes must also be propagated to all the other processors. This process is also known as “TLB Shootdown.” Propagation may be done by memory-based semaphores and/or interprocessor interrupts between processors. One naive but algorithmically correct TLB Shootdown sequence for the Intel Architecture is:

1. Begin barrier: Stop all processors. Cause all but one to HALT or stop in a spinloop.
2. Let the active processor change the PTE(s).
3. Let all processors invalidate the PTE(s) modified in their TLBs.
4. End barrier: Resume all processors.

Alternate, performance-optimized, TLB Shootdo...
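The naive four-step sequence listed above can be modeled in a few lines of C. This is a single-threaded toy simulation under stated assumptions, not operating-system code: the processor count, the array-based page table, the per-processor TLB model, and the function names translate and shootdown are all hypothetical. Real software would halt the other processors with interprocessor interrupts or semaphores and would invalidate entries with privileged instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CPUS  4   /* hypothetical processor count */
#define NUM_PAGES 8   /* hypothetical number of virtual pages */

/* Shared page table: page_table[vpn] holds a physical frame number. */
static unsigned page_table[NUM_PAGES];

/* Per-processor TLB model: one cached translation per virtual page. */
struct tlb_entry { bool valid; unsigned frame; };
struct cpu { bool halted; struct tlb_entry tlb[NUM_PAGES]; };
static struct cpu cpus[NUM_CPUS];

/* Translate through the TLB, filling it from the page table on a miss. */
unsigned translate(int cpu, size_t vpn)
{
    assert(cpu >= 0 && cpu < NUM_CPUS && vpn < NUM_PAGES);
    struct tlb_entry *e = &cpus[cpu].tlb[vpn];
    if (!e->valid) {
        e->frame = page_table[vpn];
        e->valid = true;
    }
    return e->frame;
}

/* Model of the naive shootdown sequence for one page table entry. */
void shootdown(int initiator, size_t vpn, unsigned new_frame)
{
    int i;
    assert(initiator >= 0 && initiator < NUM_CPUS && vpn < NUM_PAGES);

    /* 1. Begin barrier: stop every processor except the initiator. */
    for (i = 0; i < NUM_CPUS; i++)
        cpus[i].halted = (i != initiator);

    /* 2. The active processor changes the PTE. */
    page_table[vpn] = new_frame;

    /* 3. Every processor invalidates the modified PTE in its TLB. */
    for (i = 0; i < NUM_CPUS; i++)
        cpus[i].tlb[vpn].valid = false;

    /* 4. End barrier: resume all processors. */
    for (i = 0; i < NUM_CPUS; i++)
        cpus[i].halted = false;
}
```

In this model, writing to page_table directly leaves a stale translation in any TLB that has already cached the page, while calling shootdown makes the next translate on every processor reload the new frame.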
This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at Berkeley.