...recommended to ensure compatibility with the P6 family processors.)

Self-modifying code executes at a lower level of performance than non-self-modifying (normal) code. The degree of the performance deterioration depends on the frequency of modification and on specific characteristics of the code.

7-5 MULTIPLE-PROCESSOR MANAGEMENT

The act of one processor writing data into the currently executing code segment of a second processor, with the intent of having the second processor execute that data as code, is called cross-modifying code. As with self-modifying code, Intel Architecture processors exhibit model-specific behavior when executing cross-modifying code, depending on how far ahead of the executing processor's current execution pointer the code has been modified. To write cross-modifying code that is compliant with current and future Intel Architectures, the following processor synchronization algorithm should be implemented.
    ; Action of Modifying Processor
    Store modified code (as data) into code segment;
    Memory_Flag ← 1;

    ; Action of Executing Processor
    WHILE (Memory_Flag ≠ 1)
        Wait for code to update;
    ELIHW;
    Execute serializing instruction; (* For example, CPUID instruction *)
    Begin executing modified code;

(The use of this option is not required for programs intended to run on the Intel486™ processor, but is recommended to ensure compatibility with the Pentium® and P6 family processors.)

Like self-modifying code, cross-modifying code executes at a lower level of performance than non-cross-modifying (normal) code, depending on the frequency of modification and on specific characteristics of the code.

7.1.4. Effects of a LOCK Operation on Internal Processor Caches

For the Intel486™ and Pentium® processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.

For the P6 family processors, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called “cache locking.” The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.

7.2. MEMORY ORDERING

The term memory ordering refers to the order in which the processor issues reads (loads) and writes (stores) out onto the bus to system memory. The Intel Architecture supports several memory ordering models, depending on the implementation of the architecture.
For example, the Intel386™ processor enforces program ordering (generally referred to as strong ordering), where reads and writes are issued on the system bus in the order they occur in the instruction stream under all circumstances. To allow optimizing of instruction execution, the Intel Architecture allows departures from the strong-ordering model, called processor ordering, in P6 family processors. These processor-ordering variations allow performance-enhancing operations, such as allowing reads to go ahead of writes by buffering the writes. The goal of any of these variations is to increase instruction execution speeds while maintaining memory coherency, even in multiple-processor systems. The following sections describe the memory ordering models used by the Intel486™, Pentium®, and P6 family processors.

7.2.1. Memory Ordering in the Pentium® and Intel486™ Processors

The Pentium® and Intel486™ processors follow the processor-ordered memory model; however, they operate as strongly-ordered processors under most circumstances. Reads and writes always appear in programmed order at the system bus, except in the following situation, where processor ordering is exhibited. Read misses are permitted to go ahead of buffered writes on the system bus when all the buffered writes are cache hits and, therefore, are not directed to the same address being accessed by the read miss.

In the case of I/O operations, both reads and writes always appear in programmed order.

Software intended to operate correctly in processor-ordered processors (such as the P6 family processors) should not depend on the relatively strong ordering of the Pentium® or Intel486™ processors.
Instead, it should ensure that accesses to shared variables that are intended to control concurrent execution among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (refer to Section 7.2.4., “Strengthening or Weakening the Memory Ordering Model”).

7.2.2. Memory Ordering in the P6 Family Processors

The P6 family processors also use a processor-ordered memory ordering model that can be further defined as “write ordered with store-buffer forwardi...
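The recommendation in Section 7.2.1 to order accesses to shared control variables explicitly, and the Memory_Flag handshake used by the cross-modifying code algorithm, can be illustrated with a small C sketch. It is only an illustration under stated assumptions: it uses C11 atomics (a language-level mechanism, not something this text prescribes), the `mailbox` type and the function names are hypothetical, and the payload is ordinary data rather than code, so no serializing instruction such as CPUID is executed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mailbox: `payload` stands in for the data being handed
   over, `ready` plays the role of Memory_Flag in the algorithm. */
typedef struct {
    int payload;
    atomic_int ready;   /* 0 = not yet written, 1 = written */
} mailbox;

void mailbox_init(mailbox *box) {
    box->payload = 0;
    atomic_init(&box->ready, 0);
}

/* Writing side: store the data first, then set the flag.
   The release store keeps the payload write from becoming visible
   after the flag write. Assumes a single publication per mailbox. */
void mailbox_publish(mailbox *box, int value) {
    assert(atomic_load_explicit(&box->ready, memory_order_relaxed) == 0);
    box->payload = value;
    atomic_store_explicit(&box->ready, 1, memory_order_release);
}

/* Reading side: one polling step of the WHILE loop. Returns true and
   fills *out only once the flag has been observed as set; the acquire
   load keeps the payload read from being performed before the flag read. */
bool mailbox_try_consume(mailbox *box, int *out) {
    if (atomic_load_explicit(&box->ready, memory_order_acquire) != 1)
        return false;
    *out = box->payload;
    return true;
}
```

In use, one thread would call `mailbox_publish` once while another repeatedly calls `mailbox_try_consume` until it returns true. The explicit release and acquire orderings are what let the reader rely on seeing the payload, rather than depending on the relatively strong ordering of a particular processor.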
This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at University of California, Berkeley.