IntelSoftwareDevelopersManual



MULTIPLE-PROCESSOR MANAGEMENT

ng.” This model can be characterized as follows.

In a single-processor system for memory regions defined as write-back cacheable, the following ordering rules apply:

1. Reads can be carried out speculatively and in any order.
2. Reads can pass buffered writes, but the processor is self-consistent.
3. Writes to memory are always carried out in program order.
4. Writes can be buffered.
5. Writes are not performed speculatively; they are only performed for instructions that have actually been retired.
6. Data from buffered writes can be forwarded to waiting reads within the processor.
7. Reads or writes cannot pass (be carried out ahead of) I/O instructions, locked instructions, or serializing instructions.

The second rule allows a read to pass a write. However, if the write is to the same memory location as the read, the processor’s internal “snooping” mechanism will detect the conflict and update the already cached read before the processor executes the instruction that uses the value.

The sixth rule constitutes an exception to an otherwise write ordered model.

In a multiple-processor system, the following ordering rules apply:

• Individual processors use the same ordering rules as in a single-processor system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from the individual processors on the system bus are globally observed and are NOT ordered with respect to each other.

The latter rule can be clarified by the example in Figure 7-1. Consider three processors in a system and each processor performs three writes, one to each of three defined locations (A, B, and C). Individually, the processors perform the writes in the same program order, but because of bus arbitration and other memory access mechanisms, the order that the three processors write the individual memory locations can differ each time the respective code sequences are executed on the processors.
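As a rough illustration of the last multiple-processor rule, the following Python sketch (not part of the manual) merges three per-processor write sequences into one global order. A seeded random choice stands in for bus arbitration, which is an assumption made purely for illustration; the function names are hypothetical as well. Each processor's writes stay in program order, while the interleaving between processors changes from run to run.

```python
import random

LOCATIONS = ["A", "B", "C"]


def program(cpu):
    """Writes issued by one processor, in program order: A, then B, then C."""
    return [(loc, cpu) for loc in LOCATIONS]


def interleave(programs, rng):
    """Merge per-processor write lists into one global order.

    Each processor's own writes keep their program order; which processor
    goes next is picked at random (a stand-in for bus arbitration).
    """
    queues = [list(p) for p in programs]
    merged = []
    while any(queues):
        queue = rng.choice([q for q in queues if q])
        merged.append(queue.pop(0))
    return merged


def final_values(order):
    """Replay the global write order; the last write to a location wins."""
    memory = {}
    for loc, cpu in order:
        memory[loc] = cpu
    return memory


def simulate(seed):
    """Return (global write order, final memory contents) for one run."""
    rng = random.Random(seed)
    order = interleave([program(cpu) for cpu in (1, 2, 3)], rng)
    return order, final_values(order)


if __name__ == "__main__":
    for seed in range(3):
        order, memory = simulate(seed)
        print(" ".join(f"{loc}.{cpu}" for loc, cpu in order), "->", memory)
```

Running the script prints one interleaving per seed together with the value left in each location, which makes it easy to see that the per-processor order never changes even though the global order does.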
The final values in locations A, B, and C would possibly vary on each execution of the write sequence.

Order of Writes From Individual Processors
(Each processor is guaranteed to perform writes in program order.)

Processor #1    Processor #2    Processor #3
Write A.1       Write A.2       Write A.3
Write B.1       Write B.2       Write B.3
Write C.1       Write C.2       Write C.3

Example of Order of Actual Writes From All Processors to Memory
(Writes are in order with respect to individual processors.)

Write A.1
Write B.1
Write A.2
Write A.3
Write C.1
Write B.2
Write C.2
Write B.3
Write C.3

Writes from all processors are not guaranteed to occur in a particular order.

Figure 7-1. Example of Write Ordering in Multiple-Processor Systems

The processor-ordering model described in this section is virtually identical to that used by the Pentium® and Intel486™ processors. The only enhancements in the P6 family processors are:

• Added support for speculative reads.
• Store-buffer forwarding, when a read passes a write to the same memory location.
• Out of order store from long string store and string move operations (refer to Section 7.2.3., “Out of Order Stores From String Operations in P6 Family Processors” below).

7.2.3. Out of Order Stores From String Operations in P6 Family Processors

The P6 family processors modify the processor’s operation during the string store operations (initiated with the MOVS and STOS instructions) to maximize performance. Once the initial conditions for “fast string” operations are met (as described below), the processor will, from an external perspective, essentially operate on the string one cache line at a time. For the length of the string, the processor loops on issuing a cache-line read for the source address and an invalidation on the external bus for the destination address, knowing that all bytes in the destination cache line will be modified.
In this mode, interrupts will only be accepted by the processor on cache line boundaries. It is possible in this mode that the destination line invalidations, and therefore stores, will be issued on the external bus out of order.

Code dependent upon sequential store ordering should not use the string operations for the entire data structure to be stored. Data and semaphores should be separated. Order-dependent code should use a discrete semaphore that is stored to separately, after any string operations, to allow correctly ordered data to be seen by all processors.

Initial conditions for “fast string” operations:

• Source and destination addresses must be 8-byte aligned.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (32 bytes).
• The memory type for both source and destination addresses must be either WB or WC.

7.2.4. Strengthening or Weakening the Memory Ordering Model

The Intel Architecture provides several mechanisms for strengthening o...
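The initial conditions for “fast string” operations listed in Section 7.2.3 can be read as a single predicate. The Python sketch below is not from the manual; the function name, parameter names, and string memory-type labels are hypothetical. One loud assumption: the overlap condition is interpreted here as "source and destination addresses are at least one cache line (32 bytes) apart", which is only one plausible reading of the original wording.

```python
CACHE_LINE_BYTES = 32      # cache line size stated in the text for P6 family
MIN_COUNT = 64             # minimum initial ECX value stated in the text
FAST_TYPES = ("WB", "WC")  # memory types allowed by the text


def fast_string_eligible(src, dst, count, ascending, src_type, dst_type):
    """Check the listed initial conditions for a fast-string operation.

    src, dst   -- byte addresses of source and destination
    count      -- initial operation counter (ECX)
    ascending  -- True if the string operation runs in ascending address order
    src_type, dst_type -- memory type labels such as "WB", "WC", "UC"
    """
    aligned = src % 8 == 0 and dst % 8 == 0
    long_enough = count >= MIN_COUNT
    # Assumption: "must not overlap by less than a cache line" is taken to
    # mean the two start addresses differ by at least one cache line.
    separated = abs(dst - src) >= CACHE_LINE_BYTES
    cacheable = src_type in FAST_TYPES and dst_type in FAST_TYPES
    return aligned and ascending and long_enough and separated and cacheable


if __name__ == "__main__":
    print(fast_string_eligible(0x1000, 0x2000, 64, True, "WB", "WC"))
    print(fast_string_eligible(0x1004, 0x2000, 64, True, "WB", "WB"))
```

A check like this only says whether the processor may enter the fast-string mode; it says nothing about the order in which the resulting stores become visible, which is why the text recommends a separately stored semaphore for order-dependent code.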

This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at University of California, Berkeley.
