number of interrupts received, or number of cache loads. Appendix A, Performance-Monitoring Events, lists all the events that can be counted (Table A-1 for the P6 family processors and Table A-2 for the Pentium® processors). The counters are set up, started, and stopped using two MSRs and the RDMSR and WRMSR instructions. For the P6 family processors, the current count for a particular counter can be read using the new RDPMC instruction. The performance-monitoring counters are useful for debugging programs, optimizing code, diagnosing system failures, or refining hardware designs. Refer to Chapter 15, Debugging and Performance Monitoring, for more information on these counters.
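As a rough illustration of the setup step described above, the sketch below packs an event number and unit mask from Table A-1 into a 32-bit event-select value of the kind that would be written with WRMSR. The register addresses (186H, 187H) and the bit positions of the USR, OS, and EN flags are assumptions taken from the P6 MSR descriptions, not from this excerpt, and the helper names `evtsel_encode` and `evtsel_msr` are hypothetical. The privileged WRMSR itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed addresses of the two P6 event-select MSRs (not given in this excerpt). */
#define MSR_PERFEVTSEL0 0x186u
#define MSR_PERFEVTSEL1 0x187u

/* Assumed flag positions in an event-select value. */
#define EVTSEL_USR (1u << 16)  /* count while running at user privilege levels */
#define EVTSEL_OS  (1u << 17)  /* count while running at privilege level 0 */
#define EVTSEL_EN  (1u << 22)  /* enable counting */

/* A few event numbers from Table A-1. */
enum {
    EVT_DATA_MEM_REFS        = 0x43,
    EVT_DCU_LINES_IN         = 0x45,
    EVT_DCU_MISS_OUTSTANDING = 0x48,
    EVT_IFU_IFETCH           = 0x80
};

/* Hypothetical helper: event number in bits 0-7, unit mask in bits 8-15,
   control flags OR-ed in above them. */
uint32_t evtsel_encode(uint8_t event, uint8_t unit_mask, uint32_t flags)
{
    return (uint32_t)event | ((uint32_t)unit_mask << 8) | flags;
}

/* Hypothetical helper: event-select MSR address for counter 0 or 1. */
uint32_t evtsel_msr(int counter)
{
    assert(counter == 0 || counter == 1);
    return counter == 0 ? MSR_PERFEVTSEL0 : MSR_PERFEVTSEL1;
}
```

For example, `evtsel_encode(EVT_DCU_MISS_OUTSTANDING, 0x00, EVTSEL_USR | EVTSEL_OS | EVTSEL_EN)` yields the value that ring-0 code would write to the register returned by `evtsel_msr(0)`.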
APPENDIX A
PERFORMANCE-MONITORING EVENTS
This appendix contains a list of the performance-monitoring events that can be monitored with the Intel Architecture processors. In the Intel Architecture processors, the ability to monitor performance events and the events that can be monitored are model specific. Section A.1., “P6 Family Processor Performance-Monitoring Events” lists and describes the events that can be monitored with the P6 family of processors. Section A.2., “Pentium® Processor Performance-Monitoring Events” lists and describes the events that can be monitored with Pentium® processors.

A.1. P6 FAMILY PROCESSOR PERFORMANCE-MONITORING EVENTS
Table A-1 lists the events that can be counted with the performance-monitoring counters and read with the RDPMC instruction for the P6 family of processors. The unit column gives the microarchitecture or bus unit that produces the event; the event number column gives the hexadecimal number identifying the event; the mnemonic event name column gives the name of the event; the unit mask column gives the unit mask required (if any); the description column describes the event; and the comments column gives additional information about the event.

These performance-monitoring events are intended to be used as guides for performance tuning. The counter values reported are not guaranteed to be absolutely accurate and should be used as a relative guide for tuning. Known discrepancies are documented where applicable. Some performance events are model specific. Those added in later generations of the P6 family processors are listed in this table. Performance events are not architecturally guaranteed in future versions of the P6 family processors. All performance event encodings not listed in Table A-1 are reserved and their use will result in undefined counter results. Refer to the end of the table for notes related to certain entries in the table.

Table A-1. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters
| Unit | Event Num. | Mnemonic Event Name | Unit Mask | Description | Comments |
|---|---|---|---|---|---|
| Data Cache Unit (DCU) | 43H | DATA_MEM_REFS | 00H | All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores, but also internal retries. Note: 80-bit floating-point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). Does not include I/O accesses, or other nonmemory accesses. | |
| | 45H | DCU_LINES_IN | 00H | Total lines allocated in the DCU. | |
| | 46H | DCU_M_LINES_IN | 00H | Number of M state lines allocated in the DCU. | |
| | 47H | DCU_M_LINES_OUT | 00H | Number of M state lines evicted from the DCU. This includes evictions via snoop HITM, intervention or replacement. | |
| | 48H | DCU_MISS_OUTSTANDING | 00H | Weighted number of cycles while a DCU miss is outstanding, incremented by the number of outstanding cache misses at any particular time. Cacheable read requests only are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores. | An access that also misses the L2 is shortchanged by 2 cycles (i.e., if counts N cycles, should be N+2 cycles). Subsequent loads to the same cache line will not result in any additional counts. Count value not precise, but still useful. |
| Instruction Fetch Unit (IFU) | 80H | IFU_IFETCH | 00H | Number of instruction fetches, both cacheable and noncacheable, including UC fetches. | |
| | 81H | IFU_IFETCH_MISS | 00H | Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). Includes UC accesses. | |
| | 85H | ITLB_MISS | 00H | Number of ITLB misses. | |

Table A-1. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters (Contd.)
| Unit | Event Num. | Mnemonic Event Name | Unit Mask | Description | Comments |
|---|---|---|---|---|---|
| | 86H | IFU_MEM_STALL | 00H | Number of cycles instruction fetch is stalled, for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls. | |
| | 87H | ILD_STALL | 00H | Number of cycles that... | |
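The "weighted" counting described for DCU_MISS_OUTSTANDING in Table A-1 can be made concrete with a small software model. The sketch below assumes a hypothetical per-cycle trace of how many DCU misses are outstanding; the function names and the trace are illustrative only, and the model ignores the 2-cycle undercount and the other imprecisions noted in the comments column. A weighted count adds the number of outstanding misses each cycle, whereas a plain cycle count would add at most 1 per cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weighted count: each cycle contributes the number of misses outstanding
   in that cycle (simplified model of the DCU_MISS_OUTSTANDING description). */
uint64_t weighted_miss_cycles(const unsigned *outstanding, size_t ncycles)
{
    assert(outstanding != NULL || ncycles == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < ncycles; i++) {
        total += outstanding[i];
    }
    return total;
}

/* Unweighted count, for comparison: each cycle with at least one miss
   outstanding contributes exactly 1. */
uint64_t busy_miss_cycles(const unsigned *outstanding, size_t ncycles)
{
    assert(outstanding != NULL || ncycles == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < ncycles; i++) {
        if (outstanding[i] > 0) {
            total += 1;
        }
    }
    return total;
}
```

With a six-cycle trace of 0, 1, 2, 2, 1, 0 outstanding misses, the weighted count is 6 while only 4 cycles have a miss outstanding; the gap between the two reflects overlapping misses.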
This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at Berkeley.