2009 IEEE International Solid-State Circuits Conference
ISSCC 2009 / SESSION 3 / MICROPROCESSOR TECHNOLOGIES / 3.1

3.1 A 45nm 8-Core Enterprise Xeon® Processor

Stefan Rusu, Simon Tam, Harry Muljono, Jason Stinson, David Ayers, Jonathan Chang, Raj Varada, Matt Ratta, Sailesh Kottapalli

Intel, Santa Clara, CA

The next-generation enterprise Xeon® server processor consists of eight dual-threaded 64b Nehalem cores and a shared L3 cache. The system interface includes two on-chip memory controllers and supports multiple system topologies. Figure 3.1.1 shows the processor block diagram. This design has 2.3B transistors and is implemented in 45nm CMOS using metal-gate high-κ dielectric transistors and nine Cu interconnect layers [1]. The thermal design power is 130W.

The L3 cache is built from eight slices, each having 2048 sets and 24 ways. Each cache line is 64 bytes constructed into 2 chunks. There are a total of 48 sub-arrays in each slice [2], as shown in Fig. 3.1.2. To reduce the L3-cache-slice active power, only 3.125% of the arrays are powered up for each access. The data array uses a 0.3816μm² cell and is protected by an inline double-error-correction and triple-error-detection (DECTED) ECC scheme with variable latency. The tag arrays are built with a 0.54μm² cell and are protected by an inline single-error-correction and double-error-detection (SECDED) ECC scheme with fixed latency. The data array has both column and row redundancy, while the tag arrays have only column redundancy.

The L3-cache-redundancy fuses are stored in an on-package serial EEPROM and their values are shifted into on-die radiation-hardened storage elements during reset. If any defect occurs in the non-repairable area (such as the tag data-path) of a cache slice, the die can still be recovered by disabling the defective slice. Similarly, if a defect falls onto a processor core, the die is recovered by de-featuring the defective core.
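As a quick check on the L3 geometry quoted above (eight slices, 2048 sets, 24 ways, 64-byte lines), the data capacity can be worked out with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the result counts data bytes only (ECC and tag storage are excluded), and nothing here describes how addresses are actually mapped to slices, which the text does not state.

```python
# L3 geometry figures quoted in the text.
SLICES = 8
SETS_PER_SLICE = 2048
WAYS = 24
LINE_BYTES = 64


def slice_data_bytes() -> int:
    """Data bytes held by one L3 slice (ECC and tag bits not counted)."""
    return SETS_PER_SLICE * WAYS * LINE_BYTES


def l3_data_bytes() -> int:
    """Data bytes held by the whole L3 cache."""
    return SLICES * slice_data_bytes()


def bits_to_index(count: int) -> int:
    """Bits needed to select one of `count` items (a power of two)."""
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a positive power of two")
    return count.bit_length() - 1


if __name__ == "__main__":
    mib = 1024 * 1024
    print("slice data capacity (MiB):", slice_data_bytes() // mib)
    print("L3 data capacity (MiB):", l3_data_bytes() // mib)
    print("set-index bits per slice:", bits_to_index(SETS_PER_SLICE))
    print("byte-offset bits per line:", bits_to_index(LINE_BYTES))
```

With these figures each slice holds 3 MiB of data, so the eight slices together hold 24 MiB.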
Figure 3.1.3 shows an example of the core-and-cache recovery scheme. All disabled cores and cache slices are clock gated to reduce active power and are also placed in shut-off mode to minimize leakage. For the data array, the SRAM cells, the peripheral circuitry (sense amplifiers, column multiplexers and write drivers) and the wordline drivers can be placed in the shut-off mode. For the tag arrays, a shut-off option is implemented in the SRAM cells and wordline drivers only. During shut-off, the SRAM power supply drops from the 0.90V nominal operating voltage to 0.36V, providing an 83% leakage-power saving.
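The recovery idea described above (disable a slice with a non-repairable defect, de-feature a defective core, and ship the remaining resources) can be sketched as a small bookkeeping model. Everything named here (`DieConfig`, `recover_die`, the index numbering) is a hypothetical illustration rather than the mechanism used on the die, and the 3 MiB per-slice figure is derived from the 2048-set, 24-way, 64-byte geometry, not quoted from the paper.

```python
from dataclasses import dataclass

TOTAL_CORES = 8    # cores on the die, per the text
TOTAL_SLICES = 8   # L3 slices on the die, per the text
SLICE_MIB = 3      # derived: 2048 sets x 24 ways x 64 B = 3 MiB of data


@dataclass(frozen=True)
class DieConfig:
    """Hypothetical record of what stays enabled after recovery."""
    enabled_cores: tuple
    enabled_slices: tuple

    @property
    def l3_mib(self) -> int:
        """L3 data capacity left after disabling defective slices."""
        return len(self.enabled_slices) * SLICE_MIB


def recover_die(defective_cores, defective_slices) -> DieConfig:
    """Drop defective cores and non-repairable slices, keep the rest."""
    for idx in defective_cores:
        if not 0 <= idx < TOTAL_CORES:
            raise ValueError(f"core index out of range: {idx}")
    for idx in defective_slices:
        if not 0 <= idx < TOTAL_SLICES:
            raise ValueError(f"slice index out of range: {idx}")
    cores = tuple(c for c in range(TOTAL_CORES) if c not in defective_cores)
    slices = tuple(s for s in range(TOTAL_SLICES) if s not in defective_slices)
    return DieConfig(enabled_cores=cores, enabled_slices=slices)


if __name__ == "__main__":
    # Example: one defective core and one slice with a non-repairable defect.
    config = recover_die({5}, {2})
    print("enabled cores:", config.enabled_cores)
    print("enabled slices:", config.enabled_slices)
    print("remaining L3 data capacity (MiB):", config.l3_mib)
```

In this model the disabled units are simply removed from the enabled lists; on the actual die they would additionally be clock gated and placed in shut-off mode, as the text describes.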