…load, store and texture)
•  Unified path to global memory for loads and stores

Diagram: Shared Memory, L1 Cache, L2 Cache, Global Memory

Perhaad Mistry & Dana Schaa, Northeastern Univ Computer Architecture Research Lab, with Ben

Intel's Larrabee – Xeon Phi
•  Project to build a high-end GPU that bridges the gap to conventional multicore
•  Each core is a simple in-order, 4-way SMT x86 core
•  Extended with a SIMD instruction set (16 floats wide)
•  Special-purpose hardware for texture cache, but not much else
•  Both L1 (32KB per core) and L2 (256KB per core) data caches are coherent
•  The Larrabee GPU project was cancelled but re-emerged as a compute accelerator: Intel's Many Integrated Core (MIC), codenamed "Knights Corner", launched as "Xeon Phi"

Diagram: Intel

SIMT vs SIMD – GPUs without the hype
•  GPUs combine many architectural tec...
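The "16 floats wide" SIMD extension on the Larrabee slide means that one vector instruction operates on 16 single-precision values at once (16 × 32 bits = 512 bits). The sketch below emulates this in plain Python to show the idea of strip-mining an array into 16-lane vector steps, with a lane mask covering a leftover tail. The names `SIMD_WIDTH`, `vadd16_masked` and `add_arrays` are hypothetical and used only for illustration. This is not Intel's instruction set or intrinsics API.

```python
SIMD_WIDTH = 16  # assumption: 16 single-precision lanes, as on the slide


def vadd16_masked(a, b, old, mask):
    """One emulated vector instruction: every active lane performs the same add.

    Lanes whose mask bit is 0 keep their old value (models a partial tail).
    """
    return [a[i] + b[i] if (mask >> i) & 1 else old[i] for i in range(SIMD_WIDTH)]


def add_arrays(a, b):
    """Compute c[i] = a[i] + b[i], SIMD_WIDTH elements per vector step.

    Returns the result list and the number of vector steps issued.
    """
    n = len(a)
    c = [0.0] * n
    steps = 0
    for base in range(0, n, SIMD_WIDTH):
        active = min(SIMD_WIDTH, n - base)
        mask = (1 << active) - 1                 # low `active` bits set
        pad = [0.0] * (SIMD_WIDTH - active)
        va = list(a[base:base + active]) + pad   # "load", zero-padded to 16 lanes
        vb = list(b[base:base + active]) + pad
        vc = vadd16_masked(va, vb, [0.0] * SIMD_WIDTH, mask)
        c[base:base + active] = vc[:active]      # "store" only the active lanes
        steps += 1
    return c, steps


if __name__ == "__main__":
    a = [float(i) for i in range(20)]
    b = [2.0 * i for i in range(20)]
    c, steps = add_arrays(a, b)
    print(steps, c[:4], c[-1])
```

With 20 elements, the loop issues two vector steps: one full 16-lane step and one step in which only 4 lanes are active. A scalar loop would need 20 iterations for the same work, which is the throughput argument for wide SIMD units on simple in-order cores.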
