BYU ECEn 424, Chapter 6: The Memory Hierarchy


Memory Systems
• Topics
  - Appreciating modern technology
  - Storage technologies and trends
  - Locality of reference
  - Caching in the memory hierarchy
  - Cache design principles
  - Implications of caches for programmers
• Our Y86-64 processor could access memory in 1 cycle.
  - Harder to do than you might think.
• Historically, it has been challenging to build reliable, effective memory systems.
  - What alternatives did system designers use along the way?
  - What was the first really effective memory technology?
  - What is used in systems today?

Obsolete Memory Technologies: Vacuum Tubes

Storage for the IBM 701 was based on cathode ray tubes that could each store 1024 bits. The Williams-Kilburn tube was patented in 1946. The system used 72 of these tubes, giving a total memory of 2048 words of 36 bits each. The memory worked by drawing dots on the tube, creating a slight positive charge at each dot with a slight negative charge surrounding it. Memory contents were read using a metal pickup plate covering the face of the CRT: a change in the charge on the tube induced a voltage pulse in the plate, and sensing was synchronized with the location of the beam so the memory address could be determined. This was a random access memory, but reads were destructive, so each value had to be rewritten after being read. Moreover, the charge leaked away, so every location had to be read and written back frequently.

[Photo: an electronics panel of the IBM 701, announced to the public in 1952.] The 701 was IBM's first commercial scientific computer; during development it was known as the Defense Calculator. IBM did not sell the machines; they were rented, for $12,000+ per month, and a total of 19 were installed. The logic circuitry and storage were made of vacuum tubes, with the arithmetic unit organized into 8-tube pluggable units. An addition required 60 μs; a multiplication or division took 456 μs.

Obsolete Memory Technologies: Delay Lines

A delay line memory consisted of a pipe filled with an acoustically conductive fluid such as mercury. A speaker on one end generated pulses (corresponding to individual bits), and a microphone on the other end read those pulses, which were then reinserted into the tube via the speaker. This was an extremely primitive serial access memory: the average access time for any particular bit in the tube was approximately 220 μs.

The UNIVAC (1951) had 1000 (72-bit) words of memory, requiring 7 memory units, each with 18 mercury columns about 2 feet long. (That's about 570 bits per column.) The memory units alone were the size of a small room. To function properly, the mercury had to be kept at 40 °C, and a full team of engineers was needed to constantly tune the units to maintain consistent acoustic properties.

Obsolete Memory Technologies: Drum Memory

Drums were large metal cylinders coated on the outside surface with a ferromagnetic recording material. Unlike modern hard drives, drums had a track per head, so heads did not have to move; in most cases a row of fixed read/write heads ran along the long axis of the drum. Drums were used primarily as main memory, and they were popular in cheaper computers from the 1950s to the 1960s. To boost performance, programmers would time the delay from loading one instruction into the computer until the next was needed, and then place the next instruction on the drum so that it would arrive under a head just in time, as in the sketch below.
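The drum trick is easy to quantify. Below is an illustrative C sketch (the drum speed, slots per track, and instruction time are invented for the example, not taken from the slides) that computes how many word slots ahead of the current one the next instruction should be placed.

    /* Illustrative sketch of drum instruction placement; all
     * parameters below are assumptions, not slide data. */
    #include <stdio.h>

    int main(void) {
        double rpm             = 12500.0; /* assumed drum speed        */
        double words_per_track = 64.0;    /* assumed slots per track   */
        double exec_us         = 300.0;   /* assumed instruction time  */

        double us_per_rev  = 60.0e6 / rpm;                 /* 4800 us/rev */
        double us_per_word = us_per_rev / words_per_track; /* 75 us/slot  */

        /* slots the drum rotates past while the instruction executes */
        int skip = (int)(exec_us / us_per_word + 0.5);
        printf("place next instruction %d word slots ahead\n", skip);
        return 0;
    }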
Memory Technologies
• What was the first big memory breakthrough?

Magnetic core memory was the predominant form of RAM in computer systems from 1955 to 1975. Horizontal, vertical, and diagonal wires are threaded through tiny magnetic toroids, each of which stores 1 bit; the value of each bit is determined by the direction of the core's magnetization. The process of reading a memory location caused its value to be reset to 0, so every read had to be followed by a write to restore the memory value. It proved impossible to wire up core memory by machine; fabrication relied on cheap labor in the Far East, working under microscopes. The cost per bit in 1955 was $1. Eventually, this fell to $0.01 per bit. [Photo: a 32x32 core plane memory storing 1024 bits.]

• What was the next big memory breakthrough?

DRAM (dynamic random access memory) was invented in 1966 by Dr. Robert Dennard at the IBM Thomas J. Watson Research Center and patented in 1968. The design used a single transistor and a single capacitor per NMOS DRAM cell. At the time, some regarded it as "Dennard's Folly", since the tiny capacitors lost their charge about 100 times each second. Obviously, designers found a way to make it work: DRAM remains the dominant memory technology today.

Random-Access Memory (RAM)
• Key features
  - RAM is traditionally packaged as a chip.
  - The basic storage unit is a cell (one bit per cell).
  - Multiple RAM chips form a memory.
• Static RAM (SRAM)
  - Each cell stores a bit with a circuit of six transistors.
  - Retains its value indefinitely, as long as it is kept powered.
  - Relatively insensitive to disturbances such as electrical noise.
  - Faster, more expensive, and larger than DRAM.
• Dynamic RAM (DRAM)
  - Each cell stores a bit with a capacitor and a transistor.
  - The value must be refreshed every 10-100 ms.
  - Sensitive to disturbances.
  - Slower, cheaper, and smaller than SRAM.

SRAM vs DRAM Summary

          Trans.   Access
          per bit  time   Persistent?  Sensitive?  Volatile?  Cost  Applications
    SRAM  6        1x     Yes          No          Yes        100x  Cache memories
    DRAM  1        10x    No           Yes         Yes        1x    Main memories, frame buffers

  - Persistent? Does it retain its contents as long as it is powered?
  - Sensitive? Is electrical noise likely to change the memory contents?
  - Volatile? Does it lose information when powered off?

Conventional DRAM Organization
• A d x w DRAM stores d x w total bits, organized as d supercells of size w bits.
• Example: a 16 x 8 DRAM chip (128 bits total), with its 16 supercells arranged as 4 rows x 4 columns.
[Figure: a memory controller connected to the 16 x 8 DRAM chip. Two address bits select a row or column of the 4x4 supercell array, 8 data bits transfer one supercell to/from the CPU, and an internal row buffer holds the active row.]

Reading DRAM Supercell (2,1)
• Step 1(a): The row access strobe (RAS) selects row 2.
• Step 1(b): Row 2 is copied from the DRAM array to the internal row buffer.
• Step 2(a): The column access strobe (CAS) selects column 1.
• Step 2(b): Supercell (2,1) is copied from the buffer to the data lines and sent to the CPU.
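As a minimal sketch (our own model, not code from the slides), the C program below mimics this two-step read on the 16 x 8 chip: a RAS copies a whole row into the row buffer, then a CAS picks one 8-bit supercell out of it.

    /* Sketch of the two-step RAS/CAS read: a 16 x 8 DRAM modeled as
     * 4 rows x 4 columns of 8-bit supercells with a row buffer. */
    #include <stdint.h>
    #include <stdio.h>

    #define ROWS 4
    #define COLS 4

    static uint8_t cells[ROWS][COLS];  /* the DRAM array (16 x 8 bits)  */
    static uint8_t row_buffer[COLS];   /* holds the most recent RAS row */

    /* Step 1: RAS copies an entire row into the internal row buffer. */
    static void ras(int row) {
        for (int c = 0; c < COLS; c++)
            row_buffer[c] = cells[row][c];
    }

    /* Step 2: CAS selects one supercell from the buffered row. */
    static uint8_t cas(int col) {
        return row_buffer[col];
    }

    int main(void) {
        cells[2][1] = 0xAB;   /* contents of supercell (2,1) */
        ras(2);               /* RAS = 2: buffer row 2       */
        printf("supercell (2,1) = 0x%02X\n", cas(1)); /* CAS = 1 */
        return 0;
    }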
Memory Modules
• A 64 MB memory module can be built from eight 8M x 8 DRAMs.
• To fetch the 64-bit doubleword at main memory address A, the memory controller converts A to a supercell address (row = i, col = j) and broadcasts it to all eight DRAMs.
• Each DRAM returns its 8-bit supercell (i,j): DRAM 0 supplies bits 0-7, DRAM 1 supplies bits 8-15, and so on up to DRAM 7, which supplies bits 56-63.
• The memory controller assembles the eight bytes into the 64-bit doubleword and returns it.

Enhanced DRAMs
• The basic DRAM cell has not changed since its invention in 1966.
  - Commercialized by Intel in 1970.
• Enhanced DRAMs pair conventional DRAM cores with better interface logic and faster I/O:
  - Synchronous DRAM (SDRAM): uses a conventional clock signal instead of asynchronous control, and allows reuse of the row address (e.g., RAS, CAS, CAS, CAS).
  - Double data-rate synchronous DRAM (DDR SDRAM): double-edge clocking sends two bits per cycle per pin. The types are distinguished by the size of a small prefetch buffer: DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits). By 2010, DDR was standard for most server and desktop systems; the Intel Core i7 supports only DDR3 SDRAM.

Nonvolatile Memories
• DRAM and SRAM are volatile memories: their contents are lost when power is removed.
• Nonvolatile memories retain their value even if powered off.
  - The generic name is read-only memory (ROM), which is misleading because some types can be both read and written.
• Types
  - Read-only memory (ROM): programmed during production.
  - Programmable ROM (PROM): can be programmed once.
  - Erasable programmable ROM (EPROM): can be bulk erased (UV, X-ray).
  - Electrically erasable PROM (EEPROM): electronic erase capability.
  - Flash memory: EEPROMs with block-level erase capability.
• Uses of nonvolatile memory
  - Firmware, solid state disks, etc.

Typical CPU-Memory Bus Structure
• A bus is a collection of parallel wires that carry address, data, and control signals.
• Buses are typically shared by multiple devices.
[Figure: the CPU chip (register file, ALU, bus interface) connects via the system bus to an I/O bridge, which connects via the memory bus to main memory.]

Memory Read Transaction
The CPU is executing a load: movq A,%rax
1. The CPU places address A on the memory bus.
2. Main memory reads A from the bus, retrieves word x, and places it on the bus.
3. The CPU reads word x from the bus and copies it into register %rax.

Memory Write Transaction
The CPU is executing a store: movq %rax,A
1. The CPU places address A on the bus; main memory reads A and waits for the data word.
2. The CPU places data word y on the bus.
3. Main memory reads data word y from the bus and stores it at address A.
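For a concrete tie-in to C (our own example; the variable names are illustrative), the program below performs one such load and one such store. On x86-64, gcc typically compiles the marked statements to movq instructions like the ones traced above.

    /* One load and one store, mirroring the two bus transactions. */
    #include <stdio.h>

    long x = 42;   /* lives in main memory at some address A */
    long y;        /* lives in main memory at another address */

    int main(void) {
        long r = x;   /* read transaction:  movq A,%rax  */
        y = r;        /* write transaction: movq %rax,A  */
        printf("y = %ld\n", y);
        return 0;
    }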
What's Inside a Disk Drive?
[Photo: an opened disk drive showing the spindle, platters, arm, actuator, SCSI connector, and electronics (including a processor and memory!). Image courtesy of Seagate Technology.]

Disk Geometry
• Disks consist of platters, each with two surfaces.
• Each surface consists of concentric rings called tracks.
• Each track consists of sectors separated by gaps.

Disk Operation (Single-Platter View)
• The disk surface spins at a fixed rotational rate.
• The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
• By moving radially, the arm can position the read/write head over any track.

Disk Geometry (Multiple-Platter View)
• Aligned tracks form a cylinder.
• The read/write heads move in unison from cylinder to cylinder.
[Figure: cylinder k spans surfaces 0-5 of platters 0-2, all on one spindle.]

Disk Capacity
• Capacity: the maximum number of bits that can be stored.
  - Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.
• Capacity is determined by these technology factors:
  - Recording density (bits/in): the number of bits that can be squeezed into a 1-inch segment of a track.
  - Track density (tracks/in): the number of tracks that can be squeezed into a 1-inch radial segment.
  - Areal density (bits/in^2): the product of recording density and track density.

Recording Zones
• Modern disks partition tracks into disjoint subsets called recording zones.
  - Each track in a zone has the same number of sectors, determined by the circumference of the zone's innermost track.
  - Each zone has a different number of sectors/track; outer zones have more sectors/track than inner zones.
  - Contrast with the old approach: a fixed number of sectors for all tracks.
  - Consequently, we use the average number of sectors/track when computing capacity.

Computing Disk Capacity
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
• Example:
  - 512 bytes/sector
  - 300 sectors/track (on average)
  - 20,000 tracks/surface
  - 2 surfaces/platter
  - 5 platters/disk
Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
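As a quick check (our own snippet, not from the slides), here is the same arithmetic in C:

    /* Recompute the disk-capacity example above. */
    #include <stdio.h>

    int main(void) {
        long long bytes_per_sector     = 512;
        long long sectors_per_track    = 300;    /* average */
        long long tracks_per_surface   = 20000;
        long long surfaces_per_platter = 2;
        long long platters_per_disk    = 5;

        long long capacity = bytes_per_sector * sectors_per_track *
                             tracks_per_surface * surfaces_per_platter *
                             platters_per_disk;

        /* Vendors use 1 GB = 10^9 bytes, so this prints 30.72 GB. */
        printf("capacity = %lld bytes = %.2f GB\n",
               capacity, capacity / 1e9);
        return 0;
    }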
Disk Structure
• The view shows a single platter from above.
  - The surface is divided into tracks, and the tracks are divided into sectors.
  - The head is positioned above a track.
  - Assume rotation is counter-clockwise; the head is about to read the blue sector.

Disk Access Sequence
1. Finished the BLUE read; the RED read is next.
2. Seek to the RED track.
3. Wait for the RED sector to rotate around.
4. After the RED read.

Disk Access: Service Time Components
• Seek (step 2), rotational latency (step 3), and data transfer (step 4).

Disk Access Time
• The average time to access a target sector is approximated by:
  - Taccess = Tavg seek + Tavg rotation + Tavg transfer
• Seek time (Tavg seek)
  - Time to position the heads over the cylinder containing the target sector.
  - Typical Tavg seek is 3 to 9 ms.
• Rotational latency (Tavg rotation)
  - Time waiting for the first bit of the target sector to pass under the read/write head.
  - Typical Tavg rotation is ~4 ms (half a rotation at 7200 RPM).
• Transfer time (Tavg transfer)
  - Time to read the bits in the target sector.
  - Typical Tavg transfer is ~0.02 ms (1/400 of a rotation with 400 sectors/track).

Disk Access Time Example
• Given:
  - Rotational rate = 7200 RPM
  - Average seek time = 9 ms
  - Average # sectors/track = 400
• Derived:
  - Tavg rotation = 1/2 x (60 s / 7200 rotations) x 1000 ms/s ≈ 4 ms
  - Tavg transfer = (60 s / 7200 rotations) x (1/400 rotation) x 1000 ms/s ≈ 0.02 ms
  - Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms
• Important points:
  - Access time is dominated by seek time and rotational latency.
  - The first bit in a sector is the most expensive; the rest are virtually free.
  - SRAM access time is roughly 4 ns/doubleword, DRAM roughly 60 ns; hence
    disk is about 40,000 times slower than SRAM, and
    disk is about 2,500 times slower than DRAM.
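The example's arithmetic, redone in C as a quick check (our own snippet; note the slide rounds Tavg rotation from 4.17 ms to 4 ms, which is how it arrives at 13.02 ms):

    /* Recompute the disk-access-time example above. */
    #include <stdio.h>

    int main(void) {
        double rpm               = 7200.0;
        double avg_seek_ms       = 9.0;
        double sectors_per_track = 400.0;   /* average */

        double ms_per_rotation = 60.0 / rpm * 1000.0;     /* 8.33 ms  */
        double avg_rotation_ms = 0.5 * ms_per_rotation;   /* 4.17 ms  */
        double avg_transfer_ms = ms_per_rotation / sectors_per_track;
                                                          /* 0.02 ms  */

        /* Prints ~13.19 ms; the slide's 13.02 ms uses the rounded
         * 4 ms rotational latency. */
        printf("Taccess = %.2f ms\n",
               avg_seek_ms + avg_rotation_ms + avg_transfer_ms);
        return 0;
    }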
Logical Disk Blocks
• Modern disks present a simpler abstract view of their complex sector geometry:
  - The set of available sectors is modeled as a sequence of b sector-sized logical blocks (0, 1, 2, ..., b-1).
• The mapping between logical blocks and actual (physical) sectors:
  - is maintained by a hardware/firmware device called the disk controller, which converts requests for logical blocks into (surface, track, sector) triples;
  - allows the controller to set aside spare cylinders for each zone, which accounts for some of the difference between "formatted capacity" and "maximum capacity".

Discovering Disk Details
• Would it be possible, using software means alone, to determine the sector geometry of a disk? How might you do this?
• Researchers at CMU did just that.
  - Their DIXtrac tool accesses all pairs of logical blocks in sequence.
  - It measures access times and determines the relative positions of those blocks.
  - It eventually determines the number of recording zones, the number of sectors per zone, the sector that each logical block maps to, etc.
  - Historically, disk manufacturers have not published technical information about their disks at this level of detail.

I/O Bus
[Figure: the system bus connects the CPU chip to the I/O bridge, and the memory bus connects the bridge to main memory. The I/O bus connects the bridge to a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector
1. The CPU initiates a disk read by writing a command, a logical block number, and a destination memory address to a port (I/O address) associated with the disk controller.
2. The disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
3. When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., it asserts a special "interrupt" pin on the CPU).

Solid State Disks (SSDs)
• An SSD accepts requests to read and write logical disk blocks over the I/O bus; a flash translation layer maps those blocks onto the underlying flash memory.
• Flash memory is organized as B blocks (Block 0 ... Block B-1), each containing P pages (Page 0 ... Page P-1).
  - Pages: 512 B to 4 KB; blocks: 32 to 128 pages (16 KB to 512 KB).
  - Data is read and written a page at a time.
  - A page can be written only after its entire block has been erased (set to all 1s), so changing a page causes all other pages in the block to be copied to a new block.
  - A block wears out after about 100,000 writes.

SSD Performance Characteristics (Source: Intel SSD 730 product specification)

    Sequential read throughput   550 MB/s   Sequential write throughput  470 MB/s
    Random read throughput       365 MB/s   Random write throughput      303 MB/s
    Avg sequential read time     50 μs      Avg sequential write time    60 μs

• Sequential access is faster than random access.
  - A common theme in the memory hierarchy.
• Writes are slower than reads.
  - Erasing a block takes a long time (~1 ms) and includes the overhead of copying the block's other pages to a new block.
  - In earlier SSDs, the read/write gap was much larger.

SSD Tradeoffs vs Rotating Disks
• Advantages
  - SSDs have no moving parts: faster, less power, more rugged.
• Disadvantages
  - SSDs have the potential to wear out.
    - Mitigated by "wear leveling logic" in the flash translation layer.
    - For the SSD 730, Intel guarantees 128 petabytes (128 x 10^15 bytes) of writes before it wears out.
  - In 2015, SSDs were about 30x more expensive per byte than magnetic disks.
• Applications
  - MP3 players, smart phones, laptops.
  - Beginning to appear in desktops and servers.

A Look Back
• IBM's RAMAC disk memory device
  - Introduced in 1956.
  - 50 two-foot-diameter disks; 5 MB total storage.
  - Lease rate: $35,000/year, equivalent to more than $250,000 today.

An Impressive 30-Year Run!

    SRAM
    Metric        1980    1985   1990  1995  2000  2005  2010  2010:1980
    $/MB          19,200  2,900  320   256   100   75    60    320x
    Access (ns)   300     150    35    15    3     2     1.5   200x

    DRAM
    Metric             1980   1985   1990  1995  2000  2005   2010   2010:1980
    $/MB               8,000  880    100   30    1     0.1    0.06   130,000x
    Access (ns)        375    200    100   70    60    50     40     9x
    Typical size (MB)  0.064  0.256  4     16    64    2,000  8,000  125,000x

    Disk
    Metric             1980  1985  1990  1995   2000    2005     2010       2010:1980
    $/MB               500   100   8     0.30   0.01    0.005    0.0003     1,600,000x
    Seek time (ms)     87    75    28    10     8       5        3          29x
    Typical size (MB)  1     10    160   1,000  20,000  160,000  1,500,000  1,500,000x

    CPU
    Metric            1980   1985  1990  1995   2000   2005   2010     2010:1980
    Processor         8080   286   386   Pent.  P-III  Core2  Core i7
    Clock rate (MHz)  1      6     20    150    600    2,000  2,500    2,500x
    Cycle time (ns)   1,000  166   50    6      1.6    0.50   0.4      2,500x

Mind the Gaps: CPU-Memory-Disk
[Figure: a log-scale plot (nanoseconds, up to 100,000,000) of disk seek time, DRAM and SRAM access times, and CPU cycle time from 1980 to 2010, showing the widening gaps between CPU speed and memory and disk speeds.] ...
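The 2010:1980 columns follow directly from the endpoint values in the tables. The short C check below (our own snippet, using only numbers from the tables above) shows why the gaps matter: CPU cycle times improved about 2,500x while DRAM access improved only about 9x and disk seeks about 29x.

    /* Recompute the 2010:1980 improvement ratios from the tables. */
    #include <stdio.h>

    int main(void) {
        printf("CPU cycle time: %.0fx faster\n", 1000.0 / 0.4); /* 2,500x */
        printf("DRAM access:    %.0fx faster\n", 375.0 / 40.0); /* ~9x    */
        printf("Disk seek time: %.0fx faster\n", 87.0 / 3.0);   /* ~29x   */
        return 0;
    }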

  • Winter '17
  • James Archibald
