lec23-wrapup.pdf - CS 150 Digital Design Lecture 23 – Course Wrap-Up Elad Alon today’s lecture by John Lazzaro TAs Daiwei Li James Parker Dan Yeager

CS 150 Digital Design
Lecture 23 – Course Wrap-Up
2011-11-29
Elad Alon (today’s lecture by John Lazzaro)
TAs: Daiwei Li, James Parker, Dan Yeager
www-inst.eecs.berkeley.edu/~cs150/
CS 150 L23: Course Wrap-Up, UC Regents, Fall 2011 © UCB

Energy and Performance
Sad fact: Computers turn electrical energy into heat. Computation is a byproduct.
Air or water carries heat away, or chip melts.

The Joule: Unit of energy. Can also be expressed as Watt-Seconds. Burning 1 Watt for 100 seconds uses 100 Watt-Seconds of energy.
The Watt: Unit of power. The amount of energy burned in the resistor in 1 second: 1 Joule of heat energy per second.
[Circuit: a 1 V source drives 1 A through a 1 Ohm resistor, dissipating 1 Watt.]
This is how electric tea pots work ... 1 Joule heats 1 gram of water 0.24 degree C.
20 W rating: Maximum power the package is able to transfer to the air. Exceed rating and resistor burns.

Cooling an iPod nano ...
Like resistor on last slide, iPod relies on passive transfer of heat from case to the air. Why? Users don’t want fans in their pocket ...
To stay “cool to the touch” via passive cooling, power budget of 5 W.
If iPod nano used 5 W all the time, its battery would last 15 minutes ...

Powering an iPod nano (2005 edition)
1.2 W-hour battery: Can supply 1.2 watts of power for 1 hour.
1.2 W-hour / 5 W = 0.24 hour, about 15 minutes.
More W-hours require bigger battery and thus bigger “form factor” -- it wouldn’t be “nano” anymore :-).
Real specs for iPod nano: 14 hours for music, 4 hours for slide shows.
85 mW for music. 300 mW for slides.

Finding the (2005) iPod nano CPU ...
A close relative ... Two 80 MHz CPUs. One CPU used for audio, one for slides.
Low-power ARM roughly 1 mW per MHz ... variable clock, sleep modes.
85 mW system power realistic ...
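The battery arithmetic on the iPod nano slides can be checked with a few lines of Python. This is a minimal sketch, not part of the original lecture: the function names `battery_life_hours` and `average_power_mw` are made up for illustration, and the only inputs are the figures quoted on the slides (1.2 W-hour battery, 5 W passive-cooling budget, 14 hours of music, 4 hours of slide shows). It assumes a constant power draw, which real devices only approximate.

```python
def battery_life_hours(capacity_wh: float, power_w: float) -> float:
    """Hours a battery of capacity_wh watt-hours lasts at a constant draw of power_w watts."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return capacity_wh / power_w


def average_power_mw(capacity_wh: float, hours: float) -> float:
    """Average draw in milliwatts that empties capacity_wh watt-hours in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return capacity_wh / hours * 1000.0


if __name__ == "__main__":
    nano_2005_wh = 1.2  # battery capacity quoted on the slide

    # Running flat out at the 5 W passive-cooling budget.
    minutes = battery_life_hours(nano_2005_wh, 5.0) * 60
    print(f"At 5 W: {minutes:.1f} minutes")

    # Average power implied by the rated playback times.
    print(f"Music (14 h): {average_power_mw(nano_2005_wh, 14):.0f} mW")
    print(f"Slides (4 h): {average_power_mw(nano_2005_wh, 4):.0f} mW")
```

The same two functions apply to the later notebook slides, for example `battery_life_hours(55, 31)` for the 55 W-hour MacBook battery with the CPU alone drawing 31 W.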
Year-to-year: continuous improvements
iPod nano 2005: 14 hours battery life (audio playback)
iPod nano 2006: 24 hours battery life (audio playback)
What changed inside?
Source: ifixit.com

[Teardown photos. Source: ifixit.com]

iPod nano 2005: a C-shaped PC board, with a battery in the “C” opening.
iPod nano 2006: battery lies on top of PC board.
Source: ifixit.com

How? Small IC packages, fewer parts
[Photos: iPod nano 2006, iPod nano 2005. Source: arstechnica.com]

Aluminum permits thinner case ...
What’s happened since 2006?
Source: ilounge.com

2010 Nano: 0.74 ounces. 2010 Shuffle: 0.44 ounces. Nearly the same depth.
2010 Nano: “up to” 24 hours audio playback; 0.39 W Hr battery (33% of 2005 Nano).
2010 Shuffle: “up to” 15 hours audio playback; 0.19 W Hr battery.
Sources: iFixit, Apple

Desired screen size sets smartphone W x L
Depth?: Thin body vs. Battery life

22% gain in battery energy over 5 iterations

iPhone (2007)
[Teardown photo; labels: Main board, Antennas, Battery.]

iPhone 4{,S}
[Teardown photo; labels: Battery, L-shape Main Board.]
Metal frame acts as antenna

In 4 years:
6.8x increase in transistor count
33% max clock speed increase
Attached DRAM: 128 MB -> 512 MB
6.8x transistors: Dual CPU and GPU, and to save energy.

Notebooks ... as designed in 2006 ...
2006 Apple MacBook -- 5.2 lbs; 12.8 in x 8.9 in x 1 in
Performance: Must be “close enough” to desktop performance ... most people no longer used a desktop (even in 2006).
Size and Weight. Ideal: paper notebook.
Heat: No longer “laptops” -- top may get “warm”, bottom “hot”. Quiet fans OK.

Battery: Set by size and weight limits ...
Battery rating: 55 W-hour. 46x more energy than iPod nano battery. And iPod lets you listen to music for 14 hours!
Almost full 1 inch depth. Width and height set by available space, weight.
At 2.3 GHz, Intel Core Duo CPU consumes 31 W running a heavy load - under 2 hours battery life! And, just for CPU!
At 1 GHz, CPU consumes 13 Watts. “Energy saver” option uses this mode ...

55 W-hour battery stores the energy of 1/2 a stick of dynamite.
If battery short-circuits, catastrophe is possible ...

MacBook Air ... design the laptop like an iPod

2011 Air: 11.8 in x 7.56 in x 0.68 in; 2.38 lbs (depth tapers from 0.68 in to 0.11 in)
2006 MacBook: 12.8 in x 8.9 in x 1 in; 5.2 lbs

Mainboard: fills about 25% of the laptop
“Form-fit” battery ...
Non-removable, 35 W-h battery: 63% of 2006 MacBook’s 55 W-h

2011 Air: 35 W-h battery, 5 hour battery life *
iPad 2: 25 W-h battery, 10 hour battery life *
* For a content-consumption workload.
iPad 2: 1.33 lbs
MacBook Air 11.6 in: 2.38 lbs
Battery-Life-Hour/W-h: 2.8x iPad advantage

“Content Creation vs. Content Consumption”
2011 Air: $999 -- 64 GB SSD, 2 GB RAM, x86
iPad 2: $699 -- 64 GB SSD, 512 MB RAM, ARM
iPhone 4S and iPad 2: Identical CPU/RAM stack

MacBook Air: Full PC
Top: Thunderbolt I/O, Core i5 CPU/GPU, Platform Controller Hub
Bottom: 4GB DRAM

The CPU is only part of power budget!
2004-era notebook (IBM Thinkpad R40) running a full workload.
“Amdahl’s Law for Power”: If our CPU took no power at all to run, that would only double battery life!
[Pie chart: “Current Generation Laptop Power Pie (IBM Thinkpad R40)”, max power workload. The CPU is the largest slice at about 52%; the remaining slices are HDD, power supply, wireless, LCD, LCD backlight, optical drive, graphics, memory, and rest of the system. Per-slice percentages other than the CPU are not recoverable from the extracted text.]
Data courtesy Mahesri et al., U of Illinois, 2004. From: Pradip Bose, Hot Chips 2005 Tutorial, August 14, 2005, IBM T.J. Watson Research Center. © 2004, 2005 IBM Corporation

Servers: Total Cost of Ownership (TCO)
Machine rooms are expensive. Removing heat dictates how many servers to put in a machine room.
Reliability: running computers hot makes them fail more often.
Electric bill adds up! Powering the servers + powering the air conditioners is a big part of TCO.

Computations per W-h doubles every 1.6 years, going back to the first computer. (Jonathan Koomey, Stanford).

Processors and Energy

Building Blocks

[Excerpt from Ieong et al.; the neighboring columns of the page are cropped and not recoverable.]
“... This source, substrate, and drain doping effectively produces two back-to-back junction diodes from the source terminal to the drain terminal. When a sufficiently large positive voltage is applied to the gate of an N-channel transistor (which creates an electric field, hence the field effect), the silicon surface is “inverted”: the conduction band is populated and forms a narrow conducting layer between the source and the drain. If there is a voltage difference between the source and the drain, an electric current can flow between them. When the gate voltage is removed or set at zero voltage, the surface region under the gate is depleted with electric carriers and there is no current flow between the source and the drain. We can therefore see that the current flowing through the structure can be regulated by applying voltage to the gate electrode.”

Device engineers trade speed and power
[Figure: (A) device scaling parameters: gate length L/α, junction depth xd/α, oxide thickness tox/α, voltage V/α, doping α*NA. (B) Gate delay as a function of power-supply voltage (Vdd); gate delay rapidly increases as Vdd ... Plot annotations on the Vdd and Vt axes: “Increase speed”, “Reduce Pactive”, “Reduce Pstandby”.]
We can reduce CV^2 (Pactive) by lowering Vdd.
We can increase speed by raising Vdd and lowering Vt.
We can reduce leakage (Pstandby) by raising Vt.
From: Silicon Device Scaling to the Sub-10-nm Regime. Meikei Ieong, Bruce Doris, Jakub Kedzierski, Ken Rim, Min Yang

Customize processes for product types ...
Transistors Require Optimization to the Application
[Chart: Performance vs. Leakage. Process classes: High Performance, Low Power, Ultra-Low Power. Products shown: CPU, Mobile Chipset, Network Processor, Cell Phone, PDA.]
Optimized transistors can provide ~1000x lower leakage
From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.

Five low-power design techniques
Parallelism and pipelining
Power-down idle transistors
Slow down non-critical paths
Clock gating
Thermal management

Design Technique #1 (of 5)
Trading Hardware for Power via Parallelism and Pipelining ...

And so, we can transform this:
[One logic block at Vdd.] Block processes stereo audio. 1/2 of clocks for “left”, 1/2 for “right”.
Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1
Into this:
[Two logic blocks, each at Vdd/2.] Ex: Top block processes audio channel 1, bottom block processes audio channel 2.
Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = 0.125
Gate delay roughly linear with Vdd. CV^2 power only.
[Background figure panels: “Active Power Reduction”: Replicated Designs; Multiple Supply Voltages (High Supply Voltage, Low Supply Voltage; Fast and Slow blocks).]
This magic trick brought to you by Cory Hall ...

From: Minimizing Power Consumption in CMOS Circuits
Anantha P. Chandrakasan, Robert W. Brodersen
Department of EECS, University of California at Berkeley

Abstract: An approach is presented for minimizing power consumption for digital systems implemented in CMOS which involves optimization at all levels of the design. This optimization includes the technology used to implement the digital circuits, the circuit style and topology, the ...

Architecture Level Optimization

Figure 18 A simple datapath with corresponding layout. [Inputs A, B, C pass through latches to an adder and a comparator (A > B), clocked with period T. Area = 636 x 833 µ2]

... where C_ref is the total effective capacitance being switched per clock cycle. The effective capacitance was determined by averaging the energy over a sequence of input patterns with a uniform distribution.

One way to maintain throughput while reducing the supply voltage is to utilize a parallel architecture. As shown in Figure 19, two identical adder-comparator datapaths are used, allowing each unit to work at half the original rate while maintaining the original throughput. Since the speed requirements for the adder, comparator, and latch have decreased from 25ns to 50ns, the voltage can be dropped from 5V to 2.9V (the voltage at which the delay doubled, from Figure 7). While the datapath capacitance has increased by a factor of 2, the operating frequency has correspondingly decreased by a factor of 2. Unfortunately, there is also a slight increase in the total “effective” capacitance introduced due to the extra routing, resulting in an increased capacitance by a factor of 2.15. Thus the power for the parallel datapath is given by:

$P_{par} = C_{par} V_{par}^2 f_{par} = (2.15\,C_{ref})(0.58\,V_{ref})^2 \frac{f_{ref}}{2} \approx 0.36\,P_{ref}$ (EQ 15)

Figure 19 Parallel implementation of the simple datapath. [Two copies of the datapath clocked with period 2T, outputs combined by a MUX. Area = 1476 x 1219 µ2]

The amount of parallelism can be increased to further reduce the power supply voltage and the power consumption for a fixed throughput. However, as the supply approaches the threshold voltage of the devices, the delays increase significantly with a reduction in supply voltage and therefore the amount of parallelism and corresponding overhead circuitry increase significantly. At some “optimum” voltage, the overhead circuitry due to parallelism dominates and the power starts to increase with further reduction in supply [14].

Another possible approach is to apply pipelining to the architecture, as shown in Figure 20. With the additional pipeline latch, the critical path becomes the max[Tadder, Tcomparator], allowing the adder and the comparator to operate at a slower rate. For this example, the two delays are equal, allowing the supply voltage to again be reduced from 5V used in the reference datapath to 2.9V (the voltage at which the delay doubles) with no loss in throughput. However, there is a much lower area overhead incurred by this technique, as we only need ...

$P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.15\,C_{ref})(0.58\,V_{ref})^2 f_{ref} \approx 0.39\,P_{ref}$ (EQ 16)

With this architecture, the power reduces by a factor of approximately 2.5, providing approximately the same power reduction as the parallel case with the advantage of lower area overhead. As an added bonus, increasing the level of pipelining also has the effect of reducing logic depth and hence power contributed due to hazards and critical races.

Figure 20 Pipelined implementation of the simple datapath. [Area = 640 x 1081 µ2]

Clearly an even bigger improvement can be obtained by simultaneously exploiting parallelism and pipelining. The summary of all these cases along with the area penalty is presented in Table 2.

Table 2 Architecture based voltage scaling results

Architecture         Voltage   Area (normalized)   Power (normalized)
Simple               5V        1                   1
Parallel             2.9V      3.4                 0.36
Pipelined            2.9V      1.3                 0.39
Pipelined-Parallel   2.0       3.7                 0.2

5.1.2 Memory Access
The same parallelism concept used in the previous section can be used to optimize memory operations for low-power. For example, Figure 21 shows two alternate schemes for reading 8 bits of data from memory at throughput f. On the left-hand side is the serial access scheme in which the 8 bits of data are read in a serial for...
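The power figures quoted above all follow from the dynamic-power relation P = C V^2 f, applied as ratios against a reference design. The sketch below is an illustration rather than code from the lecture or the paper: `relative_power` is a hypothetical helper name, and the inputs are the ratios stated in the text (capacitance 2.15x and 1.15x, supply 2.9 V / 5 V = 0.58, and the idealized slide example with two blocks at half voltage and half frequency). It models switching power only, so leakage is ignored.

```python
def relative_power(cap_ratio: float, vdd_ratio: float, freq_ratio: float) -> float:
    """Dynamic power relative to a reference design, using P = C * V^2 * f.

    Each argument is the new value divided by the reference value.
    Leakage (standby) power is not modeled.
    """
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    vdd_ratio = 2.9 / 5.0  # 0.58, supply dropped from 5 V to 2.9 V

    # Parallel datapath (EQ 15): 2.15x capacitance, half the clock rate.
    parallel = relative_power(2.15, vdd_ratio, 0.5)

    # Pipelined datapath (EQ 16): 1.15x capacitance, same clock rate.
    pipelined = relative_power(1.15, vdd_ratio, 1.0)

    # Idealized slide example: two blocks (2x C), Vdd = 0.5, Freq = 0.5.
    slide_power = relative_power(2.0, 0.5, 0.5)
    slide_power_density = slide_power / 2.0  # area doubles

    print(f"parallel:  {parallel:.2f} x P_ref")
    print(f"pipelined: {pipelined:.2f} x P_ref")
    print(f"slide example: power {slide_power}, power density {slide_power_density}")
```

Because the voltage enters squared, halving Vdd buys a 4x reduction that more than pays for the doubled capacitance, which is why the replicated design keeps its throughput at a quarter of the power.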