SEU_TMR_agarswal_slides_eu.ppt - ELEC 7770 Advanced VLSI...


ELEC 7770 Advanced VLSI Design, Spring 2007
Soft Errors and Fault-Tolerant Design
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Apr 17, 19 ELEC 7770: Advanced VLSI Design (Agrawal)

Soft Errors
- Soft errors are errors caused by the operating environment. They are not due to a permanent hardware fault.
- Soft errors are intermittent or random, which makes testing for them unreliable.
- One way to deal with soft errors is to make the hardware robust:
  - Capable of detecting soft errors
  - Capable of correcting soft errors
- Both measures are probabilistic.

Some Early References
- J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.
- M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.
- T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.

Causes of Soft Errors
- Interconnect coupling (crosstalk).
- Power supply noise: IR-drop, delta-I.
- Effects generally attributed to alpha-particles:
  - Charged particles: electrons, protons, ions.
  - Radiation (photons): X-rays, gamma-rays, ultra-violet light.

Sources of Alpha-Particles
- Radioactive contamination in VLSI packaging material.
- Ionosphere, magnetosphere and solar radiation.
- Other electromagnetic radiation.

Alpha-Particle
- Helium nucleus: two protons and two neutrons.
- Mass = 6.65 × 10^-27 kg.
- Charge = +2e (e = 1.6 × 10^-19 C).
- Energy = 3.73 GeV (rest energy, mc^2).

Soft Error Rate (SER)
- Failures in time (FIT): one FIT is 1 error per billion (10^9) hours of operation.
- An alternative unit is mean time between failures (MTBF).
- 1 year MTBF = 10^9/(365 × 24) = 114,155 FIT

Particle Strike
[Figure: an ion or charged particle strikes an n+ region over a p- substrate; + and - charges are shown along its path.]

Induced Current
[Figure: plot of current versus time.]
I(t) = I_0 (e^{-t/a} - e^{-t/b}), a >> b

Voltage Induced at a Node
V = Q/C, where Q = ∫ I(t) dt and C = node capacitance.
A smaller node capacitance results in a larger voltage swing.

Effect on Digital Circuit
[Figure: combinational logic between IN and OUT with clock input CK; charged particles are shown striking the circuit at two places.]

An SRAM Cell
[Figure: SRAM cell with word line WL, supply VDD and bit lines BL, storing a 0 bit and a 1 bit.]

SRAM Cell Struck by Alpha-Particle
Single-Event Upset (SEU)
[Figure: the same SRAM cell struck by charged particles; the stored bits flip, 0→1 and 1→0.]

D-Latch
[Figure: D-latch with input D, output Q and CK = 0; internal node values 1 and 0.]

SEU in D-Latch
[Figure: the same D-latch with CK = 0 struck by charged particles; the node values flip, 1→0 and 0→1.]

Single Event Transients in Combinational Logic
[Figure: gate-level circuit between clocked (CK) elements, annotated with logic values; charged particles strike an internal node.]

Effects of Transients
Error correcting effects:
- Transient pulse is filtered by gate inertia
- Transient is blocked by an unsensitized path
- Transient is blocked by an inactive clock
Error enhancing effects:
- Large number of gates can produce multiple pulses
- Fanouts can multiply error pulses

SEUs in FPGA
Parts that can be affected:
- Look-up table (LUT)
- Configuration memory cell
- Flip-flop
- Block RAM

LUT
[Figure: LUT with inputs F1-F4, a column of memory cells holding the function's bit values, and output out.]

SEU in LUT
[Figure: the same LUT after a charged-particle strike; one memory cell has changed from 1 to 0.]

Four Types of SEU in FPGA
[Figure: FPGA logic block with a LUT (inputs F1-F4), a flip-flop (FF), configuration memory cells (M) and block RAM, marked with SEU Types 1 to 4.]

SEU Detection Methods
- Hardware redundancy
- Time redundancy
- Error detection codes (EDC)
- Self-checker techniques

SEU Mitigation Techniques
- Triple modular redundancy (TMR)
- Multiple redundancy with voting
- Error detection and correction codes (EDAC)
- Hardened memory cells
- FPGA-specific methods:
  - Reconfiguration
  - Partial configuration
  - Rerouting design

Hardware Redundancy for Detection
[Figure: inputs feed the combinational logic and a duplicated copy of it; the circuit produces output and an error signal.]
- Logic 1 indicates error.
- Hardware overhead is high, ~100%.
- Performance penalty is negligible.

Time Redundancy for Detection
[Figure: inputs, combinational logic, two D flip-flops clocked by CK and CK+d, output, and an error signal.]
- Logic 1 indicates error.
- Hardware overhead is low.
- Performance penalty (~ d) = maximum detectable pulse width.

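The time-redundancy scheme above can be illustrated with a small behavioral sketch in Python. This is only a model under stated assumptions, not the circuit from the slide: time is counted in integer units, a transient is modeled as an ideal pulse that inverts the node value, and the function names (`node_value`, `sample_pair`) are hypothetical. The logic output is sampled at CK and again at CK+d, and the two samples are XORed to form the error flag.

```python
def node_value(t, correct, pulse_start, pulse_width):
    """Logic value at time t: the correct bit, inverted while the transient is active."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ 1 if in_pulse else correct


def sample_pair(correct, ck_edge, d, pulse_start, pulse_width):
    """Sample the node at CK and at CK+d; error = 1 when the two samples disagree."""
    first = node_value(ck_edge, correct, pulse_start, pulse_width)
    second = node_value(ck_edge + d, correct, pulse_start, pulse_width)
    return first, second, first ^ second


# Assumed example values: clock edge at t = 100, delay d = 10 time units.
narrow = sample_pair(correct=1, ck_edge=100, d=10, pulse_start=98, pulse_width=5)
wide = sample_pair(correct=1, ck_edge=100, d=10, pulse_start=95, pulse_width=20)
print("narrow pulse:", narrow)  # samples disagree, error flag is 1
print("wide pulse:  ", wide)    # both samples corrupted, error flag is 0
```

In this model a pulse shorter than d can corrupt at most one of the two samples, so it is flagged, while a pulse that spans both sampling instants goes undetected. That matches the slide's statement that d sets the maximum detectable pulse width.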
Repeat on Error Detection
[Figure: the time-redundancy circuit (D flip-flops clocked by CK and CK+d) with a C-element (C) driving output, plus the error signal.]
- Logic 1 indicates error.
Operation:
- If an error is detected, the output retains its previous value.
- Repeating the computation can produce the correct result.

Muller C-Element
[Figure: C-element symbol with inputs A, B and output; a gate-level implementation on A and B driving a latch with S, R and Q.]

A  B  output
0  0  0
0  1  Old output
1  0  Old output
1  1  1

Triple Modular Redundancy (TMR)
[Figure: inputs feed three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter combines their results into output.]

Majority Voter Circuit
[Figure: majority voter symbol and gate-level circuit with inputs A, B, C and output.]

A  B  C  output
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1

Alternative Implementations of Voter
[Figure: two implementations: a circuit with VDD and inputs A, B, C; and a LUT addressed by ABC with contents 0 0 0 1 0 1 1 1.]

Triple Modular Redundancy (TMR)
[Figure: combinational logic followed by D flip-flops clocked by CK, CK+d and CK+2d, a majority voter, and a D flip-flop clocked by CK+3d at the output.]

TMR for Memory Cells
[Figure: combinational logic followed by three D flip-flops, all clocked by CK, feeding a majority voter that drives output.]
Problems:
1. Accumulation of errors in flip-flops.
2. Voter is not protected.

FF Refresh and TMR for Memory Cells
[Figure: three D flip-flops (signals r1, r2, r3) clocked by CK, with three majority voters and a further majority voter producing output.]

A Resistor Hardened SRAM Cell
[Figure: SRAM cell with word line WL, supply VDD and bit lines BL, storing a 0 bit and a 1 bit, hardened with resistors.]

References
- F. L. Kastensmidt, L. Carro and R. Reis, Fault Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.
- S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.

Summary of Topics Covered (1)
- Nanotechnology devices
- Moore’s law
- System level design for testability and test scheduling problem
- Verification
- Logic equivalence
- Binary decision diagrams
- Power consumption and low-power concepts
- Multi-core parallelism
- Microprocessors
- Memories

Summary of Topics Covered (2)
- Timing
- Timing verification
- Timing simulation
- Static timing analysis
- Timing optimization
- Linear programming and clock constraints
- Clock skew problem
- Zero skew design
- Retiming, constraint graph and performance optimization
- Soft errors and fault-tolerant design
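As a closing illustration of the majority voter and TMR slides, the following Python sketch checks a two-level voter expression against the LUT contents shown on the voter slide and then models fault masking in TMR. It rests on assumptions: the names `majority`, `tmr` and `xor2` are hypothetical, a fault is modeled as a single inverted output bit of one copy, and the LUT address is taken as the bits A B C with A as the most significant bit (majority is symmetric, so the bit order does not affect the result).

```python
from itertools import product

# Voter LUT contents from the slide, addressed by A B C (assumed A = MSB).
VOTER_LUT = [0, 0, 0, 1, 0, 1, 1, 1]


def majority(a, b, c):
    """Two-level majority function: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def tmr(logic, inputs, faulty_copies=()):
    """Evaluate three copies of `logic`; copies listed in faulty_copies have their output flipped."""
    outputs = [logic(*inputs) ^ (1 if i in faulty_copies else 0) for i in range(3)]
    return majority(*outputs)


def xor2(x, y):
    """Hypothetical example logic block protected by TMR."""
    return x ^ y


# The expression reproduces the LUT for all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert majority(a, b, c) == VOTER_LUT[(a << 2) | (b << 1) | c]

print(tmr(xor2, (1, 0)))                        # fault-free result
print(tmr(xor2, (1, 0), faulty_copies=(1,)))    # one faulty copy is outvoted
print(tmr(xor2, (1, 0), faulty_copies=(0, 2)))  # two faulty copies defeat the voter
```

The last call shows the limit of the scheme: the voter masks a single upset copy, but two simultaneous upsets produce a wrong output. The sketch also treats the voter itself as fault-free, which is the unprotected-voter problem noted on the TMR for Memory Cells slide.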