Terminology and Concepts

Prof. Naga Kandasamy
ECE Department, Drexel University
April 8, 2017

These notes are adapted in part from:
B. W. Johnson, Design & Analysis of Fault Tolerant Digital Systems, Addison Wesley, 1989.
M. Shooman, Reliability of Computer Systems and Networks, Wiley & Sons, 2002.

1 Goals of Fault Tolerance

Dependability of a computer or computer-based system may be defined as “justifiable confidence that it will perform specified actions or deliver specified results in a trustworthy and timely manner.” [1] Dependability is an umbrella term encompassing the concepts of reliability, availability, performability, safety, and testability. We will now define these terms in an intuitive fashion.

Reliability. The reliability $R(t)$ of a system is a function of time, defined as the conditional probability that the system will perform correctly throughout the interval $[t_0, t]$, given that the system was performing correctly at time $t_0$. So, reliability is the probability that the system will operate correctly throughout a complete interval of time.

We can introduce the concept of reliability in terms of testing data. Let us assume that 50 components operate for 1,000 hours, and during this testing phase, two fail. So, the probability of failure for a component in 1,000 hours of operation is $2/50 = 0.04$. The probability of success, which is the reliability of the component, is simply $R(1{,}000) = 1 - 0.04 = 0.96$. So, reliability is the probability of no failure within the given operating period.

We can also determine the failure rate $\lambda$ of the same component as 2 failures per $(50 \times 1{,}000)$ operating hours, which gives $\lambda = 4 \times 10^{-5}$ failures per hour. This is also sometimes stated in terms of a hazard function $z$ as 40 failures per million operating hours, or in terms of FITs (failures in time), which are failures per billion operating hours.
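The arithmetic in the test-data example can be checked with a short Python sketch. The function name `estimate_from_test` and its parameters are illustrative assumptions, not part of the notes; the failure rate is computed over the full 50 × 1,000 component-hours, as in the text.

```python
def estimate_from_test(n_components, n_failures, hours):
    """Estimate reliability and failure rate from simple test data.

    Hypothetical helper: follows the text in dividing the number of
    failures by the total component-hours (n_components * hours).
    """
    p_fail = n_failures / n_components           # probability of failure
    reliability = 1.0 - p_fail                   # probability of no failure
    failure_rate = n_failures / (n_components * hours)  # failures per hour
    return reliability, failure_rate


# Values from the example: 50 components, 2 failures, 1,000 hours.
reliability, lam = estimate_from_test(50, 2, 1_000)

per_million_hours = lam * 1e6   # failures per million operating hours
fits = lam * 1e9                # failures per billion operating hours (FITs)

print(f"R(1000) = {reliability:.2f}")
print(f"lambda  = {lam:.1e} failures/hour")
print(f"        = {per_million_hours:.0f} failures per million hours")
print(f"        = {fits:.0f} FITs")
```

Expressing the same rate per hour, per million hours, and in FITs is only a change of units, so the three figures describe the same component.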
For the simplest case where the failure rate is constant, the reliability function can be shown to be $R(t) = e^{-\lambda t}$ (we will derive this exponential failure law in a later section). Substituting the values from the preceding discussion, we obtain $R(1{,}000) = e^{-4 \times 10^{-5} \times 1{,}000} = e^{-0.04} \approx 0.96$, which agrees with the previous computation.

Now, consider the case of a system with $n$ components in which all the components must work correctly. If the components fail independently and each has the same constant failure rate $\lambda$, the system reliability is the product of the $n$ component reliabilities:

$R_{sys}(t) = e^{-n \lambda t}$. (1)

For example, the first supercomputer, the CDC 6600, had 400,000 transistors, and the failure rate for an individual transistor was $4 \times 10^{-9}$ per hour. Though this failure rate is very low on a per-component basis, the overall computer reliability for 1,000 hours would be $R(1{,}000) = e^{-400{,}000 \times 4 \times 10^{-9} \times 1{,}000} = e^{-1.6} \approx 0.20$, which is a very low number. Individual components can be made very reliable via strict quality control and testing, but in a large system it is unreasonable to expect that no component will ever fail. Returning to (1), $R_{sys}(t)$ can be improved by reducing $n$, $\lambda$, or $t$. An alternative

[1] B. Parhami, “A Multi-Level View of Dependable Computing,” Computers & Electrical Engineering, Vol. 20, No. 4, pp. 347–368, 1994.
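The exponential failure law and equation (1) can be evaluated numerically with a minimal Python sketch. The function names `reliability` and `series_reliability` are illustrative assumptions; the sketch assumes a constant failure rate and independent component failures, and it reuses the numbers from the examples in the text.

```python
import math


def reliability(failure_rate, hours):
    """R(t) = exp(-lambda * t) for a constant failure rate (per hour)."""
    return math.exp(-failure_rate * hours)


def series_reliability(n, failure_rate, hours):
    """Equation (1): all n identical components must work for `hours`."""
    return math.exp(-n * failure_rate * hours)


# Single component with lambda = 4e-5 per hour over 1,000 hours.
r_component = reliability(4e-5, 1_000)

# CDC 6600 example: 400,000 transistors at 4e-9 failures per hour each.
r_cdc6600 = series_reliability(400_000, 4e-9, 1_000)

# Equation (1) is the product of the n individual reliabilities.
r_product = reliability(4e-9, 1_000) ** 400_000

print(f"component R(1000) = {r_component:.2f}")
print(f"CDC 6600  R(1000) = {r_cdc6600:.2f}")
```

Because the exponent in (1) is the product of $n$, $\lambda$, and $t$, halving any one of the three has the same effect on the system reliability.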