Terminology and Concepts
Prof. Naga Kandasamy
ECE Department, Drexel University
April 8, 2017
These notes are adapted in part from:
• B. W. Johnson, Design & Analysis of Fault Tolerant Digital Systems, Addison Wesley, 1989.
• M. Shooman, Reliability of Computer Systems and Networks, Wiley & Sons, 2002.
1 Goals of Fault Tolerance
Dependability of a computer or computer-based system may be defined as “justifiable confidence that it will perform specified actions or deliver specified results in a trustworthy and timely manner.”¹ Dependability is an umbrella term encompassing the concepts of reliability, availability, performability, safety, and testability. We will now define these terms in an intuitive fashion.
Reliability. The reliability $R(t)$ of a system is a function of time, defined as the conditional probability that the system will perform correctly throughout the interval $[t_0, t]$, given that the system was performing correctly at time $t_0$. So, reliability is the probability that the system will operate correctly throughout a complete interval of time.
We can introduce the concept of reliability in terms of testing data. Let us assume that 50 components operate for 1,000 hours, and during this testing phase, two fail. So, the probability of failure for a component in 1,000 hours of operation is $2/50 = 0.04$. The probability of success, which is the reliability of the component, is simply $R(1{,}000) = 1 - 0.04 = 0.96$. So, reliability is the probability of no failure within the given operating period. We can also determine the failure rate $\lambda$ of the same component as 2 failures/$(50 \times 1{,}000)$ operating hours, which is equal to $\lambda = 4 \times 10^{-5}$ failures per hour.
This rate is also sometimes stated in terms of a hazard function $z$ as 40 failures per million operating hours, or in terms of fits (failures in time), which are failures per billion operating hours; here, that is 40,000 fits.
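The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch and not part of the original notes: the function and variable names are hypothetical, and it assumes, as the text does, that all 50 components are credited with the full 1,000 operating hours.

```python
# Worked example using the test data quoted above:
# 50 components, 1,000 hours each, 2 failures.

def empirical_reliability(n_components: int, n_failures: int) -> float:
    """Fraction of components that survived the whole test interval."""
    return 1.0 - n_failures / n_components


def failure_rate_per_hour(n_failures: int, n_components: int, hours: float) -> float:
    """Failures per component-hour.

    Assumption (same simplification as the text): every component,
    including the failed ones, counts for the full test duration.
    """
    return n_failures / (n_components * hours)


n_components, test_hours, n_failures = 50, 1_000, 2

r = empirical_reliability(n_components, n_failures)
lam = failure_rate_per_hour(n_failures, n_components, test_hours)

# Unit conversions of the same rate.
per_million = lam * 1e6   # failures per million operating hours
fits = lam * 1e9          # failures per billion operating hours (fits)

print(f"R({test_hours}) = {r:.2f}")
print(f"lambda = {lam:.1e} failures/hour")
print(f"       = {per_million:.0f} failures per million hours")
print(f"       = {fits:.0f} fits")
```

Changing the three input values at the top is enough to repeat the calculation for other test campaigns, under the same simplifying assumption about accumulated operating hours.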
For the simplest case where the failure rate is a constant, the reliability function can be shown to be $R(t) = e^{-\lambda t}$ (we will derive this exponential failure law in a later section). Substituting the values from the preceding discussion, we obtain $R(1{,}000) = e^{-4 \times 10^{-5} \times 1000} \approx 0.96$, which agrees with
¹ B. Parhami, “A Multi-Level View of Dependable Computing,” Computers & Electrical Engineering, Vol. 20, No. 4, pp. 347–368, 1994.