9 Hardware Reliability
Bernhard Fechner
Felix Salfner
Max Walter
Philipp Limbourg
Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
University of Hagen, Germany
Humboldt University Berlin, Germany
Technische Universität München, Germany
Saarland University, Germany
University of Duisburg-Essen, Germany
Reliability is an important part of dependability. This chapter aims to support readers
in using the classical definitions, models, and measures of (hardware) reliability.
In the IT field, the term “fault tolerance” is often used loosely to mean “reliability
improvement”. The question to be clarified is the relationship between reliability and
fault tolerance. In a general sense, reliability is understood as the ability of a
component or system to function correctly over a specified period of time, mostly under
predefined conditions. Fault tolerance is defined as the ability of the system to continue
operation in the event of a failure. Fault tolerance means that a computer system or
component is designed such that, if a component fails, a backup component or backup
procedure can immediately take its place with no loss of functionality. Reliability can
therefore be improved through fault tolerance. The metrics of “classical” reliability
theory are numerous and well known. Metrics of fault tolerance are less common; examples
are the number of tolerated faults, the number of checkpoints, and the reconfiguration time.
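The verbal definition of reliability above corresponds to the usual probabilistic formulation, stated here only as a reminder. The symbols $T$ (time to failure), $F$ (its distribution function) and $R$ (the reliability function) are notation assumed for this sketch and are not introduced in this section:

```latex
% Reliability as the probability of surviving the interval [0, t],
% assuming the component or system is working at time 0.
R(t) = P(T > t) = 1 - F(t), \qquad t \ge 0
```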
The most important method supporting fault tolerance and reliability is redundancy.
Redundancy is the duplication of components or the repetition of operations to provide
alternative functional channels in case of failure. Redundancy can be implemented in
different ways: structural (hot and standby redundancy), temporal, functional, etc. Applying
redundancy always increases cost and/or complexity, and sometimes introduces
synchronisation problems.
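How structural (hot) redundancy can improve reliability may be illustrated with a minimal sketch. It rests on simplifying assumptions that are not made in the text: components fail independently, each has a constant failure rate (exponential lifetime), and switching to a redundant component is perfect and instantaneous. The function names are hypothetical and chosen only for this illustration.

```python
import math


def component_reliability(failure_rate: float, t: float) -> float:
    """Reliability of one component at time t, assuming a constant
    failure rate (exponential lifetime). This is an assumption of the sketch."""
    return math.exp(-failure_rate * t)


def parallel_reliability(r: float, n: int) -> float:
    """Reliability of n identical components in hot redundancy.

    The system works as long as at least one component works, so it
    fails only if all n components fail (independence assumed).
    """
    if n < 1:
        raise ValueError("at least one component is required")
    return 1.0 - (1.0 - r) ** n


if __name__ == "__main__":
    failure_rate = 1e-4   # failures per hour (illustrative value only)
    mission_time = 1000   # hours (illustrative value only)
    r = component_reliability(failure_rate, mission_time)
    for n in (1, 2, 3):
        print(f"n={n}: R_sys = {parallel_reliability(r, n):.6f}")
```

Under these assumptions each added component raises the system reliability, but by a smaller amount each time, which is one way to see the cost trade-off mentioned above.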
Predicting system reliability by modelling during the design phase and measuring the
parameters of a real system are two completely different approaches. This chapter is
subdivided into five sections according to the primary goal of the reader. The sections
are presented as a set of references structured according to the various reliability
metrics (RM).
An index is provided at the end of the book so that specific issues can be referenced.
The chapter is organised as follows:
Sect. 9.2 deals with the motivation for applying reliability metrics. The
reader should be able to define the reliability problem he or she is interested in.
I. Eusgeld, F.C. Freiling, and R. Reussner (Eds.): Dependability Metrics, LNCS 4909, pp. 59–103, 2008.