A protocol family approach to survivable storage infrastructures
Jay J. Wylie, Garth R. Goodson, Gregory R. Ganger, Michael K. Reiter
Carnegie Mellon University
A protocol family supports a variety of fault models with
a single client-server protocol and a single server imple-
mentation. Protocol families shift the decision of which
types of faults to tolerate from system design time to data
creation time. With a protocol family based on a com-
mon survivable storage infrastructure, each data-item can
be protected from different types and numbers of faults.
Thus, a single implementation can be deployed in dif-
ferent environments. Moreover, a single deployment can
satisfy the specific survivability requirements of different
data at costs commensurate with those requirements.
Survivable, or fault-tolerant, storage systems protect data
by spreading it redundantly across a set of storage-nodes.
In the design of such systems, determining which kinds
of faults to tolerate and which timing model to assume
are important and difficult decisions. Fault models range
from crash faults to Byzantine faults and timing models
range from synchronous to asynchronous.
These decisions affect the access protocol employed, which can have
a major impact on performance. For example, a system’s
access protocol can be designed to provide consistency
under the weakest assumptions (i.e., Byzantine failures
in an asynchronous system), but this induces potentially-
unnecessary performance costs. Alternatively, designers
can “assume away” certain faults to gain performance.
Traditionally, the fault model decision is hard-coded
during the design of an access protocol. This traditional
approach has two significant shortcomings. First, it limits
the utility of the resulting system—in some environments,
the system incurs unnecessary costs, and, in others, it can-
not be deployed. The natural consequence is distinct sys-
tem implementations for each distinct fault model. Sec-
ond, all users of any given system implementation must
use the same fault model, either paying unnecessary costs
or accepting more risk than desired. For example, tempo-
rary and easily-recreated data incur the same overheads
as the most critical data.
We advocate an alternative approach, in which the de-
cision of which faults to tolerate is shifted from design
time to data-item creation time.
This shift is achieved through the use of a family of
access protocols that share a common server implementation
and client-server interface (i.e., storage infrastructure).
A protocol family supports different fault models in the
same way that most
access protocols support varied numbers of failures: by
simply changing the number of storage-nodes utilized,
and some read and write thresholds. A protocol family
enables a given storage infrastructure to be used for a mix
of fault models and allows the number of faults tolerated
to be chosen independently for each data-item.
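To make the parametrization above concrete, the sketch below sizes a deployment using classical quorum bounds: crash tolerance with simple majority-style quorums, and Byzantine tolerance with dissemination quorums over self-verifying data. These particular formulas (N = 2t+1 and N = 3b+1) are textbook illustrations chosen for this sketch, not the actual constraints of the protocol family described in the paper; the point is only that the fault model changes nothing but the node count and the read/write thresholds.

```python
def thresholds(model: str, faults: int) -> dict:
    """Illustrative quorum sizing for a given fault model.

    Uses classical bounds (an assumption for illustration):
      - crash:     N = 2t + 1, quorums of t + 1 (any two quorums overlap)
      - byzantine: N = 3b + 1, quorums of 2b + 1 (any two quorums overlap
                   in >= b + 1 nodes, so at least one node is correct),
                   assuming self-verifying (e.g., signed) data
    """
    t = faults
    if model == "crash":
        n = 2 * t + 1
        q = t + 1          # read + write quorum sizes exceed n, so they intersect
    elif model == "byzantine":
        n = 3 * t + 1
        q = 2 * t + 1      # intersection of size >= t + 1 contains a correct node
    else:
        raise ValueError(f"unknown fault model: {model}")
    return {"nodes": n, "read_threshold": q, "write_threshold": q}

# Two data-items in the same deployment, protected differently:
# temporary data tolerates 1 crash; critical data tolerates 2 Byzantine faults.
scratch = thresholds("crash", 1)       # {'nodes': 3, ...}
critical = thresholds("byzantine", 2)  # {'nodes': 7, ...}
```

Under this sketch, only the per-data-item parameters change; the client-server protocol and server implementation are shared, which is the property the protocol-family approach is after.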
We have developed a protocol family for survivable