A Sticker Based Model for DNA Computation
Sam Roweis¹, Erik Winfree¹, Richard Burgoyne², Nickolas V. Chelyapov³,
Myron F. Goodman⁴, Paul W. K. Rothemund³, Leonard M. Adleman³†

Laboratory for Molecular Science, University of Southern California
and
¹ Computation and Neural Systems Option, California Institute of Technology
² Department of Biomedical Engineering, University of Southern California
³ Department of Computer Science, University of Southern California
⁴ Department of Biological Sciences, University of Southern California

May 1996

Abstract

We introduce a new model of molecular computation that we call the sticker model. Like many
previous proposals it makes use of DNA strands as the physical substrate in which information is
represented and of separation by hybridization as a central mechanism. However, unlike previous
models, the stickers model has a random access memory that requires no strand extension, uses
no enzymes, and (at least in theory) its materials are reusable.
The paper describes computation under the stickers model and discusses possible means for
physically implementing each operation. We go on to propose a specific machine architecture
for implementing the stickers model as a microprocessor-controlled parallel robotic workstation.
Finally, we discuss several methods for achieving acceptable overall error rates for a computation
using basic operations that are error prone.
In the course of this development a number of previous general concerns about molecular
computation [Smith, Hartmanis, Letters to Science] are addressed. First, it is clear that
general-purpose algorithms can be implemented by DNA-based computers, potentially solving a wide
class of search problems. Second, we find that there are challenging problems for which only
modest volumes of DNA should suffice. Third, we demonstrate that the formation and breaking
of covalent bonds is not intrinsic to DNA-based computation. This means that costly and
short-lived materials such as enzymes are not necessary, nor are energetically costly processes such as
PCR. Fourth, we show that a single essential biotechnology, sequence-specific separation, suffices
for constructing a general-purpose molecular computer. Fifth, we illustrate that separation errors
can theoretically be reduced to tolerable levels by invoking a tradeoff between time, space, and
error rates at the level of algorithm design; we also outline several specific ways in which this
can be done and present numerical calculations of their performance.
Despite these encouraging theoretical advances, we emphasize that substantial engineering
challenges remain at almost all stages and that the ultimate success or failure of DNA computing
will certainly depend on whether these challenges can be met in laboratory investigations.
Reprint requests to [email protected]. The MATLAB code which was used to generate all of the figures in
Section 5 of this paper is also available by request from [email protected]. Roweis is supported in part by the
Center for Neuromorphic Systems Engineering as a part of the National Science Foundation Engineering Research
Center Program under grant EEC-9402726 and by the Natural Sciences and Engineering Research Council of Canada.
Winfree is supported in part by National Institute for Mental Health (NIMH) Training Grant # 5 T32 MH 19138-06;
also by General Motors' Technology Research Partnerships program. Adleman, Chelyapov, and Rothemund are
supported in part by grants from the National Science Foundation (CCR-9403662) and Sloan Foundation.
† To whom correspondence should be addressed.

1 Introduction
Much of the recent interest in molecular computation has been fueled by the hope that it might
some day provide the means for constructing a massively parallel computational platform capable of
attacking problems which have been resistant to solution with conventional architectures. Model
architectures have been proposed which suggest that DNA based computers may be flexible enough to
tackle a wide range of problems [Adleman1, Adleman2, Amos, Lipton, Boneh2, Beaver, Rothemund],
although fundamental issues such as the volumetric scale of materials and fidelity of various
laboratory procedures remain largely unanswered.
In this paper we introduce a new model of molecular computation that we call the sticker model.
Like many previous proposals it makes use of DNA strands as the physical substrate in which
information is represented and of separation by hybridization as a central mechanism. However,
unlike previous models, the stickers model has a random access memory that requires no strand
extension, uses no enzymes, and (at least in theory) its materials are reusable.
The paper begins by introducing a new way of representing information in DNA, followed by an
abstract description of the basic operations possible under this representation. Possible means for
physically implementing each operation are discussed. We go on to propose a specific machine
architecture for implementing the stickers model as a microprocessor-controlled parallel robotic
workstation, employing only technologies which exist today. Finally, we discuss methods for
achieving acceptable error rates from imperfect separation units.

2 The Stickers Model
2.1 Representation of Information
The stickers model employs two basic groups of single stranded DNA molecules in its representation
of a bit string. Consider a memory strand N bases in length subdivided into K non-overlapping
regions each M bases long (thus $N \geq MK$). Each region is identified with exactly one bit position
(or equivalently one boolean variable) during the course of the computation. We also design K
different sticker strands or simply stickers. Each sticker is M bases long and is complementary to
one and only one of the K memory regions. If a sticker is annealed to its matching region on a given
memory strand then the bit corresponding to that particular region is on for that strand. If no sticker
is annealed to a region then that region's bit is off. Figure 1 illustrates this representation scheme.
Figure 1: A memory strand and associated stickers (together called a memory complex) represent a
bit string. The top complex on the left has all three bits off; the bottom complex has two annealed
stickers and thus two bits on.

Each memory strand along with its annealed stickers (if any) represents one bit string. Such partial
duplexes are called memory complexes. A large set of bit strings is represented by a large number
of identical memory strands each of which has stickers annealed only at the required bit positions.
We call such a collection of memory complexes a tube. This differs from previous representations
of information using DNA in which the presence or absence of a particular subsequence in a strand
corresponded to a particular bit being on or off (e.g. see [Adleman1, Lipton]). In this new model,
each possible bit string is represented by a unique association of memory strands and stickers
whereas previously each bit string was represented by a unique molecule.
To give a feel for the numbers involved, a reasonable size problem (for example breaking DES as
discussed in [Adleman3]), might use memory strands of roughly 12000 bases (N) which represent
580 binary variables (K) using 20 base regions (M).
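The representation can be made concrete with a small software sketch. The fragment below is only an illustration under stated assumptions: the function names are hypothetical, the region sequences are random placeholders, and none of the sequence-design constraints a real experiment would need (for example avoiding cross-hybridization between regions) are attempted. It builds K distinct regions of M bases, derives each sticker as the Watson-Crick reverse complement of its region, and reads a memory complex as a bit string.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Watson-Crick reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_memory_strand(K, M, seed=0):
    """Return (regions, stickers) for K distinct regions of M bases each.

    Assumption: sequences are random placeholders, not a real design.
    """
    if K > 4 ** M:
        raise ValueError("cannot pick K distinct regions of length M")
    rng = random.Random(seed)
    regions, seen = [], set()
    while len(regions) < K:
        candidate = "".join(rng.choice("ACGT") for _ in range(M))
        if candidate not in seen:
            seen.add(candidate)
            regions.append(candidate)
    # Each sticker is complementary to exactly one region.
    stickers = [reverse_complement(region) for region in regions]
    return regions, stickers


def bit_string(annealed, K):
    """Bit string of a memory complex given the set of annealed region indices."""
    return "".join("1" if k in annealed else "0" for k in range(K))


if __name__ == "__main__":
    regions, stickers = design_memory_strand(K=580, M=20)
    strand = "".join(regions)
    print(len(strand))             # 580 * 20 = 11600 bases, roughly 12000
    print(bit_string({0, 2}, 3))   # stickers on regions 0 and 2 -> "101"
```

With K = 580 and M = 20 the regions alone occupy 11600 bases, consistent with the "roughly 12000 bases" quoted above and with the constraint that N is at least MK.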
The information density in this storage scheme is $1/M$ bits/base, directly comparable to the density
of previous schemes [Adleman1, Boneh2, Lipton]. We remark that while information storage in DNA
has a theoretical maximum value of 2 bits/base, exploiting such high values in a separation based
molecular computer would require the ability to reliably separate strands using only single base
mismatches. Instead we choose to sacrifice information density in order to make the experimental
difficulties less severe.

2.2 Operations on Sets of Strings
We now introduce several possible operations on sets of bit strings which together turn out to be
quite flexible for implementing general algorithms. The four principal operations are combination of
two sets of strings into one new set, separation of one set of strings into two new sets, and setting or
clearing the kth bit of every string in a set. Each of these logical set operations has a corresponding
interpretation in terms of the DNA representation introduced above. Figure 2 summarizes these
required DNA interactions.
The most basic operation is to combine two sets of bit strings into one. This produces a
new set containing the multiset union of all the strings in the two input sets. In DNA,
this corresponds to producing a new tube containing all the memory complexes (with their
annealed stickers undisturbed) from both input tubes.
A set of strings may be separated into two new sets, one containing all the original strings
having a particular bit on and the other all those with the bit off. This corresponds to isolating
from the set's tube exactly those complexes with a sticker annealed to the given bit's region.
The original input set (tube) is destroyed.
To set (turn on) a particular bit in every string of a set, the sticker for that bit is annealed to
the appropriate region on every complex in the set's tube (or left in place if already annealed).
Finally, to clear (turn off) a bit in every string of a set, the sticker for that bit must be removed
(if present) from every memory complex in the set's tube.
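The four operations just described can be sketched as operations on multisets of bit strings. This is a purely logical model with no chemistry; the representation is our assumption for illustration: a tube is a `collections.Counter` mapping each bit string, stored as a `frozenset` of the positions that are on, to the number of complexes carrying it.

```python
from collections import Counter


def combine(tube_a, tube_b):
    """Multiset union: every complex from both input tubes, stickers undisturbed."""
    return tube_a + tube_b


def separate(tube, k):
    """Split a tube into (on, off) according to bit k; the input is consumed."""
    on = Counter({s: n for s, n in tube.items() if k in s})
    off = Counter({s: n for s, n in tube.items() if k not in s})
    return on, off


def set_bit(tube, k):
    """Anneal sticker k on every complex (no change where already annealed)."""
    out = Counter()
    for s, n in tube.items():
        out[s | {k}] += n
    return out


def clear_bit(tube, k):
    """Remove sticker k (if present) from every complex."""
    out = Counter()
    for s, n in tube.items():
        out[s - {k}] += n
    return out


if __name__ == "__main__":
    # One complex with no bits on, two complexes with bits 1 and 2 on.
    tube = Counter({frozenset(): 1, frozenset({1, 2}): 2})
    on, off = separate(tube, 1)
    print(sum(on.values()), sum(off.values()))  # 2 complexes on, 1 off
    print(combine(on, off) == tube)             # recombining restores the tube
```

Counting copies rather than storing a plain set mirrors the remark that combination produces a multiset union: two complexes carrying the same bit string remain two complexes.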
Figure 2: DNA manipulations required for the four operations of the stickers model (combine,
separate on bit 1, set bit 3, clear bit 1).

Computations in this model consist of a sequence of combination, separation, and bit setting/clearing
operations. This sequence must begin with some initial set of bit strings and must ultimately produce
one (possibly null) set of strings deemed to be "the answers". We call the tube containing the initial
set of bit strings the mother tube for a computation. Thus, to complete our theoretical description of
how to compute with the stickers model, we must describe how to create a mother tube of memory
complexes and also how to read out at least one bit string from a (possibly empty) final tube of
answers (or recognize that the tube contains no strands). We consider creation of the mother tube
first:
It will suffice for our purposes to consider creating a mother tube which corresponds to the
(K, L) library set of strings. A (K, L) library set contains strings of length K generated by
taking the set of all possible bit strings of length L followed by $K - L$ zeros. There are thus
$2^L$ length K strings in the set¹.
Our paradigm of computation will generally be to cast hard problems as large combinatorial searches
over inputs of length L. We search for the few rare "answer" strings by processing all $2^L$ possible
inputs in parallel and eliminating those that fail the search criteria. It is important that the memory
strand we design may have more than L bit regions. The first L bits represent the encoding of the
input and are the random portion of the initial library. The remaining $K - L$ bits are used for
intermediate storage and answer encoding and are initially off on all complexes. All bits can be
written to and read from later in the computation as needed. In this way creating a mother tube
which is a (K, L) library set corresponds to generating all possible inputs (of length L) and zeroing
the workspace (length $K - L$).
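To make the definition concrete, the following sketch enumerates in software the bit strings that a (K, L) library set contains. It describes only what the mother tube should hold, not how it would be prepared physically; the function name `library_set` is our own.

```python
from itertools import product


def library_set(K, L):
    """All 2**L bit strings of length L, each followed by K - L zeros."""
    if not 0 <= L <= K:
        raise ValueError("need 0 <= L <= K")
    workspace = "0" * (K - L)
    return ["".join(bits) + workspace for bits in product("01", repeat=L)]


if __name__ == "__main__":
    lib = library_set(7, 3)
    print(len(lib))   # 2**3 = 8 strings
    print(lib[:3])    # ['0000000', '0010000', '0100000']
```

The first L positions range over every possible input, while the trailing workspace bits start at zero on every string.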
Lastly, we indicate how to obtain a solution at the end of the computation:
To read a string from the final "answer" set, one memory complex must be isolated from the
answer tube and its annealed stickers (if any) determined. Alternately, it must be reported
that the answer tube contains no strands.

2.3 Example Problem
To illustrate the power of the operations defined above we work through the solution of the
NP-Complete² Minimal Set Cover problem [Garey] within the stickers model. Informally, assume we
are given a collection of B bags each containing some objects. The objects come in A types. The
problem is to find the smallest subset of the bags which between them contain at least one object
of every type. Formally the problem is as follows: Given a collection $C = \{C_1, \ldots, C_B\}$ of subsets
of $\{1, \ldots, A\}$, what is the smallest subset $I$ of $\{1, \ldots, B\}$ such that $\bigcup_{i \in I} C_i = \{1, \ldots, A\}$? The
solution of the problem in our model is straightforward. We create memory complexes representing
all possible $2^B$ choices of bags. We mark all those which include bag i as containing every type
appearing in the subset $C_i$. Then we separate out those complexes which have been marked as
containing all A types and read out the one(s) which uses the fewest bags. Formally, the sticker
algorithm for minimal set cover is:
    Design a memory strand with K = B + A bit regions.
        (Bits 1 ... B represent which bags are chosen, bits B+1 ... B+A which object types are present.)
    Initialize a (K, B) library set in a tube called T_0.

    (Mark the final A positions of each complex to record which object types it contains.)
    for i = 1 to B
        Separate T_0 into T_on and T_off based on bit i
        for j = 1 to |C_i|
            Set bit B + C_i[j] in T_on
        Combine T_on and T_off into T_0

    (Get rid of ones which do not have all A types.)
    for i = B+1 to B+A
        Separate T_0 into T_0 and T_bad based on bit i
        Discard T_bad

    (Count how many bags were used. At the end of the outer loop, tube T_i contains all
    complexes which used exactly i bags.)
    for i = 0 to B-1
        for j = i down to 0
            Separate T_j into T'_(j+1) and T_j based on bit i+1
            Combine T_(j+1) and T'_(j+1) into T_(j+1)

    Read T_1
    else if it was empty then Read T_2
    else if it was empty then Read T_3
    ...

where above |C_i| is the number of items in subset C_i and C_i[j] is the jth item in subset C_i. Note
that the above algorithm takes O(AB) steps, and the input is O(AB) bits.

¹ For example, the (7,3) library set is the set {0000000, 0010000, 0100000, 0110000, 1000000, 1010000, 1100000, 1110000}.
² Technically the NP-Complete version of this problem is the binary decision version in which we ask if there exists
a collection of a particular size that covers the set, not for the collection of the smallest size.
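The set cover algorithm can be traced in software with a small simulation. The sketch below is a logical model only, with error-free operations assumed, and its function names and list-based tube representation are our own choices rather than anything specified in the paper. A complex is a list of K = B + A bits, a tube is a list of complexes, and bits are numbered from 1 as in the text, with bit B + t recording that object type t is present.

```python
from itertools import product


def separate(tube, k):
    """Split a tube on 1-indexed bit k and return (on, off)."""
    on = [c for c in tube if c[k - 1] == 1]
    off = [c for c in tube if c[k - 1] == 0]
    return on, off


def set_bit(tube, k):
    """Turn on 1-indexed bit k in every complex of the tube (in place)."""
    for c in tube:
        c[k - 1] = 1


def minimal_set_cover(subsets, A):
    """Return the 1-indexed bags of a smallest cover of {1..A}, or None."""
    B = len(subsets)
    # Initialize a (K, B) library: all 2**B bag choices, workspace zeroed.
    T = {0: [list(bits) + [0] * A for bits in product((0, 1), repeat=B)]}
    T.update({j: [] for j in range(1, B + 1)})

    # Mark which object types each choice of bags contains.
    for i in range(1, B + 1):
        t_on, t_off = separate(T[0], i)
        for obj in subsets[i - 1]:
            set_bit(t_on, B + obj)
        T[0] = t_on + t_off

    # Keep only complexes marked with all A types; the rest are discarded.
    for i in range(B + 1, B + A + 1):
        T[0], _t_bad = separate(T[0], i)

    # Sort survivors by number of bags: T[j] ends with exactly j bags on.
    for i in range(B):
        for j in range(i, -1, -1):
            t_on, T[j] = separate(T[j], i + 1)
            T[j + 1] = T[j + 1] + t_on

    # Read the first non-empty tube (T[0] is empty whenever A >= 1).
    for size in range(B + 1):
        if T[size]:
            chosen = T[size][0]
            return [i + 1 for i in range(B) if chosen[i] == 1]
    return None


if __name__ == "__main__":
    bags = [{1, 2}, {2, 3}, {3, 4}, {1, 4}, {1, 2, 3}]
    print(minimal_set_cover(bags, A=4))  # some pair of bags covering {1, 2, 3, 4}
```

Running the inner counting loop from j = i down to 0 ensures that complexes promoted into tube j + 1 during a pass are not examined again in the same pass, which is why each complex ends in the tube matching its number of chosen bags.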
We point out that, as we will envision a robotic system performing the experiments automatically,
we allow arbitrary sequential algorithms for controlling the molecular operations. However, these
operations must be performed "blind"; the only interface to molecular parallelism is via initialize,
combine, separate, set, clear, and read. Thus the electronic algorithms are responsible for
"experiment design", i.e. compiling higher-level problem specifications into concise sequences of molecular
operations, but they cannot get any feedback from the DNA during the course of the experiment.
As a final comment we note that the stickers model is capable of simulating (in parallel) independent
universal machines, one per memory complex, under the usual theoretical assumption of an
unbounded number of sticker regions³. It should be noted that the stickers model is universal, in
the sense discussed, even in the absence of the clear operation, although more compact algorithms
are possible using clear.

3 Physical Implementation of the Model
Each logical operation in our model has a corresponding interpretation (which we gave as we
introduced the operations) in terms of what must happen to the DNA memory strands and associated
stickers when that operation is carried out. In what follows we examine various physical procedures
which are candidates for implementing these requirements for all the operations described above.
We speak in terms of tubes instead of sets; recall that a tube consists of the collection of memory
complexes that represents a set of bit strings.
Often there are several possible implementations of a given operation; each has its own assumed
strengths and weaknesses on which we speculate. However, which implementations, if any, turn out
to be viable will ultimately have to be decided by laboratory experiments.

3.1 Combination
Combination of two tubes can be performed by rehydrating the tube contents (if not already in
solution) and then combining the fluids together (by pouring or pumping for example) to form a
new tube. It should be noted that even this seemingly straightforward operation is plagued by
constraints: if DNA is not handled gently the shear forces from pouring and mixing it will fragment
it into 15 kilobase sections [Kornberg].

³ This can be seen as the consequence of two observations. First, a memory complex in the stickers model can
simulate a feedforward circuit, in the spirit of [Boneh2]. Using the clear operation, a clocked feedback circuit can
also be simulated. Second, allowing the circuit to grow with each clock cycle, we can simulate a universal machine.
The electronic algorithm is responsible for designing the new gates to fit into the circuit; each new gate will require
a new bit and hence a new sticker region in the memory strand. For concreteness, a feedforward circuit $C_t$ can be
automatically designed which computes the instantaneous description of a TM at time step $t$ from the description at
$t - 1$. Thus, the stickers model can simulate in parallel the execution of a TM on all $2^L$ length L inputs.
Also of concern for this operation and indeed for all others is the amount of DNA which remains
stuck to the walls of tubes, pumps, pipette tips, etc. and thus is "lost" from the computation. Even
if this "lost" DNA is a minute fraction of the total (which would be unimportant to molecular
biologists) it is problematic for computation because we are working with relatively few copies of
each relevant molecule.

3.2 Separation
The ultimate goal of the separation operation is to physically isolate those complexes in a tube that
have a sticker annealed to some position from those that do not, without disturbing any annealed
stickers. The mechanism of DNA hybridization will be central to any proposal. In general, separation
by hybridization is performed by bringing the solution containing the original set of memory
complexes into contact with many identical single stranded probes. In our case, each bit position
has a particular type of probe (with a unique nucleotide sequence) that is used when separation
on that bit is performed. The probe sequence is designed such that probes hybridize only to the
region of the memory strand corresponding to their bit and nowhere else. During separation, the
original complexes with the key bit off will be captured on the probes while all those with the bit
on will remain unbound in ...