

A Sticker Based Model for DNA Computation

Sam Roweis¹, Erik Winfree¹, Richard Burgoyne², Nickolas V. Chelyapov³, Myron F. Goodman⁴, Paul W. K. Rothemund³, Leonard M. Adleman³†

Laboratory for Molecular Science, University of Southern California
and
¹ Computation and Neural Systems Option, California Institute of Technology
² Department of Biomedical Engineering, University of Southern California
³ Department of Computer Science, University of Southern California
⁴ Department of Biological Sciences, University of Southern California

May 1996

Abstract

We introduce a new model of molecular computation that we call the sticker model. Like many previous proposals it makes use of DNA strands as the physical substrate in which information is represented and of separation by hybridization as a central mechanism. However, unlike previous models, the stickers model has a random access memory that requires no strand extension, uses no enzymes, and (at least in theory) its materials are reusable.

The paper describes computation under the stickers model and discusses possible means for physically implementing each operation. We go on to propose a specific machine architecture for implementing the stickers model as a microprocessor-controlled parallel robotic workstation. Finally, we discuss several methods for achieving acceptable overall error rates for a computation using basic operations that are error prone.

In the course of this development a number of previous general concerns about molecular computation [Smith, Hartmanis, Letters to Science] are addressed. First, it is clear that general-purpose algorithms can be implemented by DNA-based computers, potentially solving a wide class of search problems. Second, we find that there are challenging problems for which only modest volumes of DNA should suffice. Third, we demonstrate that the formation and breaking of covalent bonds is not intrinsic to DNA-based computation.
This means that costly and short-lived materials such as enzymes are not necessary, nor are energetically costly processes such as PCR. Fourth, we show that a single essential biotechnology, sequence-specific separation, suffices for constructing a general-purpose molecular computer. Fifth, we illustrate that separation errors can theoretically be reduced to tolerable levels by invoking a trade-off between time, space, and error rates at the level of algorithm design; we also outline several specific ways in which this can be done and present numerical calculations of their performance. Despite these encouraging theoretical advances, we emphasize that substantial engineering challenges remain at almost all stages and that the ultimate success or failure of DNA computing will certainly depend on whether these challenges can be met in laboratory investigations.

Reprint requests to [email protected]. The MATLAB code which was used to generate all of the figures in Section 5 of this paper is also available by request from [email protected].

Roweis is supported in part by the Center for Neuromorphic Systems Engineering as a part of the National Science Foundation Engineering Research Center Program under grant EEC-9402726 and by the Natural Sciences and Engineering Research Council of Canada. Winfree is supported in part by National Institute for Mental Health (NIMH) Training Grant # 5 T32 MH 1913806; also by General Motors' Technology Research Partnerships program. Adleman, Chelyapov, and Rothemund are supported in part by grants from the National Science Foundation (CCR-9403662) and Sloan Foundation.

† To whom correspondence should be addressed.

1 Introduction

Much of the recent interest in molecular computation has been fueled by the hope that it might some day provide the means for constructing a massively parallel computational platform capable of attacking problems which have been resistant to solution with conventional architectures.
Model architectures have been proposed which suggest that DNA based computers may be flexible enough to tackle a wide range of problems [Adleman1, Adleman2, Amos, Lipton, Boneh2, Beaver, Rothemund], although fundamental issues such as the volumetric scale of materials and fidelity of various laboratory procedures remain largely unanswered.

In this paper we introduce a new model of molecular computation that we call the sticker model. Like many previous proposals it makes use of DNA strands as the physical substrate in which information is represented and of separation by hybridization as a central mechanism. However, unlike previous models, the stickers model has a random access memory that requires no strand extension, uses no enzymes, and (at least in theory) its materials are reusable.

The paper begins by introducing a new way of representing information in DNA, followed by an abstract description of the basic operations possible under this representation. Possible means for physically implementing each operation are discussed. We go on to propose a specific machine architecture for implementing the stickers model as a microprocessor-controlled parallel robotic workstation, employing only technologies which exist today. Finally, we discuss methods for achieving acceptable error rates from imperfect separation units.

2 The Stickers Model

2.1 Representation of Information

The stickers model employs two basic groups of single stranded DNA molecules in its representation of a bit string. Consider a memory strand $N$ bases in length subdivided into $K$ non-overlapping regions each $M$ bases long (thus $N \ge MK$). Each region is identified with exactly one bit position (or equivalently one boolean variable) during the course of the computation. We also design $K$ different sticker strands or simply stickers. Each sticker is $M$ bases long and is complementary to one and only one of the $K$ memory regions.
If a sticker is annealed to its matching region on a given memory strand then the bit corresponding to that particular region is on for that strand. If no sticker is annealed to a region then that region's bit is off. Figure 1 illustrates this representation scheme.

Figure 1: A memory strand and associated stickers (together called a memory complex) represent a bit string. The top complex on the left has all three bits off; the bottom complex has two annealed stickers and thus two bits on. (Figure labels: bit positions ..., i, i+1, i+2, ... up to bit K; memory strands; stickers; M bases per region.)

Each memory strand along with its annealed stickers (if any) represents one bit string. Such partial duplexes are called memory complexes. A large set of bit strings is represented by a large number of identical memory strands each of which has stickers annealed only at the required bit positions. We call such a collection of memory complexes a tube. This differs from previous representations of information using DNA in which the presence or absence of a particular subsequence in a strand corresponded to a particular bit being on or off (e.g. see [Adleman1, Lipton]). In this new model, each possible bit string is represented by a unique association of memory strands and stickers whereas previously each bit string was represented by a unique molecule.

To give a feel for the numbers involved, a reasonable size problem (for example breaking DES as discussed in [Adleman3]) might use memory strands of roughly 12000 bases ($N$) which represent 580 binary variables ($K$) using 20 base regions ($M$). The information density in this storage scheme is $1/M$ bits/base, directly comparable to the density of previous schemes [Adleman1, Boneh2, Lipton].
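The representation of Section 2.1 can be sketched in a few lines of Python. This is only an illustration, not part of the paper: the 15-base strand is the one drawn in Figure 1 with $M = 5$ and $K = 3$, the function names are hypothetical, and the choice of which two bits are on in the second complex is an assumption made for the example.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the strand that anneals to seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def split_regions(memory_strand, M):
    """Cut a memory strand into consecutive, non-overlapping M-base regions."""
    return [memory_strand[i:i + M] for i in range(0, len(memory_strand), M)]

def bit_string(annealed, K):
    """Bit k (1-indexed) is on exactly when sticker k is annealed."""
    return "".join("1" if k in annealed else "0" for k in range(1, K + 1))

memory_strand = "ATCGGTCATAGCACT"   # strand shown in Figure 1, 5' to 3'
M = 5
regions = split_regions(memory_strand, M)
K = len(regions)

# One sticker per region, complementary to that region only.
stickers = [reverse_complement(r) for r in regions]

# A memory complex is modelled as the set of annealed sticker indices.
empty_complex = set()        # all bits off
two_bit_complex = {1, 3}     # hypothetical: bits 1 and 3 on

print(bit_string(empty_complex, K))    # 000
print(bit_string(two_bit_complex, K))  # 101

# Scale quoted in the text: 580 regions of 20 bases fit in ~12000 bases.
print(580 * 20, 1 / 20)                # 11600 0.05
```

The last line simply checks the quoted figures: 580 regions of 20 bases occupy 11600 bases, which fits in a strand of roughly 12000 bases, at a density of 0.05 bits/base.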
We remark that while information storage in DNA has a theoretical maximum value of 2 bits/base, exploiting such high values in a separation based molecular computer would require the ability to reliably separate strands using only single base mismatches. Instead we choose to sacrifice information density in order to make the experimental difficulties less severe.

2.2 Operations on Sets of Strings

We now introduce several possible operations on sets of bit strings which together turn out to be quite flexible for implementing general algorithms. The four principal operations are combination of two sets of strings into one new set, separation of one set of strings into two new sets, and setting or clearing the $k$th bit of every string in a set. Each of these logical set operations has a corresponding interpretation in terms of the DNA representation introduced above. Figure 2 summarizes these required DNA interactions.

The most basic operation is to combine two sets of bit strings into one. This produces a new set containing the multi-set union of all the strings in the two input sets. In DNA, this corresponds to producing a new tube containing all the memory complexes (with their annealed stickers undisturbed) from both input tubes.

A set of strings may be separated into two new sets, one containing all the original strings having a particular bit on and the other all those with the bit off. This corresponds to isolating from the set's tube exactly those complexes with a sticker annealed to the given bit's region. The original input set (tube) is destroyed.

To set (turn on) a particular bit in every string of a set, the sticker for that bit is annealed to the appropriate region on every complex in the set's tube (or left in place if already annealed).

Finally, to clear (turn off) a bit in every string of a set, the sticker for that bit must be removed (if present) from every memory complex in the set's tube.
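The four logical operations can be modelled on ordinary data structures. The following is a minimal sketch under stated assumptions, not the paper's implementation: a tube is a Python list (a multiset) of bit tuples, bit positions are 1-indexed as in the text, and the function names `combine`, `separate`, `set_bit` and `clear_bit` are hypothetical. It ignores all physical error sources.

```python
def combine(tube_a, tube_b):
    """Multiset union: every complex from both tubes ends up in one tube."""
    return tube_a + tube_b

def separate(tube, k):
    """Split a tube into (on, off) according to bit k; the input is consumed."""
    on = [s for s in tube if s[k - 1] == 1]
    off = [s for s in tube if s[k - 1] == 0]
    return on, off

def set_bit(tube, k):
    """Turn bit k on in every string (no change if it is already on)."""
    return [s[:k - 1] + (1,) + s[k:] for s in tube]

def clear_bit(tube, k):
    """Turn bit k off in every string (no change if it is already off)."""
    return [s[:k - 1] + (0,) + s[k:] for s in tube]

# Usage: a small tube of 3-bit strings.
tube = [(0, 0, 0), (1, 0, 1), (1, 1, 0)]
on, off = separate(tube, 1)        # on: strings with bit 1 set
off = set_bit(off, 3)              # (0, 0, 0) becomes (0, 0, 1)
on = clear_bit(on, 1)              # (1, 0, 1) -> (0, 0, 1), (1, 1, 0) -> (0, 1, 0)
merged = combine(on, off)
print(merged)                      # [(0, 0, 1), (0, 1, 0), (0, 0, 1)]
```

Keeping duplicates in the list mirrors the multi-set union described above: two complexes carrying the same bit string remain two separate entries.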
Computations in this model consist of a sequence of combination, separation, and bit setting/clearing operations. This sequence must begin with some initial set of bit strings and must ultimately produce one (possibly null) set of strings deemed to be "the answers". We call the tube containing the initial set of bit strings the mother tube for a computation.

Figure 2: DNA manipulations required for the four operations of the stickers model. (Panels: Combine; Separate on Bit 1; Set Bit 3; Clear Bit 1.)

Thus, to complete our theoretical description of
how to compute with the stickers model, we must describe how to create a mother tube of memory complexes and also how to read out at least one bit string from a (possibly empty) final tube of answers (or recognize that the tube contains no strands).

We consider creation of the mother tube first: It will suffice for our purposes to consider creating a mother tube which corresponds to the $(K, L)$ library set of strings. A $(K, L)$ library set contains strings of length $K$ generated by taking the set of all possible bit strings of length $L$ followed by $K - L$ zeros. There are thus $2^L$ length $K$ strings in the set (footnote 1). Our paradigm of computation will generally be to cast hard problems as large combinatorial searches over inputs of length $L$. We search for the few rare "answer" strings by processing all $2^L$ possible inputs in parallel and eliminating those that fail the search criteria. It is important that the memory strand we design may have more than $L$ bit regions. The first $L$ bits represent the encoding of the input and are the random portion of the initial library. The remaining $K - L$ bits are used for intermediate storage and answer encoding and are initially off on all complexes. All bits can be written to and read from later in the computation as needed. In this way creating a mother tube which is a $(K, L)$ library set corresponds to generating all possible inputs (of length $L$) and zeroing the workspace (length $K - L$).

Lastly, we indicate how to obtain a solution at the end of the computation: To read a string from the final "answer" set, one memory complex must be isolated from the answer tube and its annealed stickers (if any) determined. Alternately, it must be reported that the answer tube contains no strands.

2.3 Example Problem

To illustrate the power of the operations defined above we work through the solution of the NP-complete (footnote 2) Minimal Set Cover problem [Garey] within the stickers model. Informally, assume we are given a collection of $B$ bags each containing some objects.
The objects come in $A$ types. The problem is to find the smallest subset of the bags which between them contain at least one object of every type. Formally the problem is as follows: Given a collection $C = \{C_1, \ldots, C_B\}$ of subsets of $\{1, \ldots, A\}$, what is the smallest subset $I$ of $\{1, \ldots, B\}$ such that $\bigcup_{i \in I} C_i = \{1, \ldots, A\}$?

Footnote 1: For example, the (7,3) library set is the set {0000000, 0010000, 0100000, 0110000, 1000000, 1010000, 1100000, 1110000}.

Footnote 2: Technically the NP-complete version of this problem is the binary decision version in which we ask if there exists a collection of a particular size that covers the set, not for the collection of the smallest size.

The solution of the problem in our model is straightforward. We create memory complexes representing all possible $2^B$ choices of bags. We mark all those which include bag $i$ as containing every type appearing in the subset $C_i$. Then we separate out those complexes which have been marked as containing all $A$ types and read out the one(s) which uses the fewest bags. Formally, the sticker algorithm for minimal set cover is:

    Design a memory strand with K = B + A bit regions.
        Bits 1 ... B represent which bags are chosen, bits B+1 ... B+A which object types are present.
    Initialize a (K, B) library set in a tube called T0.

    Mark the final A positions of each complex to record which object types it contains:
    for i = 1 to B
        Separate T0 into Ton and Toff based on bit i
        for j = 1 to |Ci|
            Set bit B + Ci[j] in Ton
        Combine Ton and Toff into T0

    Get rid of ones which do not have all A types:
    for i = B+1 to B+A
        Separate T0 into T0 and Tbad based on bit i
        Discard Tbad

    Count how many bags were used:
    for i = 0 to B-1
        for j = i down to 0
            Separate Tj into T'(j+1) and Tj based on bit i+1
            Combine T(j+1) and T'(j+1) into T(j+1)
    At the end of the outer loop, tube Ti contains all complexes which used exactly i bags.

    Read T1
    else if it was empty then Read T2
    else if it was empty then Read T3
    ...
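The sticker algorithm for minimal set cover can be checked by simulating it on ordinary lists. The sketch below is an illustration under assumptions rather than the authors' code: tubes are Python lists of bit tuples, bits are 1-indexed, separation is taken to be error-free, and the names `library`, `separate`, `set_bit` and `min_set_cover` are hypothetical. The example instance is also made up for the demonstration.

```python
from itertools import product

def library(K, L):
    """(K, L) library set: every length-L bit string followed by K - L zeros."""
    return [bits + (0,) * (K - L) for bits in product((0, 1), repeat=L)]

def separate(tube, k):
    """Return (on, off): strings with bit k on, and strings with bit k off."""
    return ([s for s in tube if s[k - 1] == 1],
            [s for s in tube if s[k - 1] == 0])

def set_bit(tube, k):
    """Turn bit k on in every string of the tube."""
    return [s[:k - 1] + (1,) + s[k:] for s in tube]

def min_set_cover(C, A):
    """C is a list of B sets over {1..A}; returns the bags of a smallest cover."""
    B = len(C)
    T0 = library(B + A, B)

    # Mark: for each chosen bag i, record its object types in bits B+1..B+A.
    for i in range(1, B + 1):
        Ton, Toff = separate(T0, i)
        for item in C[i - 1]:
            Ton = set_bit(Ton, B + item)
        T0 = Ton + Toff                       # combine

    # Filter: keep only complexes marked with all A object types.
    for i in range(B + 1, B + A + 1):
        T0, Tbad = separate(T0, i)            # Tbad is discarded

    # Count: afterwards T[n] holds the complexes that chose exactly n bags.
    T = [T0] + [[] for _ in range(B)]
    for i in range(B):
        for j in range(i, -1, -1):
            moved, T[j] = separate(T[j], i + 1)
            T[j + 1] = T[j + 1] + moved       # combine

    # Read: report one string from the first non-empty tube T1, T2, ...
    for n in range(1, B + 1):
        if T[n]:
            answer = T[n][0]
            return [k for k in range(1, B + 1) if answer[k - 1] == 1]
    return None                               # no cover exists

bags = [{1, 2}, {2, 3}, {3, 4}, {1, 2, 3}, {4}]   # hypothetical instance, A = 4
print(min_set_cover(bags, 4))
```

For this instance no single bag covers all four types, so the returned cover has two bags. Which of the several two-bag covers is reported depends on the order of strings in the tube, which corresponds to reading out one arbitrary complex from the answer tube.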
In the algorithm above, $|C_i|$ is the number of items in subset $C_i$ and $C_i[j]$ is the $j$th item in subset $C_i$. Note that the algorithm takes $O(AB)$ steps, and the input is $O(AB)$ bits.

We point out that, as we will envision a robotic system performing the experiments automatically, we allow arbitrary sequential algorithms for controlling the molecular operations. However, these operations must be performed "blind"; the only interface to molecular parallelism is via initialize, combine, separate, set, clear, and read. Thus the electronic algorithms are responsible for "experiment design" (i.e. compiling higher-level problem specifications into concise sequences of molecular operations) but they cannot get any feedback from the DNA during the course of the experiment.

As a final comment we note that the stickers model is capable of simulating (in parallel) independent universal machines, one per memory complex, under the usual theoretical assumption of an unbounded number of sticker regions (footnote 3). It should be noted that the stickers model is universal, in the sense discussed, even in the absence of the clear operation, although more compact algorithms are possible using clear.

3 Physical Implementation of the Model

Each logical operation in our model has a corresponding interpretation (which we gave as we introduced the operations) in terms of what must happen to the DNA memory strands and associated stickers when that operation is carried out. In what follows we examine various physical procedures which are candidates for implementing these requirements for all the operations described above. We speak in terms of tubes instead of sets; recall that a tube consists of the collection of memory complexes that represents a set of bit strings. Often there are several possible implementations of a given operation; each has its own assumed strengths and weaknesses on which we speculate.
However, which implementations, if any, turn out to be viable will ultimately have to be decided by laboratory experiments.

Footnote 3: This can be seen as the consequence of two observations. First, a memory complex in the stickers model can simulate a feedforward circuit, in the spirit of [Boneh2]. Using the clear operation, a clocked feedback circuit can also be simulated. Second, allowing the circuit to grow with each clock cycle, we can simulate a universal machine. The electronic algorithm is responsible for designing the new gates to fit into the circuit; each new gate will require a new bit and hence a new sticker region in the memory strand. For concreteness, a feedforward circuit $C_t$ can be automatically designed which computes the instantaneous description of a TM at time step $t$ from the description at $t - 1$. Thus, the stickers model can simulate in parallel the execution of a TM on all $2^L$ length $L$ inputs.

3.1 Combination

Combination of two tubes can be performed by rehydrating the tube contents (if not already in solution) and then combining the fluids together (by pouring or pumping for example) to form a new tube. It should be noted that even this seemingly straightforward operation is plagued by constraints: if DNA is not handled gently the shear forces from pouring and mixing it will fragment it into 15 kilobase sections [Kornberg]. Also of concern for this operation and indeed for all others is the amount of DNA which remains stuck to the walls of tubes, pumps, pipette tips, etc. and thus is "lost" from the computation. Even if this "lost" DNA is a minute fraction of the total (which would be unimportant to molecular biologists) it is problematic for computation because we are working with relatively few copies of each relevant molecule.

3.2 Separation

The ultimate goal of the separation operation is to physically isolate those complexes in a tube that have a sticker annealed to some position from those that do not without disturbing any annealed stickers.
The mechanism of DNA hybridization will be central to any proposal. In general, separation by hybridization is performed by bringing the solution containing the original set of memory complexes into contact with many identical single stranded probes. In our case, each bit position has a particular type of probe (with a unique nucleotide sequence) that is used when separation on that bit is performed. The probe sequence is designed such that probes hybridize only to the region of the memory strand corresponding to their bit and nowhere else. During separation, the original complexes with the key bit off will be captured on the probes while all those with the bit on will remain unbound in ...