Research
A method for repeat analysis in DNA sequences
Natalia Volfovsky, Brian J Haas and Steven L Salzberg
Address: The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
Correspondence: Natalia Volfovsky. E-mail: natalia@tigr.org
Published: 1 August 2001
Genome Biology 2001, 2(8):research0027.1-0027.11
The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2001/2/8/research/0027
2001 Volfovsky et al., licensee BioMed Central Ltd (Print ISSN 1465-6906; Online ISSN 1465-6914)
Received: 24 January 2001
Revised: 29 March 2001
Accepted: 8 June 2001

Abstract
Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
Results: The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi-fasta) that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences.
Conclusions: We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software, RepeatFinder, should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.
Background
Repetitive sequences present many difficulties for genome sequencing and analysis. The presence of large numbers of repeats often confounds sequence assembly, especially if the repeats are long and highly conserved. The presence of low copy-number repeats can also confound assembly, especially for whole-genome shotgun sequencing projects [1]. Once a genome has been assembled, repeats take on a new and more important role involving their biological function. Certain classes of repeats, such as transposons, perform a function by allowing mobile elements to move around a genome. Other classes belong to less well-defined categories with respect to their role, though they may be even more ubiquitous. Repetitive sequences appear to dominate the centromeres of many eukaryotes [2], and telomeric and subtelomeric repeats extend for thousands or tens of thousands of nucleotides at the ends of chromosomes. These repeats also appear elsewhere in the genome, for reasons as yet unknown. For these and other reasons, it is critical to both the assembly and the analysis of genomic sequences to identify and characterize repetitive sequence elements.
There are numerous computational methods for detecting repeats, in one form or another, in genomic DNA sequences. These include algorithms that locate repeated substrings, including tandem repeats [3-6], as well as programs for identifying known repeats, such as the widely used RepeatMasker [7]. RepeatMasker uses a database of known repeat sequences and implements a string-matching algorithm to find copies of those repeats in a new sequence. A more rapid implementation of the same approach is MaskerAid [8], a wrapper for WU-BLAST [9,10] that uses the BLAST engine instead of the CrossMatch algorithm.
Most of these tools have some restriction on the maximum length of the input sequence, which limits their use to sequences considerably smaller than a eukaryotic chromosome. Recently, however, new systems based on suffix trees, such as RepeatMatch (based on MUMmer [11]) and REPuter [12,13], have overcome this size limitation, at least for biologically realistic input sizes. Both RepeatMatch and REPuter are highly efficient computational tools that can find all exact repeats in sequences as long as complete eukaryotic chromosomes, that is, 10-100 megabases (Mb). The output of these systems, however, while accurately representing the long list of positions of exact repeats, does not provide any overview or summary of the repetitive structure of the sequence. The REPuter system includes a visualization tool to generate repeat graphs, which are useful for identifying the positions of repeats, but this does not provide an overview of the exact and non-exact repeats in a genome. Figure 1 shows an example of a repeat graph [12] for a short DNA sequence. When one examines the output of REPuter and RepeatMatch for a complete bacterial genome, it quickly becomes obvious that many exact repeats are non-exact copies of one another. Whether a genome is a few or hundreds of megabases in length, the task of recognizing and describing how repeats resemble one another at this scale is too complicated to accomplish manually. Here we describe a new system for the recognition of repeat classes in genome sequences. This system, RepeatFinder, is freely available from our website [14]. In contrast to approaches that cluster together the results of BLAST searches (for example, Z.H. Bao and S. Eddy, unpublished data), our algorithm uses a comprehensive set of exact repeats as the basis for constructing repeat classes.
It relies on the efficient suffix tree data structure for identification of exact repeats, which permits rapid identification of repeat classes even in sequences containing tens of millions of nucleotides. The algorithm does not make any prior assumptions about the number or structure of the classes. At its core is a merging procedure that produces the actual members of each repeat class using the merging criteria described below; this procedure also builds a repeat map of the genome sequence. We have applied this system to several complete microbial genomes [15-21], to the complete Arabidopsis thaliana genome [2], and to a large collection of rice bacterial artificial chromosome (BAC) end sequences [22,23]. The results of this analysis are described below. The output of the system gives a clear picture of all repeat classes identified in a genome or a sequence collection. It provides straightforward access to the actual repeat sequences as a multi-fasta file, simple statistical analyses of the results, and a procedure for identifying each class's most representative element. We describe here the computational techniques used in the system and demonstrate its use on several different genome sequences.

Results and discussion
We begin by defining an exact repeat as a subsequence that occurs in a DNA sequence at least twice. A maximal repeat (Figure 2a) is a repeat that cannot be extended in either direction without incurring a mismatch. Repeats may have a direction with respect to the underlying sequence (forward, reverse) and with respect to each other (reverse complement). By allowing a set of editing operations (deletions, insertions and mismatches), we extend the definition of an exact repeat to an approximate repeat [13]. The set of repeats chosen initially, from which the repeat classes will be constructed, is called the initial repeat set. In the initial repeat set, different repeats may be very close together (Figure 2b) and may even overlap (Figure 2c).
This intricate picture can be simplified by constructing a more general type of repeat: a merging repeat is defined as a sequence that occurs in the whole genome sequence at least twice, where occurrences of the merging repeat are permitted to be partial copies. Merging repeats, labeled M1-M4 in Figure 2, are created from initial repeat sequences that are close together or overlapping. Merging repeats maintain pointers to the initial repeats comprising them; for example, repeat M1 (Figure 2b) has pointers to initial repeats A1 and B1. We shall also refer to these initial repeats as subrepeats of M1. Using these properties, we can formulate a similarity condition between merging repeats. Two merging repeats M1 and M2 are similar if they have at least one common initial repeat, or if there exists a sequence of merging repeats M1, N1, N2, ..., Nk, M2 such that each consecutive pair of merging repeats in the sequence shares at least one common subrepeat. The minimum number k of intermediate merging repeats needed to establish the similarity between M1 and M2 can be used as a similarity measure. For example, the merging repeats M2 and M4 in Figure 2b are similar with similarity measure k = 2, based on the sequence M2, M3, M1, M4. One goal of our clustering algorithm is to distribute merging repeats into classes according to this similarity condition so that two rules are satisfied: first, elements in the same class are highly similar to each other (homogeneity); and second, elements from different classes have no similarity to each other (separation). The maximal similarity k defined on all merging repeats in a class can be used for assessing the overall similarity of the class members. (In this paper the measure of similarity k is used only for the definition of merging repeats.) In this study we use exact forward and reverse complement repeats as initial repeats for clustering.
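The similarity measure k can be read as a shortest-path length in a graph whose nodes are merging repeats and whose edges join repeats that share a subrepeat. The Python sketch below is an illustration, not RepeatFinder's code: it assumes each merging repeat is given as a set of hypothetical subrepeat identifiers, and it uses breadth-first search, a method choice the text does not specify. The memberships are invented so that the shortest chain reproduces the M2, M3, M1, M4 example with k = 2.

```python
from collections import deque


def similarity_k(subrepeats, start, goal):
    """Minimum number k of intermediate merging repeats N1..Nk linking start
    to goal, where each consecutive pair shares at least one subrepeat.
    Returns 0 for a directly shared subrepeat and None if not similar."""
    # Index each subrepeat id by the merging repeats that contain it.
    owners = {}
    for m, subs in subrepeats.items():
        for s in subs:
            owners.setdefault(s, set()).add(m)

    # Breadth-first search; dist[m] counts links from start to m.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for s in subrepeats[m]:
            for nbr in owners[s]:
                if nbr in dist:
                    continue
                dist[nbr] = dist[m] + 1
                if nbr == goal:
                    return dist[nbr] - 1  # links minus one = intermediates
                queue.append(nbr)
    return None


# Hypothetical memberships: M2-M3 share C, M3-M1 share B, M1-M4 share A.
MERGING = {
    "M1": {"A", "B"},
    "M2": {"C"},
    "M3": {"C", "B"},
    "M4": {"A"},
    "M5": {"D"},
}

print(similarity_k(MERGING, "M2", "M4"))  # 2
print(similarity_k(MERGING, "M1", "M4"))  # 0
print(similarity_k(MERGING, "M2", "M5"))  # None
```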
The method does not require the use of exact initial repeats, but can be applied easily to an initial set containing approximate repeats [3,13].

Figure 1. Exact repeats. An example sequence of 180 bp and graphical depiction of exact repeats, using minimal repeat length 6 bp. This example shows forward and reverse complemented repeats. In the repeat graph [12], both of the horizontal lines correspond to the example sequence, and diagonal lines connect the two occurrences of each repeat. REPuter includes a visualization tool to provide similar graphics.
>example.seq
GGCGGTCATTGCGGTTCTTTTGACTGTTGGATCGAACCTGAAGACCGTCTTCAGCTATGT
CGGTAGCAATCTCACGACGTAATTGGTTGCCCGGCCAGGCATAGGCGCGTTTACTACGTA
AAAACTCTTTTACAGATTAACAGTGAAACGACCAATCTTTGTTGTCATCGCACTCACCTG

Figure 2. Definition of repeats. (a) Exact repeats, labeled as A1 and A2 (example sequence: GAAAGCTACATGCTATATGTATTGTACCCCTGCTGACCCCGTTACTCTGTAGCTACATGCTATATGTATTGTAT). (b,c) Merging repeats. (b) Merging with gaps; (c) merging with overlap. The nucleotide sequence is shown as a purple bar. Top: red, blue and yellow lines show the locations of the repeat sequences. Pairs A1 and A2, B1 and B2, and C1 and C2 are initial repeat pairs. Bottom: green bars labeled M1, M2, M3 and M4 indicate the location of merging repeats.

Algorithm description
Our algorithm is based on first identifying all exact repeats in the input sequence, and then defining repeat classes by merging and extending these short exact matches. An exact repeat is represented by a pair of coordinates (A1, A2) delimiting its location in the genome sequence, and by the repeat length l.
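RepeatFinder obtains its exact repeats from suffix-tree tools (RepeatMatch or REPuter). For experimenting with the small Figure 1 sequence, a naive pair scan is enough; the sketch below is that naive stand-in, not a suffix tree, and it reports only forward repeats (no reverse complements) that cannot be extended on either side. Coordinates are 1-based, matching the (A1, A2, l) triples used in the text.

```python
# The 180 bp example sequence from Figure 1.
EXAMPLE = (
    "GGCGGTCATTGCGGTTCTTTTGACTGTTGGATCGAACCTGAAGACCGTCTTCAGCTATGT"
    "CGGTAGCAATCTCACGACGTAATTGGTTGCCCGGCCAGGCATAGGCGCGTTTACTACGTA"
    "AAAACTCTTTTACAGATTAACAGTGAAACGACCAATCTTTGTTGTCATCGCACTCACCTG"
)


def maximal_forward_repeats(seq, min_len):
    """Naive stand-in for a suffix tree: for every pair of start positions,
    report the common substring if it cannot be extended left or right.
    Returns 1-based (A1, A2, length) triples with A1 < A2."""
    n = len(seq)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximal: skip if the preceding bases also match.
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of the sequence.
            k = 0
            while j + k < n and seq[i + k] == seq[j + k]:
                k += 1
            if k >= min_len:
                found.append((i + 1, j + 1, k))
    return found


repeats = maximal_forward_repeats(EXAMPLE, 6)
print((16, 126, 6) in repeats)  # True: TCTTTT
print((67, 153, 6) in repeats)  # True: CAATCT
print((77, 116, 6) in repeats)  # True: ACGTAA
```

On a 180 bp input the scan is instantaneous, but its cost grows roughly quadratically with sequence length, which is why the text relies on linear-time suffix trees for whole genomes.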
We implemented an algorithm that uses either of two suffix tree methods, RepeatMatch [11] or REPuter [12], to determine all the exact repeats in a given sequence. (For more on suffix trees see [24].) The computational time and space requirements for both these systems are linear in the size of the input sequences, an essential requirement for any algorithm attempting to process whole eukaryotic genomes. The subsequent clustering procedure merges neighboring repeats and groups them into classes. The input to the system can be either a single genome sequence or a set of sequences. The clustering procedure consists of the following steps, which are described in more detail below.
Step 1: Selection and pre-processing. The list of coordinates of all exact repeats as output by RepeatMatch or REPuter can be interpreted as a partition of the original genome sequence. (The outputs of RepeatMatch and REPuter are very similar. We used REPuter in the example and in the subsequent repeat analyses of microbial genomes; for the A. thaliana genome and the rice BAC end-sequence data we used RepeatMatch.) Each partition point has a reference to the pair of coordinates (A1, A2) and the repeat length l. Each repeat corresponds to at least two partition points. Some repeats can be found in the sequence more than twice, and the corresponding partition points can appear with different coordinates and different lengths. To prepare the data for the merging procedure, we sort the list of partition points in increasing order of first coordinate and, in the case of duplicate first coordinates, in increasing order of second coordinate. (The clustering algorithm is order-independent; however, the linear nature of repeat data allows us to use this pre-processing step to simplify the clustering procedure without affecting the final clusters.) In particular cases it is useful to filter the original repeat data to remove certain types of repeats; for example, simple one-base (homopolymeric) or two-base repeats.
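A minimal sketch of Step 1, assuming repeats arrive as 1-based (A1, A2, l) triples: each pair contributes both orientations as partition points, and Python's tuple ordering sorts by first coordinate, then by second. The simple-repeat filter (`is_simple`, which treats homopolymers and period-2 runs as the "one-base or two-base repeats" of the text) is one possible interpretation, not the published filter.

```python
def is_simple(motif):
    """True for homopolymers (AAAA...) and two-base periodic runs (ATAT...)."""
    return all(motif[i] == motif[i % 2] for i in range(len(motif)))


def partition_points(repeats, seq=None):
    """Turn (A1, A2, l) repeat pairs into sorted partition points.
    Each pair yields (A1, A2, l) and (A2, A1, l). If seq is given, pairs whose
    first copy is a simple repeat are dropped (coordinates are 1-based)."""
    points = []
    for a1, a2, length in repeats:
        if seq is not None and is_simple(seq[a1 - 1:a1 - 1 + length]):
            continue  # optional filter for homopolymeric / two-base repeats
        points.append((a1, a2, length))
        points.append((a2, a1, length))
    # Tuples sort by first coordinate, then second coordinate, then length.
    points.sort()
    return points


print(partition_points([(67, 153, 6), (16, 126, 6), (77, 116, 6)]))
# [(16, 126, 6), (67, 153, 6), (77, 116, 6), (116, 77, 6), (126, 16, 6), (153, 67, 6)]
```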
Step 2: Merging procedure. In outline, this procedure works by repeatedly merging together two exact repeats that either overlap or occur within a limited distance (a gap) of each other. Specific values for the overlap and gap distance can be set for each genome sequence. Whether the algorithm is merging repeats that overlap (Figure 2c) or repeats separated by a gap (Figure 2b), the new merging repeats will always have the property that significant subsequences of the repeat appear at least twice in the genome sequence. During the merging procedure, we generate a repeat map of the genome sequence. This map is based on a linked-list data structure, which allows for rapid and simple modifications to the dynamically changing repeat data. Every merging repeat in the map is linked by pointers to all the merging repeats with which it shares exact repeats.
Step 3: Classification. This step defines the repeat classes. Each merging repeat is assigned to a specific class if its list of references (that is, the repeats that were combined into the merging repeat) contains at least one repeat that already belongs to the class. If a merging repeat has references that belong to multiple distinct classes, then those classes are combined into one. If a merging repeat contains no references to an existing class, then the merging repeat forms a new class.
Step 4: BLAST searches and repeat class updates. The initial classification is based on exact repeats. To merge together similar but non-exact repeats, we use WU-BLAST [9,10] to search all merging repeats against all others. The resulting matches between the classes are used as input to an update procedure which redistributes all merging repeats into new classes. It is possible to skip this step if the initial repeat set contains approximate rather than exact repeats.

Step 1: Pre-processing
In this step, the output from REPuter or RepeatMatch is used to partition the original genome sequence.
For each repeat starting at coordinates A1 and A2, with length l, this list will include both (A1, A2, l) and (A2, A1, l). The list is then sorted by first and by second coordinates. To illustrate the method, we use the example shown in Figure 1. The table on the left in Figure 3 shows all seven pairs of repeats, while the right table shows the corresponding sorted partition points.

Step 2: Merging and repeat map generation
Using the list of partition points, we begin merging exact repeats using the following criteria. Given two partition points p1 = (A1, A2, lA) and p2 = (B1, B2, lB), where A1 < B1, we compute the distance between the non-overlapping repeats as
$$d(p_1, p_2) = \max(0,\ B_1 - A_1 - l_A + 1).$$
Next, given a maximum gap size G > 0, the merging with gap protocol merges the sequences corresponding to p1 and p2 if d(p1, p2) < G. The merging with overlap protocol only merges sequences that overlap one another; that is, sequences that are at least partially identical. We denote the overlap of two sequences as
$$o(p_1, p_2) = \max(0,\ A_1 + l_A - B_1 + 1), \qquad A_1 < B_1.$$

These references define the correspondences between all merging repeats. Each merging repeat maintains references to the other merging repeats with which it shares exact repeats; each exact repeat is assigned to the first merging repeat in which it appears. In our example, the merging repeat starting at coordinate 77 gets a reference to itself only, because its exact repeats have no previous references. The next repeat, starting at position 116, gets a reference to itself and to its mate, the merging repeat at 77.
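The distance and overlap formulas, together with the overlap-proportion rule, the mate-pair restriction and the merged coordinates (M, lM) that the text gives for the merging step, translate directly into small predicates. The sketch below implements the formulas exactly as written (including the +1 terms) and assumes the two partition points are already ordered with A1 < B1. The second partition point in the example is hypothetical, chosen so that the merged result matches the 11 bp merging repeat at position 77 described for Figure 4.

```python
def gap_distance(p1, p2):
    """d(p1, p2) = max(0, B1 - A1 - lA + 1) for A1 < B1, as stated in the text."""
    a1, _, la = p1
    b1, _, _ = p2
    return max(0, b1 - a1 - la + 1)


def overlap(p1, p2):
    """o(p1, p2) = max(0, A1 + lA - B1 + 1) for A1 < B1, as stated in the text."""
    a1, _, la = p1
    b1, _, _ = p2
    return max(0, a1 + la - b1 + 1)


def are_mates(p1, p2):
    """Partition points (B1, B2, l) and (B2, B1, l) must never be merged."""
    return p1 == (p2[1], p2[0], p2[2])


def merge_with_gap(p1, p2, gap):
    """Merging-with-gap test: d(p1, p2) < G."""
    return not are_mates(p1, p2) and gap_distance(p1, p2) < gap


def merge_with_overlap(p1, p2, op):
    """Merging-with-overlap test: o(p1, p2) > op * min(lA, lB)."""
    return not are_mates(p1, p2) and overlap(p1, p2) > op * min(p1[2], p2[2])


def merged_repeat(p1, p2):
    """Start M = A1 and length lM = max(A1 + lA, B1 + lB) - A1."""
    a1, _, la = p1
    b1, _, lb = p2
    return a1, max(a1 + la, b1 + lb) - a1


p1, p2 = (77, 116, 6), (82, 151, 6)  # p2 is a hypothetical partition point
print(merge_with_gap(p1, p2, gap=1))        # True  (d = 0)
print(merge_with_overlap(p1, p2, op=0.75))  # False (o = 2, needs > 4.5)
print(merged_repeat(p1, p2))                # (77, 11)
```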
A data structure stores with each merging repeat its start coordinate (M), its length (lM), the number of exact repeats it includes (nM), and a list of references to itself and to other repeats (R1, R2, R3).

Figure 3. Pre-processing procedure. The table on the left shows repeat pairs that were found by REPuter in the 180 bp example sequence shown in Figure 1 using a minimum repeat length of 6 bp. Repeats are represented by type: forward (F) or reverse complement (RC), first coordinates (A1, A2) and length (lA). Reverse complement repeats with A1 = A2 are omitted. The table on the right contains a list of partition points. Arrows show the correspondence between the repeat (67,153,6) and the two partition points (67,153,6) and (153,67,6).

The criterion for merging with overlap is as follows: given a minimum overlap proportion op, where $0 \le o_p \le 1$, partition points (A1, A2, lA) and (B1, B2, lB) are merged if at least one of the four repeats has an overlap satisfying
$$o(p_1, p_2) > o_p \cdot \min(l_A, l_B).$$
The parameter op is interpreted as a fraction of the shorter of the two repeats. Thus for op = 0.75, we merge two overlapping sequences if the length of their overlap exceeds 75% of the length of the shorter sequence. Using either merging procedure, if two sequences are merged, the new sequence is defined as a merging repeat with starting position M = A1 and with length
$$l_M = \max(A_1 + l_A,\ B_1 + l_B) - A_1.$$

Step 3: Classification
Given the repeat map, we can begin to define classes by noting that if a merging repeat has at least one reference in common with another, then they belong to the same class. Figure 5 illustrates one step in this procedure. The merging repeat (M, lM) = (126, 8) has two common references in two different classes, class 1 and class 5. These classes are then combined into a new class 1, which contains all references from both of the original classes.
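The Step 3 rule can be sketched as an incremental pass over the repeat map: a merging repeat joins the class of any reference that is already classified, and classes reached through several references are combined into the lowest-numbered one. This is a Python illustration under stated assumptions, not the published implementation. The reference lists below are hypothetical, chosen to agree with the text's description of merging repeat (126, 8) combining classes 1 and 5 and with the final classes listed in Table 1.

```python
def classify(repeat_map):
    """Incremental classification of merging repeats.
    repeat_map: (start, references) pairs in map order; references include the
    merging repeat itself. Returns the classes as sorted lists of starts."""
    class_of = {}  # reference -> class id
    members = {}   # class id -> set of references
    next_id = 1
    for start, refs in repeat_map:
        hits = sorted({class_of[r] for r in refs if r in class_of})
        if hits:
            cid = hits[0]  # keep the lowest-numbered class
            for other in hits[1:]:  # combine every other class into it
                for r in members.pop(other):
                    class_of[r] = cid
                    members[cid].add(r)
        else:
            cid = next_id  # no shared reference: open a new class
            next_id += 1
            members[cid] = set()
        for r in refs:
            class_of[r] = cid
            members[cid].add(r)
    return [sorted(m) for _, m in sorted(members.items())]


# Hypothetical repeat map (start, references), in coordinate order.
REPEAT_MAP = [
    (16, [16]), (23, [23]), (38, [38]), (47, [47, 38]), (67, [67]),
    (77, [77]), (116, [116, 77]), (126, [126, 16, 116]),
    (139, [139, 23]), (151, [151, 67, 77]),
]

print(classify(REPEAT_MAP))
# [[16, 67, 77, 116, 126, 151], [23, 139], [38, 47]]
```

When repeat 126 is processed, its references 16 and 116 sit in classes 1 and 5, so the two are combined into class 1, mirroring the Figure 5 step. Feeding BLAST similarity pairs into the same combining loop is one way Step 4 could be handled; that reuse is a design suggestion, not something the text specifies.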
The merging procedure is not permitted to merge pairs of partition points of the form (B1, B2, lB) and (B2, B1, lB). This condition avoids merging of tandem repeats and avoids repetitiveness within the merging repeats. On the left side of Figure 4 we illustrate the merging procedure using merging with gap, with G = 1. Dark gray rectangles mark the start coordinates of merging repeats. The extent of each merging repeat is shown by dividing sets of repeats using horizontal lines. This procedure, by updating and creating new references, leads to the repeat map shown on the right of Figure 4.

Figure 4. Merging procedure. The start coordinates of merging repeats are shown in dark gray. Horizontal lines divide sets of exact repeats that were merged into a single merging repeat. The arrow shows the connection between a group of two short exact repeats and the corresponding 11 bp merging repeat starting at position 77.

Step 4: BLAST searches and further merging
For this step, the most time-consuming part of the algorithm, we take the underlying sequences of the merging repeats and run a BLAST search of all sequences against all others. Classes are merged if any of their underlying sequences has a BLAST E-value less than a user-specified threshold when compared to any sequence in another class. If a class appears in multiple similarity pairs, all these similar classes are merged with the original class. For the example in Figure 4, BLAST searches do not reveal any new similarity pairs; thus the classification from the figure is identical to the final classification (Table 1).

Figure 5. One step of the classification procedure. The gray rectangle in the repeat map table shows the merging repeat with its length, the number of exact repeats it includes, and references to the repeats it contains. The highlighted repeat will be added to existing classes. It contains references to class 1 (16) and class 5 (116), marked by gray in the first classification table. On the next classification step, these two classes are merged and the rest of their references are added to the new class 1. Arrows show the directions of class merging.

Table 1. Final classification
Class  Coordinate  Length  Copies  Sequence
1      16          6       1       TCTTTT
1      67          6       1       CAATCT
1      77          11      2       ACGTAATTGGT
1      116         8       2       ACGTAAAA
1      126         8       2       TCTTTTAC
1      151         8       2       ACCAATCT
2      23          6       1       ACTGTT
2      139         6       1       AACAGT
3      38          8       1       CTGAAGAC
3      47          8       1       GTCTTCAG

Repeat analysis of microbial genomes
We used our repeat clustering algorithm to analyze several complete microbial genomes. Table 2 summarizes the repeat analysis for the Neisseria meningitidis genome [20] using two different clustering criteria. It illustrates how increasing the exact repeat size in the initial step leads to fewer merging repeats and fewer classes. It also shows how reducing the size of the gap and increasing the required overlap increases the number of repeat classes, as would be expected.
For a more comprehensive repeat analysis, we chose seven different microbial genomes, using 25 base pairs (bp) as the minimal exact repeat length and allowing gaps of less than 25 bp in the merging procedure. Table 3 shows the results for these genomes. It presents the number of merging repeats, the number of repeat classes, the longest single merging repeat, and the number of classes containing more than two members. As shown here, these latter classes comprise only 10-25% of all repeat classes, indicating that most repeat types are simple duplications. Among these duplications, the vast majority occur in tandem, although this is not shown in Table 3. These results show how repeat analysis can quickly provide an overall picture of how repetitive a genome is; in addition, the analysis extracts the repeats themselves for further analysis.

Table 2. Sensitivity of the clustering method to different merging parameters
(Merging with gap: G in bp; merging with overlap: op in %)
                               G=50   G=25   G=5    op=50%  op=95%  op=100%
Minimal exact repeat 25 bp
  Number of merging repeats    1031   1155   1394   2328    3748    4564
  Number of classes            122    162    218    357     510     550
Minimal exact repeat 50 bp
  Number of merging repeats    655    741    843    1305    1892    2322
  Number of classes            63     77     92     165     234     242
The length of the Neisseria meningitidis genome is 2,272,351 bp [20].

Table 3. Repeat structure of microbial genomes
Genome                     Reference  Length (bp)  Merging repeats  Longest merging repeat (bp)  Classes  Classes with more than two elements
Treponema pallidum         [16]       1,138,006    87               3283                         31       4
Chlamydia pneumoniae       [18]       1,229,853    74               2519                         25       3
Methanococcus jannaschii   [15]       1,664,976    557              4929                         113      23
Helicobacter pylori        [19]       1,667,867    297              2317                         95       21
Thermotoga maritima        [17]       1,860,725    218              1697                         43       8
Neisseria meningitidis     [20]       2,272,351    1155             9900                         162      38
Caulobacter crescentus     [21]       4,016,917    1114             4206                         216      50

Defining the prototype for a repeat class
Small microbial genomes have relatively few types of repeats, and relatively few copies of each type. In contrast, our studies of longer eukaryotic genome sequences have uncovered tens of thousands of repeat classes and hundreds of thousands of merging repeats. In order to process these data efficiently (in particular, to run the procedure in which all classes are compared against each other using BLAST), we developed a procedure to define the most representative element for each class, which we call its prototype. Referring to the repeat map shown in Figure 5, we use the length of the merging repeat (lM) and the number of exact repeats (nM) to define the desirable properties of the prototype. The different merging protocols affect the properties of the prototype. Thus, in the merging with gap procedure, the merging repeats with the longest lengths and with the greatest number of subrepeats should be the best candidates to represent the class. In this case, many members will consist of simple subsequences of the prototype. When we use the merging with overlap procedure, we also look for the greatest number of subrepeats, but the length of the most representative repeat should be closer to the shortest repeat in the class. In this case the representative element will tend to match across most of its length to every member of the class. Using these considerations, we can construct the objective function for both cases. For each class, given the merging repeat length l (lM) and number of subrepeats n (nM), the maximum and the minimum repeat lengths in the class (lmax and lmin), and the maximum and the minimum number of subrepeats in the class (nmax and nmin), we define the function Φ(l, n) for each merging repeat of the class as
$$\Phi(l, n) = \frac{l_{max} - l}{l_{max} - l_{min}} + \frac{n_{max} - n}{n_{max} - n_{min}}$$
for merging with gaps, and
$$\Phi(l, n) = \frac{l - l_{min}}{l_{max} - l_{min}} + \frac{n_{max} - n}{n_{max} - n_{min}}$$
for merging with overlaps.
This non-negative function summarizes how far the length and the number of subrepeats of a merging repeat deviate from the desirable values for the class prototype. We then solve the optimization problem of minimizing Φ(l, n) over the elements of a single repeat class, that is, we find the (l, n) of the class element that attains $\min \Phi(l, n)$. If several elements minimize this function, we select the one with the maximal number of subrepeats. Thus in our example (Figures 1, 3-5) the prototype for class 1 is the longest repeat, starting at position 77, with l = 11 and n = 2. Likewise, the prototype for class 2 is the repeat starting at position 23, and for class 3 it is the repeat starting at position 38. We used this procedure in our studies of the genome sequences of A. thaliana [2] and rice BAC end sequences [22,23].

[25] and the Arabidopsis gene database [26] (using a maximum BLAST E-value of 0.01 and at least 100% identity for Arabidopsis genes and at least 95% identity to AtRepBase sequences). Of 105,434 repeat sequences that fall into 27,961 separate repeat classes, 2,124 sequences matched an annotated repeat sequence in AtRepBase, and 25,149 sequences matched a segment of an Arabidopsis gene. Comparing both sets of matches, only 417 of the repeat sequences were found to match both a gene segment and an annotated repeat sequence. The large number of repeats that match gene segments reflects the prevalence of segmental chromosomal duplications and tandem gene duplications in Arabidopsis. Due to the greedy merging with gap method used to build the repeat classes, relatively few of the repeat classes contained an abundance of the repeat sequences; the largest repeat class contained 30,975 sequences, of which 6,505 matched gene segments and 1,723 matched annotated repeats.
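The prototype rule can be checked against the worked example in Table 1. The Python sketch below evaluates Φ(l, n) for both merging protocols and picks the minimizer, breaking ties by the larger number of subrepeats as the text specifies. Two choices are assumptions not covered by the text: a class whose lengths (or subrepeat counts) are all equal contributes 0 for that term instead of dividing by zero, and any remaining tie goes to the smallest start coordinate. The (start, l, n) triples come from Table 1, reading its Copies column as nM.

```python
def phi(l, n, lmin, lmax, nmin, nmax, mode="gap"):
    """Objective Phi(l, n); a zero-width range contributes 0 (assumption)."""
    def frac(num, den):
        return num / den if den else 0.0

    n_term = frac(nmax - n, nmax - nmin)  # favour more subrepeats
    if mode == "gap":
        l_term = frac(lmax - l, lmax - lmin)  # favour the longest members
    else:
        l_term = frac(l - lmin, lmax - lmin)  # favour the shortest members
    return l_term + n_term


def prototype(members, mode="gap"):
    """members: (start, l, n) triples of one class; returns the prototype start.
    Ties go to the larger n, then (assumption) to the smaller start."""
    ls = [l for _, l, _ in members]
    ns = [n for _, _, n in members]
    bounds = (min(ls), max(ls), min(ns), max(ns))
    best = min(members, key=lambda m: (phi(m[1], m[2], *bounds, mode=mode), -m[2], m[0]))
    return best[0]


CLASS_1 = [(16, 6, 1), (67, 6, 1), (77, 11, 2), (116, 8, 2), (126, 8, 2), (151, 8, 2)]
CLASS_2 = [(23, 6, 1), (139, 6, 1)]
CLASS_3 = [(38, 8, 1), (47, 8, 1)]

print(prototype(CLASS_1), prototype(CLASS_2), prototype(CLASS_3))  # 77 23 38
```

With mode="overlap", the same class 1 data favour the 8 bp members instead (116 under the assumed tie-break), illustrating how the two protocols pull the prototype toward long or short members.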
To further analyze the composition of the repeat classes, a prototype repeat sequence was chosen to represent each repeat class containing at least five members, and the top database matches were identified (Table 5). Of the 1,454 prototype repeat sequences examined, approximately half (755) matched gene segments and 58 matched annotated repeats. The genes matched by the prototype repeat sequences include known members of large Arabidopsis gene families, including a cytochrome P450, a receptor kinase, a disease-resistance protein and several transposon open reading frames. In addition, there were many matches to hypothetical proteins, the validity of which remains to be determined. The biological relevance of the remaining repeat classes remains unclear at present.

Repeat structure of the Arabidopsis genome
The 125 Mb A. thaliana genome consists of five chromosomes ranging from 18 Mb to 30 Mb in length. We applied the suffix tree algorithm for finding exact repeats to each of these sequences separately, and then used our clustering method to determine the repetitive structure of each chromosome. We found from 100,000 to 400,000 pairs of exact repeats in each chromosome using a minimum length of 25 bp (after filtering out simple repeat sequences). These repeats in total represent approximately 10% of the chromosome sequences. To group the repeats into classes, the gap-merging strategy was used, merging repeats separated by gaps of less than 25 bp. The algorithm finds some 5,000-7,000 repeat classes per chromosome, but only 20% of these contain more than two elements. Arabidopsis is known to contain extensive gene duplication and shows strong evidence of a whole-genome duplication [2]; thus it is not surprising to observe such a preponderance of repeats with just two members.
We defined the prototype element for each class using the optimization procedure described above, combined all the prototypes from the five chromosomes into one database, and generated a final classification of the whole genome by clustering the results of an all-against-all BLAST search of the prototypes. This resulted in over 5,000 classes with three or more elements. Table 4 contains a summary of the repeat structure for the entire A. thaliana genome. To find out more about the composition of the Arabidopsis repeats, each sequence was searched against AtRepBase

Table 4. Summary of repeat analysis of Arabidopsis genome and rice BAC end sequences
Class size   Number of classes in Arabidopsis   Number of classes in rice data
3            2,792                              3,532
4            970                                1,606
5            420                                875
6-10         662                                1,509
11-50        336                                561
51-600       32                                 34
30,975       1                                  0
128,570      0                                  1

Rice repeat database
Yuan et al. [27] recently reported on the construction of a rice repeat database that was generated by searching all available rice sequences for minisatellite sequences, mobile elements, rDNA, centromeric repeat sequences and telomeric repeat sequences. This database includes 215 sequences. We attempted to use the repeat finding system described here to enlarge this set, using as input the large collection of sequences from the Clemson University rice BAC end database [23].
Table 5
Prototype repeat sequences (Arabidopsis thaliana genome) that matched genes or annotated AtRepBase repeats

Class number   Class size
1              30,975
12639          202
56             164
42             135
284            135
6              133
20             111
54             111
62             107
95             85
1389           85
269            81
58             78
236            71
18068          67
5              64
29             64
400            57
12310          56
187            55
38             50
47             48
104            46
345            45
12594          45
735            45
240            43
167            42
60             41
211            41
81             39
124            38
411            38
324            37
170            37
421            37
293            36
357            35
426            34
22466          32
64             32
242            32
249            31
18597          31
256            31
202            30
290            30
18166          29
12400          29
297            29

Genes: Hypothetical protein; Hypothetical protein; Pseudogene; Hypothetical protein; Putative O-methyltransferase 1; Putative reverse transcriptase; Hypothetical protein; Hypothetical protein; Putative receptor kinase; Putative receptor kinase; Putative retroelement pol polyprotein; Putative disease-resistance protein; Putative Ser/Thr kinase; Pseudogene; Hypothetical protein; Hypothetical protein; Hypothetical protein; Hypothetical protein; Putative NBS/LRR disease-resistance protein; Putative phenylalanine ammonia-lyase; Pseudogene; Putative reverse transcriptase; Hypothetical protein; Putative disease-resistance protein; Pseudogene; Hypothetical protein; Hypothetical protein; Hypothetical protein; Putative disease-resistance protein; Hypothetical protein; Pseudogene; CHP-rich zinc finger protein-like; Hypothetical protein; Hypothetical protein; Hypothetical protein; Putative cytochrome P450; Putative transposon protein; Putative serine carboxypeptidase; Hypothetical protein; Pseudogene; Putative CHP-rich zinc finger protein; Hypothetical protein; Mutator-like transposase

Repeats: ATR0087; X93607; ATR0046; ATR0084; repeat 5 from 102144 to 105991; AF024504; ATR0056; repeat01; ATR0090; ATR0081; minisatellite 1 from 63767 to 63826; AC002534; ATR0058; U65470; ATR0043; M65137; ATR0025; ATR0089

Unlike the Arabidopsis and microbial analyses, which each processed a single genome sequence or a few large chromosomes, the rice input consisted of 101,562 BAC end sequences with an average length of approximately 400-700 bp. We therefore developed a special pre-processing procedure that generates a single sequence (approximately 42 Mb long) from all the BAC ends, in which each original sequence is represented by its coordinates. This procedure allows the algorithm to work with hundreds of thousands of different sequences simultaneously. The system found 5,208,206 exact repeat pairs with lengths from 25 bp to 728 bp, where the latter represents an entire BAC end that was repeated exactly. The maximum length of each repeat was bounded by the length of the BAC end sequence in which it was found; this restriction was added to the merging procedure to avoid artificially long repeats that might mistakenly span more than one BAC end sequence. The pre-clustering procedure also filters the exact repeats to remove simple-sequence repeats, which were found to make up over 40% of the exact repeats. We merged the filtered exact repeats, requiring an overlap of 95%. This resulted in 48,768 repeat classes, of which only 8,118 include more than two elements. Table 4 summarizes these repeat classes. A searchable rice repeat database, based on the prototypes of these classes, is available online [28].

To test this new repeat database, we compared it to the set of annotated repeats based on known, expertly curated repeats [27]. This set comprises four general groups: telomere/centromere repeats, transposon/transposon-like repeats, rDNA, and all others [27]. We used BLAST to search the annotated repeats against the rice repeat database, using an E-value cutoff of 10^-8.
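The pre-processing of the rice BAC ends can be sketched as follows, assuming the reads are simply concatenated in input order and an offset table is kept to map coordinates in the combined sequence back to the original reads. The function names (`concatenate`, `locate`, `clip_to_read`) and the toy reads are hypothetical; the paper does not describe RepeatFinder's implementation at this level of detail. Clipping a repeat at the end of the read that contains its start is one way to realize the length bound described above.

```python
from bisect import bisect_right


def concatenate(reads):
    """Join many short sequences into one string; record each one's start offset."""
    offsets, pos = [], 0
    for seq in reads:
        offsets.append(pos)
        pos += len(seq)
    return "".join(reads), offsets


def locate(offsets, coord):
    """Map a coordinate in the combined sequence to (read index, position in read).

    Assumes 0 <= coord < total length; offsets are sorted by construction.
    """
    i = bisect_right(offsets, coord) - 1  # last read starting at or before coord
    return i, coord - offsets[i]


def clip_to_read(offsets, lengths, start, end):
    """Bound the half-open repeat [start, end) by the end of the read containing start."""
    i, _ = locate(offsets, start)
    return start, min(end, offsets[i] + lengths[i])


# Toy reads standing in for BAC end sequences (hypothetical data).
reads = ["ACGTACGT", "TTGCA", "ACGTAC"]
combined, offsets = concatenate(reads)
lengths = [len(r) for r in reads]
print(combined, offsets)                      # ACGTACGTTTGCAACGTAC [0, 8, 13]
print(locate(offsets, 10))                    # (1, 2): third base of the second read
print(clip_to_read(offsets, lengths, 4, 12))  # (4, 8): cannot run into the second read
```

Binary search over the offset table keeps each lookup logarithmic in the number of reads, which matters with 101,562 BAC ends. A common alternative in suffix-tree tools is to place a unique separator character between sequences so that no exact match can cross a boundary at all; the text above instead describes bounding repeat lengths during merging.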
Classification of the BLAST hits shows that the annotated repeats from the four distinct groups always fall into separate classes in the rice repeat database; in other words, the new database divides the previous repeat classes into a finer-grained set of repeats, but it does not merge any of the four known groups together.

Performance
Because the system uses efficient suffix tree procedures, it runs very fast; the all-versus-all BLAST search consumes approximately 80% of the computation time. The exact repeat finder takes about 10-15% of the total, and the other processes (merging, clustering and post-BLAST updating) use a relatively minor proportion of the overall computation time. The running time depends on both the sequence length and the number of repeats; for example, small microbial genomes take just 3-15 minutes, whereas the highly repetitive rice repeat database took about two days to process. Memory use is dominated by the suffix tree used for the initial repeat computation [11-13]; it can grow to many gigabytes for large eukaryotic chromosomes.

Conclusions
We describe a new system for rapid identification of all repeats in genome sequences and assignment of these repeats to similarity classes. The system has been used to analyze the repeat structure of several complete microbial genomes and the much larger genome of the model plant A. thaliana. We also used it to create a new rice repeat database, based on an analysis of a large BAC end sequence database. This new computational tool should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.

Acknowledgements
We thank N. El-Sayed, O. White, J.F. Heidelberg, M.-I. Benito, H.M. Khouri, T.V. Feldblyum, M. Pop, J.R. Buchoff and M.F. Shumway for helpful comments, suggestions and discussion. This work was supported in part by NSF grants KDI-9980088 and IIS-9902923 and by NIH grant R01LM06845.

References
1. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al.: The genome sequence of Drosophila melanogaster. Science 2000, 287:2185-2195.
2. The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796-815.
3. Leung M-Y, Blaisdell BE, Burge C, Karlin S: An efficient algorithm for identifying matches with errors in multiple long molecular sequences. J Mol Biol 1991, 221:1367-1378.
4. Agarwal P, States DJ: The Repeat Pattern Toolkit (RPT): analyzing the structure and evolution of the C. elegans genome. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (ISMB 94). Menlo Park, CA: AAAI Press; 1994:1-9.
5. Kannan SK, Myers EW: An algorithm for locating nonoverlapping regions of maximal alignment score. SIAM J Comput 1996, 25:648-662.
6. Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 1999, 27:573-580.
7. RepeatMasker [http://ftp.genome.washington.edu/RM/RepeatMasker.html]
8. Bedell JA, Korf I, Gish W: MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics 2000, 16:1040-1041.
9. Gish W, States DJ: Identification of protein coding regions by database similarity search. Nat Genet 1993, 3:266-272.
10. Washington University School of Medicine: Index of /blast/blast [http://blast.wustl.edu/blast]
11. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL: Alignment of whole genomes. Nucleic Acids Res 1999, 27:2369-2376.
12. Kurtz S, Schleiermacher C: REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics 1999, 15:426-427.
13. Kurtz S, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: Computation and visualization of degenerate repeats in complete genomes. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 2000:228-238.
14. TIGR software tools [http://www.tigr.org/softlab/]
15. Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RA, Gocayne JD, et al.: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 1996, 273:1058-1073.
16. Fraser CM, Norris SJ, Weinstock GM, White O, Sutton G, Clayton R, Dodson R, Gwinn M, Hickey E, Ketchum KA, et al.: Complete genomic sequence of Treponema pallidum, the syphilis spirochete. Science 1998, 281:375-388.
17. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, et al.: Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 1999, 399:323-329.
18. Read TD, Brunham RC, Shen C, Gill SR, Heidelberg JF, White O, Hickey EK, Peterson J, Utterback T, Berry K, et al.: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39. Nucleic Acids Res 2000, 28:1397-1406.
19. Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill SR, Dougherty BA, et al.: The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 1997, 388:539-547.
20. Tettelin H, Saunders NJ, Heidelberg J, Jeffries AC, Nelson KE, Eisen JE, Ketchum KA, Hood DW, Peden JF, Dodson RJ, et al.: Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 2000, 287:1809-1815.
21. Nierman W, Feldblyum TV, Laub MT, Paulsen IT, Nelson KE, Eisen J, Heidelberg JF, Alley MRK, Ohta N, Maddock JR, et al.: Complete genome sequence of Caulobacter crescentus. Proc Natl Acad Sci USA 2001, 98:4136-4141.
22. Mao L, Wood TC, Yu Y, Budiman MA, Tomkins J, Woo S, Sasinowski M, Presting G, Frisch D, Goff S, et al.: Rice transposable elements: a survey of 73,000 sequence-tagged-connectors. Genome Res 2000, 10:982-990.
23. Clemson University rice BAC end database [http://www.genome.clemson.edu/projects/rice/rice_bac_end]
24. Gusfield D: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. New York: Cambridge University Press; 1997.
25. AtRepBase [http://nucleus.cshl.org/protarab/AtRepBase.htm]
26. Arabidopsis gene sequence database [http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml]
27. Yuan Q, Liang F, Hsiao J, Zismann V, Benito M-I, Quackenbush J, Wing R, Buell R: Anchoring of rice BAC clones to the rice genetic map in silico. Nucleic Acids Res 2000, 28:3636-3641.
28. Oryza sativa repeat database search [http://www.tigr.org/tdb/rice/blastsearch.shtml]