Pruner - PRUNER: Algorithms for Finding Monad Patterns in...

PRUNER: Algorithms for Finding Monad Patterns in DNA Sequences

Ravi VijayaSatya and Amar Mukherjee (rvijaya, amar)
School of Computer Science, University of Central Florida, Orlando, FL 32816-2362

Abstract

In this paper, we present new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form $(l, d)$-$k$, where $l$ is the length of the pattern, $d$ is the maximum number of mismatches allowed, and $k$ is the minimum number of times the pattern is repeated in the given sample. The time complexity of some of the best known algorithms to date is $O(n t^2 l^d |\Sigma|^d)$, where $t$ is the number of input sequences, $n$ is the length of each input sequence, and $\Sigma$ is the alphabet. The first algorithm that we present in this paper takes $O(n^2 t^2 l^{d/2} |\Sigma|^{d/2})$ time and $O(n t l^{d/2} |\Sigma|^{d/2})$ space, and the second algorithm takes $O(n^3 t^3 l^{d/2} |\Sigma|^{d/2})$ time using $O(l^{d/2} |\Sigma|^{d/2})$ space. In practice, our algorithms perform much better, provided the ratio $d/l$ is small. The second algorithm performs very well even for large values of $l$ and $d$, as long as the ratio $d/l$ is small.

Keywords: Pattern discovery, regulatory patterns, k-mismatch patterns

1. Introduction

Discovering regulatory patterns in DNA sequences is a well-known problem in computational biology. Because of mutations and other errors, the actual occurrences of a regulatory pattern are not exact; each may contain a certain number of mismatches. Therefore, the actual regulatory pattern (the consensus pattern) may never appear in a gene upstream region, even though $d$-mismatch occurrences of it do. The general approach to this problem is to take a set of $t$ DNA sequences, each of length $n$, at least $k$ of which are guaranteed to contain the desired binding site, and to look for patterns of a certain length $l$ that occur in at least $k$ of the $t$ sequences with at most $d$ mismatches at each occurrence.
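To make the $(l, d)$-$k$ definition and the $l^d |\Sigma|^d$ factor in the complexity bounds concrete, here is a minimal Python sketch. It assumes the DNA alphabet {A, C, G, T} and counts mismatches as Hamming distance; all function names are hypothetical. `brute_force_motifs` is the naive exhaustive search over all $|\Sigma|^l$ candidate patterns, not the PRUNER algorithm. `neighborhood_size` counts the $l$-mers within $d$ mismatches of a fixed $l$-mer, $\sum_{i=0}^{d} \binom{l}{i}(|\Sigma|-1)^i$, which is $O(l^d |\Sigma|^d)$.

```python
from itertools import product
from math import comb

ALPHABET = "ACGT"  # assumed DNA alphabet, so |Sigma| = 4


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(pattern: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within Hamming distance d of pattern."""
    l = len(pattern)
    return any(hamming(pattern, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def support(pattern: str, seqs: list[str], d: int) -> int:
    """Number of input sequences that contain a d-mismatch occurrence of pattern."""
    return sum(occurs_with_mismatches(pattern, s, d) for s in seqs)


def neighborhood_size(l: int, d: int, sigma: int = len(ALPHABET)) -> int:
    """Count of l-mers within Hamming distance d of one fixed l-mer:
    sum over i = 0..d of C(l, i) * (sigma - 1)^i, i.e. O(l^d * sigma^d)."""
    return sum(comb(l, i) * (sigma - 1) ** i for i in range(d + 1))


def brute_force_motifs(seqs: list[str], l: int, d: int, k: int) -> list[str]:
    """Exhaustively test every l-mer over ALPHABET and keep those with support >= k.
    Only feasible for small l, since there are |Sigma|^l candidates."""
    return [
        cand
        for cand in ("".join(p) for p in product(ALPHABET, repeat=l))
        if support(cand, seqs, d) >= k
    ]
```

For example, with `seqs = ["ACGTTGCA", "TTACGATT", "GGGACGTA"]`, `support("ACGT", seqs, 1)` is 3, so ACGT satisfies the $(4, 1)$-$3$ condition. For $l = 15$, $d = 4$, `neighborhood_size(15, 4)` is 123841, which shows how quickly the $d$-neighborhood grows with $d$ and why the bounds above improve when only a radius of $d/2$ has to be enumerated.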
The values of $l$, $d$ and $k$ can be determined either from prior knowledge about the binding site or by trial and error, trying different values of $l$ and $d$. Such single contiguous patterns are called monad patterns. In general, many regulatory signals are made up of a group of monad patterns occurring within a certain distance from each other [Eskin et al., 2003; Eskin et al., 2002; GuhaThakurta et al., 2001; van Helden et al., 2000]. In such cases, the patterns are called dyad, triad, or more generally multi-ad or composite patterns. Finding a composite pattern by searching for its component monad patterns individually is significantly more difficult, since each component may be too subtle to detect on its own. Eskin and Pevzner [Eskin et al., 2002] present a simple transformation that converts a multi-ad problem into a slightly larger monad problem. In this paper, we present an algorithm to solve the monad problem. The same transformation as in [Eskin et al., 2002] can be applied to turn a multi-ad problem into a monad problem that our algorithm can handle.
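The idea of reducing a composite pattern to a monad pattern can be illustrated for the dyad case. The sketch below is an illustrative assumption, not necessarily the exact construction of [Eskin et al., 2002]: each pair of $l$-mers separated by a gap in a hypothetical range `[gap_min, gap_max]` is concatenated into one $2l$-mer, so a dyad with at most $d$ mismatches in total becomes a $2l$-long monad pattern within Hamming distance $d$ of one of these concatenations. The names `dyad_to_monad` and `dyad_support` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def dyad_to_monad(seq: str, l: int, gap_min: int, gap_max: int) -> list[str]:
    """Join the l-mer starting at i with the l-mer that starts g characters
    after it ends, for every valid i and every gap g in [gap_min, gap_max]."""
    joined = []
    for i in range(len(seq) - l + 1):
        for g in range(gap_min, gap_max + 1):
            j = i + l + g  # start of the second half-site
            if j + l <= len(seq):
                joined.append(seq[i:i + l] + seq[j:j + l])
    return joined


def dyad_support(pattern: str, seqs: list[str], l: int,
                 gap_min: int, gap_max: int, d: int) -> int:
    """Count sequences having at least one joined 2l-mer within Hamming
    distance d of the 2l-long pattern (a monad-style test on the joined strings)."""
    return sum(
        any(hamming(pattern, w) <= d for w in dyad_to_monad(s, l, gap_min, gap_max))
        for s in seqs
    )
```

Keeping each sequence's joined $2l$-mers as a separate list, rather than concatenating them into one long string, avoids spurious windows that straddle two unrelated pairs.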

This note was uploaded on 06/12/2011 for the course CAP 5510 taught by Professor Staff during the Spring '08 term at University of Central Florida.
