An Introduction to Bioinformatics
Database Searching
• Dynamic programming algorithms give "correct" solutions but are very slow unless the sequences are quite short.
• Current common protein sequence databases contain more than 100 M residues. For a "query" sequence of 1000 residues, we need to evaluate 10^11 matrix cells. Even at 10 M cells/second, that takes 10^4 s ≈ 3 hours for just one query.
• Goal: search only a small fraction of the possible high-scoring alignments.
• The vast literature on exact and approximate match algorithms can be used. But with scoring matrices, distant matches are hard to find.
• Need heuristic algorithms: FASTA and BLAST are two such classes of algorithms. BLAST is more popular, but we still use the FASTA data format.
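
The cost estimate for an exhaustive dynamic programming search can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the database size, query length, and throughput are the round figures quoted on the slide, and the variable names are illustrative.

```python
# Rough cost of one exhaustive dynamic-programming database search.
# All three inputs are the round numbers from the slide, not measurements.
db_residues = 100_000_000       # ~100 M residues in the database
query_residues = 1_000          # length of the query sequence
cells_per_second = 10_000_000   # assumed throughput: 10 M matrix cells/second

# One matrix cell is evaluated per (query residue, database residue) pair.
cells = db_residues * query_residues
seconds = cells / cells_per_second
hours = seconds / 3600

print(f"matrix cells: {cells:.0e}")    # 1e+11
print(f"seconds:      {seconds:.0e}")  # 1e+04
print(f"hours:        {hours:.1f}")    # 2.8
```

The result, about 2.8 hours per query, is the "roughly 3 hours" figure that motivates heuristic search.
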
Database searching
• Core: pair-wise alignment algorithm
• Speed (fast sequence comparison)
• Relevance of the search results (statistical tests)
• Recovering all information of interest
• The results depend on the search parameters, such as the gap penalty and the scoring matrix.
• Sometimes searches with more than one matrix should be performed.
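
To illustrate how the result of a pair-wise alignment depends on the search parameters, here is a minimal sketch of local alignment scoring (Smith-Waterman style) with a linear gap penalty. It is an assumption-laden toy, not the method of any particular search tool: the function name `local_alignment_score` is hypothetical, and a simple match/mismatch score stands in for a real scoring matrix.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty.

    Assumption: match/mismatch scores replace a full substitution matrix.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Same pair of sequences, two different gap penalties.
query, subject = "AAAACCCC", "AAAAGCCCC"
print(local_alignment_score(query, subject, gap=-1))  # 15: gapped alignment wins
print(local_alignment_score(query, subject, gap=-4))  # 13: ungapped alignment wins
```

With a mild gap penalty the best alignment skips the extra G and matches all eight residues. With a harsh one, an ungapped alignment containing a mismatch scores higher, so the reported alignment changes with the parameters.
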