What Use is All this Sequence Information?
• Predict protein structure and function
• Predict the domain(s) present in the sequence
• Predict sub-cellular location
• Predict post-translational modifications
• Infer evolutionary (phylogenetic) relationships
All these tasks require fast, robust, and reliable alignment of sequences.
Sequence Alignment: A Fundamental Concept in Bioinformatics
• Aligning sequences of nucleic acids, or of the proteins they encode (amino acid chains), allows us to infer the properties of a novel gene or sequence from its similarity to a previously characterized gene.
• This process provides the basis for
  – structure and function prediction (even for small regions of a protein)
  – prediction of regulatory properties
  – phylogenetic inference
Further reading: David W. Mount. 2001. Bioinformatics: Sequence and Genome Analysis. CSH Press.
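To make the idea of aligning two sequences concrete, here is a minimal sketch of a global pairwise alignment using the classic Needleman-Wunsch dynamic-programming approach. It is not taken from the slides: the function names (`needleman_wunsch`, `percent_identity`) and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, and real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not recommended parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical, non-gap residues."""
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s, x, y, percent_identity(x, y))
```

A high percent identity between a novel sequence and a characterized one is the kind of evidence that similarity-based inference relies on; how high is "high enough" depends on sequence type and length and is not addressed by this sketch.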
Sequence Alignment: A Fundamental Concept in Bioinformatics
BLAST (Basic Local Alignment Search Tool)
• One of the fastest and most robust algorithms for searching an entire database for regions of similarity.
• Ideal for finding domain similarities and classifying (annotating) a gene based on function.
• Not appropriate for inferring evolutionary relationships.
Homologous Sequences
• Orthologs – genes separated by speciation
• Paralogs – genes produced by gene duplication within the same species
Phylogenetics