PartIII.Scoring.Local.Multiple


Outline

Scoring Matrices
Alignment with Affine Gap Penalties
Local Alignment
Multiple Alignment

In this part we consider certain generalizations of the concept of similarity, leading to scoring matrices that are deterministic or probabilistic. We will also consider gaps in the alignment, and the local similarity of a much shorter string against a longer string. We then consider the alignment of a set of more than two strings.

The Basic Problem

A gene or a protein may be related to another gene or protein. Relatedness may mean:
1. They are homologous, that is, they share a common ancestry.
2. They may have common functions.

Analysis of DNA or protein sequences (a protein sequence being the sequence of amino acids, or residues) may reveal certain domains or motifs that are shared among a group of molecules.

Protein alignments give more information than DNA alignments. This is because certain DNA mutations, particularly at the third position in a codon, do not change the protein. Such mutations are called silent mutations. Mutations in the intron regions of DNA also have practically no effect on the protein. When a DNA sequence is analyzed, it is common practice to analyze the translated amino acid sequence. Protein sequence comparison can identify homologous sequences that originated from a common ancestor over 1 billion years ago (BYA), whereas DNA sequences can look back up to 600 MYA (millions of years ago). But there are situations where the DNA sequence itself must be examined, viz. to locate a gene or a motif, to search for polymorphisms, or to identify a cloned cDNA fragment.

Need to Develop Scoring Matrices

Two sequences are either homologous or not homologous. Statements such as "two sequences are 20% or 50% homologous" are wrong. The only relevant criterion for homology is that the sequences originated from a common ancestral sequence.
But it is correct to say that two homologous sequences are 20% or 50% similar if 20% or 50% of their nucleotides or residues are identical (matched). The cost of substituting one nucleotide for another is set arbitrarily in models for DNA comparison. Two amino acids, however, may fail to match and still be biochemically or biophysically related, and such a pair may command a large similarity score. These are called conservative substitutions. Thus the definition of scoring matrices is essential for comparing amino acid sequences.

Homologous: Orthologous and Paralogous

Homologous proteins may be orthologous or paralogous. Orthologs are homologous sequences in different species that arose from a common ancestor. For example, humans and rodents diverged 80 MYA, when a single ancestral myoglobin gene diverged by speciation. Orthologs have similar biological function viz....
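
The difference described above between percent similarity (the fraction of identical positions) and a similarity score that also rewards conservative substitutions can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned to equal length with no gaps. The names `percent_identity`, `similarity_score`, and `TOY_SCORES`, together with all score values, are hypothetical and chosen only for illustration; they are not taken from any published scoring matrix.

```python
def percent_identity(a, b):
    """Percentage of aligned positions at which both sequences have the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical toy scores (NOT from a published matrix): pairs of
# biochemically related amino acids earn a positive score although
# they are not identical.
TOY_SCORES = {
    frozenset("LI"): 2,  # leucine / isoleucine, both hydrophobic
    frozenset("DE"): 2,  # aspartate / glutamate, both acidic
    frozenset("KR"): 2,  # lysine / arginine, both basic
}
MATCH = 4      # assumed score for an identical pair
MISMATCH = -1  # assumed score for an unrelated pair


def similarity_score(a, b):
    """Sum of per-position scores, rewarding conservative substitutions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for x, y in zip(a, b):
        if x == y:
            total += MATCH
        else:
            total += TOY_SCORES.get(frozenset((x, y)), MISMATCH)
    return total


seq1 = "LDKA"
seq2 = "IEKG"
print(percent_identity(seq1, seq2))  # 25.0 (only K matches)
print(similarity_score(seq1, seq2))  # 7 = 2 (L/I) + 2 (D/E) + 4 (K/K) - 1 (A/G)
```

In this toy case only one position in four is identical, yet two of the three mismatches are conservative, so the score stays well above what four unrelated residues would give. A scoring matrix is meant to capture exactly this.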