15-853: Algorithms in the Real World
Computational Biology I
– Introduction to Comp. Bio.
– Longest Common Subsequence and Minimum Edit Distance

DNA
DNA: sequence of base-pairs (bp): { A, C, T, G }
Human genome: about 3 x 10^9 bps, divided into 46 chromosomes with between 5 x 10^7 and 25 x 10^7 bps each
Each chromosome is a sequence of base-pairs
DNA is used to generate proteins:
DNA → (transcription) → mRNA → (translation) → Protein

Proteins
Proteins: sequence of amino acids { gly, trp, cys, … } (20 of them)
Each DNA bp triple (a "codon") encodes 1 amino acid.
Since there are 64 possible codons, this is a many-to-one mapping.
Some triples have special meanings, e.g. EOF.
Chromosomes are partitioned into genes, each of which codes a protein.
Some regions of the chromosome do not code anything (intergene DNA).
[Figure: chromosome segment showing gene 1, gene 2 and gene 3]

Form and Function
The amino acid sequence determines the protein's 3d structure. The structure is also affected by the environment.
– The primary structure refers to the amino acid sequence.
– The secondary structure refers to the general configuration into alpha helices and beta sheets.
– The tertiary structure refers to the full 3d structure.
The protein's 3d structure determines its function.
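
The codon mapping from the Proteins slide can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the names `ALL_CODONS`, `PARTIAL_CODON_TABLE` and `translate` are hypothetical, the table lists just a handful of entries from the standard genetic code (the real table covers all 64 codons), and the input is assumed to be a coding-strand DNA string read from its first base.

```python
from itertools import product

BASES = "ACGT"

# Every triple over a 4-letter alphabet: 4^3 = 64 possible codons.
ALL_CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]

# Partial table only (assumption: illustrative subset of the standard code).
# Several codons map to the same amino acid, so the mapping is many-to-one.
PARTIAL_CODON_TABLE = {
    "GGT": "gly", "GGC": "gly", "GGA": "gly", "GGG": "gly",
    "TGT": "cys", "TGC": "cys",
    "TGG": "trp",
    "ATG": "met",
    # Triples with a special meaning: they end the protein ("EOF").
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}


def translate(dna):
    """Read a coding-strand DNA string codon by codon until a stop codon.

    Codons missing from the partial table are reported as "?".
    Trailing bases that do not fill a whole codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        amino = PARTIAL_CODON_TABLE.get(codon, "?")
        if amino == "STOP":
            break
        protein.append(amino)
    return protein


if __name__ == "__main__":
    print(len(ALL_CODONS))
    print(translate("ATGGGTTGCTGGTAA"))
```

Calling `translate("GGTGGCGGAGGG")` returns four copies of `"gly"`, which shows the many-to-one property directly: four different codons, one amino acid.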

Some Goals in Molecular Biology
1. Extract and compare genome sequences for various organisms.
2. Determine what proteins they code.
3. Determine the structure and purpose of the coded proteins.
Goals 2 and 3 can often be aided by matching genome or protein sequences to previously studied sequences.
Use to:
– study and cure genetic diseases
– design drugs
– study evolution
– understand cellular processes

Example of MS
Multiple Sclerosis is a disease in which the immune system attacks the myelin sheaths of nerve cells.
Conjecture: The immune system's T-cells incorrectly identify the myelin sheaths as a virus or bacteria from an earlier infection.
