intro-bioinformatics - Introduction to Bioinformatics...



Introduction to Bioinformatics

Introduction
• An introduction to bioinformatics (May 27)
• Sequence comparison (May 27)
• Gene finding (May 28)
• Gene finding (May 28)
• Barcode analysis of genomes (May 31)
• Prediction of binding motifs (May 31)
• Protein structure prediction (June 1)
• Protein function prediction (June 1)
• Microarray data analysis (June 2)
• Biological pathway prediction (June 2)

Goals
• A general introduction to the field of bioinformatics
  – what problems people have been and are currently working on
  – how people solve these problems
  – what key computational techniques are used and needed
  – how much help computing has provided to biological research
• A way of thinking -- tackling “biological problems” computationally
  – how to look at a “biological problem” from a computational point of view
  – how to collect statistics from biological data
  – how to build a “computational” model
  – how to solve a computational modeling problem algorithmically

Goals
• Some exposure to computational biology and bioinformatics research
  – what are the main research areas
  – key challenges
  – available tools and resources
• Basic topics of bioinformatics, covering
  – computational genomics
  – computational proteomics
  – structural bioinformatics
  – computational systems biology

Course Material
• All the ppt files used in this short course can be downloaded from
  – [URL not preserved], under “Lecture”
• Basic molecular biology and genetics
  – primer.pdf
• Special issue of Journal of Computer Science and Technology on “Computational Challenges from Modern Biology”

Course Requirements
• Please turn off your mobile phones!
• I strongly encourage you to do the homework (but I will not grade it)

An Introduction to Bioinformatics
Ying Xu (徐鹰)

The Basics
ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatc
gtgtgggtagtagctgatatgatgcgaggtaggggataggatagc
aacagatgagcggatgctgagtgcagtggcatgcgatgtcgatga
tagcggtaggtagacttcgcgcataaagctgcgcgagatgattgc
aaagragttagatgagctgatgctagaggtcagtgactgatgatcg
atgcatgcatggatgatgcagctgatcgatgtagatgcaataagtc
gatgatcgatgatgatgctagatgatagctagatgtgatcgatggta
ggtaggatggtaggtaaattgatagatgctagatcgtaggta……
[Figure labels: cell, chromosome, genes, protein, genome: DNA sequence, metabolic pathway/network]

Biology and Computation
[Figure labels: biology, computation]
“Molecular biology is information science .......” -- Leroy Hood

Bioinformatics (or computational biology)
• This interdisciplinary science … is about providing computational support to studies on linking the behavior of cells, organisms and populations to the information encoded in the genomes.
  – Temple Smith, Current Topics in Computational Molecular Biology (2002)

Bioinformatics
• It is about developing and using computational techniques to
  – analyze and interpret biological data
  – predict structures and functions of biological entities
  – model the dynamic behavior of biological processes and systems
  – ……
• People have used mathematical or computing techniques to solve biological problems since the early 1900s
  – e.g., evolution and genetic analyses by R.A. Fisher, J.B.S. Haldane, S. Wright
• So what is new?

A Historical Perspective
• Realization of the existence of “genes” in our cells by Hermann Muller, a student of Morgan (1921)
• Understanding of the physical nature of genes in the 1940s and 1950s by
  – F. Sanger (e.g., 1949)
  – E. Chargaff (e.g., 1950)
  – J. Kendrew (e.g., 1958)

A Historical Perspective
• Understanding of the double helical structure of DNA by James Watson and Francis Crick in 1953
• Development of sequencing technology, first of proteins and then of genomic DNA, based on the work of
  – F. Sanger on sequencing of insulin (1956)
  – W. Gilbert and A. Maxam on sequencing of the lactose operator (1977)
  which demonstrated that the genetic sequence of a genome, including the human genome, can be sequenced!

A Historical Perspective
• Development of a science of analyzing protein and DNA sequences, particularly in
  – protein sequence analyses and evolution by Margaret Dayhoff (60’s)
  – phylogenetic analyses and comparative sequence analyses by W. Fitch and E. Margoliash (1967) and by R. Doolittle (1983)

A Historical Perspective
• Development of sequence comparison algorithms
  – Needleman and Wunsch (1970)
  – Smith and Waterman (1981)
• Organization of biological data into databases
  – Protein Data Bank (PDB, 1973) of protein structures
  – GenBank (1982) of DNA sequences
• Computational methods for gene finding in genomic sequences
  – Work by Borodovsky, Claverie, Uberbacher from the mid-80’s to the early 90’s

A Historical Perspective
• Sequencing of human and other genomes
  – (1986 – 2003)
• Development of “high-throughput” measurement technologies
  – microarray chips for functional states of genes
  – two-hybrid systems for protein-protein interactions
  – structural genomics for structure determination
  – ………
[Figure: genomic DNA sequence with genes marked]

A Historical Perspective
• These “high-throughput” probing technologies and others are being used to generate enormous amounts of data about the existence, the structure, the functional state, and the relationships of biological molecules and machineries …..
• … but what are these data telling us? …..

A Historical Perspective
• So what is new?
  It is the amount and the type of biological data about cellular states, molecular structures and functions, generated by high-throughput technologies, that have driven the rapid advancement of bioinformatics!

An Example of Computation for Biology
• Lactococcus is a premier model microorganism for a wide array of studies in molecular biology, and is nonpathogenic
• Streptococcus is closely related to Lactococcus and could become pathogenic upon triggers
• Question: What makes one pathogenic and the other nonpathogenic?
  – specific genes?
  – unique pathways?
  – different regulatory mechanisms?
  – ……

An Example of Computation for Biology
• X years ago, …. to search for potential genes that possibly make the difference, researchers had to
  – remove various parts of the DNA sequence, then
  – observe if they may have any relevance
  acggtcgtacgtacgtgttagccgataatccagtgtgagatacacatcatcgaaacacatgaggcgtgcgatagatgatcc.....
  [Figure: regions of the sequence marked “x” (removed) and “?” (relevance unknown)]
  This could be a very lengthy process ……

An Example of Computation for Biology
• Since the Human Genome Project (1986), computational scientists have developed computer programs to locate genes in genomic DNA sequence
  – GRAIL, GENSCAN, Glimmer, …….
  [Figure: the same sequence with predicted genes marked]
• With gene-prediction programs, researchers only need to knock out regions predicted to be genes in their search for relevant genes

An Example of Computation for Biology
• Over the years, many genes have been thoroughly studied in different organisms, e.g., human, mouse, fly, …., rice, …
  – their biological functions have been identified and documented
• Computational scientists have developed computer programs to associate newly identified genes with genes of known function!
  – Existing methods can associate > 60% of newly identified genes with genes of known function
• Now, researchers only need to knock out genes with possibly relevant functions in their search for understanding of a particular biological process ……

An Example of Computation for Biology
• Computational programs have been springing up that can predict
  – if two proteins interact with each other
  – if a group of gene products work in the same pathway
  – which regulators regulate a particular pathway
  – functions of genes at a genome scale
  – ……..
These capabilities allow researchers to study complex biology problems, like understanding the difference between Lactococcus and Streptococcus, in a more efficient and systematic manner

An Example of Computation for Biology
[Figure: comparative genome analysis]
• Identify unique genes in each genome
• Identify unique metabolic pathways and regulatory networks
… through comparative genome studies

Computation for Biology
Biocomputing (bioinformatics, computational biology), in conjunction with large-scale bio-data, facilitates tackling large, complex biological problems at systems level, fundamentally changing the science of biology

Examples of “Computation for Biology”
• Suggesting functions of newly identified genes
  – It was known that mutations of NF1 are associated with the inherited disease neurofibromatosis 1, but little was known about the molecular basis of the disease
  – Sequence search found that NF1 is homologous to a yeast protein called Ira, which is a GAP-type protein and known to regulate the function of a second type of protein called Ras
  – Hypothesis: NF1 regulates Ras in human cells; follow-up experiments verified this.
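
The “sequence search” step in the NF1 example, and the earlier idea of associating a newly identified gene with genes of known function, both rest on scoring how similar two sequences are. Below is a minimal sketch of local alignment scoring in the style of Smith and Waterman (1981), which the historical slides mention. The scoring values (match, mismatch, gap), the function names and the toy sequences are assumptions chosen for illustration; real searches use substitution matrices and heuristic tools, and this is not the method used in the NF1 study.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses a linear gap penalty; the scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query, annotated):
    """Return (name, score) of the annotated sequence most similar to query.

    `annotated` is a hypothetical dict mapping a gene name to its sequence.
    """
    return max(
        ((name, smith_waterman_score(query, seq)) for name, seq in annotated.items()),
        key=lambda hit: hit[1],
    )


if __name__ == "__main__":
    # Toy "database" of genes with known function (made-up sequences).
    known = {"geneA": "TTTTACGTACGTTTTT", "geneB": "GGGGGGGG"}
    print(best_hit("ACGTACGT", known))
```

A high-scoring hit only suggests a shared function; as in the NF1 case, the resulting hypothesis still has to be tested experimentally.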

Examples of “Computation for Biology”
• Computer-assisted drug design
• 3D structure models of G protein-coupled receptors were used to computationally screen 100,000+ compounds as possible drug candidates, and 100 were selected
• Follow-up experiments confirmed a high hit rate of 12%-21%
OM Becker, et al., PNAS, 2004, 101:11304-11309

Examples of “Computation for Biology”
• Computational studies reveal the functional mechanism of the GroEL heptamer
Courtesy of JP Ma’s lab

Examples of “Computation for Biology”
• Computational analysis of Plasmodium falciparum metabolism
  – Plasmodium causes human malaria
  – computational prediction of metabolic pathways of Plasmodium
  – computational simulations have helped to identify 216 “chokepoints” in this pathway model
  – among all 24 previously suggested drug targets, 21 target the “chokepoints”
  – the three popular drugs for malaria all target the “chokepoints”
Yeh I. et al., Genome Research 2004, 14:917-24

Computation for Biology
Though computation may not solve a biological problem directly, it can help quickly narrow down the search space
Searching for a needle in a haystack …

Change of Paradigm
• The human genome sequencing project has led to fundamental changes in how biological science is done!
  – It represents biology’s first foray into ‘big science’ – Science, editorial, 2003
• The coordinated efforts in “high-throughput” production of biological data beyond sequences have fueled the rapid transition of biology from “cottage industry science” to “big science”
  – functional genomic data
  – structural genomic data
  – proteomic data
  – haplotype mapping
  – metabolomic data
  – ……

Change of Paradigm
[Figure: cycle linking high-throughput data production, data mining and interpretation, models or hypotheses, domain expertise, rational experimental design, specific data generation]

Integrative Biology
• “Howdy, want to do biology together?”
[Figure label: “omic” data]

Examples of Integrated Computational and Experimental Biology
• Identification of disordered regions in proteins
• Experimental data suggest that some portions of some protein structures do not have rigid structures
• Computational studies suggested that “disorder” is a common phenomenon, particularly in signaling proteins, validated experimentally
• Computational prediction suggests that protein-protein interaction interfaces are often disordered

What Could Potentially be Done
[Figure: fermentation pathway diagram; labels include glucose-6-P, phosphoenolpyruvate, pyruvate, lactate, formate, acetyl-CoA, acetyl phosphate, acetaldehyde, acetate, ethanol, and the enzymes pflB, pdh, acs, adhE]
• Identify critical pathways in a fermentation pathway and its regulation through dynamic-kinetic simulations
• Suggest potential targets for genetic engineering through molecular dynamics simulation of enzymes
• Identify more effective subnetworks from other organisms (metagenome analysis) and replace the counterpart in our target organism through genetic engineering

Computation for Biology
• There are increasingly more successful examples of employing computational techniques
  to study (or help to study) complex biological problems, on many fronts of biological research
• We begin to see computational techniques with predictive capabilities that can help to generate new hypotheses and guide experimental designs

Biology for Computation
• What computational science is to molecular biology is like what mathematics has been to physics ......

Look into the Future
• Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems ....... duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history.
  – Temple Smith, Current Topics in Computational Molecular Biology

Take-Home Message
• The driving force of bioinformatics is biological data production through “high-throughput” technologies
• Computation is becoming increasingly indispensable in biological research
• Combination of high-throughput data generation and computation allows scientists to look at more complex biological problems at systems level

Homework
• Read “Primer on Molecular Genetics”
• Find one GOOD example of bioinformatics being used to help solve a challenging biological problem. Write a one-page essay explaining the biological problem, describing the role that computation has played in this particular solution, and discussing what would happen if no computational tool were available for this problem.

This note was uploaded on 06/16/2011 for the course BIO 127 taught by Professor Xuyin during the Spring '10 term at Georgetown.
