ferrin-infoviz



Informatics and Visualization Tools for Structural Genomics Research

Tom Ferrin, Ph.D.
Departments of Pharmaceutical Chemistry and Biopharmaceutical Sciences
Resource for Biocomputing, Visualization, and Informatics
University of California, San Francisco

UCB InfoViz seminar
April 14, 2004

Resource for Biocomputing, Visualization, and Informatics

The RBVI is an NIH/NCRR Biomedical Technology Research Center.

We create innovative computational and visualization-based data analysis methods and algorithms, turn these into easy-to-use software tools, and apply these tools to solve a wide range of genomic and molecular recognition problems within the complex sequence-structure-function triad.

Application areas
• Gene characterization and interpretation
• Drug design
• Variation in drug response due to genetic factors
• Protein engineering
• Biomaterials design
• Bioremediation
• Prediction of protein function from sequence and structure

“It’s sink or swim as a tidal wave of data approaches”
Tony Reichhardt, Nature 399:517-520, 10 June 1999
• Petabyte (1,000 terabytes)
• Exabyte (1,000 petabytes)
• Zettabyte (1,000 exabytes)
• Yottabyte (1,000 zettabytes)

“Many biologists are still in denial, never having faced the amount of information now pouring into databases such as Genbank and SwissProt… They haven’t really thought about how they’re going to use all this data...” Ibid.
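The storage units on the "tidal wave" slide each step up by a factor of 1,000. As a minimal sketch of that arithmetic (the function and variable names here are illustrative assumptions, not part of the talk), the ladder can be written out and used to express a raw byte count in the largest fitting unit:

```python
# Decimal storage units, each 1,000 times the previous one, matching the
# "Petabyte (1,000 terabytes)" style definitions on the slide.
# The names below are hypothetical and chosen only for this illustration.
DECIMAL_UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte",
]


def bytes_per_unit(unit):
    """Return how many bytes make up one of the named decimal units."""
    return 1000 ** DECIMAL_UNITS.index(unit)


def largest_unit(n_bytes):
    """Return (value, unit) using the largest unit that keeps value >= 1."""
    index = 0
    while index < len(DECIMAL_UNITS) - 1 and n_bytes >= 1000 ** (index + 1):
        index += 1
    return n_bytes / 1000 ** index, DECIMAL_UNITS[index]


if __name__ == "__main__":
    for unit in ("petabyte", "exabyte", "zettabyte", "yottabyte"):
        print(unit, "=", bytes_per_unit(unit), "bytes")
    print(largest_unit(3 * 10**18))
```

Integer powers of 1,000 are used throughout so the unit sizes stay exact even at the yottabyte scale.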
The Growing Gap in Functional Knowledge
[Figure: publications and DNA sequences plotted by year, 1975-1995, on a vertical axis running from 0 to 3,000,000; annotations mark "Rapid DNA sequencing invented" and "EST sequencing begun".]

Sample RBVI projects
• New methods for large-scale data collection, storage, analysis, and presentation for a polymorphism (SNP) genotyping project
• Extensible visualization tools for comparative studies of protein sequence, structure, and function

Fortune - Oct 30, 2000

Case Report #1: Michael Adams-Conroy
• Young child born to abusive mother, adopted at age 3, with signs of fetal alcohol syndrome, obsessive-compulsive disorder, Tourette’s syndrome, and attention-deficit hyperactivity disorder.
• Prescribed Prozac to help control emotional outbursts.
• Child dies suddenly; toxicology tests show massive overdose of Prozac.
• Adoptive parents investigated for homicide and their other two children put into protective custody.

Michael Adams-Conroy (continued)
• Sharp-eyed psychiatrist notices unusually high levels of other metabolites in toxicology report, indicating the child may have had an enzyme deficiency that prevented Prozac from being metabolized normally.
• Subsequent genetic testing showed the child had a defect in the 2D6 gene, which resulted in an abnormal form of the liver enzyme that metabolizes antidepressants.
• Adoptive parents exonerated.
Case Report #2
• Patient: 3-year-old boy
• Diagnosis: Acute Lymphoblastic Leukemia (ALL)
• Standard therapy: 6-mercaptopurine (6-MP)
• Result: Adverse Drug Reaction leading to acute bone marrow suppression

Normal Mechanism of Action
[Diagram: 6-MP is converted either to 6-thioguanine nucleotides (NTs) or, through methylation (CH3) by TPMT, to 6-methylmercaptopurine.]

Thiopurine methyltransferase (TPMT) genes are defective in 1:300 people
This leads to elevated levels of thioguanine nucleotides.
[Diagram: with TPMT blocked (X), 6-MP is no longer methylated to 6-methylmercaptopurine and 6-thioguanine NT levels rise (↑ ↑ ↑).]

People differ in their response to drugs
• No response
• Therapeutic response
• Adverse drug reaction (ADR)

Testing for TPMT genes is now available
Children with defective TPMT genes should receive a lower dose of 6-MP.

Adverse Drug Reactions
ADRs may kill 30,000 - 40,000 Americans each year and cause 2,200,000 serious nonfatal reactions.
JAMA 1998 June 3;279(21):1684

Drugs with known genetically-linked potential for fatal adverse reactions (partial list):

Drug (Brand Name)          Prescribed For              Adverse Reaction             Gene at Cause
Imipramine (Tofranil)      Depression, ATD             Heartbeat irregularity       CYP2D6
Isoniazid (Laniazid)       Tuberculosis                Liver toxicity               NAT2
Warfarin (Coumadin)        Prevention of blood clots   Internal bleeding            CYP2C9
5-fluorouracil (Adrucil)   Cancer                      Severe immune suppression    DPD
Clarithromycin (Biaxin)    Antibiotic                  Heartbeat irregularity       KCNE2
Azathioprine (Imuran)      Rheumatoid arthritis        Severe immune suppression    TPMT

Pharmacogenetics of Membrane Transporters
$12-million, 4-year NIH grant
• Kathleen Giacomini and Ira Herskowitz, co-PIs, plus ~20 other UCSF researchers
Major Project Goal:
• Understand the genetic basis for variation in response to drugs which interact with membrane transporters. This class of proteins is of great pharmacological importance, as it provides the target for about 30% of the most commonly used prescription drugs and is a major determinant of the absorption, distribution and elimination of many others.
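The drug-gene pairs listed under Adverse Drug Reactions amount to a small lookup table. The following is only a sketch of how such a table might be queried; the dictionary layout and the function name are assumptions made for illustration, and the entries are limited to the six pairs in that partial list.

```python
# Drug -> gene pairs copied from the partial list of drugs with known
# genetically-linked potential for fatal adverse reactions.
# The data structure and function name are hypothetical illustrations.
ADR_GENES = {
    "imipramine": "CYP2D6",
    "isoniazid": "NAT2",
    "warfarin": "CYP2C9",
    "5-fluorouracil": "DPD",
    "clarithromycin": "KCNE2",
    "azathioprine": "TPMT",
}


def genes_to_check(prescribed_drugs):
    """Return {drug: gene} for the prescribed drugs found in the partial list.

    Matching ignores case and surrounding whitespace; drugs that are not in
    the list are simply left out of the result.
    """
    flagged = {}
    for drug in prescribed_drugs:
        gene = ADR_GENES.get(drug.strip().lower())
        if gene is not None:
            flagged[drug] = gene
    return flagged


if __name__ == "__main__":
    print(genes_to_check(["Warfarin", "aspirin", "Azathioprine"]))
```

A drug that is missing from the result is not thereby shown to be safe; the table covers only the entries listed on the slide.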
PMT project goals - continued
• Determine the amount of genetic variation (single-nucleotide polymorphisms) in at least 40 transporter genes by examining the DNA from an ethnically diverse sample of 250 people.
• Test the performance of these transporter variants in cell cultures and determine, through clinical phenotype studies, if people with those variants respond differently to drugs in a clinically significant way.
• Provide access to the data from these studies to the general scientific community through the World Wide Web to facilitate collaborative research and to speed development of new drug treatments.

The Coriell Cell Collection
African American (AA) - 100
Caucasian (CA) - 100
Asian American (AS) - 30
Mexican American (ME) - 10
Pacific Islander (PA) - 7
TOTAL - 247

PMT Intranet Website
Used by ~100 researchers at UCSF
Effective data analysis and display driven by iterative design/refinement cycle, successful because the bioinformatics team works closely with the molecular biologists
• Jill Mesirov, Whitehead: “Bioinformatics needs to be tightly integrated with the scientific research, not a service function”
Flexibility key!
• Multiple ways to display same data
• Simple download mechanism for scientists who want to load raw data into Excel spreadsheets

PMT Scientist-Users Are a Demanding Bunch…

You’ve now seen DNA and AA sequences
What about structure?

Why is Structure Important?
Sequence → Structure → Function
• Current research areas:
  - Prediction of structure from sequence
  - Prediction of function from sequence and structure
  - Understanding evolutionary changes
  - Engineering proteins for specialized function
• Applications in pharmacogenomics …
  - Improvements in drug discovery and development process
  - Prediction of drug response
  - Avoidance of toxic side effects

Growth in Protein Structures

The Structural Genomics Initiatives
“The next step beyond the human genome project”
$150 million in NIH grants to establish 9 U.S.
centers
• Goals:
  - Speed the determination of three-dimensional atomic-scale maps of proteins - 35,000 structures by 2005
  - Identify all proteins expressed in an organism - “proteomics”

Center                                 Lead Institution          Target
NY Struct. Genomics Res. Consortium    Rockefeller Univ.         Bacteria/yeast/human
Northeast Struct. Gen. Consort.        Rutgers Univ.             Roundworm/fly/human
Southeast Collab. for Struct. Gen.     Univ. of Georgia          Bacteria/roundworm/human
Berkeley Struct. Genomic Center        Lawrence Berkeley Lab.    Bacteria
Joint Ctr. for Struc. Genomics         Scripps Research Inst.    Roundworm/human
TB Struct. Genomics Consortium         Los Alamos Nat. Lab.      Tuberculosis
Midwest Ctr. for Struct. Genomics      Argonne National Lab.     Archaea/bacteria/eukarya
Ctr. for Eukaryotic Struct. Genomics   Univ. of Wisconsin        Arabidopsis thaliana
Struct. Gen. of Pathogenic Protozoa    Univ. of Washington       Protozoans

See http://www.nigms.nih.gov/funding/psi.html for additional information

Stereo pairs ?

Visualizing 3D Structure: The Chimera Molecular Modeling System
Chimera is an extensible interactive 3-D modeling system designed to allow developers to quickly incorporate novel algorithms and analysis tools
• ~30 extensions written to date
• Extensions are written in the Python programming language
  - Easy to learn, even for novice programmers
  - Offers object-oriented language features
• Extensions can control standard user interface features (e.g.
camera, help, menus, toolbar) as well as their own custom interfaces

Sample Chimera Extension
Multalign Viewer
• simultaneous display of protein sequence and structure

Chimera Demo

Tools for Comparative Protein Studies
• MinRMS - exhaustive search for all plausible structural alignments of two proteins
• AlignPlot - interactive exploration of structural alignments
• MultAlign Viewer - integrates sequence and structure space
• Chimera - extensible 3-D molecular modeling system

Example Study
Structural comparison of glutamine synthetase (GS) and creatine kinase (CK)
• GS: 468 residues, PDB entry 2gls
• CK: 380 residues, PDB entry 1crk
• No significant sequence similarity, both have multimeric forms, proposed similar tertiary structures, and catalyze similar reactions

GS and CK catalysis
[Diagram: reaction schemes. GS: glutamate + ATP + NH3 → glutamine + ADP + Pi. CK: creatine + MgATP → phosphocreatine + MgADP + H+.]

[Figures: Glutamine synthetase and creatine kinase; After MinRMS alignment (glutamine synthetase, creatine kinase); Chimera’s GUI; AlignPlot GUI; Resulting structure-based sequence alignment]

Center for Computational Proteomics Research
INPUT
• All known protein structures
• All known protein sequences
• Lists of small ligands
• All known protein interactions
OUTPUT
For as many proteins as possible:
• Binding sites for ligands and proteins
• Binding affinities
• Geometries of complexes
INTERFACE
• Database
• Graphical User Interface
• Web server
• Cross-links
• Collaborations
• Service
• Education
IMPACT
• Structural genomics
• Functional annotation
• Macromolecular assemblies
• Cellular networks
• Comparative proteomics
• Pharmacogenetics
• Drug discovery

Genome-Wide Mapping of Protein Interactions
[Diagram: flow chart with two pipelines, the Ligand-Protein Docking Pipeline and the Protein-Protein Docking Pipeline (labeled D.a and D.b). Inputs: all known protein structures, all known protein sequences, lists of small ligands, all known protein interactions. Box labels: comparative modeling; refine protein models; identify ligand binding sites on models; virtual ligand libraries; identify protein binding sites on models; annotated protein structure models; build ligand-protein complexes; rescore ligand-protein complexes; build protein-protein complexes; specificity modeling of protein interactions; central database; graphical user interface. Under "Computer Science and Software Engineering": global optimization; software backplane; information navigation and search; cluster hardware and software environments. Under "Testing and Applications": testing; computational applications. The boxes carry task numbers D.1 through D.20 and investigator names (Babbitt, Baker, Dill, Ferrin, Hearst, Jacobson, Kortemme, Rosen, Sali, Shoichet, plus IBM and Intel); the pairing of numbers and names with boxes is not recoverable from this text.]

Summary
• We are in the midst of a profound and exciting new era in bioinformatics and computational biology
• The data made available by the various genome and structural genomics projects will occupy researchers for decades to come
• High performance computing and the internet play a critical role in the navigation, analysis, and dissemination of this data and the resulting scientific knowledge
• The tremendous volume of data makes for a critical need for tools and techniques that make information navigation easy
• The potential impact on drug development and treatment of human disease is enormous

Acknowledgements
Collaborators & Staff
• Dr. Conrad Huang, Dr. Elaine Meng, Prof. Patricia Babbitt, Prof. Kathy Giacomini, Greg Couch, Eric Pettersen, Al Conde, Tom Goddard, Susan Johns, Doug Stryke, Michiko Kawamoto
NIH National Center for Research Resources
• P41-RR01081
National Institute of General Medical Sciences
• GM61390

Additional information
RBVI: www.rbvi.ucsf.edu
PMT project: www.pharmacogenetics.ucsf.edu
Chimera: www.cgl.ucsf.edu/chimera
CCPR: www.computationalproteomics.org
...

This note was uploaded on 02/13/2012 for the course CS 91.510 taught by Professor Staff during the Fall '09 term at UMass Lowell.
