Genomics II – Functional Genomics, Proteomics, and Bioinformatics
Introduction

Functional genomics strives to understand gene function in a species. The study of proteins is called proteomics. The entire collection of proteins in a given species is called the proteome.

Functional genomics and proteomics can broadly be categorized as being experimental and computational. Bioinformatics attempts to extract information within genetic sequences using a mathematical approach.

Functional Genomics

Expressed genes can be identified in a cDNA library.
Margin note: Make DNA from mRNA; if mRNA is present, the gene is turned on.

cDNA libraries are made using RNA as the starting material. This is also called an expressed sequence tag (EST) library because the sequences can be used as markers in physical mapping.

cDNA libraries can be used to study gene regulation at the genomic level. mRNAs are isolated under different conditions. mRNAs that are found only under one set of conditions indicate genes that are activated under those conditions.

A subtractive cDNA library (also called subtractive hybridization) is shown in Figure 21.1. This procedure can be used to isolate genes that are active under a certain set of conditions.

Margin notes (subtractive cDNA library):
- What can we see at the cellular level if you take a hormone?
- Take cells and culture them in the presence of the substance and in the absence of the substance.
- The hormone may be an effector that is causing activation of these genes.
- If we use reverse transcriptase, we can create cDNA of all genes expressed, including the hormone-induced ones.
- Denature the cDNA from both sets of cells and run them together in the same tube.
- Make a column and run the cDNAs (complementary DNA attaches to the beads in the column).
- Elute the cDNAs that are not complementary.
- Clone into vectors, and we can see which genes are expressed under those conditions.

A microarray can identify genes that are transcribed.

DNA microarrays, or gene chips, make it possible to monitor the expression of thousands of genes simultaneously.

A microarray is a microscope slide that is dotted with many different sequences of DNA. The location of each gene is recorded.

The microarray can be used as a hybridization tool (Figure 21.2) to identify genes that are related to the probe being used.

Microarrays have found a wide variety of applications in the study of functional genomics (Table 21.1).

Margin notes (microarray):
- Make complementary fluorescently labeled DNA (making a microarray from our cDNA).
- We know what gene is in each well.
- Add fluorescent cDNA to each well. The array is washed after exposure, so if fluorescence stays bound to the DNA in a well, that particular gene was expressed under those conditions.

Experiment 21A. The coordinate regulation of many genes is revealed by a DNA microarray analysis.

Margin notes (Experiment 21A):
- Take some yeast cells and grow them in 2% glucose for up to 21 hours.
- Start taking samples at 9 hours and every 2 hours after that; isolate mRNA.
- Follow expression through time, from the time when the cells first get hit with glucose.

Cells respond to their environment via the coordinated regulation of genes. One of the earliest uses of microarrays was the study of the yeast Saccharomyces cerevisiae. S. cerevisiae has approximately 6,300 genes. This organism has the ability to metabolize carbon sources using different metabolic pathways. When yeast cells have glucose available, they metabolize the glucose to smaller products during glycolysis. If oxygen is present, these products can be broken down via the tricarboxylic acid (TCA) cycle. The process of switching from glycolysis to the TCA cycle, called a diauxic shift, involves major changes in the expression of genes involved with carbohydrate metabolism.

The goal. The goal of the experiment was to identify genes that are induced and repressed as yeast cells shift from glycolysis to the TCA cycle.

Achieving the goal (Figure 21.3).
- Inoculate yeast cells into media containing 2% glucose and grow for up to 21 hours.
- Beginning at 9 hours after inoculation, take out samples of cells every 2 hours and isolate mRNA.
- Add reverse transcriptase, poly-dT primers, and fluorescently labeled nucleotides to make complementary strands of fluorescently labeled cDNA.
Note: The sample at 9 hours was used to make green cDNA, while mRNA samples collected later were used to make red cDNA.

Margin notes:
- Goal: to see which genes are turned on at any particular time.
- Turned on early = green; turned on later = red; turned on early and later = yellow; not expressed = clear.
- This is about mixing light, not paint: red + green = yellow.

- At each time point, mix together the cDNA from that time (i.e., red cDNA) with cDNA from the 9-hour time point (i.e., green cDNA).
- Hybridize the mixture to the yeast DNA microarray.
- Examine the DNA microarray with a laser scanner and analyze the data by computer to determine expression levels among different genes (cluster analysis).
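The cluster analysis step named above can be illustrated with a minimal sketch: genes whose expression profiles over the time course correlate strongly are placed in the same group. The gene names, the log2(red/green) values, the 0.9 threshold, and the greedy grouping rule are all assumptions made for illustration; this is not the procedure used in the experiment.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy grouping (an illustrative simplification): a gene joins the
    first cluster whose first member it correlates with at or above the
    threshold; otherwise it starts a new cluster."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            seed_profile = profiles[cluster[0]]
            if pearson(profile, seed_profile) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

# Hypothetical log2(red/green) ratios at 9, 11, 13, 15, 17, 19, and 21 hours.
profiles = {
    "geneA": [0.0, 0.1, 0.2, 1.5, 2.8, 3.0, 3.1],        # induced
    "geneB": [0.0, 0.0, 0.3, 1.2, 2.5, 2.9, 3.0],        # induced
    "geneC": [0.0, -0.1, -0.3, -1.4, -2.6, -2.9, -3.0],  # repressed
    "geneD": [0.0, -0.2, -0.2, -1.6, -2.4, -3.1, -3.0],  # repressed
}

clusters = cluster_by_correlation(profiles)
print(clusters)
```

With these made-up values, the two induced genes fall into one group and the two repressed genes into another, which mirrors the idea that genes with a common expression pattern may share regulation.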
Margin notes (cluster analysis):
- Cluster analysis groups genes by color intensity and by when the switch happens.
- As time increases, more red cDNA is made, which is why the slopes go up at 15-17 hours for the genes on the left; the genes on the right are expressed closer to 19 hours.
- The genes on the left are controlled by the same transcription factors: a group of genes regulated concurrently, related to a common metabolic response to the induction of the TCA cycle.
- The genes on the right are another group regulated concurrently.
- For repressed genes: on the left, genes are turned off at the junction between glycolysis and the TCA cycle (diauxic shift); on the right, genes are turned off after the TCA cycle.

The data and interpreting the data.

In the microarray, green spots are genes expressed early in growth, while red spots are genes expressed later. Yellow spots are genes that are expressed more evenly, and spots that are barely visible indicate genes that are not substantially expressed under these growth conditions.

A key component of the analysis of the results is a procedure called cluster analysis, which involves identifying genes whose patterns of expression strongly correlate with one another (Figure 21.3a, b, c, d). The results, when cluster analysis is applied, suggest that certain genes have common regulation and metabolic function. The results clearly indicated that some genes are repressed during this process, while others are induced.

DNA microarrays can be used to identify DNA-protein binding at the genome level.

Chromatin immunoprecipitation (ChIP) can determine whether proteins can bind to a particular region of DNA (Figure 21.4). ChIP analyzes DNA-protein interactions as they occur in the chromatin of living cells, whereas gel retardation and DNaseI footprinting are in vitro techniques.
Margin notes (ChIP):
- If a site is turned on, the crosslinker links the protein to it and keeps the protein attached; if it is not turned on, there is nothing to crosslink, so the DNA is "free."
- We are interested in a particular protein, so we add an antibody that matches that protein; it attaches to the protein we are looking for.
- We can isolate the DNA that has the protein of interest attached to it; all other DNA doesn't precipitate.
- Break the covalent crosslinks, then do PCR and amplify; if PCR amplifies, the protein was bound to the DNA site in living cells.

ChIP uses a chemical crosslinking agent such as formaldehyde to covalently link proteins to their associated DNA.
- Cells are then lysed and the DNA is broken into pieces of approximately 200 to 1,000 bp by sonication.
- Antibodies to the protein of interest are used to selectively precipitate the protein-DNA complex.
- The sample is centrifuged to collect the DNA-protein complexes in a pellet.
- The covalent crosslinks are broken.

If researchers already suspect that a protein binds to a known DNA region, the precipitated DNA fragments are amplified by PCR using primers that flank that region of the DNA. If PCR amplifies the DNA, the protein must have been bound to this DNA site in living cells.

To identify all genomic binding targets of a protein, the precipitated DNA fragments are amplified, and all amplified DNA is fluorescently labeled and hybridized to a microarray for identification. This is known as a ChIP-on-chip assay.
Margin notes (ChIP-on-chip):
- Is there more than one stretch of DNA like this?
- All precipitated DNA fragments are amplified; on the ends of the DNA we can ligate short pieces of DNA called linkers (we make them, so we know what the sequence is).
- Go to the microarray and see whether one gene or more than one gene lights up.

Proteomics

General information

Proteomics is the study of the functional role of proteins in an organism. The proteome is the sum of all of the proteins in an organism. The study of genomics represents only the first step in understanding the proteome.

Homology between the genes of different species can be used to predict protein function and/or structure, but it may not provide information on regulation and protein-protein interactions.

The proteome is much larger than the genome because of:
- alternative splicing
- RNA editing
- posttranslational covalent modification

Two-dimensional gel electrophoresis is used to separate a mixture of cellular proteins.
Any given cell in a eukaryotic organism produces only a subset of the proteins in the proteome. This subset is dependent on the type of cell, the stage of development, and the environmental conditions.

Two-dimensional gel electrophoresis is frequently used to separate and identify proteins (Figure 21.5). The proteins are first separated by their net charge. Proteins migrate to a point in the gel where their net charge is zero. This is called isoelectric focusing. The proteins are then separated by their molecular mass.

Margin notes (isoelectric focusing):
- Proteins won't move in an electric field if they are neutral.
- Net charge is based on the side groups of the amino acids.
- Isoelectric focusing uses a pH gradient.

Mass spectrometry is used to identify proteins.

After running a two-dimensional gel, it is often necessary to correlate a given spot on the gel with a particular protein. The spot can be cut out and the protein identified. Mass spectrometry can be used to determine the amino acid sequence of a protein.

Tandem mass spectrometry uses two spectrometers. The first measures the mass of the peptide. The peptide is then digested into smaller fragments and analyzed by a second spectrometer (Figure 21.6). The differences in mass among the fragments measured by the second spectrometer can be used to determine the amino acid sequence.

Protein microarrays can be used to study protein expression and function.

Margin note (recall DNA microarrays): Isolate mRNA, make cDNA out of it, and see what it sticks to; if a gene is turned on, it is making mRNA.

Protein microarrays are prepared in a similar manner to DNA microarrays, except that the individual spots on the slide contain proteins rather than DNA.
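Returning briefly to the tandem mass spectrometry step in the preceding section, the idea of reading a sequence from mass differences can be sketched as follows. This is an idealized illustration: the fragment "ladder" is generated by the code itself rather than taken from real data, the ladder starts at an arbitrary offset of zero, and the table holds only a few residues. Leucine/isoleucine and lysine/glutamine are left out because their masses are identical or nearly so and cannot be separated this simply. The function names are hypothetical.

```python
# Monoisotopic residue masses (daltons) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "R": 156.10111,
}

def make_ladder(peptide, start=0.0):
    """Build an idealized series of fragment masses, each one residue longer."""
    masses = [start]
    for residue in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[residue])
    return masses

def read_sequence(ladder, tolerance=0.01):
    """Assign a residue to each difference between consecutive fragment masses.

    A difference that matches no residue, or more than one, is reported as '?'.
    """
    residues = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        diff = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - diff) <= tolerance]
        residues.append(matches[0] if len(matches) == 1 else "?")
    return "".join(residues)

ladder = make_ladder("GASP")
sequence = read_sequence(ladder)
print(sequence)
```

Real spectra contain noise, missing peaks, and several ion series, so practical software is considerably more involved than this sketch.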
Con: The construction of protein microarrays is technically more difficult due to the need to preserve the three-dimensional structure of a protein. Table 21.2 lists some of the applications of protein microarrays.

Margin note: Proteins will change structure when you isolate them and take them out of the living structure.

The two common types of protein microarray analysis are antibody microarrays and functional protein microarrays.

Antibody microarrays aid in the study of protein expression. Antibodies are spotted onto a microarray. Cellular proteins are then isolated, labeled, and exposed to the microarray. A cellular protein will then be bound by its antibody on the microarray.

Margin notes (antibody microarrays):
- We can have a whole bank of antibodies that represents the proteome of the organism, but they are antibodies to the proteins rather than the proteins themselves.
- Put the antibodies on the microarray, then isolate and label the cellular proteins.
- Expose the microarray to the labeled (radiolabeled) proteins. If the antibody for a particular protein is in that well, they bind together, and a well that lights up means the protein reacted with the antibody.

Functional protein microarrays involve isolating cellular proteins and spotting them onto the microarray. This microarray can be analyzed with regard to specific kinds of protein function. For example, proteins in a microarray can be assessed against a group of protein kinases, which phosphorylate other cellular proteins. The array is exposed to each kinase in the presence of radiolabeled ATP. By following the incorporation of phosphate into the array, the specificity of each kinase can be determined.

Margin notes (functional protein microarrays):
- The aim is to get as many proteins as possible from the proteome we are trying to study; the array can be used to learn about function.
- In a particular spot, was radiolabeled phosphate from ATP taken up? If so, the protein kinase was active in enzymatically adding it to that particular protein.
- We can look at different kinases and see which proteins each one phosphorylated. What do those proteins have in common structurally or functionally?

Bioinformatics

Sequence files are analyzed by computer programs.

A computer program is a defined series of operations that can analyze data in a desired way. Computer data files store the data. These files may contain DNA, RNA, or amino acid sequences.

A series of computational operations is called an algorithm. Algorithms can be designed by scientists to answer specific questions about the data (Figure 21.7). These questions include the following:
- Does a sequence contain a gene?
- Where are functional sequences such as promoters, regulatory sites, and splice sites located within a particular gene?
- Does a sequence encode a polypeptide? If so, what is its amino acid sequence?
- Does the sequence predict certain structural features for DNA, RNA, or proteins?
- Is a sequence homologous to any other known sequences?
- What is the evolutionary relationship between two or more genetic sequences?
The scientific community has collected sequence files and stored them in large databases.

A database is a collection of a large number of computer data files. These files are annotated; they contain descriptions of the contents of the file. Some examples of major genetic databases that contain genetic information from many different species are provided in Table 21.3.

Genome databases focus on the genetic characteristics of a single species. The goal is to organize the information from sequencing and mapping projects for a single species.

Margin note: A genome database is for a particular species.

Different computational strategies can identify functional genetic sequences (Table 21.4).
Computer programs can be used to scan very long sequences of genetic information and locate meaningful features within the sequences. These programs use three general types of strategies.
- Locate specialized sequences within a very long sequence (sequence recognition). The program has the information that a specific sequence of symbols has a specialized meaning or function (e.g., promoters, start codons, stop codons). A specialized sequence with a particular meaning or function is called a sequence element or motif.
- Locate an organization of sequences, such as an organization of sequence elements (e.g., a start codon followed by one of the stop codons).
- Locate a pattern of sequences. Pattern recognition looks for patterns in the symbols (e.g., palindromes) and not a specific sequence.

Margin notes:
- We are looking for things in a long string of nucleotides that will tell us something.
- Look for sequences that, when put together, tell us something: promoter sequences, start, stop, and splice sequences tell us it's a structural gene.
- Look for some particular pattern (start, stop, RFLP, palindromes).

Several computer-based approaches can identify structural genes within a nucleotide sequence.

Computer programs can employ different strategies to locate genes.
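Two of the scanning strategies listed above, sequence recognition and pattern recognition, can be sketched in a few lines. The example sequence and function names are hypothetical, and a DNA "palindrome" is taken here to mean a stretch that equals its own reverse complement, which is one common reading of the term rather than something the outline specifies.

```python
def find_motif(sequence, motif):
    """Sequence recognition: 0-based start positions of every occurrence
    of a known motif, including overlapping occurrences."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the complementary strand read in the opposite direction."""
    return sequence.translate(COMPLEMENT)[::-1]

def find_palindromes(sequence, length=6):
    """Pattern recognition: report every window of the given length that
    equals its own reverse complement, whatever its actual bases are."""
    hits = []
    for i in range(len(sequence) - length + 1):
        window = sequence[i:i + length]
        if window == reverse_complement(window):
            hits.append((i, window))
    return hits

dna = "CCATGAATTCGGTAA"  # made-up sequence for illustration
start_codons = find_motif(dna, "ATG")
palindromes = find_palindromes(dna)
print(start_codons, palindromes)
```

The first function needs to be told which symbols to look for, while the second finds any site with the right symmetry, which is the distinction the list above draws between a specific sequence and a pattern.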
Search by signal approaches rely on known sequences such as promoters, start and stop codons, and splice sites to determine whether a DNA sequence contains a structural gene.

Margin note: Search by signal relies on known sequences.

Search by content strategies identify sequences with a nucleotide content that differs significantly from a random distribution. Most organisms display a codon bias, meaning that some codons are used more often than others.

Margin note: Look for sequences that code for particular amino acids. Of all the codons, take those for one particular amino acid (say, 4 of them) and see whether they occur in equal proportions throughout the genome.

Another approach is to examine translational reading frames. In a new sequence, the reading frame may begin with the first, second, or third nucleotide. An open reading frame (ORF) is a region of a nucleotide sequence that does not contain any stop codons. In prokaryotes, long ORFs are contained within the chromosomal gene sequences. In eukaryotes, ORFs may be interrupted by introns. One way around this is to clone and sequence cDNA. An alternative method is to use a computer program to translate the DNA sequence in all three reading frames (Figure 21.8).

Margin notes (reading frames):
- Start at different places; if you get a lot of stop codons, you probably didn't start at the beginning.
- Problem (introns): take that sequence, remove the introns, make mRNA, make cDNA.

Computer programs can identify homologous sequences.
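As a sketch of the reading-frame approach described in the previous section, the code below reads a sequence in each of the three frames and reports the longest stretch of codons that contains no stop codon. The sequence is invented, the function names are hypothetical, and only one strand is examined. Translating the codons into amino acids would also need a codon table, which is omitted here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons_in_frame(sequence, frame):
    """Split the sequence into complete codons starting at offset 0, 1, or 2."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]

def longest_orf(sequence, frame):
    """Longest run of consecutive non-stop codons in the given frame."""
    best, current = [], []
    for codon in codons_in_frame(sequence, frame):
        if codon in STOP_CODONS:
            current = []            # a stop codon ends the current run
        else:
            current.append(codon)
            if len(current) > len(best):
                best = list(current)
    return best

dna = "ATGCTAACTGATAGG"  # made-up sequence for illustration
orf_lengths = {frame: len(longest_orf(dna, frame)) for frame in range(3)}
print(orf_lengths)
```

In this toy case the first frame is free of stop codons while the other two are interrupted, which matches the margin note's rule of thumb that a frame with many stop codons is probably not the right one.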
Sequence information may be used to study evolutionary relationships.

Margin notes:
- The first sequence examined for this purpose was the small subunit rRNA, used to make a phylogenetic tree.
- lacY in E. coli and K. pneumoniae is 78% the same in sequence.

Homologous genes are similar genes that are derived from the same ancestral gene. An example is shown in Figure 21.9.

Homologous genes that are found in different species are called orthologs.

Margin note: The gene was the same at speciation; when the lineage broke into two species, the copies obtained differences over time through mutation.

Two or more homologous genes that are found in the same organism are called paralogous genes or paralogs.

Margin notes:
- Paralogous: compare genes within the same species (going back to some common ancestor).
- Species mentioned for comparison: humans, chimps, gorillas.

A gene family consists of two or more copies of homologous genes within the genome of a single organism.

Homology implies a common ancestry, whereas similarity means only that two sequences are similar to one another. For example, many nonhomologous bacterial genes contain similar promoter sequences at the -35 and -10 regions.

A simple dot matrix can compare the degree of similarity between two sequences.

A matrix can be used to determine the degree of similarity between two sequences. This method is shown in Figure 21.10.

Computer programs can align several genetic sequences in what is called a multiple sequence alignment. An example using the globin family members is shown in Figure 21.11.

A database can be searched to identify homologous sequences.

There is a strong correlation between homology and function. The ability of computer programs to identify homology between genetic sequences provides a powerful tool for predicting the function of genetic sequences.

The BLAST program (basic local alignment search tool) starts with a particular genetic sequence and then locates homologous sequences within the database. An example is provided in Table 21.5.
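The dot matrix comparison mentioned above can be sketched as a grid with one sequence along the rows and the other along the columns, with a mark wherever the two symbols are identical. The two short sequences and the function names are made up for illustration, and this is not how BLAST or a multiple sequence alignment program works internally.

```python
def dot_matrix(seq_a, seq_b):
    """Rows follow seq_a, columns follow seq_b; True marks identical symbols."""
    return [[a == b for b in seq_b] for a in seq_a]

def render(matrix, seq_a, seq_b):
    """Draw the matrix as text, with '*' for a match and '.' for a mismatch."""
    lines = ["  " + " ".join(seq_b)]
    for symbol, row in zip(seq_a, matrix):
        lines.append(symbol + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)

def longest_diagonal(matrix):
    """Length of the longest unbroken run of matches along any diagonal."""
    rows, cols = len(matrix), len(matrix[0])
    run = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j]:
                previous = run[i - 1][j - 1] if i > 0 and j > 0 else 0
                run[i][j] = previous + 1
                best = max(best, run[i][j])
    return best

seq_a, seq_b = "GATTACA", "GATCACA"  # made-up sequences
matrix = dot_matrix(seq_a, seq_b)
print(render(matrix, seq_a, seq_b))
diagonal = longest_diagonal(matrix)
print(diagonal)
```

Regions of similarity show up as diagonal lines of marks; here the main diagonal is broken at the one position where the two sequences differ.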
Genetic sequences can be used to predict the structure of RNA (Figure 21.12) and proteins.

The function of macromolecules such as DNA, RNA, and proteins relies on their structure. The structure is determined by the sequence of their building blocks. An analysis of sequences can help identify structures in these molecules.

A computer methodology known as a neural network has been applied to protein secondary structure predictions. A computer neural network is a large number of calculation units organized into interconnected layers; this structure is reminiscent of the organization of neurons in the brain.

Please see the Conceptual and Experimental Summary for Chapter 21 on pages 594-595.
This lecture outline was prepared from Genetics: Analysis and Principles, by Brooker, 2009 (3rd edition). It contains phrases and entire sentences taken verbatim from that source, and is in no way meant to represent original work by Mark Bierner.