4. genomes - Genes and Genomes


Genes and Genomes

I. The human genome
   A. how few genes it contains
   B. how big the genes are but how little is protein encoding
   C. how many genes have homologs in other organisms
II. How different are we from one another?
III. Sequencing the human genome
   A. automated DNA sequencing
   B. shotgun cloning and computational alignment
IV. How computer programs identify genes
   a. ORFs
   b. comparative genomics

What have we learned from genome sequencing?
Figure 4-10 Molecular Biology of the Cell (© Garland Science 2008)

Comparing genomes: How many more genes does it take to make a human than a fly?
2008 estimate = 20,500

Average gene size in the human genome is huge
But in all organisms, average protein size = 300-500 aa
What is responsible for the increase in gene size? Are human proteins bigger?
Figure 4-15 Molecular Biology of the Cell (© Garland Science 2008)

Gene organization: bacterial and archaeal genomes are densely packed with genes on both strands
In bacteria, functionally related genes are often clustered and co-transcribed in a polycistronic transcript, e.g., the Trp and Lac operons. This simplifies regulation.

Gene density is low in the human genome
Intergenic distance is greatly increased.
Human regulatory elements can be spread out over tens of thousands of base pairs.
Only 1-2% of the human genome codes for protein.
Half the genome is repetitive DNA sequences.

Our common genetic heritage: basic cellular processes are carried out by roughly the same number of genes in E. coli and yeast

Specific classes of genes expand in the step from prokaryotes to eukaryotes
MCB Fifth edition; Lodish et al.
What class of genes would you expect to expand in the step from unicellular to multicellular animals?
2006 estimate = 25,000

Is your genome my genome?
• Humans are 99.9% identical if you look just at single-nucleotide sequence: 1 in every 1000 bases will differ from the generic genome
• Genome sequencing missed the largest variations: copy number
   1. Structural differences may encompass 4x as much variation
   2. Multiple copies of genes or deletion of genes
Studies identify 21,000 copy number variations > 1000 bp, 17,000 of 100-1000 bp in length, and 499 inversions
Sci News (2009) 175:16

Where is the gene? Computational Methods.
GGATCCACGAAAATGATGTGAATGAATACATGAAAGATTCATGAGATCTGACAACAT
GGTAGACGTGTGTGTCTCATGGAAATTGATGCAGTTGAAGACATGTGCGTCACGAA
AAAAGAAATCAATCCTACACAGGGCTTAAGGGCAAATGTATTCATGTGTGTCACGAA
AAGTGATGTAACTAAATACACGATTACCATGGAAATTAACGTACCTTTTTTGTGCGTG
TATTGAAATATTATGACATATTACAGAAAGGGTTCGCAAGTCCTGTTTCTATGCCTTTC
TCTTAGTAATTCACGAAATAAACCTATGGTTTACGAAATGATCCACGAAAATCATGTT
ATTATTTACATCAACATATCGCGAAAATTCATGTCATGTTCACATTAACATCATTGCAG
AGCAACAATTCATTTTCATAGAGAAATTTGCTACTATCACTCATTAGTACTACCATTGG
TACCTACTACTTTGAATTGTACTACCGCTGGGCGTTATTAGGTGTGAAACCACGAAA
AGTTCAACATAACTTCGAATAAAGTCGCGGAAGAAAGTAAACAGCTATTGCTACTCA
AATGAGGTTTGCAGAAGCTTGTTGAAGCATGATGAAGCGTTCTAAACGCACTATTCA
TCATTAAATATTTAAAGCTCATAAAATTGTATTCAATTCCTATTCTAAATGGCTTTTATT
TCTATTACAACTATTAGCTCTAAATCCATATCCTCATAAGCAGCAATCAATTCCATCTAT
ACTTTAAAATGCTTTCTGAAAACACGACTATTCTGATGGCTAACGGTGAAATTAAAG
ACATCGCAAACGTCACGGCTAACTCTTACGTTATGTGCGCAGATGGCTCCGCTGCCC
GCGTCATAAATGTCACACAGGGCTATCAGAAAATCTATAATATACAGCAAAAAACCA
AACACAGAGCTTTTGAAGGTGAACCTGGTAGGTTAGATCCCAGGCGTAGAACAGTT
TATCAGCGTCTTGCATTACAATGTACTGCAGGTCATAAATTGTCAGTCAGGGTCCCTA
CCAAACCACTGTTGGAAAAAAGTGGTAGAAATGCCACCAAATATAAAGTGAGATGG
AGAAATCTGCAGCAATGTCAGACGCTTGATGGTAGGATAATAATAATTCGTGCAACG

Identification of ORFs: how will a eukaryotic ORF differ from a prokaryotic ORF?
(Figure key: stop codon; ORF)

In eukaryotes, ORFs are harder to identify

Comparative genomics = most powerful tool for identifying exons
(Figure labels: point mutations; deletions or insertions)
What do you notice about the distribution of nucleotide differences between the aligned genes? Why?
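The ORF scan raised on the "Where is the gene?" and "Identification of ORFs" slides can be sketched in a few lines. This is only a minimal illustration, not the method used in the course: the function name `find_orfs` and the `min_codons` parameter are hypothetical, only the three forward-strand reading frames are scanned, and the sequence is treated as intron-free (prokaryote-style). That last assumption is exactly why a scan like this tends to work poorly on eukaryotic genes, where introns interrupt the coding sequence.

```python
# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop run on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons counts the ATG but not the stop codon.
    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: ATG AAA TTT TAG sits in reading frame 2.
    print(find_orfs("CCATGAAATTTTAGGG"))
```

Running the same function on the slide's sequence with a larger `min_codons` would filter out the many short ATG-to-stop runs that occur by chance; a real gene finder would also scan the reverse complement.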
All parts of the genome are subject to mutation, but not all parts are subject to natural selection
Sequences that do not encode protein (or do not function in gene expression) are not under selective pressure and diverge more rapidly.
Comparing homologous genes, exons stand out as islands of conservation in a sea of random sequence.

Other signatures of genes?
GU
(Figure labels: point mutations; deletions or insertions)

What about all the non-coding DNA? Is it inert? Can we simply disregard it?
We are increasingly being forced to pay attention to non-gene DNA sequences
• conserved, but not coding
• not coding, but transcribed into RNA
• mutations that are associated with various human diseases map in these

Mobile Genetic Elements Remodel the Genome

I. DNA-only transposons
II. Retroviral-like retrotransposons
   a. Yeast Ty elements look like integrated retroviruses
   b. mechanism of movement
III. Nonviral retrotransposons (polyA-retrotransposons)
   a. LINES (long interspersed nuclear element)
   b. SINES
   c. mechanism of movement
IV. fgf4 retrogene

Mobile genetic elements are DNA sequences that can move from one place in the genome to another

DNA-only transposable elements
Nondefective elements encode transposase, an enzyme required for the DNA recombination events that result in movement.

DNA-only transposons move by a cut and paste mechanism
Double-strand break repair
The IS sequences at the ends of transposable elements are necessary and sufficient for the element to move. Transposase recognizes and cleaves the DNA at these sites.

In eukaryotic cells, including humans, most mobile elements are retrotransposons

Retrotransposons are retroviruses sans capsid
(Figure labels: viral integrase; short direct repeats; Gag gene; Env gene; Pol gene)

Comparing retroviral genomes to retrotransposons
(Figure labels: HIV; LTR; RT, integrase; Yeast Ty element)
Reprinted from Molecular Biology of the Cell, Alberts et al.
Transposases and retroviral and retrotransposon integrases are all related proteins with a common catalytic domain
Double-strand break repair
The DNA binding domains of the 3 proteins are different, so each enzyme only catalyzes recombination on its own element.

Non-retroviral retrotransposons dominate the human genome
LINES and SINES account for 34% of the human genome.
LINE = long interspersed nuclear element. SINE = short interspersed nuclear element.

Non-retroviral retrotransposons move via RNA intermediates

General structure of the L1 LINE element
L1 elements are 6-7 kb in length
600,000 copies of this element in the human genome = 15% of genome
• ORF1 encodes an RNA binding protein
• ORF2 encodes a protein with both RT and endonuclease activity
• 90% of L1 elements are missing variable amounts of sequence from the left end of the element.
• Most L1s are immobile.

L1 transposition requires an endonuclease and an RT

Chondrodysplasia in short-legged dogs is linked to chromosome 18
Science (2009) 325: 995-7

The new FGF4 gene is inserted ~30 Mb from the native locus. It lacks its native promoter and introns, but encodes a polyA tail. How did this novel gene originate?
H G Parker et al. Science 2009;325:995-998. Published by AAAS

What if ORF1/2 proteins bound to a cellular RNA?
What characteristics would distinguish a retrogene from the original gene?

Experimental setup to test if yeast Ty elements move via an RNA intermediate
(Figure key: marked segment = bacterial DNA sequence not found in other Ty elements)

Southern blot technique
(Figure; recoverable step labels: 1. + EcoRI; probe = ss DNA)

Southern blot using a probe that hybridizes to the red line sequence not found in other Ty elements in the genome
Why can't you see any bands in the genomic DNA of colonies grown on glucose, and lots of bands when cells are grown on galactose? ...
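As a back-of-envelope check on the L1 figures quoted in these notes (600,000 copies, 15% of the genome, 6-7 kb full length), the average copy length can be computed directly. The genome size of about 3.2 billion base pairs is an assumption that does not appear in the notes, and the function name is hypothetical. Under that assumption the average copy comes out far shorter than a full-length element, which is consistent with the statement that 90% of L1 elements are truncated.

```python
GENOME_BP = 3_200_000_000  # assumption: approximate haploid human genome size (not from the notes)
L1_COPIES = 600_000        # from the notes
L1_FRACTION = 0.15         # from the notes: L1 = 15% of the genome
FULL_LENGTH_BP = 6_000     # from the notes: lower end of the 6-7 kb range


def mean_copy_length(genome_bp, fraction, copies):
    """Average length in bp of one copy, given the genome fraction it occupies."""
    return genome_bp * fraction / copies


if __name__ == "__main__":
    mean_bp = mean_copy_length(GENOME_BP, L1_FRACTION, L1_COPIES)
    print(f"Mean L1 copy length: about {mean_bp:.0f} bp")
    # If every copy were full length, L1 alone would be larger than the genome.
    print(f"All copies full length: {L1_COPIES * FULL_LENGTH_BP:,} bp")
```

With these inputs the mean works out to roughly 800 bp per copy, while 600,000 full-length copies would total 3.6 billion bp, more than the assumed genome size.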

This note was uploaded on 12/05/2011 for the course BIO 344 taught by Professor Herrin during the Spring '08 term at University of Texas.
