12.06.2011 - Genome Basics Lecture 25 Eukaryotic genomes...

Genome Basics: Lecture 25
Eukaryotic genomes
December 6, 2011

Genomes and Genomics, Pt. I (Ch. 14):
Determining genome sequence
Identifying genes within raw genomic sequence

Eukaryotic Genome Structure:
Eukaryotic genomes are much less compact than prokaryotic genomes.
Eukaryotic genomes possess large amounts/regions of DNA that do not encode proteins.
70% of the yeast genome encodes protein; less than 3% of the human genome encodes protein.

The C-Value Paradox:
C-value = DNA content
Genome size does not always equate with organismal complexity.
Salamander genome is approximately 20 times larger than the human genome.

There were initially 2 projects to sequence the human genome.

Traditional Whole-Genome Shotgun Sequencing: Steps in Obtaining Genome Sequence
Generate genomic library: a collection of genomic DNA fragments (from restriction enzyme digestion) cloned into vectors (plasmids, artificial chromosomes), collectively representing the entire genome.
Sequence individual clones (paired-end reads).
Sequences of overlapping reads are assembled into sequence contigs.
Paired-end reads can be used to join sequence contigs into a single ordered and oriented scaffold.
Note: "reads" are the short individual sequences; a contig is a contiguous stretch of sequence assembled from overlapping reads. The term is used very loosely.
Note: you're going to want to find some vector that matches the sequence.
[Figure labels: DNA primer; paired-end reads; paired-end reads from multiple inserts]

Whole-Genome Shotgun Sequencing Assembly:
iClicker Question 1: Large regions (e.g., 10 kb) of tandemly repeated DNA are problematic when determining genomic sequence. Which of the following artificial DNA vectors would you expect to be most helpful in sequencing these regions and in placing these repeated regions with respect to other chromosomal DNA sequences?
E: cosmids accommodating up to 30-40 kb DNA inserts and bacterial artificial chromosomes (BACs) accommodating DNA sequence up to 300 kb
Note: here you have different contigs and you're trying to find paired-end reads.

An Alternative to Bridging Sequence Gaps Through Use of Genomic Libraries:
Paired-end reads for high-throughput sequencing can be produced without genomic library construction.

Genome Content:
Transposable elements
Repetitive DNA
Non-protein-coding genes
Protein-coding genes
Centromeric and telomeric sequences

The Personalized Genome:
As of 2008, the cost of sequencing the genome of an individual was approximately $1 million.
The NIH has established funding to advance research towards decreasing this cost to $1,000.

Identifying Protein-Coding Genes Within Genomic DNA Sequence:
Open Reading Frame (ORF) detection
Evidence from cDNA sequence
Predicting protein-DNA binding sites
Similarity searching (BLAST)
Gene-finding from codon bias

Open Reading Frame (ORF) Detection:
ORF: protein-coding sequence beginning with an ATG start codon and ending with an in-frame stop codon.
There is a very large number of possible open reading frames in genomic DNA. (Discussion sections this week)

Evidence for the Identification of Genes by cDNA Sequence:
Expressed Sequence Tags (ESTs): short sequence reads of cDNA from sequencing only the 5' or 3' ends.
iClicker Question 2: Consider the method by which cDNA is generated in the lab. Also note that cDNA sequences are often "incomplete" or partial representations of the parent mRNA template. Keeping this in mind, would you expect the first 20 codons of a gene or the last 20 codons of a gene to be more frequently represented in cDNA sequences and ESTs?
Note: cDNA synthesis primes off of the 3' end of the mRNA. Oftentimes the mRNA is very easily degraded, so the cDNA isn't complete, and you usually end up with a better representation of the 3' end. So you're more likely to get the last 20 codons of a gene represented.

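The ORF definition given earlier (an ATG start codon followed by an in-frame stop codon) can be illustrated with a short Python sketch. This is only a minimal illustration, not a real gene finder: the function name `find_orfs`, the `min_codons` parameter, and the example sequence are made up for this note, and the sketch scans only the three forward reading frames of the given strand, assuming the standard stop codons TAA, TAG, and TGA.

```python
# Standard stop codons on the DNA coding strand (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return a list of (start, end, frame) for ORFs on the forward strand.

    start is the index of the A in ATG, end is one past the last base of
    the stop codon, and frame is 0, 1, or 2. min_codons counts the codons
    from ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close this ORF and look for the next ATG
    return orfs


# Hypothetical toy sequence: ATG AAA TAG sits in reading frame 2.
example = "CCATGAAATAGGG"
print(find_orfs(example))
```

Even this toy scan hints at why raw genomic DNA contains so many candidate ORFs: every ATG in every frame opens one, so in practice a length cutoff (here `min_codons`) and additional evidence such as cDNA or sequence similarity are used to narrow the list.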
Sequence Similarity Searching:
BLAST (Basic Local Alignment Search Tool): computational tool for identifying related nt or AA sequences.
A database of nt or AA sequences is searched for sequences similar to a "query" sequence (e.g., putative gene).
If the putative gene is similar to a known gene in another organism (ortholog), this provides evidence that the putative gene is a real gene.
The sequence similarity can also be used to infer gene function.

Predicting Protein-DNA Binding Sites:

Predicting Genes Based on Codon Bias:
Synonymous codons: multiple codons for a single amino acid.
Codon bias: certain synonymous codons are used preferentially within protein-coding genes in a given organism.
Cys: UGC (73%), UGU (27%) (D. melanogaster)
Each pattern of codon bias is organism-specific; all eukaryotes exhibit codon bias ...
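As a rough sketch of how synonymous codon usage like the Cys figures above could be tallied, the Python snippet below counts how often each codon in a synonymous set appears in an in-frame coding sequence. The function name `synonymous_usage` and the toy sequence are invented for illustration; the toy sequence is not D. melanogaster data, and the sketch assumes the input starts in the correct reading frame.

```python
from collections import Counter


def synonymous_usage(coding_seq, synonymous_codons):
    """Return the fraction of each synonymous codon among those observed.

    coding_seq is read in frame from its first base; DNA input is
    converted to RNA letters so codons can be written as UGC, UGU, etc.
    """
    seq = coding_seq.upper().replace("T", "U")
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in synonymous_codons)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in synonymous_codons}
    return {c: counts[c] / total for c in synonymous_codons}


# Hypothetical toy coding sequence: AUG UGC UGC UGC UGU UAA
toy = "AUGUGCUGCUGCUGUUAA"
print(synonymous_usage(toy, ("UGC", "UGU")))
```

A gene finder based on codon bias would compare such frequencies in a candidate ORF against the organism's known preferences; a candidate whose codon usage matches the organism's bias is more likely to be a real protein-coding gene.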