15 Gene prediction

Introduction to Bioinformatics / Elements of Bioinformatics: Gene prediction
References
• Mount D.W. (2004) Bioinformatics: Sequence and Genome Analysis. 2nd ed. Cold Spring Harbor Lab. Press, N.Y. Chapter 9.
• Baxevanis A.D. and Ouellette B.F.F. (2005) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. 3rd ed. John Wiley and Sons, N.Y. Chapter 5.
• Mathe C., Sagot M-F., Schiex T., and Rouze P. (2002) Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Research 30: 4103-4117.
Gene prediction
• Given a DNA sequence:
  – predict the protein-coding regions:
    • which strand is used as template?
    • which reading frame is used?
    • which region is protein coding (exon)?
  – predict the regulatory regions
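The strand and reading-frame questions above amount to six candidate readings of any DNA sequence: three codon offsets on the given strand and three on its reverse complement. The following is a minimal sketch, assuming an unambiguous A/C/G/T alphabet; the function names are illustrative and not taken from the lecture.

```python
# Translation table mapping each base to its complement (assumes only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def six_frames(seq):
    """Return {(strand, offset): [codons]} for all six reading frames."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Split into complete codons only; trailing 1-2 bases are dropped.
            frames[(strand, offset)] = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
    return frames
```

For example, `six_frames("ATGAAATAG")[("+", 0)]` gives the codons `ATG`, `AAA`, `TAG`, while the `"-"` entries read the opposite strand in the 5' to 3' direction.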
Approaches to finding protein-coding regions in DNA sequences
• Usually start by masking repetitive DNA, which rarely overlaps with regulatory regions and coding regions

Approach: examples
• Signals: transcription start and termination signals, splice junctions, start and stop codons, polyadenylation site
• Content measures: codon usage bias, hexamer measure, etc.
• Similarity measures:
  – similarity to other protein-coding sequences
  – cross-species comparison to identify conserved regions
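Codon usage bias, listed among the content measures, starts from a simple tally: how often each codon appears in sequences known to be coding. The sketch below computes relative codon frequencies for one in-frame coding sequence. It is an illustration only; how those frequencies are turned into a coding score is not specified in these slides, and the function name is hypothetical.

```python
from collections import Counter

def codon_frequencies(cds):
    """Return relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    # Read complete codons from position 0; trailing partial codons are ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}
```

Frequencies estimated from known genes of an organism can then be compared with those of a candidate region in each reading frame; the frame whose codon usage looks most like that of known genes is the more plausible coding frame.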
Prokaryotic genes
• Gene structure is relatively simple:
  – Transcription start and stop sites, translation start and stop sites, ribosome binding sites (Shine-Dalgarno sequence, 5'-AGGAGGU-3').
  – Promoter signals occur in conserved locations (-10 and -35 sequences).
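Because the Shine-Dalgarno sequence lies a short distance upstream of the translation start, a signal-based search can look for its core next to candidate start codons. The sketch below is a deliberately simplified illustration: it assumes an exact match to the DNA core `AGGAGG`, considers only `ATG` starts, and uses a 20-nucleotide upstream window. The window size and the function name are assumptions made here, not values from the lecture.

```python
def starts_with_sd(seq, core="AGGAGG", window=20):
    """Return positions of ATG codons that have the SD core within `window` nt upstream."""
    seq = seq.upper()
    hits = []
    pos = seq.find("ATG")
    while pos != -1:
        # Region immediately upstream of this candidate start codon.
        upstream = seq[max(0, pos - window):pos]
        if core in upstream:
            hits.append(pos)
        pos = seq.find("ATG", pos + 1)
    return hits
```

Real ribosome binding sites vary in sequence and spacing, so practical tools score partial matches rather than requiring the exact core.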
Finding open reading frames
• Used for gene prediction in prokaryotic sequences, or in yeast, which has few introns
• A long open reading frame suggests a protein-coding gene
  – In E. coli, fewer than 1.8% of genes have fewer than 60 codons
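The open reading frame criterion can be sketched as a scan of each frame for an `ATG` followed in-frame by a stop codon, keeping only ORFs of at least 60 codons, in line with the E. coli figure above. This is a minimal sketch under stated assumptions: it scans one strand only, treats `ATG` as the sole start codon, and uses the standard stop codons. The function name and the return format are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=60):
    """Return (start, end) index pairs of ORFs on the given strand.

    `start` is the index of the ATG; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # coding codons, stop excluded
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs
```

To cover both strands, the same function can be run on the reverse complement of the sequence, with the coordinates mapped back to the original strand.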
Analysis of the E. coli adhE gene (alcohol dehydrogenase; EMBL ID: ecadheg) to identify the coding region. Translation in

This note was uploaded on 07/29/2010 for the course BIOC BIOC1805 taught by Professor Dr.brianwong during the Summer '09 term at HKU.
