GFF - Computational Biology: Genome annotation formats...

Computational Biology: Genome annotation formats
October 2004
Ian Holmes
Department of Bioengineering
University of California, Berkeley
From an original lecture by Irmtraud Meyer

Overview:
- What is genome annotation?
- In which format can a genome annotation be saved to files?
- Definition of the GFF genome annotation format
- Other genome annotation formats
- Application: evaluating the performance of a gene prediction program
- Exercises
What is genome annotation?
Genome annotation is the localisation of functional elements in a genomic sequence.
For example, the location of:
- protein coding genes
- tRNA and other RNA genes
- promoters
- ...

Example 1: protein coding genes
[Figure: unannotated DNA and annotated DNA, drawn 5' to 3'. Legend: intron; exon (protein coding); intergenic sequence.]
Formats for saving annotations:
Motivation: to save information on a gene, a format should be able to record:
- the location of the gene in the genome
- the position of its exon-intron boundaries
- the strand of DNA on which the gene lies
- the source of annotation
- the completeness of the gene structure
[Figure: Example 1, DNA with protein coding genes]

The GFF format:
GFF = Genefinding File Format
- a format used to save gene structures
- idea: divide a gene into its constituents:
  - Exon: transcribed sections of a gene
  - CDS: translated sections of a gene
  - Start_Codon
  - Stop_Codon
The GFF format:
[Figure: diagram labelled "+" and 1 to 5, with rows: Splicing pattern, Gene structure, Exons, CDS, Start_Codon, Stop_Codon.]
The information on each Exon, CDS, Start_Codon and Stop_Codon corresponds to one line within the GFF file.

The GFF format:
Format of each GFF line:
    name source feature start end score strand frame group
where:
- name: the name of the sequence (string)
- source: the name of the source of annotation (string)
- feature: the feature type: “Exon”, “CDS”, “Start_Codon” or “Stop_Codon” (string)
- start: start position of the feature (integer)
- end: end position of the feature (integer)
- score: score (rational number) associated with the feature, set to “.” if no score is used
- strand: strand on which the feature lies, possible values are “+” or “-”
- frame: “0”, “1” or “2” for CDS, Start_Codon and Stop_Codon; “.” for Exon
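The field layout above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: it assumes tab-delimited fields, treats the group column as optional free text, and the names `GffRecord` and `parse_gff_line`, as well as the example line (sequence `chr1`, source `mygenefinder`, group `gene1`), are hypothetical.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    """One GFF line, using the field names from the slide."""
    name: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the file has "."
    strand: str             # "+" or "-"
    frame: Optional[int]    # 0, 1 or 2; None when the file has "."
    group: str              # assumed optional; kept as raw text


def parse_gff_line(line: str) -> GffRecord:
    """Split one tab-delimited GFF line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields, got {len(fields)}")
    name, source, feature, start, end, score, strand, frame = fields[:8]
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    group = fields[8] if len(fields) > 8 else ""
    return GffRecord(
        name=name,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        group=group,
    )


# Hypothetical example line, made up for illustration only.
example = "chr1\tmygenefinder\tCDS\t1201\t1350\t.\t+\t0\tgene1"
record = parse_gff_line(example)
print(record)
```

Mapping “.” to `None` keeps an unused score or frame distinct from a genuine value of 0.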
The GFF format: remarks
- the fields in a GFF line are tab delimited
- start ≤ end (important to keep in mind when dealing with genes on the reverse strand!)
- the start and end positions are the corresponding positions on the “+” strand
- definition of frame for CDS, Start_Codon and Stop_Codon features:
  - “0”: first nucleotide in feature has codon position 0
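The coordinate remarks can be made concrete with a short sketch. It assumes 1-based, inclusive positions (the usual GFF convention, not stated on the slide), and the helper names `feature_length` and `reverse_strand_coords` are hypothetical. Because a GFF line always stores “+” strand positions, a feature on the “-” strand has to be converted if its positions are needed relative to the reverse strand.

```python
from typing import Tuple


def feature_length(start: int, end: int) -> int:
    """Length of a feature, assuming 1-based inclusive coordinates."""
    if start > end:
        raise ValueError(f"start ({start}) must not exceed end ({end})")
    return end - start + 1


def reverse_strand_coords(start: int, end: int, seq_length: int) -> Tuple[int, int]:
    """Map "+" strand coordinates to positions counted along the "-" strand.

    The returned pair again has its first value no larger than its second.
    """
    if start > end:
        raise ValueError(f"start ({start}) must not exceed end ({end})")
    if end > seq_length:
        raise ValueError("feature extends past the end of the sequence")
    return seq_length - end + 1, seq_length - start + 1


# Hypothetical feature on a sequence of length 1000.
plus_start, plus_end = 101, 200
minus_start, minus_end = reverse_strand_coords(plus_start, plus_end, 1000)
print(feature_length(plus_start, plus_end), minus_start, minus_end)
```

Note that the conversion swaps the roles of the two ends: the “+” strand end position determines the reverse-strand start.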