G tga signifies end of the gene in the middle of a



Raw Sequence Data
•  4 bases: A, C, G, T, plus ambiguity codes (e.g. N = any base, R = G or A (purine), Y = T or C (pyrimidine))
–  kb (= kbp) = kilo base pairs = 1,000 bp
–  Mb = mega base pairs = 1,000,000 bp
–  Gb = giga base pairs = 1,000,000,000 bp
•  Size:
–  E. coli: 4.6 Mbp (4,600,000 bp)
–  Fish: 130 Gbp (130,000,000,000 bp)
–  Paris japonica (plant): 150 Gbp
–  Human: 3.2 Gbp

Fasta File
•  A sequence in FASTA format begins with a single-line description, followed by lines of sequence data (file extension is .fa).
•  It is recommended that all lines of text be shorter than 80 characters in length.

Fastq File
•  Typically contains 4 lines per record:
–  Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description.
–  Line 2 is the sequence.
–  Line 3 is the delimiter '+', with an optional description.
–  Line 4 is the quality score.
–  File extension is .fq

@SEQ_ID
GATTTGGGGTTCAAAGCT...
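The two file layouts described above can be sketched as a pair of small Python readers. This is a minimal illustration, not the course's own code: the function names `read_fasta` and `read_fastq` are hypothetical, the sample records are made up, and two details not stated in the notes are assumed from the usual conventions (a FASTA description line starts with '>', and a FASTQ quality line has one character per base).

```python
from typing import Iterator, List, Tuple


def read_fasta(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines.

    Assumption: each description line starts with '>'.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier line, sequence, quality) from 4-line FASTQ records."""
    rows = [raw.strip() for raw in lines if raw.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input is not a whole number of 4-line records")
    for i in range(0, len(rows), 4):
        ident, seq, plus, qual = rows[i:i + 4]
        if not ident.startswith("@"):
            raise ValueError(f"record line 1 must start with '@': {ident!r}")
        if not plus.startswith("+"):
            raise ValueError(f"record line 3 must start with '+': {plus!r}")
        if len(seq) != len(qual):
            # Assumption: one quality character per base.
            raise ValueError("sequence and quality lengths differ")
        yield ident[1:], seq, qual


# Made-up sample data, for illustration only.
fasta_lines = [">seq1 demo\n", "ACGT\n", "NNRY\n", ">seq2\n", "GG\n"]
fastq_lines = ["@read1 demo\n", "GATTACA\n", "+\n", "IIIIIII\n"]

fasta_records = list(read_fasta(fasta_lines))
fastq_records = list(read_fastq(fastq_lines))
```

Both readers take a list of lines, so they work equally on `open(path).readlines()` or on an in-memory list as shown. Multi-line FASTA sequences are joined into one string, which is why the 80-character line recommendation does not affect the parsed result.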

This note was uploaded on 02/10/2014 for the course CS 425 taught by Professor Asaben-hur during the Fall '13 term at Colorado State.
