8/29/2011

Practical Bioinformatics for Life Scientists
Week 2, Lecture 3
István Albert
Bioinformatics Consulting Center, Penn State

NGS sequencing platforms

Three major platforms, colloquially referred to as:
    Illumina (aka Solexa)
    SOLiD (Sequencing by Oligonucleotide Ligation and Detection)
    454 (pyrosequencing)
Each instrument produces a different output format.

NGS sequencing read formats

Reads: short sequences produced by the instrument
    Illumina: FASTQ format (.fastq or .fq)
    SOLiD: color-space FASTA (.xsq or .csfasta + .qual)
    454: standard flowgram format (.sff)

FASTA format

Color-space FASTA and quality files

Quality file: each call in the color-space FASTA file has an associated quality value.
Sequences in color-space format: base + number = next base

Color-space to letter-space

T323
    T followed by red (3) --> A
    A + yellow (2) --> G
    G + red (3) --> C
T323 --> TAGC

In practice, reads are almost never decoded to letter space before alignment, because a single error in the color space alters every subsequent base call. This makes working with SOLiD data somewhat more difficult.
BMMB 597D, taught by Professor István Albert during the Fall '11 term at Penn State.

