This method will be completed using 10x redundancy to eliminate errors and reduce the chance of missing any targeted region. The Celera Assembler is one of the core competencies that makes this Herculean task possible. In the first pass through the data, the shotgun fragments are compared against each other, and identical sequences longer than 40 base pairs are identified. Matches of 40 base pairs are statistically very unlikely to occur by chance. Each match is then classified as true or repeat-induced. True matches come from overlapping sections of the genome and are the desired overlaps; repeat-induced matches arise because the same sequence occurs in multiple locations in the genome, and the fragments involved do not belong together.
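The 10x redundancy and the first-pass fragment comparison can be illustrated with a minimal Python sketch. This is not the Celera Assembler's implementation: `uncovered_fraction` assumes idealized uniform random sampling and uses the standard Poisson approximation, and `candidate_overlaps` uses a plain exact k-mer index. Both function names and the `MIN_MATCH` constant are hypothetical.

```python
from __future__ import annotations

import math
from collections import defaultdict

MIN_MATCH = 40  # minimum exact-match length quoted in the text (base pairs)


def uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases that no read covers.

    Assumes reads are sampled uniformly at random, which real libraries
    only approximate.
    """
    return math.exp(-coverage)


def candidate_overlaps(fragments: dict[str, str], k: int = MIN_MATCH) -> set[tuple[str, str]]:
    """Return pairs of fragment names that share at least one exact k-mer."""
    index = defaultdict(set)  # k-mer -> names of fragments that contain it
    for name, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    pairs = set()
    for names in index.values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))  # names are sorted, so each pair appears once
    return pairs
```

Under the idealized model, 10x coverage leaves roughly e^-10 of the bases unsequenced, which is about 0.005%. The overlap function only reports candidate matches; deciding whether a match is true or repeat-induced is a separate step that this sketch does not attempt.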
Whole Genome Shotgun Sequencing
The assembler then searches for overlapping fragments that share a common sequence and are not contested elsewhere in the dataset. The uncontested data are assembled into unitigs containing approximately 30 fragments each. These assembled unitigs are 99% accurate, and repeats are filtered out using the Discriminator algorithm. Unitigs that pass this filter are designated U-unitigs and are ready for ordering. In the scaffolding stage, the order of the U-unitigs is determined from the mate pairs, which are used to organize them into contigs. By repeatedly examining these contigs and their orientation, the scaffold becomes complete except for some sequencing gaps. This strategy is repeated until the gaps are filled, using the Discriminator algorithm and a method based on sequence “rocks” and “pebbles”.
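The idea of assembling only uncontested overlaps into unitigs can be sketched in Python as follows. This is a simplified illustration, not the Celera Assembler's algorithm: it assumes overlaps are already given as directed pairs (the end of the first fragment overlaps the start of the second), treats an overlap as uncontested when it is the only one leaving the first fragment and the only one entering the second, and does not model the Discriminator filter or mate-pair scaffolding. The name `build_unitigs` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_unitigs(fragments: list[str], overlaps: list[tuple[str, str]]) -> list[list[str]]:
    """Chain fragments along overlaps that are uncontested on both sides."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for a, b in overlaps:
        successors[a].add(b)
        predecessors[b].add(a)

    # Keep an overlap only if a has a single successor and b a single predecessor.
    next_of = {}
    for a, b in overlaps:
        if len(successors[a]) == 1 and len(predecessors[b]) == 1:
            next_of[a] = b
    has_prev = set(next_of.values())

    unitigs = []
    for frag in fragments:
        if frag in has_prev:
            continue  # this fragment sits inside a chain, not at its start
        chain = [frag]
        while chain[-1] in next_of:
            chain.append(next_of[chain[-1]])
        unitigs.append(chain)
    # Fragments that form a closed cycle have no chain start and are omitted.
    return unitigs
```

For example, with overlaps `f1→f2`, `f2→f3`, `f3→f4`, and `f3→f5`, the fragment `f3` has two competing successors, as it would at a repeat boundary. The sketch therefore stops the unitig at `f3` and leaves `f4` and `f5` as separate single-fragment unitigs.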
Whole Genome Shotgun Sequencing
Because the HGP was releasing its sequence to the public incrementally, the shotgun approach used these data to help eliminate errors and speed up the scaffolding process.
Sequence Gaps (figure from Brown, Genomes 2)
Advances
The following advances in robotics and automation, combined with advances in microbiology, reduced labor by 80%:
  • Development of the Perkin-Elmer ABI PRISM 3700 gene sequencer: 1000 samples per day; 15 minutes instead of 8 hours for the first automated sequencers
  • A parallel system of 300 sequencers ($300,000 each)
  • Use of supercomputers to assemble fragments
  • Development of process support instrumentation to process 100,000 template preps and 200,000 sequencing reactions per day
  • 24-hour-per-day unattended operation of the sequencers
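The figures in this list can be combined into a rough back-of-the-envelope calculation. The Python sketch below uses only the numbers quoted above; the constant and function names are hypothetical, and the utilization figure assumes that one "sample" on a sequencer corresponds to one sequencing reaction, which the slide does not state.

```python
SEQUENCERS = 300                      # parallel sequencers
SAMPLES_PER_SEQUENCER_PER_DAY = 1000  # quoted per-instrument throughput
COST_PER_SEQUENCER_USD = 300_000      # quoted price per instrument
REACTIONS_PER_DAY = 200_000           # quoted sequencing reactions per day


def fleet_summary() -> dict:
    """Combine the quoted figures into fleet-level totals."""
    capacity = SEQUENCERS * SAMPLES_PER_SEQUENCER_PER_DAY
    return {
        "capacity_per_day": capacity,
        "capital_cost_usd": SEQUENCERS * COST_PER_SEQUENCER_USD,
        # Assumption: one sample equals one sequencing reaction.
        "utilization": REACTIONS_PER_DAY / capacity,
        # 8 hours versus 15 minutes, both expressed in minutes.
        "time_ratio": (8 * 60) / 15,
    }


if __name__ == "__main__":
    for key, value in fleet_summary().items():
        print(f"{key}: {value}")
```

On these numbers the fleet could run up to 300,000 samples per day and represents about $90 million in instruments. The quoted 200,000 reactions per day would use roughly two thirds of that capacity, and 15 minutes is a 32-fold reduction from 8 hours.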
Map of Chromosome 16
Advances
In addition to the advances above, computational biology (bioinformatics) became increasingly important, because the software and processors needed to assemble a puzzle of this size still had to be developed.

  • Fall '13
  • Omair Gul
  • DNA, Human genome, Energy Genome Programs, U.S. Department of Energy Genome Programs
