Bi1X_2009_Week6_prelab_BLAST - Bi 1x Spring 2009 Week 6...

Bi 1x, Spring 2009
Week 6 Prelab

Problem 1: thinking about genomes

Genome science is fast becoming one of the most important facets of modern biology. The idea is to examine the content of genomes as a window on biological function, the evolutionary history of organisms and a host of other issues. In this problem, use a few simple rules of thumb, e.g. that a “typical” protein is 300 amino acids long.

A. Given that the E. coli genome has roughly 5 million basepairs, make a simple estimate of the number of genes in the genome of E. coli.

B. Compare your result from part A with the roughly 4400 genes actually observed in the E. coli genome, and comment on any difference.

“GeneSweep was an informal gene-count betting pool that began at the 2000 Cold Spring Harbor Laboratory Genome Meeting.” This betting pool was aimed at guessing how many genes would be found in the human genome.

C. Make an estimate for the human genome, with its 3 billion basepairs, in the same way as you did for a bacterium in part A. Explain the rationale behind your estimate, then comment on what bet you would make and why.

D. No matter what the ultimate gene count turns out to be, this estimate is way off! Give at least two possible reasons why this naïve reasoning failed in the human case.
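
The rule-of-thumb estimate asked for in parts A and C can be sketched in a few lines of Python. This is only an illustration under loud assumptions that are not stated in the handout: every basepair is taken to be coding, genes do not overlap, and only one strand is counted. The function name `estimate_gene_count` and its parameters are hypothetical, chosen here for the example.

```python
# Back-of-the-envelope gene count: genome length divided by the length
# of a "typical" gene. Assumes (naively) that the whole genome is coding.

BP_PER_CODON = 3  # three basepairs encode one amino acid


def estimate_gene_count(genome_bp, aa_per_protein=300):
    """Return a crude gene-count estimate for a genome of genome_bp basepairs."""
    bp_per_gene = aa_per_protein * BP_PER_CODON  # 300 aa -> 900 bp
    return genome_bp // bp_per_gene


if __name__ == "__main__":
    # Genome sizes are the rough figures quoted in the problem statement.
    for name, genome_bp in [("E. coli", 5_000_000), ("human", 3_000_000_000)]:
        print(f"{name}: ~{estimate_gene_count(genome_bp):,} genes")
```

Changing `aa_per_protein` shows how sensitive the estimate is to the assumed protein length. The more interesting discrepancy, the subject of part D, comes from the assumption that the whole genome is coding.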

Problem 2: the basics

The most famous polymer in the world, DNA, can be thought of as just a simple sequence of 4 letters: A, T, C and G, and yet put a few billion or so of these letters together and we can have the code for making a tree, a frog or Jack Black. Modern initiatives like the Human Genome Project have generated reams of genetic sequence data that have become one of biologists’ most treasured resources. Some examples of research questions arising from this sequence data are: Which parts of the DNA sequence code for genes? How has evolution changed sequences over time and between species? How do we determine the evolutionary relationship between living things based on their DNA? Bioinformatics is the field of science that attempts to tackle these questions, among many others concerned with how information is stored, used and passed on in living things.

Both DNA and proteins are linear polymers that can be thought of usefully in terms of their sequences: DNA as a sequence of nucleotides and proteins as a sequence of amino acids. Most of you are familiar with the alphabet of DNA; the (perhaps) less familiar alphabet of amino acids is shown in Table 1 below.

One-letter  Three-letter  Amino acid       One-letter  Three-letter  Amino acid
code        code                           code        code
A           Ala           alanine          M           Met           methionine
C           Cys           cysteine         N           Asn           asparagine
D           Asp           aspartic acid    P           Pro           proline
E           Glu           glutamic acid    Q           Gln           glutamine
F           Phe           phenylalanine    R           Arg           arginine
G           Gly           glycine          S           Ser           serine
H           His           histidine        T           Thr           threonine
I           Ile           isoleucine       V           Val           valine
K           Lys           lysine           W           Trp           tryptophan
L           Leu           leucine          Y           Tyr           tyrosine

Table 1 – One- and three-letter codes for the 20 commonly occurring amino acids.

One useful way to compare two sequences (DNA or protein) is to align them so that the

This document was uploaded on 01/03/2012.
