problem_set_2 - 7.36/7.91/BE.490 Homework 2 Due March 11 at...

7.36/7.91/BE.490 Homework 2
Due March 11 at 1:00 PM

Note: Please see the class website for a handout describing how to submit your programming problems electronically. Also a list of useful programming hints is included at the end of this assignment.

1. Paleogenomics I - BLAST searches with nucleotide sequences

Your lab has developed a revolutionary technology to extract DNA from dinosaur fossils. In order to learn about the evolutionary history of dinosaurs and the origins of flight, you extract and sequence DNA from a fossilized pterodactyl. This sequence is stored in the file dino1.fa.

a. Do a BLASTN search of the ‘dino1.fa’ sequence against the nr database. What is the top hit with the default settings? Does it look real? Why/why not?

b. Now change the mismatch penalty from the default (-3) to -1 and do the same BLASTN search. What is the top hit now? What is the E-value of this hit? Qualitatively, how did the alignments change from part a? Is this what you expect when reducing the magnitude of the mismatch penalty?

c. Now do a BLASTX search with dino1.fa. What is the top BLASTX hit? What are the E-values and bit scores of this hit? Does it look real? What type of organism is this? What could this mean for the evolution of flight in dinosaurs?

d. Explain why the BLASTN and BLASTX results are so different.

2. Paleogenomics II – the effect of introns on BLAST searches

You obtain a new pterodactyl sequence to study in the file ‘dino2.fa’. You try BLASTN and BLASTX but can’t find any related sequence in the nr nucleotide and protein databases (do the searches to convince yourself). Because this sequence comes from a very gene-rich part of the pterodactyl genome, you suspect that there may be a gene hidden in the sequence which for some reason is undetectable by BLAST. To explore this possibility, you decide to investigate the splice site motifs recognized by the pterodactyl splicing machinery to see whether you can extract the exons from the dino2.fa sequence.
You do some more sequencing and identify a set of 56 pterodactyl genomic DNA
fragments that have BLASTX hits to known proteins. You infer that these hits are likely to contain exons and splice sites. Since BLASTX hits often do not correspond precisely to the boundaries of exons (e.g., because of amino acid changes near splice sites resulting in truncated hits, or spurious hits to translated splice sites/introns resulting in extended hits), you construct a dataset including ~25 bases on each side of the START of the BLASTX hit and call it ‘3primesplicesites.txt’ (for finding the 3’ splice site motif), and another dataset including ~25 bases on each side of the END of the BLASTX hit, which you call ‘5primesplicesites.txt’ (for finding the 5’ splice site motif).
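
The window construction described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the two files: the function names `window_around` and `splice_site_windows` are hypothetical, hit coordinates are assumed to be 1-based, inclusive nucleotide positions on the forward strand, and windows are simply truncated when a hit lies close to the end of a fragment.

```python
def window_around(seq, cut, flank=25):
    """Return up to `flank` bases on each side of a cut point.

    `cut` is a 0-based boundary: the window covers seq[cut - flank:cut]
    followed by seq[cut:cut + flank], clipped to the ends of the sequence.
    """
    left = max(0, cut - flank)
    right = min(len(seq), cut + flank)
    return seq[left:right]


def splice_site_windows(seq, hit_start, hit_end, flank=25):
    """Return (window at hit START, window at hit END) for one BLASTX hit.

    Assumption: hit_start and hit_end are 1-based, inclusive, forward-strand
    nucleotide coordinates with hit_start <= hit_end.
    """
    if not 1 <= hit_start <= hit_end <= len(seq):
        raise ValueError("hit coordinates must lie within the sequence")
    # START of the hit: intron sequence is expected upstream, exon downstream,
    # so this window is a candidate 3' splice site region.
    three_prime = window_around(seq, hit_start - 1, flank)
    # END of the hit: exon sequence is expected upstream, intron downstream,
    # so this window is a candidate 5' splice site region.
    five_prime = window_around(seq, hit_end, flank)
    return three_prime, five_prime
```

Calling `splice_site_windows(fragment, hit_start, hit_end)` on each fragment and writing the first element of every result to one file and the second element to another would give two datasets laid out like the ones described. Hits reported on the reverse strand would first need to be converted to forward-strand coordinates of the reverse-complemented fragment, which this sketch does not attempt.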