Homework2 - Computational Biology 03-510/710 42-434/734...

Computational Biology 03-510/710 & 42-434/734
Spring 2009
Homework 2 (revised February 12, 2009)
R. F. Murphy

Homework 2
Building Profile HMMs
Due: February 19, 2009
50 points

Your task is to write and test a program in C, C++, Java, Perl, Python or Matlab for Andrew Unix, Linux, MacOS or Windows that will build models of sequence features using profile Hidden Markov Models (HMMs). Your program should:

1. parse the following inputs from the command line:
   a. the name of a FASTA file containing a set of aligned sequences
   b. a character indicating whether the sequences are amino acid (A) or nucleic acid (N)
   c. the number of match states to be used in the HMM
   d. the number of folds of cross-validation to use
   e. the name of an output file
2. read the FASTA file
3. build a profile HMM for the specified number of match states using all sequences (see below)
4. calculate the best match between the profile HMM and each of the sequences, print the score for each, and remember the scores
5. divide the sequences into the specified number of folds
6. for each fold, use all other folds as training data, build a profile HMM, calculate the best match between that profile and the sequences in the test fold ONLY, remember the ratio between that score and the score for the corresponding sequence found in step 4, and print
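
The input-handling and fold-splitting steps (1, 2, 5 and the train/test selection in 6) can be sketched in Python, one of the permitted languages. This is only a minimal skeleton under stated assumptions, not the expected solution: the function names (parse_args, read_fasta, split_folds, train_test_indices), the positional argument order, the requirement of at least two folds, and the round-robin fold assignment are all illustrative choices that the handout does not specify. Building and scoring the profile HMM is deliberately left out.

```python
import argparse


def parse_args(argv=None):
    """Parse the five inputs listed in step 1 (argument order is an assumption)."""
    parser = argparse.ArgumentParser(description="Profile HMM homework skeleton")
    parser.add_argument("fasta", help="FASTA file of aligned sequences")
    parser.add_argument("alphabet", choices=["A", "N"],
                        help="A = amino acid, N = nucleic acid")
    parser.add_argument("match_states", type=int, help="number of match states")
    parser.add_argument("folds", type=int, help="number of cross-validation folds")
    parser.add_argument("output", help="name of the output file")
    return parser.parse_args(argv)


def read_fasta(lines):
    """Return a list of (name, sequence) pairs from an iterable of FASTA lines.

    Gap characters in the aligned sequences are kept unchanged.
    """
    records = []
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_folds(n_items, k):
    """Assign sequence indices to k folds round-robin (an assumed scheme)."""
    if not 2 <= k <= n_items:
        raise ValueError("need 2 <= folds <= number of sequences")
    return [list(range(i, n_items, k)) for i in range(k)]


def train_test_indices(folds, test_fold):
    """Return (train, test) index lists: test is one fold, train is all others."""
    test = folds[test_fold]
    train = [i for j, fold in enumerate(folds) if j != test_fold for i in fold]
    return train, test


if __name__ == "__main__":
    args = parse_args()
    with open(args.fasta) as handle:
        sequences = read_fasta(handle)
    folds = split_folds(len(sequences), args.folds)
    for f in range(len(folds)):
        train, test = train_test_indices(folds, f)
        # Placeholder: build the HMM from `train`, score the sequences in `test`.
        print(f"fold {f}: {len(train)} training, {len(test)} test sequences")
```

Under these assumptions the script would be invoked with five positional arguments, for example a FASTA file name, A, 10, 5, and an output file name. Round-robin assignment keeps fold sizes within one sequence of each other; a shuffled split would work equally well if the input order is not meaningful.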