An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

Biological Data Analysis (CSE 182): Assignment 3

Logistics

Submit a hard copy containing the code and results. Create a compressed file containing the code and output as separate files, and email it to Julio Ng.

Sequence Alignment and Gap Penalties

1. Build an automaton for a dictionary containing 3 words. Show all failure links and transition links. Submit a sheet of paper with the automaton hand-drawn. The words are: CAMPERS, AMPERE, and AMINO (18 pts.).

2. You are given the following: a database D (represented as a single sequence), a family F of 20 sequences, and a scoring matrix M (40 pts.).

   (a) Design an appropriate algorithm to find homologs of F in D. Submit a written (pseudo-code) description of the algorithm, and your reasoning on why it is appropriate.

   (b) Implement the algorithm, and apply it to finding novel homologs of F in the database D. Report the homologs you found in the output file.

   (c) Compute an empirical P-value for the homologs, by first computing a distribution of scores on a random database.
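As a way to check a hand-drawn answer to problem 1, the sketch below builds a keyword trie with failure links, assuming the automaton meant is the standard Aho-Corasick construction (the assignment does not name it). The function names build_automaton, find_all, and state_of are illustrative, not part of the assignment.

```python
from collections import deque


def build_automaton(words):
    """Build a keyword trie with failure links (Aho-Corasick, assumed).

    Returns (goto, fail, out): goto[s] maps a character to the next state,
    fail[s] is the failure link of state s, out[s] lists words ending at s.
    State 0 is the root.
    """
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Breadth-first pass: depth-1 states keep their failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            # Follow failure links until a state with a ch-transition is found.
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit words that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


def state_of(prefix, goto):
    """Return the trie state reached by spelling out prefix from the root."""
    state = 0
    for ch in prefix:
        state = goto[state][ch]
    return state


automaton = build_automaton(["CAMPERS", "AMPERE", "AMINO"])
goto, fail, out = automaton
# Print each non-root failure link that does not point back to the root.
for s in range(1, len(goto)):
    if fail[s]:
        print(f"state {s} -> fails to state {fail[s]}")
```

For these three words the only failure links that do not return to the root run from the CAMPERS branch into the AMPERE branch (CA to A, CAM to AM, and so on up to CAMPER to AMPER), which is what lets a scan of a text such as "CAMPERE" still report AMPERE.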
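For part 2(c), one possible reading of "a distribution of scores on a random database" is sketched below: shuffle the database letters (which preserves composition), rescore, and report the fraction of random scores at least as large as the observed one. The shuffling null model, the (k + 1) / (n + 1) estimate, and the toy scorer best_ungapped_score with match/mismatch values are all assumptions made for illustration; a real solution would plug in the algorithm from part (b) and the scoring matrix M.

```python
import random


def best_ungapped_score(query, database, match=1, mismatch=-1):
    """Toy stand-in scorer: best ungapped local alignment score.

    Slides the query along every diagonal and keeps the best-scoring
    contiguous segment. Not the assignment's matrix M.
    """
    best = 0
    for offset in range(-(len(query) - 1), len(database)):
        running = 0
        for i, q in enumerate(query):
            j = offset + i
            if 0 <= j < len(database):
                step = match if q == database[j] else mismatch
                running = max(0, running + step)
                best = max(best, running)
    return best


def empirical_p_value(observed, query, database, score_fn, trials=200, seed=0):
    """Estimate P(random score >= observed) from shuffled databases.

    Returns (p_value, null_scores). Adding 1 to numerator and denominator
    keeps the estimate from being exactly zero with finitely many trials.
    """
    rng = random.Random(seed)
    letters = list(database)
    null_scores = []
    for _ in range(trials):
        rng.shuffle(letters)  # random database with the same composition
        null_scores.append(score_fn(query, "".join(letters)))
    exceed = sum(s >= observed for s in null_scores)
    return (exceed + 1) / (trials + 1), null_scores


# Hypothetical toy data, for illustration only.
query = "ACGTACGT"
database = "TTGCA" + query + "GGATCCTAGA"
observed = best_ungapped_score(query, database)
p_value, null_scores = empirical_p_value(
    observed, query, database, best_ungapped_score, trials=100
)
print(observed, p_value)
```

The smallest P-value this estimate can report is 1 / (trials + 1), so the number of random databases bounds how significant a hit can appear.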
