CS573: Homework 5
Due date: Monday December 6, 4pm to CS mailroom

Programming assignment

In this assignment you will implement a sequential pattern mining algorithm and apply it to the Molecular Biology (Promoter Gene Sequences) Data Set from the UCI ML repository. The data set is available at: http://archive.ics.uci.edu/ml/datasets/Molecular+Biology+%28Promoter+Gene+Sequences%29.

It consists of 108 snippets of the DNA sequence for E. coli, where positive instances correspond to promoters (which initiate the process of gene expression) and negative instances correspond to non-promoters. In this assignment, you will consider the sequences associated with the first 10 positive instances, using the 57 characters of their sequences to look for patterns.

Again, you can use a language of your choice to implement the algorithms (e.g., Java, C, Python, R). Please hand in a hard copy of your code with the assignment. ...
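As a starting point for the data-preparation step described above, here is a minimal Python sketch. It assumes each record in the data file is one line of the form `class,name,sequence`, with the class written as `+` or `-` and optional whitespace around the sequence; check this against the file you actually download. The function names and the `support` helper are hypothetical and are not part of the assignment. In particular, `support` treats a pattern as a contiguous substring, which is only one possible reading of "pattern", and the sketch does not implement a mining algorithm.

```python
from typing import Iterable, List, Tuple

SEQ_LEN = 57  # sequence length stated in the assignment


def parse_promoter_lines(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse records assumed to look like 'class,name,sequence'."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            raise ValueError(f"unexpected record layout: {line!r}")
        label, name, seq = parts
        seq = seq.lower()
        if len(seq) != SEQ_LEN:
            raise ValueError(f"{name}: expected {SEQ_LEN} characters, got {len(seq)}")
        records.append((label, name, seq))
    return records


def first_positive_sequences(records, n: int = 10) -> List[str]:
    """Return the sequences of the first n records labelled '+'."""
    return [seq for label, _, seq in records if label == "+"][:n]


def support(pattern: str, sequences: List[str]) -> int:
    """Count sequences that contain pattern as a contiguous substring
    (an assumed definition of support, for illustration only)."""
    return sum(pattern in s for s in sequences)
```

With the file saved locally (the file name here is an assumption), the ten sequences could be obtained with `first_positive_sequences(parse_promoter_lines(open("promoters.data")))`, and candidate patterns can then be checked against them with `support`.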
This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue University-West Lafayette.