CS1132 Fall 2011 Assignment 2

Adhere to the Code of Academic Integrity. You may discuss background issues and general strategies with others and seek help from course staff, but the implementations that you submit must be your own. In particular, you may discuss general ideas with others, but you may not work out the detailed solutions with others. It is never OK for you to see or hear another student’s code, and it is never OK to copy code from published/Internet sources. If you feel that you cannot complete the assignment on your own, seek help from the course staff.

Note: In this last homework in the course, you will design the solutions to the problems posed. Only minimal specifications are given, so you will design the appropriate program structure, including any subfunctions, yourself. The focus is on turning a problem statement written in English into a solution written as a Matlab program. Read carefully and take some time to think about the design; don’t just jump into coding immediately.

1 Finding Genes

A four-letter alphabet, A, C, T, and G, is used to represent the four nucleotides, Adenine, Cytosine, Thymine, and Guanine, in the DNA of living organisms. There is one (long) sequence of these nucleotides, or bases, for each chromosome, and the set of sequences for all the chromosomes in an organism is called the genome. The genomic sequences of many organisms are now known, including a human genome, which is a sequence of about three billion characters!

A gene is a substring of a genome that codes for a specific function in an organism. It is a chain of codons, each of which is a triplet of bases that encodes one amino acid (e.g., the codon CAA encodes the amino acid Glutamine, which plays a role in regulating the acid-base balance in the kidney).
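To make the idea of codons concrete, here is a minimal Matlab sketch that splits a string of bases into consecutive, non-overlapping triplets. The sequence and the variable names (seq, nCodons, codons) are illustrative assumptions, not part of the assignment's specification, and the sketch does not locate genes; it only shows how a sequence divides into codons.

```matlab
% Illustrative sketch: split a sequence of bases into codons.
% seq is a made-up example, not data from the assignment's data file.
seq = 'CAAGGCTTAC';

% Number of complete triplets; leftover bases at the end are ignored.
nCodons = floor(length(seq) / 3);

codons = cell(1, nCodons);
for k = 1:nCodons
    % The k-th codon occupies characters 3k-2 through 3k.
    codons{k} = seq(3*k-2 : 3*k);
end

% Print one codon per line: CAA, GGC, TTA (the trailing C is dropped).
fprintf('%s\n', codons{:});
```

Starting at a different offset, for example seq(2:end), yields a different set of triplets, so where reading begins determines which codons are read.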
In a sequence of bases, the start codon ATG marks the beginning of a gene, and one of the three stop codons, TAG, TAA, or TGA, marks the end of a gene (the start and stop codons themselves are not part of the gene). One of the first steps in analyzing a genome is to identify its genes.

You will write a program to find all the genes in a partial genomic sequence. Specifically, you will write a function findGenes such that the statement n = findGenes('data.txt', 'result.txt') reads a DNA data file called data.txt in the current directory, identifies all the genes in the sequence in the data file, writes the genes to a file called...
This note was uploaded on 09/18/2011 for the course CS 1132 at Cornell.