Introduction to Computer Science and
Problem Set 3
Tuesday, September 16, 2008.
Due: 11:59pm, Tuesday, September 23, 2008
This problem set will introduce you to using functions and recursion, as well as string operations in Python.
You may work with other students. However, each student should write up and hand in his or her own assignment.
Be sure to indicate with whom you have worked. For further detail, please review the collaboration
policy as stated in the syllabus.
This problem set, and future ones, will be graded by a test harness. The test harness program will expect your
files to include just function definitions, with no executable code outside the function definitions (besides what's
already in the template). So remember to comment out your testing code. (And *do* test your code.)
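As a rough illustration of the layout the test harness expects, a submitted file might look like the sketch below. The function count_vowels is a hypothetical placeholder and is not part of this problem set; the point is only that the file contains a function definition and that the testing code is commented out.

```python
# Hypothetical example file: only function definitions at the top level.

def count_vowels(text):
    """Return the number of vowels in text (illustrative function only)."""
    count = 0
    for ch in text.lower():
        if ch in "aeiou":
            count += 1
    return count

# Testing code, commented out before handing in:
# print(count_vowels("Problem Set"))
```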
Strings and string searching
As we have seen in lecture, strings are a common data type in many programming languages, and are used to
represent textual information. You have already seen common examples of string searching. For example,
finding words or phrases in documents involves searching one sequence of characters (i.e., the document) to
find instances of another sequence of characters (the word or phrase to be found). Similarly, Web search
engines such as Google need to count instances of key words in documents in order to rank pages.
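To make the idea of counting instances concrete, here is a minimal sketch that uses Python's built-in string method find. The function name count_occurrences is an assumption made for this illustration, and the sketch counts overlapping instances; it is one possible approach, not a required one.

```python
def count_occurrences(document, word):
    """Count occurrences of word in document, including overlapping ones."""
    if not word:
        return 0  # an empty search string is treated as having no matches
    count = 0
    start = document.find(word)  # find returns -1 when there is no match
    while start != -1:
        count += 1
        # Resume the search one character past the previous match.
        start = document.find(word, start + 1)
    return count
```

For instance, count_occurrences("the cat and the hat", "the") finds the word at two positions in the document.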
Matching strings: a biological perspective
String matching is also very valuable in less obvious settings, such as biology. A common problem in modern
biology is to understand the structure of DNA molecules, and the role of specific structures in determining the
function of the molecule. A DNA sequence is commonly represented as a sequence of nucleotides, each one of four
types: adenine (A), cytosine (C), guanine (G), or thymine (T). Hence a DNA molecule or strand is represented by
a string composed of elements from an alphabet of only four symbols. For example, the string
AAACAACTTCGTAAGTATA represents a particular strand of DNA.
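Because a strand is just a string over a four-letter alphabet, ordinary string operations apply to it directly. The sketch below checks that a string uses only the four nucleotide symbols and tallies how often each appears. The names NUCLEOTIDES, is_dna, and base_counts are assumptions made for this illustration.

```python
NUCLEOTIDES = "ACGT"  # the four-symbol alphabet described above

def is_dna(strand):
    """Return True if every character of strand is one of A, C, G, T."""
    for base in strand:
        if base not in NUCLEOTIDES:
            return False
    return True

def base_counts(strand):
    """Return a dictionary mapping each nucleotide symbol to its count."""
    counts = {}
    for base in NUCLEOTIDES:
        counts[base] = strand.count(base)
    return counts
```

Applied to the example strand AAACAACTTCGTAAGTATA, is_dna returns True and the counts sum to the length of the strand.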
One way to understand the function of a particular strand of DNA (or even a sub-strand of DNA) is to match that
strand against a library of known DNA sequences (that is, sequences whose function and structure are known),
with the idea that similar structure tends to imply similar function. Simple organisms such as bacteria may have
millions of nucleotides in their DNA sequence, and a human chromosome can have on the order of
246 million bases, so any matching scheme must be very efficient in order to be useful.
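The matching idea can be sketched with a simple scan over a small library. The function name find_matches and the library layout (a dictionary from a label to a sequence string) are assumptions made for this illustration, and any sequences passed to it in testing would be made up. This straightforward scan examines every library entry, so it would not be nearly efficient enough for sequences with millions of bases.

```python
def find_matches(library, target):
    """Return the labels of library sequences that contain target.

    library is assumed to be a dictionary mapping a label to a DNA string.
    Labels are returned in sorted order so the result is predictable.
    """
    matches = []
    for name in sorted(library):
        if target in library[name]:  # substring test on strings
            matches.append(name)
    return matches
```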
In this problem set, we won’t ask you to build a practically useful tool, but hope to give you a sense of some of