CSC 30155
Wednesday 20/10/10
Dr. Daniel Hughes
daniel.hughes@xjtlu.edu.cn

Plan for Today
- Review of the LCS Problem (20 minutes)
- A Dynamic LCS Algorithm (30 minutes)
- LCS questions (40 minutes)
- Feedback (10 minutes)

The Longest Common Subsequence Problem

Supporting Reading
- Optional reading:
  - Cormen et al., Introduction to Algorithms, MIT Press, 2001, Chapter 15: Longest Common Subsequence Problem (15.4)

Problem Definition
- A subsequence of a given sequence is the given sequence with zero or more elements left out.
- Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
- In the longest common subsequence problem we are given two sequences X and Y and wish to find a maximum-length common subsequence (or LCS) of X and Y.

Comparing to the Longest Common Substring Problem
- A string is a sequence of characters. However, a substring is not synonymous with a subsequence.
- Substrings are consecutive parts of a string, while subsequences need not be.
- From now on, when using the term LCS, we are referring to the Longest Common Subsequence Problem.

Example
- (b) is a substring and a subsequence of (a):
  (a) bababc
  (b) babc
  (c) aabc
- (c) is a subsequence of (a) but not a substring:
  (a) bababc
  (b) babc
  (c) aabc

A Key Application: Genetic Identification (1/2)
- DNA may be viewed as a text written in the alphabet (A, C, G, T).
- Evolution occurs due to a combination of DNA mutation and natural selection.
- Mutations include:
  - Point mutations: replacement of a character.
  - Insertion mutations: addition of some characters.
  - Deletion mutations: deletion of some characters.

A Key Application: Genetic Identification (2/2)
- So if we want to identify the closest genetic relative of an unknown species, we can:
  - Take a sample of its DNA.
  - Run LCS against known species.
  - The one with the longest LCS is probably the closest relative.

Why We Need an Efficient LCS Algorithm
- While computers grow more powerful, the quantity of information that we want to work with also grows.
- The Human Genome has 3 billion base pairs while the Wheat Genome has 17 billion base pairs.
- If we want to use LCS to identify a species of wheat, our algorithm will have to be very fast.

The Brute-Force Approach
- Yesterday you worked on a BRUTE-FORCE algorithm to solve this problem.
- The algorithm took two strings, X and Y, as parameters and:
  - Enumerated all subsequences of X.
  - Checked whether each subsequence appears in Y.
  - Returned the longest matching subsequence.

Let's Look at the Performance of Some Implementations:
- Java implementations will be uploaded at the end of the week....
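
The brute-force steps listed above (enumerate the subsequences of X, check each against Y, keep the longest match) can be sketched roughly as follows. This is only an illustrative sketch, not the course's own Java implementation: the class name BruteForceLcs, the method names, and the 20-character limit on X are assumptions made for this example. Each subsequence of X is represented by a bitmask over its positions, so a string of length n produces 2^n candidates, which is why the approach becomes impractical so quickly.

```java
// Illustrative sketch only; names and the length limit are assumptions.
final class BruteForceLcs {

    // Assumed cap on |X|: 2^20 candidates is about as far as brute force is comfortable.
    static final int MAX_LENGTH = 20;

    // True if candidate can be obtained from text by leaving out zero or more characters.
    static boolean isSubsequence(String candidate, String text) {
        int matched = 0;
        for (int i = 0; i < text.length() && matched < candidate.length(); i++) {
            if (text.charAt(i) == candidate.charAt(matched)) {
                matched++;
            }
        }
        return matched == candidate.length();
    }

    // Returns one longest common subsequence of x and y (there may be several).
    static String lcs(String x, String y) {
        if (x.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("x is too long for brute force");
        }
        String best = "";
        int total = 1 << x.length();
        for (int mask = 0; mask < total; mask++) {
            // Skip candidates that cannot beat the best match found so far.
            if (Integer.bitCount(mask) <= best.length()) {
                continue;
            }
            // Build the subsequence of x selected by the set bits of mask.
            StringBuilder candidate = new StringBuilder();
            for (int i = 0; i < x.length(); i++) {
                if ((mask & (1 << i)) != 0) {
                    candidate.append(x.charAt(i));
                }
            }
            if (isSubsequence(candidate.toString(), y)) {
                best = candidate.toString();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Strings (a) and (c) from the Example slide.
        System.out.println(lcs("bababc", "aabc")); // prints "aabc"
    }
}
```

When several common subsequences share the maximum length, this sketch returns whichever one its enumeration order reaches first, so only the length of the result should be relied on.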
 Spring '11
 GaryLi
 Databases
