# 12_LongestCommonSubsequence - Wednesday Dr Daniel Hughes...


CSC 30155 Wednesday 20/10/10 Dr. Daniel Hughes

Plan for Today

- Review of the LCS Problem (20 minutes)
- A Dynamic LCS Algorithm (30 minutes)
- LCS questions (40 minutes)
- Feedback (10 minutes)

CSC 30155 The Longest Common Subsequence Problem Dr. Daniel Hughes

Supporting Reading

Optional reading: Cormen et al., Introduction to Algorithms, MIT Press, 2001, Chapter 15 (Dynamic Programming), Section 15.4: Longest Common Subsequence.

Problem Definition

A subsequence of a given sequence is the given sequence with zero or more elements left out. Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y. In the longest common subsequence problem we are given two sequences X and Y and wish to find a maximum-length common subsequence (or LCS) of X and Y.
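
The definitions above can be checked mechanically. The following is a minimal sketch, assuming sequences are represented as Java strings; the class name, method names and example strings are illustrative and do not come from the lecture.

```java
// Minimal sketch of the definitions above; names are illustrative only.
class SubsequenceCheck {

    // Returns true if z can be obtained from x by leaving out zero or more characters.
    static boolean isSubsequence(String z, String x) {
        int matched = 0; // number of characters of z matched so far, in order
        for (int i = 0; i < x.length() && matched < z.length(); i++) {
            if (x.charAt(i) == z.charAt(matched)) {
                matched++;
            }
        }
        return matched == z.length();
    }

    // Returns true if z is a subsequence of both x and y.
    static boolean isCommonSubsequence(String z, String x, String y) {
        return isSubsequence(z, x) && isSubsequence(z, y);
    }

    public static void main(String[] args) {
        // Example strings chosen for illustration.
        System.out.println(isCommonSubsequence("BCA", "ABCBDAB", "BDCABA")); // true
        System.out.println(isCommonSubsequence("ABD", "ABCBDAB", "BDCABA")); // false
    }
}
```

The scan walks through x once, so a single check takes time proportional to the length of x.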

Comparing to the Longest Common Substring Problem

A string is a sequence of characters. However, a substring is not synonymous with a subsequence. A substring is a contiguous part of a string, while the elements of a subsequence need not be contiguous. From now on, when using the term LCS, we are referring to the Longest Common Subsequence Problem.

Example

(a) bababc
(b) babc
(c) aabc

(b) is both a substring and a subsequence of (a): it matches the last four characters of (a).

(c) is a subsequence of (a) but not a substring: it matches the characters at positions 2, 4, 5 and 6 of (a), which are not all adjacent.
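
The same example can be run as code. This is a minimal sketch under the assumption that `String.contains` is used as the substring test and a simple in-order scan as the subsequence test; the class and method names are illustrative.

```java
// Minimal sketch contrasting substrings and subsequences; names are illustrative only.
class SubstringVsSubsequence {

    // A substring must appear as one contiguous block of text.
    static boolean isSubstring(String s, String text) {
        return text.contains(s);
    }

    // A subsequence only needs its characters to appear in order, gaps allowed.
    static boolean isSubsequence(String s, String text) {
        int matched = 0;
        for (int i = 0; i < text.length() && matched < s.length(); i++) {
            if (text.charAt(i) == s.charAt(matched)) {
                matched++;
            }
        }
        return matched == s.length();
    }

    public static void main(String[] args) {
        String a = "bababc";
        String b = "babc";
        String c = "aabc";
        System.out.println(isSubstring(b, a) + " " + isSubsequence(b, a)); // true true
        System.out.println(isSubstring(c, a) + " " + isSubsequence(c, a)); // false true
    }
}
```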

A Key Application: Genetic Identification (1/2)

DNA may be viewed as a text written in the alphabet {A, C, G, T}. Evolution occurs due to a combination of DNA mutation and natural selection. Mutations include:

- Point mutations: replacement of a character.
- Insertion mutations: addition of some characters.
- Deletion mutations: deletion of some characters.

A Key Application: Genetic Identification (2/2)

So if we want to identify the closest genetic relative of an unknown species, we can:

- Take a sample of its DNA.
- Run LCS against known species.
- The one with the longest LCS is probably the closest relative.
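
The steps above can be sketched in code. This is an illustration only: it assumes the standard dynamic-programming table for LCS length described in Cormen et al., Section 15.4, as the LCS routine, and the class name, method names, species labels and DNA strings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the species labels and sequences are made up.
class ClosestRelative {

    // LCS length via the textbook dynamic-programming table (assumed routine).
    static int lcsLength(String x, String y) {
        int[][] c = new int[x.length() + 1][y.length() + 1];
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= y.length(); j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1] + 1;
                } else {
                    c[i][j] = Math.max(c[i - 1][j], c[i][j - 1]);
                }
            }
        }
        return c[x.length()][y.length()];
    }

    // Returns the name of the known species whose DNA has the longest LCS
    // with the sample, or null if no known species are given.
    static String closestRelative(String sample, Map<String, String> known) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : known.entrySet()) {
            int length = lcsLength(sample, entry.getValue());
            if (length > bestLength) {
                bestLength = length;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> known = new LinkedHashMap<>();
        known.put("speciesA", "ACGT"); // hypothetical reference sequences
        known.put("speciesB", "TTTT");
        System.out.println(closestRelative("ACGT", known)); // speciesA
    }
}
```

The table has one cell per pair of positions, so comparing a sample of length m with a reference of length n takes time and memory proportional to m times n.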

Why We Need an Efficient LCS Algorithm

While computers grow more powerful, the quantity of information that we want to work with also grows. The human genome has about 3 billion base pairs, while the wheat genome has about 17 billion base pairs. If we want to use LCS to identify a species of wheat, our algorithm will have to be very fast.

The Brute-Force Approach

Yesterday you worked on a brute-force algorithm to solve this problem. The algorithm took two strings, X and Y, as parameters and:

- Enumerated all subsequences of X.
- Checked whether each subsequence appears in Y.
- Returned the longest matching subsequence.
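
One possible shape for such a brute-force algorithm is sketched below. It is not the implementation from the exercise: it assumes X has at most 30 characters so that each subsequence can be encoded as an `int` bitmask, and the class and method names are illustrative. A string of length n has 2^n subsequences when counted by choice of positions, which is why the running time grows exponentially with the length of X.

```java
// Illustrative brute-force sketch; not the implementation from the exercise.
class BruteForceLcs {

    // Returns true if z appears in y as a subsequence.
    static boolean isSubsequence(String z, String y) {
        int matched = 0;
        for (int i = 0; i < y.length() && matched < z.length(); i++) {
            if (y.charAt(i) == z.charAt(matched)) {
                matched++;
            }
        }
        return matched == z.length();
    }

    // Enumerates every subsequence of x and keeps the longest one found in y.
    static String lcs(String x, String y) {
        int n = x.length();
        if (n > 30) { // assumption of this sketch: masks must fit in an int
            throw new IllegalArgumentException("x is too long for an int bitmask");
        }
        String best = "";
        // Bit i of mask set means "keep x.charAt(i)".
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder candidate = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    candidate.append(x.charAt(i));
                }
            }
            if (candidate.length() > best.length()
                    && isSubsequence(candidate.toString(), y)) {
                best = candidate.toString();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Example strings chosen for illustration; the result has length 4.
        System.out.println(lcs("ABCBDAB", "BDCABA"));
    }
}
```

When several common subsequences share the maximum length, this sketch returns whichever one the enumeration reaches first.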

Let's Look at the Performance of Some Implementations

Java implementations will be uploaded at the end of the week.