MIT OpenCourseWare
http://ocw.mit.edu

6.006 Introduction to Algorithms
Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1    Introduction and Document Distance    6.006 Spring 2008

Lecture 1: Introduction and the Document Distance Problem

Course Overview

• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)
• Scalability
• Classic data structures and elementary algorithms (CLRS text)
• Real implementations in Python
• Fun problem sets!
• β version of the class: feedback is welcome!

Prerequisites

• Familiarity with Python and Discrete Mathematics

Contents

The course is divided into 7 modules, each of which has a motivating problem and a problem set (except for the last module). The modules and their motivating problems are:

1. Linked Data Structures: Document Distance (DD)
2. Hashing: DD, Genome Comparison
3. Sorting: Gas Simulation
4. Search: Rubik's Cube 2 × 2 × 2
5. Shortest Paths: Caltech → MIT
6. Dynamic Programming: Stock Market
7. Numerics: √2

Document Distance Problem

Motivation

Given two documents, how similar are they?

• Identical - easy?
• Modified or related (Ex: DNA, Plagiarism, Authorship)
• Did Francis Bacon write Shakespeare's plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of word frequencies.

Definitions

1. Word: A sequence of alphanumeric characters. For example, the phrase "6.006 is fun" has 4 words.
2. Word Frequencies: The word frequency D(w) of a given word w is the number of times it occurs in a document D.

For example, the words and word frequencies for the above phrase are as below. Words that do not occur in the phrase ("the", "easy") have count 0.

Count:  1    0    1    1    0     1
Word:   6    the  is   006  easy  fun

In practice, while counting, it is easy to choose some canonical ordering of words....
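The two definitions above (a word as a run of alphanumeric characters, and the word frequency D(w) as an occurrence count) can be sketched in Python as follows. This is only an illustrative sketch, not the course's own code: the function name `word_frequencies` is hypothetical, the regular expression assumes ASCII letters and digits, and case is left unchanged because the definitions here do not say how case is treated.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each word in text to its number of occurrences.

    Assumption: a "word" is a maximal run of ASCII letters and digits, so
    punctuation such as the "." in "6.006" splits the text into separate words.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    return Counter(words)


# The phrase from the definitions: it splits into "6", "006", "is", "fun".
freq = word_frequencies("6.006 is fun")

# A Counter reports 0 for words that never occur, such as "the" and "easy".
for word in ["6", "the", "is", "006", "easy", "fun"]:
    print(word, freq[word])
```

Using a `Counter` keeps the sketch short, since looking up a word that never occurs returns 0 rather than raising an error, which matches the zero entries in the table of counts.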
This note was uploaded on 09/24/2010 for the course CS 6.006 taught by Professor Erik Demaine during the Spring '08 term at MIT.
