lec1 - MIT OpenCourseWare http://ocw.mit.edu 6.006...

MIT OpenCourseWare
http://ocw.mit.edu

6.006 Introduction to Algorithms
Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1: Introduction and the Document Distance Problem

Course Overview

Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)
Scalability
Classic data structures and elementary algorithms (CLRS text)
Real implementations in Python
Fun problem sets!
β version of the class: feedback is welcome!

Pre-requisites

Familiarity with Python and Discrete Mathematics

Contents

The course is divided into 7 modules, each of which has a motivating problem and problem set (except for the last module). Modules and motivating problems are as described below:

1. Linked Data Structures: Document Distance (DD)
2. Hashing: DD, Genome Comparison
3. Sorting: Gas Simulation
4. Search: Rubik's Cube 2 × 2 × 2
5. Shortest Paths: Caltech → MIT
6. Dynamic Programming: Stock Market
7. Numerics: √2

Document Distance Problem

Motivation

Given two documents, how similar are they?

Identical: easy?
Modified or related (Ex: DNA, Plagiarism, Authorship)
Did Francis Bacon write Shakespeare's plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of word frequencies.

Definitions

1. Word: Sequence of alphanumeric characters. For example, the phrase "6.006 is fun" has 4 words.
2. Word Frequencies: Word frequency D(w) of a given word w is the number of times it occurs in a document D.

For example, the words and word frequencies for the above phrase are as below:

Count :  1    0    1    1    0     1
Word  :  6    the  is   006  easy  fun

In practice, while counting, it is easy to choose some canonical ordering of words....
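The two definitions above can be sketched in Python as follows. This is a minimal illustration and not the course's own code: the function names `get_words` and `word_frequencies` are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the six-word vocabulary list is assumed from the example table.

```python
import re
from collections import Counter


def get_words(text):
    """Return the words of text: maximal runs of alphanumeric characters.

    Assumption: 'alphanumeric' means ASCII letters and digits only.
    """
    return re.findall(r"[A-Za-z0-9]+", text)


def word_frequencies(text):
    """Map each word w to D(w), the number of times w occurs in text."""
    return Counter(get_words(text))


phrase = "6.006 is fun"
words = get_words(phrase)
freq = word_frequencies(phrase)

# Vocabulary taken from the example table; a Counter reports 0 for absent words.
vocabulary = ["6", "the", "is", "006", "easy", "fun"]
counts = [freq[w] for w in vocabulary]

print(words)   # ['6', '006', 'is', 'fun']
print(counts)  # [1, 0, 1, 1, 0, 1]

# One possible canonical ordering (an assumption): sort the words alphabetically.
print(sorted(freq.items()))
```

The period in "6.006" is not alphanumeric, so it splits the token into the two words "6" and "006", which is why the phrase has 4 words.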

This note was uploaded on 09/24/2010 for the course CS 6.006 taught by Professor Erik Demaine during the Spring '08 term at MIT.
