MIT OpenCourseWare
http://ocw.mit.edu

6.006 Introduction to Algorithms
Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1    Introduction and Document Distance    6.006 Spring 2008

Lecture 1: Introduction and the Document Distance Problem

Course Overview

• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)
• Scalability
• Classic data structures and elementary algorithms (CLRS text)
• Real implementations in Python ⇔ Fun problem sets!
• β version of the class: feedback is welcome!

Pre-requisites

• Familiarity with Python and Discrete Mathematics

Contents

The course is divided into 7 modules, each of which has a motivating problem and problem set (except for the last module). The modules and their motivating problems are:

1. Linked Data Structures: Document Distance (DD)
2. Hashing: DD, Genome Comparison
3. Sorting: Gas Simulation
4. Search: Rubik’s Cube 2 × 2 × 2
5. Shortest Paths: Caltech → MIT
6. Dynamic Programming: Stock Market
7. Numerics: √2

Document Distance Problem

Motivation

Given two documents, how similar are they?

• Identical: easy?
• Modified or related (Ex: DNA, Plagiarism, Authorship)
• Did Francis Bacon write Shakespeare’s plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of word frequencies.

Definitions

1. Word: Sequence of alphanumeric characters. For example, the phrase “6.006 is fun” has 4 words (6, 006, is, fun), because the period is not alphanumeric and therefore separates “6” from “006”.

2. Word Frequencies: Word frequency D(w) of a given word w is the number of times it occurs in a document D. For example, the words and word frequencies for the above phrase are as below; words that do not occur in the phrase have frequency 0:

   Count:  1    0    1    1    0     1
   Word:   6    the  is   006  easy  fun

In practice, while counting, it is easy to choose some canonical ordering of words....
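
The two definitions above can be illustrated with a minimal Python sketch. This is not the course's own code: the function names get_words and word_frequencies are hypothetical, the regular expression restricts "alphanumeric" to ASCII letters and digits, and case is left untouched because the notes shown here do not say whether words are case-folded. Sorting the words alphabetically is just one possible choice of canonical ordering.

```python
import re
from collections import Counter


def get_words(text):
    """Split text into words, where a word is a maximal run of
    alphanumeric characters (assumed here: ASCII letters and digits)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def word_frequencies(text):
    """Map each word w to D(w), the number of times it occurs in text.
    A Counter reports 0 for any word that never occurs."""
    return Counter(get_words(text))


phrase = "6.006 is fun"
words = get_words(phrase)          # ['6', '006', 'is', 'fun']
freq = word_frequencies(phrase)

# One possible canonical ordering (an assumption): sort the words.
for word, count in sorted(freq.items()):
    print(word, count)

# Words absent from the phrase have frequency 0.
print(freq["the"], freq["easy"])   # 0 0
```

Using a Counter keeps the zero-frequency case implicit, which matches the table above: "the" and "easy" do not appear in the phrase, so looking them up yields 0 without storing them.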