class_11_12 - Statistical Data Mining ORIE 474 Spring 2007...


Statistical Data Mining
ORIE 474, Spring 2007
Tatiyana Apanasovich
11/12/07

Memory-based Reasoning (MBR)

The human ability to reason from experience depends on the ability to recognize appropriate examples from the past. One first identifies similar cases from experience and then applies knowledge of those cases to the problem at hand.

A database of known records is searched to find preclassified records similar to a new record. These neighbors are used for classification and estimation.

Applications of MBR: fraud detection, customer response prediction, medical treatment.

Memory-based Reasoning (MBR)

MBR does not care about the format of the records. It only cares about the existence of two operations: a distance function and a combination function.

Classifying new records can require processing all the historical records to find the most similar neighbors, a more time-consuming process than applying an already trained neural network (NN) or decision tree.

Challenges of MBR

Choosing an appropriate set of training records
Choosing the most efficient way to represent the training records
Choosing the distance function, the combination function, and the number of neighbors

Memory-Based Reasoning Example

Estimate apartment rent at Tuxedo, NY
Neighbors selected on population and median home value
Variables & distance metric are crucial!
Combine:
  May use average of averages (using midpoint of ranges)
  May use average of midpoints of most common ranges (for Shelter Island it is $875, for North Salem it is $1,250, so the average is $1,062.50)
  May use average of medians (for Shelter Island it is $804 and for North Salem it is $1,150, so the average is $977)
In Tuxedo: midpoint of most common range is $1,250, median is $907

Memory-Based Reasoning Example

Tuxedo data

Performance of MBR

The performance of MBR depends on how the training set is represented. A random sample may not provide sufficient coverage for all values.
Some categories are more frequent than others and the...
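
The two operations MBR relies on, a distance function and a combination function, can be illustrated with a minimal Python sketch of the Tuxedo rent estimate. This is not the lecture's implementation. The function names (`euclidean`, `mean`, `mbr_estimate`), the scaled feature values for each town, and the third town "Town C" with its rent are all hypothetical, chosen only so that Shelter Island and North Salem come out as the two nearest neighbors. The rent figures for Shelter Island and North Salem are the ones quoted in the slides.

```python
import math

def euclidean(a, b):
    """Distance function: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(values):
    """Combination function: plain average of the neighbors' target values."""
    return sum(values) / len(values)

def mbr_estimate(new_features, training, k, distance=euclidean, combine=mean):
    """Estimate a target value for a new record from its k nearest neighbors.

    Every historical record is scanned, which is why MBR can be slower at
    prediction time than applying an already trained model.
    """
    ranked = sorted(training, key=lambda rec: distance(new_features, rec[0]))
    neighbors = ranked[:k]
    return combine([target for _, target in neighbors])

# Features are (population, median home value), rescaled to [0, 1].
# All feature values below are HYPOTHETICAL placeholders.
tuxedo = (0.30, 0.60)

# Targets: median rents from the slides for the two real towns.
median_rents = [
    ((0.25, 0.65), 804.0),   # Shelter Island (rent from the slides)
    ((0.35, 0.55), 1150.0),  # North Salem (rent from the slides)
    ((0.90, 0.10), 600.0),   # "Town C": hypothetical town and rent
]

# Targets: midpoints of the most common rent range from the slides.
range_midpoints = [
    ((0.25, 0.65), 875.0),   # Shelter Island
    ((0.35, 0.55), 1250.0),  # North Salem
    ((0.90, 0.10), 600.0),   # "Town C": hypothetical
]

estimate = mbr_estimate(tuxedo, median_rents, k=2)
midpoint_estimate = mbr_estimate(tuxedo, range_midpoints, k=2)

print(estimate)           # average of medians
print(midpoint_estimate)  # average of midpoints of most common ranges
```

With `k=2` the sketch reproduces the two averages given in the slides ($977 and $1,062.50), which can be compared with the Tuxedo figures of $907 (median) and $1,250 (midpoint of most common range). Swapping in a different `distance` or `combine` function, or a different `k`, changes the estimate, which is why those choices are listed among the challenges of MBR.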

This note was uploaded on 12/23/2009 for the course ORIE 474 at Cornell University (Engineering School).

