Data Mining Process

Original Data → [Selection] → Target Data → [Preprocessing] → Preprocessed Data → [Transformation] → Transformed Data → [Data Mining] → Patterns → [Interpretation] → Knowledge

* This slide is from Prof. Mohammed J. Zaki's data mining course.
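The five steps above can be mimicked with a toy pipeline that has one function per arrow in the diagram. This is a minimal Python sketch, not code from the course. The record fields (`status`, `income`), the "drop records with missing values" cleaning rule, the 100K income cut-off, and the "count attribute-value combinations" mining step are all hypothetical choices, made only to show how each stage's output becomes the next stage's input.

```python
from collections import Counter

# Hypothetical raw records ("Original Data"); income is in thousands, None marks a missing value.
ORIGINAL = [
    {"id": 1, "status": "Single", "income": 125},
    {"id": 2, "status": "Married", "income": 100},
    {"id": 3, "status": "Single", "income": None},
    {"id": 4, "status": "Married", "income": 120},
    {"id": 5, "status": "Single", "income": 70},
]


def select(original, columns):
    """Selection: keep only the attributes relevant to the task -> Target Data."""
    return [{c: rec.get(c) for c in columns} for rec in original]


def preprocess(target):
    """Preprocessing: drop records with missing values -> Preprocessed Data."""
    return [rec for rec in target if all(v is not None for v in rec.values())]


def transform(preprocessed, cutoff=100):
    """Transformation: discretize income into hypothetical 'low'/'high' bands -> Transformed Data."""
    return [{**rec, "income": "high" if rec["income"] >= cutoff else "low"} for rec in preprocessed]


def mine(transformed):
    """Data Mining: count each attribute-value combination -> Patterns."""
    return Counter(tuple(sorted(rec.items())) for rec in transformed)


def interpret(patterns, min_count):
    """Interpretation: keep only patterns seen at least min_count times -> Knowledge."""
    return {p: n for p, n in patterns.items() if n >= min_count}


def run_pipeline(original, columns, min_count=2):
    """Chain the stages in the order shown in the diagram."""
    return interpret(mine(transform(preprocess(select(original, columns)))), min_count)


if __name__ == "__main__":
    print(run_pipeline(ORIGINAL, ["status", "income"]))
```

Running it prints `{(('income', 'high'), ('status', 'Married')): 2}`. Record 3 is lost in preprocessing, and only the "Married, high income" combination occurs often enough to survive interpretation. In a real project each stage would be far richer, but the chaining structure is the point of the sketch.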

© Tan, Steinbach, Kumar    Introduction to Data Mining    4/18/2004

What is (not) Data Mining?

What is Data Mining?
- Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
- Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)

What is not Data Mining?
- Look up a phone number in a phone directory
- Query a Web search engine for information about “Amazon”
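The contrast on this slide can be made concrete. A phone-directory lookup retrieves an answer that is already stored; grouping search results by context has to discover which documents belong together. The following is a minimal Python sketch under loud assumptions: `PHONE_DIRECTORY`, `RESULTS`, `jaccard`, and `group_by_context` are hypothetical names, the snippets and numbers are invented, and the word-overlap (Jaccard) similarity with a greedy single-pass grouping rule is a deliberately simple stand-in for real search-result clustering.

```python
# Not data mining: a direct lookup in a hypothetical phone directory.
PHONE_DIRECTORY = {"O'Brien": "555-0101", "O'Reilly": "555-0102"}

# Hypothetical result snippets for the query "Amazon".
RESULTS = [
    ("doc1", "amazon rainforest trees river brazil"),
    ("doc2", "amazon online shopping prime delivery"),
    ("doc3", "rainforest river wildlife amazon brazil"),
    ("doc4", "shopping cart amazon prime orders"),
]


def jaccard(a, b):
    """Word-overlap similarity between two sets of words (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def group_by_context(docs, threshold=0.2):
    """Greedy grouping: add each document to the first group whose seed
    document shares enough words with it; otherwise start a new group."""
    groups = []  # list of (seed word set, member ids)
    for doc_id, text in docs:
        words = set(text.lower().split())
        for seed, members in groups:
            if jaccard(words, seed) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((words, [doc_id]))
    return [members for _, members in groups]


print(PHONE_DIRECTORY["O'Brien"])  # lookup: 555-0101
print(group_by_context(RESULTS))   # [['doc1', 'doc3'], ['doc2', 'doc4']]
```

The grouping separates the rainforest snippets from the shopping snippets even though every snippet contains "amazon"; no stored answer was looked up.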

Origins of Data Mining

- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
- Traditional techniques may be unsuitable due to:
  - Enormity of data
  - High dimensionality of data
  - Heterogeneous, distributed nature of data

[Figure: Data Mining at the intersection of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems]

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, connected to Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]

Data Mining Tasks

- Prediction Methods
  - Use some variables to predict unknown or future values of other variables.
- Description Methods
  - Find human-interpretable patterns that describe the data.

From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

Data Mining Tasks...

- Classification [Predictive]
- Clustering [Descriptive]
- Association Rule Discovery [Descriptive]
- Sequential Pattern Discovery [Descriptive]
- Regression [Predictive]
- Deviation Detection [Predictive]

Classification: Definition

- Given a collection of records (training set)
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
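The training/test protocol described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper names `split_train_test`, `fit_majority_class`, and `accuracy` are hypothetical, the records are the ten rows of the classification example on the next slide, and the "model" is a deliberately trivial majority-class predictor standing in for a real learner. Only the shape of the protocol (build on the training set, measure accuracy on the held-out test set) comes from the slide.

```python
import random
from collections import Counter

# Records from the classification example: (Refund, Marital Status, Taxable Income in K, Cheat).
RECORDS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]


def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def fit_majority_class(train):
    """Placeholder learner: always predict the most common class in the training set."""
    majority = Counter(rec[-1] for rec in train).most_common(1)[0][0]
    return lambda attributes: majority


def accuracy(model, test):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for rec in test if model(rec[:-1]) == rec[-1])
    return correct / len(test)


train, test = split_train_test(RECORDS)
model = fit_majority_class(train)            # built only from the training set
print(len(train), len(test), accuracy(model, test))  # validated on the held-out test set
```

With ten records and `test_fraction=0.3`, seven records train the model and three are held out. Because the placeholder ignores every attribute, its test accuracy is a baseline that a real classifier would be expected to beat.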

Classification Example

Training records (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Record to classify:

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
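One way to fill in the "?" is to fit a model to the ten training records and apply it to the new record. The sketch below uses Gaussian naive Bayes purely as an illustration; the slide does not say which classifier to use. Modelling Taxable Income with a per-class normal distribution (sample variance), and the names `fit_naive_bayes`, `normal_pdf`, and `predict`, are assumptions of this example.

```python
import math
from collections import Counter, defaultdict

# Training records from the slide: (Refund, Marital Status, Taxable Income in K, Cheat).
TRAIN = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]


def fit_naive_bayes(rows):
    """Estimate P(class), P(categorical value | class), and a per-class normal
    distribution (mean, sample variance) for the continuous income attribute."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    incomes = defaultdict(list)          # class -> incomes seen with that class
    for refund, status, income, label in rows:
        value_counts[(label, 0)][refund] += 1
        value_counts[(label, 1)][status] += 1
        incomes[label].append(income)
    normals = {}
    for label, xs in incomes.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
        normals[label] = (mean, var)
    return class_counts, value_counts, normals, len(rows)


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(model, refund, status, income):
    """Return the class with the largest naive Bayes score, plus all scores."""
    class_counts, value_counts, normals, n = model
    scores = {}
    for label, count in class_counts.items():
        score = count / n                                   # P(class)
        score *= value_counts[(label, 0)][refund] / count   # P(Refund | class)
        score *= value_counts[(label, 1)][status] / count   # P(Marital Status | class)
        score *= normal_pdf(income, *normals[label])        # density of income given class
        scores[label] = score
    return max(scores, key=scores.get), scores


model = fit_naive_bayes(TRAIN)
label, scores = predict(model, "No", "Single", 75)
print(label)  # → No
```

For the record (No, Single, 75K) the score for Cheat = No (about 6.8 × 10⁻⁴) exceeds the score for Cheat = Yes (about 1.8 × 10⁻⁴), so the sketch predicts No. It applies no smoothing, so a categorical value never seen with a class drives that class's score to zero.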