lec7 - CS 6093 Lecture 7 Spring 2011 Basic Data Mining Cong...


CS 6093 Lecture 7 Spring 2011
Basic Data Mining
Cong Yu
03/21/2011

Announcements
• No regular office hour next Monday (March 28th)
  – Office hours next week will be Tuesday through Thursday, by appointment only
• I will be out of town from April 1st to April 16th
  – Sporadic email access; please plan accordingly if you need to discuss your projects with me
• Midterm reports will be graded soon
  – We are aiming for the end of the week
• Next week's quiz will be based on today's lecture
  – It will be closed notes
• Any questions on the projects?

Today's Outline
• Overview of Data Mining
  – What, Why, How
• Classic Studies
  – Association Rule Mining
  – Data Cube Analysis
  – Rule Interestingness

What is Data Mining?
• You are familiar with data/information retrieval
  – Querying a database using SQL
  – Searching the Web via keyword queries
• Data mining is NOT data retrieval
• Data mining = discovering hidden and useful knowledge from large amounts of data
  – Hidden: you can't easily write a query to fetch what you want … because you don't even know what you want
  – Interesting: not every piece of hidden knowledge is useful … trivial discoveries can overwhelm the user
  – Large amounts: simple techniques are no longer sufficient … efficient and scalable techniques are needed

Examples of Data Mining Results
• Rules and patterns
  – Customers who buy Harry Potter books often buy Twilight books
  – Users in NYC tend to search for expensive restaurants on Valentine's Day
• Clusters and classification
  – TV viewers who watch 2+ hours of cable news every day can be divided into three groups: CNN, MSNBC, and Fox
  – Given a viewer, predict which group s/he falls into (for advertising purposes)

Why Study Data Mining?
• Lots of data (Amazon, Walmart, Citibank, etc.)
• Opportunities for:
  – Purchase recommendation
  – Credit card fraud detection
• Challenges:
  – Detecting hidden information that is beyond what human eyes can see
  – Efficiency and scalability
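The first "Rules and patterns" example is an association rule of the form {Harry Potter} → {Twilight}. Two measures commonly used to judge such a rule are support (the fraction of all transactions containing every item in the rule) and confidence (among transactions containing the left-hand side, the fraction that also contain the right-hand side). The Python sketch below computes both on a made-up basket dataset; the transactions and the helper names `count_containing`, `support`, and `confidence` are hypothetical illustrations, not taken from the lecture, whose later slides may use different notation.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical toy data: each transaction is the set of books one customer bought.
TRANSACTIONS: List[Itemset] = [
    frozenset({"Harry Potter", "Twilight"}),
    frozenset({"Harry Potter", "Twilight", "Hunger Games"}),
    frozenset({"Harry Potter"}),
    frozenset({"Twilight", "Hunger Games"}),
    frozenset({"Harry Potter", "Twilight"}),
]


def count_containing(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that contain every item in `itemset` (subset test)."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    if not transactions:
        return 0.0
    return count_containing(itemset, transactions) / len(transactions)


def confidence(lhs: Itemset, rhs: Itemset, transactions: List[Itemset]) -> float:
    """Estimate of P(rhs | lhs) = count(lhs ∪ rhs) / count(lhs), computed from counts."""
    lhs_count = count_containing(lhs, transactions)
    if lhs_count == 0:
        return 0.0
    return count_containing(lhs | rhs, transactions) / lhs_count


if __name__ == "__main__":
    lhs, rhs = frozenset({"Harry Potter"}), frozenset({"Twilight"})
    # 3 of the 5 baskets contain both books.
    print(f"support    = {support(lhs | rhs, TRANSACTIONS):.2f}")
    # 3 of the 4 Harry Potter baskets also contain Twilight.
    print(f"confidence = {confidence(lhs, rhs, TRANSACTIONS):.2f}")
```

On this toy data the rule has support 0.60 and confidence 0.75. Confidence alone can mislead when the right-hand side is popular anyway: Twilight appears in 4 of the 5 baskets (0.80), so buying Harry Potter does not actually raise the chance of buying Twilight here. That gap is one motivation for the rule-interestingness measures listed in the outline.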
How Data Mining Became a Field
• Started within the database systems community
  – OLAP instead of OLTP
  – OLTP: online transaction processing
    • ATM transactions, shopping transactions, etc.
  – OLAP: online analytical processing
    • Business intelligence, business reporting
• Heavily influenced by
  – Machine learning
  – Statistics
• More recently
  – Information retrieval
  – Recommendation systems

How is Data Mining Done?
• Descriptive (closer to databases):
  – Classic topics:
    • Association rule mining
    • Frequent pattern mining
    • Data cube analysis
  – Clustering
    • Group similar data points together and separate dissimilar ones
  – Anomaly detection
    • Detect data points that deviate significantly from the others
• Predictive (closer to statistics and machine learning):
  – Classification
    • Predict which label to assign to a data point based on its features
  – Regression analysis
    • Predict the value of a dependent variable (e.g., sales) from the independent variables (e.g., time and location)
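To make the descriptive/predictive split on the "How is Data Mining Done?" slide concrete, the Python sketch below pairs one technique from each side: anomaly detection that flags values whose z-score (distance from the mean, in standard deviations) exceeds a threshold, and simple least-squares linear regression that predicts sales from a single variable (time). Both are common textbook baselines chosen for illustration, not methods prescribed by the lecture; the data, the threshold of 2.0, and the function names `zscore_anomalies` and `fit_line` are hypothetical.

```python
import statistics
from typing import List, Tuple


def zscore_anomalies(values: List[float], threshold: float = 2.0) -> List[float]:
    """Descriptive: return values more than `threshold` population std devs from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Predictive: ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Hypothetical daily card charges for one customer; 200 is the outlier.
    charges = [20, 22, 19, 21, 23, 20, 18, 200]
    print("anomalies:", zscore_anomalies(charges))

    # Hypothetical monthly sales (month number -> units sold).
    months = [1, 2, 3, 4, 5]
    sales = [10, 12, 14, 16, 18]
    intercept, slope = fit_line(months, sales)
    print(f"fitted: sales = {intercept:.1f} + {slope:.1f} * month")
    print(f"predicted sales for month 6: {intercept + slope * 6:.1f}")
```

The anomaly detector only summarizes the data it is given, while the fitted line is used to predict an unseen value (20.0 units for month 6). With only eight points, no single value can have a population z-score above √7 ≈ 2.65, which is why the sketch uses a threshold of 2.0 rather than the often-quoted 3.0.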

This note was uploaded on 01/22/2012 for the course CS 6093 taught by Professor Diaz during the Spring '11 term at NYU Poly.
