datamining-intro-IEP

datamining-intro-IEP - An Introduction to Data Mining Prof....


An Introduction to Data Mining
Prof. S. Sudarshan, CSE Dept, IIT Bombay
Most slides courtesy: Prof. Sunita Sarawagi, School of IT, IIT Bombay

Why Data Mining
- Credit ratings/targeted marketing: Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
  - Identify likely responders to sales promotions.
- Fraud detection: Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
- Customer relationship management: Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?

Data mining helps extract such information.

Data mining
The process of semi-automatically analyzing large databases to find patterns that are:
- valid: they hold on new data with some certainty
- novel: they are non-obvious to the system
- useful: it should be possible to act on the item
- understandable: humans should be able to interpret the pattern

Also known as Knowledge Discovery in Databases (KDD).

Applications
- Banking: loan/credit card approval
  - predict good customers based on old customers
- Customer relationship management:
  - identify those who are likely to leave for a competitor
- Targeted marketing:
  - identify likely responders to promotions
- Fraud detection: telecommunications, financial transactions
  - from an online stream of events, identify fraudulent events
- Manufacturing and production:
  - automatically adjust knobs when process parameters change

Applications (continued)
- Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease history: find relationships between diseases
- Molecular/Pharmaceutical: identify new drugs
- Scientific data analysis:
  - identify new galaxies by searching for subclusters
- Web site/store design and promotion:
  - find the affinity of visitors to pages and modify the layout

The KDD process
- Problem formulation
- Data collection
  - subset data: sampling might hurt if the data is highly skewed
  - feature selection: principal component analysis, heuristic search
- Pre-processing: cleaning
  - name/address cleaning, different meanings (annual, yearly), duplicate removal, supplying missing values
- Transformation:
  - map complex objects (e.g. time series data) to features (e.g. frequency)
- Choosing the mining task and mining method
- Result evaluation and visualization

Knowledge discovery is an iterative process.

Relationship with other fields
Overlaps with machine learning, statistics, artificial intelligence, databases and visualization, but with more stress on scalability in the number of features and instances...
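The KDD process slide warns that sampling might hurt when the data is highly skewed. For example, fraudulent transactions may be a tiny fraction of all records. One common remedy is stratified sampling, which samples each class separately. The slides do not say this is the method they have in mind, so treat the following as a minimal sketch. The function name `stratified_sample` and the "keep at least one record per class" rule are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_of, fraction, seed=0):
    """Hypothetical helper: draw about `fraction` of each class separately,
    so that rare classes (e.g. fraud) are not lost when subsetting skewed data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in records:
        by_class[label_of(r)].append(r)
    sample = []
    for group in by_class.values():
        # Assumption: keep at least one record from every class, however rare.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# 1000 transactions, of which only 5 are fraudulent (records are (id, is_fraud)).
records = [(i, i < 5) for i in range(1000)]
subset = stratified_sample(records, lambda r: r[1], 0.01)
```

A plain 1% random sample of these records would contain about 10 records and could easily contain no fraud cases at all. The stratified version keeps 10 normal records and 1 fraudulent one.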
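The pre-processing step lists name/address cleaning, reconciling different terms with the same meaning ("annual", "yearly"), duplicate removal and supplying missing values. The sketch below combines these steps on toy records. The synonym table `CANONICAL`, the record fields and the choice of mean imputation are all hypothetical and are not taken from the slides. Real cleaning pipelines usually need fuzzy matching rather than exact keys.

```python
from statistics import mean

# Hypothetical synonym table mapping variant terms to one canonical value.
CANONICAL = {"annual": "yearly", "yearly": "yearly", "per year": "yearly",
             "monthly": "monthly", "per month": "monthly"}

def clean(rows):
    """rows: dicts with keys 'name', 'billing', 'income' (income may be None)."""
    seen, deduped = set(), []
    for row in rows:
        # Name cleaning: collapse repeated whitespace and normalize capitalization.
        name = " ".join(row["name"].split()).title()
        # Map terms with the same meaning to a single canonical value.
        billing = CANONICAL.get(row["billing"].strip().lower(), row["billing"])
        key = (name, billing, row["income"])
        if key in seen:  # exact duplicate once normalized: drop it
            continue
        seen.add(key)
        deduped.append({"name": name, "billing": billing, "income": row["income"]})
    # Supply missing values with the mean of the known values (an assumed policy).
    known = [r["income"] for r in deduped if r["income"] is not None]
    fill = mean(known) if known else None
    for r in deduped:
        if r["income"] is None:
            r["income"] = fill
    return deduped

raw = [
    {"name": "asha  rao", "billing": "Annual", "income": 50000},
    {"name": "Asha Rao", "billing": "yearly", "income": 50000},
    {"name": "Ravi Kumar", "billing": "per month", "income": None},
    {"name": "Meena Iyer", "billing": "monthly", "income": 70000},
]
cleaned = clean(raw)
```

Deduplication happens after normalization, so "asha  rao / Annual" and "Asha Rao / yearly" are recognized as the same record.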
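The transformation step maps complex objects such as time series to features such as frequency. One way to do this is to take the magnitudes of the first few discrete Fourier transform (DFT) coefficients. This sketch assumes that particular choice and uses a direct O(n·k) DFT for clarity rather than an FFT library. The function name and the number of coefficients `k` are illustrative.

```python
import cmath
import math

def frequency_features(series, k):
    """Magnitudes of the first k DFT coefficients of a real-valued series,
    scaled by 1/n, used as a fixed-length feature vector."""
    n = len(series)
    feats = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(series))
        feats.append(abs(coeff) / n)
    return feats

# A sine wave completing 2 cycles over 16 samples has its energy at frequency 2.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
features = frequency_features(signal, 4)
shifted = frequency_features(signal[3:] + signal[:3], 4)
```

One reason to use magnitudes is that a circular time shift of the series leaves them unchanged. Two series with the same shape but different phase therefore map to the same features.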
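A pattern is called valid if it holds on new data with some certainty, and the KDD process ends with result evaluation. A standard way to check this is a holdout split: learn on one part of the data and measure on the unseen rest. The sketch below uses a one-feature threshold rule (a decision stump) as a stand-in for a credit-default model. The stump, the 70/30 split and all function names are assumptions for illustration, not the slides' method.

```python
import random

def train_stump(data):
    """data: list of (feature_value, label), label in {0, 1}.
    Pick the threshold t minimizing training errors for 'predict 1 if x >= t'."""
    best_t, best_err = None, float("inf")
    for t in sorted({x for x, _ in data}):
        err = sum((x >= t) != bool(y) for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(t, data):
    return sum((x >= t) == bool(y) for x, y in data) / len(data)

def holdout_eval(data, test_fraction=0.3, seed=0):
    """Shuffle, split into train/test, fit on train, report both accuracies."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    train, test = rows[:cut], rows[cut:]
    t = train_stump(train)
    return t, accuracy(t, train), accuracy(t, test)

# Toy data: label is 1 exactly when the feature is at least 50.
data = [(x, int(x >= 50)) for x in range(100)]
threshold, train_acc, test_acc = holdout_eval(data)
```

Training accuracy is perfect here by construction. The test accuracy is the number that indicates whether the learned rule holds on new data, and it can be lower.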