CSE-5120-Fall-2009
TOPICS in DATA MINING

What is Data Mining?

Nontrivial extraction of implicit, previously unknown, and potentially useful information from data.

From data to “information”, “knowledge”, “regularity”, “overview”, and to “automatic construction of knowledge-bases”.

Why Data Mining?

Necessity is the mother of invention.

Data explosion: automated data generation and gathering.

“We are drowning in data, but starving for knowledge”

Big benefits in data mining: Marketing and Sales, Strategic Planning, Optimization, Forecasting

What to Mine?

• Characteristic Rule
  A is a bird ⇒ A has a mother
  But a cat can also have a mother.

• Classification Rule
  A can fly ⇒ A is a bird
  A cat cannot fly
  characteristics of A - characteristics of contrasting B = classification of A

• Association Rule
  purchase(T, bread), purchase(T, butter) ⇒ purchase(T, milk)

• Clustering
  Group a set of data based on the conceptual clustering principle: maximize the intraclass similarity and minimize the interclass similarity.

• Sequences: Evolution analysis
  Data changing with time. Time series, e.g. financial data.
  When stock A increases on two consecutive days then plausibly, stock C increases the day after.

• Deviation analysis
  Detect data which does not follow the general trend of most of the other data.

Data Mining Techniques

Database-oriented, statistics, machine learning, etc.

Specification of data mining tasks

• Data sets: any sets of data in DBs.
• Kind of knowledge or form of rules to be mined.
• Background knowledge
• Interestingness measurement: e.g. significance, confidence, thresholds, etc.

Output results

• Generalized relations, rules, patterns, characteristics, forecasts, abnormalities
• Feature tables, charts, curves, and other visual outputs....
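
The association rule purchase(T, bread), purchase(T, butter) ⇒ purchase(T, milk) and the confidence and threshold measures listed under interestingness measurement can be illustrated with a small sketch. This is only one plausible reading, not the method used in the course: the transactions are made-up toy data, the function names are hypothetical, and the support measure (the fraction of transactions containing an itemset) is a standard definition assumed here, since the notes do not spell it out.

```python
from typing import FrozenSet, Iterable, Sequence

# Hypothetical toy transactions; NOT data from the notes.
TRANSACTIONS: Sequence[FrozenSet[str]] = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk", "eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "eggs"}),
]


def support(itemset: Iterable[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: Sequence[FrozenSet[str]],
) -> float:
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    lhs_support = support(lhs, transactions)
    if lhs_support == 0.0:
        return 0.0
    return support(both, transactions) / lhs_support


def passes_thresholds(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: Sequence[FrozenSet[str]],
    min_support: float,
    min_confidence: float,
) -> bool:
    """True if the rule meets both (assumed) interestingness thresholds."""
    lhs = frozenset(antecedent)
    rhs = frozenset(consequent)
    rule_support = support(lhs | rhs, transactions)
    rule_confidence = confidence(lhs, rhs, transactions)
    return rule_support >= min_support and rule_confidence >= min_confidence


if __name__ == "__main__":
    lhs, rhs = {"bread", "butter"}, {"milk"}
    print("support:", support(lhs | rhs, TRANSACTIONS))
    print("confidence:", confidence(lhs, rhs, TRANSACTIONS))
    print("interesting:", passes_thresholds(lhs, rhs, TRANSACTIONS, 0.3, 0.6))
```

On this toy data, three of the five transactions contain both bread and butter, and two of those three also contain milk, so the rule holds in two thirds of the cases where its left-hand side applies. Whether that counts as interesting depends entirely on the thresholds chosen for the mining task.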
This note was uploaded on 04/23/2010 for the course CSC CSC5120 taught by Professor Adafu during the Fall '09 term at CUHK.