Data mining - Overview
Discovering meaningful patterns from large quantities of data
Process of maximizing the value of business data
Models
Input
x
MODEL
Output
y
y = f (x)
What is a model?
y = w1·x1 + w2·x2 + … + wk·xk
IF [(age < 35) and ($30K < income < $
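A model is simply a function y = f(x). A minimal sketch of the two model forms above, a linear model and an IF-rule classifier (the weights, thresholds, and labels are illustrative, not from the lecture):

```python
# Two model forms from the slide: a linear model y = w1*x1 + ... + wk*xk,
# and a single IF-rule classifier. Weights and thresholds are made up.

def linear_model(weights, x):
    """Weighted sum of inputs: y = w1*x1 + w2*x2 + ... + wk*xk."""
    return sum(w * xi for w, xi in zip(weights, x))

def rule_model(age, income):
    """One IF-rule, e.g. IF (age < 35) AND (income > $30K) THEN 'buyer'."""
    if age < 35 and income > 30_000:
        return "buyer"
    return "non-buyer"

y = linear_model([0.5, 2.0], [10, 3])      # 0.5*10 + 2.0*3 = 11.0
label = rule_model(age=30, income=40_000)  # "buyer"
```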

Text Mining
Extracting information from normal text
unstructured data
(Pre) processing text documents for mining
Information Retrieval techniques
Unstructured data
- not directly amenable to regular data-mining
news reports, articles
web content- pages
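Before mining, unstructured text must be pre-processed into countable terms. A minimal sketch (the stopword list is illustrative): lowercase, tokenize, drop stopwords, count term frequencies.

```python
# Minimal text pre-processing for mining unstructured documents:
# lowercase, tokenize, drop stopwords, count term frequencies.
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "to"}  # illustrative list

def preprocess(doc):
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS]

doc = "The market fell; analysts in the market expect a rebound."
tf = Counter(preprocess(doc))
```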

G = (AL - AU)/AU
Normalize each variable to remove scale effects: divide by the std. deviation
(may subtract the mean first)
Normalization (= standardization) is usually performed in PCA; otherwise
measurement units affect the results
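The standardization described above (subtract the mean, divide by the standard deviation) can be sketched as:

```python
# Standardization (z-scores) before PCA: subtract the mean and divide by
# the standard deviation, so measurement units do not dominate the result.
from statistics import mean, pstdev

def standardize(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

z = standardize([2.0, 4.0, 6.0])  # mean 4, population std ~1.633
```

After standardization every variable has mean 0 and standard deviation 1.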
w_kj(t+1) = w_kj(t) + η·δ_k·h_j + α·(w_kj(t) − w_kj(t−1))

Wordnet and SentiWordnet
Wordnet
- lexical database that groups words into synsets (synonym sets)
- each synset defines a distinct concept
- synset definition, usage examples
- relations among synsets and members
Nouns, verbs, adverbs, adjectives (lexical categories)

Collaborative filtering
Social filtering of information to find out what a user
may be interested in (wisdom of the crowd)
Recommender systems
Recommend items (books, CDs, movies, web sites, …) likely to be of
interest to a user
Compares data on the user
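A user-based collaborative-filtering sketch: find the user most similar to the target user and recommend items that neighbour liked. The ratings matrix and item names are illustrative; similarity here is cosine similarity over co-rated items (a common design choice).

```python
# User-based collaborative filtering sketch (toy ratings data).
from math import sqrt

ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob":   {"book_a": 5, "book_b": 3, "book_d": 5},
    "carol": {"book_a": 1, "book_b": 5, "book_c": 2},
}

def cosine_sim(u, v):
    """Cosine similarity computed over items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (sqrt(sum(u[i] ** 2 for i in common))
           * sqrt(sum(v[i] ** 2 for i in common)))
    return num / den

def recommend(user):
    # Most similar other user...
    best = max((u for u in ratings if u != user),
               key=lambda u: cosine_sim(ratings[user], ratings[u]))
    # ...and the items they rated that the target user has not seen.
    return [i for i in ratings[best] if i not in ratings[user]]

recs = recommend("alice")  # bob is most similar; he also rated book_d
```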

[Artificial] Neural Networks
Computation based on Biological Neural Net
A class of powerful, general-purpose tools
Prediction
Classification
Clustering
Computerized Neural Nets
Predicting time-series in financial world
Diagnosing medical conditions
Identi
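The computation in a feed-forward neural net is a chain of weighted sums passed through an activation function. A minimal forward-pass sketch with one hidden layer and sigmoid activations (the weights are illustrative, not trained):

```python
# Forward pass of a tiny feed-forward neural net: one hidden layer,
# sigmoid activations. Weights are illustrative, not trained.
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def forward(x, w_hidden, w_out):
    # Each hidden unit: sigmoid of a weighted sum of the inputs.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    # Output unit: sigmoid of a weighted sum of the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

y = forward([1.0, 0.5],
            w_hidden=[[0.2, -0.4], [0.7, 0.1]],
            w_out=[1.0, -1.0])
```

Training would then adjust the weights (e.g. by backpropagation) to reduce prediction error.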

Association Rules
Things, events, etc. that occur together
customers who bought X also bought Y
what symptoms go with what diagnosis
Transaction-based or event-based
Set of associations (related items)
Rules pertaining to the associations
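The two basic measures for a rule X → Y ("customers who bought X also bought Y") are support and confidence. A sketch over toy transactions:

```python
# Support and confidence of an association rule X -> Y over a toy
# set of market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """P(rhs in basket | lhs in basket) = support(lhs+rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

sup = support({"bread", "milk"})        # 2 of 4 transactions
conf = confidence({"bread"}, {"milk"})  # 2 of the 3 bread transactions
```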

Boosted trees
Boosting: Foundations and Algorithms by Robert Schapire, Yoav Freund
(https://mitpress.mit.edu/sites/default/files/titles/content/boosting_foundations_algorithms/toc.html)
A Short Introduction to Boosting by Y Freund, R Schapire
(http://www.si

Decision Trees
Residence <> NY -> Non-Buyer
Residence = NY and Age < 35 -> Buyer
Residence = NY and Age >= 35 -> Non-Buyer
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
(independent variables)
Partitions the predictor space into multiple simple regions
The output is a decision tree or a set of rules

Data
- data exploration, transformations
- data reduction - PCA
Steps in Data Mining
1. Define/understand purpose
2. Obtain data (may involve random sampling)
3. Explore, clean, pre-process data
4. Reduce the data; if supervised DM, partiti
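The last step above mentions partitioning the data for supervised data mining. A minimal random train/validation split sketch (the 60/40 split and seed are assumptions for illustration):

```python
# Random partition of records into training and validation sets,
# as used in supervised data mining. Split fraction is illustrative.
import random

def partition(records, train_frac=0.6, seed=42):
    rng = random.Random(seed)   # fixed seed for reproducibility
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, valid = partition(list(range(10)))
```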

A User's Guide to Support Vector Machines
Asa Ben-Hur
Department of Computer Science
Colorado State University
Jason Weston
NEC Labs America
Princeton, NJ 08540 USA
Abstract
The Support Vector Machine (SVM) is a widely used classifier. And yet, obtaining t

Classification performance evaluation
Why evaluate?
Multiple methods are available to classify or predict
For each method, multiple choices are available for
settings
To choose the best model, we need to assess each
model's performance
Does performance measure
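The most common starting point for assessing a classifier is the confusion counts (true/false positives and negatives) and accuracy. A sketch on toy labels:

```python
# Classifier evaluation sketch: confusion counts and accuracy from
# predicted vs. actual labels (toy data).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives
accuracy = (tp + tn) / len(actual)
```

From the same counts one can derive sensitivity (tp / (tp + fn)) and specificity (tn / (tn + fp)).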

Distance and Similarity Measures
Bamshad Mobasher
DePaul University
Distance or Similarity Measures
Many data mining and analytics tasks involve comparing
objects and determining their similarities (or
dissimilarities)
Clustering
Near
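Two of the most common measures, Euclidean distance and cosine similarity, can be sketched as:

```python
# Two common comparison measures: Euclidean distance (dissimilarity)
# and cosine similarity (angle between vectors, ignores magnitude).
from math import sqrt

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den

d = euclidean([0, 0], [3, 4])  # classic 3-4-5 triangle: 5.0
s = cosine([1, 0], [2, 0])     # same direction: 1.0
```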

Logistic regression
and some linear regression basics
Binary dependent variable: Odds
p: probability of an event occurring
1-p: probability of the event not occurring
Odds = p/(1-p)
Odds of winning of 1:3 => 1 win per 3 losses
Odds = 0.25/(1 - 0.25) = 1/3
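The odds and the log-odds (logit) transformation at the heart of logistic regression can be sketched as:

```python
# Odds and log-odds (logit) of a probability p, and the logistic
# function that maps log-odds back to a probability.
from math import log, exp

def odds(p):
    return p / (1 - p)

def logit(p):
    return log(odds(p))

def inv_logit(z):
    """Logistic (sigmoid) function: inverse of the logit."""
    return 1 / (1 + exp(-z))

o = odds(0.25)              # 1/3, i.e. odds of 1:3
p = inv_logit(logit(0.25))  # round-trips back to 0.25
```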

Naïve Bayes classifier
data-driven, not model-driven
Example
Auditing: interested in whether a fraudulent financial report was submitted (Y)
Other information: whether legal charges were filed against the company (X)
Legal charges (x = 1) vs. no legal charges (x = 0)
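The estimate P(fraud | legal charges) comes straight from the counts via Bayes' rule. A sketch with illustrative counts (not the lecture's actual table):

```python
# Estimate P(Y | X) from a contingency table of counts via Bayes rule:
# P(Y=y | X=x) = P(y, x) / P(x). Counts below are illustrative.
counts = {  # (y, x): number of firms
    ("fraud", "charges"): 50,  ("fraud", "no_charges"): 50,
    ("ok",    "charges"): 180, ("ok",    "no_charges"): 720,
}
total = sum(counts.values())

def p_y_given_x(y, x):
    p_joint = counts[(y, x)] / total
    p_x = sum(counts[(yy, x)] for yy in ("fraud", "ok")) / total
    return p_joint / p_x

p = p_y_given_x("fraud", "charges")  # 50 of the 230 firms with charges
```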

Kernel methods, SVM
Consider ridge regression
We want to learn f(x) = wᵀx
Obtain w as
w = argmin_w (1/2) Σ_{r=1..n} (y_r − wᵀx_r)² + (λ/2)·‖w‖²
(for the r-th training example (x_r, y_r))
Notation: X is a matrix, x is a vector
Solve by setting derivatives to zero
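Setting the derivative to zero leads to the dual form: write w = Σ_r α_r x_r and solve (K + λI)α = y, where K is the kernel (Gram) matrix. A sketch with a linear kernel on a tiny illustrative dataset (λ and the data are assumptions; the solver is plain Gaussian elimination):

```python
# Kernel ridge regression sketch (linear kernel): solve the dual system
# (K + lam*I) alpha = y, then predict f(x) = sum_r alpha_r * k(x_r, x).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a_mat, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a_mat)
    m = [row[:] + [bi] for row, bi in zip(a_mat, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

X = [[1.0], [2.0], [3.0]]  # toy training inputs
y = [2.1, 3.9, 6.0]        # noisy targets for y ~ 2x
lam = 0.1

K = [[dot(xi, xj) for xj in X] for xi in X]  # linear kernel matrix
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(len(X))]
     for i in range(len(X))]
alpha = solve(A, y)

def predict(x_new):
    return sum(a * dot(xi, x_new) for a, xi in zip(alpha, X))

pred = predict([2.0])  # should be close to 4
```

Replacing `dot` with a nonlinear kernel (e.g. RBF) gives a nonlinear regressor with no other change, which is the point of the kernel trick.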

Memory-based reasoning
Nearest neighbors methods
K-nearest neighbors
Predict unknown values for a case based on similarity
with K most similar cases
Reasoning by analogy
Collaborative Filtering
Use preferences in addition to similarity with past
cases
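A k-nearest-neighbours sketch: predict the label of a new case by majority vote among the k most similar (closest) training cases. The toy data and k = 3 are illustrative.

```python
# k-nearest-neighbours classification: majority vote among the k
# closest training cases (Euclidean distance, toy 2-D data).
from math import sqrt
from collections import Counter

train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"),
         ([4.0, 4.0], "B"), ([4.2, 3.9], "B"), ([3.8, 4.1], "B")]

def knn_predict(x, k=3):
    def dist(case):
        return sqrt(sum((a - b) ** 2 for a, b in zip(case[0], x)))
    neighbours = sorted(train, key=dist)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

label = knn_predict([4.0, 3.8])
```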

Clustering
Clustering
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to objects in other clusters
Cluster analysis
Grouping a set of data objects into clusters
Unsupervised classification: no predefined classes
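The classic clustering algorithm along these lines is k-means: alternate between assigning each object to its nearest centroid and recomputing centroids. A one-dimensional sketch with k = 2 (the data and starting centroids are illustrative; the sketch ignores the empty-cluster edge case):

```python
# k-means sketch (k=2, one dimension): alternate assignment and
# centroid-update steps. Toy data; empty clusters are not handled.
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            nearest = min(range(2), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 9.0, 9.5, 10.0],
                                centroids=[0.0, 5.0])
```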

Decision Trees
Residence <> NY -> Non-Buyer
Residence = NY and Age < 35 -> Buyer
Residence = NY and Age >= 35 -> Non-Buyer
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
(independent variables)
The output is a decision tree or a set of rules
Example:
Goal: classify a
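The example tree above translates directly into nested IF-rules. A sketch, with the leaf assignments as read from the slide's diagram (an assumption, since the diagram is partly garbled):

```python
# The example decision tree as explicit rules: split first on
# Residence, then (for NY residents) on Age.
def classify(residence, age):
    if residence != "NY":
        return "Non-Buyer"
    if age < 35:
        return "Buyer"
    return "Non-Buyer"

label = classify("NY", 30)
```

This one-to-one mapping between paths and rules is why trees are valued for interpretability.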

Random Forests
Bias, variance
error = bias² + variance
Bias: ability of a technique to
accurately model the problem
Variance: different accuracies
with different training data
Bias-variance tradeoff
(lower variance models have
high bias, and vice versa)
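Random forests attack the variance term by averaging many trees. A small simulation sketch of that idea: the mean of several independent noisy predictors varies far less than any single one (toy numbers, fixed seed; real trees are only partly independent, so the reduction is smaller in practice).

```python
# Variance reduction by averaging, the idea behind bagging and random
# forests. Each "predictor" is the true value plus Gaussian noise.
import random
from statistics import pvariance

rng = random.Random(0)   # fixed seed for reproducibility
TRUE_VALUE = 10.0

def noisy_prediction():
    return TRUE_VALUE + rng.gauss(0, 1)

single = [noisy_prediction() for _ in range(500)]
ensemble = [sum(noisy_prediction() for _ in range(25)) / 25
            for _ in range(500)]

var_single = pvariance(single)      # about 1
var_ensemble = pvariance(ensemble)  # about 1/25
```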
