High Performance Computing
Solutions for Data Mining
Prof. Navneet Goyal
BITS-PILANI, PILANI CAMPUS
Topics
o Big Data
Sources
Characteristics
Management
Analytics
The Road Ahead
o The need for HPC for taming BIG DATA
o The holy grail of Programming: Perf

Support Vector Machines
Text Book Slides
Support Vector Machines
Find a linear hyperplane (decision boundary) that will separate the data
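A minimal Python sketch of what "separating the data" means. The toy points and the two candidate hyperplanes `h1` and `h2` are invented for illustration and are not taken from the slides. Both candidates separate the two classes, which is why a further criterion is needed; an SVM prefers the boundary with the largest margin.

```python
import math

# Toy 2-D data, invented for illustration: (point, label) with labels +1 / -1.
data = [((1.0, 1.0), -1), ((2.0, 1.5), -1), ((1.5, 2.0), -1),
        ((4.0, 4.0), +1), ((5.0, 4.5), +1), ((4.5, 5.5), +1)]

def decision(w, b, x):
    """Signed value of w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def separates(w, b, points):
    """True if every point lies strictly on the side matching its label."""
    return all(y * decision(w, b, x) > 0 for x, y in points)

def margin(w, b, points):
    """Smallest geometric distance from any point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision(w, b, x)) / norm for x, _ in points)

# Two hypothetical candidate boundaries, each given as (w, b).
h1 = ((1.0, 1.0), -6.0)   # the line x1 + x2 = 6
h2 = ((1.0, 0.0), -3.0)   # the line x1 = 3

for name, (w, b) in (("h1", h1), ("h2", h2)):
    print(name, separates(w, b, data), round(margin(w, b, data), 3))
```

Both lines classify the toy data correctly, but `h1` leaves more room between the boundary and the nearest points, so it would be the preferred choice under the maximum-margin criterion.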
One Possible Solution
Another possible solution
Support Vector Mac

DIMENSIONALITY REDUCTION USING PCA & SVD
Prof. Navneet Goyal
CS & IS Department
BITS, Pilani
Methods for Dimensionality Reduction
Two main methods
Feature Selection
Feature Extraction
Feature selection: Choosing k<d im

Ensemble Classifiers
Prof. Navneet Goyal
Ensemble Classifiers
Introduction & Motivation
Construction of Ensemble Classifiers
Boosting (AdaBoost)
Bagging
Random Forests
Empirical Comparison
Introduction & Motivation
Suppose that you are a patient w

Clustering
Prof. Navneet Goyal
BITS, Pilani
What is Cluster Analysis?
Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different from
(or unrelated to) the objects in other groups
Intra-cluster
dis
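One way to make the definition of cluster analysis concrete is to compare distances within groups against distances between groups. The sketch below assumes 2-D points, Euclidean distance, and a hand-made grouping named `clusters`; all three are illustrative assumptions, not part of the slides.

```python
import math
from itertools import combinations

# Hypothetical grouping of 2-D points into two clusters.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)],
}

def mean_intra(clusters):
    """Average distance between pairs of points in the same cluster."""
    d = [math.dist(p, q)
         for pts in clusters.values()
         for p, q in combinations(pts, 2)]
    return sum(d) / len(d)

def mean_inter(clusters):
    """Average distance between pairs of points in different clusters."""
    d = [math.dist(p, q)
         for a, b in combinations(clusters.values(), 2)
         for p in a for q in b]
    return sum(d) / len(d)

print("intra:", round(mean_intra(clusters), 3))
print("inter:", round(mean_inter(clusters), 3))
```

A good grouping in this sense has a small intra-cluster average and a large inter-cluster average.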

Types of Data & Data Preprocessing
Prof. Navneet Goyal
Department of Computer Science & Information Systems
BITS, Pilani
Data Preprocessing
Why preprocess the data?
Data cleaning
Data integration and transformation
Data reduction
Discretization and c

DATA MINING
Prof. Navneet Goyal
Department of Computer Science & Information Systems,
BITS, Pilani.
Importance of Data
"Data is the new 'oil' and there is a growing
need for the ability to refine it,"
Dhiraj Rajaram, founder of Mu Sigma
BIG Data!
11/01/

Classification
Prof. Navneet Goyal
BITS, Pilani
CS C415/IS C415 Data Mining
Classification & Prediction
What is Classification?
What is Prediction?
Any relationship between the two?
Supervised or Unsupervised?
Issues
Applications
Algorithms
Classifier Acc

Distance-based Classification
Prof. Navneet Goyal
BITS, Pilani
Classification: Eager & Lazy Learners
A decision tree classifier is an example of an eager learner,
because it is designed to learn a model that maps the input
attributes to the class label
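The eager/lazy distinction can be sketched in Python as follows. This is only an illustration under stated assumptions: a one-split threshold "stump" stands in for a full decision tree, a 1-nearest-neighbour rule stands in for a lazy, distance-based learner, and the class names, the `train` data, and the labels are hypothetical.

```python
import math

# Hypothetical 1-D training set: (attribute vector, class label).
train = [((1.0,), "low"), ((2.0,), "low"), ((8.0,), "high"), ((9.0,), "high")]

class EagerStump:
    """Eager: builds an explicit model (one threshold) at training time."""
    def fit(self, data):
        lows = [x[0] for x, y in data if y == "low"]
        highs = [x[0] for x, y in data if y == "high"]
        # The learned model is just this number; the data can be discarded.
        self.threshold = (max(lows) + min(highs)) / 2
        return self

    def predict(self, x):
        return "high" if x[0] > self.threshold else "low"

class LazyNearestNeighbor:
    """Lazy: stores the training data and defers all work to prediction."""
    def fit(self, data):
        self.data = list(data)
        return self

    def predict(self, x):
        # Distance to every stored example is computed only when queried.
        _, label = min(self.data, key=lambda item: math.dist(item[0], x))
        return label

query = (6.0,)
print(EagerStump().fit(train).predict(query))
print(LazyNearestNeighbor().fit(train).predict(query))
```

The eager learner pays its cost in `fit` and answers queries cheaply; the lazy learner has a trivial `fit` and does its work in `predict`.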

Bayesian Classification
Dr. Navneet Goyal
BITS, Pilani
Bayesian Classification
What are Bayesian Classifiers?
Statistical Classifiers
Predict class membership probabilities
Based on Bayes Theorem
Naive Bayesian Classifier
Computationally Simple
Comparable
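A minimal Python sketch of how class membership probabilities follow from Bayes' theorem, P(C | X) = P(X | C) P(C) / P(X). The class names, priors, and per-attribute conditional probabilities are made-up numbers for illustration, and the helper names `naive_likelihood` and `posteriors` are hypothetical.

```python
import math

def naive_likelihood(cond_probs):
    """P(X | C) under the naive assumption that attributes are
    conditionally independent: the product of the P(x_i | C)."""
    return math.prod(cond_probs)

def posteriors(priors, likelihoods):
    """P(C | X) for every class C via Bayes' theorem."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())   # P(X), the normalizing constant
    return {c: joint[c] / evidence for c in joint}

# Hypothetical two-class problem with two observed attribute values.
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {
    "yes": naive_likelihood([0.5, 0.2]),
    "no": naive_likelihood([0.25, 0.5]),
}

post = posteriors(priors, likelihoods)
predicted = max(post, key=post.get)   # class with the highest posterior
print(post, predicted)
```

The classifier predicts the class with the highest posterior; because P(X) is the same for every class, comparing the products P(X | C) P(C) would give the same decision.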

ASSOCIATION RULE MINING
Prof. Navneet Goyal
CSIS Department, BITS-Pilani
Association Rule Mining
Find all rules of the form Itemset1 → Itemset2
having:
support ≥ minsup threshold
confidence ≥ minconf threshold
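The support and confidence thresholds can be illustrated with a short Python sketch. The transactions below are a made-up toy market-basket data set, and the function names `support`, `confidence`, and `is_strong` are hypothetical helpers, not part of any library.

```python
# Toy transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def is_strong(lhs, rhs, transactions, minsup, minconf):
    """True if the rule lhs -> rhs meets both thresholds."""
    return (support(lhs | rhs, transactions) >= minsup
            and confidence(lhs, rhs, transactions) >= minconf)

lhs, rhs = {"diapers"}, {"beer"}
print(support(lhs | rhs, transactions), confidence(lhs, rhs, transactions))
print(is_strong(lhs, rhs, transactions, minsup=0.5, minconf=0.7))
```

Here the rule {diapers} → {beer} appears in 3 of 5 transactions, and 3 of the 4 transactions containing diapers also contain beer, so it passes a minsup of 0.5 and a minconf of 0.7 but would fail a minconf of 0.8.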
Brute-force approach:
List all possible association

Association Rules
Dr. Navneet Goyal
BITS, Pilani
Association Rules & Frequent Itemsets
Market-Basket Analysis
Grocery store: large number of items
Customers fill their market baskets with a subset of the items
98% of people who purchase diapers also buy beer
Used