Artificial Neural Networks
Mitchell
Introduction
Network of neurons
Original motivation: to model biological neurons
A neuron is a simple computational unit
Computes a weighted sum of its inputs
Output is some threshold function of this sum
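The weighted sum and threshold described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `threshold_neuron`, the step threshold, and the AND weights are assumptions for the example, not something taken from the slides.

```python
def threshold_neuron(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Step (threshold) function applied to the sum.
    return 1 if weighted_sum >= threshold else 0


# Hypothetical weights and threshold that make the unit act like a logical AND.
and_weights = [1.0, 1.0]
and_threshold = 1.5
outputs = [threshold_neuron([a, b], and_weights, and_threshold)
           for a in (0, 1) for b in (0, 1)]
```

With these assumed weights the unit fires only when both inputs are 1, which shows how the choice of weights and threshold determines the function computed.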
Perceptron
Decisio
Clustering
Why cluster?
Identify patterns
Earthquakes
Crime-prone areas
Why cluster?
Recommendation engines
What do we need to cluster data?
1. Dataset of Points
2. Distance Function
Anything else?
Distance Functions
Which distance function should I use?
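This excerpt does not list specific distance functions, so the three below (Euclidean, Manhattan, Chebyshev) are common choices given here as an assumption, not the slides' own list. Which one fits depends on the data; the sketch only shows how they differ on the same pair of points.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


# Two made-up points used only for illustration.
p, q = (0, 0), (3, 4)
distances = (euclidean(p, q), manhattan(p, q), chebyshev(p, q))
```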
CSE 634
Data Mining Techniques
Presentation on Neural Network
Jalal Mahmud (105241140)
Hyung-Yeon, Gu (104985928)
Course Teacher : Prof. Anita Wasilewska
State University of New York at Stony Brook
References
Data Mining: Concepts and Techniques (Chapter 7.
Cluster Analysis
Han and Kamber
Duda, Hart and Stork
What is Cluster Analysis?
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Finding similarities between d
Supervised Learning: Classification and Regression
Han and Kamber; Witten and Frank; Mitchell
Experience
Training Data
Labeled Training Data
Classification
Possible Classifiers
Prediction or Regression
The Process
Training Set: (X1, Y1), (X2, Y2), (X3, Y3), X4
A note on cluster purity
Delip Rao February 8, 2006
One way of measuring the quality of a clustering solution is cluster purity. Let there be k clusters (the k in k-means) of the dataset D, and let the size of cluster C_j be |C_j|. Let |C_j|_{class=i} denote n
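The note is cut off before it states the formula, so the sketch below assumes the commonly used definition: the purity of a cluster is the fraction of its members that belong to its most frequent class, and the overall purity is the size-weighted average over clusters. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def cluster_purity(labels_in_cluster):
    """Fraction of a (non-empty) cluster taken up by its most frequent class."""
    counts = Counter(labels_in_cluster)
    return max(counts.values()) / len(labels_in_cluster)


def overall_purity(clusters):
    """Size-weighted average purity over a list of non-empty clusters."""
    total = sum(len(cluster) for cluster in clusters)
    majority_total = sum(max(Counter(cluster).values()) for cluster in clusters)
    return majority_total / total


# Made-up class labels grouped by cluster, for illustration only.
clusters = [["a", "a", "b"], ["b", "b", "b", "a"]]
per_cluster = [cluster_purity(c) for c in clusters]
total_purity = overall_purity(clusters)
```

In this toy example the two clusters have purities 2/3 and 3/4, and the overall purity is 5/7 under the assumed definition.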
Prediction
Han and Kamber
Witten and Frank
What is prediction?
Output is a continuous valued variable
Predict income given qualifications
Predict mean time to failure given attributes
Similar to classification
Similar approaches work here also
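As one illustration of predicting a continuous-valued output, the sketch below fits a straight line by ordinary least squares. The slides do not specify a method at this point, so the choice of simple linear regression, the function names, and the qualification and income numbers are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted continuous output for input x."""
    return slope * x + intercept


# Made-up data: years of education vs. income (arbitrary units).
years = [10, 12, 14, 16]
income = [20, 24, 28, 32]
slope, intercept = fit_line(years, income)
estimate = predict(18, slope, intercept)
```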
Regression
Naive Bayes Classifier
Objective: Develop a Naive Bayes classifier for the spam mail dataset. Dataset: Download the dataset from http:/www.ics.uci.edu/~mlearn/databases/spambase/spambase_data The dataset consists of 57 features extracted from email along
Kernel Methods for
Pattern Analysis
Dr. C. Chandra Sekhar
Dept. of Computer Science and Engineering
Indian Institute of Technology Madras
Chennai-600036
chandra@cse.iitm.ernet.in
Outline of the Talk
Support vector machines for classification
Support vec
Instance Based Methods
References
Han and Kamber
Mitchell
Lazy Learning
Rote learning!
Just remember training points (or some subset of them)
Work done while answering queries
Pick set of nearest points
Majority classification
Local linear fit
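The lazy-learning steps above (store the training points, then at query time pick the nearest ones and take a majority vote) can be sketched as a small k-nearest-neighbour classifier. The function name, the use of Euclidean distance, the default k=3, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(query, training_points, k=3):
    """Majority label among the k stored points closest to the query.

    training_points is a list of (features, label) pairs; no model is
    built in advance, all work happens when the query arrives.
    """
    nearest = sorted(training_points,
                     key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training points: two well-separated groups.
training = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
            ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
near_origin = knn_classify((0.5, 0.5), training)
near_far_group = knn_classify((5.5, 5.5), training)
```

Replacing the vote with a linear fit over the same neighbours would give the local linear fit variant for continuous outputs.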
Eager learn
Midsem Exam CS 672
Data Mining
Time: 2 hours
Max. marks: 60
1. (a) (5 marks) Draw a Bayesian Network that represents the Naive Bayes assumption.
(b) (6 marks) What kind of bias is present in a polynomial kernel
SVM? Specifically discuss the cases of languag
Machine Learning
A
Brief
Introduction
B. Ravindran
RISE Group
Reconfigurable and Intelligent Systems Engineering
Dept. of CSE, IIT Madras
What is Machine Learning?
". said to learn from experience with respect to some class of tasks, and a performance me
Data Preprocessing
Chapter 2
Han and Kamber
Outline
Why pre-process
Data Cleaning
Missing Values
Inconsistencies
Normalization
Data Transformation
Data Reduction
Feature Selection
Dimensionality reduction
Data Visualization
Summarization
Why Pre-process
Classification - Rules
Witten and Frank
Han and Kamber
Why Rules?
More intuitive
Expert evaluation
Incorporating prior knowledge
Maintenance
Fast
Well studied
Frequently used as a rough estimate
Represent the knowledge in the form of IF-THEN rules
R: IF a
Bayesian Classification
Mitchell
Duda, Hart, and Stork
Russell and Norvig
Han and Kamber
Inductive Learning
Learning from examples
Can never be sure of class label!
Try to find the most probable class label given the input
Bayesian learning gives a formal
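Finding the most probable class label given the input can be sketched with Bayes' rule: the posterior of each class is proportional to its prior times the likelihood of the input under that class. The function name and all probabilities below are made up for illustration; in practice the priors and likelihoods would be estimated from training data.

```python
def most_probable_class(priors, likelihoods):
    """Return the class with the highest posterior, plus all posteriors.

    priors:      {label: P(label)}
    likelihoods: {label: P(x | label)} for one fixed input x
    """
    # Numerator of Bayes' rule for each class.
    unnormalized = {c: priors[c] * likelihoods[c] for c in priors}
    # P(x), the normalizing constant.
    evidence = sum(unnormalized.values())
    posteriors = {c: v / evidence for c, v in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical numbers for a two-class problem.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": 0.5, "ham": 0.1}
best, posteriors = most_probable_class(priors, likelihoods)
```

Here the less likely class a priori still wins because the input is much more probable under it, and the posterior also quantifies how uncertain the label remains.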
PageRank
Announcements
HW3 posted
Due 1st March
HW2 solution also available
Graph Data: Social Networks
Facebook social graph
4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011]
Graph Data: Information Nets
Citation networks and Maps of