CS584 MACHINE LEARNING
FALL 2016
SYLLABUS
Course Description
Introduce fundamental problems in machine learning. Provide understanding of techniques,
mathematical concepts, and algorithms used in machine learning. Provide understanding of the
limitations
Concept Learning
CS584 Machine Learning
Shlomo Argamon
Concept Learning
Goal: Induce a general function
from specific training examples
Concept: spam; training examples:
emails labeled as spam/~spam
Concept: flu; training examples:
patient records l
Rule and Logic-Based
Classifiers
CS 584 Machine Learning
Rule-Based Classifier
IDEA: Classify records by using a
collection of if-then rules
Rule:
(Condition) → y
where
Condition is a conjunction of attributes
y is the class label
LHS: rule antecedent
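A minimal sketch of the if-then idea, assuming rules are tried in order and the first rule whose antecedent holds decides the label. The attribute names (`gives_birth`, `can_fly`, `blood`) and the class labels are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
# A rule pairs an antecedent (a conjunction of attribute tests) with a class label y.
Rule = Tuple[List[Callable[[Record], bool]], str]


def classify(record: Record, rules: List[Rule], default: Optional[str] = None) -> Optional[str]:
    """Return the label of the first rule whose conditions all hold for the record."""
    for conditions, label in rules:
        if all(test(record) for test in conditions):
            return label
    return default  # no rule covers this record


# Hypothetical rule set: (Condition) -> y
rules: List[Rule] = [
    ([lambda r: r["gives_birth"] == "no", lambda r: r["can_fly"] == "yes"], "bird"),
    ([lambda r: r["gives_birth"] == "yes", lambda r: r["blood"] == "warm"], "mammal"),
]

print(classify({"gives_birth": "yes", "can_fly": "no", "blood": "warm"}, rules))
```

Ordering the rules is only one possible way to resolve conflicts; a record that no rule covers falls through to the default label.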
CS584: Machine Learning
Spring 2017
Lecturer: Prof. Shlomo Argamon, SB 237C, [email protected]
Office Hours: Tuesdays, 10:00-12:00
Place/Time: SB 104, TTh, 1:50-3:05
Overview:
This course is about the theory and practice of building computational systems t
VC-dimension for
characterizing
classifiers
Note to other teachers and users of
these slides. Andrew would be delighted
if you found this source material useful in
giving your own lectures. Feel free to use
these slides verbatim, or to modify them
to fit
QUESTION 1
We have a coin whose fairness is unknown. We toss it multiple times and get the sequence below:
H, H, T, T, H, T, T, T, T, H
Based on this sequence, with no prior belief, what is the best estimate of the probability of getting H, using this co
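A short sketch of one way to compute such an estimate, assuming "no prior belief" is read as asking for the maximum-likelihood estimate, which for a coin is the fraction of tosses that came up H. The function name `mle_heads` is illustrative, not from the source.

```python
from fractions import Fraction

# The observed sequence from the question.
tosses = "H, H, T, T, H, T, T, T, T, H".split(", ")


def mle_heads(seq):
    """Maximum-likelihood estimate of P(H): number of heads divided by number of tosses."""
    return Fraction(seq.count("H"), len(seq))


p_hat = mle_heads(tosses)
print(p_hat, float(p_hat))  # prints: 2/5 0.4
```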
Title: Communities and Crime
Abstract: Communities within the United States. The data combines socioeconomic data
from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime
data from the 1995 FBI UCR.

Data Set Characterist
Sequence Learning
Based on slides by Erik Sudderth
Speech Recognition
Given an audio
waveform, would
like to robustly
extract & recognize
any spoken words
Statistical models
can be used to
Provide greater
robustness to noise
Adapt to accent of
differe
CS 584 Machine Learning
Homework Project: Crime Prediction
You have been hired by the FBI to develop predictive models for crime, to help the
Bureau and police departments around the country to use machine learning to
better focus their resources on locat
TOPIC: GRADIENT
OPTIMIZATION
Mustafa Bilgic
http://www.cs.iit.edu/~mbilgic
https://twitter.com/bilgicm
CS584 Machine Learning Illinois Institute of Technology Please do not distribute.
MOTIVATION
Maximize / minimize a functi
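As a rough illustration of minimizing a function by following its gradient, here is a sketch of plain gradient descent in one dimension. The objective f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions chosen for the example, not taken from the slides.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Assumed objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # close to the minimizer x = 3
```

To maximize instead, step along the gradient rather than against it (flip the sign of the update).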
Lecture 3
Linear Classification Models
Perceptron
Classification problem
Let's look at the problem of spam filtering
Now anyone can learn how to earn $200 - $943 per day or More ! If you can type (hunt and peck is ok to
start) and fill in forms, you can sc
Bayesian Classifier
$f: X \to V$, where $V$ is a finite set of values
Instances $x \in X$ can be described as a collection of features
$x = (x_1, x_2, \ldots, x_n)$
$x_i \in \{0, 1\}$
Given an example, assign it the most probable value in V
$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid x) = \arg\max_{v_j \in V} P(v_j \mid x_1, x$
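A sketch of the MAP decision for binary features. By Bayes' rule, P(v | x) is proportional to P(x | v) P(v); the code additionally assumes the features are conditionally independent given the class (the naive Bayes assumption), which the truncated text above does not state. The class names and probability tables are made up for illustration.

```python
import math


def map_label(x, priors, likelihoods):
    """Return argmax_v P(v) * prod_i P(x_i | v) for binary features x_i in {0, 1}.

    priors[v] is P(v); likelihoods[v][i] is P(x_i = 1 | v).
    Log-probabilities are summed to avoid numerical underflow.
    """
    def log_score(v):
        score = math.log(priors[v])
        for xi, p_one in zip(x, likelihoods[v]):
            score += math.log(p_one if xi == 1 else 1.0 - p_one)
        return score

    return max(priors, key=log_score)


# Hypothetical two-class, two-feature model.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.6], "ham": [0.1, 0.3]}

print(map_label((1, 1), priors, likelihoods))
print(map_label((0, 0), priors, likelihoods))
```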
Bayesian Decision
Theory
CS584 Machine Learning
Shlomo Argamon
Probability and Inference
Result of tossing a coin is $\in \{\text{Heads}, \text{Tails}\}$
Random var $X \in \{1, 0\}$
Bernoulli: $P\{X = 1\} = p_o$, i.e. $P(X) = p_o^{X} (1 - p_o)^{(1 - X)}$
Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
Estimation:
$p_o$ = #{Head
Parametric Estimation
Regression
CS584 Machine Learning
Shlomo Argamon
Parametric Estimation
$\mathcal{X} = \{x^t\}_t$ where $x^t \sim p(x)$
Parametric estimation:
Assume a form for $p(x \mid \theta)$ and estimate $\theta$,
its sufficient statistics, using $\mathcal{X}$
e.g., $N(\mu, \sigma^2)$ where
$\theta$ = {
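For the Gaussian example, a minimal sketch of estimating the parameters from a sample, assuming the maximum-likelihood estimates are wanted: the sample mean and the variance with divisor N (not N - 1). The function name and sample values are illustrative.

```python
def gaussian_mle(sample):
    """Maximum-likelihood estimates (mean, variance) for a univariate Gaussian."""
    n = len(sample)
    mean = sum(sample) / n
    # The MLE of the variance divides by N, so it is slightly biased for small samples.
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, variance


mu_hat, var_hat = gaussian_mle([1.0, 2.0, 3.0, 4.0])
print(mu_hat, var_hat)  # prints: 2.5 1.25
```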
TOPIC: INTRODUCTION
Mustafa Bilgic
WHAT IS MACHINE LEARNING?
Learning
the acquisiti
TOPIC: LOGIC
Mustafa Bilgic
OUTLINE
Very basic discussion of logic
We'll cover only
TOPIC: CONCEPT LEARNING
Mustafa Bilgic
MOTIVATION
Induce a general function from sp
TOPIC: DECISION TREES
Mustafa Bilgic
TREE USES NODES AND LEAVES
Credit: Ethem Alp
TOPIC: EXTENDED OUTLINE
Mustafa Bilgic
EXTENDED OUTLINE
Here is a tentative outline
TOPIC: NEURAL NETWORKS
Mustafa Bilgic
MOTIVATION
Inspired by neurons in the brain
TOPIC: LOGISTIC REGRESSION
Mustafa Bilgic
LOGISTIC REGRESSION
Learns P(y | x) directly,
Partial least squares (PLS)
Supervised alternative to PCA.
Attempts to find set of orthogonal directions that
explain both response and predictors.
Jeff Howbert
Introduction to Machine Learning
Winter 2014
PLS algorithm
First direction:
Calculat
K-Nearest Neighbor
Learning
Slides modified from D Chakraborty and R Jang
Different Learning
Methods
Eager Learning
Explicit description of target
function on the whole training set
Instance-based Learning
Learning=storing all training
instances
Classifi
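A minimal sketch of instance-based learning as k-nearest neighbors: "training" only stores the labeled instances, and the work happens at query time. The Euclidean distance, the majority vote, and the value k = 3 are assumptions here; the toy data is made up.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the majority label among the k stored instances closest to the query.

    train is a list of (feature_vector, label) pairs kept as-is (no model is fit).
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two well-separated groups.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]

print(knn_predict(train, (0.5, 0.5)))
print(knn_predict(train, (5.5, 5.5)))
```

An eager learner would instead build an explicit description of the target function from the whole training set before seeing any query.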
Introduction to
Machine Learning
CS584 Machine Learning
Shlomo Argamon
What is Machine Learning?
H. Simon: Any process by which a
system improves its performance
M. Minsky: Learning is making useful
changes in our minds
R. Michalski: Learning is c
Support Vector Machines
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
Perceptron Revisited: Linear Separators
Binary classification can be
Dimensionality Reduction
Aarti Singh
Machine Learning 10-701/15-781
Nov 17, 2010
Slides Courtesy: Tom Mitchell, Eric Xing, Lawrence Saul
High-Dimensional data
High-Dimensions = Lot of Features
Document classification
Features per document =
thousands o