1.1 [3 points]
Assume we have a decision tree to classify binary vectors of length 100 (that is, each input is of size 100).
Can we specify a NN that would result in exactly the same classification as our decision tree? If so, explain
why. If not, either expl
1.2 [2 points] If two binary random variables X and Y are independent, are
X' (the complement of X) and Y also independent? Prove your claim.
Yes they are: P(X' ∩ Y) = P(Y) - P(X ∩ Y) = P(Y) - P(X)P(Y) =
(1 - P(X))P(Y) = P(X')P(Y).
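The identity can also be sanity-checked numerically; a minimal sketch (the probabilities 0.3 and 0.7 below are made up for illustration):

```python
# Illustration only: numbers are made up. If X and Y are independent binary
# variables, then X' = 1 - X and Y are independent as well.
def joint_probs(px, py):
    """Joint distribution of independent binaries with P(X=1)=px, P(Y=1)=py."""
    return {(x, y): (px if x else 1 - px) * (py if y else 1 - py)
            for x in (0, 1) for y in (0, 1)}

px, py = 0.3, 0.7
p = joint_probs(px, py)
# P(X'=1, Y=1) = P(X=0, Y=1), which should factor as (1 - px) * py
assert abs(p[(0, 1)] - (1 - px) * py) < 1e-12
```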
1.1 Probability distribution
Distinction between probability density (continuous) and mass (discrete).
1.2 Mean and variance
For X uniform on [0, 1/2] with density p(x) = 2:

    E[X] = ∫_0^{1/2} x p(x) dx = ∫_0^{1/2} 2x dx = 1/4

    Var[X] = E[X²] - (E[X])² = ∫_0^{1/2} 2x² dx - (1/4)² = 1/12 - 1/16 = 1/48
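A Monte-Carlo sketch of these two results, assuming X is uniform on [0, 1/2] with density f(x) = 2 as in the exercise:

```python
# Monte-Carlo check: for X uniform on [0, 1/2], E[X] = 1/4 and Var[X] = 1/48.
import random

random.seed(0)
n = 200_000
xs = [random.uniform(0.0, 0.5) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

assert abs(mean - 0.25) < 1e-2
assert abs(var - 1 / 48) < 1e-2
```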
1. 3 5a m pling
Let X =
1.1 Probability distribution [2 points]
Mickey is a novice in probability theory and is facing a seeming paradox. Consider
a uniform probability distribution of variable X on the interval [0, 1/2],
namely f(x) = 2 for x ∈ [0, 1/2]. Given that probabilities und
1. Logistic regression
1.1 Logistic vs linear regression
1.1.1 The simplest option is to set a threshold T s.t. class affiliation is
determined by comparing wᵀx with T, e.g. wᵀx < T → class 1. The
drawbacks are mainly two: 1) susceptibility to outliers, 2)
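A minimal sketch of the thresholding rule described in 1.1.1 (the weights, threshold, and data points below are made up):

```python
# Hypothetical illustration: assign class 1 when the linear score w·x falls
# below threshold T, else class 2, as in the thresholding rule above.
def classify(w, x, T):
    """Compare the linear score w·x against threshold T."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score < T else 2

w, T = [0.5, -1.0], 0.0
assert classify(w, [1.0, 1.0], T) == 1   # score = -0.5, below T
assert classify(w, [4.0, 1.0], T) == 2   # score =  1.0, at or above T
```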
The Laplace Approximation
Sargur N. Srihari
University at Buffalo, State University of New York
USA
Machine Learning
Srihari
Laplace's Method
Approximate integrals of the form

    ∫_a^b e^{M f(x)} dx
Assume f(x) has global maximum at x0
Then f(x0) > other valu
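A minimal sketch of the resulting approximation, assuming the standard second-order expansion of f around x0, which gives ∫ e^{M f(x)} dx ≈ e^{M f(x0)} · sqrt(2π / (M |f''(x0)|)):

```python
# Laplace approximation sketch, given f(x0) and f''(x0) < 0 at the maximum.
import math

def laplace_approx(f_x0, f2_x0, M):
    """e^{M f(x0)} * sqrt(2*pi / (M * |f''(x0)|))."""
    return math.exp(M * f_x0) * math.sqrt(2 * math.pi / (M * abs(f2_x0)))

# Made-up example: f(x) = -x^2 has its maximum at x0 = 0 with f''(0) = -2.
# For M = 50 the integral over the real line is exactly sqrt(pi / 50),
# and the approximation is exact because the integrand is Gaussian.
approx = laplace_approx(0.0, -2.0, 50.0)
assert abs(approx - math.sqrt(math.pi / 50)) < 1e-9
```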
Linear Models for Classification
Discriminant Functions
Topics
Linear Discriminant Functions
Definition (2-class), Geometry
Generalization to K > 2 classes
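The 2-class definition can be sketched as y(x) = wᵀx + w0, with the sign of y(x) deciding the class; geometrically, y(x) = 0 is a hyperplane with normal vector w. The weights below are made up:

```python
# 2-class linear discriminant y(x) = w·x + w0: assign C1 when y(x) >= 0, else C2.
# The decision boundary y(x) = 0 is a hyperplane with normal vector w.
def discriminant(w, w0, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

w, w0 = [1.0, 1.0], -1.0
assert discriminant(w, w0, [1.0, 1.0]) >= 0   # y =  1.0 -> class C1
assert discriminant(w, w0, [0.0, 0.0]) < 0    # y = -1.0 -> class C2
```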
Linear Models for Regression
Sargur Srihari
srihari@cedar.buffalo.edu
Linear Regression with One Input
Simplest form of linear regression:
linear function of single input variable
Tas
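A minimal sketch of fitting this simplest form, y(x) = w0 + w1·x, by least squares on made-up data:

```python
# Closed-form least-squares fit of a line y(x) = w0 + w1*x to 1-D data.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    w0 = my - w1 * mx
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # made-up data lying exactly on y = 1 + 2x
w0, w1 = fit_line(xs, ys)
assert abs(w0 - 1.0) < 1e-9 and abs(w1 - 2.0) < 1e-9
```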
Machine Learning
Srihari
Discrete Probability
Distributions
Binary Variables
Bernoulli, Binomial and Beta
Bernoulli Distribution
Expresses distribution of a single binary-valued random
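The Bernoulli distribution p(x | μ) = μˣ (1 − μ)^(1−x) over x ∈ {0, 1} can be sketched as:

```python
# Bernoulli pmf: p(x | mu) = mu^x * (1 - mu)^(1 - x), x in {0, 1}.
# Its mean is mu and its variance is mu * (1 - mu).
def bernoulli_pmf(x, mu):
    return mu ** x * (1 - mu) ** (1 - x)

mu = 0.3   # made-up parameter for illustration
assert abs(bernoulli_pmf(1, mu) - 0.3) < 1e-12
assert abs(bernoulli_pmf(0, mu) - 0.7) < 1e-12
mean = sum(x * bernoulli_pmf(x, mu) for x in (0, 1))
assert abs(mean - mu) < 1e-12
```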
Bias-Variance Decomposition
Choosing in maximum likelihood/least
squares estimation
Five part discussion:
1. On-line regression demo
2. Point estimate: Chinese Emperor's Height
3. Formulation for regression
4. Example
5. Ch
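The point-estimate item (the Emperor's-height story) can be illustrated with a small simulation: averaging many guesses shrinks variance, but a bias shared by all guessers never averages out. All numbers below are made up:

```python
# Averaging N independent noisy guesses reduces variance (by 1/N) but leaves
# any shared bias untouched: the estimate converges to true_height + shared_bias.
import random

random.seed(1)
true_height, shared_bias, noise = 170.0, 5.0, 10.0

def average_of_guesses(n):
    return sum(true_height + shared_bias + random.gauss(0, noise)
               for _ in range(n)) / n

large = average_of_guesses(100_000)
# Bias does not average out: the limit is 175.0, not 170.0.
assert abs(large - (true_height + shared_bias)) < 0.5
```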
Gaussian Distribution
The Gaussian Distribution
Carl Friedrich Gauss
1777-1855
For single real-valued variable x:

    N(x | μ, σ²) = (1 / (2πσ²)^{1/2}) exp( -(x - μ)² / (2σ²) )

Parameters: μ (mean) and σ² (variance)
[Figure: roughly 68% of data lies within one standard deviation of the mean]
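A small sketch evaluating this density, with a numerical check that about 68% of the mass lies within one standard deviation of the mean:

```python
# Univariate Gaussian density N(x | mu, sigma^2).
import math

def gaussian_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# At x = mu the density equals 1 / sqrt(2*pi*sigma^2).
assert abs(gaussian_pdf(0.0, 0.0, 1.0) - 1 / math.sqrt(2 * math.pi)) < 1e-12

# Simple Riemann sum of the standard normal over [-1, 1]: about 0.6827.
step = 0.001
mass = sum(gaussian_pdf(k * step, 0.0, 1.0) * step for k in range(-1000, 1000))
assert abs(mass - 0.6827) < 1e-3
```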
The Hessian Matrix
Definitions of Gradient and Hessian
First derivative of a scalar function E(w) with respect to a
vector w=[w1,w2]T is a vector called the Gradient of E(w)
    ∇E(w) = dE/dw = [∂E/∂w1, ∂E/∂w2]ᵀ
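Both the gradient and the Hessian can be approximated by finite differences, which is also a handy check on analytic derivations; the function E below is a made-up example with known derivatives:

```python
# Finite-difference gradient and Hessian of a scalar function E(w), w = [w1, w2].
# For E(w) = w1^2 + 3*w1*w2: gradient = [2*w1 + 3*w2, 3*w1], Hessian = [[2, 3], [3, 0]].
def E(w):
    return w[0] ** 2 + 3 * w[0] * w[1]

def gradient(E, w, h=1e-5):
    """Central-difference approximation of the gradient vector."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h; wm[i] -= h
        g.append((E(wp) - E(wm)) / (2 * h))
    return g

def hessian(E, w, h=1e-4):
    """Central differences of the gradient give the Hessian matrix."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        wp, wm = list(w), list(w)
        wp[i] += h; wm[i] -= h
        gp, gm = gradient(E, wp), gradient(E, wm)
        for j in range(n):
            H[i][j] = (gp[j] - gm[j]) / (2 * h)
    return H

w = [1.0, 2.0]
g = gradient(E, w)
assert abs(g[0] - 8.0) < 1e-6 and abs(g[1] - 3.0) < 1e-6
H = hessian(E, w)
assert abs(H[0][0] - 2.0) < 1e-4 and abs(H[0][1] - 3.0) < 1e-4
```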
Error Backpropagation
Topics
Neural Network Learning Problem
Need for computing derivatives of Error function
Forward propagation of activations
Backward propagation of errors
Statement of
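The forward/backward pattern in the topics above can be sketched on a tiny 1-1-1 network (the weights and target below are made up), with the backpropagated gradient checked against a finite difference:

```python
# Tiny 1-1-1 network: y = w2 * tanh(w1 * x), squared error E = 0.5 * (y - t)^2.
import math

def forward_backward(w1, w2, x, t):
    a = w1 * x                     # forward: hidden pre-activation
    z = math.tanh(a)               # hidden activation
    y = w2 * z                     # output
    dE_dy = y - t                  # backward: error at the output
    dE_dw2 = dE_dy * z
    dE_da = dE_dy * w2 * (1 - z ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dE_dw1 = dE_da * x
    return dE_dw1, dE_dw2

# Check the backpropagated gradient in w1 against a finite difference.
w1, w2, x, t, h = 0.5, -0.3, 1.2, 1.0, 1e-6
g1, g2 = forward_backward(w1, w2, x, t)
E = lambda w1_: 0.5 * (w2 * math.tanh(w1_ * x) - t) ** 2
fd = (E(w1 + h) - E(w1 - h)) / (2 * h)
assert abs(g1 - fd) < 1e-6
```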
Genetic Inheritance and
Bayesian Networks
Genetics Pedigree Example
One of the earliest uses of Bayesian
Networks
Before general framework w
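The pedigree idea can be sketched as a conditional distribution of a child's genotype given the parents' genotypes, i.e. a CPD in a Bayesian network (a single biallelic gene; this toy model is an illustration, not the slide's actual example):

```python
# Toy genetics CPD: for one gene with alleles A/a, each parent passes one of
# their two alleles uniformly at random, so the child's genotype depends only
# on the parents' genotypes (the Bayesian-network structure of a pedigree).
import itertools

def child_genotype_dist(mother, father):
    """P(child genotype | mother, father), genotypes as 2-character strings."""
    dist = {}
    for am, af in itertools.product(mother, father):
        g = "".join(sorted(am + af))
        dist[g] = dist.get(g, 0.0) + 0.25
    return dist

d = child_genotype_dist("Aa", "Aa")   # two heterozygous parents
assert abs(d["AA"] - 0.25) < 1e-12
assert abs(d["Aa"] - 0.50) < 1e-12
assert abs(d["aa"] - 0.25) < 1e-12
```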