The model selection problem
Objective
Cross validation
- It is often necessary to consider many different models (e.g., types of
  classifiers) for a given problem.
- Sometimes "model" simply means a particular setting of hyper-parameters
  (e.g., k in k-NN, number of
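One way to make the hyper-parameter case concrete is to pick k for k-NN by cross validation. The sketch below is illustrative only: the synthetic 1-D data, the candidate list, the fold count, and the helper names `knn_predict` and `cv_error` are all assumptions, not part of the slides.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D inputs, labels 0/1)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0

def cv_error(data, k, num_folds=5):
    """Average held-out error rate of k-NN over num_folds folds."""
    folds = [data[i::num_folds] for i in range(num_folds)]
    errors = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        mistakes = sum(knn_predict(train, x, k) != y for x, y in held_out)
        errors.append(mistakes / len(held_out))
    return sum(errors) / num_folds

# Synthetic data (an assumption for illustration): the label is 1 when x > 0.5,
# flipped with probability 0.1.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

candidates = [1, 3, 5, 9, 15]  # odd values of k avoid tied votes
scores = {k: cv_error(data, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Each candidate k is scored only on points that were held out of its training set, so the selected k is not simply the one that best fits the training data.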
Learning from data
Overview
- Machine learning: the study of computational mechanisms that learn from
  data in order to make predictions and decisions.
Example 1: image classification
Example 2: recommender system
Birdwatcher takes pictur
Nearest neighbor search
- A naive implementation of NN classifiers based on $n$ labeled examples
  requires $n$ distance computations to compute the prediction on any test
  point $x \in \mathcal{X}$.
- If using Euclidean distance in $\mathbb{R}^d$, then each distance
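A minimal sketch of the naive search described above, assuming points are tuples of floats and labels are arbitrary values. The function names `euclidean` and `nn_classify` and the toy data are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance in R^d; visits each of the d coordinates once."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nn_classify(examples, x):
    """Return the label of the training point closest to x.

    examples is a list of (point, label) pairs. The loop performs exactly
    len(examples) distance computations, one per labeled example.
    """
    best_label, best_dist = None, math.inf
    for point, label in examples:
        dist = euclidean(point, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy data, made up for illustration.
examples = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((4.0, 0.0), "c")]
prediction = nn_classify(examples, (0.9, 0.8))
print(prediction)
```

The cost of one prediction in this sketch grows with both the number of examples and the dimension, since every example is compared against the test point coordinate by coordinate.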
Binomial distribution
Number of heads when a coin with heads bias $p \in [0, 1]$ is tossed $n$ times:
binomial distribution
$$S \sim \operatorname{Bin}(n, p).$$
Probability mass function: for any $k \in \{0, 1, 2, \dots, n\}$,
$$P(S = k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$
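The probability mass function can be evaluated directly with the standard library. This is a small sketch; the function name `binomial_pmf` and the values n = 10, p = 0.3 are chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(S = k) for S ~ Bin(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative values, not from the slides
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                             # the probabilities sum to 1
mean = sum(k * q for k, q in enumerate(pmf)) # the expected number of heads is n * p
print(total, mean)
```

Summing the values over k = 0, ..., n is a quick sanity check on the formula, as is comparing the computed mean with n * p.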
[Figure: plot of the binomial distribution; axis label Pr[S = k].]
Decision trees
Directly optimize tree structure for good classification.
A decision tree is a function $f : \mathcal{X} \to \mathcal{Y}$, represented by a binary tree in which:
- Each tree node is associated with a splitting rule $g : \mathcal{X} \to \{0, 1\}$.
- Each leaf node is
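The representation above can be sketched as a small data structure. Two points are assumptions made here, since the source text is cut off: each leaf is taken to store a predicted label, and a rule output of 0 is taken to mean "go left" and 1 "go right". The class names `Leaf` and `Node` and the example tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    label: str  # assumption: a leaf stores the label to predict

@dataclass
class Node:
    rule: Callable[[tuple], int]  # splitting rule g: X -> {0, 1}
    left: Union["Node", Leaf]     # followed when rule(x) == 0 (assumed convention)
    right: Union["Node", Leaf]    # followed when rule(x) == 1 (assumed convention)

def predict(tree, x):
    """Evaluate f(x) by following splitting rules from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.right if tree.rule(x) == 1 else tree.left
    return tree.label

# A hand-built tree over points in R^2, made up for illustration.
tree = Node(
    rule=lambda x: int(x[0] > 2.0),
    left=Leaf("small"),
    right=Node(
        rule=lambda x: int(x[1] > 0.5),
        left=Leaf("medium"),
        right=Leaf("large"),
    ),
)
print(predict(tree, (3.0, 0.9)))
```

This sketch only evaluates a fixed tree. Choosing the splitting rules from data is a separate step that it does not cover.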
Parametric models
- A (statistical) model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ is a family of probability
  distributions indexed by a set $\Theta$ (the parameter space).
- In a parametric (statistical) model, the distributions are indexed by a finite
  number of parameters (i.e., $\Theta \subseteq \mathbb{R}^k$ for som