CSC 411 / CSC D11
Classification

8 Classification
In classification, we are trying to learn a map from an input space to some finite output space. In the simplest case we simply detect whether or not the input has some property. For example, we might want to determine whether or not an email is spam, or whether an image contains a face. A task in the health care field is to determine, given a set of observed symptoms, whether or not a person has a disease. These detection tasks are binary classification problems.
In multiclass classification problems we are interested in determining to which of multiple categories the input belongs. For example, given a recorded voice signal we might wish to recognize the identity of a speaker (perhaps from a set of people whose voice properties are given in advance). Another well studied example is optical character recognition, the recognition of letters or numbers from images of handwritten or printed characters.
The input x might be a vector of real numbers, or a discrete feature vector. In the case of binary classification problems the output y might be an element of the set {−1, 1}, while for a multiclass classification problem with N categories the output might be an integer in {1, ..., N}.
The general goal of classification is to learn a decision boundary, often specified as the level set of a function, e.g., a(x) = 0. The purpose of the decision boundary is to identify the regions of the input space that correspond to each class. For binary classification the decision boundary is the surface in the feature space that separates the test inputs into two classes; points x for which a(x) < 0 are deemed to be in one class, while points for which a(x) > 0 are in the other. The points on the decision boundary, a(x) = 0, are those inputs for which the two classes are equally probable.
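As a minimal sketch of this idea, suppose the decision function is linear, a(x) = w·x + b. The particular weights w and bias b below are invented for illustration (in practice they would be learned from training data); the classifier simply reports the sign of a(x):

```python
import numpy as np

# Hypothetical, hand-picked parameters of a linear decision function
# a(x) = w.x + b; real methods would learn these from labeled data.
w = np.array([1.0, -2.0])
b = 0.5

def a(x):
    """Decision function; its zero level set a(x) = 0 is the decision boundary."""
    return w @ x + b

def classify(x):
    """Assign class -1 where a(x) < 0 and class +1 where a(x) > 0."""
    return -1 if a(x) < 0 else 1

# A point on one side of the boundary: a([0, 1]) = 0 - 2 + 0.5 = -1.5 < 0
print(classify(np.array([0.0, 1.0])))  # -> -1
```

Points exactly on the boundary (a(x) = 0) are ambiguous; here they are arbitrarily assigned to class +1.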
In this chapter we introduce several basic methods for classification. We focus mainly on binary classification problems, for which the methods are conceptually straightforward, easy to implement, and often quite effective. In subsequent chapters we discuss some of the more sophisticated methods that might be needed for more challenging problems.
8.1 Class Conditionals
One approach is to describe a “generative” model for each class. Suppose we have two mutually exclusive classes C1 and C2. The prior probability of a data vector coming from class C1 is P(C1), and P(C2) = 1 − P(C1). Each class has a distribution for its data: p(x|C1) and p(x|C2). In other words, to sample from this model, we would first randomly choose a class according to P(C1), and then sample a data vector x from that class.
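This two-stage sampling procedure can be sketched as follows. The prior P(C1) and the Gaussian class conditionals below are made-up parameters for illustration, not part of any particular model in the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters, chosen only for this example:
P_C1 = 0.3  # prior P(C1); P(C2) = 1 - P(C1) = 0.7
means = {1: np.array([0.0, 0.0]),   # mean of p(x|C1)
         2: np.array([3.0, 3.0])}   # mean of p(x|C2)

def sample():
    """Sample from the generative model: first draw a class label
    according to P(C1), then draw x from that class's conditional
    distribution (here, a unit-variance Gaussian)."""
    c = 1 if rng.random() < P_C1 else 2
    x = rng.normal(loc=means[c], scale=1.0)  # x ~ p(x|Cc)
    return c, x

c, x = sample()
```

Repeating `sample()` many times yields class labels in proportion to the priors, with each x distributed according to its class conditional.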
Given labeled training data