CSCD11 Machine Learning and Data Mining, Fall 2010
Assignment 3: Classification and Bayesian Methods
Due Thursday, November 18, 3pm (before tutorial)
Note: This assignment comprises two theoretical questions and one programming question. For the
theoretical questions, either handwritten or computer formatted answers should be handed in on
paper. Please make sure that handwritten solutions are legible. For the programming part of this
assignment you will write several functions and one main script in Matlab. You will hand in a tar
file containing these files electronically. Parts of Question 3 which ask for written responses can be
answered as comments in your Matlab script.
1. Bayesian Prediction [18 marks; 2 marks per part]
Suppose you visit a province where license plate numbers are assigned sequentially. After seeing some cars
go by on the road and reading their license plate numbers, you wonder: how many cars are there in total? What
might the next number you see be?
To formalize the problem, we assume that all cars in the province are numbered from $1$ to $L$, where $L$ is the
largest licence plate number. Let $M$ be the largest possible value of $L$. To make things simple, we’ll assume
that license plate numbers are three digits, so that $M = 999$. We assume that all values of $L$ are equally likely,
so our prior for $L$ is a uniform distribution from $1$ to $M$. Furthermore, we assume that, when we see a new car,
we are equally likely to see any of the $L$ cars out there, so the likelihood of seeing licence plate number $X$ is
also uniform. Our observations will be the numbers $X_i$ of the $N$ cars we see go by.
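The generative story described above can be simulated directly. The following is a minimal Python sketch, not part of the assignment: the function name `simulate_plates`, its parameters, and the choice to draw each plate independently (so the same car may be seen twice) are assumptions made for illustration.

```python
import random

M = 999  # largest possible value of L (three-digit plates, as stated in the text)


def simulate_plates(n_cars, max_total=M, seed=None):
    """Draw L from the uniform prior, then n_cars plate numbers uniformly from 1..L.

    Hypothetical helper: each observation is drawn independently, which is an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    total_cars = rng.randint(1, max_total)  # L ~ Uniform{1, ..., M}, both ends inclusive
    plates = [rng.randint(1, total_cars) for _ in range(n_cars)]  # X_i ~ Uniform{1, ..., L}
    return total_cars, plates


total_cars, plates = simulate_plates(n_cars=5, seed=0)
print("hidden L:", total_cars)
print("observed plates:", plates)
```

In the actual problem the value of `total_cars` is hidden, and only the list `plates` is observed.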
To specify the model, we define
\begin{align}
f(Z, A, B) &= \begin{cases} \dfrac{1}{B - A + 1} & A \le Z \le B \\ 0 & \text{otherwise} \end{cases} \tag{1} \\
P(L) &= f(L, 1, M) && \text{(the prior)} \tag{2} \\
P(X \mid L) &= f(X, 1, L) && \text{(the likelihood of a single license plate number $X$)} \tag{3} \\
P(X_{1:N} \mid L) &= \prod_{i=1}^{N} P(X_i \mid L) && \text{(the likelihood of observing numbers $X_{1:N}$)} \tag{4}
\end{align}
Additionally, define
\begin{equation}
X_{\max} = \max X_{1:N} \tag{5}
\end{equation}
to be the largest license plate number observed.
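As a sanity check on the notation, definitions (1) through (5) can be transcribed almost literally into code. The sketch below uses Python rather than Matlab, and the function names `prior`, `likelihood`, and `joint_likelihood`, as well as the example observations, are made up for illustration; it only evaluates the definitions and does not answer any part of the question.

```python
import math

M = 999  # largest possible value of L, from the problem statement


def f(z, a, b):
    """Equation (1): uniform mass 1 / (b - a + 1) on a <= z <= b, zero otherwise."""
    return 1.0 / (b - a + 1) if a <= z <= b else 0.0


def prior(l, m=M):
    """Equation (2): P(L) = f(L, 1, M)."""
    return f(l, 1, m)


def likelihood(x, l):
    """Equation (3): P(X | L) = f(X, 1, L)."""
    return f(x, 1, l)


def joint_likelihood(xs, l):
    """Equation (4): product of the single-plate likelihoods."""
    return math.prod(likelihood(x, l) for x in xs)


xs = [120, 45, 310]  # made-up observations, for illustration only
x_max = max(xs)      # equation (5)

print("prior at L = 500:", prior(500))
print("joint likelihood at L = 400:", joint_likelihood(xs, 400))
print("largest plate observed:", x_max)
```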
