(c) We can set $\vec{w}\,'$ such that $w'_j = w_j \sigma_j$ and $b' = b + \vec{w} \cdot \vec{\mu}$. Then we have
\begin{align*}
y_i \left( \vec{w}\,' \cdot \vec{z}_i + b' \right)
&= y_i \left( \sum_{j=1}^{d} w'_j z_{ij} + b' \right) \\
&= y_i \left( \sum_{j=1}^{d} w_j \sigma_j \, \frac{x_{ij} - \mu_j}{\sigma_j} + b + \vec{w} \cdot \vec{\mu} \right) \\
&= y_i \left( \sum_{j=1}^{d} w_j x_{ij} - \sum_{j=1}^{d} w_j \mu_j + b + \vec{w} \cdot \vec{\mu} \right) \\
&= y_i \left( \vec{w} \cdot \vec{x}_i - \vec{w} \cdot \vec{\mu} + b + \vec{w} \cdot \vec{\mu} \right) \\
&= y_i \left( \vec{w} \cdot \vec{x}_i + b \right) > 0. \tag{1}
\end{align*}
Thus a linearly separable dataset is still separable after standardization.
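The identity above can be checked numerically. The sketch below (illustrative data and variable names, not from the assignment) standardizes a random dataset, transforms $(\vec{w}, b)$ as $w'_j = w_j \sigma_j$, $b' = b + \vec{w}\cdot\vec{\mu}$, and verifies that every margin $y_i(\vec{w}\,'\cdot\vec{z}_i + b')$ equals the original $y_i(\vec{w}\cdot\vec{x}_i + b)$:

```python
import numpy as np

rng = np.random.default_rng(0)
# Features with very different scales and offsets (toy data, not the
# assignment's dataset).
X = rng.normal(size=(20, 3)) * [1.0, 5.0, 0.1] + [2.0, -3.0, 0.5]
w = np.array([0.7, -0.2, 1.5])
b = 0.3
y = np.sign(X @ w + b)  # labels from a separating hyperplane

mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma    # standardized features z_ij = (x_ij - mu_j) / sigma_j
w_new = w * sigma       # w'_j = w_j * sigma_j
b_new = b + w @ mu      # b' = b + w . mu

# Margins are unchanged, so separability is preserved.
assert np.allclose(y * (Z @ w_new + b_new), y * (X @ w + b))
```

Because the transformation cancels $\sigma_j$ and $\mu_j$ exactly, the two margin vectors agree up to floating-point rounding.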
2 More than Average
(a) After 3 iterations. Number of mistakes made per pass: 2, 2, 1, 0, 0.
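The per-pass mistake counts above come from a standard perceptron sweep. A minimal sketch of that bookkeeping, on a toy dataset rather than the assignment's file, is:

```python
import numpy as np

def perceptron_mistakes_per_pass(X, y, n_passes=5):
    """Run the perceptron for a fixed number of passes, recording the
    number of mistakes (updates) made on each pass."""
    w = np.zeros(X.shape[1])
    b = 0.0
    counts = []
    for _ in range(n_passes):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified: update
                w += yi * xi
                b += yi
                mistakes += 1
        counts.append(mistakes)
    return counts

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron_mistakes_per_pass(X, y))  # → [1, 0, 0, 0, 0]
```

Once a pass makes zero mistakes, the weights have stopped changing, so the perceptron has converged.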
(b) 1. The original file lists all the positive examples followed by all the negative ones.
When the algorithm sweeps through the positive examples there are no updates from
the negative examples, and vice versa. It therefore takes more passes to collect enough
updates from both classes for the perceptron to converge. However,
although the number of epochs increases, the total number of updates made can be similar.
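This ordering effect can be demonstrated on synthetic data. The sketch below (toy data, not the assignment's file) counts how many epochs the perceptron needs to make a clean pass when the examples are sorted by label versus randomly shuffled; in typical runs the sorted ordering needs at least as many epochs:

```python
import numpy as np

def epochs_to_converge(X, y, max_epochs=100):
    """Return the first epoch on which the perceptron makes no mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:
            return epoch
    return max_epochs

rng = np.random.default_rng(1)
# Well-separated clusters so the data is linearly separable (illustrative).
pos = rng.normal(loc=[5.0, 5.0], size=(20, 2))
neg = rng.normal(loc=[-5.0, -5.0], size=(20, 2))
X_sorted = np.vstack([pos, neg])           # all positives, then all negatives
y_sorted = np.array([1] * 20 + [-1] * 20)

perm = rng.permutation(40)                 # shuffled ordering
X_shuf, y_shuf = X_sorted[perm], y_sorted[perm]

print(epochs_to_converge(X_sorted, y_sorted),
      epochs_to_converge(X_shuf, y_shuf))
```

Shuffling interleaves updates from the two classes within a single pass, which is why randomizing the example order is standard practice before perceptron training.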
(c)
training:
[0.82174999999999998, 0.82837499999999997, 0.82206250000000003, 0.79881250000000004,
0.80199999999999994, 0.80924999999999991, 0.82174999999999998,
0.81587500000000013, 0.8183125, 0.82081250000000006]
test:
[0.82375000000000009, 0.82850000000000001, 0.81874999999999998, 0.79225000000000001,
0.79499999999999993, 0.80075000000000007, 0.82275000000000009,