In order to deemphasize the impact of very large absolute values of $e_{ij}$, a new set of error
terms is defined:
$$\epsilon_{ij} = \begin{cases} e_{ij} & \text{if } |e_{ij}| \leq \Delta \\ f(|e_{ij}|) & \text{if } |e_{ij}| > \Delta \end{cases} \tag{12.14}$$
Here, $\Delta$ is a user-defined threshold beyond which an entry is considered large. $f(|e_{ij}|)$ is a
damped (i.e., sublinear) function of $|e_{ij}|$ satisfying $f(\Delta) = \Delta$. This condition
ensures that $\epsilon_{ij}$ is a continuous function of $e_{ij}$ at $e_{ij} = \pm\Delta$. The damping ensures that large
values of the error are not given undue importance. An example of such a damped function
is as follows:
$$f(|e_{ij}|) = \sqrt{\Delta(2|e_{ij}| - \Delta)} \tag{12.15}$$
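As a minimal sketch of Equations 12.14 and 12.15, the adjusted error can be written as a small Python function. The name `damped_error` and the sample values are illustrative assumptions, not part of the original text.

```python
import math

def damped_error(e: float, delta: float) -> float:
    """Adjusted error of Eq. 12.14, using the damping function of Eq. 12.15."""
    if abs(e) <= delta:
        # Small errors are passed through unchanged.
        return e
    # f(|e|) = sqrt(delta * (2|e| - delta)) grows sublinearly in |e|
    # and satisfies f(delta) = delta.
    return math.sqrt(delta * (2.0 * abs(e) - delta))

# Illustrative values only: errors below, at, and above the threshold.
delta = 1.0
for e in (0.5, 1.0, 3.0, 10.0):
    print(e, damped_error(e, delta))
```

Note that for $|e_{ij}| > \Delta$ the returned value is nonnegative regardless of the sign of $e_{ij}$; this is harmless because only the square of the adjusted error enters the objective function.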
This type of damped function has been used in [428]. The objective function for robust
matrix factorization then replaces the error values $e_{ij}$ with the adjusted values $\epsilon_{ij}$ as follows:
$$\text{Minimize } J^{robust} = \frac{1}{2} \sum_{(i,j) \in S} \epsilon_{ij}^2 + \frac{\lambda}{2} \sum_{i=1}^{m} \sum_{s=1}^{k} u_{is}^2 + \frac{\lambda}{2} \sum_{j=1}^{n} \sum_{s=1}^{k} v_{js}^2$$
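The objective can be evaluated directly from its definition. The sketch below assumes, as is standard for matrix factorization but not restated in this passage, that $e_{ij} = r_{ij} - \sum_{s} u_{is} v_{js}$, and that the damping function is the one in Equation 12.15. The function name `robust_objective` and the dictionary representation of the observed set $S$ are hypothetical choices made for illustration.

```python
def robust_objective(ratings, U, V, lam, delta):
    """Robust objective: 0.5 * sum of eps_ij^2 plus L2 regularization.

    ratings: dict mapping (i, j) -> r_ij for the observed entries S (assumed layout)
    U, V:    lists of rows, of sizes m x k and n x k
    """
    loss = 0.0
    for (i, j), r in ratings.items():
        # Assumed error definition: e_ij = r_ij - u_i . v_j
        e = r - sum(u * v for u, v in zip(U[i], V[j]))
        a = abs(e)
        # eps_ij^2 is e^2 below the threshold and f(|e|)^2 = delta*(2|e| - delta) above it.
        eps_sq = e * e if a <= delta else delta * (2.0 * a - delta)
        loss += 0.5 * eps_sq
    reg = sum(x * x for row in U for x in row) + sum(x * x for row in V for x in row)
    return loss + 0.5 * lam * reg
```

For large errors the loss grows linearly rather than quadratically in $|e_{ij}|$, which is what limits the influence of a few extreme entries.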
An iterative reweighted least-squares algorithm, which is described in [426], is used for the
optimization process. Here, we describe a simplified algorithm. The first step is to compute
the gradient of the objective function $J^{robust}$ with respect to each of the decision variables:
$$\frac{\partial J^{robust}}{\partial u_{iq}} = \frac{1}{2} \sum_{j:(i,j) \in S} \frac{\partial \epsilon_{ij}^2}{\partial u_{iq}} + \lambda u_{iq}, \quad \forall i \in \{1 \ldots m\}, \forall q \in \{1 \ldots k\}$$
$$\frac{\partial J^{robust}}{\partial v_{jq}} = \frac{1}{2} \sum_{i:(i,j) \in S} \frac{\partial \epsilon_{ij}^2}{\partial v_{jq}} + \lambda v_{jq}, \quad \forall j \in \{1 \ldots n\}, \forall q \in \{1 \ldots k\}$$
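Note that the aforementioned gradients contain a number of partial derivatives with respect
to the decision variables. The value of $\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}}$ can be computed as follows:
$$\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}} = \begin{cases} 2 \cdot e_{ij} (-v_{jq}) & \text{if } |e_{ij}| \leq \Delta \\ 2 \cdot \Delta \cdot \text{sign}(e_{ij}) (-v_{jq}) & \text{if } |e_{ij}| > \Delta \end{cases}$$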
Here, the sign function takes on the value of $+1$ for positive quantities and $-1$ for negative
quantities. The case-wise description of the derivative can be consolidated into a simplified form as
follows:
$$\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}} = 2 \cdot \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot (-v_{jq})$$
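The case $|e_{ij}| > \Delta$ can be checked with the chain rule. The short derivation below assumes the damping function of Equation 12.15 and the usual error definition $e_{ij} = r_{ij} - \sum_{s} u_{is} v_{js}$, which is not restated in this passage; under that assumption, $\partial e_{ij} / \partial u_{iq} = -v_{jq}$.

```latex
\begin{aligned}
\epsilon_{ij}^2 &= f(|e_{ij}|)^2 = \Delta \left( 2|e_{ij}| - \Delta \right)
  && \text{for } |e_{ij}| > \Delta \\
\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}}
  &= 2 \Delta \cdot \frac{\partial |e_{ij}|}{\partial u_{iq}}
   = 2 \Delta \cdot \text{sign}(e_{ij}) \cdot \frac{\partial e_{ij}}{\partial u_{iq}}
   = 2 \cdot \Delta \cdot \text{sign}(e_{ij}) \cdot (-v_{jq})
\end{aligned}
```

At $|e_{ij}| = \Delta$ this agrees with the first case, since $2 \cdot e_{ij} (-v_{jq}) = 2 \cdot \Delta \cdot \text{sign}(e_{ij}) (-v_{jq})$ there, so the two cases can be merged with the $\min$ operator.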
It is noteworthy that the gradient is damped when the error is larger than $\Delta$. This damping
of the gradient directly makes the approach more robust to a few large errors in the ratings
matrix. Similarly, we can compute the partial derivative with respect to $v_{jq}$ as follows:
$$\frac{\partial \epsilon_{ij}^2}{\partial v_{jq}} = \begin{cases} 2 \cdot e_{ij} (-u_{iq}) & \text{if } |e_{ij}| \leq \Delta \\ 2 \cdot \Delta \cdot \text{sign}(e_{ij}) (-u_{iq}) & \text{if } |e_{ij}| > \Delta \end{cases}$$
As before, it is possible to consolidate this derivative as follows:
$$\frac{\partial \epsilon_{ij}^2}{\partial v_{jq}} = 2 \cdot \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot (-u_{iq})$$
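One way to gain confidence in the consolidated derivatives is to compare them against a finite-difference estimate. The sketch below does this for the derivative with respect to $u_{iq}$ in the scalar case $k = 1$, assuming $e_{ij} = r_{ij} - u_{iq} v_{jq}$ and the damping function of Equation 12.15. All function names and numeric values are hypothetical.

```python
def eps_sq(e, delta):
    """Squared adjusted error: e^2 if |e| <= delta, else delta * (2|e| - delta)."""
    a = abs(e)
    return e * e if a <= delta else delta * (2.0 * a - delta)

def grad_eps_sq_wrt_u(e, v_jq, delta):
    """Consolidated form: 2 * min{|e|, delta} * sign(e) * (-v_jq)."""
    sign = (e > 0) - (e < 0)
    return 2.0 * min(abs(e), delta) * sign * (-v_jq)

def numeric_grad(r, u, v, delta, h=1e-6):
    """Central finite difference of eps_sq(r - u*v) with respect to u."""
    f = lambda u_: eps_sq(r - u_ * v, delta)
    return (f(u + h) - f(u - h)) / (2.0 * h)

# Illustrative check: one small error, one large positive, one large negative.
r, v, delta = 5.0, 2.0, 1.0
for u in (2.2, 0.5, 4.0):
    e = r - u * v
    print(e, grad_eps_sq_wrt_u(e, v, delta), numeric_grad(r, u, v, delta))
```

The derivative with respect to $v_{jq}$ can be checked in the same way by swapping the roles of the two factors.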
One can now derive the update steps as follows, which need to be executed for each user $i$
and each item $j$:
$$u_{iq} \Leftarrow u_{iq} + \alpha \left( \sum_{j:(i,j) \in S} \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot v_{jq} - \lambda \cdot u_{iq} \right) \quad \forall i, \forall q \in \{1 \ldots k\}$$
$$v_{jq} \Leftarrow v_{jq} + \alpha \left( \sum_{i:(i,j) \in S} \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot u_{iq} - \lambda \cdot v_{jq} \right) \quad \forall j, \forall q \in \{1 \ldots k\}$$
These updates are performed to convergence. The aforementioned steps correspond to global
updates. These updates can be executed within the algorithmic framework of gradient
descent (cf. Figure 3.8 of Chapter 3).
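The update steps can be sketched as a small batch gradient-descent loop. This is a simplified illustration rather than the algorithm of [426] or the framework of Figure 3.8: it assumes $e_{ij} = r_{ij} - \sum_{s} u_{is} v_{js}$, uses a fixed iteration count in place of a convergence test, and initializes the factors uniformly at random. The names `clipped` and `robust_mf`, the default parameter values, and the toy ratings are all hypothetical.

```python
import random

def clipped(e, delta):
    """min{|e|, delta} * sign(e), i.e., the error clipped to [-delta, delta]."""
    return max(-delta, min(delta, e))

def robust_mf(ratings, m, n, k, delta, lam=0.1, alpha=0.01, iters=1000, seed=0):
    """Global (batch) updates for robust matrix factorization.

    ratings: dict mapping (i, j) -> r_ij over the observed set S (assumed layout).
    Returns the factor matrices U (m x k) and V (n x k) as lists of rows.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        # Clipped errors for all observed entries, computed from the current U and V.
        c = {(i, j): clipped(r - sum(a * b for a, b in zip(U[i], V[j])), delta)
             for (i, j), r in ratings.items()}
        # Start each update direction from the regularization term -lambda * value.
        dU = [[-lam * x for x in row] for row in U]
        dV = [[-lam * x for x in row] for row in V]
        for (i, j), cij in c.items():
            for q in range(k):
                dU[i][q] += cij * V[j][q]
                dV[j][q] += cij * U[i][q]
        # Apply all updates simultaneously (global update).
        for i in range(m):
            for q in range(k):
                U[i][q] += alpha * dU[i][q]
        for j in range(n):
            for q in range(k):
                V[j][q] += alpha * dV[j][q]
    return U, V

# Hypothetical toy data: a rank-1 pattern with one suspicious rating at (1, 1).
ratings = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 9.0}
U, V = robust_mf(ratings, m=2, n=2, k=1, delta=1.0)
```

Because each observed entry contributes at most $\Delta$ in magnitude to the update direction, a single extreme rating cannot dominate the step, whatever its size.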