In order to de-emphasize the impact of very large absolute values of $e_{ij}$, a new set of error terms is defined:

$$
\epsilon_{ij} = \begin{cases} e_{ij} & \text{if } |e_{ij}| \leq \Delta \\ f(|e_{ij}|) & \text{if } |e_{ij}| > \Delta \end{cases} \qquad (12.14)
$$

Here, $\Delta$ is a user-defined threshold, which defines the case when an entry becomes large. $f(|e_{ij}|)$ is a damped (i.e., sublinear) function of $|e_{ij}|$ satisfying $f(\Delta) = \Delta$. This condition ensures that $\epsilon_{ij}$ is a continuous function of $e_{ij}$ at $e_{ij} = \pm\Delta$. The damping ensures that large values of the error are not given undue importance. An example of such a damped function is as follows:

$$
f(|e_{ij}|) = \sqrt{\Delta \, (2|e_{ij}| - \Delta)} \qquad (12.15)
$$

This type of damped function has been used in [428]. The objective function for robust matrix factorization then replaces the error values $e_{ij}$ with the adjusted values $\epsilon_{ij}$ as follows:

$$
\text{Minimize } J_{\text{robust}} = \frac{1}{2} \sum_{(i,j) \in S} \epsilon_{ij}^2 + \frac{\lambda}{2} \sum_{i=1}^{m} \sum_{s=1}^{k} u_{is}^2 + \frac{\lambda}{2} \sum_{j=1}^{n} \sum_{s=1}^{k} v_{js}^2
$$

An iterative re-weighted least-squares algorithm, which is described in [426], is used for the optimization process. Here, we describe a simplified algorithm. The first step is to compute the gradient of the objective function $J_{\text{robust}}$ with respect to each of the decision variables:

$$
\frac{\partial J_{\text{robust}}}{\partial u_{iq}} = \frac{1}{2} \sum_{j:(i,j) \in S} \frac{\partial \epsilon_{ij}^2}{\partial u_{iq}} + \lambda u_{iq} \qquad \forall i \in \{1 \ldots m\},\; q \in \{1 \ldots k\}
$$

$$
\frac{\partial J_{\text{robust}}}{\partial v_{jq}} = \frac{1}{2} \sum_{i:(i,j) \in S} \frac{\partial \epsilon_{ij}^2}{\partial v_{jq}} + \lambda v_{jq} \qquad \forall j \in \{1 \ldots n\},\; q \in \{1 \ldots k\}
$$
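The damped error term of Equation 12.14, with the square-root damping of Equation 12.15, can be sketched as follows. The threshold value $\Delta = 1$ is an assumption chosen for illustration; the sign is retained on the damped branch so that $\epsilon_{ij}$ stays continuous at $-\Delta$, which does not affect the squared objective:

```python
import numpy as np

def damped_error(e, delta=1.0):
    """Robust error term of Eq. 12.14 using the damping of Eq. 12.15:
    f(|e|) = sqrt(delta * (2|e| - delta)), which satisfies f(delta) = delta."""
    abs_e = np.abs(e)
    # Clamp the sqrt argument so the damped branch is valid even where it
    # is not selected (np.where evaluates both branches).
    damped = np.sqrt(delta * np.maximum(2.0 * abs_e - delta, delta))
    return np.where(abs_e <= delta, e, np.sign(e) * damped)
```

Note that the damped branch grows only as the square root of $|e_{ij}|$, so an error of $2\Delta$ contributes $\sqrt{3}\,\Delta \approx 1.73\,\Delta$ rather than $2\Delta$, matching the sublinearity requirement.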
12.5. STRATEGIES FOR ROBUST RECOMMENDER DESIGN

Note that the aforementioned gradients contain a number of partial derivatives with respect to the decision variables. The value of $\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}}$ can be computed as follows:

$$
\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}} = \begin{cases} -2 \cdot e_{ij} \cdot v_{jq} & \text{if } |e_{ij}| \leq \Delta \\ -2 \cdot \Delta \cdot \text{sign}(e_{ij}) \cdot v_{jq} & \text{if } |e_{ij}| > \Delta \end{cases}
$$

Here, the sign function takes on the value of $+1$ for positive quantities and $-1$ for negative quantities. The case-wise description of the derivative can be consolidated into a simplified form as follows:

$$
\frac{\partial \epsilon_{ij}^2}{\partial u_{iq}} = -2 \cdot \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot v_{jq}
$$

It is noteworthy that the gradient is damped when the error is larger than $\Delta$. This damping of the gradient directly makes the approach more robust to a few large errors in the ratings matrix. Similarly, we can compute the partial derivative with respect to $v_{jq}$ as follows:

$$
\frac{\partial \epsilon_{ij}^2}{\partial v_{jq}} = \begin{cases} -2 \cdot e_{ij} \cdot u_{iq} & \text{if } |e_{ij}| \leq \Delta \\ -2 \cdot \Delta \cdot \text{sign}(e_{ij}) \cdot u_{iq} & \text{if } |e_{ij}| > \Delta \end{cases}
$$

As before, it is possible to consolidate this derivative as follows:

$$
\frac{\partial \epsilon_{ij}^2}{\partial v_{jq}} = -2 \cdot \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot u_{iq}
$$

One can now derive the update steps as follows, which need to be executed for each user $i$ and each item $j$:

$$
u_{iq} \Leftarrow u_{iq} + \alpha \left( \sum_{j:(i,j) \in S} \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot v_{jq} - \lambda \cdot u_{iq} \right) \qquad \forall i,\; q \in \{1 \ldots k\}
$$

$$
v_{jq} \Leftarrow v_{jq} + \alpha \left( \sum_{i:(i,j) \in S} \min\{|e_{ij}|, \Delta\} \cdot \text{sign}(e_{ij}) \cdot u_{iq} - \lambda \cdot v_{jq} \right) \qquad \forall j,\; q \in \{1 \ldots k\}
$$

These updates are performed to convergence. The aforementioned steps correspond to global updates. These updates can be executed within the algorithmic framework of gradient descent (cf. Figure 3.8 of Chapter 3).
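Putting the consolidated derivatives and the global update steps together, the simplified algorithm can be sketched as the vectorized gradient-descent loop below. The learning rate $\alpha$, regularization $\lambda$, threshold $\Delta$, iteration count, and initialization scale are all illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

def robust_mf(R, k, delta=1.0, alpha=0.01, lam=0.02, iters=5000, seed=0):
    """Simplified robust matrix factorization: gradient updates in which the
    error is damped as min{|e_ij|, delta} * sign(e_ij).  Missing entries of R
    are np.nan; the observed set S is derived from them."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    observed = ~np.isnan(R)
    for _ in range(iters):
        E = np.where(observed, R - U @ V.T, 0.0)       # e_ij on observed entries, 0 elsewhere
        D = np.minimum(np.abs(E), delta) * np.sign(E)  # damped term min{|e|, delta} * sign(e)
        U_new = U + alpha * (D @ V - lam * U)          # global update for all u_iq
        V_new = V + alpha * (D.T @ U - lam * V)        # global update for all v_jq
        U, V = U_new, V_new
    return U, V
```

Because these are the global (batch) updates described above, all of $U$ and $V$ are updated simultaneously from the same damped error matrix in each pass, rather than entry by entry.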
