ORIE 4520: Stochastics at Scale, Fall 2015
Homework 5: Solutions
Sid Banerjee ([email protected])

Problem 1: (LSH for Angular Similarity)

For any vectors $x, y \in \mathbb{R}^d$, the angular distance is the angle (in radians) between the two vectors; formally,

$$d_\theta(x, y) = \cos^{-1}\left(\frac{x \cdot y}{\|x\|_2 \, \|y\|_2}\right)$$

(where $\cos^{-1}(\cdot)$ returns the principal angle, i.e., angles in $[0, \pi]$). The (normalized) angular similarity is given by $s_\theta(x, y) = 1 - d_\theta(x, y)/\pi$.

We now want to construct an LSH for the angular similarity metric. Consider the following family of hash functions: we first choose a random unit vector $\sigma$ (i.e., $\sigma \in \mathbb{R}^d$ with $\|\sigma\|_2 = 1$), and for any vector $x$, define $h_\sigma(x) = \mathrm{sgn}(x \cdot \sigma)$ (i.e., the sign of the dot product of $x$ and $\sigma$). Argue that for any $x, y \in \mathbb{R}^d$, we have:

$$P[h_\sigma(x) = h_\sigma(y)] = s_\theta(x, y)$$

Hint: For any pair $x$ and $y$ in $\mathbb{R}^d$, there is a unique plane passing through the origin containing $x$ and $y$; convince yourself that $d_\theta(x, y)$ is precisely the angle between $x$ and $y$ in this plane. Also, given any vector $\sigma$, its dot product with $x$ and $y$ only depends on the projection of $\sigma$ on this plane. Now what can you say about the signs of the dot products of $x$ and $y$ with a random unit vector?
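To make the definitions in the problem statement concrete, here is a minimal Python sketch of the angular distance, the angular similarity, and a single draw of the sign hash. The function names (`angular_distance`, `angular_similarity`, `random_unit_vector`, `sign_hash`) are illustrative and do not come from the course material. Two further assumptions: the random unit vector is obtained by normalizing a standard Gaussian vector, which gives a uniformly random direction, and sgn(0) is mapped to +1, an event that occurs with probability zero.

```python
import math
import random


def angular_distance(x, y):
    """Angle in radians, in [0, pi], between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos_theta)


def angular_similarity(x, y):
    """Normalized angular similarity: 1 - d_theta(x, y) / pi."""
    return 1.0 - angular_distance(x, y) / math.pi


def random_unit_vector(d, rng):
    """Uniformly random direction in R^d (assumption: normalized Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def sign_hash(x, sigma):
    """h_sigma(x) = sgn(x . sigma); sgn(0) is taken to be +1 (assumption)."""
    dot = sum(a * s for a, s in zip(x, sigma))
    return 1 if dot >= 0 else -1


rng = random.Random(0)
sigma = random_unit_vector(3, rng)
x = [1.0, 0.0, 0.0]
y = [0.0, 1.0, 0.0]  # orthogonal to x, so d_theta = pi/2 and s_theta = 1/2
similarity = angular_similarity(x, y)
hashes = (sign_hash(x, sigma), sign_hash(y, sigma))
```

Each hash value is a single bit, so one draw of `sigma` says little on its own; the claim in the problem concerns how often the two bits agree over many independent draws.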
... projection onto the plane of $x$ and $y$ is above the dashed line in Fig. (1), then $\sigma \cdot x$ is positive, while $\sigma \cdot y$ is negative. The normal vector $\sigma$ instead might extend in the opposite direction, below the dashed line. In that case $\sigma \cdot x$ is negative and $\sigma \cdot y$ is positive, but the signs are still different.

On the other hand, the randomly chosen vector $\sigma$ could be normal to a hyperplane like the dotted line in Fig. (1). In that case, both $\sigma \cdot x$ and $\sigma \cdot y$ have the same sign. If the projection of $\sigma$ extends to the right, then both dot products are positive, while if $\sigma$ extends to the left, then both are negative.

What is the probability that the randomly chosen vector is normal to a hyperplane that looks like the dashed line rather than the dotted line? All angles for the line that is the intersection of the random hyperplane and the plane of $x$ and $y$ are equally likely. Thus, writing $\theta = d_\theta(x, y)$, the hyperplane will look like the dashed line with probability $\theta/\pi$ and will look like the dotted line otherwise. The two hash values agree exactly in the dotted-line case, so $P[h_\sigma(x) = h_\sigma(y)] = 1 - \theta/\pi = s_\theta(x, y)$.
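The geometric argument can be sanity-checked numerically. The sketch below is a Monte Carlo estimate, not part of the original solution: it draws many random directions and counts how often the two dot products share a sign. The function name `collision_rate`, the trial count, and the seed are illustrative choices. It assumes a standard Gaussian vector is an acceptable stand-in for a random unit vector, since only the direction of $\sigma$ affects the signs, and it treats a dot product of exactly zero as positive.

```python
import math
import random


def collision_rate(x, y, trials=100_000, seed=0):
    """Estimate P[sgn(x . sigma) == sgn(y . sigma)] over random directions."""
    rng = random.Random(seed)
    d = len(x)
    same = 0
    for _ in range(trials):
        # Unnormalized Gaussian vector: its direction is uniform, and the
        # sign of a dot product does not depend on the length of sigma.
        sigma = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_positive = sum(a * s for a, s in zip(x, sigma)) >= 0
        y_positive = sum(b * s for b, s in zip(y, sigma)) >= 0
        if x_positive == y_positive:
            same += 1
    return same / trials


x = [1.0, 0.0, 0.0]
y = [1.0, 1.0, 0.0]                    # angle between x and y is pi/4
theta = math.pi / 4
predicted = 1.0 - theta / math.pi      # s_theta(x, y) = 3/4
estimate = collision_rate(x, y)
print(f"predicted {predicted:.4f}, estimated {estimate:.4f}")
```

With 100,000 trials the standard error of the estimate is roughly 0.0014, so the estimate should land close to the predicted value of 3/4 for this pair.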