Mean Shift Is a Bound Optimization
Mark Fashing and Carlo Tomasi
Abstract—We build on the current understanding of mean shift as an optimization
procedure. We demonstrate that, in the case of piecewise constant kernels, mean
shift is equivalent to Newton’s method. Further, we prove that, for all kernels, the
mean shift procedure is a quadratic bound maximization.
Index Terms—Mean shift, bound optimization, Newton’s method, adaptive
gradient descent, mode seeking.
Mean shift is a nonparametric, iterative procedure introduced by
Fukunaga and Hostetler for seeking the mode of a density
function represented by a set of samples. The procedure uses
so-called kernels, which are decreasing functions of the distance
from a given point to a point in the sample set. For every point
in a given set of cluster centers, the mean of all sample points,
weighted by a kernel at that point, is computed to form a new
version of the set. This computation is repeated until convergence.
The resulting set contains estimates of the modes of the density
underlying the samples. The procedure will be reviewed in greater
detail in Section 2.
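As a concrete illustration (not drawn from the paper itself), the iteration can be sketched with a Gaussian kernel; the function name, the bandwidth parameter, and the sample data below are assumptions made for the example:

```python
import numpy as np

def mean_shift(samples, starts, bandwidth=1.0, n_iters=50):
    """Move each start point to the kernel-weighted mean of the samples,
    repeating until the points settle near modes of the sample density.
    A sketch only: a practical implementation would use a convergence test."""
    points = np.array(starts, dtype=float)
    samples = np.array(samples, dtype=float)
    for _ in range(n_iters):
        for i, t in enumerate(points):
            # Gaussian kernel weights: a decreasing function of the
            # distance from the current point t to each sample.
            d2 = np.sum((samples - t) ** 2, axis=1)
            w = np.exp(-d2 / (2 * bandwidth ** 2))
            points[i] = w @ samples / w.sum()  # weighted sample mean
    return points
```

Started inside a cluster of samples, each point climbs to the nearby density mode; points started in different clusters converge to different modes.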
Cheng revisited mean shift, developing a more general
formulation and demonstrating its potential uses in clustering and
global optimization. Recently, the mean shift procedure has met
with great popularity in the computer vision community. Applications
range from image segmentation and discontinuity-preserving
smoothing to higher-level tasks like appearance-based
clustering and blob tracking.
Despite the recent popularity of mean shift, few attempts have
been made since Cheng to understand the procedure
theoretically. For example, Cheng showed that mean shift is
an instance of gradient ascent and also noted that, unlike naïve
gradient ascent, mean shift has an adaptive step size. However, the
basis of step size selection in the mean shift procedure has
remained unclear. We show that, in the case of piecewise constant
kernels, the step is exactly the Newton step and, in all cases, it is a
step to the maximum of a quadratic bound.
Another poorly understood area is that of mean shift with an
evolving sample set. Some variations on the mean shift procedure
use the same set for samples and cluster centers. This causes the
sample set to evolve over time. The optimization problem solved
by this variation on mean shift has yet to be characterized.
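The evolving-sample-set variation (often called a blurring process) can be sketched by applying the same kernel-weighted mean to the sample set itself; the function name and bandwidth parameter below are illustrative assumptions:

```python
import numpy as np

def blurring_mean_shift_step(points, bandwidth=1.0):
    """One iteration in which the samples double as cluster centers:
    every point moves to the kernel-weighted mean of the current set,
    so the sample set itself evolves (contracts) over time."""
    points = np.array(points, dtype=float)
    new_points = np.empty_like(points)
    for i, t in enumerate(points):
        d2 = np.sum((points - t) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        new_points[i] = w @ points / w.sum()
    return new_points
```

Because every point is attracted toward its neighbors, repeated steps shrink the set toward a few cluster centers, which is why the optimization problem this variant solves differs from the fixed-sample case.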
In this paper, we build on the current understanding of mean
shift as an optimization procedure. Fukunaga and Hostetler
suggested that mean shift might be an instance of gradient ascent.
Cheng clarified the relationship between mean shift and
optimization by introducing the concept of the shadow of a kernel, and
showed that mean shift is an instance of gradient ascent with an
adaptive step size. We explore mean shift at a deeper level by
examining not only the gradient, but also the Hessian of the
shadow kernel density estimate. In doing so, we establish a
connection between mean shift and the Newton step and we