Mean Shift Is a Bound Optimization
Mark Fashing and Carlo Tomasi, Member, IEEE
Abstract—We build on the current understanding of mean shift as an optimization procedure. We demonstrate that, in the case of piecewise constant kernels, mean shift is equivalent to Newton's method. Further, we prove that, for all kernels, the mean shift procedure is a quadratic bound maximization.
Index Terms—Mean shift, bound optimization, Newton's method, adaptive gradient descent, mode seeking.
1 INTRODUCTION
MEAN shift is a nonparametric, iterative procedure introduced by Fukunaga and Hostetler [1] for seeking the mode of a density function represented by a set S of samples. The procedure uses so-called kernels, which are decreasing functions of the distance from a given point t to a point s in S.
For every point t in a given set T, the sample means of all points in S weighted by a kernel at t are computed to form a new version of T. This computation is repeated until convergence. The resulting set T contains estimates of the modes of the density underlying set S. The procedure will be reviewed in greater detail in Section 2.
Cheng [2] revisited mean shift, developing a more general
formulation and demonstrating its potential uses in clustering and
global optimization. Recently, the mean shift procedure has met
with great popularity in the computer vision community. Applica-
tions range from image segmentation and discontinuity-preserving
smoothing [3], [4] to higher level tasks like appearance-based
clustering [5], [6] and blob tracking [7].
Despite the recent popularity of mean shift, few attempts have
been made since Cheng [2] to understand the procedure
theoretically. For example, Cheng [2] showed that mean shift is
an instance of gradient ascent and also noted that, unlike naïve
gradient ascent, mean shift has an adaptive step size. However, the
basis of step size selection in the mean shift procedure has
remained unclear. We show that, in the case of piecewise constant
kernels, the step is exactly the Newton step and, in all cases, it is a
step to the maximum of a quadratic bound.
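The piecewise constant case can be checked numerically. For a flat kernel of radius h, the shadow density estimate near t is proportional to the sum of 1 - (t - s)²/h² over in-window samples, so its gradient is (2/h²) Σ (s - t) and its Hessian is the constant -(2/h²)·n_w, where n_w is the number of in-window samples. The Newton update t - g/H then collapses to the plain in-window sample mean, i.e., the mean shift step. The one-dimensional samples and window radius below are illustrative assumptions.

```python
# Verify: for a flat kernel, the Newton step on the shadow density
# estimate equals the mean shift step (the in-window sample mean).
S = [0.1, 0.4, 0.5, 0.9, 2.0]   # samples (2.0 lies outside the window)
h, t = 1.0, 0.6                  # window radius and current estimate

window = [s for s in S if abs(t - s) <= h]
n_w = len(window)
g = (2.0 / h**2) * sum(s - t for s in window)  # gradient of shadow estimate
H = -(2.0 / h**2) * n_w                        # Hessian (constant in window)

newton_step = t - g / H                        # Newton update
mean_shift_step = sum(window) / n_w            # kernel-weighted (here plain) mean

assert abs(newton_step - mean_shift_step) < 1e-12
```

The agreement is exact (up to floating point), not approximate: the quadratic shadow density is its own second-order Taylor expansion inside the window, so one Newton step lands on its in-window maximizer.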
Another poorly understood area is that of mean shift with an
evolving sample set. Some variations on the mean shift procedure
use the same set for samples and cluster centers. This causes the
sample set to evolve over time. The optimization problem solved
by this variation on mean shift has yet to be characterized.
In this paper, we build on the current understanding of mean
shift as an optimization procedure. Fukunaga and Hostetler [1]
suggested that mean shift might be an instance of gradient ascent.
Cheng [2] clarified the relationship between mean shift and
optimization by introducing the concept of the
shadow kernel
and
showed that mean shift is an instance of gradient ascent with an
adaptive step size. We explore mean shift at a deeper level by
examining not only the gradient, but also the Hessian of the
shadow kernel density estimate. In doing so, we establish a
connection between mean shift and the Newton step and we