DAGM’04 Pattern Recognition Symposium, Tübingen, Germany, Aug 2004.
Scale-Invariant Object Categorization using a
Scale-Adaptive Mean-Shift Search
and Bernt Schiele
Perceptual Computing and Computer Vision Group, ETH Zurich, Switzerland
Multimodal Interactive Systems, TU Darmstadt, Germany
The goal of our work is object categorization in real-world scenes. That is, given a novel image, we want to recognize and localize previously unseen objects based on their similarity to a learned object category. For use in a real-world system, it is important that this includes the ability to recognize objects at multiple scales.
In this paper, we present an approach to multi-scale object categorization using scale-invariant interest points and a scale-adaptive Mean-Shift search. The approach builds on the method from , which has been demonstrated to achieve excellent results for the single-scale case, and extends it to multiple scales. We present an experimental comparison of the influence of different interest point operators and quantitatively show the method’s robustness to large scale changes.
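The scale-adaptive Mean-Shift search named above is not specified in this excerpt. The following Python sketch shows one plausible form of it under stated assumptions: object hypotheses are represented as weighted votes in (x, y, s) space, a uniform (box) kernel is used, and the spatial window radius grows linearly with the current scale estimate, so larger objects are searched with a proportionally larger window. The function name and the parameters `base_radius` and `scale_window` are hypothetical and not taken from the paper.

```python
import math


def scale_adaptive_mean_shift(votes, weights, start, base_radius=10.0,
                              scale_window=0.2, max_iter=100, tol=1e-6):
    """Mean-Shift mode search in (x, y, s) vote space (illustrative sketch).

    votes:        list of (x, y, s) object-center votes (hypothetical format)
    weights:      non-negative weight per vote
    start:        initial (x, y, s) hypothesis
    base_radius:  spatial window radius at scale s = 1 (assumed parameter)
    scale_window: accepted |log(s_vote / s)| around the current scale
    """
    x, y, s = start
    for _ in range(max_iter):
        # Scale adaptivity: the spatial window grows with the current scale.
        radius = base_radius * s
        sw = sx = sy = ss = 0.0
        for (vx, vy, vs), w in zip(votes, weights):
            in_space = math.hypot(vx - x, vy - y) <= radius
            in_scale = abs(math.log(vs / s)) <= scale_window
            if in_space and in_scale:  # uniform (box) kernel
                sw += w
                sx += w * vx
                sy += w * vy
                ss += w * vs  # arithmetic mean of scales (a simplification)
        if sw == 0.0:
            break  # empty window: keep the current estimate
        nx, ny, ns = sx / sw, sy / sw, ss / sw
        shift = math.sqrt((nx - x) ** 2 + (ny - y) ** 2 + (ns - s) ** 2)
        x, y, s = nx, ny, ns
        if shift < tol:
            break  # converged to a mode
    return x, y, s
```

With five votes clustered around (100, 50, 2.0) and two distant outliers, starting the search from (95, 48, 1.9) converges to the cluster center, because the outliers never fall inside the window. Making the radius proportional to s is what gives the search its scale adaptivity: a fixed radius would be too coarse for small objects and too fine for large ones.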
Many current object detection methods deal with the scale problem by performing an exhaustive search over all possible object positions and scales [17–19]. This exhaustive search imposes severe constraints, both on the detector’s computational complexity and on its discriminative power, since a large number of potential false positives must be rejected. An alternative approach is to let the search be guided by image structures that give cues about the object scale. In such a system, an initial interest point detector tries to find structures whose extent can be reliably estimated under scale changes. These structures are then combined to derive a comparatively small number of hypotheses for object locations and scales. Only those hypotheses that pass an initial plausibility test need to be examined in detail. In recent years, a range of scale-invariant interest point detectors that can be used for this purpose have become available [13–15,10].
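The cost of exhaustive search can be made concrete by counting the windows a sliding-window detector must score. The sketch below does this for a simple image pyramid. The function name, the default stride of 4 pixels, and the scale step of 1.2 are hypothetical settings chosen only for illustration, not values from [17–19].

```python
def count_sliding_windows(img_w, img_h, win_w, win_h, stride=4, scale_step=1.2):
    """Count windows evaluated by an exhaustive position x scale search.

    The image is repeatedly downscaled by scale_step (equivalent to growing
    the window) until the window no longer fits. At each pyramid level,
    every position on a regular grid with the given stride is evaluated.
    """
    if scale_step <= 1.0 or stride < 1:
        raise ValueError("scale_step must be > 1 and stride must be >= 1")
    total = 0
    scale = 1.0
    while True:
        w = int(img_w / scale)  # image width at this pyramid level
        h = int(img_h / scale)  # image height at this pyramid level
        if w < win_w or h < win_h:
            break  # the window no longer fits: stop
        nx = (w - win_w) // stride + 1  # horizontal window positions
        ny = (h - win_h) // stride + 1  # vertical window positions
        total += nx * ny
        scale *= scale_step
    return total
```

For example, `count_sliding_windows(640, 480, 64, 128)` counts the candidate windows for a single VGA image under these assumed settings. Each of them must be scored and nearly all rejected. An interest-point-guided system, by contrast, only examines the comparatively small set of hypotheses that its detected structures support.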
In this paper, we apply this idea to extend the method from [12,11]. This method has recently been demonstrated to yield excellent object detection results and high robustness to occlusions . However, it has so far only been defined for categorizing objects at a known scale. In practical applications, this is almost never the case. Even in scenarios where the camera location is relatively fixed, objects of interest may still exhibit scale changes of at least a factor of two simply because they occur at different distances from the camera. Scale invariance is thus one of the most important properties for any system that is to be applied to real-world scenarios without human intervention.
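The factor-of-two figure follows directly from perspective projection. Assuming an idealized pinhole camera (lens distortion ignored), an object of physical height H at distance Z from a camera with focal length f has image height h, and the ratio of the image heights of the same object at two distances is the inverse ratio of those distances:

```latex
h = \frac{f\,H}{Z}, \qquad \frac{h_1}{h_2} = \frac{Z_2}{Z_1}
```

A pedestrian 10 m from the camera therefore appears twice as tall in the image as the same pedestrian at 20 m, even though the camera itself has not moved.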
This paper contains four main contributions: (1) We extend our approach from [12,