Object Detection in Crowded Scenes

Bastian Leibe (ETH Zurich, Switzerland)

Detecting object classes in real-world images is a challenging problem, further complicated by overlaps and partial occlusions. We present a novel algorithm that addresses this problem by treating object categorization and top-down segmentation as two interleaved processes that collaborate closely towards a common goal. As we will show, the close coupling between these two processes allows our method to accumulate additional evidence about object hypotheses and to resolve ambiguities caused by overlaps and partial visibility.

The core of our approach is a flexible formulation of object shape that combines the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a top-down segmentation from the recognition result. The segmentation is used in turn to improve recognition, allowing the system to focus on object pixels and discard misleading influences from the background. Moreover, knowledge of where in the image each hypothesis draws its support is used in a verification stage based on Minimum Description Length (MDL) to resolve ambiguities between overlapping hypotheses and to factor out the effects of partial occlusion.
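The voting idea behind a probabilistic Hough transform can be illustrated with a minimal sketch. This is not the implementation presented in the talk: the function names (`hough_vote`, `best_hypothesis`), the codebook layout, the uniform weighting of stored offsets, and the binned accumulator are all assumptions made for illustration. Each image feature is matched to one or more codebook entries, each entry stores offsets to the object center seen during training, and every match casts weighted votes for possible center positions.

```python
import numpy as np

def hough_vote(features, codebook, shape, bin_size=4):
    """Accumulate weighted votes for object centers (illustrative sketch).

    features: list of (x, y, matches), where matches is a list of
              (entry_id, p_match) pairs for the feature at pixel (x, y).
    codebook: dict mapping entry_id -> list of (dx, dy) offsets from a
              feature to the object center (hypothetical layout).
    shape:    (height, width) of the image in pixels.
    """
    h, w = shape
    acc = np.zeros((h // bin_size + 1, w // bin_size + 1))
    for x, y, matches in features:
        for entry_id, p_match in matches:
            offsets = codebook[entry_id]
            # Assumption: every stored offset of an entry is equally likely.
            p_offset = 1.0 / len(offsets)
            for dx, dy in offsets:
                cx, cy = x + dx, y + dy
                if 0 <= cx < w and 0 <= cy < h:
                    acc[int(cy) // bin_size, int(cx) // bin_size] += p_match * p_offset
    return acc

def best_hypothesis(acc, bin_size=4):
    """Return (center_x, center_y, score) of the strongest accumulator bin."""
    iy, ix = np.unravel_index(np.argmax(acc), acc.shape)
    return ((ix + 0.5) * bin_size, (iy + 0.5) * bin_size, acc[iy, ix])
```

In this sketch the score of a bin is simply the sum of the vote weights that fall into it, so features that agree on a center reinforce each other while ambiguous matches spread their weight over several candidate positions. Recording which features voted for the winning bin would be the starting point for the top-down segmentation and verification steps described above; that bookkeeping is omitted here.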

As an application, we address the problem of detecting objects such as cars, motorbikes, and pedestrians in real-world street scenes. Qualitative and quantitative results on several challenging data sets confirm that our method reliably detects objects in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.