EPOS is a new method for estimating the 6D pose of rigid objects, for which 3D models are available, from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries.
An object is represented by compact surface fragments, which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and the fragments are predicted by an encoder-decoder network. At each pixel, the network predicts: (i) the probability of each object's presence, (ii) the probability of each fragment given the object's presence, and (iii) the precise 3D location on each fragment. A data-dependent number of corresponding 3D locations is selected per pixel, and the poses of possibly multiple object instances are estimated by a robust and efficient variant of the PnP-RANSAC algorithm.
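The per-pixel selection step described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration (single object, made-up tensor shapes and threshold name `tau`), not the released implementation: correspondences are kept wherever the joint probability of the object's presence and a fragment exceeds a threshold, so the number of correspondences per pixel is data-dependent.

```python
import numpy as np

def select_correspondences(obj_probs, frag_probs, frag_offsets,
                           frag_centers, tau=0.5):
    """Select 2D-3D correspondences whose joint probability
    P(object) * P(fragment | object) exceeds tau.

    Hypothetical shapes (single object, F fragments):
      obj_probs:    (H, W)        probability of the object's presence
      frag_probs:   (H, W, F)     fragment probabilities given presence
      frag_offsets: (H, W, F, 3)  predicted 3D location on each fragment,
                                  expressed as an offset from its center
      frag_centers: (F, 3)        fragment centers in the model frame
    Returns (pixels_2d, points_3d), two arrays of matching length.
    """
    # Joint probability of object presence and each fragment.
    joint = obj_probs[..., None] * frag_probs            # (H, W, F)
    # Data-dependent selection: zero, one, or several fragments per pixel.
    ys, xs, fs = np.nonzero(joint > tau)
    # Precise 3D model point for each selected (pixel, fragment) pair.
    points_3d = frag_centers[fs] + frag_offsets[ys, xs, fs]
    pixels_2d = np.stack([xs, ys], axis=1).astype(np.float64)
    return pixels_2d, points_3d
```

The resulting 2D-3D correspondences could then be passed to a PnP-RANSAC solver (e.g. OpenCV's `cv2.solvePnPRansac`) to recover 6D poses; the paper's solver is a robust and efficient variant of that scheme.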
In the BOP Challenge 2019, the method outperforms all RGB and most RGB-D and depth-only (D) methods on the T-LESS and LM-O datasets. On the YCB-V dataset, it is superior to all competitors, with a large margin over the second-best RGB method.
Coming soon. Stay tuned!
EPOS was applied frame by frame (no temporal consistency was enforced) to a video of YCB objects captured by a cell phone. The RGB input image is shown on the left, and the 3D object models rendered in the estimated 6D poses are on the right. All pose estimates with at least 100 inlier correspondences are shown, each annotated with the object ID and the number of inlier correspondences.