Join the SIXD Challenge Google group to stay up to date.
The pose of a rigid object has six degrees of freedom, and full knowledge of it is required in many robotics and augmented reality applications. The goal of the challenge is to evaluate methods for 6D object pose estimation from RGB or RGB-D images and to establish the state of the art.
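To make the six degrees of freedom concrete, here is a minimal sketch in which a pose is three rotation angles plus three translation components. The function names, the Euler-angle convention (R = Rz·Ry·Rx) and the millimetre units are assumptions made for illustration; they are not taken from the challenge's dataset format.

```python
import math

def rotation_from_euler(rx, ry, rz):
    """Build a 3x3 rotation matrix R = Rz * Ry * Rx from angles in radians.

    The angle convention is an assumption for this sketch.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(pose, point):
    """Map a point from model coordinates to camera coordinates.

    pose = (rx, ry, rz, tx, ty, tz): 3 rotational + 3 translational parameters.
    """
    R = rotation_from_euler(*pose[:3])
    t = pose[3:]
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))

# Hypothetical pose: a quarter turn about the z axis, then a shift of
# 100 mm along x and 500 mm along z (units assumed).
pose = (0.0, 0.0, math.pi / 2, 100.0, 0.0, 500.0)
moved = apply_pose(pose, (1.0, 0.0, 0.0))
print(moved)
```

Other rotation parameterizations (a rotation matrix, a quaternion, axis-angle) describe the same three rotational degrees of freedom; this one is used here only because it makes the parameter count explicit.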
The methods are evaluated on the task of 6D localization of a single instance of a single object. All test images are used for the evaluation, even those with multiple instances of the object of interest.
This task, a special variant of the 6D localization task described in , allows most state-of-the-art methods to be evaluated out of the box. The task is relevant for industry, e.g. for an assembly robot that needs to find a bolt to complete an assembly step. Even if there are multiple bolts in its workspace, the robot needs to know the pose of only one bolt, arbitrarily chosen.
The difficulty of the "multiple instances, find one of your choice" task is close to that of "find the instance in the most favorable pose" (least occlusion, unambiguous view). Most methods are expected, but not required, to report the instance in the most favorable pose and to treat the remaining instances as clutter.
The datasets selected for the challenge were converted to a standard format. All of them contain 3D object models and training and test RGB-D images. The training images show individual objects from different viewpoints and were either captured by a Kinect-like sensor or obtained by rendering the 3D object models. The test images were captured in scenes of varying complexity, often with clutter and occlusion. For more information about the datasets, see the dataset_info.md files in the respective download folders.
The error of 6D object pose estimates will be measured as described in this document. In short: a slightly modified version of the Visible Surface Discrepancy  will be used as the main pose error function. For legacy reasons, we will also use the Average Distance by Hinterstoisser et al. .
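As an illustration of the Average Distance error, the sketch below computes the mean Euclidean distance between the model points transformed by the estimated pose and by the ground-truth pose, which is the commonly cited form of the measure for objects without ambiguous views. The function names, the toy model points and the millimetre units are assumptions for this example; this is not the evaluation code used by the challenge.

```python
import math

def transform(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def average_distance(R_est, t_est, R_gt, t_gt, model_points):
    """Mean distance between model points under the estimated and true poses."""
    total = 0.0
    for p in model_points:
        total += math.dist(transform(R_est, t_est, p), transform(R_gt, t_gt, p))
    return total / len(model_points)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy model with three vertices (hypothetical, units assumed to be mm).
points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]

# The estimate differs from the ground truth by a pure translation of
# (3, 4, 0), so every model point is displaced by the same distance.
error = average_distance(IDENTITY, (3.0, 4.0, 0.0), IDENTITY, (0.0, 0.0, 0.0), points)
print(error)
```

A pose estimate is typically accepted when this error falls below a threshold tied to the object size; the threshold used in the challenge is defined in the evaluation document, not here.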
We provide the SIXD toolkit with Python scripts for reading the standard dataset format, rendering, evaluation, etc.
1 Center for Machine Perception, Czech Technical University in Prague, Czech Republic
2 Computer Vision Lab, TU Dresden, Germany
3 Imperial Computer Vision & Learning Lab, Imperial College London, United Kingdom