Training object class detectors from eye tracking data.

Vittorio Ferrari (University of Edinburgh, UK)

A central task in Computer Vision is detecting object classes such as cars and horses in complex scenes. Training an object class detector typically requires a large set of images annotated with bounding-boxes, which is expensive and time-consuming to create. In this talk I will present a novel approach to annotating object locations which can substantially reduce annotation time. We first track the eye movements of annotators instructed to find the object, and then propose a technique for deriving object bounding-boxes from these fixations. To validate this idea, we collected eye tracking data for 10 object classes of the Pascal VOC 2012 benchmark (6270 images, 5 observers). Our technique produces correct bounding-boxes in 50% of the images, while reducing the total annotation time by a factor of 7 compared to drawing bounding-boxes manually. Any standard object class detector can be trained on the bounding-boxes predicted by our model. Our large-scale eye tracking dataset is available here.
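To make the idea of turning fixations into a bounding-box concrete, the sketch below shows a deliberately naive baseline: it takes the tightest axis-aligned box around a set of fixation points and scores it against a manually drawn box. This is not the technique presented in the talk, which is not detailed in the abstract. The function names, the `margin` parameter, and the sample coordinates are all hypothetical. The intersection-over-union (IoU) threshold of 0.5 is the criterion commonly used with Pascal VOC, and it is an assumption here, since the abstract does not state how a "correct" box is judged.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) fixation in pixels
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def box_from_fixations(fixations: List[Point], margin: float = 0.0) -> Box:
    """Naive baseline (hypothetical): tightest box enclosing all fixations,
    optionally grown by `margin` pixels on every side."""
    if not fixations:
        raise ValueError("need at least one fixation")
    xs = [x for x, _ in fixations]
    ys = [y for _, y in fixations]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(predicted: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    """Assumed correctness test: IoU at or above the threshold."""
    return iou(predicted, ground_truth) >= threshold


if __name__ == "__main__":
    # Made-up fixations and a made-up manually drawn box, for illustration only.
    fixations = [(120.0, 80.0), (160.0, 110.0), (140.0, 95.0)]
    predicted = box_from_fixations(fixations, margin=20.0)
    ground_truth = (90.0, 50.0, 200.0, 150.0)
    print(predicted, iou(predicted, ground_truth), is_correct(predicted, ground_truth))
```

A baseline like this tends to underestimate object extent, because observers usually fixate on a few salient parts of the object rather than on its outline, which is one reason a learned mapping from fixations to boxes is of interest.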