From Images to Scenes: using lots of data to infer geometric, photometric and semantic scene properties from a single image.

Alexei (Alyosha) Efros
(Carnegie Mellon University)

Reasoning about a scene from a photograph is an inherently ambiguous task: a single image does not, by itself, carry enough information to disambiguate the world it depicts. Humans, of course, have no problem understanding photographs, because of all the prior visual experience they can bring to bear on the task. How can we help computers do the same? We propose to "brute force" the problem by using massive amounts of visual data, both labeled and unlabeled, as a way of capturing the statistics of the natural world.

In this talk, I will present some of our recent results on inferring geometric, photometric, and semantic scene properties from a single image. I will first briefly describe our system for estimating the rough geometric surface layout of a scene as well as the camera viewpoint, and show how this information can, in turn, be useful for modeling objects in the scene. Next, I will describe a very simple way of using the surface layout to estimate a rough illumination map for the scene. Finally, I will describe a new system that uses millions of unlabeled photographs from Flickr to capture some of the implicit semantic structure of an image.
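To give a flavor of the data-driven approach behind the last part, the sketch below shows one common way such systems are built: represent every photograph with a simple global descriptor and retrieve a query's nearest neighbors from a large unlabeled collection by brute-force search. This is only an illustrative sketch, not the system described in the talk; the "tiny image" descriptor, the function names, and all parameters are assumptions (systems of this kind often use richer descriptors such as GIST).

    # A minimal sketch of data-driven scene matching: find a query image's
    # nearest neighbors in a large unlabeled photo collection using a simple
    # global descriptor and brute-force search. Names and parameters are
    # illustrative only.
    import numpy as np

    def tiny_image_descriptor(image: np.ndarray, size: int = 16) -> np.ndarray:
        """Downsample an HxWx3 image to size x size (nearest-pixel sampling)
        and mean-center / L2-normalize it into one feature vector."""
        h, w, _ = image.shape
        ys = np.arange(size) * h // size
        xs = np.arange(size) * w // size
        small = image[ys][:, xs].astype(np.float64)
        vec = small.ravel()
        vec -= vec.mean()
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def nearest_scenes(query: np.ndarray, collection: list, k: int = 5) -> np.ndarray:
        """Return indices of the k collection images whose descriptors are
        closest (Euclidean) to the query, via brute-force search."""
        q = tiny_image_descriptor(query)
        feats = np.stack([tiny_image_descriptor(im) for im in collection])
        dists = np.linalg.norm(feats - q, axis=1)
        return np.argsort(dists)[:k]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Stand-in "photo collection": random images in place of real Flickr data.
        photos = [rng.integers(0, 256, size=(240, 320, 3)) for _ in range(1000)]
        query = photos[42]
        print(nearest_scenes(query, photos, k=5))

At scale (millions of images), the brute-force loop would be replaced by an approximate nearest-neighbor index, but the underlying idea is the same: with enough data, similar-looking scenes tend to share semantic structure.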

Time permitting, I will also show applications of our methods to computer graphics.