Recognizing people and their actions in videos

Josef Sivic
(École Normale Supérieure, Paris, France)

Abstract:

Automatic recognition of people, their actions, and their interactions in videos remains a very challenging problem. Common actions, such as 'drinking coffee' or 'opening a door', can be performed in different ways by different individuals in different scenes. In addition, collecting training samples from realistic videos, such as feature-length movies, is difficult and time-consuming, because large quantities of video must be inspected manually.

This talk addresses both of these challenges. First, I show that face-based person recognizers and human action detectors can be learned automatically from videos paired with readily available but imprecise and noisy text annotation in the form of movie scripts and subtitles. Second, I describe an intermediate-level video representation for recognition, in which the video is decomposed into spatio-temporal segments that capture long-range motion cues in the form of groups of point tracks with coherent motion.

Results will be shown on challenging videos from feature-length movies.

Joint work with K. Alahari, F. Bach, M. Everingham, O. Duchenne, J. Lezama, I. Laptev, J. Ponce and A. Zisserman.