Human Pose Estimation and Segmentation in Videos

Karteek Alahari (INRIA Grenoble, France)

Abstract:

The first part of the talk presents a method for pixel-wise segmentation and pose estimation of multiple people in stereoscopic videos. The task is challenging because the videos are unconstrained, the cameras are non-stationary, and the indoor and outdoor scenes are complex and dynamic. We cast the problem as a discrete labelling task involving multiple person labels, devise a suitable cost function, and optimize it efficiently. The model incorporates person detection and pose estimation, as well as colour, motion, and disparity cues. We also introduce a stereoscopic dataset with frames extracted from the feature-length movies "StreetDance 3D" and "Pina".

The second part shows how temporal constraints can be used to estimate articulated human poses in videos, a task that is also cast as an optimization problem. We present a new approximate scheme to solve it, with two steps dedicated to pose estimation. First, our approach takes into account temporal links with subsequent frames for the less-certain parts, namely elbows and wrists. Second, our method decomposes poses into limbs, generates limb sequences across time, and recomposes poses by mixing these body-part sequences.