Human Pose Estimation and Segmentation in Videos
Karteek Alahari
(INRIA Grenoble, France)
Abstract:
The first part of the talk presents a method to obtain pixel-wise
segmentation and pose estimation of multiple people in the context of
stereoscopic videos. This task involves challenges such as dealing with
unconstrained stereoscopic video, non-stationary cameras, and complex
indoor and outdoor dynamic scenes. We cast the problem as a discrete
labelling task involving multiple person labels, devise a suitable cost
function, and optimize it efficiently. The model incorporates person
detection, pose estimation, as well as colour, motion, and disparity
cues. We also introduce a stereoscopic dataset with frames extracted
from the feature-length movies "StreetDance 3D" and "Pina".
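
As a rough illustration of what a discrete labelling formulation looks
like, the sketch below assigns one label per pixel (0 for background,
1 and 2 for two people) by minimizing per-pixel data costs plus a
penalty on neighbouring pixels that disagree. Everything in it is an
assumption made for illustration: the Potts smoothness term, the
iterated conditional modes (ICM) optimizer, the toy costs, and the names
energy and icm. The abstract does not specify the actual cost function
or the optimization method.

```python
def energy(labels, unary, edges, smooth_weight):
    """Total cost: data terms plus a Potts penalty on disagreeing neighbours."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth = sum(smooth_weight for i, j in edges if labels[i] != labels[j])
    return data + smooth


def icm(unary, edges, smooth_weight, max_sweeps=10):
    """Greedy coordinate descent (ICM): a simple stand-in optimizer."""
    n, num_labels = len(unary), len(unary[0])
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Start from the label with the lowest data cost at each pixel.
    labels = [min(range(num_labels), key=lambda l: unary[i][l]) for i in range(n)]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_cost(l):
                disagree = sum(1 for j in neighbours[i] if labels[j] != l)
                return unary[i][l] + smooth_weight * disagree

            best = min(range(num_labels), key=local_cost)
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy 1-D "image" of five pixels; columns are costs for labels 0, 1, 2.
# In a fuller model these costs would combine detection, pose, colour,
# motion, and disparity cues; here they are made-up numbers.
unary = [
    [0.1, 2.0, 2.0],
    [0.2, 1.5, 2.0],
    [2.0, 0.1, 1.5],
    [2.0, 0.6, 0.5],  # noisy pixel: data term slightly prefers person 2
    [2.0, 0.2, 1.5],
]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = icm(unary, edges, smooth_weight=0.5)
print(labels)
```

In this toy case the smoothness term overrides the noisy data cost at
the fourth pixel, so it takes the same person label as its neighbours.
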
The second part shows how temporal constraints can be used to
estimate articulated human poses in videos, a problem that is also cast
as an optimization. We present a new approximate scheme to solve it,
with two steps dedicated to pose estimation. First, our approach takes
into account temporal links with subsequent frames for the less-certain
parts, namely elbows and wrists. Second, our method decomposes poses
into limbs, generates limb sequences across time, and recomposes poses
by mixing these body part sequences.
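
The idea of building part sequences across time and recomposing poses
from them can be sketched as follows. This is a simplified illustration
under stated assumptions, not the scheme presented in the talk: each
part is tracked independently with Viterbi-style dynamic programming
over per-frame candidates, the motion penalty is a squared distance, and
the names best_track and recompose, the scores, and the coordinates are
all hypothetical.

```python
def best_track(candidates, scores, motion_weight=1.0):
    """Pick one candidate per frame, maximizing detection scores minus
    a squared-distance penalty between consecutive frames."""
    best = [list(scores[0])]  # best[t][k]: value of best track ending at k
    back = []                 # back[t-1][k]: predecessor of k in frame t-1
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for k, (x, y) in enumerate(candidates[t]):
            def value(p):
                px, py = candidates[t - 1][p]
                jump = (x - px) ** 2 + (y - py) ** 2
                return best[t - 1][p] - motion_weight * jump

            p_star = max(range(len(candidates[t - 1])), key=value)
            row.append(value(p_star) + scores[t][k])
            ptr.append(p_star)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final candidate.
    k = max(range(len(best[-1])), key=lambda c: best[-1][c])
    track = [k]
    for ptr in reversed(back):
        k = ptr[k]
        track.append(k)
    track.reverse()
    return [candidates[t][k] for t, k in enumerate(track)]


def recompose(part_candidates, part_scores, motion_weight=1.0):
    """Track every part separately, then merge the tracks into per-frame poses."""
    tracks = {
        part: best_track(part_candidates[part], part_scores[part], motion_weight)
        for part in part_candidates
    }
    num_frames = len(next(iter(tracks.values())))
    return [{part: tracks[part][t] for part in tracks} for t in range(num_frames)]


# Three frames; the wrist has two candidates per frame, the elbow one.
part_candidates = {
    "elbow": [[(1, 0)], [(1, 1)], [(1, 2)]],
    "wrist": [[(0, 0), (5, 5)], [(0, 1), (5, 5)], [(0, 2), (5, 6)]],
}
part_scores = {
    "elbow": [[1.0], [1.0], [1.0]],
    "wrist": [[1.0, 0.9], [0.4, 0.6], [1.0, 0.2]],
}
poses = recompose(part_candidates, part_scores, motion_weight=0.1)
print(poses)
```

In the middle frame the wrist candidate at (5, 5) has the higher
per-frame score, but the temporal term selects (0, 1) because it is
consistent with the neighbouring frames.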