This talk addresses the automatic recognition of human actions in videos, that
is, the task of determining which human actions occur in a video. This problem
is particularly hard due to enormous variations in the visual and motion
appearance of people and actions, camera viewpoint changes, moving backgrounds,
occlusions, noise, and the enormous amount of video data.
Firstly, I will present two local spatio-temporal descriptors for action
recognition in videos. The first descriptor is based on a covariance matrix
representation, and it models linear relations between low-level features. The
second descriptor is based on Brownian covariance, and it models all kinds of
possible relations between low-level features, nonlinear as well as linear.
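To make the contrast between the two descriptors concrete, here is a minimal sketch in Python with NumPy. It is not the speaker's implementation: the function names, the choice of input features, and the toy data are assumptions made for illustration only. The first function builds a plain covariance matrix over low-level features; the second computes the sample Brownian (distance) covariance between two feature channels, which can be clearly positive even when the ordinary covariance is zero.

```python
import numpy as np


def covariance_descriptor(features):
    """Covariance matrix of low-level features.

    features: (n_samples, d) array, one row per pixel/point sampled from a
    spatio-temporal volume (which features to use is an assumption here).
    Returns a (d, d) matrix capturing pairwise *linear* relations.
    """
    return np.cov(np.asarray(features, dtype=float), rowvar=False)


def _centered_distances(v):
    """Pairwise distance matrix of v, double-centered (row, column, grand mean)."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(v.shape[0], -1)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_covariance(x, y):
    """Sample Brownian (distance) covariance between two feature channels."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    # The squared statistic is the mean of the elementwise product; clip tiny
    # negative values caused by floating-point round-off before the square root.
    return float(np.sqrt(max((a * b).mean(), 0.0)))


if __name__ == "__main__":
    # Toy data (hypothetical): y depends on x, but only nonlinearly.
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2
    print("linear covariance:  ", covariance_descriptor(np.column_stack([x, y]))[0, 1])
    print("Brownian covariance:", distance_covariance(x, y))
```

In this toy case the off-diagonal covariance entry vanishes by symmetry, while the Brownian covariance stays well above zero, which is the kind of nonlinear relation the second descriptor is meant to capture.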
Then, I will talk about two higher-level feature representations to go beyond
the limitations of the local feature encoding techniques.
The first representation is based on the idea of relative dense trajectories. I
will present an object-centric local feature representation of motion
trajectories, which allows spatial information to be used by a local feature
The second representation captures statistics of pairwise co-occurring visual
words within multi-scale feature-centric neighborhoods. The proposed
representation, based on contextual features, encodes information about the
local density of
features, local pairwise relations among the features, and spatio-temporal
Finally, I will show that the proposed techniques perform better than or
comparably to the state of the art on a variety of real, challenging human
action recognition datasets (Weizmann, KTH, URADL, MSR Daily Activity 3D,
UCF50, HMDB51, and CHU Nice Hospital).
Dr. Piotr Tadeusz Biliński is a Post-Doctoral Research Fellow in the STARS team
at the INRIA Institute, Sophia Antipolis Research Center, France. He obtained his
Bachelor's Degree in 2008 and Master's Degree in 2009 from Poznan University of
Technology in Poland. He has been working on Human Action Recognition in Videos
since 2010, under the supervision of Francois Bremond. In 2013, he was a
Research Intern at Microsoft Research in Redmond, United States, where he
worked in the Audio and Signal Processing domain on Computer Vision and
Machine Learning techniques for Head-Related Transfer Function Personalization
using Anthropometric Features. In 2014, he received his Ph.D. Degree from the
University of Nice in France. His Ph.D. Thesis was reviewed by Ram Nevatia,
Frederic Jurie, and Ivan Laptev.