Realistic face animation for speech
Prof. Luc van Gool (ETH Zurich, Switzerland and KU Leuven, Belgium)
The realistic animation of faces is still a challenge. We are all
experts at detecting the slightest deviations from natural
performance, but unfortunately equally bad at pointing out the specific
flaws that give an animation away. In deciding on an appropriate
approach, there are several choices to be made. Depending on the
flexibility needed to integrate the animation into 3D worlds, a
truly 3D approach may be necessary, or a 2D or 2.5D approach may
suffice. We have chosen a 3D approach. Given this direction, one can
distinguish between work that starts from detailed sub-skin anatomy
and work that starts from the 3D exterior appearance. We start from
detailed, learned 3D deformations observed during speech. To that end,
about twenty 3D face snapshots per second were captured for several
test subjects while they were speaking.
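As a concrete illustration of the kind of data this capture step yields, the sketch below stacks registered scans into a matrix of per-frame deformations. It assumes the scans are already in per-vertex correspondence; the array shapes and names are assumptions made for the example, not the actual capture pipeline.

    import numpy as np

    def deformation_matrix(scans, neutral):
        """Stack per-frame deformations into a (frames x 3N) data matrix.

        scans   : registered meshes, shape (frames, N, 3), one per snapshot
        neutral : the subject's neutral (rest) face, shape (N, 3)
        """
        frames = scans.shape[0]
        # Each row is one snapshot's deviation from the neutral face, flattened.
        return (scans - neutral).reshape(frames, -1)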
Independent Component Analysis (ICA) allows for an efficient and
effective representation of the relevant deformation modes. A speech
animation then amounts to varying the IC coefficients appropriately over
time. Starting from the speech audio and a transcription of the phonemes
into so-called visemes, these variations are determined fully
automatically. Co-articulation effects are taken into account as well.
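To make the ICA step concrete, here is a minimal sketch using scikit-learn's FastICA as a stand-in for whatever implementation was actually used; the number of components and the randomly generated stand-in data are assumptions.

    import numpy as np
    from sklearn.decomposition import FastICA

    # Stand-in data: 500 frames of a 1000-vertex face; in practice the rows
    # would be the flattened scan deformations described above.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 3000))

    ica = FastICA(n_components=16, whiten="unit-variance", random_state=0)
    S = ica.fit_transform(X)  # per-frame IC coefficients, shape (500, 16)

    def synthesize_frame(coeffs, neutral):
        """Rebuild one face mesh from a single vector of IC coefficients."""
        deformation = ica.inverse_transform(coeffs[np.newaxis, :])[0]
        return neutral + deformation.reshape(-1, 3)

An animation is then simply a smooth trajectory through this low-dimensional coefficient space, sampled at the desired frame rate.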
Another issue is how to automatically adapt the deformations to the
physiognomy of a face for which no 3D dynamics could be observed. Our
animation pipeline makes it possible to mix the visemes of different
test subjects, so that the visemes for a novel face come closer to those
observed for more similar faces. The mix is determined in Face Space. In
this space, purely synthetic characters can also be produced and
subsequently animated.
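One hedged way to picture the mixing step: weight each recorded subject's viseme by how close that subject lies to the novel face in Face Space. The inverse-distance weighting below is an illustrative assumption, not necessarily the exact scheme used in the pipeline.

    import numpy as np

    def mix_visemes(novel_face, subject_faces, subject_visemes, eps=1e-8):
        """
        novel_face      : Face Space coordinates of the new face, shape (d,)
        subject_faces   : Face Space coordinates of recorded subjects, (k, d)
        subject_visemes : one viseme deformation per subject, (k, 3N)
        Returns the blended viseme deformation for the novel face, (3N,)
        """
        # Weight each subject by inverse distance to the novel face in
        # Face Space, so that more similar faces contribute more.
        dists = np.linalg.norm(subject_faces - novel_face, axis=1)
        weights = 1.0 / (dists + eps)
        weights /= weights.sum()
        return weights @ subject_visemes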