Realistic face animation for speech
Prof. Luc van Gool (ETH Zurich, Switzerland and KU Leuven, Belgium)
The realistic animation of faces is still a challenge. We are all
experts at detecting the slightest deviations from natural
performance, but unfortunately equally bad at pointing out the specific
flaws that give an animation away. In deciding on an appropriate
approach, there are several choices to be made. Depending on the
flexibility needed to integrate the animation into 3D worlds, a
truly 3D approach may be necessary, or a 2D or 2.5D approach may
suffice. We have chosen a 3D approach. Given this direction, one can
distinguish between work that starts from detailed sub-skin anatomy
and work that starts from the 3D exterior appearance. We start from
detailed, learned 3D deformations observed during speech. To that end,
about twenty 3D face snapshots per second were captured for several
test subjects while they were speaking.
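As a concrete illustration of the kind of data this capture step yields, the sketch below stacks registered scans into a matrix of per-frame deformations. It assumes the scans are already in per-vertex correspondence; the array shapes and names are assumptions made for the example, not the actual capture pipeline.

    import numpy as np

    def deformation_matrix(scans, neutral):
        """Stack per-frame deformations into a (frames x 3N) data matrix.

        scans   : registered meshes, shape (frames, N, 3), one per snapshot
        neutral : the subject's neutral (rest) face, shape (N, 3)
        """
        frames = scans.shape[0]
        # Each row is one snapshot's deviation from the neutral face, flattened.
        return (scans - neutral).reshape(frames, -1)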
Independent Component Analysis (ICA) allows for an efficient and
effective representation of the relevant deformation modes. A speech
animation then amounts to varying the IC coefficients appropriately over
time. Starting from the speech audio and a transcription of the phonemes
into so-called visemes, these variations are determined fully
automatically. Co-articulation effects are taken into account as well.
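To make the ICA step concrete, here is a minimal sketch using scikit-learn's FastICA as a stand-in for whatever implementation was actually used; the number of components and the randomly generated stand-in data are assumptions.

    import numpy as np
    from sklearn.decomposition import FastICA

    # Stand-in data: 500 frames of a 1000-vertex face; in practice the rows
    # would be the flattened scan deformations described above.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 3000))

    ica = FastICA(n_components=16, whiten="unit-variance", random_state=0)
    S = ica.fit_transform(X)  # per-frame IC coefficients, shape (500, 16)

    def synthesize_frame(coeffs, neutral):
        """Rebuild one face mesh from a single vector of IC coefficients."""
        deformation = ica.inverse_transform(coeffs[np.newaxis, :])[0]
        return neutral + deformation.reshape(-1, 3)

An animation is then simply a smooth trajectory through this low-dimensional coefficient space, sampled at the desired frame rate.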
Another issue is how to automatically adapt the deformations to the
physiognomy of a face for which no 3D dynamics could be observed. Our
animation pipeline makes it possible to mix the visemes of different
test subjects, so that the visemes for a novel face come closer to those
observed for more similar faces. The mix is determined in Face Space. In
this space, purely synthetic characters can also be produced and
subsequently animated.
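One hedged way to picture the mixing step: weight each recorded subject's viseme by how close that subject lies to the novel face in Face Space. The inverse-distance weighting below is an illustrative assumption, not necessarily the exact scheme used in the pipeline.

    import numpy as np

    def mix_visemes(novel_face, subject_faces, subject_visemes, eps=1e-8):
        """
        novel_face      : Face Space coordinates of the new face, shape (d,)
        subject_faces   : Face Space coordinates of recorded subjects, (k, d)
        subject_visemes : one viseme deformation per subject, (k, 3N)
        Returns the blended viseme deformation for the novel face, (3N,)
        """
        # Weight each subject by inverse distance to the novel face in
        # Face Space, so that more similar faces contribute more.
        dists = np.linalg.norm(subject_faces - novel_face, axis=1)
        weights = 1.0 / (dists + eps)
        weights /= weights.sum()
        return weights @ subject_visemes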