Video generation consists of generating a video sequence so that an object in a source image is animated according to some external information (a conditioning label or the motion of a driving video). In this talk I will present some of our recent achievements addressing two specific aspects: 1) generating facial expressions, e.g., smiles that differ from each other (spontaneous, tense, etc.), using diversity as the driving force; and 2) generating videos without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g., faces, human bodies), our method can be applied to any object of this class. To achieve this, we decouple appearance and motion information using a self-supervised formulation. To support complex motions, we use a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models occlusions arising during target motions and combines the appearance extracted from the source image with the motion derived from the driving video. Our solutions score best on diverse benchmarks and on a variety of object categories.
Nicu Sebe is a professor at the University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human-computer interaction in computer vision applications. He received his PhD from the University of Leiden, The Netherlands, and was previously affiliated with the University of Amsterdam, The Netherlands, and the University of Illinois at Urbana-Champaign, USA.