Variational Audio-Visual Representation Learning

Tuesday 19th December 2023 17:00 CET

Dr. Xavier Alameda-Pineda

ABSTRACT

Learning robust and powerful representations is at the core of many problems in multimedia, including content representation, multi-modal fusion, social signals, etc. While the supervised and self-supervised learning paradigms have shown great progress in many applications, the learned representations are strongly tailored to one application or domain, and adapting them to a different scenario or dataset might require large amounts of data, which are not always available. Deep probabilistic models provide an opportunity to exploit various unsupervised mechanisms that enable several interesting properties. First, they can be combined with other deep or shallow probabilistic models within the same methodological framework. Second, they can include unsupervised mixture mechanisms useful for on-the-fly modality and/or model selection. Third, they are naturally suitable not only for unsupervised learning, but also for unsupervised adaptation, thus overcoming a potential domain shift with little data. In this talk, we will discuss the methodology of deep probabilistic models, i.e., variational learning, and showcase their value for multi-modal applications with auditory and visual data of human activities (speech and motion).

LECTURER SHORT CV

Xavier Alameda-Pineda is a (tenured) Research Scientist at Inria and the Leader of the RobotLearn Team. He obtained M.Sc. (equivalent) degrees in Mathematics in 2008 and in Telecommunications in 2009 from BarcelonaTech, and in Computer Science in 2010 from Univ. Grenoble-Alpes (UGA). He then pursued a Ph.D. in Mathematics and Computer Science, which he obtained from UGA in 2013. After a two-year post-doc at the Multimodal Human Understanding Group at the University of Trento, he was appointed to his current position. Xavier is an active member of SIGMM, a senior member of IEEE, and a member of ELLIS. He is the Coordinator of the H2020 Project SPRING: Socially Pertinent Robots in Gerontological Healthcare and is co-leading the "Audio-visual machine perception and interaction for companion robots" chair of the Multidisciplinary Institute of Artificial Intelligence. Xavier’s research interests are at the crossroads of machine learning, computer vision, and audio processing for scene and behavior analysis and human-robot interaction.

ZOOM LINK & PASSCODE

Meeting ID: 966 6099 7457
Passcode: 405011
