Variational Audio-Visual Representation Learning

Tuesday 19th December 2023 17:00 CET


Dr. Xavier Alameda-Pineda


Learning robust and powerful representations is at the core of many problems in multimedia, including content representation, multi-modal fusion, and social signal analysis. While the supervised and self-supervised learning paradigms have shown great progress in many applications, the learned representations are strongly tailored to one application or domain, and adapting them to a different scenario or dataset may require large amounts of data that are not always available. Deep probabilistic models offer an opportunity to exploit various unsupervised mechanisms with several interesting properties. First, they can be combined with other deep or shallow probabilistic models within the same methodological framework. Second, they can include unsupervised mixture mechanisms that are useful for on-the-fly modality and/or model selection. Third, they are naturally suited not only to unsupervised learning but also to unsupervised adaptation, thus overcoming a potential domain shift with little data. In this talk, we will discuss the methodology of deep probabilistic models, i.e. variational learning, and showcase its interest for multi-modal applications with auditory and visual data of human activities (speech and motion).
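The variational learning methodology mentioned in the abstract centers on maximizing an evidence lower bound (ELBO). As a minimal, hedged illustration (not code from the talk), the sketch below computes the ELBO of a Gaussian variational autoencoder with a unit-variance decoder, using the closed-form KL divergence to a standard normal prior; the toy data and encoder outputs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo(x, mu_z, logvar_z, x_recon):
    """Evidence lower bound for a Gaussian VAE with unit-variance decoder.

    ELBO = E_q[log p(x|z)] - KL(q(z|x) || N(0, I)), up to an additive constant.
    """
    # Reconstruction term: Gaussian log-likelihood up to a constant.
    recon = -0.5 * np.sum((x - x_recon) ** 2)
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1).
    kl = -0.5 * np.sum(1.0 + logvar_z - mu_z**2 - np.exp(logvar_z))
    return recon - kl

# Toy example: one 4-D observation, 2-D latent (illustrative values only).
x = rng.normal(size=4)
mu_z = np.zeros(2)        # encoder mean
logvar_z = np.zeros(2)    # encoder log-variance (sigma = 1)
# Reparameterization trick: sample z deterministically from noise.
z = mu_z + np.exp(0.5 * logvar_z) * rng.normal(size=2)
x_recon = x               # perfect reconstruction, for illustration

print(elbo(x, mu_z, logvar_z, x_recon))  # KL and reconstruction both 0 here -> 0.0
```

With the encoder posterior equal to the prior and a perfect reconstruction, both terms vanish, which is a useful sanity check when implementing such models.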

Xavier Alameda-Pineda is a (tenured) Research Scientist at Inria and the leader of the RobotLearn team. He obtained M.Sc. (equivalent) degrees in Mathematics in 2008 and in Telecommunications in 2009 from BarcelonaTech, and in Computer Science in 2010 from Univ. Grenoble Alpes (UGA). He then worked towards his Ph.D. in Mathematics and Computer Science, which he obtained from UGA in 2013. After a two-year post-doc with the Multimodal Human Understanding Group at the University of Trento, he was appointed to his current position. Xavier is an active member of SIGMM, a senior member of IEEE, and a member of ELLIS. He is the coordinator of the H2020 project SPRING: Socially Pertinent Robots in Gerontological Healthcare, and co-leads the "Audio-visual machine perception and interaction for companion robots" chair of the Multidisciplinary Institute of Artificial Intelligence. Xavier's research interests are at the crossroads of machine learning, computer vision, and audio processing for scene and behavior analysis and human-robot interaction.


Meeting ID: 966 6099 7457
Passcode: 405011
