Lecture by Prof. LP Morency: “Multimodal AI: Understanding Human Behaviors”

The talk was held on 6th April 2021.


Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal cues from the social context. Today’s computers and interactive devices still lack many of these human-like abilities to hold fluid and natural interactions. Leveraging recent advances in machine learning, audio-visual signal processing and computational linguistics, my research focuses on creating computational technologies able to analyze, recognize and predict subtle human communicative behaviors in social contexts. Central to this research effort is the introduction of new probabilistic models able to learn the temporal and fine-grained latent dependencies across behaviors, modalities and interlocutors. In this talk, I will present some of our recent achievements in multimodal machine learning, addressing five core challenges: representation, alignment, fusion, translation and co-learning.

Short CV

Research areas: Multimodal Interaction, Machine Learning, Computer Vision. Citations: 16k; h-index: 68.

Speaker’s bio: Louis-Philippe Morency is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He was formerly research faculty in the Computer Sciences Department at the University of Southern California and received his Ph.D. degree from the MIT Computer Science and Artificial Intelligence Laboratory. His research focuses on building the computational foundations that enable computers to analyze, recognize and predict subtle human communicative behaviors during social interactions. He has received diverse awards, including AI’s 10 to Watch by IEEE Intelligent Systems, the NetExplo Award in partnership with UNESCO, and 10 best paper awards at IEEE and ACM conferences. His research has been covered by media outlets such as the Wall Street Journal, The Economist and NPR.


Lecture by Prof. Andreas Geiger: Towards Robust End-to-End Driving

The talk was held on 23rd March 2021.


I will present several recent results from my group on learning robust driving policies that have advanced the state of the art in the CARLA self-driving simulation environment. To generalize across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, I will first present a framework for learning situational driving policies that effectively captures reasoning under varying types of scenarios and achieves a 98% success rate on the CARLA self-driving benchmark, as well as state-of-the-art performance on a novel generalization benchmark. Next, I will discuss the problem of covariate shift in imitation learning. I will demonstrate that existing data aggregation techniques for addressing this problem generalize poorly, and present a novel approach with empirically better generalization performance. Finally, I will talk about the importance of intermediate representations and attention for learning robust self-driving models.

Short CV

Andreas Geiger is a professor at the University of Tübingen and a group leader at the Max Planck Institute for Intelligent Systems. Prior to this, he was a visiting professor at ETH Zürich and a research scientist at MPI-IS. He studied at KIT, EPFL and MIT and received his PhD degree in 2013 from KIT. His research interests are at the intersection of 3D reconstruction, motion estimation, scene understanding and sensory-motor control. He maintains the KITTI vision benchmark and coordinates the ELLIS PhD and PostDoc program. Website: http://www.cvlibs.net/


Lecture by Prof. Björn Schuller: There will be Artificial Emotional Intelligence

The talk was held on 9th March 2021.


Computers are still largely not associated with emotional intelligence, even more than two decades after the kick-off of Affective Computing as the core discipline in this regard. Yet significant advances have recently been made in the recognition of human emotion and in the generation of simulated emotional behaviour by computing devices, increasingly lending them “Artificial Emotional Intelligence”. This could turn a rich selection of exciting applications into reality, for example by completely changing how we interact with computing devices. In this talk, we will dive deep into the latest developments in multimodal Affective Computing from the AI perspective. This includes self-learning of neural architectures by AutoML, reinforcement learning, lifelong and self-supervised learning, “green” efficient learning and federated learning, but also the use of emotion in learning itself. Furthermore, we will look into robustness issues, such as resilience against adversarial attacks or packet loss. Beyond showing these and further recent trends and developments, largely based on deep learning techniques, the talk will close with the major final steps needed at “T-minus 3” to make Artificial Emotional Intelligence take off and “fly” in real-world applications at scale.

Short CV

Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich/Germany. He is Full Professor of AI and the Head of GLAM at Imperial College London/UK, Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING, Guest Professor at Southeast University in Nanjing/China and permanent Visiting Professor at HIT/China, amongst other Professorships and Affiliations. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the BCS, Fellow of the ISCA, President-Emeritus of the AAAC, and Senior Member of the ACM. He has (co-)authored 1,000+ publications (35k+ citations, h-index=86), is Field Chief Editor of Frontiers in Digital Health and was Editor in Chief of the IEEE Transactions on Affective Computing, amongst many further commitments and service to the community. He is an ERC Starting and DFG Reinhart-Koselleck Grantee, and a consultant to companies such as Barclays, GN, Huawei, and Samsung.


Lecture by Prof. Pietro Perona: Measuring algorithmic bias in face analysis — towards an experimental approach

The talk was held on 23rd February 2021.


Measuring algorithmic bias is crucial both to assess algorithmic fairness and to guide the improvement of algorithms. Current methods to measure algorithmic bias in computer vision, which are based on observational datasets, are inadequate for this task because they conflate algorithmic bias with dataset bias. To address this problem, I will propose an experimental method for measuring the algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. The method is based on generating synthetic “transects” of matched sample images that are designed to differ along specific attributes while keeping other attributes constant. A crucial aspect of our approach is that it relies on the perception of human observers, both to guide manipulations and to measure algorithmic bias. Besides allowing the measurement of algorithmic bias, synthetic transects have other advantages over observational datasets: they sample attributes more evenly, allow more straightforward bias analysis on minority and intersectional groups, enable prediction of bias in new scenarios, reduce ethical and legal challenges, and are economical and fast to obtain, helping make bias testing affordable and widely available. The method is validated by comparing it to a study that employs the traditional observational method for analyzing bias in gender classification algorithms. The two methods reach different conclusions: while the observational method reports gender and skin color biases, the experimental method reveals biases due to gender, hair length, age, and facial hair.

Short CV

Professor Perona is the Allen E. Puckett Professor of Electrical Engineering at Caltech. He directs Computation and Neural Systems (www.cns.caltech.edu), a PhD program centered on the study of biological brains and intelligent machines. Professor Perona’s research focuses on vision: how do we see, and how can we build machines that see? He has contributed to the theory of partial differential equations for image processing and boundary formation, and to modeling the early visual system’s function. He is currently interested in visual categories and visual recognition, and in studying how humans perform visual tasks, such as searching for and recognizing image content. One of his recent projects studies how to harness the visual ability of thousands of people on the web.

Lecture by Prof. Efstratios Gavves: The Machine Learning of Time: Past and Future

The talk was held on 9th February 2021.


Visual artificial intelligence automatically interprets what happens in visual data such as videos. Today’s research grapples with queries like: “Is this person playing basketball?”; “Find the location of the brain stroke”; or “Track the glacier fractures in satellite footage”. All these queries are about visual observations that have already taken place: today’s algorithms focus on explaining past visual observations. Naturally, not all queries are about the past: “Will this person draw something in or out of their pocket?”; “Where will the tumour be in 5 seconds, given breathing patterns and moving organs?”; or “How will the glacier fracture, given the current motion and melting patterns?”. For these queries and all others like them, the next generation of visual algorithms must anticipate what happens next given past visual observations. Visual artificial intelligence must also be able to prevent before the fact, rather than only explain after it. In this talk, I will present my vision of what these algorithms should look like and investigate possible synergies with other fields of science, such as biomedical research and astronomy. Furthermore, I will present some recent works and applications in this direction from my lab and its spinoff.

Short CV

Dr. Efstratios Gavves is an Associate Professor at the University of Amsterdam in the Netherlands and Scientific Director of the QUVA Deep Vision Lab. He is a recipient of the ERC Career Starting Grant 2020 and the NWO VIDI grant 2020 for research on the Computational Learning of Temporality for spatiotemporal sequences. He is also a co-founder of Ellogon.AI, a university spinoff in collaboration with the Dutch Cancer Institute (NKI), with the mission of using AI for pathology and genomics. He currently supervises more than 12 Ph.D. and postdoctoral students in projects with the University of Amsterdam, the Dutch Cancer Institute, Ellogon.AI, and BMW. Efstratios has authored several papers in the top Computer Vision and Machine Learning conferences and journals and is an author of several patents. Further, Efstratios teaches Deep Learning in the MSc in Artificial Intelligence at the University of Amsterdam; all course material is available on the website uvadlc.github.io. His research focuses on Temporal Machine Learning and Dynamics, Efficient Computer Vision, and Machine Learning for Oncology.


Lecture by Prof. Tinne Tuytelaars: ‘Keep on learning without forgetting’

The talk was held on 26th January 2021.


A core assumption behind most machine learning methods is that training data should be representative of the data seen at test time. While this seems almost trivial, it is, in fact, a particularly challenging condition to meet in real-world applications of machine learning: the world evolves and distributions shift over time in unpredictable ways (think of changing weather conditions, fashion trends, social hypes, wear and tear, etc.). This means models become outdated and in practice need to be retrained over and over again. A particular subfield of machine learning, known as continual learning, aims to address these issues. The goal is to develop learning schemes that can learn from non-i.i.d. data. The challenges are to achieve this without storing all the training data (ideally none at all), with fixed memory and model capacity, and without forgetting previously learned concepts. In this talk, I will give an overview of recent work in this direction, with a focus on learning deep models for computer vision.

Short CV

Tinne Tuytelaars is a full professor at KU Leuven, Belgium, working on computer vision and, in particular, on topics related to image representations, vision and language, continual learning and more. She has been program chair for ECCV14 and CVPR21, and general chair for CVPR16. She also served as associate-editor-in-chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence from 2014 to 2018. She was awarded an ERC Starting Grant in 2009 and received the Koenderink test-of-time award at ECCV16.


Generative Adversarial Networks in Multimedia Content Creation (8/10/2020)