From Images to Text: New Forms of Human-AI Interaction

Assistant Prof. Lorenzo Baraldi
Recent progress in the Computer Vision and Natural Language Processing communities has made it possible to connect vision and language in a variety of tasks that lie at the intersection of Vision, Language, and Embodied AI. These tasks range from generating meaningful descriptions of images to answering questions and navigating agents through unseen environments via natural language instructions. This integration has grown to the point that it is now pervasive in the literature and has become a fundamental tool for developing AI algorithms. The lecture will provide an overview of these advancements, focusing on our recent work. We will delve into cutting-edge techniques for generating text from images and videos, for making AI systems controllable with human involvement, and for training large-scale models on web-based datasets. We will also explore the application of these approaches to embodied agents, which interact with the physical world in tasks such as navigation and other embodied activities. Throughout the talk, we will emphasize the importance of developing appropriate evaluation metrics and discuss the emerging challenges in the field.
Lecturer short CV
Lorenzo Baraldi is a Tenure Track Assistant Professor at the University of Modena and Reggio Emilia. He works under the supervision of Prof. Rita Cucchiara on Deep Learning, Video Analysis and Multimedia, and teaches the courses “Computer Vision and Cognitive Systems” and “Scalable AI”. His research has covered Egocentric Vision and Gesture Recognition, Temporal Video Segmentation and Retrieval, Saliency, Video Captioning, Visual-Semantic Alignment and Embodied AI. He is the author of more than 80 publications in international journals and conferences, and serves as Associate Editor for Pattern Recognition Letters and as Area Chair for major multimedia conferences. He has been elected a Scholar of the ELLIS society, the European Laboratory for Learning and Intelligent Systems, and coordinates the Modena ELLIS Unit. Since 2021, he has been deputy director of the Interdepartmental Center on Digital Humanities of the University of Modena and Reggio Emilia. In 2017, he worked in the Facebook AI Research laboratory in Paris, under the supervision of Hervé Jégou, where he developed a video copy detection algorithm that has been adopted in production on the social network.