How to train large scale 3D human foundation models

Tuesday 4th February 2025 17:00 CET

Prof. Dr. Gerard Pons-Moll

ABSTRACT

Understanding 3D humans interacting with the world has been a long-standing goal in AI and computer vision. Lack of 3D data has been the major barrier to progress. This is changing with the increasing number of 3D datasets featuring images, videos and multi-view captures with 3D annotations, as well as large-scale image foundation models. However, learning models from such sources is non-trivial. Some of the challenges are: 1) datasets are annotated with different 3D skeleton formats and outputs, 2) data on interactions with a wide variety of objects is still very limited, and 3) image foundation models are 2D, and extracting 3D information from them is hard. I will present solutions to each of these three challenges. I will introduce a universal training procedure to consume any skeleton format, a method to learn from synthetic data of human-object interactions, as well as a diffusion-based method tailored to lift foundation models to 3D.

Papers potentially covered in the talk: Neural Localizer Fields (NeurIPS’24), Human3Diffusion (NeurIPS’24), HDM (CVPR’24)

LECTURER SHORT CV

Gerard Pons-Moll is a Professor at the Department of Computer Science of the University of Tübingen, endowed by the Carl Zeiss Foundation. He is also core faculty at the Tübingen AI Center, senior researcher at the Max Planck Institute for Informatics (MPII) in Saarbrücken, Germany, and faculty at the IMPRS-IS (International Max Planck Research School – Intelligent Systems in Tübingen). His research lies at the intersection of computer vision, computer graphics and machine learning, with a special focus on analyzing people in videos and creating virtual human models by “looking” at real ones. His research has produced some of the most advanced statistical human body models of pose, shape, soft tissue and clothing (which are currently used for a number of applications in industry and research), as well as algorithms to track and reconstruct 3D models of people from images, video, depth, and IMUs.

His work has received several awards, including the prestigious Emmy Noether Grant (2018), a Google Faculty Research Award (2019, 2024), a Facebook Reality Labs Faculty Award (2018, 2024), and the German Pattern Recognition Award (2019), which is given annually by the German Pattern Recognition Society to one outstanding researcher in the fields of Computer Vision and Machine Learning. His work has received Best Paper Awards at BMVC’13, Eurographics’17, 3DV’18, 3DV’22, CVPR’20 and ECCV’22, and has been published at the top venues and journals, including CVPR, ICCV, SIGGRAPH, Eurographics, 3DV, IJCV and PAMI. He regularly serves as area chair for the major conferences in learning and vision and is an associate editor of PAMI.

ZOOM LINK & PASSCODE

ZOOM

Meeting ID: 998 8013 8183
Passcode: 114050

PRESENTATION & VIDEO

PDF&VIDEO