Large Language Models (LLMs) have revolutionised machine learning in computer vision in recent years, largely because their capacity for semantic reasoning supports the interpretation of visual content in context. Computer vision fundamentally requires answering two questions: ‘what’ and ‘where’. However, LLM-powered multimodal foundation models (LMMs) are poor at solving the ‘where’ localisation problem that underpins object detection, segmentation, and the generative synthesis of details, because they lack fine-grained, domain-specific knowledge when sufficient fine-grained training data from the target domain is unavailable. Moreover, growing privacy concerns around data protection, environmental concerns about energy consumption, and increasing demand for decentralised user ownership of small data together pose fundamental challenges to the established wisdom of learning models from centralised big data with exhaustive labelling. In this talk, I will present progress on exploring LLMs/LMMs for more reliable learning from small data, with examples in automatic prompt control for instance segmentation, leveraging (rather than removing) LLM hallucination for more reliable and trustworthy instance segmentation, and diffusion-based few-shot image generation that overcomes limitations of LLMs.
Professor Sean Gong FREng is a computer vision and machine learning scientist. He pioneered person re-identification and video behaviour analysis for law enforcement. Prof Gong was elected a Fellow of the Royal Academy of Engineering, and served on the steering panel of the UK government Chief Scientific Adviser’s Science Review on Security. He has made unique contributions to the engineering of AI video analytics for law enforcement and the security industry, and was awarded an Institution of Engineering and Technology Achievement Medal for Vision Engineering for outstanding achievement and superior performance in contributing to public safety. A commercial system built on his research won an Aerospace Defence Security Innovation Award and a Global Frost & Sullivan Award for Technical Innovation for Law Enforcement Video Forensics Technology. Gong is Professor of Visual Computation and Director of the Computer Vision Laboratory at Queen Mary University of London, a Turing Fellow of the Alan Turing Institute (2018-2023), and a member of the UK Computing Research Committee. He founded Vision Semantics and served as Chief Scientist of three start-ups, two of which have been acquired by NASDAQ-listed companies. He is a Distinguished Scientist of Veritone. He received his DPhil from Oxford University.