Prof. Shaogang Gong

School of EECS, Queen Mary University of London, UK

Multimodal Self-Supervised Learning

Deep learning has revolutionised AI and machine learning techniques in computer vision over the past decade, largely due to the availability of centralised big data with exhaustive labelling and cheap computing power from Nvidia’s GPUs. However, privacy concerns arising from data protection and environmental concerns about energy consumption, together with an increasing demand for decentralised user ownership of localised unlabelled data, pose fundamental challenges to the established wisdom of training deep learning models from scratch on centralised big data with exhaustive labelling. In this talk, I will present challenges and progress in developing self-supervised deep learning models by exploring multimodal vision-language foundation models from Large Language Models (LLMs), through examples of open-vocabulary object detection, automatic prompt control by self-supervised learning for instance semantic segmentation, and learning fine-grained video-language dynamic details without fine-grained labelling in model training for video moment retrieval.

Professor Sean Gong FREng is a computer vision and machine learning scientist. He pioneered person re-identification and video behaviour analysis for law enforcement. Prof Gong is an elected Fellow of the Royal Academy of Engineering and served on the steering panel of the UK government Chief Scientific Adviser’s Science Review on Security. He has made unique contributions to the engineering of AI video analytics for law enforcement and the security industry, and was awarded an Institution of Engineering and Technology Achievement Medal for Vision Engineering for outstanding achievement and superior performance in contributing to public safety. A commercial system built on his research won an Aerospace Defence Security Innovation Award and a Global Frost & Sullivan Award for Technical Innovation for Law Enforcement Video Forensics Technology. Gong is Professor of Visual Computation and Director of the Computer Vision Laboratory at Queen Mary University of London, a Fellow of ELLIS (European Laboratory for Learning and Intelligent Systems), a Fellow of AAIA (Asia-Pacific Artificial Intelligence Association), a Turing Fellow of the Alan Turing Institute, a Fellow of the Institution of Electrical Engineers, and a member of the UK Computing Research Committee. He founded Vision Semantics and served as the Chief Scientist of three start-ups, two of which have been acquired by NASDAQ-listed companies. He is a Distinguished Scientist of Veritone. He received his DPhil from Oxford University.