This IJCAI tutorial will be presented by Mihaela van der Schaar and Nabeel Seedat (see presenter bios below) during the 32nd International Joint Conference on Artificial Intelligence, running from 19 to 25 August 2023. This is a hybrid event (in person/online) that you can register for here.
Title
Data-Centric AI: Foundation, Frontiers and Applications in the quest for robust and reliable AI systems
Logistics
The live tutorial will take place on 19 August 2023 (PM), in Macao. More details to follow.
About
Data-Centric AI has recently emerged as an important paradigm shift in how AI is built, placing the previously undervalued “data work” at the centre of AI development. This tutorial introduces participants to the foundations of Data-Centric AI by exploring recent state-of-the-art methods and use-cases around characterising, auditing, and improving the data used in machine learning.
Goals of the lab
The quality of data used to train Machine Learning (ML) models is crucial to the success or failure of AI. This is increasingly critical with data-hungry algorithms deployed in high-stakes healthcare or finance settings. Despite its importance, the “data” work has been undervalued as merely operational [9].
Hence, along with algorithmic improvement, there is an urgent need to shift the focus to the data used in AI/ML and its quality. The emergence of Data-Centric AI addresses this issue by developing tools for systematic characterisation, evaluation, and monitoring of the data used to train and evaluate ML models.
This tutorial introduces participants to the foundations of Data-Centric AI. We will provide a comprehensive introduction to recent state-of-the-art Data-Centric AI methods to (1) characterise, (2) generate and (3) evaluate the underlying ML data. A unique focus of the tutorial is showing how these Data-Centric components apply to different stages of the ML pipeline, with practical use-cases on tabular, image and text data. This end-to-end approach will enable participants to engage practically with Data-Centric AI for their own problems, whether as researchers or practitioners. Additionally, we will explore the future of Data-Centric AI, discussing challenges and opportunities.
After the tutorial, participants will understand the need for Data-Centric AI and its essential components and gain a foundation in state-of-the-art tools and methods such that they can either use or contribute to Data-Centric AI.
Presenter bios
The tutorial will be presented by Mihaela van der Schaar and Nabeel Seedat.

Mihaela van der Schaar
Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge, a Fellow at The Alan Turing Institute in London, and a Chancellor’s Professor at UCLA. Mihaela has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected as a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI.
Mihaela’s research focus is on machine learning, AI and operations research for healthcare and medicine. In addition to leading the van der Schaar Lab, Mihaela is founder and director of the Cambridge Centre for AI in Medicine (CCAIM).

Nabeel Seedat
Nabeel Seedat is a PhD candidate at the University of Cambridge. Nabeel’s research is focused on Data-Centric AI, uncertainty quantification and synthetic data. He has published papers on Data-Centric AI in leading ML conferences, including NeurIPS, ICML and AISTATS. Nabeel has recently given talks and presentations on Data-Centric AI to both industry (AstraZeneca, Discovery Limited) and academic research groups (Queen Mary University of London, University of Cape Town). He also has experience giving talks to diverse audiences at conferences including IEEE conferences, KDD and PyData.
Beyond his academic background in Data-Centric AI, Nabeel also has extensive industry experience working on data-centric problems. He has worked as a Machine Learning engineer at two multinational corporations (in the USA and South Africa), building real-world computer vision and NLP systems that currently serve millions of customers daily.
You can have a look at our Inspiration Exchange session on the topic of Data-Centric AI in healthcare here.
You can find our most recent Revolutionizing Healthcare sessions on data here and here.
Other useful links:
– Our lab’s publications
– Mihaela van der Schaar on Twitter and LinkedIn