van der Schaar Lab

IJCAI 2023: Data-Centric AI Tutorial

This IJCAI tutorial will be presented by Mihaela van der Schaar and Nabeel Seedat during the 32nd International Joint Conference on Artificial Intelligence, running from 19–25 August 2023. This is a hybrid (in-person/online) event, for which you can register here.


Data-Centric AI: Foundation, Frontiers and Applications in the quest for robust and reliable AI systems


Data-Centric AI has recently emerged as an important paradigm shift in how AI is built, placing the previously undervalued “data work” at the centre of AI development. This tutorial introduces participants to the foundations of Data-Centric AI by exploring recent state-of-the-art methods and use-cases for characterising, auditing, and improving the data used in machine learning.

Goals of the tutorial

The quality of the data used to train Machine Learning (ML) models is crucial to the success or failure of AI. This is increasingly critical as data-hungry algorithms are deployed in high-stakes settings such as healthcare and finance. Despite its importance, “data work” has been undervalued as merely operational [9].

Hence, along with algorithmic improvement, there is an urgent need to shift the focus to the data used in AI/ML and its quality. The emergence of Data-Centric AI addresses this issue by developing tools for systematic characterisation, evaluation, and monitoring of the data used to train and evaluate ML models.

This tutorial introduces participants to the foundations of Data-Centric AI. We will provide a comprehensive introduction to recent state-of-the-art Data-Centric AI methods to (1) characterise, (2) generate and (3) evaluate the underlying ML data. A unique focus of the tutorial is showing how these Data-Centric components apply to different stages of the ML pipeline, with practical use-cases on tabular, image and text data. This end-to-end approach will enable participants, whether researchers or practitioners, to engage practically with Data-Centric AI for their own problems. Additionally, we will explore the future of Data-Centric AI, discussing its challenges and opportunities.

After the tutorial, participants will understand why Data-Centric AI is needed and what its essential components are, and will have a foundation in state-of-the-art tools and methods that allows them to use, or contribute to, Data-Centric AI.

You can have a look at our Inspiration Exchange session on the topic of Data-Centric AI in healthcare here.

You can find our most recent Revolutionizing Healthcare sessions on data here and here.

Other useful links:
– Our lab’s publications
– Mihaela van der Schaar on Twitter and LinkedIn

Mihaela van der Schaar

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London.

Mihaela has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), three IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award.

In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI.

Mihaela’s research focus is on machine learning, AI and operations research for healthcare and medicine.

Nabeel Seedat

Before joining the van der Schaar Lab, Nabeel received a merit scholarship for a master’s degree at Cornell University, where he researched Bayesian deep learning and uncertainty estimation for high-stakes applications. He also holds a master’s degree from the University of the Witwatersrand (South Africa), where he was awarded a National Research Foundation grant for his work applying signal processing and machine learning to Parkinson’s disease diagnostics in low-resource settings.

Professionally, Nabeel has worked as a machine learning engineer in the United States and South Africa. The computer vision and natural language processing models he worked on are currently deployed and serve millions of customers daily.

Nabeel is keenly aware that taking methods from the lab to the bedside requires a unique focus beyond high-performance predictive models. Bridging this divide, in his view, fundamentally requires a toolkit of methods for transfer learning across domains and locations, learning on smaller datasets, understanding model biases, and quantifying model reliability and uncertainty.

Nabeel’s research is supported by funding from the Cystic Fibrosis Trust.