van der Schaar Lab

Data-Centric AI: What is it all about?

AI-powered applications are becoming increasingly widespread across many industries, including e-commerce, finance, manufacturing, and medicine. However, developing robust and reliable ML systems requires attention to many considerations beyond the model itself.

In data-centric AI, we seek to give data centre stage. Data-centric AI views model or algorithmic refinement as less important (in certain settings, algorithmic development is even considered a solved problem), and instead seeks to systematically improve the data used by ML systems.

To dive deeper into the topic, we dedicated one of our Inspiration Exchange sessions to data-centric AI: what it is, its challenges and opportunities, and how it can be utilised effectively.

Enabling robust & reliable ML in healthcare and beyond

In this introductory presentation about data-centric AI, Prof Mihaela van der Schaar lays the groundwork for understanding the significance of machine learning for healthcare, and the need for a data-centric lens.


Nabeel Seedat presents Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes. It is a new method for making reliable predictions in healthcare based on tabular data, and a practical example of how data-centric AI can be used to improve real-life work in a clinical setting.
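To make the idea of stratifying examples more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes we have recorded, at several training checkpoints, the probability a model assigned to each example's true label, and it groups examples by their average confidence and by how uncertain those predictions are. The function name `stratify_examples`, the use of `p * (1 - p)` as the uncertainty signal, and both thresholds are assumptions chosen for illustration.

```python
import numpy as np


def stratify_examples(probs_true_label, conf_thresh=0.75, unc_thresh=0.1):
    """Group examples into 'easy', 'ambiguous' and 'hard' (illustrative only).

    probs_true_label: array-like of shape (n_checkpoints, n_examples) with the
    probability assigned to each example's true label at each checkpoint.
    conf_thresh and unc_thresh are hypothetical values, not recommended settings.
    """
    probs = np.asarray(probs_true_label, dtype=float)

    # Average confidence in the true label across checkpoints.
    confidence = probs.mean(axis=0)
    # Average of p * (1 - p): largest when predictions hover around 0.5.
    uncertainty = (probs * (1.0 - probs)).mean(axis=0)

    low_unc = uncertainty <= unc_thresh
    groups = np.full(probs.shape[1], "ambiguous", dtype=object)
    # Consistently right with low uncertainty.
    groups[(confidence >= conf_thresh) & low_unc] = "easy"
    # Consistently wrong with low uncertainty.
    groups[(confidence <= 1.0 - conf_thresh) & low_unc] = "hard"
    return groups, confidence, uncertainty


# Three toy examples tracked over three checkpoints.
toy_probs = [
    [0.95, 0.05, 0.50],
    [0.97, 0.03, 0.50],
    [0.99, 0.02, 0.50],
]
groups, confidence, uncertainty = stratify_examples(toy_probs)
print(list(groups))
```

In this toy run, the first example is predicted correctly with high confidence throughout, the second is confidently wrong throughout, and the third stays near 0.5, so they land in the easy, hard and ambiguous groups respectively. In practice the thresholds would need to be chosen for the dataset at hand.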

Understanding key data-centric considerations is crucial for reliable machine learning systems in healthcare and beyond. To address this pain point, we created DC-Check: an actionable checklist-style framework to practically engage with data-centric AI. It provides the first standardised framework to communicate the design and necessary considerations for data-centric AI/ML pipelines. To learn more and use DC-Check, see our dedicated website.

If you would like to learn more about data-centric AI, please have a look at our dedicated research pillar and our publications.

Nabeel Seedat

Before joining the van der Schaar Lab, Nabeel received a merit scholarship for a master’s degree at Cornell University, researching Bayesian deep learning and uncertainty estimation for high-stakes applications. In addition, he holds a master’s degree from the University of the Witwatersrand (South Africa), where he was awarded a National Research Foundation grant for his work applying signal processing and machine learning to Parkinson’s disease diagnostics in low-resource settings.

Professionally, Nabeel has worked as a machine learning engineer in the United States and South Africa. The computer vision and natural language processing models he worked on are currently deployed and serving millions of customers on a daily basis.

Nabeel is keenly aware that taking methods from the lab to the bedside “requires a unique focus beyond just high-performance predictive models; it requires the development of a toolkit of methods for transfer learning across domains and locations, learning on smaller datasets, understanding model biases and quantifying model reliability and uncertainty are fundamentally needed to bridge this divide.”

Nabeel’s research is supported by funding from the Cystic Fibrosis Trust.

Mihaela van der Schaar

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London.

Mihaela has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award.

In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected as a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI.

Mihaela’s research focus is on machine learning, AI and operations research for healthcare and medicine.

Andreas Bedorf