van der Schaar Lab

Revolutionizing Healthcare: What data do I need?

On 22 March, we held our 24th Revolutionizing Healthcare session, in which we explored the need for reliable data and what data-centric AI can do for clinicians.

The session started with an introduction to data-centric machine learning for clinicians by Prof Mihaela van der Schaar. This was followed by a roundtable discussion with practicing clinicians who joined us from our regular audience.

In our session, we talked through a number of questions regarding ML and data in a clinical setting: 

  1. How much data do I need to do machine learning in the medical setting of my interest/my clinic? 
  2. What should be the quality of that data? (Errors, noise, missingness) 
  3. How do I test the quality of my data? 
  4. What happens if I do not have enough labelled data? 
  5. What are the differences between cross-sectional, treatment, and time-series data?
  6. Can machine learning improve the quality of my data? 
  7. Can I share my data without privacy fears? (What role can synthetic data play?) 

We then invited further questions from the audience, and the session developed into a thought-provoking discussion that drew on the perspectives of a variety of clinicians.

We thank Prof Carsten Utoft Niemann (Copenhagen University), Dr Mustafa Khanbhai (NHS/Imperial College London), Dr Nazima Pathan (University of Cambridge), and Dr Janak Gunatilleke (KPMG) for their participation.

If you didn’t manage to join us last week, we’d strongly recommend watching the archived video, which is now available on YouTube.

NOTE: This information was up to date at the time of the presentation but does not take into account material published since then.

If you would like to learn more about data-centric AI, have a look at our dedicated research pillar, as well as our data-centric AI checklist and DC-Check tool.

Sign up for our upcoming sessions here.

Andreas Bedorf