van der Schaar Lab

Spotlight on Cardiovascular Disease

Cardiovascular diseases (CVDs), such as heart failure and stroke, are the leading cause of death worldwide and contribute substantially to reduced quality of life. Cardiology is a prime example of the hard challenges awaiting those who develop machine learning (ML) for real-world settings. AI tools can empower clinicians in areas such as prevention, prediction, detection, diagnosis, treatment, and care.

Given the high relevance of the field, the van der Schaar lab has contributed a variety of research projects to it, developing new ML methods along the way.

Risk predictions, e.g. of mortality risk, allow treatment plans for CVD patients to be optimised. The risk prediction methods typically used in cardiology often do not perform well across the whole patient population. Machine learning methods can learn risk predictors agnostically and substantially improve predictions. One of the most exciting advancements in risk prediction is the advent of automated machine learning (AutoML). CVD was one of the first fields in which we applied AutoPrognosis, an AutoML framework developed by the lab, which optimises the modelling process and makes state-of-the-art, interpretable ML accessible to users who are not ML experts. Employing AutoPrognosis on UK Biobank data, we were able to discover novel CVD risk predictors and improve predictions for subpopulations, e.g. individuals with diabetes.

Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants

Ahmed M. Alaa, Thomas Bolton, Emanuele Di Angelantonio, James H. F. Rudd, Mihaela van der Schaar

PLOS One 2019
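
As a rough illustration of the model-selection step that AutoML frameworks automate, the sketch below scores a few candidate risk models by cross-validation and keeps the best one. It is not AutoPrognosis: the synthetic cohort, the risk formula, the lookup-table models, and all function names are assumptions made for this example, and an exhaustive loop over three candidates stands in for a real search over modelling pipelines.

```python
import random
from math import exp
from statistics import mean


def make_cohort(n=600, seed=0):
    """Synthetic ((age, diabetic), event) pairs; the risk formula is invented."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        age = rng.uniform(40, 80)
        diabetic = int(rng.random() < 0.2)
        risk = 1 / (1 + exp(-(-6.0 + 0.07 * age + 1.0 * diabetic)))
        cohort.append(((age, diabetic), int(rng.random() < risk)))
    return cohort


def fit_cell_rates(train, key):
    """Lookup-table model: smoothed event rate within each cell key(x)."""
    prevalence = mean(y for _, y in train)
    counts = {}
    for x, y in train:
        n, events = counts.get(key(x), (0, 0))
        counts[key(x)] = (n + 1, events + y)

    def predict(x):
        if key(x) not in counts:  # cell unseen in training: fall back
            return prevalence
        n, events = counts[key(x)]
        return (events + 1) / (n + 2)  # Laplace smoothing

    return predict


# Hypothetical candidate models, from coarse to fine stratification.
CANDIDATES = {
    "prevalence only": lambda x: 0,
    "age decade": lambda x: int(x[0] // 10),
    "age decade x diabetes": lambda x: (int(x[0] // 10), x[1]),
}


def cv_brier(cohort, key, k=5):
    """Mean Brier score over k folds (rows are already in random order)."""
    folds = [cohort[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = fit_cell_rates(train, key)
        scores.append(mean((predict(x) - y) ** 2 for x, y in folds[i]))
    return mean(scores)


cohort = make_cohort()
scores = {name: cv_brier(cohort, key) for name, key in CANDIDATES.items()}
best = min(scores, key=scores.get)
print(scores)
print("selected:", best)
```

The Brier score is used here because it rewards calibrated probabilities; a lower score is better. Which candidate wins depends on the synthetic data, so no particular outcome is implied.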


Making machine learning interpretable is paramount in medicine. Predictions must be clinically sound, meaningful, and explainable to clinicians and, most importantly, to their patients.

We pioneered several interpretable ML methods. For example, our lab established “Stratified Linear Models” (SLIM) to identify different risk predictors for different risk strata in a population of patients with heart failure. These models are based on Trees of Predictors (ToP), an interpretable type of model envisioned by the lab, which outperforms traditional regression methods. With ToP, we were able to identify new significant predictors, such as rales and shortness of breath at rest, particularly in high-risk patients. SLIM provided more accurate mortality predictions and better identification of risk predictors across strata.

Interpretable Machine Learning Identifies Risk Predictors in Patients With Heart Failure

William Zame, Jinsung Yoon, Folkert Asselbergs, Mihaela van der Schaar

Circulation 2018


Survival prediction, e.g. before and after heart transplantation, can inform transplantation and treatment decisions based on individual predictions for patients on a transplant waitlist. Better predictions prior to heart transplantation may also increase the number of successful transplantations. A Newsweek article strikingly highlights how important it is to be able to precisely determine which patients need heart transplants most urgently, and how “scarily” accurate our methodology is.

We were able to improve individualised pre- and post-transplant survival predictions using interpretable ToPs (Trees of Predictors). Crucially, this was one of the first interpretable models ever built, which highlights our role as pioneers in the field. ToPs discover specific clusters within patient populations and the optimal predictive model for each of these clusters. Factoring in the differences between clusters within the same patient population offers a more personalised, accurate approach that can enhance decision-making for patients, clinicians, and policymakers, not only in cardiology but across medical specialties.

Personalized survival predictions via Trees of Predictors: An application to cardiac transplantation
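
The idea of finding clusters and a best predictor for each cluster can be sketched with a single tree split. The code below is a simplified illustration and not the lab's implementation: the synthetic data, the two base learners, the candidate thresholds, and every function name are assumptions of this example. For each candidate split on age, it picks the best base learner per side by validation loss and keeps the split only if it beats the unsplit model.

```python
import random
from statistics import mean


def fit_constant(rows):
    c = mean(y for _, _, y in rows)
    return lambda age, marker: c


def fit_linear(rows):
    """Least-squares line of the outcome on the marker."""
    mx = mean(m for _, m, _ in rows)
    my = mean(y for _, _, y in rows)
    sxx = sum((m - mx) ** 2 for _, m, _ in rows)
    sxy = sum((m - mx) * (y - my) for _, m, y in rows)
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda age, marker: my + slope * (marker - mx)


LEARNERS = {"constant": fit_constant, "linear": fit_linear}


def mse(model, rows):
    return mean((model(a, m) - y) ** 2 for a, m, y in rows)


def best_learner(train, valid):
    """Return (validation loss, learner name) of the best base learner."""
    fitted = ((name, fit(train)) for name, fit in LEARNERS.items())
    return min(((mse(model, valid), name) for name, model in fitted),
               key=lambda pair: pair[0])


def split(rows, threshold):
    return ([r for r in rows if r[0] < threshold],
            [r for r in rows if r[0] >= threshold])


def grow_one_split(train, valid, thresholds, min_leaf=10):
    root_loss, root_name = best_learner(train, valid)
    best = {"threshold": None, "loss": root_loss, "leaves": (root_name,)}
    for t in thresholds:
        tr_lo, tr_hi = split(train, t)
        va_lo, va_hi = split(valid, t)
        if min(len(tr_lo), len(tr_hi), len(va_lo), len(va_hi)) < min_leaf:
            continue
        lo, hi = best_learner(tr_lo, va_lo), best_learner(tr_hi, va_hi)
        # Validation loss of the split, weighted by leaf size.
        loss = (lo[0] * len(va_lo) + hi[0] * len(va_hi)) / len(valid)
        if loss < best["loss"]:
            best = {"threshold": t, "loss": loss, "leaves": (lo[1], hi[1])}
    return best


rng = random.Random(1)


def make_rows(n):
    """Synthetic (age, marker, outcome); the marker effect flips at age 60."""
    rows = []
    for _ in range(n):
        age, marker = rng.uniform(30, 80), rng.uniform(0, 4)
        signal = 2.0 * marker if age < 60 else 5.0 - marker
        rows.append((age, marker, signal + rng.gauss(0, 0.3)))
    return rows


train, valid = make_rows(300), make_rows(300)
result = grow_one_split(train, valid, thresholds=range(40, 80, 10))
print(result)
```

Because the synthetic outcome depends on the marker differently below and above age 60, the split at 60 with a separate linear model on each side gives a much lower validation loss than any single model. A full tree would apply the same step recursively to each leaf.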


Competing risks in patients with (potential) co-morbidities complicate the assignment of optimal treatments. For example, the risk of cardiac disease can determine the mortality risk of a cancer patient undergoing surgery. Adjusting for competing risks is crucial to accurately estimate the probability of the event of interest. Deep multi-task Gaussian process (DMGP) models, a technique developed by the lab, can jointly assess a patient’s risk for multiple (competing) adverse outcomes. We demonstrate the approach for CVD and cancer. Because it takes competing risks into account, this strategy outperforms state-of-the-art survival models and has higher utility in real-world settings.

Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks
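
To see why adjusting for competing risks matters, the following sketch compares two classical nonparametric estimates on a toy cohort. It does not implement the DMGP model; it uses the standard Aalen-Johansen cumulative incidence estimator and, for contrast, one minus a Kaplan-Meier curve that wrongly treats the competing event as censoring. The records, the event coding, and the function names are assumptions of this example.

```python
# Each record is (time, event): 0 = censored, 1 = CVD event, 2 = cancer death.
records = [(2, 1), (3, 2), (4, 0), (5, 2), (6, 1),
           (7, 2), (8, 0), (9, 1), (10, 2), (12, 0)]


def cumulative_incidence(records, cause, horizon):
    """Aalen-Johansen estimate of P(event of `cause` occurs by `horizon`)."""
    times = sorted({t for t, e in records if e != 0 and t <= horizon})
    surv = 1.0  # probability of no event of any cause before the current time
    cif = 0.0
    for t in times:
        at_risk = sum(1 for u, _ in records if u >= t)
        d_cause = sum(1 for u, e in records if u == t and e == cause)
        d_any = sum(1 for u, e in records if u == t and e != 0)
        cif += surv * d_cause / at_risk
        surv *= 1 - d_any / at_risk
    return cif


def naive_incidence(records, cause, horizon):
    """1 - Kaplan-Meier, treating competing events as if they were censoring."""
    times = sorted({t for t, e in records if e == cause and t <= horizon})
    surv = 1.0
    for t in times:
        at_risk = sum(1 for u, _ in records if u >= t)
        d_cause = sum(1 for u, e in records if u == t and e == cause)
        surv *= 1 - d_cause / at_risk
    return 1 - surv


aj = cumulative_incidence(records, cause=1, horizon=10)
naive = naive_incidence(records, cause=1, horizon=10)
print(f"Aalen-Johansen: {aj:.3f}  naive 1-KM: {naive:.3f}")
```

On this toy cohort the naive estimate of CVD incidence by time 10 is roughly 0.50, while the competing-risk estimate is roughly 0.37. The naive figure is inflated because patients who died of cancer are treated as if they could still go on to have a CVD event.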


For many medical disciplines, several related but different databases exist. ML models typically require large amounts of training data. Translating between such datasets, so that they can be used jointly for modelling, can improve the predictive performance of ML models.

Our approach, RadialGAN, allows related datasets to be used jointly for modelling, a breakthrough especially for settings in which high-quality data is scarce and fragmented. By resolving feature and distribution mismatch between datasets, RadialGAN opens the door to effective transfer learning. The practical utility of this approach was demonstrated using 14 different heart failure datasets for improved predictive modelling.

RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks


Generating synthetic data to augment small datasets, and thereby make them usable for ML methods, is another approach to the problem of low-data settings. We recently introduced Curated LLM (CLLM), which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime. To balance the utility of LLMs for data generation against the potential downside of noisy, irrelevant data, CLLM includes a post-generation data curation mechanism, thus offering the best of both worlds. We demonstrate its potential using multiple real-world datasets, including the Meta-Analysis Global Group in Chronic Heart Failure (MAGGIC) dataset.

Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in ultra low-data regimes
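
The generate-then-curate pattern described above can be sketched in a few lines. This is only an illustration of post-generation filtering, not the CLLM curation criterion: it substitutes a simple nearest-neighbour agreement check, and the hard-coded synthetic rows stand in for samples that a generator such as an LLM would produce. The toy features (assumed to be scaled to the range 0 to 1), the labels, and the function names are all assumptions of this example.

```python
from math import dist

# Small real dataset: ((feature_1, feature_2), label), features scaled to [0, 1].
real = [((0.20, 0.80), 0), ((0.30, 0.70), 0), ((0.25, 0.90), 0),
        ((0.80, 0.20), 1), ((0.70, 0.30), 1), ((0.90, 0.25), 1)]

# Stand-in for generated samples; the last two carry implausible labels.
synthetic = [((0.22, 0.85), 0), ((0.85, 0.20), 1),
             ((0.80, 0.30), 0), ((0.30, 0.75), 1)]


def knn_label(features, real, k=3):
    """Majority label among the k real rows closest to `features`."""
    nearest = sorted(real, key=lambda row: dist(row[0], features))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)


def curate(synthetic, real, k=3):
    """Keep only synthetic rows whose label agrees with nearby real data."""
    return [(x, y) for x, y in synthetic if knn_label(x, real, k) == y]


curated = curate(synthetic, real)
augmented = real + curated
print(f"kept {len(curated)} of {len(synthetic)} synthetic rows")
```

Here the two synthetic rows whose labels contradict their nearest real neighbours are discarded, and the remaining two are added to the training set. Any curation rule of this kind trades off keeping useful new samples against admitting noisy ones.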