van der Schaar Lab


Putting research into practice

Our purpose as a lab is to create new and powerful machine learning techniques and methods that can revolutionize healthcare. This page describes the impact of our work.

Clinical impact

Our development of cutting-edge methods and models is guided by our clinical collaborators, academic colleagues, and our partners in the private sector. Much of our work, which is frequently featured in leading medical journals, can be applied to almost any healthcare problem, but a number of projects relevant to specific diseases and settings are presented on this page.

Revolutionizing healthcare through partnership with clinicians

This section highlights our efforts to reach across borders and build a diverse but aligned community committed to the common goal of revolutionizing healthcare—including our engagement sessions, which have roughly 400 clinicians from around the world registered to participate.

Policy impact

This section demonstrates how our lab has contributed to discussions regarding policies and guidelines at the highest levels.

Impact of previous research

This section briefly introduces Mihaela van der Schaar’s research in the areas of multimedia communications, compression and processing, and real-time stream mining.


COVID-19

The van der Schaar Lab has played an active role in the academic and clinical response to the COVID-19 pandemic, including:

  • developing and implementing Cambridge Adjutorium for COVID-19, a tool that allows clinicians to predict utilization of scarce resources such as ventilators and ICU beds, and entering a partnership with the NHS for real-world use of Cambridge Adjutorium at Acute Trusts in England;
  • exploring and offering guidance regarding the potential impact of machine learning on clinical trials;
  • conducting research and statistical analysis regarding the nature of the disease, its spread, and its disproportionate impact on certain individuals and communities; and
  • creating Policy Impact Predictor (PIP), a machine learning tool developed to guide government decision-making around measures to prevent the spread of COVID-19.

Specifics regarding all of the projects mentioned above can be found on our dedicated COVID-19 page.

Acute care/ICU

In acute care/ICU settings, vitally important decisions must be made on the basis of highly compressed time series datasets containing measurements that may not accurately portray the rapid evolution of a patient’s status. Due to the lagged nature of change in biomarkers, intensivists may already find themselves “behind the game” when deterioration becomes evident.

Additionally, there is a need for sophisticated tools that can provide accurate and actionable recommendations regarding decisions such as when a patient should be intubated and extubated.

Our lab has worked on these problems for many years, and we have developed a host of powerful tools in partnership with our clinical colleagues. Some of these are showcased below.

Learning from Clinical Judgments: Semi-Markov-Modulated Marked Hawkes Processes for Risk Prognosis (ICML 2017)
Personalized Risk Scoring for Critical Care Prognosis Using Mixtures of Gaussian Processes (IEEE Transactions on Biomedical Engineering, January 2018)
A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference (Journal of Machine Learning Research, 2018)


Cancer

The term cancer embraces a wide variety of related disorders and conditions that share many similarities but also many differences. This daunting complexity becomes more apparent with every breakthrough in our quest to understand it: it ranges from the bewildering array of disease subtypes (and subtypes of subtypes) to variations in cause and presentation, to the lengthy and unpredictable pathways inflicted on patients.

While the notion of developing a single “magic bullet” to cure cancer is outdated, ongoing research advancements have at least allowed us to develop a substantial arsenal in areas such as prevention, prediction, detection, diagnosis, treatment, and care. Truly revolutionizing our ability to combat cancer, however, requires an altogether deeper understanding of its disease pathways, and this can only be achieved through the adoption of machine learning methods.

Some of our lab’s key projects relating to machine learning for cancer are introduced below, but much more information can be found on our dedicated cancer spotlight page.

Adjutorium for breast cancer

An extensive study published in Nature Machine Intelligence shows that a prognostic tool developed by the van der Schaar Lab can recommend therapies for breast cancer patients more reliably than methods that are currently considered international clinical best practice. The study makes unprecedented use of complex, high-quality cancer datasets from the U.K. and U.S. to demonstrate the accuracy of Adjutorium, a machine learning system for prognostication and treatment benefit prediction.

Machine learning to guide the use of adjuvant therapies for breast cancer (Nature Machine Intelligence, June 2021)
Adjutorium for breast cancer: web app and other online resources

Personalizing the screening process and improving diagnostic triaging

A key priority in cancer diagnosis is managing the workload of radiologists to optimize accuracy, efficiency, and costs. Our challenge here is to ensure that radiologists can devote the right amount of time to viewing scans that actually need their attention, meaning such scans must be separated out from others which can simply be read using machine learning or similar technologies.

MAMMO, a tool developed by our lab, is a framework for cooperation between radiologists and machine learning. The focus of MAMMO is to triage mammograms between machine learning systems and radiologists.
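As a rough illustration of what triage between a model and radiologists can look like, the Python sketch below routes a scan to machine reading only when a model's estimated malignancy probability falls below a cutoff, and sends everything else to a radiologist. This is not MAMMO's actual method: the `triage` function, the scan identifiers, the scores, and the 0.02 threshold are all hypothetical and chosen purely for illustration.

```python
def triage(malignancy_scores, threshold=0.02):
    """Split scans into a machine-read queue and a radiologist-read queue.

    malignancy_scores: dict mapping a scan id to a model's estimated
    probability of malignancy (hypothetical values, not real model output).
    threshold: hypothetical cutoff below which a scan is treated as
    confidently normal and left to the machine.
    """
    machine_read, radiologist_read = [], []
    for scan_id, score in malignancy_scores.items():
        if score < threshold:
            # Model is confident the scan is normal: no radiologist time needed.
            machine_read.append(scan_id)
        else:
            # Anything uncertain or suspicious goes to a human reader.
            radiologist_read.append(scan_id)
    return machine_read, radiologist_read


# Made-up scores for four scans, for illustration only.
scores = {"scan_a": 0.01, "scan_b": 0.40, "scan_c": 0.005, "scan_d": 0.93}
machine_read, radiologist_read = triage(scores)
```

In a scheme like this, the threshold controls the trade-off: lowering it sends more scans to radiologists (safer, but more workload), while raising it frees up more radiologist time at the cost of relying more heavily on the model.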

Our lab has also developed a system called ConfidentCare, which, like MAMMO, aims to improve accuracy and efficiency of resource usage within the overall diagnostic process. ConfidentCare is a clinical decision support system that identifies what type of screening modality (e.g. mammogram, ultrasound, MRI) should be used for specific individuals, given their unique characteristics such as genomic information or past screening history.

Improving Workflow Efficiency for Mammography using Machine Learning (Journal of the American College of Radiology, May 2019)
ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening (IEEE Transactions on Multimedia, October 2016)

Risk and prognosis

Survival analysis (often referred to as time-to-event analysis) refers to the study of the duration until one or more events occur. Accurate prognostication is crucial in treatment decisions made for cancer patients, but widely used models rely on prespecified variables, which limits their performance.
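To make the idea of time-to-event analysis concrete, the sketch below implements the classical Kaplan-Meier estimator, a standard textbook baseline rather than any of the models described on this page. The follow-up times and event indicators are made-up toy data, and the function name `kaplan_meier` is simply a label chosen for this example.

```python
from collections import Counter


def kaplan_meier(durations, event_observed):
    """Estimate the survival curve from possibly censored follow-up data.

    durations: follow-up time for each patient.
    event_observed: 1 if the event occurred at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    events = Counter(t for t, e in zip(durations, event_observed) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Toy data (illustrative only): follow-up in months, 1 = event, 0 = censored.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

Censored patients (event indicator 0) still count toward the number at risk until their last follow-up, which is what distinguishes this estimate from a naive fraction of survivors.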

In a paper published in The Lancet Digital Health in 2021, we introduced a research project undertaken by our lab in collaboration with clinical colleagues, in which we investigated a novel machine learning approach to develop an improved prognostic model for predicting 10-year prostate cancer-specific mortality.

Application of a novel machine learning framework for predicting non-metastatic prostate cancer-specific mortality in men using the Surveillance, Epidemiology, and End Results (SEER) database (The Lancet Digital Health, February 2021)

Cystic fibrosis

The most common genetic disease in Caucasian populations, cystic fibrosis is defined by a unique mix of complexities that make the lives of patients and the task of healthcare professionals particularly unpredictable. As a chronic condition, its progression at times appears almost random due to the potential presence of a variety of (often competing) complications. These can be hard to disentangle, and usually require targeted prevention or mitigation when identified.

Thanks to support from the UK Cystic Fibrosis Trust and its pioneering patient registry, our lab has developed a range of powerful machine learning tools for diagnosis, prognosis, phenotyping, and treatment related to cystic fibrosis.

Cystic fibrosis is fertile ground to explore machine learning methods, due in part to the creation of the UK Cystic Fibrosis Registry, an extensive database managed by the UK Cystic Fibrosis Trust that covers 99% of the UK’s cystic fibrosis population. The Registry holds both static and time-series data for each patient, including demographic information, CFTR genotype, disease-related measures including infection data, comorbidities and complications, lung function, weight, intravenous antibiotics usage, medications, transplantations and deaths.

Turning such rich datasets into medical understanding is a key priority for the future of personalized healthcare. Through our own lab’s ongoing partnership with, and support from, the UK Cystic Fibrosis Trust, we have been able to take the Registry’s data to a completely new level.

Some of our lab’s key projects relating to machine learning for cystic fibrosis are introduced below, but much more information can be found on our dedicated spotlight page.

High-level overview

For a succinct, accessible, and high-level overview of the many opportunities for machine learning to transform care for people with cystic fibrosis, please take a look at a recent article published in the Journal of Cystic Fibrosis by our lab and collaborators.

Opportunities for machine learning to transform care for people with cystic fibrosis (Journal of Cystic Fibrosis, January 2020)

Referral of patients for lung transplants

Our lab has developed individualized prediction methods for patients with cystic fibrosis on the lung transplantation waitlist. In this case, we adapted our AutoPrognosis framework, which can automate the process of constructing clinical prognostic models, and used it to establish the optimal timing for referring patients with terminal respiratory failure for lung transplantation. This work was published in Nature Scientific Reports.

Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning (Nature Scientific Reports, July 2018)

Specific projects

The following section highlights and summarizes some of our other key projects related to cystic fibrosis, including those in which we have leveraged our extensive partnership with the UK Cystic Fibrosis Trust.

All of the projects below are tied together by a common purpose: to better understand and model the trajectory of cystic fibrosis (and other diseases) using time-series datasets. The topic of time series for healthcare is something we have covered in an extensive write-up, which can be found here.

Attentive State-Space Modeling of Disease Progression (NeurIPS 2019)
Dynamic-DeepHit: a Deep Learning Approach for Dynamic Survival Analysis with Competing Risks based on Longitudinal Data (IEEE Transactions on Biomedical Engineering, 2020)
Clairvoyance: a Pipeline Toolkit for Medical Time Series (ICLR 2021)


Alzheimer’s disease

Thanks to support from Alzheimer’s Research UK, our lab has been conducting ongoing research into the application of machine learning to Alzheimer’s, a disease that is too often overlooked despite affecting roughly 1 in 14 people over the age of 65 and 1 in every 6 people over the age of 80 (according to the UK’s NHS).

Machine learning, driven by data, can offer powerful new tools in the fight against Alzheimer’s.

Some of our lab’s key projects relating to machine learning for Alzheimer’s are introduced below, but much more information can be found on our dedicated spotlight page.

Side-note: all of the projects below made use of data provided through the open-access Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which tracks disease progression for over 1,700 patients.

Forecasting Disease Trajectories in Alzheimer’s Disease Using Deep Learning (KDD Workshop on Machine Learning for Medicine and Healthcare, 2018)
Disease-Atlas: Navigating Disease Trajectories using Deep Learning (MLHC 2018)
Dynamic Prediction in Clinical Survival Analysis using Temporal Convolutional Networks (IEEE Journal of Biomedical and Health Informatics, 2019)
Temporal Phenotyping using Deep Predictive Clustering of Disease Progression (ICML 2020)

Organ transplantation

Organ transplantation is a high-stakes domain in which there is exceptional potential for real-world impact through increased efficiency, but increasing efficiency in any meaningful way would require us to navigate a highly complex set of interrelated problems.

We have now been working on organ transplantation for a number of years, and in this time have developed a portfolio of groundbreaking data-driven machine learning approaches with the support of clinical collaborators representing a range of specializations within the domain. Our projects tackle the challenges raised by transplantation in general, but also address problems specific to a variety of commonly transplanted organs, including hearts, livers, and lungs. Our work is ongoing, and we continue to develop new and improved methods.

Some of our lab’s key projects relating to organ transplantation are introduced below, but much more information can be found on our dedicated spotlight page.

Personalized survival predictions

Survival prediction before and after transplantation is an especially important problem because transplantation and treatment decisions depend on predictions of patient survival on the waitlist and survival after transplantation. Better predictions may, therefore, increase the number of successful transplantations.

In a study published in PLoS ONE in 2018, our lab worked with clinical and academic collaborators from the University of California, Los Angeles (UCLA), University of California, Davis (UC Davis), and University College London (UCL) to develop a methodology for personalized prediction of survival for patients with advanced heart failure, both while on the waitlist and after heart transplantation. The method we developed can capture the heterogeneity of populations by creating clusters of patients and providing specific predictive models for each cluster. It addresses the interaction of multiple features and, importantly, takes into account the difference between long-term survival and short-term survival.

In addition to being published in PLoS One (details below), this work was featured in Newsweek.

Personalized survival predictions via Trees of Predictors: An application to cardiac transplantation (PLoS ONE, March 2018)

Additionally, as introduced in this page’s section on cystic fibrosis, our lab has also developed individualized prediction methods for patients with cystic fibrosis on the lung transplantation waitlist. To navigate back to that section, click here.

Personalized donor-recipient matching

Even though organ transplantation can increase life expectancy and quality of life for the recipient, the operation can entail various complications, including infection, acute and chronic rejection, and malignancy. This is a complicated risk assessment problem, since postoperative patient survival depends on different types of risk factors: recipient-related factors (e.g., cardiovascular disease severity of heart recipients), recipient-donor matching factors (e.g., weight ratio and human leukocyte antigen), race, and donor-related factors (e.g., diabetes).

Through impactful collaborations with clinicians, our lab has developed a range of methods and models that deal with the many complexities inherent in the problem of recipient-donor matching.

Personalized Donor-Recipient Matching for Organ Transplantation (AAAI 2017)
Learning Matching Representations for Individualized Organ Transplantation Allocation (AISTATS 2021)
OrganITE: Optimal transplant donor organ offering using an individual treatment effect (NeurIPS 2020)
Learning Queueing Policies for Organ Transplantation Allocation using Interpretable Counterfactual Survival Analysis (ICML 2021)

Cardiovascular disease

Our lab has spent many years working alongside clinicians to research and develop new cutting-edge models and methods to transform how we diagnose and treat heart and circulatory conditions. Much of this work has been made possible through support from the British Heart Foundation and The Alan Turing Institute.

A few of our key projects in this area are listed below, but an extensive selection of papers can be found on our lab’s publications page.

Cardiovascular disease risk prediction

Identifying people at risk of cardiovascular diseases is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors, and yield sub-optimal performance across all patient groups. Data-driven techniques based on machine learning might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them.

In a study published in PLoS One, conducted in collaboration with a group of clinicians from the University of Cambridge, we used UK Biobank data to determine whether machine learning techniques could improve risk prediction compared to traditional approaches, and whether considering non-traditional variables could increase the accuracy of risk predictions.

Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants (PLoS One, May 2019)

Survival and mortality prediction studies

Survival and mortality prediction are crucial in many areas of medical practice, including cardiovascular disease, but existing clinical risk-scoring methods often yield suboptimal results.

Guided by diverse groups of clinical collaborators, our lab has created a range of powerful tools that can accurately and informatively predict survival and mortality in cardiovascular disease patients based on complex relationships learned from healthcare datasets.

One such tool (published in PLoS One and featured in Newsweek) has already been introduced in the organ transplantation section on this page (to navigate to the relevant paper, click here), but a couple of other projects of particular note are provided below.

Machine Learning Techniques for Risk Stratification of Non-ST-Elevation Acute Coronary Syndrome: The Role of Diabetes and Age (Circulation, June 2018)
Sex Differences in Outcomes After STEMI: Effect Modification by Treatment Strategy and Age (JAMA Internal Medicine, May 2018)

Partnering with clinicians to revolutionize healthcare

Every clinician has heard that AI will have (or is having) a transformative impact on healthcare. The area is so new and fast-moving, however, that almost no clinicians will have been taught about it in medical school, while coverage in medical journals is often very cursory.

Unlocking the potential of AI and machine learning for healthcare must be a truly interdisciplinary undertaking. To best meet the needs of all healthcare stakeholders, it is imperative that clinical professionals and members of the AI/machine learning community find a strong basis for mutual understanding and collaboration.

To help fill this gap, our lab created Revolutionizing Healthcare, a regular online engagement series for clinicians, in 2020. We now have roughly 400 clinicians from around the world registered to participate in these sessions.

Aims of the Revolutionizing Healthcare series

The aim of Revolutionizing Healthcare is to introduce members of the clinical community to foundational concepts related to AI, machine learning, data science, and operations research, while showing how these can play a valuable role in transforming healthcare.

In our sessions, we demonstrate specifically how real-world challenges facing clinicians can be mapped, through the use of rigorous academic formalisms, to solutions based on AI and machine learning.

The sessions also explore the complex interdisciplinary nature of this kind of problem-solving, and our ultimate goal is, through engagement, to jointly shape a framework for understanding and planning the integration of AI/machine learning and healthcare.

Exploring machine learning for healthcare—together

Our sessions are tailor-made for the clinical community. Little or no quantitative background is required in order to participate in and benefit from Revolutionizing Healthcare. As mentioned above, practicing clinicians are the primary audience, but we are also happy to be joined by clinicians in training, support staff, hospital administrators, and a wide variety of healthcare professionals.

In each session we examine potential applications of AI/machine learning through the lens of a particular medical domain (e.g., acute care, cancer, organ transplantation, interpretability).

To combine the provision of instructional content with free-ranging discussion and exploration, we start each session with short presentations by our lab members, and then hold a live roundtable with a panel of clinicians, usually including an open Q&A session with participants.

We would encourage any clinicians hoping to learn more about machine learning for healthcare—or share their opinions and discuss new ideas—to sign up for our Revolutionizing Healthcare sessions via the URL below.

Policy impact

While the bulk of our research involves developing machine learning methods and models tailored to real-world healthcare problems, our ultimate goal is a full-fledged transformation that will create an entire ecosystem encompassing everything from (inter)national healthcare networks all the way down to individual practitioners and patients. To that end, our lab has contributed to a number of discussions regarding policies and guidelines at the highest levels.

As part of the 2019 NHS Topol Review, Mihaela van der Schaar co-chaired the Expert Advisory Panel on Artificial Intelligence and Robotics.

Additionally, Mihaela contributed a chapter to the U.K.’s 2018 Annual Report of the Chief Medical Officer, discussing how machine learning can transform medicine and healthcare.

Artificial intelligence and robotics (NHS Topol Review, 2019)
Machine learning for individualised medicine (Annual Report of Chief Medical Officer, Department of Health and Social Care, United Kingdom, 2018)

Impact of previous research

While our lab’s focus is now firmly on machine learning, AI, and operations research for healthcare and medicine, Mihaela van der Schaar’s previous research achieved substantial impact in the areas of multimedia communications, compression and processing, and real-time stream mining.

While working at Philips from 1996 to 2003 (and simultaneously completing her Ph.D.), Mihaela developed both the theoretical foundations and the first practical algorithm for streaming video. Her contributions are embedded in commercial products (including the award-winning Philips webcam) and she is personally credited as inventor on 35 U.S. patents (the majority of which are listed here), many of which are still frequently cited in other patents and adopted in standards. Between 1999 and 2003, she was Philips’ representative to the International Standards Organization (ISO) that developed and wrote the MPEG-4 (Moving Picture Experts Group) standards for streaming video, in which she led several working groups, and to which she contributed more than 40 papers. For these contributions, she received 3 ISO awards.

Mihaela has also developed new methods for detecting, characterizing, and forecasting complex events (road traffic collisions, popularity of videos in social networks, energy supply and demand in smart grids, etc.) based on a novel machine learning and real-time stream mining paradigm. These methods have been implemented as part of the IBM InfoSphere Platform for a Smarter Planet.

For more information, please see Mihaela’s personal page.