van der Schaar Lab

Spotlight on cystic fibrosis research projects

Thanks to support from the UK Cystic Fibrosis Trust and its pioneering patient registry, our lab has developed a range of powerful machine learning tools for diagnosis, prognosis, phenotyping, and treatment related to cystic fibrosis.

The most common genetic disease in Caucasian populations, cystic fibrosis is defined by a unique mix of complexities that make the lives of its patients and the task of healthcare professionals particularly unpredictable. As a chronic condition, its progression at times appears almost random due to the potential presence of a variety of (often competing) complications. These can be hard to disentangle, and usually require targeted prevention or mitigation when identified.

While significant progress has been made in understanding this disease and improving the lives of sufferers in recent decades, there is much yet to be done: for example, only about half of those born in the UK with cystic fibrosis (as of 2019) are likely to live to the age of 50. Clinical insights gained through machine learning could reduce the burden of this disease and increase longevity through increasingly personalized treatment and intervention choices, accurate clinical predictions, and accelerated medical discovery.

Cystic fibrosis is fertile ground for exploring machine learning methods, due in part to the creation of the UK Cystic Fibrosis Registry, an extensive database managed by the UK Cystic Fibrosis Trust that covers 99% of the UK’s cystic fibrosis population. The Registry holds both static and time-series data for each patient, including demographic information, CFTR genotype, disease-related measures such as infection data, comorbidities and complications, lung function, weight, intravenous antibiotics usage, medications, transplantations and deaths.

Turning such rich datasets into medical understanding is a key priority for the future of personalized healthcare. Through our own lab’s ongoing partnership with, and support from, the UK Cystic Fibrosis Trust, we have been able to take the Registry’s data to a completely new level.

This post will highlight and summarize some of our key projects related to cystic fibrosis, including (but not limited to) those in which we have leveraged our extensive partnership with the UK Cystic Fibrosis Trust. Each project targets a number of clinical problem types related to cystic fibrosis; these are detailed below.

Risk assessment and diagnosis

Whether diagnosing cystic fibrosis in the first instance or determining the likelihood of any number of potential risks facing patients, common statistical risk evaluation methods are unable to fully integrate the wealth of information available about each individual. By contrast, machine learning methods are able to handle many more features (offering significant informational gains) and can make better use of that information by capturing the potentially complex interactions between features (resulting in modeling gains). This can result in more accurate predictions, and hence better treatment guidance, for the patient at hand.


Cystic fibrosis evolves slowly, allowing for development of comorbidities and bacterial infections, and creating distinct responses to therapeutic interventions. This results in great heterogeneity in terms of potential disease pathways and potential interactions between different comorbidities, often resulting in very diverse patient outcomes, even in narrow patient subgroups. Machine learning techniques for patient phenotyping (supported by sufficient data) can help anticipate patients’ prognoses by identifying “similar” patients, and designing treatment guidelines that are tailored to homogeneous patient subgroups.

Forecasting disease trajectories

Due to the wide availability of modern electronic health records, patient care data is now often stored in the form of time-series data. This is particularly relevant to cystic fibrosis, given the slow evolution of the disease (for example, annual follow-ups over multi-year horizons are commonplace). Since biomarkers and other risk factors of cystic fibrosis patients are measured repeatedly over time, prognostic tools powered by machine learning can process the longitudinal trajectory of these biomarkers and help clinical decision-makers better understand the disease and predict multiple events or outcomes over time.
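
To make the idea of a longitudinal trajectory concrete, here is a minimal sketch that fits a straight-line trend to one patient's annual measurements by ordinary least squares and extrapolates it one year ahead. Everything in it is an assumption made for illustration: the biomarker (a lung function score expressed as FEV1 % predicted), the values, and the function name are invented, and no Registry data is used. It is a deliberately simple baseline, not one of the models described in this post.

```python
from typing import Sequence, Tuple


def fit_linear_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values = intercept + slope * time."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all observation times are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept, slope


# Hypothetical annual follow-ups: years since first visit and FEV1 % predicted.
years = [0, 1, 2, 3, 4]
fev1 = [82.0, 80.0, 78.0, 76.0, 74.0]

intercept, slope = fit_linear_trend(years, fev1)
forecast_year_5 = intercept + slope * 5
print(f"slope per year: {slope:.2f}, forecast at year 5: {forecast_year_5:.1f}")
```

A single straight line ignores other biomarkers, irregular visit times, and changes in the rate of decline, which is part of the motivation for the richer longitudinal models discussed here.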

Competing risks and comorbidities

Cystic fibrosis patients suffer from, or are at risk of, multiple diseases or conditions; these risks increase as the patient ages. Machine learning methods can help monitor and treat such patients by predicting which diseases or conditions are likely to occur and at what point, and how the risks for various diseases or conditions change over time. By comparison with commonly used statistical models, machine learning is extremely well-suited to analyses involving multiple competing risks where more than one type of event plays a role in the survival setting.
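
For readers unfamiliar with the term, the sketch below shows what a competing-risks quantity looks like in its simplest form: a nonparametric cumulative incidence estimate (the Aalen-Johansen form) for each event type. This is a classical statistical estimator shown only to illustrate the setting, not one of the machine learning methods described in this post. The function name, the toy follow-up times, and the cause labels are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cumulative_incidence(
    times: Sequence[float], causes: Sequence[int]
) -> Dict[int, List[Tuple[float, float]]]:
    """Nonparametric cumulative incidence per cause (Aalen-Johansen form).

    causes[i] == 0 means censored; any other integer labels an event type.
    Returns, for each cause, a list of (time, cumulative incidence) steps.
    """
    if len(times) != len(causes):
        raise ValueError("times and causes must have the same length")
    event_types = sorted({c for c in causes if c != 0})
    # Count events of each cause at each distinct event time.
    events_at: Dict[float, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for t, c in zip(times, causes):
        if c != 0:
            events_at[t][c] += 1

    survival = 1.0  # probability of being free of every event just before t
    running = {k: 0.0 for k in event_types}
    curves: Dict[int, List[Tuple[float, float]]] = {k: [] for k in event_types}
    for t in sorted(events_at):
        at_risk = sum(1 for u in times if u >= t)
        total_events = sum(events_at[t].values())
        for k in event_types:
            # Chance of reaching t event-free, then having a cause-k event at t.
            running[k] += survival * events_at[t][k] / at_risk
            curves[k].append((t, running[k]))
        survival *= 1.0 - total_events / at_risk
    return curves


# Made-up follow-up times (years); hypothetical labels: 1 = death, 2 = lung transplant.
times = [1, 2, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 0, 2]
curves = cumulative_incidence(times, causes)
for cause, steps in curves.items():
    print(cause, [(t, round(p, 3)) for t, p in steps])
```

The key design point is that each cause's increment is weighted by the probability of having experienced no event of any type so far. Treating the other event types as ordinary censoring in a Kaplan-Meier estimate would instead overstate each cause's incidence.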

Personalized monitoring and early warning systems

Cystic fibrosis patients must currently make routine clinical visits even when well, which is inefficient and can adversely impact their lives. Enabled in part by remote monitoring, machine learning can transform this model of care by supporting the provision of comprehensive and high-quality care. Based on integration of all data relevant to an individual, machine learning-enabled systems can offer assessment of (and feedback regarding) patient progress, predictions regarding likely health development or changes, and alerts related to the need for further action or consultation.


A major challenge across the domain of healthcare is ascertaining whether a given intervention influences or determines an outcome. For cystic fibrosis patients, such decisions may commonly involve determining whether there is a survival benefit to prescribing a certain medication, or waitlisting a patient for a lung transplant. In addition to providing accurate predictions and granular risk scores that can quantify the severity of future outcomes, machine learning tools can be used for treatment planning, individualized treatment effect inference, follow-up scheduling, or estimating the time at which a transplant would be needed in the future.
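
As a rough illustration of what individualized treatment effect inference means, the sketch below uses a two-model approach (often called a T-learner): one outcome model is fitted on treated patients, another on untreated patients, and the estimated effect for a patient group is the difference between the two predictions. Here each "model" is simply the mean outcome within a covariate stratum. The records, stratum labels, and function names are invented for the example, and the estimate is only meaningful under the strong assumption that there is no unmeasured confounding. This is not presented as the lab's method.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[Hashable, int, float]  # (covariate stratum, treated 0/1, outcome)


def fit_group_means(records: Iterable[Record], arm: int) -> Dict[Hashable, float]:
    """Outcome 'model' for one arm: the mean outcome within each stratum."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for stratum, treated, outcome in records:
        if treated == arm:
            totals[stratum] += outcome
            counts[stratum] += 1
    return {s: totals[s] / counts[s] for s in counts}


def estimate_effects(records: Iterable[Record]) -> Dict[Hashable, float]:
    """Per-stratum effect: treated-arm prediction minus control-arm prediction."""
    records = list(records)
    treated_model = fit_group_means(records, arm=1)
    control_model = fit_group_means(records, arm=0)
    # Only strata observed in both arms have a defined estimate.
    shared = treated_model.keys() & control_model.keys()
    return {s: treated_model[s] - control_model[s] for s in shared}


# Made-up records: stratum label, whether treated, change in a lung function score.
toy = [
    ("younger", 1, 4.0), ("younger", 1, 6.0), ("younger", 0, 1.0), ("younger", 0, 3.0),
    ("older", 1, 1.0), ("older", 1, 3.0), ("older", 0, 0.0), ("older", 0, 2.0),
]
effects = estimate_effects(toy)
print(effects)
```

In this toy data the estimated benefit differs between the two strata, which is the kind of heterogeneity that individualized estimates are meant to surface. In practice the stratum means would be replaced by flexible models over many covariates.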

Scientific discovery

Cystic fibrosis is a complex disease that is not yet close to being fully understood. The application of machine learning models can yield new insights into the nature of cystic fibrosis: for example, integrating many features and capturing complex patterns can teach us about the clinical significance of specific features that were not previously believed to be important.

The figure above is a conceptual rendering outlining the process of developing, validating, and deploying tailored machine learning tools that support bespoke medicine and scientific discovery in healthcare.

For a succinct, accessible, and high-level overview of the many opportunities for machine learning to transform care for people with cystic fibrosis, please take a look at a recent article published in the Journal of Cystic Fibrosis by our lab and collaborators.

Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning

Ahmed Alaa, Mihaela van der Schaar
Published in Nature Scientific Reports, 2018


Dynamic-DeepHit: a Deep Learning Approach for Dynamic Survival Analysis with Competing Risks based on Longitudinal Data

Changhee Lee, Jinsung Yoon, Mihaela van der Schaar
Published in IEEE Transactions on Biomedical Engineering, 2020


Attentive State-Space Modeling of Disease Progression

Ahmed Alaa, Mihaela van der Schaar
NeurIPS 2019


Disease-Atlas: Navigating Disease Trajectories with Deep Learning

Bryan Lim, Mihaela van der Schaar
MLHC 2018


Temporal Phenotyping using Deep Predictive Clustering of Disease Progression

Changhee Lee, Mihaela van der Schaar
ICML 2020


Application of Kernel Hypothesis Testing on Set-valued Data

Alexis Bellot, Mihaela van der Schaar


Clairvoyance: a Pipeline Toolkit for Medical Time Series

Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, Mihaela van der Schaar
ICLR 2021


The substantial body of work presented above would not have been possible without the generous support of the UK Cystic Fibrosis Trust, or without their pioneering work and vision in creating the UK Cystic Fibrosis Registry.

If you are a clinician and would like to learn more about how machine learning can be applied to real-world healthcare problems, please sign up for our Revolutionizing Healthcare online engagement sessions (no machine learning knowledge required).

For a full list of the van der Schaar Lab’s publications, click here.

Mihaela van der Schaar

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London.

Mihaela has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award.

In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected as a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI.

Mihaela’s research focus is on machine learning, AI and operations research for healthcare and medicine.

Nick Maxfield

From 2020 to 2022, Nick oversaw the van der Schaar Lab’s communications, including media relations, content creation, and maintenance of the lab’s online presence.