van der Schaar Lab

Impact

Putting research into practice

Our purpose as a lab is to create new and powerful machine learning techniques and methods that can revolutionize healthcare. This page describes the impact of our work.

Clinical impact

Our development of cutting-edge methods and models is guided by our clinical collaborators, academic colleagues, and our partners in the private sector. Much of our work, which is frequently featured in leading medical journals, can be applied to almost any healthcare problem, but a number of projects relevant to specific diseases and settings are presented on this page.

Revolutionizing healthcare through partnership with clinicians

This section highlights our efforts to reach across borders and build a diverse but aligned community committed to the common goal of revolutionizing healthcare—including our engagement sessions, which have roughly 400 clinicians from around the world registered to participate.

Policy impact

This section demonstrates how our lab has contributed to discussions regarding policies and guidelines at the highest levels.

Impact of previous research

This section briefly introduces Mihaela van der Schaar’s research in the areas of multimedia communications, compression and processing, and real-time stream mining.

COVID-19

The van der Schaar Lab has played an active role in the academic and clinical response to the COVID-19 pandemic, including:

  • developing and implementing Cambridge Adjutorium for COVID-19, a tool that allows clinicians to predict utilization of scarce resources such as ventilators and ICU beds, and entering a partnership with the NHS for real-world use of Cambridge Adjutorium at Acute Trusts in England;
  • exploring and offering guidance regarding the potential impact of machine learning on clinical trials;
  • conducting research and statistical analysis regarding the nature of the disease, its spread, and its disproportionate impact on certain individuals and communities; and
  • creating the Policy Impact Predictor (PIP), a machine learning tool developed to guide government decision-making around measures to prevent the spread of COVID-19.

Specifics regarding all of the projects mentioned above can be found on our dedicated COVID-19 page.

Acute care/ICU

In the acute care/ICU setting, vitally important decisions must be made on the basis of highly compressed time series datasets containing measurements that may not accurately portray the rapid evolution of a patient’s status. Due to the lagged nature of change in biomarkers, intensivists may already find themselves “behind the game” by the time deterioration becomes evident.

Additionally, there is a need for sophisticated tools that can provide accurate and actionable recommendations regarding decisions such as when a patient should be intubated and extubated.

Our lab has worked on these problems for many years, and we have developed a host of powerful tools in partnership with our clinical colleagues. Some of these are showcased below.

Authors:
Ahmed M. Alaa, Scott Hu, Mihaela van der Schaar

Abstract:
Critically ill patients in regular wards are vulnerable to unanticipated adverse events which require prompt transfer to the intensive care unit (ICU).

To allow for accurate prognosis of deteriorating patients, we develop a novel continuous-time probabilistic model for a monitored patient’s temporal sequence of physiological data. Our model captures “informatively sampled” patient episodes: the clinicians’ decisions on when to observe a hospitalized patient’s vital signs and lab tests over time are represented by a marked Hawkes process, with intensity parameters that are modulated by the patient’s latent clinical states, and with observable physiological data (mark process) modeled as a switching multi-task Gaussian process. In addition, our model captures “informatively censored” patient episodes by representing the patient’s latent clinical states as an absorbing semi-Markov jump process. The model parameters are learned from offline patient episodes in the electronic health records via an EM-based algorithm.

Experiments conducted on a cohort of patients admitted to a major medical center over a 3-year period show that risk prognosis based on our model significantly outperforms the currently deployed medical risk scores and other baseline machine learning algorithms.
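
For readers less familiar with the modelling vocabulary above, the short sketch below illustrates a state-modulated Hawkes intensity of the kind the abstract describes: the rate at which clinicians sample vital signs rises after recent observations and depends on the patient’s latent clinical state. The state names, parameter values, and exponential kernel are our own illustrative assumptions, not the parameters learned in the paper.

```python
import numpy as np

# Illustrative parameters: per-state baseline intensity (mu), excitation strength (alpha),
# and a shared exponential decay rate (beta). These values are made up for the sketch.
mu = {"stable": 0.2, "deteriorating": 0.8}      # baseline sampling rate per hour
alpha = {"stable": 0.1, "deteriorating": 0.5}   # self-excitation strength
beta = 1.0                                       # decay of excitation over time

def hawkes_intensity(t, event_times, state):
    """Intensity of vital-sign sampling at time t, modulated by the latent state:
    lambda(t) = mu_z + alpha_z * sum_{t_i < t} exp(-beta * (t - t_i))."""
    past = np.asarray([ti for ti in event_times if ti < t])
    excitation = np.exp(-beta * (t - past)).sum() if past.size else 0.0
    return mu[state] + alpha[state] * excitation

# Example: sampling rate one hour after three earlier observations.
print(hawkes_intensity(4.0, [1.0, 2.5, 3.5], "deteriorating"))
```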

Authors:
Ahmed M. Alaa, Jinsung Yoon, Scott Hu, Mihaela van der Schaar

Abstract:

Objective:
In this paper, we develop a personalized real-time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs; the proposed risk scoring system ensures timely intensive care unit admissions for clinically deteriorating patients.

Methods:
The risk scoring system is based on the idea of sequential hypothesis testing under an uncertain time horizon. The system learns a set of latent patient subtypes from the offline electronic health record data, and trains a mixture of Gaussian Process experts, where each expert models the physiological data streams associated with a specific patient subtype. Transfer learning techniques are used to learn the relationship between a patient’s latent subtype and her static admission information (e.g., age, gender, transfer status, ICD-9 codes, etc.).

Results:
Experiments conducted on data from a heterogeneous cohort of 6321 patients admitted to Ronald Reagan UCLA medical center show that our score significantly outperforms the currently deployed risk scores, such as the Rothman index, MEWS, APACHE, and SOFA scores, in terms of timeliness, true positive rate, and positive predictive value.

Conclusion:
Our results reflect the importance of adopting the concepts of personalized medicine in critical care settings; significant accuracy and timeliness gains can be achieved by accounting for the patients’ heterogeneity.

Significance:
The proposed risk scoring methodology can confer huge clinical and social benefits on a massive number of critically ill inpatients who exhibit adverse outcomes including, but not limited to, cardiac arrests, respiratory arrests, and septic shocks.
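
As a rough intuition for the mixture-of-experts idea in the Methods above, the sketch below clusters patients into subtypes from static admission data and fits one Gaussian process per subtype; a simple deviation score then flags observations that stray from the subtype’s expected trajectory. This is a toy simplification on simulated data, not the paper’s risk score (which is built on sequential hypothesis testing and transfer learning).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_patients = 60

# Toy data: static admission features and irregularly sampled vital-sign streams.
static = rng.normal(size=(n_patients, 5))
times = [np.sort(rng.uniform(0, 48, 15)) for _ in range(n_patients)]
vitals = [np.sin(t / 8) + rng.normal(scale=0.1, size=t.size) for t in times]

# 1) Latent patient subtypes from static admission information (a stand-in for the
#    paper's transfer-learning step linking subtypes to admission data).
subtypes = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(static)

# 2) One Gaussian-process "expert" per subtype, fitted to that subtype's streams.
experts = {}
for k in range(3):
    idx = np.where(subtypes == k)[0]
    X = np.concatenate([times[i] for i in idx]).reshape(-1, 1)
    y = np.concatenate([vitals[i] for i in idx])
    experts[k] = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

# 3) A crude deterioration proxy: how many predictive standard deviations the newest
#    observation sits from the subtype expert's expected trajectory.
def deviation_score(subtype, t_new, v_new):
    mean, std = experts[subtype].predict(np.array([[t_new]]), return_std=True)
    return abs(v_new - mean[0]) / std[0]

print(deviation_score(subtypes[0], 24.0, 1.5))
```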

Authors:
Ahmed M. Alaa, Mihaela van der Schaar

Abstract:
Modeling continuous-time physiological processes that manifest a patient’s evolving clinical states is a key step in approaching many problems in healthcare.

In this paper, we develop the Hidden Absorbing Semi-Markov Model (HASMM): a versatile probabilistic model that is capable of capturing the modern electronic health record (EHR) data. Unlike existing models, the HASMM accommodates irregularly sampled, temporally correlated, and informatively censored physiological data, and can describe non-stationary clinical state transitions. Learning the HASMM parameters from the EHR data is achieved via a novel forward-filtering backward-sampling Monte-Carlo EM algorithm that exploits the knowledge of the end-point clinical outcomes (informative censoring) in the EHR data, and implements the E-step by sequentially sampling the patients’ clinical states in the reverse-time direction while conditioning on the future states. Real-time inferences are drawn via a forward-filtering algorithm that operates on a virtually constructed discrete-time embedded Markov chain that mirrors the patient’s continuous-time state trajectory.

We demonstrate the prognostic utility of the HASMM in a critical care prognosis setting using a real-world dataset for patients admitted to the Ronald Reagan UCLA Medical Center. In particular, we show that using HASMMs, a patient’s clinical deterioration can be predicted 8-9 hours prior to intensive care unit admission, with a 22% AUC gain compared to the Rothman index, which is the state-of-the-art critical care risk scoring technology.

Cancer

The term cancer embraces a wide variety of related disorders/conditions that share many similarities but also many differences. This daunting complexity becomes more apparent with every breakthrough in our quest to understand it: from the bewildering array of disease subtypes (and subtypes of subtypes), to variations in cause and presentation, to the lengthy and unpredictable disease pathways that patients endure.

While the notion of developing a single “magic bullet” to cure cancer is outdated, ongoing research advancements have at least allowed us to develop a substantial arsenal in areas such as prevention, prediction, detection, diagnosis, treatment, and care. Truly revolutionizing our ability to combat cancer, however, requires an altogether deeper understanding of its disease pathways, and this can only be achieved through the adoption of machine learning methods.

Some of our lab’s key projects relating to machine learning for cancer are introduced below, but much more information can be found on our dedicated cancer spotlight page.

Adjutorium for breast cancer

An extensive study published in Nature Machine Intelligence shows that a prognostic tool developed by the van der Schaar Lab can recommend therapies for breast cancer patients more reliably than methods that are currently considered international clinical best practice. The study makes unprecedented use of complex, high-quality cancer datasets from the U.K. and U.S. to demonstrate the accuracy of Adjutorium, a machine learning system for prognostication and treatment benefit prediction.

Authors:
Ahmed M. Alaa, Deepti Gurdasani, Adrian L. Harris, Jem Rashbass, Mihaela van der Schaar

Abstract:
Accurate prediction of the individualized survival benefit of adjuvant therapy is key to making informed therapeutic decisions for patients with early invasive breast cancer. Machine learning technologies can enable accurate prognostication of patient outcomes under different treatment options by modelling complex interactions between risk factors in a data-driven fashion.

Here, we use an automated and interpretable machine learning algorithm to develop a breast cancer prognostication and treatment benefit prediction model—Adjutorium—using data from large-scale cohorts of nearly one million women captured in the national cancer registries of the United Kingdom and the United States.

We trained and internally validated the Adjutorium model on 395,862 patients from the UK National Cancer Registration and Analysis Service (NCRAS), and then externally validated the model among 571,635 patients from the US Surveillance, Epidemiology, and End Results (SEER) programme.

Adjutorium exhibited significantly improved accuracy compared to the major prognostic tool in current clinical use (PREDICT v2.1) in both internal and external validation. Importantly, our model substantially improved accuracy in specific subgroups known to be under-served by existing models.

Visit our dedicated Adjutorium page, try Adjutorium live via our web app, or explore the source code for AutoPrognosis, the AutoML model behind Adjutorium.

Personalizing the screening process and improving diagnostic triaging

A key priority in cancer diagnosis is managing the workload of radiologists to optimize accuracy, efficiency, and costs. Our challenge here is to ensure that radiologists can devote the right amount of time to viewing scans that actually need their attention, meaning such scans must be separated out from others which can simply be read using machine learning or similar technologies.

MAMMO, a tool developed by our lab, is a framework for cooperation between radiologists and machine learning. The focus of MAMMO is to triage mammograms between machine learning systems and radiologists.

Our lab has also developed a system called ConfidentCare, which, like MAMMO, aims to improve accuracy and efficiency of resource usage within the overall diagnostic process. ConfidentCare is a clinical decision support system that identifies what type of screening modality (e.g. mammogram, ultrasound, MRI) should be used for specific individuals, given their unique characteristics such as genomic information or past screening history.

Authors:
Trent Kyono, Fiona J Gilbert, Mihaela van der Schaar

Abstract:

Objective: 
The aim of this study was to determine whether machine learning could reduce the number of mammograms the radiologist must read by using a machine-learning classifier to correctly identify normal mammograms and to select the uncertain and abnormal examinations for radiological interpretation.

Methods: 
Mammograms in a research data set from over 7,000 women who were recalled for assessment at six UK National Health Service Breast Screening Program centers were used. A convolutional neural network in conjunction with multitask learning was used to extract imaging features from mammograms that mimic the radiological assessment provided by a radiologist, the patient’s nonimaging features, and pathology outcomes. A deep neural network was then used to concatenate and fuse multiple mammogram views to predict both a diagnosis and a recommendation of whether or not additional radiological assessment was needed.

Results: 
Ten-fold cross-validation was used on 2,000 randomly selected patients from the data set; the remainder of the data set was used for convolutional neural network training. While maintaining an acceptable negative predictive value of 0.99, the proposed model was able to identify 34% (95% confidence interval: 25%-43%) and 91% (95% confidence interval: 88%-94%) of the negative mammograms for test sets with a cancer prevalence of 15% and 1%, respectively.

Conclusion: 
Machine learning was leveraged to successfully reduce the number of normal mammograms that radiologists need to read without degrading diagnostic accuracy.
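
The triage idea at the heart of MAMMO can be made concrete with a few lines of code: choose the largest classifier-score threshold below which mammograms are auto-reported as normal while a validation set still shows a negative predictive value of at least 0.99, and send everything above that threshold to the radiologist. The data, scores, and threshold rule below are illustrative assumptions, not the published model.

```python
import numpy as np

def triage_threshold(p_cancer, y_true, min_npv=0.99):
    """Pick the largest score threshold below which exams are auto-reported as normal
    while keeping NPV >= min_npv on a validation set.

    p_cancer : predicted probability of malignancy per exam
    y_true   : 1 = cancer, 0 = normal
    """
    best = None
    for thr in np.unique(p_cancer):
        auto = p_cancer <= thr                     # exams the model would sign off
        if auto.sum() == 0:
            continue
        npv = (y_true[auto] == 0).mean()           # fraction of auto-reads that are truly normal
        if npv >= min_npv:
            best = thr
    return best

# Toy example with simulated validation scores.
rng = np.random.default_rng(1)
y = rng.binomial(1, 0.15, 2000)                    # 15% cancer prevalence, as in one test set
p = np.clip(rng.normal(0.2 + 0.5 * y, 0.15), 0, 1) # imperfect classifier scores
thr = triage_threshold(p, y)
auto = p <= thr
print(f"threshold={thr:.3f}, workload removed={auto.mean():.1%}")
```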

Authors:
Ahmed M. Alaa, Kyeong H. Moon, William Hsu, Mihaela van der Schaar

Abstract:

Breast cancer screening policies attempt to achieve timely diagnosis by regularly screening healthy women via various imaging tests. Various clinical decisions are needed to manage the screening process: selecting initial screening tests, interpreting test results, and deciding if further diagnostic tests are required.

Current screening policies are guided by clinical practice guidelines (CPGs), which represent a “one-size-fits-all” approach, designed to work well (on average) for a population, and can only offer coarse expert-based patient stratification that is not rigorously validated through data. Since the risks and benefits of screening tests are functions of each patient’s features, personalized screening policies tailored to the features of individuals are desirable.

To address this issue, we developed ConfidentCare: a computer-aided clinical decision support system that learns a personalized screening policy from electronic health record (EHR) data. By a “personalized screening policy,” we mean a clustering of women’s features, and a set of customized screening guidelines for each cluster. ConfidentCare operates by computing clusters of patients with similar features, then learning the “best” screening procedure for each cluster using a supervised learning algorithm. The algorithm ensures that the learned screening policy satisfies a predefined accuracy requirement with a high level of confidence for every cluster.

By applying ConfidentCare to real-world data, we show that it outperforms the current CPGs in terms of cost efficiency and false positive rates: a reduction of 31% in the false positive rate can be achieved.
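
A minimal sketch of the cluster-then-learn pattern that ConfidentCare is built on is shown below: patients are grouped by feature similarity, a recommendation classifier is fitted per cluster, and a simple Hoeffding bound is used as a stand-in for the paper’s confidence guarantee. All data, thresholds, and the choice of classifier are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 8))                        # toy patient features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1500) > 0).astype(int)

# 1) Cluster patients into feature-similar groups.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# 2) For each cluster, fit a screening-recommendation classifier and check that a
#    Hoeffding lower bound on its held-out accuracy clears a preset requirement.
def accuracy_lower_bound(acc, n, delta=0.05):
    return acc - np.sqrt(np.log(1 / delta) / (2 * n))

for k in range(4):
    Xk, yk = X[clusters == k], y[clusters == k]
    Xtr, Xte, ytr, yte = train_test_split(Xk, yk, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
    acc = clf.score(Xte, yte)
    lb = accuracy_lower_bound(acc, len(yte))
    print(f"cluster {k}: accuracy={acc:.2f}, lower bound={lb:.2f}, "
          f"{'meets' if lb >= 0.70 else 'fails'} the 0.70 requirement")
```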

Risk and prognosis

Survival analysis (often referred to as time-to-event analysis) refers to the study of the duration until one or more events occur. Accurate prognostication is crucial in treatment decisions made for cancer patients, but widely-used models rely on prespecified variables, which limits their performance.

In a paper published in The Lancet Digital Health in 2021, we introduced a research project undertaken by our lab in collaboration with clinical colleagues, in which we investigated a novel machine learning approach to develop an improved prognostic model for predicting 10-year prostate cancer-specific mortality.

Authors:
Changhee Lee, Alexander Light, Ahmed Alaa, David Thurtle, Mihaela van der Schaar, Vincent J. Gnanapragasam

Abstract:

Background:
Accurate prognostication is crucial in treatment decisions made for men diagnosed with non-metastatic prostate cancer. Current models rely on prespecified variables, which limits their performance. We aimed to investigate a novel machine learning approach to develop an improved prognostic model for predicting 10-year prostate cancer-specific mortality and compare its performance with existing validated models.

Methods:
We derived and tested a machine learning-based model using Survival Quilts, an algorithm that automatically selects and tunes ensembles of survival models using clinicopathological variables. Our study involved a US population-based cohort of 171 942 men diagnosed with non-metastatic prostate cancer between Jan 1, 2000, and Dec 31, 2016, from the prospectively maintained Surveillance, Epidemiology, and End Results (SEER) Program. The primary outcome was prediction of 10-year prostate cancer-specific mortality. Model discrimination was assessed using the concordance index (c-index), and calibration was assessed using Brier scores. The Survival Quilts model was compared with nine other prognostic models in clinical use, and decision curve analysis was done.

Findings:
647 151 men with prostate cancer were enrolled into the SEER database, of whom 171 942 were included in this study. Discrimination improved with greater granularity, and multivariable models outperformed tier-based models. The Survival Quilts model showed good discrimination (c-index 0·829, 95% CI 0·820–0·838) for 10-year prostate cancer-specific mortality, which was similar to the top-ranked multivariable models: PREDICT Prostate (0·820, 0·811–0·829) and Memorial Sloan Kettering Cancer Center (MSKCC) nomogram (0·787, 0·776–0·798). All three multivariable models showed good calibration with low Brier scores (Survival Quilts 0·036, 95% CI 0·035–0·037; PREDICT Prostate 0·036, 0·035–0·037; MSKCC 0·037, 0·035–0·039). Of the tier-based systems, the Cancer of the Prostate Risk Assessment model (c-index 0·782, 95% CI 0·771–0·793) and Cambridge Prognostic Groups model (0·779, 0·767–0·791) showed higher discrimination for predicting 10-year prostate cancer-specific mortality. c-indices for models from the National Comprehensive Cancer Care Network, Genitourinary Radiation Oncologists of Canada, American Urological Association, European Association of Urology, and National Institute for Health and Care Excellence ranged from 0·711 (0·701–0·721) to 0·761 (0·750–0·772). Discrimination for the Survival Quilts model was maintained when stratified by age and ethnicity. Decision curve analysis showed an incremental net benefit from the Survival Quilts model compared with the MSKCC and PREDICT Prostate models currently used in practice.

Interpretation:
A novel machine learning-based approach produced a prognostic model, Survival Quilts, with discrimination for 10-year prostate cancer-specific mortality similar to the top-ranked prognostic models, using only standard clinicopathological variables. Future integration of additional data will likely improve model performance and accuracy for personalised prognostics.
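
For readers who want to see how the headline metrics in the Findings are computed, the sketch below evaluates a simulated 10-year risk model with a concordance index and a Brier score at 10 years, using the lifelines library for the c-index. For brevity it ignores censoring weighting in the Brier score, which the study handles properly; all data are simulated.

```python
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 5000

# Toy cohort: one prognostic index drives survival; the "model" recovers it noisily.
lp = rng.normal(size=n)                                   # true linear predictor
true_time = rng.exponential(scale=15 * np.exp(-lp))       # years to cancer-specific death
cens_time = rng.uniform(5, 20, size=n)                    # censoring times
time = np.minimum(true_time, cens_time)
event = (true_time <= cens_time).astype(int)
pred_risk_10y = 1 / (1 + np.exp(-(lp + rng.normal(scale=0.5, size=n))))  # predicted 10-year risk

# Discrimination: c-index. lifelines expects scores where higher means longer survival,
# so pass the negative of the risk score.
cindex = concordance_index(time, -pred_risk_10y, event)

# Calibration: Brier score at 10 years, dropping patients censored before 10 years
# (a simplification of the properly weighted Brier score used in the paper).
known = (time >= 10) | (event == 1)
died_by_10 = ((time <= 10) & (event == 1)).astype(float)
brier_10 = np.mean((pred_risk_10y[known] - died_by_10[known]) ** 2)

print(f"c-index={cindex:.3f}, 10-year Brier score={brier_10:.3f}")
```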

Cystic fibrosis

Cystic fibrosis, the most common genetic disease in Caucasian populations, is defined by a unique mix of complexities that make the lives of patients and the work of healthcare professionals particularly unpredictable. As a chronic condition, its progression at times appears almost random due to the potential presence of a variety of (often competing) complications. These can be hard to disentangle, and usually require targeted prevention or mitigation when identified.

Thanks to support from the UK Cystic Fibrosis Trust and its pioneering patient registry, our lab has developed a range of powerful machine learning tools for diagnosis, prognosis, phenotyping, and treatment related to cystic fibrosis.

Cystic fibrosis is fertile ground for exploring machine learning methods, due in part to the creation of the UK Cystic Fibrosis Registry, an extensive database covering 99% of the UK’s cystic fibrosis population, which is managed by the UK Cystic Fibrosis Trust. The Registry holds both static and time-series data for each patient, including demographic information, CFTR genotype, disease-related measures such as infection data, comorbidities and complications, lung function, weight, intravenous antibiotic usage, medications, transplantations, and deaths.

Turning such rich datasets into medical understanding is a key priority for the future of personalized healthcare. Through our own lab’s ongoing partnership with, and support from, the UK Cystic Fibrosis Trust, we have been able to take the Registry’s data to a completely new level.

Some of our lab’s key projects relating to machine learning for cystic fibrosis are introduced below, but much more information can be found on our dedicated spotlight page.

High-level overview

For a succinct, accessible, and high-level overview of the many opportunities for machine learning to transform care for people with cystic fibrosis, please take a look at a recent article published in the Journal of Cystic Fibrosis by our lab and collaborators.

Authors:
Mahed Abroshan, Ahmed M. Alaa, Oli Rayner, Mihaela van der Schaar

Abstract:
The availability of high-quality data from patient registries provides a robust starting point for using Machine Learning (ML) techniques to enhance the care of the patient with cystic fibrosis (CF).

Capitalizing on the wealth of information provided by registry data, ML techniques can augment clinical workflows by making individual-level predictions for a patient’s prognosis that are tailored to their specific traits, features, and medical history. Such personalized approaches become especially relevant as CFTR modulators precipitate a shift to mutation-based medicine. ML-based techniques can help provide clinicians with a refined understanding of patient heterogeneity.

Here, we discuss several areas where ML techniques can help underpin a personalized approach to patient management.

Referral of patients for lung transplants

Our lab has developed individualized prediction methods for patients with cystic fibrosis on the lung transplantation waitlist. In this case, we adapted our AutoPrognosis framework, which can automate the process of constructing clinical prognostic models, and used it to establish the optimal timing for referring patients with terminal respiratory failure for lung transplantation. This work was published in Nature Scientific Reports.

Authors:
Ahmed M. Alaa, Mihaela van der Schaar

Abstract:
Accurate prediction of survival for cystic fibrosis patients is instrumental in establishing the optimal timing for referring patients with terminal respiratory failure for lung transplantation. Current practice considers referring patients for lung transplantation evaluation once the forced expiratory volume (FEV1) drops below 30% of its predicted nominal value. While FEV1 is indeed a strong predictor of cystic fibrosis-related mortality, we hypothesized that the survival behavior of cystic fibrosis patients exhibits a lot more heterogeneity.

To this end, we developed an algorithmic framework, which we call AutoPrognosis, that leverages the power of machine learning to automate the process of constructing clinical prognostic models, and used it to build a prognostic model for cystic fibrosis using data from a contemporary cohort that involved 99% of the cystic fibrosis population in the UK. AutoPrognosis uses Bayesian optimization techniques to automate the process of configuring ensembles of machine learning pipelines, which involve imputation, feature processing, classification and calibration algorithms. Because it is automated, it can be used by clinical researchers to build prognostic models without the need for in-depth knowledge of machine learning.

Our experiments revealed that the accuracy of the model learned by AutoPrognosis is superior to that of existing guidelines and other competing models.
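
The pipeline-configuration idea behind AutoPrognosis can be illustrated with a small stand-in: enumerate candidate (imputation, scaling, classification) pipelines and keep the one with the best cross-validated AUROC. AutoPrognosis itself searches this space with Bayesian optimization and builds ensembles; the exhaustive loop, toy data, and component choices below are our own simplifications, not the library’s API.

```python
import numpy as np
from itertools import product
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 12))
X[rng.random(X.shape) < 0.1] = np.nan                 # registry-style missingness
y = (np.nan_to_num(X[:, 0]) + np.nan_to_num(X[:, 1]) > 0).astype(int)

imputers = {"mean": SimpleImputer(strategy="mean"),
            "median": SimpleImputer(strategy="median")}
classifiers = {"logreg": LogisticRegression(max_iter=1000),
               "rf": RandomForestClassifier(n_estimators=100, random_state=0),
               "gbm": GradientBoostingClassifier(random_state=0)}

results = {}
for (imp_name, imp), (clf_name, clf) in product(imputers.items(), classifiers.items()):
    pipe = Pipeline([("impute", imp), ("scale", StandardScaler()), ("clf", clf)])
    auc = cross_val_score(pipe, X, y, cv=3, scoring="roc_auc").mean()
    results[(imp_name, clf_name)] = auc

best = max(results, key=results.get)
print(f"best pipeline: impute={best[0]}, classifier={best[1]}, AUROC={results[best]:.3f}")
```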

Specific projects

The following section highlights and summarizes some of our other key projects related to cystic fibrosis, including those in which we have leveraged our extensive partnership with the UK Cystic Fibrosis Trust.

All of the projects below are tied together by a common purpose: to better understand and model the trajectory of cystic fibrosis (and other diseases) using time-series datasets. The topic of time series for healthcare is something we have covered in an extensive write-up, which can be found here.

Authors:
Ahmed M. Alaa, Mihaela van der Schaar

Abstract:
Models of disease progression are instrumental for predicting patient outcomes and understanding disease dynamics. Existing models provide the patient with pragmatic (supervised) predictions of risk, but do not provide the clinician with intelligible (unsupervised) representations of disease pathology.

In this paper, we develop the attentive state-space model, a deep probabilistic model that learns accurate and interpretable structured representations for disease trajectories. Unlike Markovian state-space models, in which state dynamics are memoryless, our model uses an attention mechanism to create “memoryful” dynamics, whereby attention weights determine the dependence of future disease states on past medical history. To learn the model parameters from medical records, we develop an inference algorithm that jointly learns a compiled inference network and the model parameters, leveraging the attentive representation to construct a variational approximation of the posterior state distribution.

Experiments on data from the UK Cystic Fibrosis Registry show that our model demonstrates superior predictive accuracy, in addition to providing insights into disease progression dynamics.
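
The “memoryful” dynamics described above can be sketched in a few lines of PyTorch: attention weights computed over the patient’s history determine how past state beliefs are mixed before the next-state distribution is formed. The module below is an illustrative skeleton (the dimensions, parameterization, and single shared transition matrix are our assumptions), not the full attentive state-space model or its inference network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveStateTransition(nn.Module):
    """Minimal sketch: the distribution over the next clinical state is formed from an
    attention-weighted mixture of past state beliefs, so future states depend on the
    whole medical history rather than only the most recent state."""

    def __init__(self, n_states=3, obs_dim=8, attn_dim=16):
        super().__init__()
        self.query = nn.Linear(obs_dim, attn_dim)
        self.key = nn.Linear(obs_dim, attn_dim)
        self.transition = nn.Parameter(torch.randn(n_states, n_states))  # shared transition matrix (illustrative)

    def forward(self, history, state_posteriors):
        # history:          (T, obs_dim)  past observations
        # state_posteriors: (T, n_states) belief over the state at each past step
        q = self.query(history[-1:])                              # query from the latest observation
        k = self.key(history)                                     # keys from the full history
        attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)    # (1, T) attention over the past
        weighted_state = attn @ state_posteriors                  # (1, n_states) attention-weighted belief
        next_state_logits = weighted_state @ self.transition
        return F.softmax(next_state_logits, dim=-1), attn

model = AttentiveStateTransition()
hist = torch.randn(5, 8)
beliefs = F.softmax(torch.randn(5, 3), dim=-1)
next_state, attn_weights = model(hist, beliefs)
print(next_state, attn_weights)
```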

Authors:
Ahmed M. Alaa, Mihaela van der Schaar

Abstract:
Currently available risk prediction methods are limited in their ability to deal with complex, heterogeneous, and longitudinal data such as that available in primary care records, or in their ability to deal with multiple competing risks.

This paper develops a novel deep learning approach that is able to successfully address current limitations of standard statistical approaches such as landmarking and joint modeling. Our approach, which we call Dynamic-DeepHit, flexibly incorporates the available longitudinal data comprising various repeated measurements (rather than only the last available measurements) in order to issue dynamically updated survival predictions for one or multiple competing risk(s).

Dynamic-DeepHit learns the time-to-event distributions without the need to make any assumptions about the underlying stochastic models for the longitudinal and the time-to-event processes. Thus, unlike existing works in statistics, our method is able to learn data-driven associations between the longitudinal data and the various associated risks without underlying model specifications.

We demonstrate the power of our approach by applying it to a real-world longitudinal dataset from the U.K. Cystic Fibrosis Registry, which includes a heterogeneous cohort of 5883 adult patients with annual follow-ups between 2009 and 2015. The results show that Dynamic-DeepHit provides a drastic improvement in discriminating individual risks of different forms of failure due to cystic fibrosis.

Furthermore, our analysis utilizes post-processing statistics that provide clinical insight by measuring the influence of each covariate on risk predictions and the temporal importance of longitudinal measurements, thereby enabling us to identify covariates that are influential for different competing risks.
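
To make the idea of dynamically updated, competing-risk survival predictions more tangible, here is a minimal PyTorch skeleton in the spirit of Dynamic-DeepHit: a recurrent network summarizes the full longitudinal history and a softmax head outputs a joint distribution over discrete time bins and competing risks, from which cause-specific cumulative incidence can be read off. Network sizes, the toy input, and the omission of the paper’s loss functions are all our own simplifications.

```python
import torch
import torch.nn as nn

class DynamicSurvivalRNN(nn.Module):
    """Minimal sketch: an RNN summarises repeated measurements, and a softmax head
    outputs a joint probability mass over (competing risk, discrete time bin)."""

    def __init__(self, n_features=10, hidden=32, n_risks=2, n_bins=20):
        super().__init__()
        self.n_risks, self.n_bins = n_risks, n_bins
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_risks * n_bins)

    def forward(self, x):
        # x: (batch, time, features) -- the full history, not just the last visit
        _, h = self.rnn(x)
        logits = self.head(h[-1])                               # (batch, n_risks * n_bins)
        pmf = torch.softmax(logits, dim=-1)
        return pmf.view(-1, self.n_risks, self.n_bins)          # P(event of risk k in bin t)

model = DynamicSurvivalRNN()
batch = torch.randn(4, 7, 10)              # 4 patients, 7 annual follow-ups, 10 covariates
pmf = model(batch)
cif = pmf.cumsum(dim=-1)                   # cumulative incidence per competing risk
print(cif[0, :, 9])                        # risk of each event by the 10th bin for patient 0
```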

Authors:
Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, Mihaela van der Schaar

Abstract:
Time-series learning is the bread and butter of data-driven clinical decision support, and the recent explosion in ML research has demonstrated great potential in various healthcare settings.

At the same time, medical time-series problems in the wild are challenging due to their highly composite nature: They entail design choices and interactions among components that preprocess data, impute missing values, select features, issue predictions, estimate uncertainty, and interpret models. Despite exponential growth in electronic patient data, there is a remarkable gap between the potential and realized utilization of ML for clinical research and decision support. In particular, orchestrating a real-world project lifecycle poses challenges in engineering (i.e. hard to build), evaluation (i.e. hard to assess), and efficiency (i.e. hard to optimize).

Designed to address these issues simultaneously, Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical standard, and (iii) interface for optimization. Our ultimate goal lies in facilitating transparent and reproducible experimentation with complex inference workflows, providing integrated pathways for (1) personalized prediction, (2) treatment-effect estimation, and (3) information acquisition.

Through illustrative examples on real-world data in outpatient, general wards, and intensive-care settings, we illustrate the applicability of the pipeline paradigm on core tasks in the healthcare journey. To the best of our knowledge, Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.

Note: When validating Clairvoyance, we specifically sought to include experiments using datasets from time-series environments that reflect the heterogeneity of realistic use cases envisioned for Clairvoyance. The UK Cystic Fibrosis Registry was an obvious choice in this context, since individuals in the registry are chronic patients monitored over infrequent visits, and for whom long-term decline is generally expected.
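
The pipeline paradigm that Clairvoyance formalizes (composable stages for imputation, prediction, and so on over patient-by-time-by-feature arrays) can be caricatured as follows; the class names and interfaces below are invented for illustration and do not reflect the Clairvoyance codebase.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class Step:
    """Tiny stand-in for a pipeline stage; the real toolkit's interfaces differ."""
    def fit(self, X, y=None): return self
    def transform(self, X): return X

class ForwardFillImputer(Step):
    def transform(self, X):                      # X: (patients, time, features)
        X = X.copy()
        for t in range(1, X.shape[1]):
            nan = np.isnan(X[:, t])
            X[:, t][nan] = X[:, t - 1][nan]      # carry the last observation forward
        return np.nan_to_num(X)                  # zero-fill anything still missing

class LastVisitPredictor(Step):
    def fit(self, X, y):
        self.clf = LogisticRegression(max_iter=1000).fit(X[:, -1], y)
        return self
    def predict_proba(self, X):
        return self.clf.predict_proba(X[:, -1])[:, 1]

def run_pipeline(preprocessing, predictor, X, y):
    for step in preprocessing:                   # imputation, feature selection, ...
        X = step.fit(X, y).transform(X)
    return predictor.fit(X, y), X

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6, 5))                 # 300 patients, 6 visits, 5 covariates
X[rng.random(X.shape) < 0.2] = np.nan            # registry-style missingness
y = (np.nan_to_num(X[:, -1, 0]) > 0).astype(int) # toy outcome label

model, X_clean = run_pipeline([ForwardFillImputer()], LastVisitPredictor(), X, y)
print(model.predict_proba(X_clean)[:5])
```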

Alzheimer’s

Thanks to ongoing support from Alzheimer’s Research UK, our lab has been conducting ongoing research into the application of machine learning to Alzheimer’s—a disease that is too often overlooked, despite affecting roughly 1 in 14 people over the age of 65, and 1 in every 6 people over the age of 80 (according to the UK’s NHS).

Machine learning, driven by data, can offer powerful new tools in the fight against Alzheimer’s.

Some of our lab’s key projects relating to machine learning for Alzheimer’s are introduced below, but much more information can be found on our dedicated spotlight page.

Side-note: all of the projects below made use of data provided through the open-access Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which tracks disease progression for over 1,700 patients.

Authors:
Bryan Lim, Mihaela van der Schaar

Abstract:
Joint models for longitudinal and time-to-event data are commonly used in longitudinal studies to forecast disease trajectories over time. Despite the many advantages of joint modeling, the standard forms suffer from limitations that arise from a fixed model specification and computational difficulties when applied to large datasets.

We adopt a deep learning approach to address these limitations, enhancing existing methods with the flexibility and scalability of deep neural networks while retaining the benefits of joint modeling.

Using data from the Alzheimer’s Disease Neuroimaging Initiative, we show improvements in performance and scalability compared to traditional methods.

Authors:
Bryan Lim, Mihaela van der Schaar

Abstract:
Joint models for longitudinal and time-to-event data are commonly used in longitudinal studies to forecast disease trajectories over time. While there are many advantages to joint modeling, the standard forms suffer from limitations that arise from a fixed model specification, and computational difficulties when applied to high-dimensional datasets.

In this paper, we propose a deep learning approach to address these limitations, enhancing existing methods with the inherent flexibility and scalability of deep neural networks, while retaining the benefits of joint modeling.

Using longitudinal data from a real-world medical dataset, we demonstrate improvements in performance and scalability, as well as robustness in the presence of irregularly sampled data.

Authors:
Daniel Jarrett, Jinsung Yoon, Mihaela van der Schaar

Abstract:
Accurate prediction of disease trajectories is critical for early identification and timely treatment of patients at risk. Conventional methods in survival analysis are often constrained by strong parametric assumptions and limited in their ability to learn from high-dimensional data.

This paper develops a novel convolutional approach that addresses the drawbacks of both traditional statistical approaches as well as recent neural network models for survival. We present Match-Net: a missingness-aware temporal convolutional hitting-time network, designed to capture temporal dependencies and heterogeneous interactions in covariate trajectories and patterns of missingness. To the best of our knowledge, this is the first investigation of temporal convolutions in the context of dynamic prediction for personalized risk prognosis.

Using real-world data from the Alzheimer’s Disease Neuroimaging Initiative, we demonstrate state-of-the-art performance without making any assumptions regarding the underlying longitudinal or time-to-event processes, attesting to the model’s potential utility in clinical decision support.

Authors:
Changhee Lee, Mihaela van der Schaar

Abstract:
Due to the wider availability of modern electronic health records, patient care data is often being stored in the form of time-series. Clustering such time-series data is crucial for patient phenotyping, anticipating patients’ prognoses by identifying “similar” patients, and designing treatment guidelines that are tailored to homogeneous patient subgroups.

In this paper, we develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest (e.g., adverse events, the onset of comorbidities). To encourage each cluster to have homogeneous future outcomes, the clustering is carried out by learning discrete representations that best describe the future outcome distribution based on novel loss functions.

Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks and identifies meaningful clusters that can be translated into actionable information for clinical decision-making.
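
As a rough illustration of outcome-guided clustering, the sketch below summarizes each toy trajectory, predicts the future outcome of interest, and clusters patients in predicted-outcome space so that each cluster is outcome-homogeneous. The paper instead learns discrete representations end to end with dedicated loss functions; everything here, from the summary features to the clustering step, is a simplification on simulated data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, T, d = 600, 8, 4
series = rng.normal(size=(n, T, d))                       # toy longitudinal data
future_event = (series[:, -3:, 0].mean(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 1) Summarise each trajectory (the paper instead learns an encoder end to end).
feats = np.concatenate([series.mean(axis=1), series[:, -1], series.std(axis=1)], axis=1)

# 2) Predict the future outcome of interest.
p_event = GradientBoostingClassifier(random_state=0).fit(feats, future_event).predict_proba(feats)[:, 1]

# 3) Cluster in predicted-outcome space so each cluster shares similar future outcomes.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(p_event.reshape(-1, 1))
for k in range(3):
    print(f"cluster {k}: n={np.sum(labels == k)}, observed event rate={future_event[labels == k].mean():.2f}")
```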

Organ transplantation

Organ transplantation is a high-stakes domain in which there is exceptional potential for real-world impact through increased efficiency, but increasing efficiency in any meaningful way would require us to navigate a highly complex set of interrelated problems.

We have now been working on organ transplantation for a number of years, and in this time have developed a portfolio of groundbreaking data-driven machine learning approaches with the support of clinical collaborators representing a range of specializations within the domain. Our projects tackle the challenges raised by transplantation in general, but also address problems specific to a variety of commonly transplanted organs, including hearts, livers, and lungs. Our work is ongoing, and we continue to develop new and improved methods.

Some of our lab’s key projects relating to organ transplantation are introduced below, but much more information can be found on our dedicated spotlight page.

Personalized survival predictions

Survival prediction before and after transplantation is an especially important problem because transplantation and treatment decisions depend on predictions of patient survival on the waitlist and survival after transplantation. Better predictions may, therefore, increase the number of successful transplantations.

In a study published in PLoS ONE in 2018, our lab worked with clinical and academic collaborators from the University of California, Los Angeles (UCLA), University of California, Davis (UC Davis), and University College London (UCL) to develop a methodology for personalized prediction of survival for patients with advanced heart failure, both while on the waitlist and after heart transplantation. The method we developed can capture the heterogeneity of populations by creating clusters of patients and providing specific predictive models for each cluster. It addresses the interaction of multiple features and, importantly, takes into account the difference between long-term survival and short-term survival.

In addition to being published in PLoS One (details below), this work was featured in Newsweek.

Authors:
Jinsung Yoon, William R. Zame, Amitava Banerjee, Martin Cadeiras, Ahmed Alaa, Mihaela van der Schaar

Abstract:
Risk prediction is crucial in many areas of medical practice, such as cardiac transplantation, but existing clinical risk-scoring methods have suboptimal performance. We develop a novel risk prediction algorithm and test its performance on the database of all patients who were registered for cardiac transplantation in the United States during 1985-2015.

We develop a new, interpretable, methodology (ToPs: Trees of Predictors) built on the principle that specific predictive (survival) models should be used for specific clusters within the patient population. ToPs discovers these specific clusters and the specific predictive model that performs best for each cluster.

In comparison with existing clinical risk scoring methods and state-of-the-art machine learning methods, our method provides significant improvements in survival predictions, both post- and pre-cardiac transplantation.
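
The core ToPs idea (fit several candidate predictors, and prefer a split of the patient population whenever per-cluster predictors beat a single global one on validation data) can be sketched with a single hand-picked split, as below. The real method searches splits recursively and combines models along each path; the data, the split on feature 0, and the two base learners are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 6))
y = ((X[:, 0] > 0) * X[:, 1] + (X[:, 0] <= 0) * X[:, 2] + rng.normal(scale=0.5, size=3000) > 0).astype(int)
Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.3, random_state=0)

base_learners = [LogisticRegression(max_iter=1000),
                 RandomForestClassifier(n_estimators=100, random_state=0)]

def best_predictor(Xa, ya, Xb, yb):
    """Fit every candidate learner and keep the best validation AUROC
    (ToPs does this per node of a learned feature-space tree)."""
    scored = [(roc_auc_score(yb, m.fit(Xa, ya).predict_proba(Xb)[:, 1]), m) for m in base_learners]
    return max(scored, key=lambda s: s[0])

# Root model vs. one split on feature 0 (real ToPs searches splits recursively).
root_auc, _ = best_predictor(Xtr, ytr, Xva, yva)
left, right = Xtr[:, 0] <= 0, Xtr[:, 0] > 0
vleft, vright = Xva[:, 0] <= 0, Xva[:, 0] > 0
auc_l, _ = best_predictor(Xtr[left], ytr[left], Xva[vleft], yva[vleft])
auc_r, _ = best_predictor(Xtr[right], ytr[right], Xva[vright], yva[vright])
split_auc = vleft.mean() * auc_l + vright.mean() * auc_r
print(f"root AUROC={root_auc:.3f}  vs  split-by-feature-0 AUROC={split_auc:.3f}")
```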

Additionally, as introduced in this page’s section on cystic fibrosis, our lab has also developed individualized prediction methods for patients with cystic fibrosis on the lung transplantation waitlist. To navigate back to that section, click here.

Personalized donor-recipient matching

Even though organ transplantation can increase life expectancy and quality of life for the recipient, the operation can entail various complications, including infection, acute and chronic rejection, and malignancy. This is a complicated risk assessment problem, since postoperative patient survival depends on different types of risk factors: recipient-related factors (e.g., cardiovascular disease severity of heart recipients), recipient-donor matching factors (e.g., weight ratio and human leukocyte antigen), race, and donor-related factors (e.g., diabetes).

Through impactful collaborations with clinicians, our lab has developed a range of methods and models that deal with the many complexities inherent in the problem of recipient-donor matching.

Working alongside Dr. Martin Cadeiras, a heart failure and heart transplant cardiologist at UC Davis, we sought an enhanced phenotypic characterization for the compatibility of patient-donor pairs through a precision medicine approach. We constructed personalized predictive models tailored to the individual traits of both the donor and the recipient to the finest possible granularity.

Authors:
Jinsung Yoon, Ahmed M. Alaa, Martin Cadeiras, Mihaela van der Schaar

Abstract:
Organ transplants can improve the life expectancy and quality of life for the recipient, but carry the risk of serious post-operative complications, such as septic shock and organ rejection. The probability of a successful transplant depends in a very subtle fashion on compatibility between the donor and the recipient, but current medical practice is short of domain knowledge regarding the complex nature of recipient-donor compatibility. Hence a data-driven approach for learning compatibility has the potential for significant improvements in match quality.

This paper proposes a novel system (ConfidentMatch) that is trained using data from electronic health records. ConfidentMatch predicts the success of an organ transplant (in terms of 3-year survival rates) on the basis of clinical and demographic traits of the donor and recipient. ConfidentMatch captures the heterogeneity of the donor and recipient traits by optimally dividing the feature space into clusters and constructing a different optimal predictive model for each cluster. The system controls the complexity of the learned predictive model in a way that allows it to deliver more granular and confident predictions for a larger number of potential recipient-donor pairs, thereby ensuring that predictions are “personalized” and tailored to individual characteristics to the finest possible granularity.

Experiments conducted on the UNOS heart transplant dataset show the superiority of the prognostic value of ConfidentMatch to other competing benchmarks; ConfidentMatch can provide predictions of success with 95% confidence for 5,489 patients of a total population of 9,620 patients, which corresponds to 410 more patients than the most competitive benchmark algorithm (DeepBoost).

One major challenge with regard to this problem is that the matching policies underlying the observational data are driven by clinical guidelines, creating a “matching bias.” Additionally, we must estimate transplant outcomes under counterfactual matches not observed in the data: we only have outcomes for the transplantation decisions that were actually made. To solve this problem, our lab joined forces with two clinical colleagues at UCLA: Dr. Maxime Cannesson (Chair, Department of Anesthesiology & Perioperative Medicine) and Dr. Brent Ershoff (Assistant Professor-In-Residence, Department of Anesthesiology). Together, we developed an approach that learns representations of donor features by clustering them into donor types, and applies donor-invariant transformations to recipient features to predict outcomes for a given donor-recipient pair.

Authors:
Can Xu, Ahmed Alaa, Ioana Bica, Brent Ershoff, Maxime Cannesson, Mihaela van der Schaar

Abstract:
Organ transplantation can improve life expectancy for recipients, but the probability of a successful transplant depends on the compatibility between donor and recipient features. Current medical practice relies on coarse rules for donor-recipient matching, but is short of domain knowledge regarding the complex factors underlying organ compatibility.

In this paper, we formulate the problem of learning data-driven rules for donor-recipient matching using observational data for organ allocations and transplant outcomes. This problem departs from the standard supervised learning setup in that it involves matching two feature spaces (for donors and recipients), and requires estimating transplant outcomes under counterfactual matches not observed in the data. To address this problem, we propose a model based on representation learning to predict donor-recipient compatibility—our model learns representations that cluster donor features, and applies donor-invariant transformations to recipient features to predict transplant outcomes under a given donor-recipient feature instance.

Experiments on several semi-synthetic and real-world datasets show that our model outperforms state-of-the-art allocation models and real-world policies executed by human experts.

The domain of organ transplantation is further complicated by the logistics of organ scarcity. Each organ is unique and high-dimensional, thus rendering outcome estimation for each (also unique) patient very difficult. Additionally, organs arrive in a stream: while a currently available organ might result in a positive outcome for a patient, future organs might have an even greater positive outcome (but we do not know which organs will become available in the future); not only are organs scarce, but organs that optimally match specific patients have varying degrees of rarity. Finally, each patient will presumably die relatively soon if not given an organ, and thus has access to only a limited number of organs.

These are the problems our lab aimed to address in creating OrganITE, an organ-to-patient assignment methodology developed in concert with Dr. Alexander Gimson, a consultant transplant hepatologist at Cambridge University Hospitals NHS Foundation Trust.

Authors:
Jeroen Berrevoets, James Jordon, Ioana Bica, Alexander Gimson, Mihaela van der Schaar

Abstract:
Transplant-organs are a scarce medical resource. The uniqueness of each organ and the patients’ heterogeneous responses to the organs present a unique and challenging machine learning problem. In this problem there are two key challenges: (i) assigning each organ “optimally” to a patient in the queue; (ii) accurately estimating the potential outcomes associated with each patient and each possible organ.

In this paper, we introduce OrganITE, an organ-to-patient assignment methodology that assigns organs based not only on its own estimates of the potential outcomes but also on organ scarcity. By modelling and accounting for organ scarcity we significantly increase total life years across the population, compared to the existing greedy approaches that simply optimise life years for the current organ available. Moreover, we propose an individualised treatment effect model capable of addressing the high dimensionality of the organ space.

We test our method on real and simulated data, resulting in as much as an additional year of life expectancy as compared to existing organ-to-patient policies.
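
A toy rendering of the OrganITE intuition is given below: when an organ arrives, each waitlisted patient gets a score that blends their estimated benefit from this specific organ with how much that benefit exceeds what a typical future organ would offer, and the blend leans toward the latter when the organ type is rare. The treatment-effect estimates, the density stand-in for scarcity, and the blending rule are all invented for illustration, not the model in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, organ_dim = 6, 4

# Toy stand-ins (a trained individualised treatment-effect model and an organ-density
# model would supply these quantities in practice).
patient_feats = rng.normal(size=(n_patients, organ_dim))
organ = rng.normal(size=organ_dim)                       # the organ that just arrived
past_organs = rng.normal(size=(500, organ_dim))          # organs seen historically

# Predicted life-years gained by each patient from this organ, and from a "typical" organ.
ite_now = patient_feats @ organ + 5
ite_typical = patient_feats @ past_organs.mean(axis=0) + 5

# How common is this organ type? (kernel-density stand-in for scarcity)
density = np.mean(np.exp(-np.linalg.norm(past_organs - organ, axis=1) ** 2 / 2))
rarity = 1.0 / (1.0 + density)                           # higher = rarer

# The rarer the organ, the more the assignment favours the patient whose benefit is
# specific to *this* organ over what they could expect by waiting for another one.
scores = (1 - rarity) * ite_now + rarity * (ite_now - ite_typical)
print("assign organ to patient", int(np.argmax(scores)), np.round(scores, 2))
```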

Authors:
Jeroen Berrevoets, Ahmed M. Alaa, Zhaozhi Qian, James Jordon, Alexander Gimson, Mihaela van der Schaar

Abstract:
Organ transplantation is often the last resort for treating end-stage illnesses, but managing transplant wait-lists is challenging because of organ scarcity and the complexity of assessing donor-recipient compatibility.

In this paper, we develop a data-driven model for (real-time) organ allocation using observational data for transplant outcomes. Our model integrates a queuing-theoretic framework with unsupervised learning to cluster the organs into “organ types”, and then constructs priority queues (associated with each organ type) to which incoming patients are assigned. To reason about organ allocations, the model uses synthetic controls to infer a patient’s survival outcomes under counterfactual allocations to the different organ types; the model is trained end-to-end to optimise the trade-off between patient waiting time and expected survival time. The use of synthetic controls enables patient-level interpretations of allocation decisions that can be presented to and understood by clinicians.

We test our model on multiple data sets, and show that it outperforms other organ-allocation policies in terms of added life-years and death count. Furthermore, we introduce a novel organ-allocation simulator to accurately test new policies.

Cardiovascular disease

Our lab has spent many years working alongside clinicians to research and develop new cutting-edge models and methods to transform how we diagnose and treat heart and circulatory conditions. Much of this work has been made possible through support from the British Heart Foundation and The Alan Turing Institute.

A few of our key projects in this area are listed below, but an extensive selection of papers can be found on our lab’s publications page.

Cardiovascular disease risk prediction

Identifying people at risk of cardiovascular diseases is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors, and yield sub-optimal performance across all patient groups. Data-driven techniques based on machine learning might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them.

In a collaboration between our lab and a group of clinicians from the University of Cambridge, published in PLoS One, we used UK Biobank data to determine whether machine learning techniques could improve risk prediction compared to traditional approaches, and whether considering non-traditional variables could increase the accuracy of risk predictions.

Authors:
Ahmed M. Alaa, Thomas Bolton, Emanuele Di Angelantonio, James H. F. Rudd, Mihaela van der Schaar

Abstract:
Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms).

We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (the Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, receipt of treatment for hypertension, and body mass index), and a Cox PH model based on all of the 473 available variables.

Overall, our AutoPrognosis model improved risk prediction compared to Framingham score, Cox PH model with conventional risk factors, and Cox PH model with all UK Biobank variables. Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis was able to correctly predict 368 more cases compared to the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals’ usual walking pace and their self-reported overall health rating.

Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as in individuals with history of diabetes. We also highlight the relative benefits accrued from including more information into a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain).
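
For context on the baselines mentioned above, the snippet below fits a Cox proportional hazards model on simulated data with conventional-style risk factors using the lifelines library, and reports its concordance index. The variable names echo the abstract, but the data, coefficients, and follow-up period are entirely simulated.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 5000

# Toy cohort with conventional CVD risk factors (names mirror the abstract; data are simulated).
df = pd.DataFrame({
    "age": rng.normal(57, 8, n),
    "male": rng.binomial(1, 0.45, n),
    "smoker": rng.binomial(1, 0.1, n),
    "systolic_bp": rng.normal(138, 18, n),
    "diabetes": rng.binomial(1, 0.05, n),
    "bmi": rng.normal(27, 4, n),
})
risk = 0.05 * (df["age"] - 57) + 0.4 * df["smoker"] + 0.5 * df["diabetes"] + 0.01 * (df["systolic_bp"] - 138)
time_to_cvd = rng.exponential(scale=(40 * np.exp(-risk)).to_numpy())
df["duration"] = np.minimum(time_to_cvd, 5.0)            # 5-year follow-up, as in the study
df["cvd_event"] = (time_to_cvd <= 5.0).astype(int)

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="cvd_event")
print(f"c-index: {cph.concordance_index_:.3f}")
cph.print_summary()
```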

Survival and mortality prediction studies

Survival and mortality prediction are crucial in many areas of medical practice, including cardiovascular disease, but existing clinical risk-scoring methods often yield suboptimal results.

Guided by diverse groups of clinical collaborators, our lab has created a range of powerful tools that can accurately and informatively predict survival and mortality in cardiovascular disease patients based on complex relationships learned from healthcare datasets.

One such tool (published in PLoS One and featured in Newsweek) has already been introduced in the organ transplantation section of this page (to navigate to the relevant paper, click here), but a couple of other projects of particular note are presented below.

Authors:
Beatrice Ricci, Mihaela van der Schaar, Jinsung Yoon, Edina Cenko, Zorana Vasiljevic, Maria Dorobantu, Marija Zdravkovic, Sasko Kedev, Oliver Kalpak, Davor Milicic, Olivia Manfrini, Lina Badimon, and Raffaele Bugiardini

Abstract:

Introduction:
Patients with diabetes and NSTE-ACS exhibit a highly variable risk of mortality and morbidity, even when undergoing similar therapeutic strategies. Machine-learning (ML) algorithms represent a novel approach, which may give insights on outcome prediction through risk stratification.

Hypothesis:
To investigate the impact of early (≤24 hrs) PCI compared with only routine medical treatment (RMT) without PCI on outcomes in pts with NSTE-ACS and diabetes.

Methods: 
Cohort study using a population-based registry (ISACS-TC, 41 hospitals, 12 European countries) from 2010 to 2016. ML models were compared with traditional statistical methods using logistic regression combined with propensity-matched analysis and inverse probability of treatment weighting of outcomes from a landmark of 24 hours from hospitalization. The primary endpoint was 30-day all-cause mortality. The secondary endpoint was the composite outcome of 30-day all-cause mortality and left ventricular dysfunction (ejection fraction <40%).

Results: 
Of 1250 NSTE-ACS first-day survivors with diabetes (median age 67 yrs, IQR 60 to 74 yrs; 59% men), 470 (38%) received early PCI and 780 RMT. Unadjusted rates of the primary end-point were higher in the RMT group than the early PCI group (6.3%; 49 events vs. 2.5%; 12 events). After propensity-matched analysis as well as after inverse probability-of-treatment weighting, early PCI was associated with a significant reduction in the primary end-point (OR: 0.44; 95% CI: 0.21 to 0.92 and OR: 0.49; 95% CI: 0.28 to 0.86, respectively). The critical factor for personalization with ML algorithms was age (≥65 yrs). The direction and magnitude of the association between early PCI and the primary end-point remained unchanged after ML personalization in the older age group (OR: 0.35; 95% CI: 0.14 to 0.92). Younger age had no association with 30-day all-cause mortality. Similar results were also obtained for the secondary endpoint.

Conclusions: 
ML significantly improves accuracy of cardiovascular risk prediction in pts with diabetes hospitalized with NSTE-ACS. Pts of 65 yrs or older may benefit most from an early PCI strategy performed ≤ 24 hours after presentation. Conservative therapies may avoid unnecessary procedures in the younger pts.
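
The inverse-probability-of-treatment-weighting step referred to in the Methods can be sketched as follows: estimate each patient's propensity to receive early PCI from confounders, weight patients by the inverse of the probability of the treatment they actually received, and fit a weighted outcome model whose treatment coefficient approximates the marginal odds ratio. The confounders, effect sizes, and cohort below are simulated; this illustrates the generic technique, not a re-analysis of the registry.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1250                                                   # roughly the cohort size in the abstract

# Simulated confounders, treatment (early PCI vs routine medical treatment) and outcome.
age = rng.normal(67, 9, n)
severity = rng.normal(size=n)
early_pci = rng.binomial(1, 1 / (1 + np.exp(0.03 * (age - 67) + 0.5 * severity)), n)
death_30d = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.04 * (age - 67) + 0.8 * severity - 0.7 * early_pci))), n)

X = np.column_stack([age, severity])

# 1) Propensity scores: probability of receiving early PCI given confounders.
ps = LogisticRegression(max_iter=1000).fit(X, early_pci).predict_proba(X)[:, 1]

# 2) Inverse-probability-of-treatment weights.
w = np.where(early_pci == 1, 1 / ps, 1 / (1 - ps))

# 3) Weighted outcome model: the early-PCI coefficient approximates the marginal effect.
outcome_model = LogisticRegression(max_iter=1000).fit(
    np.column_stack([early_pci]), death_30d, sample_weight=w)
odds_ratio = np.exp(outcome_model.coef_[0][0])
print(f"IPTW-adjusted odds ratio for early PCI: {odds_ratio:.2f}")
```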

Authors:
Edina Cenko, Jinsung Yoon, Sasko Kedev, Goran Stankovic, Zorana Vasiljevic, Gordana Krljanac, Oliver Kalpak, Beatrice Ricci, Davor Milicic, Olivia Manfrini, Mihaela van der Schaar, Lina Badimon, Raffaele Bugiardini

Abstract:

Importance
Previous works have shown that women hospitalized with ST-segment elevation myocardial infarction (STEMI) have higher short-term mortality rates than men. However, it is unclear if these differences persist among patients undergoing contemporary primary percutaneous coronary intervention (PCI).

Objective
To investigate whether the risk of 30-day mortality after STEMI is higher in women than men and, if so, to assess the role of age, medications, and primary PCI in this excess of risk.

Design, Setting, and Participants
From January 2010 to January 2016, a total of 8834 patients were hospitalized and received medical treatment for STEMI in 41 hospitals referring data to the International Survey of Acute Coronary Syndromes in Transitional Countries (ISACS-TC) registry.

Exposures
Demographics, baseline characteristics, clinical profile, and pharmacological treatment within 24 hours and primary PCI.

Main Outcomes and Measures
Adjusted 30-day mortality rates estimated using inverse probability of treatment weighted (IPTW) logistic regression models.

Results 
There were 2657 women with a mean (SD) age of 66.1 (11.6) years and 6177 men with a mean (SD) age of 59.9 (11.7) years included in the study. Thirty-day mortality was significantly higher for women than for men (11.6% vs 6.0%, P < .001). The gap in sex-specific mortality narrowed if restricting the analysis to men and women undergoing primary PCI (7.1% vs 3.3%, P < .001). After multivariable adjustment for comorbidities and treatment covariates, women under 60 had higher early mortality risk than men of the same age category (OR, 1.88; 95% CI, 1.04-3.26; P = .02). The risk in the subgroups aged 60 to 74 years and over 75 years was not significantly different between sexes (OR, 1.28; 95% CI, 0.88-1.88; P = .19 and OR, 1.17; 95% CI, 0.80-1.73; P = .40; respectively). After IPTW adjustment for baseline clinical covariates, the relationship among sex, age category, and 30-day mortality was similar (OR, 1.56 [95% CI, 1.05-2.3]; OR, 1.49 [95% CI, 1.15-1.92]; and OR, 1.21 [95% CI, 0.93-1.57]; respectively).

Conclusions and Relevance 
Younger age was associated with higher 30-day mortality rates in women with STEMI even after adjustment for medications, primary PCI, and other coexisting comorbidities. This difference declines after age 60 and is no longer observed in the oldest women.

Partnering with clinicians to revolutionize healthcare

Every clinician has heard that AI will have (or is having) a transformative impact on healthcare. The area is so new and fast-moving, however, that almost no clinicians will have been taught about it in medical school, while coverage in medical journals is often very cursory.

Unlocking the potential of AI and machine learning for healthcare must be a truly interdisciplinary undertaking. To best meet the needs of all healthcare stakeholders, it is imperative that clinical professionals and members of the AI/machine learning community find a strong basis for mutual understanding and collaboration.

To help fill this gap, our lab created Revolutionizing Healthcare, a regular online engagement series for clinicians, in 2020. We now have roughly 400 clinicians from around the world registered to participate in these sessions.

Aims of the Revolutionizing Healthcare series

The aim of Revolutionizing Healthcare is to introduce members of the clinical community to foundational concepts related to AI, machine learning, data science, and operations research, while showing how these can play a valuable role in transforming healthcare.

In our sessions, we demonstrate specifically how real-world challenges facing clinicians can be mapped to solutions drawn from AI, machine learning, and related fields, through the use of rigorous academic formalisms.

The sessions also explore the complex, interdisciplinary nature of this kind of problem-solving. Our ultimate goal is to use this engagement to jointly shape a framework for understanding and planning the integration of AI/machine learning into healthcare.

Exploring machine learning for healthcare—together

Our sessions are tailor-made for the clinical community. Little or no quantitative background is required in order to participate in and benefit from Revolutionizing Healthcare. As mentioned above, practicing clinicians are the primary audience, but we are also happy to be joined by clinicians in training, support staff, hospital administrators, and a wide variety of healthcare professionals.

In each session we examine potential applications of AI/machine learning through the lens of a particular medical domain or cross-cutting theme (e.g., acute care, cancer, organ transplantation, interpretability).

To combine the provision of instructional content with free-ranging discussion and exploration, we start each session with short presentations by our lab members, and then hold a live roundtable with a panel of clinicians, usually including an open Q&A session with participants.

We would encourage any clinicians hoping to learn more about machine learning for healthcare—or share their opinions and discuss new ideas—to sign up for our Revolutionizing Healthcare sessions via the URL below.

Policy impact

While the bulk of our research involves developing machine learning methods and models tailored to real-world healthcare problems, our ultimate goal is a full-fledged transformation that will create an entire ecosystem encompassing everything from (inter)national healthcare networks all the way down to individual practitioners and patients. To that end, our lab has contributed to a number of discussions regarding policies and guidelines at the highest levels.

As part of the 2019 NHS Topol Review, Mihaela van der Schaar co-chaired the Expert Advisory Panel on Artificial Intelligence and Robotics.

Additionally, Mihaela contributed a chapter to the U.K.’s 2018 Annual Report of the Chief Medical Officer, discussing how machine learning can transform medicine and healthcare.

Healthcare is data intensive, combining not only huge volumes of disparate and complex sources of data, but also complex classifications and meanings. Advances in mathematics, computing power, cloud computing and algorithm design have accelerated the development of methods that can be used to analyse, interpret and make predictions using these data sources. AI encompasses a multitude of technologies, including but not limited to analysing and discovering patterns in data.

AI has the potential to transform the delivery of healthcare in the NHS, from streamlining workflow processes to improving the accuracy of diagnosis and personalising treatment, as well as helping staff work more efficiently and effectively. With modern AI, a mix of human and artificial intelligences can be deployed across discipline boundaries to generate a greater collective intelligence.

The NHS should aim to develop a workforce able and willing to transform it into the world leader in healthcare AI and robotics. To achieve this, new opportunities must be created to recruit and retain people with the required technical skills. Significant changes to the roles and responsibilities of current and future NHS staff will also be needed.

The purpose of this chapter is to illustrate some of the progress ML has already made in healthcare and to suggest some of the progress it might make – and ought to make – in the near future. The view presented here is deliberately optimistic; for ML to have a chance to achieve this potential – as we believe it does – there must first be a vision of what is possible. Topol has discussed at length the potential of accumulating more data; our focus here is on extracting more information from that data.

We believe that ML can revolutionize healthcare – not by replacing any of healthcare's stakeholders, but by enabling and empowering them to improve the entire path of healthcare, from prevention to diagnosis, prognosis, and treatment, with the ultimate aim of enabling “individualised medicine” while maintaining or even reducing costs. In particular, ML will support and complement, rather than substitute for, the judgment of medical personnel, and will also inform patients and administrators. Put differently, the purpose of ML in the medical domain is to provide intelligence – especially actionable intelligence – and decision support to all of these stakeholders.

The development of ML for healthcare must be pulled by healthcare stakeholders (from patients to policy-makers) and not just pushed by the technological community. The ML community and the medical community must work as partners: the design and assembly of building blocks must be guided by the needs of the users and supported by ML development. To make this possible, ML must accomplish at least two things. The first is to provide those building blocks – methods and algorithms – that users can assemble for their own particular needs. The second is to make these methods and algorithms sufficiently transparent, understandable, and validated in a wide variety of contexts that they earn the trust of the users.

Impact of previous research

While our lab’s focus is now firmly on machine learning, AI, and operations research for healthcare and medicine, Mihaela van der Schaar’s previous research achieved substantial impact in the areas of multimedia communications, compression and processing, and real-time stream mining.

While working at Philips from 1996 to 2003 (and simultaneously completing her Ph.D.), Mihaela developed both the theoretical foundations and the first practical algorithm for streaming video. Her contributions are embedded in commercial products (including the award-winning Philips webcam) and she is personally credited as inventor on 35 U.S. patents (the majority of which are listed here), many of which are still frequently cited in other patents and adopted in standards. Between 1999 and 2003, she was Philips’ representative to the International Organization for Standardization (ISO) body that developed and wrote the MPEG-4 (Moving Picture Experts Group) standards for streaming video, in which she led several working groups and to which she contributed more than 40 papers. For these contributions, she received three ISO awards.

Mihaela has also developed new methods for detecting, characterizing, and forecasting complex events (road traffic collisions, popularity of videos in social networks, energy supply and demand in smart grids, etc.) based on a novel machine learning and real-time stream mining paradigm. These methods have been implemented as part of the IBM InfoSphere platform for a Smarter Planet.

For more information, please see Mihaela’s personal page.