van der Schaar Lab


COVID-19: Our impact in short

From the very beginning of the response to COVID-19, the van der Schaar Lab was at the forefront of using AI and ML to fight the pandemic. Here are our main accomplishments:

Our Lab presented the world’s first concrete solution on how to leverage ML and AI against COVID-19 in March 2020: Responding to COVID-19 with ML

In Spring 2020, we developed the Cambridge Adjutorium for COVID-19 in partnership with the NHS as the first ML solution to concretely help with COVID-19 by forecasting the need for scarce resources in hospitals such as ICU beds and ventilators: Partnering with NHS Digital and Public Health England and our paper

The van der Schaar Lab was the first to produce worldwide research highlighting COVID-19’s impact on minorities and under-represented communities in Brazil and the UK as early as May 2020.

Analysing possible government responses, we introduced the first Policy Impact Predictor for non-pharmaceutical interventions against COVID-19.

Aside from policy, the van der Schaar Lab also leveraged its expertise to guide pharmaceutical companies on streamlining clinical trials during the pandemic in August 2020: Machine learning for clinical trials in the era of COVID-19

In conjunction with clinicians at the University of Amsterdam Medical Centre, we developed the first solution to find personalised doses of dexamethasone for patients suffering from COVID-19: our paper

Our lab and COVID-19

The van der Schaar Lab has played an active role in the academic and clinical response to the COVID-19 pandemic, including:

  • providing the world’s first concrete guidance on how machine learning and AI can help healthcare systems in the fight against COVID-19;
  • developing and implementing Cambridge Adjutorium for COVID-19, the first machine learning tool used in the UK to fight COVID-19, which allowed clinicians to predict utilization of scarce resources such as ventilators and ICU beds, and entering a partnership with the NHS for real-world use of Cambridge Adjutorium at Acute Trusts in England;
  • conducting some of the first research and statistical analysis regarding the nature of the disease, its spread, and its disproportionate impact on certain minorities and/or disadvantaged communities worldwide;
  • creating Policy Impact Predictor (PIP), the world’s first machine learning tool developed to guide government decision-making around measures to prevent the spread of COVID-19;
  • exploring and offering the first guidance regarding the potential impact of machine learning on clinical trials during the pandemic.

Cambridge Adjutorium partnership with NHS Digital

During the height of the COVID-19 pandemic, healthcare systems around the world came under extreme pressure on capacity. Ventilators and ICU beds were in short supply, and clinical staff were stretched across more patients than they could cover.

To help hospitals respond, our lab developed Cambridge Adjutorium, a prediction and capacity management tool capable of providing hospital-level projections of upcoming demand for ventilators and ICU beds. We successfully trained Cambridge Adjutorium using depersonalised COVID-19 patient data provided by Public Health England, and reached an agreement with the NHS for real-world use of Cambridge Adjutorium at a number of Acute Trusts in England. As a result, Cambridge Adjutorium (implemented by the NHS under the name “CPAS”) was one of the first machine learning-based systems to be deployed in hospitals on a national scale to address the COVID-19 pandemic.

We were able to develop Cambridge Adjutorium and provide it to the NHS at speed by adapting a very general automated machine learning framework called AutoPrognosis (also developed by our lab). Cambridge Adjutorium can provide aggregated predictions for hospitals, which could significantly help improve capacity planning for healthcare systems in response to COVID-19.

The system uses its underlying predictive models to provide accurate near-term projections of the likely demand on hospital resources such as ICU beds and ventilators. These projections are shown to healthcare decision-makers in an easy-to-interpret and actionable format.
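As an illustration only, the idea of turning patient-level predictions into a hospital-level projection can be sketched as follows. This is not the method used by Cambridge Adjutorium or CPAS: the function name, the independence assumption and the probabilities are all hypothetical. Each patient is treated as an independent yes/no event, so the expected demand is the sum of the individual probabilities.

```python
import math

def aggregate_demand(probabilities):
    """Combine per-patient probabilities into a hospital-level projection.

    Assumes patients are independent Bernoulli trials (an assumption made
    for this sketch), so the total is Poisson-binomial:
    mean = sum(p), variance = sum(p * (1 - p)).
    """
    mean = sum(probabilities)
    variance = sum(p * (1.0 - p) for p in probabilities)
    return mean, math.sqrt(variance)

# Hypothetical per-patient probabilities of needing an ICU bed in the next week.
icu_risk = [0.05, 0.20, 0.60, 0.10, 0.85, 0.30]
expected_beds, spread = aggregate_demand(icu_risk)
print(f"expected ICU beds: {expected_beds:.2f} (standard deviation {spread:.2f})")
```

Reporting a spread alongside the expected value gives planners a sense of how much headroom a projection might need, under the stated independence assumption.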

In April 2020 our lab announced our partnership with NHS Digital to start trialling Cambridge Adjutorium at a number of Acute Trusts in England.

You can find our original announcement, and NHS Digital’s press release, below.

A paper summarizing the development of Cambridge Adjutorium and the subsequent CPAS partnership with NHS Digital was published in Machine Learning in November 2020.

Zhaozhi Qian, Ahmed M. Alaa, Mihaela van der Schaar

The coronavirus disease 2019 (COVID-19) global pandemic poses the threat of overwhelming healthcare systems with unprecedented demands for intensive care resources. Managing these demands cannot be effectively conducted without a nationwide collective effort that relies on data to forecast hospital demands on the national, regional, hospital and individual levels. To this end, we developed the COVID-19 Capacity Planning and Analysis System (CPAS)—a machine learning-based system for hospital resource planning that we have successfully deployed at individual hospitals and across regions in the UK in coordination with NHS Digital.

In this paper, we discuss the main challenges of deploying a machine learning-based decision support system at national scale, and explain how CPAS addresses these challenges by (1) defining the appropriate learning problem, (2) combining bottom-up and top-down analytical approaches, (3) using state-of-the-art machine learning algorithms, (4) integrating heterogeneous data sources, and (5) presenting the result with an interactive and transparent interface.

CPAS is one of the first machine learning-based systems to be deployed in hospitals on a national scale to address the COVID-19 pandemic—we conclude the paper with a summary of the lessons learned from this experience.

Machine learning for adaptive clinical trials

The COVID-19 pandemic highlighted the many difficulties weighing on the process of developing, trialing, and approving vaccines within a short timeframe.

In a paper authored jointly by the van der Schaar Lab’s researchers, colleagues from AstraZeneca and Novartis, and academics within the broader machine learning community, we demonstrated how machine learning methods could substantially assist in the identification, approval and distribution of treatments and vaccines for diseases like COVID-19. The paper, published in Statistics in Biopharmaceutical Research in July 2020, focused on three challenges:

  • ongoing clinical trials for non-COVID-19 drugs;
  • clinical trials for repurposing drugs to treat COVID-19; and
  • clinical trials for new drugs to treat COVID-19.

In addition to its wide-reaching impact, the paper is notable as an example of fruitful and open collaboration between members of the academic machine learning community and researchers within the pharmaceutical industry. As the paper’s conclusion notes, “Diverse quantitative communities are coming together to address the challenges of this pandemic; our hope is that they will stay together – not just for this pandemic but in the long run, which will greatly improve the conduct of clinical trials in the future.”

William R. Zame, Ioana Bica, Cong Shen, Alicia Curth, Hyun-Suk Lee, Stuart Bailey, James Weatherall, David Wright, Frank Bretz, Mihaela van der Schaar

The world is in the midst of a pandemic. We still know little about the disease COVID-19 or about the virus (SARS-CoV-2) that causes it. We do not have a vaccine or a treatment (aside from managing symptoms). We do not know if recovery from COVID-19 produces immunity, and if so for how long, hence we do not know if “herd immunity” will eventually reduce the risk or if a successful vaccine can be developed – and this knowledge may be a long time coming. In the meantime, the COVID-19 pandemic is presenting enormous challenges to medical research, and to clinical trials in particular. This paper identifies some of those challenges and suggests ways in which machine learning can help in response to those challenges. We identify three areas of challenge: ongoing clinical trials for non-COVID-19 drugs; clinical trials for repurposing drugs to treat COVID-19, and clinical trials for new drugs to treat COVID-19. Within each of these areas, we identify aspects for which we believe machine learning can provide invaluable assistance.

Policy Impact Predictor (PIP)

Policy Impact Predictor (PIP) is a machine learning tool developed to guide government decision-making around measures to prevent the spread of COVID-19.

In addition to accurately modeling COVID-19 mortality trends under current policy sets, PIP can adaptively tailor forecasts to show the potential impact of specific policy changes, such as reopening schools or workplaces, implementing mask mandates, or relaxing shelter-in-place requirements.

PIP is also able to tackle “What if?” policy questions looking into the past. For example, PIP can estimate what would have happened if Italy’s government had waited a week before imposing lockdown measures.

PIP builds on a two-layer machine learning-based compartmental model introduced in a paper by Zhaozhi Qian, Ahmed Alaa, and Mihaela van der Schaar, which was published at NeurIPS 2020.

Zhaozhi Qian, Ahmed M. Alaa, Mihaela van der Schaar

The coronavirus disease 2019 (COVID-19) global pandemic has led many countries to impose unprecedented lockdown measures in order to slow down the outbreak. Questions on whether governments have acted promptly enough, and whether lockdown measures can be lifted soon have since been central in public discourse. Data-driven models that predict COVID-19 fatalities under different lockdown policy scenarios are essential for addressing these questions, and for informing governments on future policy directions.

To this end, this paper develops a Bayesian model for predicting the effects of COVID-19 containment policies in a global context — we treat each country as a distinct data point, and exploit variations of policies across countries to learn country-specific policy effects. Our model utilizes a two-layer Gaussian process (GP) prior — the lower layer uses a compartmental SEIR (Susceptible, Exposed, Infected, Recovered) model as a prior mean function with “country-and-policy-specific” parameters that capture fatality curves under different “counterfactual” policies within each country, whereas the upper layer is shared across all countries, and learns lower-layer SEIR parameters as a function of country features and policy indicators. Our model combines the solid mechanistic foundations of SEIR models (Bayesian priors) with the flexible data-driven modeling and gradient-based optimization routines of machine learning (Bayesian posteriors) — i.e., the entire model is trained end-to-end via stochastic variational inference.

We compare the projections of our model with other models listed by the Centers for Disease Control and Prevention (CDC), and provide scenario analyses for various lockdown and reopening strategies, highlighting their impact on COVID-19 fatalities.
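To make the compartmental part of this description concrete, here is a minimal sketch of a plain SEIR simulation used for a "what if lockdown had started a week later?" comparison. It covers only the mechanistic SEIR component. It does not include the Gaussian process layers, country features or variational inference described above, and every parameter value (contact rates, incubation and recovery rates, fatality fraction, lockdown days) is a hypothetical placeholder rather than a fitted value.

```python
def simulate_seir(days, lockdown_day, population=1_000_000, beta_open=0.5,
                  beta_lockdown=0.1, sigma=1 / 5, gamma=1 / 7, fatality=0.01,
                  initial_infected=10):
    """Euler-integrate a SEIR model with a one-day step.

    The contact rate switches from beta_open to beta_lockdown on lockdown_day.
    Returns cumulative deaths at the end of the horizon, taken here as a fixed
    fraction of the removed compartment (a simplifying assumption).
    """
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    for day in range(days):
        beta = beta_open if day < lockdown_day else beta_lockdown
        new_exposed = beta * s * i / population   # S -> E
        new_infectious = sigma * e                # E -> I
        new_removed = gamma * i                   # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
    return fatality * r

# Counterfactual comparison: same epidemic, lockdown delayed by one week.
baseline = simulate_seir(days=120, lockdown_day=30)
delayed = simulate_seir(days=120, lockdown_day=37)
print(f"deaths with lockdown on day 30: {baseline:.0f}")
print(f"deaths with lockdown on day 37: {delayed:.0f}")
print(f"additional deaths from a one-week delay: {delayed - baseline:.0f}")
```

In the model described in the paper, the SEIR parameters are not fixed by hand as they are here. They are learned as a function of country features and policy indicators.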

Note: since the publication of the initial version of the model in June 2020, we have upgraded PIP’s functionalities by using an RNN to better capture temporal dependencies on exogenous factors such as weather conditions and mobility patterns.

Research and statistical analysis

Since the start of the COVID-19 pandemic, we have conducted wide-ranging research and statistical analysis aiming to better understand the disease and its impact on specific individuals and groups. This work has spanned multiple continents, involving a diverse group of academic and clinical collaborators and a variety of datasets.

Much of this research has been presented in leading medical journals. Several highlights are presented below.

Zhaozhi Qian, William Zame, Lucas Fleuren, Paul Elbers, Mihaela van der Schaar

Our lab’s researchers, working with clinicians from Amsterdam UMC, studied an important and hitherto unsolved clinical problem: how to administer dexamethasone treatment for COVID-19 patients in the ICU.

Modeling a system’s temporal behaviour in reaction to external stimuli is a fundamental problem in many areas. Pure Machine Learning (ML) approaches often fail in the small sample regime and cannot provide actionable insights beyond predictions. A promising modification has been to incorporate expert domain knowledge into ML models. The application we consider is predicting the patient health status and disease progression over time, where a wealth of domain knowledge is available from pharmacology. Pharmacological models describe the dynamics of carefully-chosen medically meaningful variables in terms of systems of Ordinary Differential Equations (ODEs). However, these models only describe a limited collection of variables, and these variables are often not observable in clinical environments. To close this gap, we propose the latent hybridisation model (LHM) that integrates a system of expert-designed ODEs with machine-learned Neural ODEs to fully describe the dynamics of the system and to link the expert and latent variables to observable quantities. We evaluated LHM on synthetic data as well as real-world intensive care data of COVID-19 patients. LHM consistently outperforms previous works, especially when few training samples are available such as at the beginning of the pandemic.
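The general shape of such a hybrid system can be sketched in a few lines. This is a toy illustration under heavy assumptions, not the LHM implementation: the expert ODE is a hypothetical first-order drug-elimination equation, the "Neural ODE" is a single tanh unit with fixed, untrained weights, the latent variable depends on the expert variable but not the other way round, and the link to observations is a hypothetical linear readout.

```python
import math

def expert_derivative(c, k_elim=0.3):
    # Hypothetical expert ODE: first-order elimination of a drug concentration.
    return -k_elim * c

def latent_derivative(z, c, weights=(-0.5, 0.8, 0.0)):
    # Stand-in for a Neural ODE: one tanh unit with fixed, untrained weights.
    w_z, w_c, bias = weights
    return math.tanh(w_z * z + w_c * c + bias)

def observe(c, z, readout=(1.0, 2.0)):
    # Hypothetical linear link from (expert, latent) state to a measurement.
    return readout[0] * c + readout[1] * z

def simulate(c0, z0, hours, dt=0.1):
    """Euler-integrate the coupled expert and latent dynamics."""
    c, z, trajectory = c0, z0, []
    for _ in range(round(hours / dt)):
        trajectory.append(observe(c, z))
        dc, dz = expert_derivative(c), latent_derivative(z, c)
        c, z = c + dt * dc, z + dt * dz
    return c, z, trajectory

final_c, final_z, observations = simulate(c0=1.0, z0=0.0, hours=24)
print(f"final expert state: {final_c:.4f}, final latent state: {final_z:.4f}")
```

In a trained model the latent dynamics and the readout would be fitted to patient data, while the expert equations would come from pharmacology.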

Ahmed Alaa, Zhaozhi Qian, Jem Rashbass, Jonathan Benger, Mihaela van der Schaar


Objectives
We investigated whether the timing of hospital admission is associated with the risk of mortality for patients with COVID-19 in England, and the factors associated with a longer interval between symptom onset and hospital admission.

Design
Retrospective observational cohort study of data collected by the COVID-19 Hospitalisation in England Surveillance System (CHESS). Data were analysed using multivariate regression analysis.

Setting
Acute hospital trusts in England that submit data to CHESS routinely.

Participants
Of 14 150 patients included in CHESS until 13 May 2020, 401 lacked a confirmed diagnosis of COVID-19 and 7666 lacked a recorded date of symptom onset. This left 6083 individuals, of whom 15 were excluded because the time between symptom onset and hospital admission exceeded 3 months. The study cohort therefore comprised 6068 unique individuals.

Main outcome measures
All-cause mortality during the study period.

Results
Timing of hospital admission was an independent predictor of mortality following adjustment for age, sex, comorbidities, ethnicity and obesity. Each additional day between symptom onset and hospital admission was associated with a 1% increase in mortality risk (HR 1.01; p<0.005). Healthcare workers were most likely to have an increased interval between symptom onset and hospital admission, as were people from Black, Asian and minority ethnic (BAME) backgrounds, and patients with obesity.

Conclusions
The timing of hospital admission is associated with mortality in patients with COVID-19. Healthcare workers and individuals from a BAME background are at greater risk of later admission, which may contribute to reports of poorer outcomes in these groups. Strategies to identify and admit high-risk patients and those showing signs of deterioration in a timely way may reduce the consequent mortality from COVID-19, and should be explored.
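To give a sense of scale for the reported per-day hazard ratio, the short sketch below compounds HR 1.01 over longer delays. It assumes the effect of delay is log-linear in days, as it would be for a continuous covariate in a proportional hazards model. The abstract does not spell out the model form, so this is an interpretive assumption, and the function name is hypothetical.

```python
def cumulative_hazard_ratio(per_day_hr, delay_days):
    """Hazard ratio for a delay of `delay_days`, assuming a log-linear effect."""
    return per_day_hr ** delay_days

# Compare a one-day, one-week and two-week delay using the reported HR of 1.01.
for delay in (1, 7, 14):
    hr = cumulative_hazard_ratio(1.01, delay)
    print(f"{delay:>2} days: HR {hr:.3f}")
```

Under that assumption a one-week delay corresponds to roughly a 7% higher hazard, which is slightly more than seven times the one-day effect because the ratios multiply rather than add.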

Zhaozhi Qian, Ahmed Alaa, Mihaela van der Schaar, Ari Ercole

The high numbers of COVID-19 patients developing severe respiratory failure have placed exceptional demands on ICU capacity around the world. Understanding the determinants of ICU mortality is important for surge planning and shared decision making.

We used early data from the COVID-19 Hospitalisation in England Surveillance System to look for factors associated with ICU outcome in the hope that information from such timely analysis may be actionable before the outbreak peak. Immunosuppressive disease, chronic cardiorespiratory/renal disease and age were key determinants of ICU mortality in a proportional hazards mixed effects model. However, variation in site-stratified random effects was comparable in magnitude, suggesting substantial between-centre variability in mortality.

Notwithstanding possible ascertainment and lead-time effects, these early results motivate comparative effectiveness research to understand the origin of such differences and optimise surge ICU provision.

Pedro Baqui, Ioana Bica, Valerio Marra, Ari Ercole, Mihaela van der Schaar

Brazil ranks second worldwide in total number of COVID-19 cases and deaths. Understanding the possible socioeconomic and ethnic health inequities is particularly important given the diverse population and fragile political and economic situation. We aimed to characterise the COVID-19 pandemic in Brazil and assess variations in mortality according to region, ethnicity, comorbidities, and symptoms.

Pedro Baqui, Valerio Marra, Ahmed M. Alaa, Ioana Bica, Ari Ercole, Mihaela van der Schaar

The COVID-19 pandemic continues to have a devastating impact on Brazil. Brazil’s social, health and economic crises are aggravated by strong societal inequities and persisting political disarray. This complex scenario motivates careful study of the clinical, socioeconomic, demographic and structural factors contributing to increased risk of mortality from SARS-CoV-2 in Brazil specifically.

We consider the Brazilian SIVEP-Gripe catalog, a very rich respiratory infection dataset which allows us to estimate the importance of several non-laboratorial and socio-geographic factors on COVID-19 mortality. We analyze the catalog using machine learning algorithms to account for likely complex interdependence between metrics. The XGBoost algorithm achieved excellent performance, producing an AUC-ROC of 0.813 (95% CI 0.810–0.817), and outperforming logistic regression.

Using our model we found that, in Brazil, socioeconomic, geographical and structural factors are more important than individual comorbidities. Particularly important factors were the state of residence and its development index; the distance to the hospital (especially for rural and less developed areas); the level of education; and hospital funding model and strain. Ethnicity is also confirmed to be more important than comorbidities but less than the aforementioned factors. In conclusion, socioeconomic and structural factors are as important as biological factors in determining the outcome of COVID-19.

This has important consequences for policy making, especially on vaccination/non-pharmacological preventative measures, hospital management and healthcare network organization.
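For readers unfamiliar with the AUC-ROC figure and confidence interval quoted above, the sketch below shows one common way such numbers can be computed. It is not the paper's analysis: it does not train XGBoost, the toy labels and scores are invented for illustration, and the percentile bootstrap is an assumed choice of interval method that the abstract does not specify.

```python
import random

def auc_roc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def bootstrap_ci(labels, scores, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            estimates.append(auc_roc(ys, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_resamples)]
    upper = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper

# Hypothetical toy data: 1 = died, 0 = survived, with model risk scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9, 0.65, 0.8]
point = auc_roc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC-ROC {point:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only ten toy records the interval is very wide. The narrow interval reported in the abstract reflects the much larger SIVEP-Gripe dataset.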
