van der Schaar Lab

Building AI that clinicians and patients can trust – how to deal with ‘Generalization’

Human-centric AI applications are at the centre of the van der Schaar lab’s agenda. Our researchers, in cooperation with GSK, explore how data-centric and model-centric approaches can help with the Generalization challenge – how can AI systems apply their knowledge to new data outside their original training pool?

Generalization – the ability of AI systems to apply and/or extrapolate their knowledge to new data that might differ from the original training data – is a major challenge for the effective and responsible implementation of human-centric AI applications.

Now, we are exploring avenues that might deal with that challenge in patient-facing clinical applications of machine learning. Generalization in clinical AI is difficult because ML models are susceptible to overfitting and to learning spurious correlations and biases from underspecified datasets. Clinical datasets are often small, noisy, and unrepresentative, which limits the effectiveness of pre-training techniques. Poor generalization can lead to unnoticed failures, posing risks to patient safety, especially for underrepresented groups.

The researchers highlight a tremendously important approach – responsible AI. Their main criterion for responsible use of ML is whether we can trust the predictions of a model. To trust model predictions, we need to identify the samples (individuals, subgroups, and features) on which the model performs well, deferring the others to complementary approaches in order to prevent model failures and potential harm to patients.

The paper, published in Nature Digital Medicine, suggests a number of possible solutions to the generalization challenge – mainly data-centric and model-centric methods (or a combination thereof) for selecting samples on which we can trust model predictions.

Data-centric AI methods, i.e. data curation/sculpting, aim to quantify the value and importance of individual samples and to filter out poor-quality or biased samples before model training. Removing noisy or mislabelled samples improves training and model performance on the remaining data.
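One common form of such data curation can be sketched as follows. This is a hypothetical minimal example, not the paper's method: it flags likely mislabelled samples by scoring each sample with a model that never saw it during training (out-of-fold prediction) and dropping samples whose recorded label the model finds implausible. The dataset, model choice, and 0.2 threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Corrupt 5% of labels to simulate noise in a clinical dataset.
flip = rng.choice(len(y), size=25, replace=False)
y_noisy = y.copy()
y_noisy[flip] = 1 - y_noisy[flip]

# Out-of-fold probabilities: each sample is scored by a model
# trained on the other folds, so scores are not overfit.
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y_noisy,
    cv=5, method="predict_proba",
)
confidence_in_label = proba[np.arange(len(y_noisy)), y_noisy]

# Keep only samples whose recorded label the model finds plausible;
# the rest are candidates for relabelling or removal before training.
keep = confidence_in_label > 0.2
X_clean, y_clean = X[keep], y_noisy[keep]
print(f"kept {keep.sum()} of {len(y_noisy)} samples")
```

In practice the threshold would be tuned, and more sophisticated data-valuation scores can replace the simple label-confidence used here.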

Model-centric methods, on the other hand, employ an additional machine learning model, or the model itself, to select samples on which model outputs are trustworthy. Such methods include uncertainty estimation, model distillation, ensemble-based methods, and conformal prediction, among others.
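To make one of these concrete, here is a hypothetical sketch of split conformal prediction used for selective deployment: a held-out calibration set fixes a score threshold, each test sample gets a set of plausible labels, and only samples whose set contains a single label are acted on; the rest are deferred. The data, model, and 90% coverage target are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Nonconformity score: 1 minus the probability given to the true label.
cal_proba = model.predict_proba(X_cal)
scores = 1 - cal_proba[np.arange(len(y_cal)), y_cal]

alpha = 0.1  # target 90% marginal coverage
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Prediction set per sample: all labels scoring below the threshold.
test_proba = model.predict_proba(X_test)
pred_sets = test_proba >= 1 - q  # boolean (n_test, n_classes)

# Trust singleton sets; defer ambiguous or empty ones to a clinician.
trusted = pred_sets.sum(axis=1) == 1
print(f"acting on {trusted.mean():.0%} of samples, deferring the rest")
```

The appeal of this recipe is that the coverage guarantee holds without distributional assumptions on the model itself, which makes it a natural building block for deciding when to defer to a human.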

The researchers stress the need to involve a ‘human-in-the-loop’ rather than relying solely on model-centric methods in clinical applications. Where decisions affect individual patients, AI still needs human supervision. This also holds true for ethical considerations in sample selection and model deployment. We attempt a reasoned exploration of the current ethical debate in the AI community, seeking a balance between practicality and equity, with the aim of maximizing the reduction in potential harm while also recognizing the issues marginalized subgroups might face.

For example, the researchers discuss the underrepresentation of men in breast cancer research and effective treatment. Given the prevalence of breast cancer in women and the resulting data limitations, selective deployment of AI algorithms for women only is currently being discussed as a responsible and effective use. Since the generalization problem means these models carry potential risks when used for men, men in turn do not benefit from the current state of the art in machine learning research. Is this the most ethical approach?

To be able to answer such questions in the future, the authors, rather than advocating for a specific ethical recommendation, highlight generalization as the underlying technical machine learning problem. On that basis, they hope, the bioethical debate on selective deployment can move beyond the ML and healthcare communities to involve a broad range of stakeholders, including patient groups – a conversation necessary to move towards trustworthy and reality-centric AI that offers more utility, safety, and equity.

Do you want to learn more about data-centric and reliable AI? Have a look at our NeurIPS 2023 tutorial on the topic here.

Andreas Bedorf