Reinforcement Learning approach to evaluate and rank AI models that use clinical parameters to predict hospitalisation and mortality of adults with Multiple Long-Term Conditions

Study type
Date of Approval
Study reference ID
Lay Summary

People with multiple long-term conditions live with two or more chronic illnesses at the same time. This is a growing public health challenge that places significant strain on health and social care systems: it can reduce quality of life, make healthcare more expensive, and disproportionately affects certain groups, particularly older people.

Recently, Machine Learning (ML), a branch of Artificial Intelligence, has been proposed to address some of the challenges of managing multiple long-term conditions. These challenges range from understanding the higher risk of developing other diseases to deciding which treatments to use, as some treatments may conflict and increase adverse reactions. One problem with this approach is that there is no clear guidance on which of the many ML tools is best, because their performance is difficult to link to important health outcomes (mortality and hospitalisation).

This project aims to fill this gap by setting criteria for evaluating these tools. The study will assess different, competing ML models using a computer learning approach that ranks them by performance. In the long term, the project should help reduce the public health burden while increasing the likelihood that the most appropriate tools are used in the near future.

Technical Summary

Artificial Intelligence (AI) systems with machine learning (ML) predictive models can help us understand common patterns associated with multiple long-term conditions. We aim to develop evaluation criteria for competing AI/ML models, utilising bandit learning (BL) algorithms for self-supervised reinforcement learning. The following model variants will be trained: logistic regression, k-nearest neighbours, random forest, gradient boosting machines, convolutional neural networks, recurrent neural networks, support vector machines, and variational autoencoders.
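As an illustration only (not part of the protocol), the classical model families listed above could be instantiated with scikit-learn as a pool of competing candidates; the deep learning variants (convolutional and recurrent neural networks, variational autoencoders) would need a separate framework such as PyTorch or Keras. All hyperparameter values below are placeholders, not choices made in this study.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

# Pool of competing candidate models (classical families only).
# Hyperparameters are illustrative placeholders.
candidate_models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=15),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosting": GradientBoostingClassifier(),
    "support_vector_machine": SVC(probability=True),
}
```

Each entry in the pool later becomes one "arm" of the bandit learning evaluation.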

We target eighteen conditions: anxiety, asthma, atrial fibrillation, cancer, coronary heart disease, chronic kidney diseases, chronic obstructive pulmonary diseases, dementia, depression, diabetes, heart failure, hypertension, Parkinson’s disease, peripheral vascular diseases, schizophrenia, stroke, rheumatoid arthritis, and osteoporosis.

Study design: four cohorts of adults aged 18-44, 45-64, 65-84, and 85+ on 30th June 2005, followed up to date. A minimum of 2 years' follow-up from the index date is required, ideally with long-term follow-up. Population of interest (each cohort): confirmed cases of any target condition (described above).
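The cohort assignment described above can be sketched as a simple age-banding rule on the index date; this is an illustrative helper, not code from the study.

```python
def age_cohort(age):
    """Assign an adult to one of the four study cohorts by age on the
    index date (30th June 2005). Returns None for under-18s, who are
    not eligible."""
    if age < 18:
        return None
    if age <= 44:
        return "18-44"
    if age <= 64:
        return "45-64"
    if age <= 84:
        return "65-84"
    return "85+"
```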

First primary objective: define evaluation criteria for new diagnostic strategies from prediction models based on:
(EXPOSURES) individual characteristics (e.g. age, sex), biomarkers (e.g. cholesterol, creatinine), health behaviours/risk factors (e.g. smoking, alcohol use), target conditions, and socioeconomic factors (e.g. deprivation score). OUTCOMES targeted: all-cause mortality and hospitalisation. The OUTCOMES also serve as performance parameters: each model's error rate or success rate on them defines the reward function of the bandit learning evaluation. This is a two-step process: first, each candidate model uses the EXPOSURES to predict the OUTCOME, so that the model's PREDICTED OUTCOME is taken as its OUTPUT; second, to evaluate the models, the bandit learner compares each model's PREDICTED OUTCOME with the actual OUTCOME.
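The two-step comparison described above can be sketched as follows. The data and model outputs are hypothetical; the point is only that a model's PREDICTED OUTCOME is scored against the actual OUTCOME, and the resulting success rate becomes the reward fed to the bandit learner.

```python
def error_rate(predicted, actual):
    """Fraction of cases where a model's PREDICTED OUTCOME disagrees
    with the actual OUTCOME (e.g. a hospitalisation/death indicator)."""
    assert len(predicted) == len(actual)
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

# Hypothetical illustration: two competing models scored on the same
# follow-up data; the bandit learner receives 1 - error_rate as reward.
actual  = [1, 0, 0, 1, 1, 0]
model_a = [1, 0, 1, 1, 0, 0]   # disagrees in 2 of 6 cases
model_b = [1, 0, 0, 1, 1, 1]   # disagrees in 1 of 6 cases
rewards = {name: 1 - error_rate(pred, actual)
           for name, pred in [("model_a", model_a), ("model_b", model_b)]}
```

Here model_b earns the higher reward (success rate 5/6 versus 4/6), so the bandit learner would favour it in the ranking.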

Second primary objective: the development and testing of a technical framework that will enable stakeholders to better create, deploy, and maintain scalable and explainable automatic prediction models.

Health Outcomes to be Measured

- Health Outcomes

The health outcomes used in this research will be hospitalisation and all-cause mortality.

We will first train and implement AI/ML models for predicting all-cause mortality rates and hospitalisation.

- AI/ML Outcomes

We will derive an explainability score for the reinforcement learning approach and establish the evaluation criteria by ranking different statistical and AI/ML models that predict the above health outcomes in one or more long-term conditions.

The error rate or success rate of these models is used as the reward signal for the bandit learning problem; a separate bandit learning problem is run for each cohort. This is done in the online setting of bandit learning, where the objective is to minimise the total expected error rate over time or, equivalently, to maximise the total expected success rate over time. The output of the bandit learning problem is the best ranking of AI/ML models at each time-step, i.e. the ranking that maximises the success rate. Model explainability will also be considered, using explainability methods [1] that produce an explainability score; this score can be incorporated into the ranking produced by the bandit learning problem. Finally, at the end of the desired period (the horizon), we arrive at the optimal ranking of these algorithms for each cohort.
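A minimal sketch of the online setting described above, using the standard UCB1 bandit algorithm over simulated Bernoulli rewards (a "success" stands for a correct prediction). The success probabilities are hypothetical placeholders, and the study would use observed outcomes rather than simulated ones; the sketch only shows how repeated pulls yield a ranking of the model arms at the horizon.

```python
import math
import random

def ucb1_rank(success_prob, horizon=3000, seed=0):
    """Minimal UCB1 sketch: each arm is a candidate AI/ML model; each pull
    observes a simulated Bernoulli success (a correct prediction).
    Returns the arms ranked by empirical success rate at the horizon."""
    rng = random.Random(seed)
    arms = list(success_prob)
    pulls = {a: 0 for a in arms}
    wins = {a: 0 for a in arms}
    for t in range(1, horizon + 1):
        untried = [a for a in arms if pulls[a] == 0]
        if untried:
            arm = untried[0]  # play every arm once first
        else:
            # Pick the arm with the highest upper confidence bound.
            arm = max(arms, key=lambda a: wins[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        pulls[arm] += 1
        wins[arm] += rng.random() < success_prob[arm]
    return sorted(arms, key=lambda a: wins[a] / pulls[a], reverse=True)

# Hypothetical success rates for three competing models:
ranking = ucb1_rank({"gradient_boosting": 0.9,
                     "logistic_regression": 0.6,
                     "k_nearest_neighbours": 0.3})
```

With clearly separated success rates and a long enough horizon, the arm with the highest true success rate ends up ranked first; an explainability score could additionally be blended into the reward before ranking.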


Clare Bankhead - Chief Investigator - University of Oxford
Sami Adnan - Corresponding Applicant - University of Oxford
Amitis Shidani - Collaborator - University of Oxford
Cynthia Wright Drakesmith - Collaborator - University of Oxford
Lei Clifton - Collaborator - University of Oxford
Madhu Vankadari - Collaborator - University of Oxford
Rafael Perera - Collaborator - University of Oxford
Robert Williams - Collaborator - University of Oxford
Subhashisa Swain - Collaborator - University of Oxford


HES Accident and Emergency; HES Admitted Patient Care; HES Outpatient; ONS Death Registration Data; Patient Level Index of Multiple Deprivation; Practice Level Rural-Urban Classification