Using machine learning to understand inequalities in early-onset Type 2 Diabetes and co-occurring long-term conditions

Study type
Protocol
Date of Approval
Study reference ID
23_003454
Lay Summary

Type 2 Diabetes (T2DM) is a long-term condition that usually affects people in middle age; however, more individuals are now being diagnosed before the age of 40 years. This is known as early-onset T2DM. Early-onset T2DM can cause other health problems, such as heart or eye diseases, and individuals living with early-onset T2DM are also more likely to experience other long-term conditions, such as depression. Some groups, such as women, ethnic minority groups, individuals living in socially deprived areas and those who are overweight or obese, are more at risk of early-onset T2DM.

We will use electronic healthcare records of several million people to answer questions about early-onset T2DM. Our research will use advanced statistical and computer science techniques to analyse these records and look for patterns. We want to understand what might lead to someone getting early-onset T2DM and how ethnicity, age, gender and deprivation play a role. We will work closely with a public and patient involvement and engagement (PPIE) group to ensure that our research remains relevant to those impacted by it.

We expect the impact of our research to be wide-ranging, and it may help improve health and social care for people with early- and usual-onset T2DM. The findings may support specific recommendations for people with early-onset T2DM and more personalised medical approaches.

Technical Summary

This study seeks to quantify known and unknown risk factors for developing early-onset T2DM. We will apply a Transformer model to patient electronic health records (EHRs) that have been ordered sequentially. We will use the demographics, diagnoses, medications, hospitalisations, procedures and observations associated with each patient. Transformer models can handle high-dimensional data because of their attention mechanism and large number of trainable parameters.
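
As an illustration of what "ordered sequentially" could mean in practice, the sketch below groups event records by patient and sorts them by date to produce one token sequence per patient. The record layout `(patient_id, event_date, event_type, code)`, the `type:code` token format and the function name `build_sequences` are assumptions made for this example; they are not taken from the protocol or from the CPRD data specification.

```python
from collections import defaultdict
from datetime import date


def build_sequences(events):
    """Group events by patient and order each patient's events by date.

    `events` is an iterable of (patient_id, event_date, event_type, code)
    tuples. This layout is hypothetical and chosen only for illustration.
    """
    by_patient = defaultdict(list)
    for patient_id, event_date, event_type, code in events:
        # Each event becomes a single token such as "diagnosis:T2DM".
        by_patient[patient_id].append((event_date, f"{event_type}:{code}"))
    return {
        patient_id: [token for _, token in sorted(items, key=lambda item: item[0])]
        for patient_id, items in by_patient.items()
    }


# Toy records, invented for the example.
events = [
    ("p1", date(2015, 3, 2), "medication", "metformin"),
    ("p1", date(2014, 11, 20), "diagnosis", "T2DM"),
    ("p2", date(2010, 1, 5), "observation", "BMI"),
]
sequences = build_sequences(events)
```

Sorting on the date alone keeps same-day events in their input order, which is one possible convention; a real pipeline would need an explicit rule for ties.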

To tune its parameters, the model will be pre-trained on a large cohort (Dataset A) from CPRD and then fine-tuned on an enriched cohort (Dataset B) that includes individuals diagnosed with T2DM before the age of 40 (see code list). The entire cohort includes up to 1.6 million acceptable patients who were registered with a GP practice between 01/01/2000 and 31/12/2019.
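
The early-onset definition above (a T2DM diagnosis before the age of 40) can be sketched as a simple labelling rule. This is a minimal example that assumes a full date of birth and a diagnosis date are available for each patient; the function names and the constant are hypothetical, and the real cohort definition depends on the study's code list.

```python
from datetime import date

# "Before the age of 40", as stated in the protocol.
EARLY_ONSET_AGE_LIMIT = 40


def age_on(birth_date, on_date):
    """Return age in completed years on `on_date`."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in that year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_early_onset(birth_date, diagnosis_date):
    """Flag a diagnosis as early-onset if made before the 40th birthday."""
    return age_on(birth_date, diagnosis_date) < EARLY_ONSET_AGE_LIMIT


# Toy dates, invented for the example.
day_before_40th = is_early_onset(date(1975, 6, 15), date(2015, 6, 14))
on_40th_birthday = is_early_onset(date(1975, 6, 15), date(2015, 6, 15))
```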

We will use feature importance to understand which factors are associated with an early-onset T2DM diagnosis and explore these factors across different groups, stratified by sex, ethnic group, age and Index of Multiple Deprivation (IMD). We will measure model performance (precision, recall, F1), stratified by gender, ethnicity, age and IMD, and apply fairness metrics to understand how bias may affect the model. Finally, we will conduct ablation studies to assess the importance of different types of data to the model.
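
The stratified evaluation described above can be sketched as follows: precision, recall and F1 are computed separately for each subgroup, and the results are then compared across groups. The protocol does not name the fairness metrics it will use, so the recall gap shown here (the largest difference in recall between any two groups, sometimes called the equal opportunity difference) is an assumption made for illustration, as are the function names and the toy data.

```python
from collections import defaultdict


def prf(y_true, y_pred):
    """Return precision, recall and F1 for binary labels (1 = early-onset T2DM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def stratified_metrics(y_true, y_pred, groups):
    """Compute the metrics separately for each value in `groups`."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: prf(ts, ps) for g, (ts, ps) in buckets.items()}


def recall_gap(metrics_by_group):
    """Largest difference in recall between groups (assumed fairness metric)."""
    recalls = [m["recall"] for m in metrics_by_group.values()]
    return max(recalls) - min(recalls)


# Toy labels and predictions for two hypothetical groups, A and B.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

by_group = stratified_metrics(y_true, y_pred, groups)
gap = recall_gap(by_group)
```

In this toy data the model finds every true case in group A but only half of those in group B, so the recall gap is 0.5 even though overall performance might look acceptable.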

We will also explore and report on missingness in both datasets, and conduct a sensitivity analysis to understand how the amount of data a patient contributes affects model performance. We will benchmark our model against a logistic regression model as well as other deep learning models.
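
One simple way to report missingness for two datasets is to compute, for each field, the proportion of records in which the value is absent. The sketch below assumes each record is a dictionary and treats a missing key or a `None` value as missing; the field names and toy records are hypothetical and do not reflect the actual structure of Dataset A or Dataset B.

```python
def missingness(records, fields):
    """Return, per field, the proportion of records where it is absent or None."""
    total = len(records)
    return {
        field: (sum(1 for r in records if r.get(field) is None) / total) if total else 0.0
        for field in fields
    }


# Toy records, invented for the example.
dataset_a = [
    {"ethnicity": "X", "imd": 3},
    {"ethnicity": None, "imd": 1},
    {"imd": 5},
    {"ethnicity": "Y", "imd": None},
]
dataset_b = [
    {"ethnicity": "X", "imd": 2},
    {"ethnicity": "Y", "imd": None},
]

fields = ["ethnicity", "imd"]
missing_a = missingness(dataset_a, fields)
missing_b = missingness(dataset_b, fields)
```

Placing the two results side by side shows whether a field such as ethnicity is recorded less completely in one dataset than in the other.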

This research could generate evidence to inform clinical guidelines and the use of personalised guidelines for those with early-onset T2DM. The findings may indirectly benefit patients by strengthening the evidence on risk factors for early-onset T2DM and by informing our understanding of where inequalities arise along the diabetes care pathway.

Health Outcomes to be Measured

Primary outcome:
• Early-onset Type 2 Diabetes

Secondary outcomes:
• Long-term conditions

Collaborators

Michael Barnes - Chief Investigator - Barts and the London Queen Mary's School of Medicine and Dentistry
Elizabeth Remfry - Corresponding Applicant - Queen Mary University of London
Miriam Samuel - Collaborator - Queen Mary University of London
Rafael Henkin - Collaborator - Queen Mary University of London
Rohini Mathur - Collaborator - Queen Mary University of London
Zainab Khalid Awan - Collaborator - Barts and the London Queen Mary's School of Medicine and Dentistry

Former Collaborators

Zainab Khalid Awan - Collaborator - Barts and the London Queen Mary's School of Medicine and Dentistry

Linkages

HES Admitted Patient Care; Patient Level Index of Multiple Deprivation; Practice Level Index of Multiple Deprivation