To be clinically effective and cost-effective, low radiation dose computed tomography (LDCT) screening for lung cancer needs to be offered to people at high risk of the disease. This study will develop new mathematical predictive models, based on primary care data, for selecting people for LDCT screening and will compare these with existing models and recommendations. If better models are developed, identification of the individuals who will benefit most from screening, and of those unlikely to benefit, will improve, increasing the effectiveness and cost-effectiveness of the programmes likely to be approved in the UK and Europe soon. This approach maximises benefit by using electronic primary care datasets available in the UK, but the principles are likely to be transferable to other countries where electronic healthcare data are available. In particular, the value of adding information on symptoms will be clarified and could lead to improved selection models that can be used in other countries. The ultimate aim is to develop a model that can underpin a tool embedded in primary care systems that accurately selects patients eligible for CT screening for lung cancer and provides an opportunity for primary care-based invitation.
The overall aim of this project is to determine whether primary care data can be used to develop improved risk prediction models for selecting individuals for low dose CT screening for lung cancer. Specific research objectives are:
1. To develop a series of mathematical models using data from patients one, two and three years prior to lung cancer diagnosis.
2. To examine differences between models, including variation when lung cancer is stratified by stage (early versus late), pathological subtype and route to diagnosis.
3. To compare predictive performance in the external validation datasets, and eligibility rates, between the new model(s), the PLCOm2012 model and the LLPv2 model over a range of risk thresholds.
4. To assess the cost effectiveness of the models at varying risk thresholds.
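As an illustration of objective 3, the comparison of eligibility rates over a range of risk thresholds can be sketched as follows. This is a minimal sketch rather than the study's analysis code: the function name `eligibility_summary`, the toy risks and outcomes, and the thresholds are all hypothetical and chosen only to show the calculation.

```python
# Sketch only: hypothetical helper and toy data, not the study's analysis code.

def eligibility_summary(risks, outcomes, thresholds):
    """For each risk threshold, return the proportion of people selected
    for screening and the proportion of lung cancer cases among them
    (sensitivity). `risks` are predicted probabilities; `outcomes` are
    1 for a lung cancer diagnosis and 0 otherwise."""
    n = len(risks)
    n_cases = sum(outcomes)
    rows = []
    for t in thresholds:
        selected = [r >= t for r in risks]
        n_eligible = sum(selected)
        n_detected = sum(1 for s, y in zip(selected, outcomes) if s and y == 1)
        rows.append({
            "threshold": t,
            "eligible_rate": n_eligible / n,
            "sensitivity": n_detected / n_cases if n_cases else float("nan"),
        })
    return rows


# Toy example: six people, two of whom develop lung cancer.
toy_risks = [0.005, 0.012, 0.020, 0.034, 0.051, 0.080]
toy_outcomes = [0, 0, 0, 1, 0, 1]
toy_thresholds = [0.015, 0.025, 0.05]  # illustrative values only

summary = eligibility_summary(toy_risks, toy_outcomes, toy_thresholds)
for row in summary:
    print(row)
```

Running the same function on the predicted risks from each model would give one table per model, allowing eligibility rate and sensitivity to be compared threshold by threshold.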
Methods and data analysis:
a. A longitudinal cohort will be defined using CPRD, comprising patients identified between 01/01/2000 and 31/12/2015 who have been contributing to CPRD for at least 12 months after their first registration. All cases with at least one Read code for lung cancer in CPRD who have research-quality data will be included, and all CPRD patients aged 40 years or older with no diagnosis of lung cancer will form the comparator group (Sections K and L).
b. Patient demographic features, all symptoms, diagnoses, investigations and drug prescriptions will be classified and used to produce a risk prediction model.
c. This model will be externally validated, and a cost-effectiveness analysis will be performed using microsimulation modelling.
d. The best performing pre-existing models will be externally validated in the CPRD cohort, namely the Liverpool Lung Project Version 2 (LLPv2) model and the Prostate, Lung, Colorectal and Ovarian cancer (PLCOm2012) model.
Health Outcomes to be Measured:
Discrimination of the models (AUC)
Calibration of models
Decision analysis at pre-specified thresholds
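The three outcome measures listed above can be illustrated with a short sketch. It assumes a particular reading of each measure that the protocol does not specify: discrimination as the concordance form of the AUC, calibration as an observed-to-expected ratio (one of several possible calibration summaries), and decision analysis as net benefit at a given risk threshold. All function names and the toy data are hypothetical.

```python
# Sketch only: hypothetical helpers and toy data. The protocol does not state
# which calibration or decision-analytic measures will be used.

def auc(y_true, y_score):
    """Concordance form of the AUC: the probability that a randomly chosen
    case has a higher predicted risk than a randomly chosen non-case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def observed_expected_ratio(y_true, y_score):
    """Observed number of cases divided by the number predicted by the model.
    A value of 1 indicates agreement on average; above 1, under-prediction."""
    return sum(y_true) / sum(y_score)


def net_benefit(y_true, y_score, threshold):
    """Net benefit of screening everyone with predicted risk >= threshold:
    true positives per person minus false positives per person weighted by
    the odds of the threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy example: eight people, three of whom develop lung cancer.
toy_y = [0, 0, 0, 0, 1, 0, 1, 1]
toy_risk = [0.01, 0.02, 0.03, 0.05, 0.04, 0.10, 0.20, 0.30]

toy_auc = auc(toy_y, toy_risk)
toy_oe = observed_expected_ratio(toy_y, toy_risk)
toy_nb = net_benefit(toy_y, toy_risk, threshold=0.05)
print(toy_auc, toy_oe, toy_nb)
```

Evaluating `net_benefit` over each pre-specified threshold for every model, and comparing it with the net benefit of screening everyone or no one, gives the usual decision curve comparison.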