CPRD linked data

Anonymised primary care patient data can be individually linked to secondary care and other health and area-based datasets. This linkage enables CPRD to provide a fuller picture of the patient care record to support vital public health research, informing advances in patient safety and delivery of care. CPRD is expanding its healthcare data and research services to increase both the coverage of primary care data and the number of datasets that are linked and routinely made available to the research community.

Data linkage in England is carried out by the Trusted Third Party NHS Digital. For further information, please contact CPRD Enquiries at enquiries@cprd.com.

Linked data sets currently available include:

Availability of linked data 

Linkage of CPRD primary care data with other patient level datasets is available for English practices that have consented to participate in the linkage scheme. Each GP practice contributing primary care data to CPRD can choose to revoke its consent for data collection at any point.

CPRD respects all patient requests to opt out. Options available in the GP EHR system allow an individual patient to be selected and flagged as opting out of the CPRD extract. If this option is selected, the patient’s data will not be extracted for CPRD research or for data linkage. CPRD also reviews and respects clinical codes that flag patient objections to their data being used for various purposes by not collecting this data.

The latest set of linkage data, referred to as set 16, is available for both CPRD GOLD, based on the Vision software system, and CPRD Aurum, based on EMIS software.

CPRD GOLD linkage data include patients from 411 practices. These linkages cover approximately 75% of the CPRD GOLD practices in England that contributed to the June 2018 build, and roughly 56% of contributing CPRD GOLD practices in the UK. A total of 10,553,586 patients are eligible for linkage.

CPRD Aurum linkage data include patients from 232 practices. These linkages cover approximately 43% of CPRD Aurum practices available in the June 2018 build, all of which are in England. Additional CPRD Aurum practices will be added to the linkage scheme with each set release once onboarded by CPRD. A total of 6,566,869 patients are currently eligible for linkage.
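As a rough cross-check, the practice counts and percentages quoted above imply approximate totals of contributing practices. The sketch below is illustrative only: the function name is hypothetical, and because the percentages in the text are rounded ("approximately", "roughly"), the results are estimates rather than published figures.

```python
def implied_total(linked_practices: int, share: float) -> int:
    """Estimate the total number of practices implied by a linked count
    and the (rounded) share of practices that count represents."""
    return round(linked_practices / share)

# Figures taken from the text; the totals are estimates, not CPRD statistics.
gold_england_total = implied_total(411, 0.75)  # CPRD GOLD practices in England
gold_uk_total = implied_total(411, 0.56)       # CPRD GOLD practices in the UK
aurum_total = implied_total(232, 0.43)         # CPRD Aurum practices

print(gold_england_total, gold_uk_total, aurum_total)
```

A one-point rounding error in a quoted percentage shifts each estimate by several practices, so these totals should be read as orders of magnitude only.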

Access to linked data 

Access to patient level data is dependent on approval of a study protocol by the Independent Scientific Advisory Committee (ISAC). All required linked data sources must be requested on the application form. Additionally, researchers who are first-time users of a linked dataset must contact the CPRD Observational Research Team to discuss their requirements before submitting their application. Linked data are provided by CPRD only as part of a data extract linked to CPRD primary care data.


HES Admitted Patient Care data

HES Admitted Patient Care (HES APC) data contains details of all admissions to, or attendances at, English NHS healthcare providers. It includes private patients treated in NHS hospitals, patients resident outside of England and care delivered by treatment centres (including those in the independent sector) funded by the NHS. All NHS healthcare providers in England, including acute hospital trusts, primary care trusts and mental health trusts, provide data.

Diagnostic data recorded in HES are coded using the International Classification of Diseases version 10 (ICD-10) coding frame; procedure information is coded using the UK Office of Population Censuses and Surveys (OPCS) classification, version 4.6.

Three levels of linked HES APC data are available in CPRD:

The latest release of HES APC data (set 16) covers the period April 1997 to December 2017. 

HES Outpatient data

HES Outpatient (HES OP) data are a collection of individual records of outpatient appointments occurring in England only. The data includes information on the type of outpatient consultation, appointment dates, the main specialty and treatment specialty under which the patient was treated, referral source, waiting times, clinical diagnosis and procedures performed. HES OP data can be used to support health resource utilisation studies, clarify clinical health care pathways and enable variations in the uptake of services to be evaluated, for example by gender and age.

Access to linked HES OP data is subject to prior ISAC approval.

The latest release of HES OP data (set 16) covers the period April 2003 to December 2017. 

HES Accident and Emergency data

HES Accident and Emergency (HES A&E) data consists of individual records of patient care administered in the accident and emergency setting in England. These data are a subset of national A&E data collected by NHS England to monitor the national standard that 95% of patients attending A&E should wait no longer than 4 hours from arrival to admission, transfer or discharge. A&E data are submitted by A&E providers of all types in England. Data collected includes details about patients’ attendance, outcomes of attendance, waiting times, referral source, A&E diagnosis, A&E treatment (drugs prescribed are not recorded), A&E investigations and Health Resource Group. HES A&E may be used to clarify the health care pathway, to quantify health resource use and costs in the emergency setting, and to assess variations in the uptake of emergency services over time.

Access to HES A&E data is subject to prior ISAC approval.

The latest release of HES A&E data (set 16) covers the period April 2007 to December 2017. 

HES Diagnostic Imaging Dataset

The Diagnostic Imaging Dataset (DID) is a collection of detailed information about diagnostic imaging tests, such as x-rays and MRI scans, taken from NHS providers' radiological information systems. The DID includes information on imaging tests carried out from 1 April 2012 on NHS patients in England. It does not include the images that are produced as a result of these tests. The DID captures information about referral source and patient type, details of the test (type of test and body site), plus items about waiting times for each diagnostic imaging event, from time of test request through to time of reporting. The DID enables analysis of demographic and geographic variation in access to different test types and different providers.

The DID is routinely linked to Hospital Episode Statistics (HES) through NHS Digital. This existing HES DID dataset has now been linked to CPRD primary care data, enabling users to analyse patient care pathways. Access to HES DID data is subject to prior ISAC approval.

The latest release of HES DID data (set 16) covers the period April 2012 to October 2017.  

Death Registration data

Death Registration data contains data from the Office for National Statistics (ONS) and includes information on the official date and causes of death (using ICD codes).

Access to ONS Death Registration data is subject to prior ISAC approval.

The latest release of ONS Death Registration Data (set 16) covers the period January 1998 to February 2018. 
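When planning a study, it can help to check which linked datasets cover the whole study period. The sketch below collects the set 16 coverage periods quoted above for the HES and ONS datasets into a lookup table. The names `SET16_COVERAGE`, `covered` and `datasets_covering` are illustrative, and because the text gives months only, the code assumes each period runs from the first day of the start month to the last day of the end month.

```python
import calendar
from datetime import date

def month_start(year: int, month: int) -> date:
    return date(year, month, 1)

def month_end(year: int, month: int) -> date:
    # monthrange returns (weekday of first day, number of days in month)
    return date(year, month, calendar.monthrange(year, month)[1])

# Set 16 coverage periods as quoted in the text (month precision only;
# the exact first and last days are an assumption).
SET16_COVERAGE = {
    "HES APC": (month_start(1997, 4), month_end(2017, 12)),
    "HES OP": (month_start(2003, 4), month_end(2017, 12)),
    "HES A&E": (month_start(2007, 4), month_end(2017, 12)),
    "HES DID": (month_start(2012, 4), month_end(2017, 10)),
    "ONS Death Registration": (month_start(1998, 1), month_end(2018, 2)),
}

def covered(dataset: str, start: date, end: date) -> bool:
    """True if the whole study period lies inside the dataset's coverage."""
    first, last = SET16_COVERAGE[dataset]
    return first <= start and end <= last

def datasets_covering(start: date, end: date) -> list:
    """Names of all datasets whose coverage contains the study period."""
    return [name for name in SET16_COVERAGE if covered(name, start, end)]

# Example: a study running from January 2008 to December 2016.
print(datasets_covering(date(2008, 1, 1), date(2016, 12, 31)))
```

In this example HES DID is excluded because its coverage only starts in April 2012.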

Cancer data

Cancer data contain data provided by Public Health England (PHE) via the National Cancer Registration and Analysis Service (NCRAS). Linked NCRAS CPRD datasets include Cancer Registration data, the Systemic Anti-Cancer Treatment (SACT) Dataset, the National Radiotherapy Dataset (RTDS) and the Cancer Patient Experience Survey (CPES).

Access to cancer data is subject to prior ISAC approval. 

Cancer registration data

The data contains a record for each registrable tumour diagnosed or treated in England, of which the NCRAS has been notified. Cancers are coded using the International Classification of Diseases for Oncology, revision 3, 2011. They are also back-mapped to the tenth revision of the International Classification of Diseases (ICD-10).

The latest release of PHE cancer registration data (set 16) covers the period January 1990 to December 2015.

Systemic Anti-Cancer Treatment (SACT) data

The SACT dataset covers chemotherapy treatment for all solid tumour and haematological malignancies, including those in clinical trials. Information is included about the programme and regimen of treatment, and the outcome for each treatment. In the latest linkage release (set 16) SACT data is available for patients with tumours recorded in the cancer registration data from January 2014 to December 2015. Data prior to January 2014 is also available but should be used with caution due to incomplete ascertainment during this period.

National Radiotherapy Dataset (RTDS)

The RTDS dataset contains records of radiotherapy services provided since April 2009, including teletherapy and brachytherapy. All radiotherapy delivered in England to patients in NHS facilities, or in private facilities where delivery was funded by the NHS, is included. Brachytherapy delivered for the treatment of non-malignant disease, radiotherapy delivered using unsealed sources, and non-therapeutic exposures delivered using radiotherapy machines (e.g. imaging) are not included. In the latest linkage release (set 16) RTDS data is available for patients with tumours recorded in the cancer registration data from April 2009 to December 2015.

Cancer Patient Experience Survey (CPES)

The data include information from patients who have responded to the CPES about their cancer journey from their initial GP visit prior to diagnosis, through diagnosis and treatment and to the ongoing management of their cancer. Data is available for four waves of the survey conducted from 2010 to 2013. 

Mental Health Dataset (MHDS)

The Mental Health Dataset (MHDS) is a collection of patient records of individuals who accessed secondary care adult mental health services and who are thought to be suffering from a mental illness. The data include information about the type and location of care received, different episodes of care received within a spell of illness and the events that occurred such as recording of Health of the Nation Outcome Scales (HoNOS) scores, Patient Health Questionnaire (PHQ-9) scores or diagnoses. MHDS data can be used to support research into resource utilisation and provide information about patient access to secondary mental health care services. This can be useful to understand patient pathways and consider associations between primary care and access to and outcomes recorded in secondary mental health care services.

Access to linked MHDS data is subject to the prior approval of the Independent Scientific Advisory Committee (ISAC).

The latest release of MHDS data (set 16) covers the period April 2007 to November 2015. Due to a number of changes in the structure and variables recorded in the MHDS the data are provided in two formats. Data collected between April 2007 and March 2011 are provided in a first format and data collected between April 2011 and November 2015 are provided in a second, slightly different, format. 
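Because the two MHDS formats are split by collection date, analysis code usually needs to decide which format applies to a given record. A minimal sketch follows; the function name and the labels 1 and 2 are illustrative, and the exact day boundaries (first and last day of the quoted months) are an assumption, since the text gives months only.

```python
from datetime import date
from typing import Optional

def mhds_format(collection_date: date) -> Optional[int]:
    """Return 1 or 2 for the MHDS format covering the date, or None if
    the date falls outside the set 16 coverage period."""
    if date(2007, 4, 1) <= collection_date <= date(2011, 3, 31):
        return 1  # first format: April 2007 to March 2011
    if date(2011, 4, 1) <= collection_date <= date(2015, 11, 30):
        return 2  # second format: April 2011 to November 2015
    return None

print(mhds_format(date(2010, 6, 15)))
print(mhds_format(date(2013, 1, 1)))
```

Returning `None` for out-of-range dates, rather than raising an error, lets a caller filter records before choosing a parser for each format.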

Small area level data

Classifications based on the population characteristics of small areas or neighbourhoods (and the individuals who live there) are available for linkage to CPRD primary care data. CPRD has linked GP practice postcodes and eligible patient residence postcodes for both CPRD GOLD and CPRD Aurum to some of the most commonly requested area level data. This includes several measures of area level deprivation and a rural-urban classification. These measures can be used as a proxy for socio-demographic and socio-economic data which are generally poorly recorded in the primary care data given they do not directly relate to a patient's care.

For each measure, the postcode of the practice or patient residence is mapped to a lower layer Super Output Area (LSOA) in England and Wales, a Super Output Area (SOA) in Northern Ireland or a datazone (DZ) in Scotland using a postcode lookup file.
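Conceptually, this is a two-step lookup: postcode to small area, then small area to the area level measure. The sketch below illustrates the idea only and is not CPRD's linkage process. The postcodes, area codes and quintile values are invented placeholders, and the names `POSTCODE_TO_AREA`, `AREA_TO_QUINTILE` and `linked_quintile` are hypothetical.

```python
from typing import Optional

# Hypothetical postcode lookup file: normalised postcode -> small area code.
POSTCODE_TO_AREA = {
    "ZZ11AA": "AREA_A",
    "ZZ12BB": "AREA_B",
}

# Hypothetical area level measure: small area code -> deprivation quintile.
AREA_TO_QUINTILE = {
    "AREA_A": 1,
    "AREA_B": 4,
}

def normalise(postcode: str) -> str:
    """Remove all whitespace and upper-case the postcode before lookup."""
    return "".join(postcode.split()).upper()

def linked_quintile(postcode: str) -> Optional[int]:
    """Map a postcode to its small area, then to the area level measure.
    Returns None when either lookup step has no match."""
    area = POSTCODE_TO_AREA.get(normalise(postcode))
    if area is None:
        return None
    return AREA_TO_QUINTILE.get(area)

print(linked_quintile("zz1 1aa"))
print(linked_quintile("ZZ9 9ZZ"))
```

Only the final area level category leaves the lookup; the postcode and area code themselves are intermediate values.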

Patient postcode linked measures

Patient postcode linked measures are available for patients in English practices that have consented to participate in the linkage scheme. The latest available patient postcode of residence is mapped to an LSOA boundary. The LSOA of residence then allows linkage to the following LSOA-level deprivation measures:

  • 2004 English Index of Multiple Deprivation
  • 2007 English Index of Multiple Deprivation
  • 2010 English Index of Multiple Deprivation
  • 2015 English Index of Multiple Deprivation
  • Townsend Deprivation Index: calculated using unadjusted 2001 census data

Data is provided as quintiles, deciles or twentiles of the deprivation score to prevent disclosure of patient location. To prevent the possibility of deductive disclosure of a patient’s area of residence, researchers will additionally be provided with only one of the five available linked datasets for any one study. Access is provided by CPRD subject to ISAC approval.
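Quintiles, deciles and twentiles divide the same ranking into 5, 10 and 20 equal-sized groups, so a finer banding can be collapsed into a coarser one. The sketch below assumes that all three bandings are derived from the same ranking with equal-sized groups and that group 1 denotes the same end of the scale in each; the text does not state how CPRD derives the bands, and the function name is illustrative.

```python
def collapse_twentile(twentile: int, groups: int) -> int:
    """Collapse a twentile (1 to 20) into a coarser banding with `groups`
    equal-sized groups, for example 10 for deciles or 5 for quintiles."""
    if not 1 <= twentile <= 20:
        raise ValueError("twentile must be between 1 and 20")
    if groups < 1 or 20 % groups != 0:
        raise ValueError("groups must divide 20 evenly")
    band_width = 20 // groups  # number of twentiles per coarser group
    return (twentile - 1) // band_width + 1

# Twentile 7 falls in decile 4 and quintile 2.
print(collapse_twentile(7, 10), collapse_twentile(7, 5))
```

The reverse mapping is not possible: a quintile cannot be split back into deciles or twentiles, which is why the banding needed for a study should be chosen before data are requested.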

Practice postcode linked measures

The general practice postcode linkages are available for all practices in CPRD GOLD and CPRD Aurum and use the general practice postcode which is linked via LSOA, SOA in Northern Ireland and datazone (DZ) in Scotland. The general practice postcode linkage includes several measures of area level deprivation and a rural-urban classification. The data are updated monthly.

Measures of deprivation

There are several well-known area-based measures of deprivation, of which the following are available for linkage to CPRD primary care data through the practice postcode:

  • 2015 English Index of Multiple Deprivation (composite and individual domains)
  • 2016 Scottish Index of Multiple Deprivation (composite and individual domains)
  • 2017 Northern Ireland Index of Multiple Deprivation (composite and individual domains)
  • 2014 Welsh Index of Multiple Deprivation (composite and individual domains)
  • Carstairs Index: England, Wales and Scotland calculated using 2011 census data

As standard, the most recent national Indices of Deprivation are provided for each country. It is important to note that the IMD indices are not comparable between countries in the UK. Older versions of the deprivation scores can be provided on request.

Rural-Urban classification

It may be important to distinguish between rural and urban areas when investigating differences in social and economic characteristics of small areas. Populations can vary in their composition between urban and rural areas, as can access to services, employment and educational opportunities, and quality of life. The measures available are:

  • 2011 England and Wales Rural-Urban classification
  • 2015 Northern Ireland Rural-Urban classification
  • 2016 Scottish Rural-Urban classification

Access is provided by CPRD subject to ISAC approval.

For more information about data linkage and prices, please contact CPRD Enquiries at enquiries@cprd.com.

[Page last reviewed 11 December 2018]