CPRD Linked Data

Anonymised primary care data can be individually linked to secondary care and other data sets. This linkage enables CPRD to provide a fuller picture of the patient care record to support vital public health research, informing advances in patient safety and delivery of care. CPRD is expanding its healthcare data and research services to increase both the coverage of primary care data and the number of data sets that are linked and routinely made available to the research community.

Data linkage is carried out by NHS Digital, which is the only statutory Trusted Third Party authorised to link identifiable data in England. Linked data can be provided only under appropriate governance conditions. For further information, please contact CPRD Enquiries.

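Linked data sets currently available include:

  • HES Admitted Patient Care data
  • HES Outpatient data
  • HES Accident and Emergency data
  • HES Diagnostic Imaging Dataset
  • Death Registration data
  • Cancer Data
  • Mental Health Dataset (MHDS) data
  • Deprivation data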

Availability of Linked Data

Linkage of CPRD primary care data with other patient level datasets is currently available for English practices that have consented to participate in the linkage scheme and that use the Vision software system. Work is being undertaken to enable linkage for patients from practices using EMIS software. Each GP practice participating in CPRD's collection of its primary care data can choose to revoke its consent for data collection at any point.

CPRD respects all patient requests to opt out. The GP electronic health record (EHR) system allows an individual patient to be selected and flagged as opting out of the CPRD extract. If this option is selected, the patient’s data will not be extracted for CPRD research or for data linkage. CPRD also reviews and respects clinical codes that flag patient objections to their data being used for various purposes, and does not collect the data concerned.

The latest set of linkage data, referred to as set 14, includes patients from 407 practices. These linkages cover approximately 75% of contributing CPRD GOLD practices in England, and roughly 57% of contributing CPRD GOLD practices in the UK.

Access to Linked Data

Access to patient level data is dependent on approval of a study protocol by the MHRA Independent Scientific Advisory Committee (ISAC). All required linked data sources must be requested on the application form. Additionally, researchers who are first-time users of a linked dataset must contact the CPRD Research Team to discuss their requirements before submitting their application. CPRD provides linked data only as part of a data extract linked to CPRD GOLD primary care data.


HES Admitted Patient Care data

HES Admitted Patient Care (HES APC) data contains details of all admissions to, or attendances at, English NHS health care providers. It includes private patients treated in NHS hospitals, patients resident outside England and care delivered by treatment centres (including those in the independent sector) funded by the NHS. All NHS health care providers in England, including acute hospital trusts, primary care trusts and mental health trusts, provide data.

Diagnostic data recorded in HES are coded using the ICD-10 coding frame; procedure information is coded using the Office of Population Censuses and Surveys Classification of Interventions and Procedures (OPCS-4), version 4.6.

Three levels of linked HES APC data are now available in CPRD:

  • Integrated HES APC data includes the details of all diagnoses recorded per hospitalisation and the associated discharge date. Please note that no other dates are included and the primary diagnosis per hospitalisation is not flagged. Access is subject to prior ISAC approval, but data are provided free of charge.
  • Basic HES APC data includes the complete set of hospital episode information (admission and discharge dates, diagnoses (identifying the primary diagnosis), the specialists under whom the patient was seen and the procedures undertaken) for each linked patient with a hospitalisation record. Please note that augmented care data and maternity data are not included as standard. Access is available on a study by study basis and is subject to a charge based on the number of patients included in the dataset. Requests for access are subject to prior ISAC approval.
  • Full HES APC data includes all Basic HES APC data as well as maternity data and augmented care data (intensive and/or high dependency levels of care). These data are available at minimal additional cost and all requests are subject to prior ISAC approval.

The latest release of HES APC data (set 14) covers the period April 1997 to March 2016. For more information, please contact CPRD Enquiries.


HES Outpatient data

HES Outpatient (HES OP) data are a collection of individual records of outpatient appointments occurring in England only. The data include information on the type of outpatient consultation, appointment dates, the main specialty and treatment specialty under which the patient was treated, referral source, waiting times, clinical diagnosis and procedures performed. HES OP data can be used to support health resource utilisation studies, clarify clinical health care pathways and enable variations in the uptake of services to be evaluated, for example by gender and age.

Access to linked HES OP data is subject to prior ISAC approval and to a charge based on the number of patients included in the dataset.

The latest release of HES OP data (set 14) covers the period April 2003 to March 2016. For more information, please contact CPRD Enquiries.


HES Accident and Emergency data

HES Accident and Emergency (HES A&E) data consists of individual records of patient care administered in the accident and emergency setting in England. These data are a subset of national A&E data collected by NHS England to monitor the national standard that 95% of patients attending A&E should wait no longer than 4 hours from arrival to admission, transfer or discharge. A&E data are submitted by A&E providers of all types in England. Data collected include details about patients’ attendance, outcomes of attendance, waiting times, referral source, A&E diagnosis, A&E treatment (drugs prescribed are not recorded), A&E investigations and Health Resource Group. HES A&E data may be used to clarify the health care pathway, to quantify health resource use and costs in the emergency setting, and to assess variations in the uptake of emergency services over time.
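The 4-hour standard mentioned above can be expressed as a simple proportion. The following is a minimal sketch only: it assumes each attendance is available as a pair of arrival and departure timestamps, which is a simplification and not the actual HES A&E record layout, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

FOUR_HOURS = timedelta(hours=4)
STANDARD = 0.95  # national standard quoted in the text: 95% within 4 hours


def share_within_four_hours(attendances):
    """Return the fraction of attendances that waited no longer than 4 hours.

    `attendances` is a list of (arrival, departure) datetime pairs, where
    departure stands for admission, transfer or discharge. This layout is
    an assumption made for illustration.
    """
    waits = [departure - arrival for arrival, departure in attendances]
    if not waits:
        return None  # no attendances, so the proportion is undefined
    within = sum(1 for wait in waits if wait <= FOUR_HOURS)
    return within / len(waits)


def meets_standard(attendances):
    """True when at least 95% of attendances were within 4 hours."""
    share = share_within_four_hours(attendances)
    return share is not None and share >= STANDARD


# Invented example: four attendances, one of which exceeds 4 hours.
arrival = datetime(2015, 1, 5, 9, 0)
EXAMPLE = [
    (arrival, arrival + timedelta(hours=1)),
    (arrival, arrival + timedelta(hours=4)),
    (arrival, arrival + timedelta(hours=4, minutes=1)),
    (arrival, arrival + timedelta(minutes=30)),
]

print(share_within_four_hours(EXAMPLE))  # 0.75
print(meets_standard(EXAMPLE))           # False
```

A wait of exactly 4 hours is counted as meeting the standard here, following the wording "no longer than 4 hours".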

Access to HES A&E data is subject to prior ISAC approval and to a charge based on the number of patients included in the dataset.

The latest release of HES A&E data (set 14) covers the period April 2007 to March 2016. For more information, please contact CPRD Enquiries.


HES Diagnostic Imaging Dataset

The Diagnostic Imaging Dataset (DID) is a collection of detailed information about diagnostic imaging tests, such as x-rays and MRI scans, taken from NHS providers' radiological information systems. The DID includes information on imaging tests carried out from 1 April 2012 on NHS patients in England. It does not include the images that are produced as a result of these tests. The DID captures information about referral source and patient type, details of the test (type of test and body site), plus items about waiting times for each diagnostic imaging event, from time of test request through to time of reporting. The DID enables analysis of demographic and geographic variation in access to different test types and different providers.

The DID data is routinely linked to Hospital Episode Statistics (HES) through NHS Digital. This existing HES DID data set has now been linked to CPRD GOLD, enabling users to analyse patient care pathways.

Access to HES DID data is subject to prior ISAC approval and to a charge based on the number of patients included in the dataset.

The latest release of HES DID data (set 14) covers the period April 2012 to March 2016. For more information, please contact CPRD Enquiries.
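The set 14 coverage periods quoted for the four HES sources can be gathered into a small lookup to check which sources span a planned study period. This is an illustrative sketch only: the function name `sources_covering` is hypothetical, and it assumes each stated period runs from the first day of its start month to the last day of its end month.

```python
from datetime import date

# Coverage periods as stated in the text for linkage set 14.
# Day-level boundaries are an assumption; the text gives months only.
HES_COVERAGE = {
    "HES APC": (date(1997, 4, 1), date(2016, 3, 31)),
    "HES OP": (date(2003, 4, 1), date(2016, 3, 31)),
    "HES A&E": (date(2007, 4, 1), date(2016, 3, 31)),
    "HES DID": (date(2012, 4, 1), date(2016, 3, 31)),
}


def sources_covering(study_start, study_end):
    """Return the HES sources whose coverage spans the whole study period."""
    return [
        name
        for name, (first, last) in HES_COVERAGE.items()
        if first <= study_start and study_end <= last
    ]


# A study running from January 2008 to December 2014 predates the DID.
print(sources_covering(date(2008, 1, 1), date(2014, 12, 31)))
```

For the example study period, the sketch lists HES APC, HES OP and HES A&E but not HES DID, since imaging tests are only included from April 2012.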


Death Registration data

Death Registration data are supplied by the Office for National Statistics (ONS) and include information on the official date and causes of death (using ICD codes).

Access to ONS Death Registration data is subject to prior ISAC approval, but data are provided free of charge.

The latest release of ONS Death Registration data (set 14) covers the period January 1998 to April 2017. For more information, please contact CPRD Enquiries.


Cancer Data

Cancer data are provided by Public Health England (PHE) via the National Cancer Registration and Analysis Service (NCRAS). Linked NCRAS CPRD datasets include cancer registration data, the Cancer Patient Experience Survey (CPES) and Systemic Anti-Cancer Treatment (SACT) data.

Access to Cancer data is subject to prior ISAC approval. There is a fixed cost for receiving linked NCRAS data.

Cancer registration data

The data contains a record for all registrable tumours diagnosed or treated in England of which the NCRAS has been notified. Cancers are coded using the International Classification of Diseases for Oncology, revision 3, 2011. They are also back-mapped to the tenth revision of the International Classification of Diseases (ICD-10).

The latest release of PHE Cancer Registration data (set 14) covers the period January 1990 to December 2015. For more information, please contact CPRD Enquiries.

Cancer Patient Experience Survey (CPES)

The data includes information from patients who have responded to the CPES about their cancer journey, from their initial GP visit prior to diagnosis, through diagnosis and treatment, to the ongoing management of their cancer. Data is available for four waves of the survey conducted from 2010 to 2013. For more information, please contact CPRD Enquiries.

Systemic Anti-Cancer Treatment (SACT) data

The SACT dataset covers chemotherapy treatment for all solid tumour and haematological malignancies, including those in clinical trials. Information is included about the programme and regimen of treatment, and the outcome for each treatment. In the latest linkage release (set 14), SACT data is available for patients with tumours recorded in the cancer registration data from January 2014 to December 2015. Data prior to January 2014 is also available but should be used with caution due to incomplete ascertainment during this period. For more information, please contact CPRD Enquiries.


Mental Health Dataset (MHDS) data

The Mental Health Dataset (MHDS) is a collection of patient records of individuals who accessed secondary care adult mental health services and who are thought to be suffering from a mental illness. The data include information about the type and location of care received, different episodes of care received within a spell of illness and the events that occurred such as recording of Health of the Nation Outcome Scales (HoNOS) scores, Patient Health Questionnaire (PHQ-9) scores or diagnoses.

MHDS data can be used to support research into resource utilisation and provide information about patient access to secondary mental health care services. This can be useful to understand patient pathways and consider associations between primary care and access to and outcomes recorded in secondary mental health care services.

The latest release of MHDS data (set 14) covers the period April 2007 to November 2015. Due to a number of changes in the structure and variables recorded in the MHDS, the data are provided in two formats. Data collected between April 2007 and March 2011 are provided in a first format and data collected between April 2011 and November 2015 are provided in a second, slightly different, format.
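Because the two MHDS formats are split by collection date, an analysis may need to decide which format applies to a given record. The sketch below is illustrative only: the function name `mhds_format` and the labels 1 and 2 are hypothetical, and the day-level boundaries are an assumption, since the text gives months only.

```python
from datetime import date

# Collection periods for the two formats, as stated in the text for set 14.
FIRST_FORMAT = (date(2007, 4, 1), date(2011, 3, 31))
SECOND_FORMAT = (date(2011, 4, 1), date(2015, 11, 30))


def mhds_format(collected_on):
    """Return 1 or 2 for the format covering this date, or None if outside set 14."""
    if FIRST_FORMAT[0] <= collected_on <= FIRST_FORMAT[1]:
        return 1
    if SECOND_FORMAT[0] <= collected_on <= SECOND_FORMAT[1]:
        return 2
    return None


print(mhds_format(date(2010, 6, 15)))  # 1
print(mhds_format(date(2011, 4, 1)))   # 2
print(mhds_format(date(2016, 1, 1)))   # None
```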

Access to linked MHDS data is subject to the prior approval of the Independent Scientific Advisory Committee (ISAC) and may be subject to charge.


Deprivation data

The Indices of Multiple Deprivation and Townsend Score are area-based measures of relative deprivation that are available for linkage to CPRD GOLD data through the patient and/or practice postcode. These measures can be used as a proxy for socio-demographic and socio-economic data, which are generally poorly recorded in the primary care data as they do not directly relate to a patient's care. Data is provided as quintiles or deciles of the deprivation score or rank to prevent disclosure of patient or practice area. The postcode of the practice or patient residence is mapped to Lower Layer Super Output Area (LSOA) using a postcode lookup file.
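As a rough illustration of the mapping described above, the sketch below goes from postcode to LSOA to a deprivation score and then releases only the quintile. Every postcode, LSOA code and score is invented, the function names are hypothetical, and both the cut-point rule and the numbering direction (group 1 holds the lowest scores) are assumptions rather than CPRD's actual method.

```python
from bisect import bisect_right

# Hypothetical lookup tables: all values are invented for illustration
# and do not come from any real postcode lookup file or deprivation index.
POSTCODE_TO_LSOA = {
    "ZZ1 1AA": "LSOA_A",
    "ZZ1 1AB": "LSOA_A",
    "ZZ2 2AA": "LSOA_B",
    "ZZ3 3AA": "LSOA_C",
    "ZZ4 4AA": "LSOA_D",
    "ZZ5 5AA": "LSOA_E",
}
LSOA_SCORE = {
    "LSOA_A": 8.2,
    "LSOA_B": 15.1,
    "LSOA_C": 23.7,
    "LSOA_D": 34.0,
    "LSOA_E": 51.9,
}


def cut_points(scores, n_groups):
    """Return the n_groups - 1 score thresholds that split areas into equal groups."""
    ordered = sorted(scores)
    return [ordered[(len(ordered) * k) // n_groups] for k in range(1, n_groups)]


def deprivation_group(postcode, n_groups=5):
    """Map a postcode to its LSOA-level group (1..n_groups), or None if unmapped.

    Only the group is returned, never the LSOA or the raw score, mirroring
    the disclosure control described in the text.
    """
    lsoa = POSTCODE_TO_LSOA.get(postcode)
    if lsoa is None:
        return None
    thresholds = cut_points(LSOA_SCORE.values(), n_groups)
    return bisect_right(thresholds, LSOA_SCORE[lsoa]) + 1


print(deprivation_group("ZZ1 1AB"))  # 1
print(deprivation_group("ZZ5 5AA"))  # 5
```

Passing `n_groups=10` would give deciles instead, although the five invented areas here are too few for that to be meaningful.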

Patient level measures are available for patients in English practices that have consented to participate in the linkage scheme. The latest available patient postcode of residence is mapped to an LSOA boundary. The LSOA of residence then allows linkage to the following LSOA-level deprivation measures:

  • 2004 English Index of Multiple Deprivation
  • 2007 English Index of Multiple Deprivation
  • 2010 English Index of Multiple Deprivation
  • 2015 English Index of Multiple Deprivation
  • Townsend score: calculated using unadjusted 2001 census data

Please note: in order to prevent the possibility of deductive disclosure of a patient's area of residence, researchers will only be provided with one of the five available linked datasets for any one study. Access is provided by CPRD subject to ISAC approval.

The general practice-level linkage is available for all practices in CPRD and uses the general practice postcode, which is linked via LSOA, or datazone (DZ) in Scotland. The data is updated monthly. As standard, the most recent national Indices of Deprivation are provided for each country:

  • 2015 English Index of Multiple Deprivation
  • 2010 Northern Ireland Multiple Deprivation Measure
  • 2012 Scottish Index of Multiple Deprivation
  • 2014 Welsh Index of Multiple Deprivation

It is important to note that indices are not comparable between countries in the UK. Older versions of the deprivation scores can be provided on request. Access is provided by CPRD subject to ISAC approval. For more information, please contact CPRD Enquiries.