CPRD Linked Data
Anonymised primary care data can be individually linked to secondary care and other data sets. This linkage enables CPRD to provide a fuller picture of the patient care record to support vital public health research, informing advances in patient safety and delivery of care. CPRD is expanding its healthcare data and research services to increase both the coverage of primary care data and the number of data sets that are linked and made routinely available to the research community.
Data linkage is carried out by NHS Digital, which is the only statutory Trusted Third Party authorised to link identifiable data in England. Provision of linkage data is only possible under appropriate governance conditions. For further information, please contact the CPRD Knowledge Centre.
Linked data sets currently available include:
- Hospital Episode Statistics (HES) Admitted Patient Care (HES APC) data
- HES Outpatient (HES OP) data
- HES Accident and Emergency (HES A&E) data
- HES Diagnostic Imaging Dataset (HES DID)
- Death Registration data from the Office for National Statistics (ONS)
- Cancer Registration data from Public Health England (PHE)
- Deprivation data: Townsend Scores/Index of Multiple Deprivation (IMD)
Availability of Linked Data
Linkage of CPRD primary care data with other patient level datasets is currently available for English practices that have consented to participate in the linkage scheme and that use the Vision software system. Work is being undertaken to enable linkage for patients from practices using EMIS software. Each individual GP practice participating in CPRD's collection of their primary care data can choose to revoke their consent for data collection at any point.
CPRD respects all patient requests to opt out. Options available in the GP EHR system allow an individual patient to be selected and flagged as opting out of the CPRD extract. When this option is selected, the patient’s data will not be extracted for CPRD research or for data linkage. CPRD also reviews and respects clinical codes that flag patient objections to their data being used for various purposes, by not collecting these data.
The latest set of linkage data, referred to as set 13, includes patients from 405 practices. These linkages cover approximately 75% of contributing CPRD GOLD practices in England, and roughly 57% of contributing CPRD GOLD practices in the UK.
Access to Linked Data
Access to patient level data is dependent on approval of a study protocol by the MHRA Independent Scientific Advisory Committee (ISAC). All required linked data sources must be requested on the application form. Additionally, researchers who are first-time users of a linked dataset must contact the CPRD Research Team to discuss their requirements before submitting their application. Linked data are only provided by CPRD as part of a data extract linked to CPRD GOLD primary care data.
HES Admitted Patient Care (HES APC) data contains details of all admissions to, or attendances at, English NHS health care providers. It includes private patients treated in NHS hospitals, patients resident outside of England and care delivered by treatment centres (including those in the independent sector) funded by the NHS. All NHS health care providers in England, including acute hospital trusts, primary care trusts and mental health trusts, provide data.
Diagnostic data recorded in HES are coded using the ICD-10 coding frame; procedure information is coded using the UK Office of Population Censuses and Surveys (OPCS) classification, version 4.6.
Three levels of linked HES APC data are now available in CPRD:
- Integrated HES APC data includes the details of all diagnoses recorded per hospitalisation and the associated discharge date. Please note that no other dates are included and the primary diagnosis per hospitalisation is not flagged. Access is subject to prior ISAC approval, but data are provided free of charge.
- Basic HES APC data includes the complete set of hospital episode information (admission and discharge dates, diagnoses (identifying primary diagnosis), specialists seen under and procedures undertaken) for each linked patient with a hospitalisation record. Please note that augmented care data and maternity data are not included as standard. Access is available on a study by study basis and is subject to a charge based on the number of patients included in the dataset. Requests for access are subject to prior ISAC approval.
- Full HES APC data includes all Basic HES APC data as well as Maternity, and Augmented care data (intensive and/or high dependency levels of care). These data are available at minimal additional cost and all requests are subject to prior ISAC approval.
The latest release of HES APC data (set 13) covers the period April 1997 to February 2016. For more information, please contact the CPRD Knowledge Centre.
HES Outpatient (HES OP) data are a collection of individual records of outpatient appointments occurring in England only. The data include information on the type of outpatient consultation, appointment dates, the main specialty and treatment specialty under which the patient was treated, referral source, waiting times, clinical diagnosis and procedures performed. HES OP data can be used to support health resource utilisation studies, clarify clinical health care pathways and enable variations in the uptake of services to be evaluated, for example by gender and age.
Access to linked HES OP data is subject to prior ISAC approval and is subject to a charge based on the number of patients included in the dataset.
The latest release of HES OP data (set 13) covers the period April 2003 to February 2016. For more information, please contact the CPRD Knowledge Centre.
HES Accident and Emergency (HES A&E) data consists of individual records of patient care administered in the accident and emergency setting in England. These data are a subset of national A&E data collected by NHS England to monitor the national standard that 95% of patients attending A&E should wait no longer than 4 hours from arrival to admission, transfer or discharge. A&E data are submitted by A&E providers of all types in England. Data collected include details about patients’ attendance, outcomes of attendance, waiting times, referral source, A&E diagnosis, A&E treatment (drugs prescribed are not recorded), A&E investigations and Health Resource Group. HES A&E data may be used to clarify the health care pathway, to quantify health resource use and costs in the emergency setting, and to assess variations in the uptake of emergency services over time.
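The 4-hour standard described above is a simple proportion, and a minimal sketch of how it could be checked is shown here. The function names and the input shape (pairs of arrival and departure timestamps) are illustrative assumptions; they are not the field names or layout of the HES A&E dataset.

```python
from datetime import datetime, timedelta

FOUR_HOURS = timedelta(hours=4)
TARGET = 0.95  # national standard quoted in the text


def proportion_within_four_hours(attendances):
    """Share of attendances lasting no longer than 4 hours.

    `attendances` is an iterable of (arrival, departure) datetime pairs,
    where departure means admission, transfer or discharge.
    This input shape is hypothetical, not the HES A&E schema.
    """
    attendances = list(attendances)
    if not attendances:
        raise ValueError("no attendances supplied")
    within = sum(
        1 for arrival, departure in attendances
        if departure - arrival <= FOUR_HOURS
    )
    return within / len(attendances)


def meets_standard(attendances):
    """True when at least 95% of attendances are within 4 hours."""
    return proportion_within_four_hours(attendances) >= TARGET
```

Given 20 attendances of which 19 last 4 hours or less, `meets_standard` returns `True`; with only 18 of 20 it returns `False`.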
Access to HES A&E data is subject to prior ISAC approval and is subject to a charge based on the number of patients included in the dataset.
The latest release of HES A&E data (set 13) covers the period April 2007 to February 2016. For more information, please contact the CPRD Knowledge Centre.
The Diagnostic Imaging Dataset (DID) is a collection of detailed information about diagnostic imaging tests, such as x-rays and MRI scans, taken from NHS providers' radiological information systems. The DID includes information on imaging tests carried out from 1 April 2012 on NHS patients in England. It does not include the images that are produced as a result of these tests. The DID captures information about referral source and patient type, details of the test (type of test and body site), plus items about waiting times for each diagnostic imaging event, from time of test request through to time of reporting. The DID enables analysis of demographic and geographic variation in access to different test types and different providers.
The DID data is routinely linked to Hospital Episode Statistics (HES) through NHS Digital. This existing HES DID data set has now been linked to CPRD GOLD enabling users to analyse patient care pathways.
Access to HES DID data is subject to prior ISAC approval and is subject to a charge based on the number of patients included in the dataset.
The latest release of HES DID data (set 13) covers the period April 2012 to February 2016. For more information, please contact the CPRD Knowledge Centre.
Death Registration data contains data from the Office for National Statistics (ONS) and includes information on the official date and causes of death (using ICD codes).
Access to ONS Death Registration data is subject to prior ISAC approval, but data are provided free of charge.
The latest release of ONS Death Registration data (set 13) covers the period January 1998 to March 2016. For more information, please contact the CPRD Knowledge Centre.
Cancer Registration data are provided by Public Health England (PHE) via the National Cancer Registration and Analysis Service (NCRAS). The data contain a record for every registrable tumour diagnosed or treated in England of which NCRAS has been notified. Cancers are coded using the International Classification of Diseases for Oncology, revision 3, 2011, and are also back-mapped to the tenth revision of the International Classification of Diseases (ICD-10).
Access to Cancer Registration data is subject to prior ISAC approval. There is a fixed cost for receiving linked Cancer Registry data.
The latest release of PHE Cancer Registration data (set 13) covers the period January 1990 to December 2014. For more information, please contact the CPRD Knowledge Centre.
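Because each linked source covers a different period, a study that combines several sources is limited to the window they all share. The sketch below illustrates that calculation using the set 13 periods quoted above. The text gives months only, so treating each period as running from the first day of the start month to the last day of the end month is an assumption, and `common_window` is a hypothetical helper, not a CPRD tool.

```python
from datetime import date

# Set 13 coverage periods as stated in the text. Day-level boundaries
# (first/last day of the month) are an assumption.
COVERAGE = {
    "HES APC": (date(1997, 4, 1), date(2016, 2, 29)),
    "HES OP": (date(2003, 4, 1), date(2016, 2, 29)),
    "HES A&E": (date(2007, 4, 1), date(2016, 2, 29)),
    "HES DID": (date(2012, 4, 1), date(2016, 2, 29)),
    "ONS Death Registration": (date(1998, 1, 1), date(2016, 3, 31)),
    "PHE Cancer Registration": (date(1990, 1, 1), date(2014, 12, 31)),
}


def common_window(sources):
    """Return the (start, end) period covered by every requested source.

    Returns None when the requested sources do not overlap at all.
    """
    sources = list(sources)
    if not sources:
        raise ValueError("at least one source is required")
    start = max(COVERAGE[name][0] for name in sources)
    end = min(COVERAGE[name][1] for name in sources)
    return (start, end) if start <= end else None


if __name__ == "__main__":
    print(common_window(["HES APC", "HES A&E", "PHE Cancer Registration"]))
```

For example, a study requesting HES APC, HES A&E and Cancer Registration data would be restricted to April 2007 through December 2014 under these assumptions.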
The Indices of Multiple Deprivation and Townsend Score are area-based measures of relative deprivation that are available for linkage to CPRD GOLD data through the patient and/or practice postcode. These measures can be used as a proxy for socio-demographic and socio-economic data, which are generally poorly recorded in primary care data because they do not directly relate to a patient's care. Data are provided as quintiles or deciles of the deprivation score or rank to prevent disclosure of the patient or practice area. The postcode of the practice or patient residence is mapped to a Lower Layer Super Output Area (LSOA) using a postcode lookup file.
Patient level measures are available for patients in English practices that have consented to participate in the linkage scheme. The latest available patient postcode of residence is mapped to an LSOA boundary. The LSOA of residence then allows linkage to the following LSOA-level deprivation measures:
- 2004 English Index of Multiple Deprivation
- 2007 English Index of Multiple Deprivation
- 2010 English Index of Multiple Deprivation
- 2015 English Index of Multiple Deprivation
- Townsend score: calculated using unadjusted 2001 census data
Please note: in order to prevent the possibility of deductive disclosure of a patient’s area of residence, researchers will only be provided with one of the five available linked datasets for any one study. Access is provided by CPRD subject to ISAC approval.
The general practice-level linkage is available for all practices in CPRD and uses the general practice postcode, which is linked via LSOA, or datazone (DZ) in Scotland. The data is updated monthly. As standard, the most recent national Indices of Deprivation are provided for each country:
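The patient-level linkage described above (postcode to LSOA, LSOA to deprivation rank, rank to quintile) can be sketched as follows. This is an illustration only: the lookup dictionaries, postcodes and area codes are made up, the orientation of the quintiles (quintile 1 holding the lowest ranks) is an assumption, and the code does not represent the process actually used for CPRD linkage.

```python
def rank_to_quintile(rank, n_areas):
    """Map a 1-based deprivation rank among n_areas areas to a quintile (1-5).

    Quintile 1 holds the lowest ranks; whether that means most or least
    deprived depends on the index and is not assumed here.
    """
    if not 1 <= rank <= n_areas:
        raise ValueError("rank out of range")
    return (rank - 1) * 5 // n_areas + 1


def patient_quintile(postcode, postcode_to_lsoa, lsoa_to_rank):
    """Return the deprivation quintile for a postcode, or None if unmatched.

    Both lookups are hypothetical dictionaries standing in for a postcode
    lookup file and an LSOA-level deprivation file that lists every area.
    """
    key = postcode.replace(" ", "").upper()
    lsoa = postcode_to_lsoa.get(key)
    if lsoa is None:
        return None
    rank = lsoa_to_rank.get(lsoa)
    if rank is None:
        return None
    return rank_to_quintile(rank, len(lsoa_to_rank))


if __name__ == "__main__":
    # Made-up lookups: ten areas ranked 1 to 10.
    lsoa_to_rank = {f"AREA{i:02d}": i for i in range(1, 11)}
    postcode_to_lsoa = {"ZZ11AA": "AREA03", "ZZ99ZZ": "AREA10"}
    print(patient_quintile("zz1 1aa", postcode_to_lsoa, lsoa_to_rank))
```

Only the quintile leaves the function; the postcode and LSOA stay inside the lookup step, which mirrors the disclosure-control aim stated in the text.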
- 2015 English Index of Multiple Deprivation
- 2010 Northern Ireland Multiple Deprivation Measure
- 2012 Scottish Index of Multiple Deprivation
- 2014 Welsh Index of Multiple Deprivation
It is important to note that indices are not comparable between countries in the UK. Older versions of the deprivation scores can be provided on request. Access is provided by CPRD subject to ISAC approval. For more information, please contact the CPRD Knowledge Centre.