Access to linked data
Access to linked data is dependent on either an approved protocol or feasibility study application.
Following protocol approval:
- For multi-study licence holders, all requests must be submitted through the electronic Research Application Portal (eRAP). For access to linked data that is not currently covered by an existing contract, an additional data access agreement will be required. Please contact the CPRD Contracts team (enquiries@cprd.com) for further information.
- For single study datasets and studies including NCRAS SACT or RTDS data, linked data will be supplied alongside the primary care data through the CPRD dataset delivery service. The Observational Research (OR) team will be in touch separately.
- For feasibility studies, requests must be submitted through the electronic Research Application Portal (eRAP).
Types of linked data request
There are two types of linked data request:
- Type 1: linked data required in order to finalise the study population
- Type 2: linked data required for a defined study population
For type 1 requests, applicants must complete the CPRD Request for type 1 linked data using eRAP and supply the code lists/definition of the events required in the data sources of interest, as outlined in the approved protocol. The data supplied at this stage will include only the patient pseudonym (i.e. patid), code, and date. After finalising the study population, applicants will then need to make a type 2 request for all the linked data approved in the protocol.
For type 2 requests, applicants must complete the CPRD Request for type 2 linked data using eRAP and upload the list of patients for the study population.
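As a rough illustration of how a type 1 delivery can feed a type 2 request, the sketch below collects the distinct patid values from a tab-delimited extract and writes them out as a one-column patient list. It assumes the extract has a header row with a column named patid; the function names and file layout are hypothetical and not CPRD tooling. Any study-specific inclusion and exclusion criteria, and the linkage eligibility steps in Appendix 2, still need to be applied before the list is final.

```python
import csv


def unique_patids(type1_path):
    """Return the distinct patid values in a tab-delimited extract, in first-seen order.

    Assumes a header row containing a 'patid' column (an assumption, not a CPRD spec).
    """
    seen = {}
    with open(type1_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            seen.setdefault(row["patid"], None)  # dict keeps insertion order
    return list(seen)


def write_patid_list(patids, out_path):
    """Write a one-column, tab-delimited patient list with a 'patid' header."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["patid"])
        writer.writerows([patid] for patid in patids)
```

A patient with several qualifying events appears once in the output, so the list reflects patients rather than events.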
If this is a further request to re-deliver linked data (i.e. for a protocol that has already had linked data released), the CPRD Data Update Request webform must be submitted and approved before completing this request for linked data.
- Access: Data Update Request webform
- Download: Data Minimisation workbook v1.9 (Excel, 121KB).
The study population should be restricted to those eligible for the linked data requested.
Submitting a request for linked data
Please provide lists of codes or patients as tab-delimited text files (.txt) and compress them into a single zipped file. If the zipped file exceeds 2 MB, split your patient list or code list into smaller chunks, which can then be individually zipped and uploaded with your request. There is no restriction on the number of files you can upload onto eRAP. For particularly large patient lists, ensure the list contains only the "patid" column, as this is the only field required to process the request. This may help to reduce the file size.
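The packaging step above can be sketched as follows. This is a minimal example, not a CPRD-supplied script: it writes a patid-only, tab-delimited .txt file into a zip archive and, if the archive is over the limit, halves the list and retries until every archive fits. The function names, the "_partN" file naming and the reading of "2 MB" as 2,000,000 bytes are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path

MAX_ZIP_BYTES = 2_000_000  # assumed conservative reading of the 2 MB limit


def write_zip(patids, zip_path, member_name):
    """Zip a one-column, patid-only text file and return the archive size in bytes."""
    body = "patid\n" + "\n".join(patids) + "\n"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, body)
    return Path(zip_path).stat().st_size


def zip_in_chunks(patids, out_dir, stem, max_bytes=MAX_ZIP_BYTES):
    """Write one zip if the list fits, otherwise split it until each zip is small enough."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pending = [list(patids)]  # chunks still to be written, in original order
    written = []
    while pending:
        chunk = pending.pop(0)
        part = len(written) + 1
        zip_path = out_dir / f"{stem}_part{part}.zip"
        size = write_zip(chunk, zip_path, f"{stem}_part{part}.txt")
        if size > max_bytes and len(chunk) > 1:
            # Too large: discard this archive and retry with the two halves.
            zip_path.unlink()
            middle = len(chunk) // 2
            pending[:0] = [chunk[:middle], chunk[middle:]]
        else:
            written.append(zip_path)
    return written
```

Halving keeps the patients in their original order across the parts, so the uploaded files can be concatenated to recover the full list.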
Requests for linked data must be submitted by either the Chief Investigator (CI) or the Corresponding Applicant (CA).
Linked data will be provided, by secure transfer, within 10 working days of receipt of a valid, approved request. If the application is incorrectly completed or the lists of codes/patients are not in the correct format, the request will not be processed until these issues are resolved, which may affect the timelines for data delivery.
To ensure that requests are processed in an efficient and timely manner, please follow the guidance outlining the requirements, which differ by study population definition (Appendix 1), how to apply eligibility for linkage (Appendix 2) and how to prepare code lists (Appendix 3). It is the responsibility of the study team to undertake due diligence to ensure that:
- the request is complete and correct
- the delivered data is in line with the completed request
Data redelivery
CPRD operates a "one tranche" policy whereby all linked data required for a study is to be provided in a single delivery. CPRD may grant exceptions to this policy on a case-by-case basis, but redelivery fees will be payable to recover staffing resource needed to action these. The fees for data redelivery (excluding VAT) are available at www.cprd.com/pricing.
Where a type 1 request is required to finalise the study population, the one tranche policy permits one type 1 and one type 2 delivery. The data redelivery fees outlined at www.cprd.com/pricing will be payable for each type 1 and each type 2 delivery after the first.
Appendix 1: Linkage request requirements
Study population definition | CPRD Requirements | What CPRD will provide |
---|---|---|
The study population will be based on primary care data only, but data from one or more linked data sources are required for these patients/practices. (type 2 linkage request) | Type 2 linkage request: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). If the study requires practice-level linked data, provide the list of practices included in the study. | For study populations comprising ≤600K patients: all variables and event records from the approved linked data sources. For study populations comprising >600K patients: further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook. |
The study population will be based on coded events from linked data only. Primary care data may be used to apply additional inclusion and exclusion criteria. (linked data will be provided in two stages) | Stage 1: type 1 linkage request: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3). | Only the relevant events of interest and limited data variables (patient pseudonym (i.e. patid), code, and date) for the requested linked data sources, to enable finalisation of the study population. |
(as above) | Stage 2: type 2 linkage request: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). | For study populations comprising ≤600K patients: all variables and event records from the approved linked data sources. For study populations comprising >600K patients: further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook. |
The study population will be based on coded events from both primary care and linked data sources. (linked data will be provided in two stages) | Stage 1: type 1 linkage request: Provide the list of codes for the events of interest in the approved linked data sources (see Appendix 3). | Only the relevant events of interest and limited data variables (patient pseudonym (i.e. patid), code, and date) for the requested linked data sources, to enable finalisation of the study population alongside the CPRD primary care data. |
(as above) | Stage 2: type 2 linkage request: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). | For study populations comprising ≤600K patients: all variables and event records from the approved linked data sources. For study populations comprising >600K patients: further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook. |
The study population will be based on non-coded events from linked data, e.g. hospital admission dates, dates of death, socioeconomic data. (linked data will be provided in two stages) | Stage 1: type 1 linkage request: Provide the definition for the events of interest in the approved data sources. | Only the relevant events of interest and limited data variables (patient pseudonym (i.e. patid), code / requested field, and date) for the requested linked data sources, to enable finalisation of the study population. |
(as above) | Stage 2: type 2 linkage request: Finalise the study population and provide the list of patients eligible for linkage to the data sources approved in the protocol (see Appendix 2). | For study populations comprising ≤600K patients: all variables and event records from the approved linked data sources. For study populations comprising >600K patients: further data minimisation approaches (event/variable restriction) will be applied prior to data release, based on the completed Data Minimisation workbook. |
Appendix 2: How to apply eligibility for linkage
1. Request the following files from CPRD (enquiries@cprd.com):
- The list of patient and practice files (i.e. CPRD Denominator files) for the primary care database build that you plan to use for your study (e.g. Aurum June 2021).
- The latest linkage eligibility files (GOLD/Aurum_enhanced_eligibility_[month]_[year].txt and linkage_coverage_[month]_[year].txt).
Supporting documentation for the linked data is available from https://www.cprd.com/linked-data.
Please note that for new research studies, CPRD will only provide the latest linked data available for each approved data source. Earlier versions of linked data may be provided for ongoing studies, conditional on adequate justification. Please contact CPRD (enquiries@cprd.com) to confirm the latest version of linked data available.
2. Create a source population for the primary care database build by applying patient acceptability criteria for research and any relevant time constraints (e.g. removing patients that died before the start of your study).
3. Combine the source population from step 2 with the list of patients in the linkage eligibility file (GOLD/Aurum_enhanced_eligibility_[month]_[year].txt), excluding those patients who do not appear in both files.
4. For studies limited to those who are eligible for linkage: Refine the list of patients from step 3 to those who are eligible for linkage to the data source/s approved for your study. For example, to apply linkage eligibility for Hospital Episode Statistics (HES) Admitted Patient Care data and Office for National Statistics death registration data, you should retain those patients where variables hes_apc_e and ons_death_e are both equal to 1. These patients are eligible for linkage to both data sources and can be considered as your source population.
5. Apply any further criteria based on events in primary care then save your list of patients including the relevant linkage flags (patid, hes_apc_e, ons_death_e) as a tab delimited text file and attach this to the Linked Data Request submitted on eRAP by the CI or CA. Please ensure that this list contains all patients for whom linked data is required. Please also use the following naming convention: ‘protocol number_organisation name_patientlist.txt’ e.g. 21_100001_UniversityA_patientlist.txt.
Please note that to ensure provision of the latest available data per data source, and to honour patient opt-outs, the latest eligibility status per patient, for each requested linked data source, will be applied during the processing of a request for linked data. If an earlier source file was used to finalise the list of patients, this earlier eligibility information reflects indicative eligibility only. The linkage eligibility file reflecting patient eligibility status at the time the linkage was undertaken will be provided with the delivery of linked data. This file should be used to finalise the denominator populations and associated person-time as appropriate.
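Steps 3 to 5 can be sketched in a few lines of Python. The example below is illustrative only: it assumes the eligibility file is tab-delimited with a header row containing patid, hes_apc_e and ons_death_e (the flags named in step 4), and that flag values are stored as the text "1" for eligible patients. The function names are hypothetical, and the acceptability criteria (step 2) and any further primary care criteria (step 5) are left to the study team.

```python
import csv
from pathlib import Path

# Flags from the HES APC / ONS death example in step 4; change to match the approved sources.
LINKAGE_FLAGS = ("hes_apc_e", "ons_death_e")


def read_tsv(path):
    """Read a tab-delimited text file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def apply_eligibility(source_patids, eligibility_rows, flags=LINKAGE_FLAGS):
    """Keep patients present in both inputs (step 3) whose flags all equal 1 (step 4)."""
    source = set(source_patids)
    kept = []
    for row in eligibility_rows:
        in_both = row["patid"] in source
        eligible = all(row[flag].strip() == "1" for flag in flags)
        if in_both and eligible:
            kept.append({name: row[name] for name in ("patid", *flags)})
    return kept


def save_patient_list(rows, protocol_number, organisation, out_dir=".", flags=LINKAGE_FLAGS):
    """Write patid plus linkage flags using the naming convention from step 5."""
    path = Path(out_dir) / f"{protocol_number}_{organisation}_patientlist.txt"
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["patid", *flags], delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return path
```

For example, `save_patient_list(rows, "21_100001", "UniversityA")` would write 21_100001_UniversityA_patientlist.txt in the current directory.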
Appendix 3: How to prepare code lists
Code lists should be provided to CPRD as tab delimited text files. Each code list type should be provided in a separate file and each code should appear on a new line. Please see the table below for the coding frames and coding format found in CPRD linked data sources. Please ensure that all code lists are provided in the coding format shown below to avoid delays. All code lists should be submitted together with the completed CPRD Linkage Request form to enquiries@cprd.com.
CPRD Linked Data Source | Coding Frame | Code Format | Code Example |
---|---|---|---|
ONS Death Registration Data | ICD-9 / ICD-10 | NNN, NNN.N, XNNN.N | 410, 410.1, E953.0 |
HES Admitted Patient Care; ONS Death Registration data | ICD-10 | XNN, XNN.N | G00, G00.1 |
HES Outpatient data; HES Accident & Emergency | ICD-10 | XNN, XNNN | G00, G001 |
HES Admitted Patient Care; HES Outpatient data | OPCS | XNN, XNNN | Q07, Q071 |
HES Accident & Emergency | A&E diagnosis/treatment | NN, NNN | 01, 201 |
HES Accident & Emergency | A&E investigations | NN | 02 |
HES Diagnostic Imaging Dataset | Imaging Code - NICIP | XXXX, XNXXX, XXXXX, XXXXXX | CART, C4DAC, CAAAG, CCHESB |
HES Diagnostic Imaging Dataset | Imaging Code - SNOMED-CT | NN* | 10077008, 1051311000000104 |
NCRAS Cancer Registration Tumour and Treatment data | ICD-9 / ICD-10 | NNN, NNNN, XNN, XNNN | 183, 1832, C54, C542 |
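Before submitting, it may help to check a code list against the format expected by the target data source. The sketch below covers only three of the formats in the table and assumes that X stands for an upper-case letter and N for a digit, which the table does not state explicitly. The dictionary keys and function names are hypothetical labels chosen for this example.

```python
import re

# Patterns transcribed from the Code Format column, assuming X = letter and N = digit.
CODE_FORMATS = {
    "icd10_dotted": re.compile(r"[A-Z]\d{2}(\.\d)?"),  # XNN, XNN.N (HES APC, ONS death)
    "icd10_undotted": re.compile(r"[A-Z]\d{2}\d?"),    # XNN, XNNN (HES Outpatient, A&E)
    "opcs": re.compile(r"[A-Z]\d{2}\d?"),              # XNN, XNNN (OPCS procedures)
}


def invalid_codes(codes, format_name):
    """Return the codes that do not fully match the named format."""
    pattern = CODE_FORMATS[format_name]
    return [code for code in codes if not pattern.fullmatch(code)]


def drop_decimal_point(code):
    """Convert a dotted code such as G00.1 to the undotted form G001."""
    return code.replace(".", "")
```

An empty result from `invalid_codes` means every code matches the chosen pattern; it does not confirm that the codes exist in the coding frame.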