CPRD collects data from contributing practices on a daily basis and integrates this with existing data to create releases for observational research.
Before the data is made available for research, checks are carried out covering the integrity, structure and format of the data. Issues highlighted by the checks are reviewed and addressed before data is incorporated into the data release for researchers.
We check:
- the volume of data downloaded against the volume supplied
- that data volumes are within the expected range
- that all data elements received are of the correct type, length and format
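As an illustration only, the three checks above could be sketched in Python as follows. The function names, the `FieldSpec` schema and the field names `patid` and `eventdate` are hypothetical; they are not CPRD's actual pipeline or data specification, and the expected volume range is assumed to be supplied by the caller.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical field specification: expected Python type, maximum length,
# and an optional regular expression describing the required format.
@dataclass(frozen=True)
class FieldSpec:
    kind: type
    max_length: int
    pattern: Optional[str] = None


# Hypothetical schema for one extract file (illustrative field names only).
SCHEMA = {
    "patid": FieldSpec(str, 20, r"\d+"),
    "eventdate": FieldSpec(str, 10, r"\d{2}/\d{2}/\d{4}"),
}


def check_volume_matches(downloaded: int, supplied: int) -> list[str]:
    """Compare the number of records downloaded with the number the practice supplied."""
    if downloaded != supplied:
        return [f"downloaded {downloaded} records but practice supplied {supplied}"]
    return []


def check_volume_in_range(count: int, low: int, high: int) -> list[str]:
    """Flag a volume that falls outside an (assumed) expected range."""
    if not low <= count <= high:
        return [f"volume {count} outside expected range [{low}, {high}]"]
    return []


def check_record(record: dict, schema: dict[str, FieldSpec]) -> list[str]:
    """Return one issue string per field that is missing or has the wrong type, length or format."""
    issues = []
    for name, spec in schema.items():
        value = record.get(name)
        if value is None:
            issues.append(f"{name}: missing")
            continue
        if not isinstance(value, spec.kind):
            issues.append(f"{name}: expected {spec.kind.__name__}")
            continue
        if len(str(value)) > spec.max_length:
            issues.append(f"{name}: longer than {spec.max_length}")
        if spec.pattern and not re.fullmatch(spec.pattern, str(value)):
            issues.append(f"{name}: bad format")
    return issues


record = {"patid": "12345", "eventdate": "2020-01-31"}
print(check_record(record, SCHEMA))  # ['eventdate: bad format']
```

Returning a list of issue strings rather than raising on the first failure mirrors the idea that issues are collected, reviewed and addressed before release.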
Our validation and quality checks include:
- Collection-level validation checks that data received from practices contain only the expected data files and that all data elements are of the correct type, length and format. Duplicate records are identified and removed.
- Transformation-level validation checks referential integrity between records to ensure that no orphan records are included in the database (for example, that every event record links to a patient).
- Research-quality-level validation covers the actual content of the data. CPRD provides a patient-level data quality metric in the form of a binary ‘acceptability’ flag. This is based on the recording and internal consistency of key variables, including date of birth, practice registration date and transfer out date.
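The duplicate removal, orphan-record check and acceptability flag described above could be sketched as follows. This is a minimal sketch under stated assumptions: the `Patient` and `Event` record types, the `extract_date` parameter and, in particular, the rules inside `is_acceptable` are hypothetical examples of internal-consistency checks on date of birth, registration date and transfer out date. They are not CPRD's published acceptability algorithm.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical, simplified record types (frozen, so instances are hashable).
@dataclass(frozen=True)
class Patient:
    patid: str
    birth_date: Optional[date]
    registration_date: Optional[date]
    transfer_out_date: Optional[date] = None


@dataclass(frozen=True)
class Event:
    patid: str
    event_date: date
    code: str


def remove_duplicates(events: list[Event]) -> list[Event]:
    """Drop exact duplicate records, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for event in events:
        if event not in seen:
            seen.add(event)
            unique.append(event)
    return unique


def find_orphans(events: list[Event], patients: list[Patient]) -> list[Event]:
    """Referential integrity: return events whose patid has no matching patient record."""
    known = {p.patid for p in patients}
    return [e for e in events if e.patid not in known]


def is_acceptable(p: Patient, extract_date: date) -> bool:
    """Illustrative acceptability rules only (assumed, not CPRD's actual criteria)."""
    # Key dates must be recorded.
    if p.birth_date is None or p.registration_date is None:
        return False
    # Dates must not lie in the future relative to the extract.
    if p.birth_date > extract_date or p.registration_date > extract_date:
        return False
    # Registration cannot precede birth.
    if p.registration_date < p.birth_date:
        return False
    # A transfer out, if recorded, must fall between registration and the extract date.
    if p.transfer_out_date is not None:
        if p.transfer_out_date < p.registration_date or p.transfer_out_date > extract_date:
            return False
    return True


patients = [
    Patient("1", date(1980, 5, 1), date(2005, 3, 1)),
    Patient("2", date(1990, 1, 1), date(1985, 1, 1)),  # registered before birth
]
events = [
    Event("1", date(2010, 1, 1), "A"),
    Event("1", date(2010, 1, 1), "A"),  # exact duplicate
    Event("3", date(2011, 1, 1), "B"),  # no matching patient
]
events = remove_duplicates(events)
print(len(events))  # 2
print([e.patid for e in find_orphans(events, patients)])  # ['3']
print([is_acceptable(p, date(2024, 1, 1)) for p in patients])  # [True, False]
```

Keeping the flag binary, as the text describes, lets researchers filter to acceptable patients with a single condition while the individual rules stay internal to the pipeline.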
In addition to checks undertaken by the CPRD teams before the data is released, researchers using the data are advised to undertake study-specific checks themselves.
Data quality strategy
Good-quality health and health-related data are essential for reliable public health research. Researchers and organisations using CPRD data for observational and interventional research must be able to trust that the data CPRD collects, processes and makes available are fit for purpose across a wide range of research with public health benefit, including disease epidemiology, pharmacoepidemiology and drug safety studies, health care planning, policy formulation, medicines regulatory functions, and clinical studies.
Given the importance of quality data for public health research and surveillance, CPRD has set the following strategic objectives, which aim to co-ordinate CPRD and its stakeholders in maintaining consistent data quality standards for research:
- Develop and publish data quality standards and metadata that are proportionate, fit for purpose, and that maintain public and stakeholder trust.
- Implement and maintain robust data quality validation, verification, and monitoring processes to ensure that all CPRD research data is fit for purpose and made available in an appropriate time frame.
- Leverage partnerships with data suppliers/controllers and the CPRD research community to assure the quality of research data.
- Ensure internal adherence to and external awareness of the roles and responsibilities for data quality.
- Annually review and update data quality procedures to ensure that these maintain appropriate quality standards in data collection, processing, and release.
Useful publications on the quality of CPRD data for research
- Publication: Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, Smeeth L. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015 Jun 6;44(3):827–36.
- Publication: Wolf A, Dedman D, Campbell J, Booth H, Lunn D, Chapman J, Myles P. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int J Epidemiol. 2019 Dec 1;48(6):1740-1740g. doi: 10.1093/ije/dyz034.
- Publication: Jick SS, Hagberg KW, Persson R, Vasilakis-Scaramozza C, Williams T, Crellin E, Myles P. Quality and completeness of diagnoses recorded in the new CPRD Aurum Database: evaluation of pulmonary embolism. Pharmacoepidemiol Drug Saf. 2020 Sep;29(9):1134-1140. doi: 10.1002/pds.4996.
- Publication: Persson R, Vasilakis-Scaramozza C, Hagberg KW, Sponholtz T, Williams T, Myles P, Jick SS. CPRD Aurum database: Assessment of data quality and completeness of three important comorbidities. Pharmacoepidemiol Drug Saf. 2020 Nov;29(11):1456-1464. doi: 10.1002/pds.5135.
- Publication: Persson R, Hagberg KW, Vasilakis-Scaramozza C, Yelland E, Williams T, Myles P, Jick SS. Presence of Codes for Indication for Use in Clinical Practice Research Datalink Aurum: An Assessment of Benign Prostatic Hyperplasia Treatments. Clin Epidemiol. 14:641–652. doi: 10.2147/clep.s360843.
- Publication: Persson R, Sponholtz T, Vasilakis-Scaramozza C, Hagberg KW, Williams T, Kotecha D, Myles P, Jick SS. Quality and Completeness of Myocardial Infarction Recording in Clinical Practice Research Datalink Aurum. Clin Epidemiol. 13:745–753. doi: 10.2147/clep.s319245.
- Publication: Vasilakis-Scaramozza C, Hagberg KW, Persson R, Yelland E, Williams T, Myles P, Jick SS. Quality of rheumatoid arthritis recording in United Kingdom Clinical Practice Research Datalink Aurum. Pharmacoepidemiol Drug Saf. doi: 10.1002/pds.5551.
See also
CPRD database releases and their digital object identifiers (DOIs)