Data quality

CPRD collects data from contributing practices daily and integrates them with existing data to create releases for observational research.

Before the data is made available for research, checks are carried out covering the integrity, structure and format of the data. Issues highlighted by the checks are reviewed and addressed before data is incorporated into the data release for researchers.  

We check that:

  • the volume of data downloaded matches the volume supplied
  • data volumes are in the expected range
  • all data elements received are of the correct type, length and format
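
As a rough illustration of the checks listed above, the following Python sketch validates the volume of one download and the type, length and format of individual records. The field names, the schema and the expected volume range are hypothetical; they are assumptions made for this example and do not describe CPRD's actual pipeline.

```python
import re

# Hypothetical schema (an assumption for this sketch):
# field name -> (expected type, maximum length, format pattern)
SCHEMA = {
    "patid": (str, 20, re.compile(r"\d+")),
    "eventdate": (str, 10, re.compile(r"\d{4}-\d{2}-\d{2}")),
}


def check_volume(received_count, supplied_count, expected_range):
    """Return a list of volume issues for a single download."""
    issues = []
    # The volume downloaded should match the volume the practice supplied.
    if received_count != supplied_count:
        issues.append(
            f"downloaded {received_count} records, supplied {supplied_count}"
        )
    # The volume should also fall inside the expected range.
    low, high = expected_range
    if not low <= received_count <= high:
        issues.append(
            f"volume {received_count} outside expected range {low}-{high}"
        )
    return issues


def check_record(record):
    """Return a list of type, length and format issues for one record."""
    issues = []
    for field, (expected_type, max_length, pattern) in SCHEMA.items():
        value = record.get(field)
        if not isinstance(value, expected_type):
            issues.append(f"{field}: wrong type")
            continue  # length and format only make sense for the right type
        if len(value) > max_length:
            issues.append(f"{field}: too long")
        if not pattern.fullmatch(value):
            issues.append(f"{field}: bad format")
    return issues
```

In this sketch, issues are collected rather than raised as exceptions, so that everything highlighted by the checks can be reviewed together before the data go any further.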

Our range of validation and quality checks includes:

  • Collection-level validation ensures integrity by checking that data received from practices contain only the expected data files and that all data elements are of the correct type, length and format. Duplicate records are identified and removed.
  • Transformation-level validation checks referential integrity between records to ensure that no orphan records are included in the database (for example, that all event records link to a patient).
  • Research-quality-level validation covers the actual content of the data. CPRD provides a patient-level data quality metric in the form of a binary ‘acceptability’ flag, which is based on the recording and internal consistency of key variables, including date of birth, practice registration date and transfer out date.
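
The three levels above can be sketched in Python as follows. This is a minimal illustration only: the field names (`patid`, `dob`, `regstart`, `transfer_out`) are hypothetical, and the consistency rules in `is_acceptable` are simplified assumptions, not CPRD's actual acceptability criteria, which are not set out on this page.

```python
from datetime import date


def remove_duplicates(records):
    """Collection level: drop exact duplicates, keeping the first occurrence."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))  # order-independent record key
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def find_orphans(events, patients):
    """Transformation level: return event records that link to no patient."""
    patient_ids = {patient["patid"] for patient in patients}
    return [event for event in events if event["patid"] not in patient_ids]


def is_acceptable(patient):
    """Research-quality level: a binary flag from simplified, assumed rules."""
    dob = patient.get("dob")
    regstart = patient.get("regstart")
    transfer_out = patient.get("transfer_out")  # None if still registered
    # Key variables must be recorded.
    if dob is None or regstart is None:
        return False
    # Registration cannot precede birth (assumed rule for this sketch).
    if regstart < dob:
        return False
    # Transfer out cannot precede registration (assumed rule for this sketch).
    if transfer_out is not None and transfer_out < regstart:
        return False
    return True
```

A study might then keep only patients for whom `is_acceptable` returns `True`, alongside its own study-specific checks.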

In addition to checks undertaken by the CPRD teams before the data is released, researchers using the data are advised to undertake study-specific checks themselves. 

Data quality strategy

Good quality health and health-related data are essential for reliable public health research. Researchers and organisations using CPRD data for observational and interventional research must be able to rely on the data that CPRD collects, processes, and makes available being fit for purpose. These data support a wide range of research with public health benefit, including disease epidemiology, pharmacoepidemiology and drug safety studies, health care planning, policy formulation, medicines regulatory functions, and clinical studies.

In view of the importance of quality data for public health research and surveillance, CPRD has set the following strategic objectives. They aim to achieve a co-ordinated approach between CPRD and its stakeholders that ensures consistent data quality standards for research.

  1. Develop and publish data quality standards and metadata that are proportionate, fit for purpose, and that maintain public and stakeholder trust.
  2. Implement and maintain robust data quality validation, verification, and monitoring processes to ensure that all CPRD research data is fit for purpose and made available in an appropriate time frame.
  3. Leverage partnerships with data suppliers/controllers and the CPRD research community to assure the quality of research data.
  4. Ensure internal adherence to and external awareness of the roles and responsibilities for data quality.
  5. Annually review and update data quality procedures to ensure that these maintain appropriate quality standards in data collection, processing, and release.

Useful publications on the quality of CPRD data for research   


See also  

Using CPRD primary care data  

Safeguarding patient data   

CPRD database releases and their digital object identifiers (DOIs) 
