Introduction to CPRD

By the end of this module, the reader will have learnt:

What is CPRD?

The Clinical Practice Research Datalink, known as CPRD, is the pre-eminent UK real-world research service supporting retrospective and prospective public health studies and interventional research.

The CPRD service is provided by the Medicines and Healthcare products Regulatory Agency (MHRA), with support from the National Institute for Health Research (NIHR), as part of the Department of Health and Social Care. CPRD is a government-funded, not-for-profit, cost-recovery organisation.

We have been working with General Practitioner (GP) practices across the UK for over 30 years, building relations with a network of GP practices and Clinical Research Networks (CRNs) to provide a unique real-world evidence data resource to support public health research.

Research applications and study protocols requesting access to CPRD data have been approved from over 20 countries worldwide, producing over 3,500 peer-reviewed publications.

What does CPRD do?

CPRD offers a range of services based on the anonymised longitudinal Electronic Healthcare Record (EHR) data of patients, collected from GP practices and linked to other datasets.

Provision of data

CPRD currently offers access to two primary care databases, CPRD GOLD and CPRD Aurum, containing anonymised EHR data collected from GP practices using two major software systems in the UK. More information about each database, including the data specifications describing the structure and coding system, and data resource profile publications are available at  

In addition to anonymised primary care EHR data, CPRD has developed a number of synthetic primary care datasets to understand the structure and utility of the anonymised CPRD Aurum database. More information about these synthetic datasets is available at

CPRD has established a number of standard linkages of these primary care databases to secondary care and other health and area-based datasets. These linkages enable CPRD to provide a fuller picture of the patient care record to support vital public health research, informing advances in patient safety and delivery of care. More information, including documentation describing the structure of the data, the coding systems used, and the coverage period linked to CPRD primary care data, is available at

The Digital Object Identifiers (DOIs) for all CPRD datasets, including the release notes where available, are available at

Commissioned research services 

Beyond the provision of anonymised data for public health research, CPRD offers a commissioned research service to support observational studies. CPRD’s in-house researchers have years of experience in using CPRD primary care and linked data to deliver public health research. They can provide advice on study design, code list development and patient cohort definitions, research application writing and submission, and data analysis.

The CPRD team can also be commissioned to deliver full research projects, from feasibility studies or incidence and prevalence calculations, through to full hypothesis-testing studies and collaborations with international partners. Studies may span the whole product lifecycle, from characterising disease symptoms pre-diagnosis or estimating burden of disease, through to drug utilisation and post-authorisation safety studies (PASS).

By utilising CPRD commissioned research services, clients can be assured they will be working with a team who are experts in the data and its uses, with extensive experience in the CPRD Research Data Governance (RDG) application process. This leads to efficient design of analyses and access to the data, offered at competitive rates.    

If you would like to know more about the range of commissioned research services, please contact CPRD Enquiries, and our Observational Research team will be able to discuss your requirements.

Clinical and interventional research 

CPRD also offers clinical and interventional research services based on real-world EHR data and access to a potential recruitment pool of over 18 million patients registered at CPRD’s extensive network of GP practices across the UK.

Services include, but are not limited to, CPRD PROVE (PRoviding Online Verification of EHR), a service for verifying coded records for observational studies; the patient recruitment service CPRD SPRINT (Speedy Patient Recruitment INto Trials); and trial management services. Through innovative data-driven approaches, CPRD can improve the efficiency of delivering phase 2 and 3 clinical trials.

If you would like to know more about the range of interventional research services, please contact CPRD Enquiries, and our Interventional Research team will be able to discuss your requirements.

What information is available in CPRD data?

In the UK, the National Health Service (NHS) offers publicly funded healthcare within each of the devolved nations – England, Northern Ireland, Scotland and Wales. GPs are the first port of call for care, providing a range of primary care services, including referrals to wider healthcare services, where necessary. Therefore, health care consultations for most individuals are recorded in the primary care EHR by GPs, including medical information vital to the patient’s care.

Over 2,400 GP practices across the UK contribute anonymised clinical data to CPRD. No patient identifier information is held by CPRD; CPRD never receives any patient identifiers from a GP practice such as patient name, address, NHS number, full date of birth or free text medical notes. Anonymised EHR data for patients who have opted out from contributing their data for research purposes will also not be held by CPRD. More information about how the patient data is safeguarded is available at  

GPs maintain comprehensive patient medical records throughout a patient’s lifetime. CPRD primary care data is longitudinal, following patients whilst they are registered with a contributing GP practice. Median follow-up time for patients currently registered with a contributing practice in the CPRD GOLD primary care database is 13 years. The latest follow-up metrics are available in the DOIs for each dataset release.
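As a rough illustration of how a median follow-up figure of this kind can be derived, the sketch below computes follow-up in years from one start date and one end date per patient. The function names, the (start, end) tuple layout, and the example dates are hypothetical and are not taken from any CPRD data specification; real studies should follow the definitions given in the dataset documentation.

```python
from datetime import date
from statistics import median

DAYS_PER_YEAR = 365.25  # average year length, allowing for leap years


def follow_up_years(start: date, end: date) -> float:
    """Return follow-up time in years between two dates."""
    if end < start:
        raise ValueError("end of follow-up precedes start")
    return (end - start).days / DAYS_PER_YEAR


def median_follow_up(periods):
    """Median follow-up in years over (start, end) pairs, one per patient."""
    return median(follow_up_years(start, end) for start, end in periods)


# Hypothetical registration periods (illustrative values only).
example_periods = [
    (date(2000, 1, 1), date(2004, 1, 1)),
    (date(2000, 1, 1), date(2012, 1, 1)),
    (date(2000, 1, 1), date(2020, 1, 1)),
]
example_median = median_follow_up(example_periods)
```

The median is used here rather than the mean because follow-up times are typically skewed by patients who registered only recently.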

With GP practices contributing to CPRD from across the four UK nations, primary care data held by CPRD includes over 60 million patient lives for retrospective observational studies, with over 18 million currently registered and active patients for prospective clinical studies and trials. CPRD data are generally representative of the UK population with respect to age, gender, and ethnicity, and therefore are invaluable for public health research.

CPRD primary care data contain anonymised patient registration information and all coded care events that general practice staff record to support the ongoing clinical care and management of their patients.

Linkage of CPRD primary care data with secondary care data, disease registers, and area-level data enhances the capacity for impactful research.  For example, primary care events could be validated or supplemented with information from the Hospital Episode Statistics (HES) data, death registration data, cancer registry data, and COVID-19 datasets.  
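Conceptually, supplementing primary care events with linked data is a join on an anonymised patient key. The sketch below shows the idea with plain Python dictionaries. The field names (`patient_id`, `code`, `death_date`), the example values, and the `link_records` helper are all hypothetical; they do not reflect the actual structure of CPRD or linked datasets, which is described in the dataset documentation.

```python
def link_records(primary_rows, linked_by_patient, new_field):
    """Attach a linked value to each primary care row.

    Rows for patients with no linked record are kept, with the new
    field set to None, so unlinked patients are not silently dropped.
    """
    return [
        {**row, new_field: linked_by_patient.get(row["patient_id"])}
        for row in primary_rows
    ]


# Hypothetical primary care events and linked death registration dates.
primary_care_events = [
    {"patient_id": 1, "code": "A1"},
    {"patient_id": 2, "code": "B2"},
]
death_dates = {1: "2019-03-02"}

linked = link_records(primary_care_events, death_dates, "death_date")
```

Keeping unlinked rows (a left join) is a design choice made for this sketch: it makes the coverage of the linked source visible rather than hiding it.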

The Data Resource Profile publications describe how CPRD GOLD and CPRD Aurum data are collected and made available for research, what type of data is available in these primary care databases recorded by GPs, and the strengths and limitations of these data – available at

How are CPRD data and services used for research?

Research using CPRD data has benefitted public health, effecting changes in clinical guidelines and practice. CPRD services are widely used internationally by researchers in academia, industry, charities, and government.

CPRD publishes:

CPRD data can be used to investigate a range of health and research areas and a variety of research types. Examples of how CPRD data has been used and the impact of our services are described at

What are the strengths of CPRD data?

  • Real world data – Observational data from routine clinical practice provide valuable insight into disease epidemiology, treatment, and clinical pathways as they are in the real world. The evidence generated complements that from research conducted in controlled environments such as laboratory studies or randomised controlled trials and is therefore hugely valuable to public health.
  • Breadth of data – CPRD’s network of contributing GP practices across the UK enables anonymised EHR data from millions of patients to support public health research. The nature of GP services in primary care means a wide range of information is recorded including morbidity and lifestyle factors. This is enriched by CPRD’s linkages to other healthcare datasets, enabling a variety of public health research in different disease areas to be conducted.
  • Long-term follow-up – CPRD data enables long-term follow-up of patients, due to CPRD’s network of contributing GPs and the nature of GP recording of patient medical records. The latest follow-up metrics are available in the DOI for each dataset release available at
  • Ongoing data collection – CPRD data collection is ongoing, with updates to the primary care databases on a regular basis, enabling research with recent data.
  • Representative – CPRD patients are broadly representative of the UK population with respect to age, gender, and ethnicity.
  • Linkages – CPRD has established a number of linkages to secondary care and other health and area-based datasets that enable a fuller picture of the patient pathway and outcomes.

For more information about the strengths of CPRD data, please see the documentation and Data Resource Profile publications for each primary care dataset, and the documentation for each linked dataset.

What are the limitations of CPRD data?

  • Missing data – The data available rely on information having been entered by a healthcare provider as part of their routine care for the patient. Information that has not been coded in the patient record will therefore not be available. For example, free text notes are not collected by CPRD due to the risk of re-identification, hospital discharge letters may be scanned into the patient record but not coded separately, and information about hospital-prescribed medications is not automatically transferred to the patient’s primary care record. Additionally, some health information such as smoking status, BMI, or ethnicity may only be recorded when it is relevant to the patient’s health condition, potentially creating bias in the patterns of completeness. Researchers should consider the potential for missing data when planning their research.
  • Definitions – CPRD does not provide standardised definitions for diagnoses, treatments, and health information. Code lists or algorithms to define these will need to be developed by the study team as exposure and outcome definitions will be specific to the requirements of a study. There is a risk that this may lead to inconsistent definitions (and therefore results) between studies using the same data.
  • GP IT systems and coding – CPRD GOLD and CPRD Aurum are based on data collected from practices using the Vision® and EMIS Web® software systems, respectively. The structures and coding systems currently used within these systems differ, which means that data may not be directly comparable between the two databases. Research groups should ensure they understand the coding system(s) and data structure(s) relevant to their study question and the data source they are using.
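
The first two limitations above can be made concrete with a short sketch: applying a study-specific code list to coded events, and measuring how completely a field is recorded. Everything here is hypothetical, including the field names (`patient_id`, `code`, `smoking_status`), the codes, and both helper functions; it is an illustration under those assumptions, not a CPRD-provided method or data layout.

```python
def patients_with_codes(events, code_list):
    """Return the set of patient IDs with at least one event in the code list."""
    codes = set(code_list)  # set membership keeps the lookup fast
    return {e["patient_id"] for e in events if e["code"] in codes}


def completeness(patients, field):
    """Proportion of patients with a non-missing value for the given field."""
    if not patients:
        return 0.0
    recorded = sum(1 for p in patients if p.get(field) is not None)
    return recorded / len(patients)


# Hypothetical coded events and a study-defined code list.
events = [
    {"patient_id": 1, "code": "X100"},
    {"patient_id": 2, "code": "Y200"},
    {"patient_id": 3, "code": "X101"},
]
study_code_list = ["X100", "X101"]
cohort = patients_with_codes(events, study_code_list)

# Hypothetical patient records; None marks a value that was never recorded.
patients = [
    {"patient_id": 1, "smoking_status": "never"},
    {"patient_id": 2, "smoking_status": None},
    {"patient_id": 3, "smoking_status": "current"},
    {"patient_id": 4},
]
smoking_completeness = completeness(patients, "smoking_status")
```

Because the code list is an input rather than a fixed definition, two studies using different lists can identify different cohorts from the same data, which is the inconsistency risk described under Definitions.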

For more information about the limitations of CPRD data, please see the documentation and Data Resource Profile publications for each primary care dataset, and the documentation for each linked dataset.


