CPRD algorithm-derived data

The CPRD team has created several derived datasets and value-added variables based on primary care data and, where relevant and available, linked secondary care data. These datasets use algorithms to consolidate related data from across the patient care record.

Derived datasets currently available include the CPRD Mother-Baby Link, the CPRD Pregnancy Registers and the CPRD Ethnicity Records.

Access to algorithm-derived data

Access to patient-level data depends on approval of a study protocol via the Research Data Governance (RDG) process. All required algorithm-derived data sources must be requested on the application form. Additionally, researchers who are first-time users of an algorithm-derived dataset must contact the CPRD Observational Research Team to discuss their requirements before submitting their application. CPRD provides these data only as part of a data extract linked to CPRD primary care data.

Guidance: Requesting linked data from CPRD


CPRD Mother-Baby Link

The CPRD team has developed a probabilistic mother-baby link algorithm, based on data recorded in the primary care medical record. The algorithm links likely mother-baby pairs within the CPRD GOLD and CPRD Aurum databases using household number, maternity information from the mother’s primary care record, the infant’s month of birth and the care records of newly registered babies.
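
To illustrate the general idea of household-based pairing, the following Python sketch finds candidate mother-baby pairs. It is not CPRD's algorithm: the record types, field names (`household`, `year`, `month`) and the one-month tolerance are all assumptions made for illustration, and a real probabilistic link would weigh further evidence from the care records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    """Hypothetical maternity record taken from a mother's primary care data."""
    mother_id: str
    household: int
    year: int
    month: int


@dataclass(frozen=True)
class Infant:
    """Hypothetical record for a newly registered baby."""
    baby_id: str
    household: int
    birth_year: int
    birth_month: int


def months_apart(y1: int, m1: int, y2: int, m2: int) -> int:
    """Absolute difference between two year-month values, in months."""
    return abs((y1 * 12 + m1) - (y2 * 12 + m2))


def candidate_pairs(deliveries, infants, tolerance_months=1):
    """Pair infants with deliveries in the same household and a nearby month.

    The tolerance is an assumed parameter, not a documented CPRD setting.
    Returns a list of (mother_id, baby_id, gap_in_months) tuples.
    """
    by_household = {}
    for d in deliveries:
        by_household.setdefault(d.household, []).append(d)

    pairs = []
    for b in infants:
        for d in by_household.get(b.household, []):
            gap = months_apart(d.year, d.month, b.birth_year, b.birth_month)
            if gap <= tolerance_months:
                pairs.append((d.mother_id, b.baby_id, gap))
    return pairs
```

Grouping deliveries by household first keeps the comparison to records that share a household, so infants are never compared against every mother in the database.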


CPRD Pregnancy Registers

The CPRD GOLD and CPRD Aurum Pregnancy Registers contain a list of all pregnancy episodes recorded in the corresponding primary care databases. The Pregnancy Registers are derived from the primary care data based on an algorithm developed for CPRD GOLD by the CPRD team and the London School of Hygiene and Tropical Medicine.

Each record within the Pregnancy Registers represents a unique pregnancy episode and includes variables such as the start and end dates of the pregnancy, trimester dates and the pregnancy outcome. There may be more than one episode per woman. In addition, live births in the CPRD GOLD Pregnancy Register are linked to the CPRD GOLD Mother-Baby Link so that researchers can access de-identified information on the resulting infants.
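
As a rough illustration of this structure, the Python sketch below groups episodes by woman and looks up linked infants for live births. The class names, field names and the outcome label `"live birth"` are hypothetical, and matching on the month the episode ends is an assumption made for the example rather than the documented linkage key.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PregnancyEpisode:
    """One row per pregnancy episode (hypothetical field names)."""
    mother_id: str
    start: date
    end: date
    outcome: str


@dataclass(frozen=True)
class MotherBabyLink:
    """One row per linked mother-baby pair (hypothetical field names)."""
    mother_id: str
    baby_id: str
    birth_year: int
    birth_month: int


def episodes_by_woman(episodes):
    """Group episodes per woman, ordered by start date."""
    grouped = defaultdict(list)
    for ep in episodes:
        grouped[ep.mother_id].append(ep)
    for eps in grouped.values():
        eps.sort(key=lambda e: e.start)
    return dict(grouped)


def infants_for_live_births(episodes, links):
    """Return (episode, [baby_id, ...]) for each live-birth episode.

    Assumes a link matches when the baby's birth month equals the month
    in which the episode ends; an empty list means no linked infant.
    """
    index = defaultdict(list)
    for link in links:
        key = (link.mother_id, link.birth_year, link.birth_month)
        index[key].append(link.baby_id)

    matched = []
    for ep in episodes:
        if ep.outcome != "live birth":
            continue
        key = (ep.mother_id, ep.end.year, ep.end.month)
        matched.append((ep, index.get(key, [])))
    return matched
```

Returning a list of baby identifiers per episode, rather than a single value, leaves room for multiple births and for live births with no linked infant.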

Publication: Minassian C, Williams R, Meeraus WH, Smeeth L, Campbell OMR, Thomas SL. Methods to generate and validate a Pregnancy Register in the UK Clinical Practice Research Datalink primary care database. Pharmacoepidemiol Drug Saf, Volume 28, Number 7, p.923-933 (2019). https://doi.org/10.1002/pds.4811

Publication: Campbell J, Bhaskaran K, Thomas S, Williams R, McDonald HI, Minassian C. Investigating the optimal handling of uncertain pregnancy episodes in the CPRD GOLD Pregnancy Register: a methodological study using UK primary care data. BMJ Open 2022;12:e055773. http://dx.doi.org/10.1136/bmjopen-2021-055773

Publication: Campbell J, Shepherd H, Welburn S, Barnett R, Oyinlola J, Oues N, Williams R. Methods to refine and extend a Pregnancy Register in the UK Clinical Practice Research Datalink primary care databases. Pharmacoepidemiol Drug Saf. https://doi.org/10.1002/pds.5584


CPRD Ethnicity Records

The CPRD Ethnicity Records comprise a single derived ethnicity category for each patient in CPRD GOLD and CPRD Aurum. They draw ethnicity data from the primary care databases and, for linkage-eligible patients, from the Hospital Episode Statistics (HES) datasets.


Publication: Shiekh SI, Harley M, Ghosh RE, Ashworth M, Myles P, Booth HP, and Axson EL. Completeness, agreement, and representativeness of ethnicity recording in the United Kingdom’s Clinical Practice Research Datalink (CPRD) and linked Hospital Episode Statistics (HES). Popul Health Metr. 2023 Mar 14;21(1). https://doi.org/10.1186/s12963-023-00302-0
