CPRD COVID-19 symptoms and risk factors synthetic dataset December 2021

Release date

Citation: Clinical Practice Research Datalink. (2021). CPRD COVID-19 symptoms and risk factors synthetic dataset December 2021 (Version 2021.12.001) [Data set]. Clinical Practice Research Datalink. https://doi.org/10.48329/YK2N-SZ66


This synthetic dataset is based on anonymised real primary care patient data extracted from the CPRD Aurum database. The dataset focuses on patients presenting to primary care with symptoms indicative of COVID-19 (confirmed/suspected COVID-19) and control patients with negative COVID-19 test results. The dataset includes data on sociodemographic and clinical risk factors.

The development of this dataset was funded by NHSX and used the synthetic data generation and evaluation framework developed under a grant from the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK.

The dataset includes:
Total patients: 4,173,000
Patients with a negative test result: 3,436,379
Patients with confirmed/suspected COVID-19: 736,621
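
As a quick consistency check on the counts above, the following minimal sketch confirms that the two groups add up to the stated total and works out the share of each group. The variable names are illustrative only and do not correspond to any field names in the dataset.

```python
# Counts as stated in the dataset description above.
total_patients = 4_173_000
negative_test_patients = 3_436_379      # control group
covid_patients = 736_621                # confirmed/suspected COVID-19

# The two groups should account for every patient in the dataset.
groups_sum = negative_test_patients + covid_patients
counts_consistent = groups_sum == total_patients

# Share of each group, as a fraction of all patients.
covid_share = covid_patients / total_patients
control_share = negative_test_patients / total_patients

print(f"Counts consistent: {counts_consistent}")
print(f"Confirmed/suspected COVID-19: {covid_share:.1%}")
print(f"Negative test result (controls): {control_share:.1%}")
```

The groups are imbalanced (roughly one confirmed/suspected patient for every five patients overall), which may be worth keeping in mind for any analysis that compares the two.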

Further information is available at https://www.cprd.com/content/synthetic-data

Please contact enquiries@cprd.com for further information or if you have any questions.