Each new database must be evaluated to confirm that its data are of sufficiently high quality to be useful in medical research. The data generated by general practitioners (GPs) who keep their patients' medical records on computers can be very useful for medical research, but the quality of the data must be demonstrated first. Without evidence of data quality, the use in research of data collected primarily for patient care may be scientifically questionable. Because the CPRD Aurum database was not originally designed for research, it is particularly important to assess the quality and completeness of its data. We propose a thorough data quality assessment study that uses multiple strategies to evaluate the different types of data collected in the database before they are used for research. We will compare information in CPRD Aurum with information in a hospital database to see whether the data in Aurum are complete. We will also check whether medications recorded in the electronic record match the diagnoses recorded by the GPs, and whether the diagnoses are supported by the treatments that patients receive.
We will assess the quality and completeness of the newly available CPRD Aurum data using several data quality assessment techniques based on published recommendations, including:
- Comparison with a gold standard: hospitalisation-related data in CPRD Aurum will be compared with linked HES Admitted Patient Care records.
- Data element agreement and validity checks: several drug/lab value and drug/disease pairings will be examined for consistency.
- Data source agreement: completeness, correctness, concordance, and plausibility of breast cancer diagnoses in the CPRD Aurum data will be assessed by comparison with previously published findings from CPRD GOLD.
- Element presence (to understand the availability and potential bias of key covariates): we will calculate the number of body mass index (BMI), smoking, and blood pressure (BP) records per patient by practice, both for all patients and restricted to patients with cardiovascular disease (CVD) in each practice (a subset of patients who should have more recordings of each of these variables). We will report the mean, median, and mode of each indicator for all patients in a practice versus patients with CVD.
- Data consistency over time: total number of prescriptions and diagnoses by practice, by month or quarter.
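
As an illustration of the element presence calculation described in the list above, the following is a minimal sketch only, not the study's actual analysis code. It assumes a simplified, hypothetical data layout (a register of patients per practice, a list of practice and patient pairs with one entry per recorded measurement, and a set of patients with CVD); the function names, identifiers, and toy data are all invented for illustration and do not reflect the CPRD Aurum file structure.

```python
from collections import Counter
from statistics import mean, median, mode


def summarise(counts):
    """Return the mean, median and mode of per-patient record counts."""
    if not counts:
        return None  # no patients in this group for this practice
    return {"mean": mean(counts), "median": median(counts), "mode": mode(counts)}


def element_presence(registered, records, cvd_patients):
    """Summarise records per patient by practice, for all patients and for CVD patients.

    registered:   dict mapping practice id -> list of registered patient ids (hypothetical layout)
    records:      iterable of (practice id, patient id) pairs, one per recorded measurement
    cvd_patients: set of patient ids with a CVD diagnosis
    """
    per_patient = Counter(records)  # (practice, patient) -> number of records
    summary = {}
    for practice, patients in registered.items():
        # Patients with no records contribute a count of zero.
        all_counts = [per_patient[(practice, p)] for p in patients]
        cvd_counts = [per_patient[(practice, p)] for p in patients if p in cvd_patients]
        summary[practice] = {"all": summarise(all_counts), "cvd": summarise(cvd_counts)}
    return summary


# Hypothetical toy data: one practice, four patients, five BMI records.
registered = {"P1": ["a", "b", "c", "d"]}
bmi_records = [("P1", "a"), ("P1", "a"), ("P1", "a"), ("P1", "b"), ("P1", "c")]
cvd = {"a"}

result = element_presence(registered, bmi_records, cvd)
print(result["P1"]["all"])
print(result["P1"]["cvd"])
```

Including patients with zero records in the denominator is a design choice of this sketch; restricting the summary to patients with at least one record would give a different picture of completeness.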
Health Outcomes to be Measured:
- Type II diabetes
- Breast cancer
- Pulmonary embolism
- Benign prostatic hyperplasia
- Myocardial infarction
- Rheumatoid arthritis
- Breast cancer and all cancers