Key results
In this work, we defined three key variables related to COVID-19 research using CPRD, and benchmarked results for each: index COVID-19 diagnoses, COVID-19 vaccinations and persons at high risk of severe disease.
We identified 2 271 072 COVID-19 cases in CPRD Aurum between 1 August 2020 and 31 January 2022. Younger age and lower socioeconomic deprivation have been consistently associated with reductions in COVID-19 incidence and severity.27 28 These factors may explain why this CPRD cohort, which proportionally under-represented persons aged 65 and older and over-represented persons living in the regions with higher median total household wealth, captured 15% of COVID-19 cases in a database that covers 24% of persons in England. The requirement for an NHS number in order for results to be shared may explain some of this attrition as well. Future work using CPRD for COVID-19 research will need to consider this under-ascertainment of cases. Beyond this study’s time period, the transition to at-home testing, as well as the end of free PCR testing for the general public on 1 April 2022, will also need to be considered.
This manuscript reports results from a case definition of confirmed and current infection. We did not include codes for immunoglobulin titres, as measurable antibodies indicate a resolved infection rather than date of onset. We did not include codes indicating a sequela of prior infection, as these most often occur on a later date than index diagnosis. We did not include codes indicating a test without a result, as people with a negative test result should not be included in a COVID-19 case definition. Our results, therefore, identified fewer cases, although with greater specificity, than other studies in published literature that allow for such heterogeneity.29
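The inclusion logic described above can be illustrated with a minimal sketch. It assumes each clinical record has already been mapped to one of four code categories; the category labels, the record layout and the function name `index_diagnoses` are hypothetical and are not taken from the published code lists.

```python
from datetime import date

# Hypothetical category labels; the real code lists use clinical codes.
INCLUDED = {"confirmed_current_infection"}
EXCLUDED = {"antibody_titre", "sequela", "test_without_result"}


def index_diagnoses(records):
    """Return the earliest qualifying diagnosis date per patient.

    records: iterable of (patient_id, event_date, category) tuples.
    Only categories in INCLUDED count towards the case definition.
    """
    index = {}
    for patient_id, event_date, category in records:
        if category not in INCLUDED:
            continue  # antibody, sequela and result-free test codes are skipped
        if patient_id not in index or event_date < index[patient_id]:
            index[patient_id] = event_date
    return index


example_records = [
    ("A", date(2021, 1, 10), "test_without_result"),
    ("A", date(2021, 1, 12), "confirmed_current_infection"),
    ("A", date(2021, 6, 1), "confirmed_current_infection"),
    ("B", date(2021, 3, 5), "antibody_titre"),
    ("C", date(2021, 4, 2), "sequela"),
]
cases = index_diagnoses(example_records)
```

In this toy example only patient A qualifies as a case, indexed on the first confirmed record; patients B and C are excluded, which reflects the trade of sensitivity for specificity noted above.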
COVID-19 vaccination events were well captured in the CPRD. This stands in stark contrast to most administrative claims and EHR databases in the USA, where less than 50% of COVID-19 vaccines are recorded in comparison to estimates provided by the Centers for Disease Control and Prevention.30 31 England’s national healthcare system, as well as the NHS data infrastructure to facilitate capture of COVID-19-related events and long-standing structure of GPs as the central node in a person’s healthcare coordination, enabled the high coverage of COVID-19 vaccination records. Researchers can be more confident with CPRD data that the absence of a vaccination record indicates unvaccinated status than they otherwise would be with most other real-world datasets, which is a critical consideration for studies related to COVID-19 disease burden, vaccination or treatments.
The proportion of persons who had completed primary series vaccination prior to infection was low among primary care cases. Notably, among hospitalised cases, no patients had completed a primary COVID-19 vaccination series. These findings may be explained by several factors. First, the COVID-19 vaccine was first made available in England on 8 December 2020, and initially, second doses were given up to 12 weeks later to maximise limited supply for as many people as possible. Therefore, within the calendar period under study, most persons would have had a COVID-19 diagnosis at a time when ‘full vaccination’ was not yet achievable. Second, it is possible that a single vaccine dose offered protection against severe illness.32–34
We operationalised a total of over 28 000 codes, from an initial set of nearly 50 000, for three definitions of persons at risk of severe disease. We have made the search terms available, for reproducibility, as well as the resulting code lists, for other research groups to implement in their work. After the completion of this work, NHS Digital published a code list for ‘Targeted Conditions’, which includes each element in the NHS Highest Risk category. Among these, the NHS code list can be repurposed for 4 of the 14 conditions in PANORAMIC criteria and 3 of the 11 conditions in UKHSA Clinical Risk. To our knowledge, we offer the first publication of code lists for capture of all elements in the PANORAMIC criteria as well as UKHSA Clinical Risk criteria for these high-risk definitions, which can now be readily used in datasets that contain CPRD medical and product, ICD-10 and OPCS Classification of Interventions and Procedures codes.
While there are similarities between the three definitions, differences do exist. In the example of renal disease, NHS Highest Risk is defined as chronic kidney disease stage 4 or 5, PANORAMIC trial criteria stipulate stage 2 or 3 and UKHSA Clinical Risk covers stages 3–5. PANORAMIC trial eligibility captures persons with mild renal disease, as some antiviral treatments are not approved for use in persons with severe renal disease. However, UKHSA Clinical Risk prioritised vaccination access for persons at highest risk of disease, which would include persons with more advanced renal disease. Notably, persons who have renal disease as defined by PANORAMIC trial criteria by definition do not have renal disease by the NHS definition, and people in either group may (or may not) meet UKHSA prioritisation. The choice of which high-risk definition to implement in future studies will need to be guided by the study population and research question.
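The renal disease example can be made concrete with a short sketch that encodes only the chronic kidney disease stages stated above. The dictionary and function names are hypothetical, and the sketch ignores every other element of the three definitions.

```python
# Chronic kidney disease stages that qualify under each definition,
# as described in the text. Names here are illustrative only.
CKD_STAGES = {
    "NHS Highest Risk": {4, 5},
    "PANORAMIC": {2, 3},
    "UKHSA Clinical Risk": {3, 4, 5},
}


def renal_definitions_met(stage):
    """List the definitions whose renal criterion is met at this CKD stage."""
    return [name for name, stages in CKD_STAGES.items() if stage in stages]


for stage in range(1, 6):
    print(stage, renal_definitions_met(stage))
```

Under these stage sets, PANORAMIC and NHS Highest Risk never overlap, stage 3 satisfies both PANORAMIC and UKHSA Clinical Risk, and stage 2 satisfies PANORAMIC alone.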
Among primary care cases, persons at high risk were older, were more often smokers, and had larger body sizes and higher Charlson Comorbidity Indices than the overall group of primary care cases. Among hospitalised cases, the high-risk groups were similar to the entire hospitalised group. These findings are consistent with existing understanding of high-risk definitions, and perhaps provide reassurance that the code lists measure the purported phenomenon.
The study periods in this report represent the most recent data available from CPRD as of 3 February 2023. During the autumn of 2022, the Aurum database experienced data quality issues related to the EMIS data flows from legacy systems, and no primary care data have been made available to researchers since the May 2022 release (data through March 2022, with some early view of April 2022). Separately, HES secondary care data have not been updated since March 2021, as NHS Digital has undergone a change in the way they process and link data. COVID-19 remains a serious disease for some people, and it is certain that some of the 1.7 million persons diagnosed with COVID-19 after 1 April 2021 would have been later admitted for COVID-19, but we do not have the hospital admission data to distinguish them from those whose cases were managed entirely in the community setting. Throughout, we have used the term ‘primary care records’ for the combined group of those known to be non-hospitalised (cases where HES data were available, but the person was not hospitalised) and those for whom we have GP encounters but whose eventual hospital admission status is unknown. It is difficult to approximate the number of hospital admissions that would be expected with full data availability. Carrying forward the 2% hospital admission incidence seen in the early pandemic period may not be appropriate, given that 2021 introduced periods of increased (delta variant) and decreased (omicron variant) risk of hospital admission, as well as the uptake of COVID-19 vaccinations and antiviral treatments. Finally, the population structure of the cohort outlined in this work further challenges the direct application of national estimates to CPRD cohorts.
This study does not capture persons not under GP care, such as prisoners, residents of some residential homes and persons without a place of residence. Additionally, CPRD Aurum, when linked with HES data, reduces the population to persons registered at eligible GP practices in England, and therefore may not represent persons in other countries in the UK or countries outside the UK. This study does not include persons who presented directly to hospital without any prior GP interaction. In particular, persons with more severe disease, such as older adults, may require immediate hospital admission before seeking primary care, which could explain some of the gaps in representation we have reported. Given that CPRD is a primary care database, and given the limited time period of hospital data availability, we designed our study as an initial cohort of persons with primary care records of COVID-19. Studies seeking complete capture of all hospitalised COVID-19 patients might consider other data sources.
This post was originally published on https://bmjopen.bmj.com