Cohort profile: the Johns Hopkins COVID Long Study (JHCLS)–a US nationwide prospective cohort study

Introduction

Since its emergence in 2019, COVID-19 has greatly affected the health and well-being of millions of people worldwide.1 2 Both acute and persistent post-infection complications have been reported by patients3 and COVID-19 is now recognised as a multiorgan disease.4 The WHO defines persistent post-infection complications, referred to as long COVID, as new or continuing symptoms 3 months after initial illness that last at least 2 months and cannot be explained otherwise.5 Despite recent studies suggesting that long COVID may occur in 10–55% of individuals exposed to SARS-CoV-2,6–8 the exact incidence remains unknown. There is also uncertainty in the pathophysiology and symptomatology of long COVID.9 10 With the elevated burden of COVID-19 worldwide, it is important to understand the full range of symptoms and long-term outcomes.2 Moreover, the large number of individuals requiring continued medical care will pose an economic burden on our healthcare system.10

Similar to other infections, SARS-CoV-2 is associated with post-acute infection syndromes resulting in a variety of symptoms.11 Some of the core symptoms associated with long COVID are common to other post-acute infection syndromes as well, including but not limited to fatigue, exertion intolerance and neurocognitive impairment.11 Despite our general knowledge of the occurrence of post-acute infection syndromes, it is largely understudied. Cohort studies composed of those who have had SARS-CoV-2 and those who have not (ie, control population) are critical to understanding the gaps in our knowledge of long COVID and post-acute infections in general.

The presentation of those with long COVID is often marked with multiple diverse symptoms affecting multiple organs; each individual may have their own unique clinical presentation.12 Though age is a major risk factor in COVID-19-related mortality, and despite a preponderance of long COVID among those aged 40–60 years, long COVID is reported across the age spectrum.13 Similarly, long COVID is reported by persons of all genders, race/ethnicities, and those with and without pre-existing comorbidities.13 14 Hence, it is essential that research both identifies and characterises the main clinical and epidemiological features associated with long COVID, including potential targets for intervention.

For these reasons, the Johns Hopkins COVID Long Study (JHCLS) was established to prospectively examine the short-term and long-term consequences of COVID-19 over a 3-year follow-up period. The overall objectives of the JHCLS are to (1) characterise the spectrum of long-term sequelae of SARS-CoV-2 infection; (2) identify individuals at risk of long-term sequelae; and (3) characterise the physical and mental health disability associated with long COVID. To meet study objectives, the cohort includes participants with and without a history of SARS-CoV-2 infection. This cohort profile describes baseline demographic and clinical characteristics of US participants enrolled in the JHCLS.

Cohort description

Study design and participants

The JHCLS launched for participants with a self-reported history of SARS-CoV-2 infection on 2 February 2021, and expanded to include participants without a history of SARS-CoV-2 infection on 2 March 2022. All consenting participants are asked to complete a one-time, short online baseline survey with the option to remain anonymous. At the end of the baseline survey, participants are asked if they agree to be contacted for future COVID-19 studies, such as enrolment into longitudinal follow-up. If they respond yes, they are contacted by email or phone 3–6 months later with information about participating in longitudinal follow-up. If they subsequently consent to participate in longitudinal follow-up, they are emailed a follow-up survey every 3–6 months.

As of 14 February 2023, 20 319 participants with a self-reported history of SARS-CoV-2 infection and 1041 participants without a history of SARS-CoV-2 infection consented to participate in the JHCLS (figure 1). Of the 20 319 participants with a history of infection, 15 478 with a defined long COVID status (76%) completed the baseline survey and 11 924 (59%) consented to be contacted for future studies. Of these, 6327 have enrolled in longitudinal follow-up and completed their first follow-up survey. Of the 1041 participants without a history of infection, 799 (77%) completed the baseline survey and 501 (48%) consented to be contacted for future studies. Of these, 278 have enrolled in longitudinal follow-up and completed their first follow-up survey. At each round of follow-up, participants without a history of infection are asked if they have experienced COVID-19 symptoms or tested positive for SARS-CoV-2 since their last survey completion. If they respond yes, they are transferred to the survey for participants with a history of infection. As of 14 February 2023, 46 of the 278 participants (17%) who completed their first round of longitudinal follow-up have self-reported either a positive SARS-CoV-2 test or symptoms of COVID-19. The median survey completion time is 20 min for the baseline survey and 24 min for the first follow-up survey.

Figure 1
Figure 1

Participants who agree to be contacted for future studies are asked to consider joining the longitudinal follow-up cohort 3–6 months after they complete the baseline survey. As of this report, not everyone who has agreed to be contacted for future studies has been invited to join the longitudinal follow-up cohort.

Recruitment

Participants are recruited into the JHCLS using several mechanisms: social media posts, Facebook ad campaigns, direct messaging (eg, emails to health departments), word of mouth and participation in a recruitment registry. For social media recruitment, researchers use study-owned and operated Instagram, Facebook and Twitter accounts. In addition, the team partnered with the Audience Development Team at the Johns Hopkins Bloomberg School of Public Health Communications Department to develop targeted Facebook ad campaigns. The study ran three Facebook ad campaigns, each targeting a neighbourhood in the USA with high SARS-CoV-2 case counts. Two campaigns ran in April 2021, the first targeting neighbourhoods in Detroit, Michigan, and the second in Fayetteville and Hope Mills, North Carolina, and South Fulton and Alpharetta, Georgia. The final campaign ran in July 2021 and targeted neighbourhoods in Houston and San Antonio, Texas; Miami and Jacksonville, Florida; and Los Angeles, California.

The study team also partners with the Johns Hopkins Opportunities for Participant Engagement (HOPE) Registry. The HOPE Registry (http://johnshopkinshope.org/) is a recruitment registry designed to connect individuals with teams conducting COVID-19 research studies at Johns Hopkins University. The JHCLS was officially enrolled into the HOPE Registry in April 2021.

Participant eligibility

To be eligible to participate in the study, participants must be at least 18 years of age. Additionally, to be eligible to complete the survey for participants with a history of SARS-CoV-2 infection, participants must self-report at least one positive SARS-CoV-2 test or symptoms of COVID-19. At the start of the baseline survey, eligible participants are provided with a short, informed consent script that details the purpose of the study and provides details on participation. In order to protect the confidentiality of participants, participants are assigned a unique study identifier number and data are collected anonymously.

Study procedures

The JHCLS baseline survey is self-administered and collects data across nine domains: SARS-CoV-2 testing and COVID-19 symptoms, vaccines and SARS-CoV-2 reinfection, COVID-19 treatments and hospitalisation, pre-existing comorbidities, physical limitations and exercise, sleep quality, mental fatigue, anxiety and demographics (online supplemental table 1). Data from these same domains are collected during longitudinal follow-up as well. All data are collected in REDCap, a Health Insurance Portability and Accountability Act-compliant, secure web application designed to build and manage online surveys and databases.15 16 Most survey questions were adapted from validated measures and assessments. However, certain questions were self-designed for the purpose of meeting study objectives. All self-designed questionnaires are available on the National Institute of Environmental Health Sciences Disaster Research Response Resources Portal (https://tools.niehs.nih.gov/dr2/index.cfm/resource/24278).

Supplemental material

SARS-CoV-2 testing and COVID-19 symptoms

To obtain data on COVID-19 history, diagnosis and symptoms, researchers self-designed questions to assess overall health status prior to initial COVID-19 illness, history of SARS-CoV-2 testing and results, initial symptom onset date, symptoms experienced during the acute phase of COVID-19, new/continuing COVID-19 symptoms, impact of each reported symptom on daily activities and self-reported recovery from COVID-19 illness. Participants without a history of infection are asked about health status prior to the COVID-19 pandemic, symptoms experienced in reference to overall general health and self-reported recovery from the effects of the COVID-19 pandemic.

Vaccines and SARS-CoV-2 reinfection

To collect data on vaccination and SARS-CoV-2 reinfection, researchers self-designed questions related to influenza vaccination uptake, COVID-19 vaccination uptake, SARS-CoV-2 antibody testing, participation in COVID-19 treatment trials, SARS-CoV-2 reinfection and self-reported comparison of COVID-19 symptoms experienced during the first reinfection compared with initial illness. Participants without a history of infection are asked questions related to influenza and COVID-19 vaccination uptake.

Treatments and hospitalisation

Researchers self-designed questions to obtain data on medications used to treat initial COVID-19 illness, medications used to treat new/continuing COVID-19 symptoms, COVID-19-related hospitalisation, healthcare utilisation and health-seeking behaviour to treat symptoms. In this section, participants without a history of infection are asked about overall healthcare utilisation.

Pre-existing comorbidities

In order to obtain data on pre-existing health status and comorbidities, researchers self-designed questions to capture current health status, pre-existing health conditions, cancer diagnosis, height, weight, current stress level and stress level prior to the COVID-19 pandemic.

Physical limitations and exercise

To assess physical limitations and exercise, researchers used questions adapted from the Baltimore Longitudinal Study of Aging (BLSA) and the Godin-Shephard Leisure-Time Physical Activity Questionnaire (GSLTPAQ).

The BLSA is a longitudinal study of healthy adults with the aim of understanding how adults adjust to the ageing process, including adjustments in physical activity.17 18 During the baseline survey, participants are asked questions to assess difficulty and level of difficulty in the following domains: mobility (walking a quarter mile/1 mile and going up 10 steps/20 steps) and instrumental activities of daily living (IADL) (light and heavy housework). If a participant reports experiencing difficulty, they are asked to report the level of difficulty (a little, some, a lot or unable to do); conversely, if they do not report difficulty, they are asked to report the level of ease (very easy, somewhat easy or not so easy).17 18 Participants who report difficulty are also asked if they experienced the difficulty prior to their COVID-19 illness or the COVID-19 pandemic. For mobility, if a participant reports any level of difficulty walking a quarter of a mile, they are considered to have a mobility disability. For IADL, if a participant reports any level of difficulty with light housework, they are considered to have an IADL disability.

The GSLTPAQ was validated for use in healthy adults by measuring the correlation between objective measures of physical condition, maximum oxygen intake during exercise (V02 max) and body fat percentile, and subjective measures of total leisure time physical activity.19–21 The questionnaire was found to have a test–retest reliability of 0.94, 0.46 and 0.48 for strenuous, moderate and light-intensity exercise, respectively, with the highest correlation shown between V02 max and strenuous-intensity exercise (Pearson’s r=0.38) and body fat percentile and strenuous-intensity exercise (Pearson’s r=0.21).20

During the baseline survey, participants are asked to report the number of times on average they participate in mild, moderate and strenuous-intensity exercise for longer than 15 min during a typical week. The number of times per week is multiplied by the corresponding metabolic equivalent of task factor (3, 5 and 9 for mild, moderate and strenuous-intensity exercise, respectively) and summed for a total leisure activity score.19 20 A score of >24 indicates an active lifestyle, a score of 14–23 indicates a moderately active lifestyle and a score of <14 indicates an insufficiently active/sedentary lifestyle.19 Participants with a history of infection are asked to report the number of times they exercised in each category before and after their COVID-19 illness; participants without a history of infection are asked in reference to before and after the COVID-19 pandemic.

Sleep quality

Sleep quality is assessed using questions adapted from the AIDS Linked to the IntraVenous Experience (ALIVE) Study and the Idiopathic Hypersomnia Severity Scale (IHHS). The ALIVE Study is a prospective cohort study designed to characterise the incidence and natural history of HIV infection among injection drug users in Baltimore, Maryland.22 Participants are asked how often they experience a list of five items related to sleep quality over the past 4 weeks. For each item, participants respond based on the following scale: 1 (all of the time), 2 (most of the time), 3 (a good bit of the time), 4 (some of the time), 5 (a little bit of the time) or 6 (none of the time).

The IHHS was validated for use in patients experiencing three major symptoms of idiopathic hypersomnia: excessive daytime sleepiness, prolonged night-time sleep and sleep inertia, and was found to have high internal consistency (Cronbach’s α=0.89) and good content validity.23 24 The scale consists of 14 items and each item is scored separately and then summed together for a total score ranging from 0 to 50. Higher scores represent more severe/frequent symptoms of idiopathic hypersomnia.23 24 For the purpose of the JHCLS, researchers used four questions from the IHHS for a range of scores from 0 to 14.

Mental fatigue

Mental fatigue is assessed using the Wood Mental Fatigue Inventory (WMFI). The WMFI has been validated for use in patients with myalgic encephalomyelitis/chronic fatigue syndrome and was found to have high internal consistency (Cronbach’s α=0.93) and good test–retest reliability (Pearson’s r=0.887).25 Participants are asked how much they have been bothered by a list of nine items over the past 2 weeks. Each item is scored on the following scale: 0 (not at all), 1 (a little), 2 (somewhat), 3 (quite a lot) or 4 (very much). At the end of the assessment, the scores are summed together for a range of 0–36. Higher scores indicate greater levels of mental fatigue.25

Anxiety

To assess anxiety, researchers use the Generalized Anxiety Disorder-7 (GAD-7). The GAD-7 has been validated for use in the general population and was found to have both high internal consistency (Cronbach’s α=0.92) and good criterion, construct, factorial and procedural validity.26 Participants are asked how often they have been bothered by a list of seven items over the past 2 weeks. For each item, participants are scored based on the following scale: 0 (not at all), 1 (several days), 2 (more than half the days) or 3 (nearly every day). At the end of the assessment, the scores are summed together for a range of 0–21. A score of 0–4 indicates no anxiety disorder, a score of 5–9 indicates a mild anxiety disorder, a score of 10–14 indicates a moderate anxiety disorder and a score of >15 indicates a severe anxiety disorder.26

Demographics

Researchers self-designed survey questions to obtain demographic information, including gender, race/ethnicity, country of residence, year of birth, educational attainment, work activities prior to the COVID-19 pandemic, primary occupation, total household income and total number of dependents.

Patient and public involvement

There was no patient or public involvement in the design, conduct, reporting or dissemination plans of our research. However, patient feedback is routinely discussed and considered. Specifically, patients are encouraged to reach out to study team members with suggestions on ways to improve the survey and the survey has been adjusted several times based on patient suggestions. In addition, study findings are regularly disseminated to patients via newsletters posted on the study website (https://covid-long.com/).

Baseline characteristics of JHCLS participants

Among 16 764 participants with a history of SARS-CoV-2 infection and defined long COVID status, the median age was 43 years, 84% were female, 88% self-reported white race and 8.0% self-reported Hispanic/Latino ethnicity (table 1). In terms of socioeconomic status, 70% of participants self-reported a bachelor’s degree or higher and 72% self-reported an annual household income of greater than or equal to $50 000. A diverse array of self-reported pre-existing comorbid conditions were reported, including hypertension (15%), depression/anxiety/other mental health conditions (35%), asthma/reactive airway disease/chronic lung disease (16%) and autoimmune disorders (9.6%) (table 2). In addition, the majority of participants (65%) were classified as overweight/obese based on a calculated body mass index (BMI) of >25.

Table 1

Baseline demographic characteristics of US participants in the Johns Hopkins COVID Long Study

Table 2

Baseline clinical characteristics of US participants in the Johns Hopkins COVID Long Study

Prior to COVID-19 illness, 75% of participants reported very good/excellent health status and 8.2% of participants reported fair/poor health status (table 2). During the acute phase of COVID-19 illness, 99% of participants reported experiencing at least one symptom. Of those, 90% reported cardiopulmonary symptoms (eg, new/worsening cough, shortness of breath, rapid heart rate), 89% reported systemic symptoms (eg, fatigue, muscle weakness, fever), 85% reported neuropsychiatric symptoms (eg, headache, dizziness, neuropathy) and 55% reported gastrointestinal symptoms (eg, vomiting, diarrhoea, lack of appetite). Overall, 9.9% of participants self-reported being hospitalised for their COVID-19 illness and 63% were defined as having long COVID based on the WHO definition.

At the time of study enrolment, 39% of participants with a history of infection reported not being vaccinated against SARS-CoV-2 compared with 56% who reported receiving at least a complete first vaccination series (table 2). The median number of days between initial SARS-CoV-2 infection and study enrolment was 173 days. While most participants reported experiencing their initial SARS-CoV-2 infection in 2020 (56%), 29% reported being infected in 2021 and 16% reported being infected in 2022.

Similar characteristics were found in participants without a history of SARS-CoV-2 infection who completed the baseline survey. Among 799 participants, the median age was 42, 85% were female, 82% self-reported white race and 5.9% self-reported Hispanic/Latino ethnicity (table 1). A higher percentage of participants without a history of infection self-reported ‘other’ race (13% compared with 5.4%) which was largely due to a greater number self-reporting Asian/Pacific Islander/Native Hawaiian race. With regard to socioeconomic status, a higher percentage of participants without a history of infection reported a bachelor’s degree or higher (84% compared with 70%) and 72% reported an annual household income of $50 000 or more. Comparable pre-existing comorbid conditions were reported: hypertension (13%), depression/anxiety/other mental health conditions (36%), asthma/reactive airway disease/chronic lung disease (12%) and autoimmune disorders (8.1%) (table 2). Based on calculated BMI, a slightly higher percentage of participants without a history of infection were classified as having a normal BMI (43% compared with 34%).

Of participants without a history of SARS-CoV-2 infection, 75% reported very good/excellent health status and 7.5% reported fair/poor health status prior to the COVID-19 pandemic (table 2). At time of study enrolment, a higher percentage of participants without a history of infection reported at least a complete first vaccination series (97% compared with 56%). It is worth noting that enrolment for participants without a history of infection opened up in March 2022 when vaccinations were more widely available, likely accounting for this difference.

The demographic and clinical characteristics between those who agreed to be contacted for future studies and those who declined were comparable, with the exception of long COVID status (online supplemental table 2). Unsurprisingly, more individuals who fully recovered declined continued participation in the study. However, the number of indeterminate individuals (too early to determine long COVID status) was similar.

Supplemental material

Strengths and limitations

The JHCLS is a large, online, prospective cohort study of adults with representation in 53 US states and territories. The baseline survey collects comprehensive clinical and behavioural data, including data related to COVID-19 diagnosis and treatment, health history, pre-existing health conditions, and physical, mental and cognitive limitations, and uses several reliable, validated scales to assess outcomes and exploratory variables. Participants are given the option to complete a one-time, anonymous online survey or to consent to longitudinal follow-up at predefined time intervals (every 3–6 months). In addition, the overall participant burden is minimal.

A major strength of the JHCLS is that a positive SARS-CoV-2 test is not required to be eligible to participate. We recognise that testing is often limited or inaccessible, and thus requires either a self-reported positive test or symptoms of COVID-19. In addition, our survey collects data on a wide range of organ systems using several different validated measures. Despite early studies focusing primarily on the respiratory symptoms associated with initial COVID-19 illness (eg, shortness of breath), we appreciate that SARS-CoV-2 may have notable effects on other organ systems following the acute period of infection as well.

Another strength of the JHCLS is the inclusion of participants without a history of SARS-CoV-2 infection which provides a natural control group, while also allowing for the determination of the incidence of long COVID among those who report a SARS-CoV-2 infection during follow-up. Importantly, both samples are comparable in terms of sociodemographic variables and pre-existing health conditions. We also recognise that many of the heterogeneous symptoms reported as long COVID may reflect all of us collectively living through a pandemic (ie, anxiety, depression). Thus, it is important that we compare those with and without infection to evaluate some of these outcomes during the same time frame (vs retrospective or historical controls).

In addition, there are few longitudinal studies focused on post-acute outcomes of COVID-19. Longitudinal studies provide an opportunity to evaluate change over time in exposures and outcomes. The longitudinal collection of data on new/continuing COVID-19 symptoms at each time point during follow-up will allow for evaluation of resolution and persistence of symptoms over time, as well as the impact of reinfection, vaccination and other health changes. To date, just under 7000 participants have consented to participate in longitudinal follow-up and have completed their first follow-up survey.

The JHCLS also has a few limitations. The reliance on self-reported clinical data, including self-reported SARS-CoV-2 tests, may result in recall and measurement bias. Although a confirmed SARS-CoV-2 test would be preferable, we recognise that tests were not available to everyone and that restriction to only those with a confirmed test would introduce selection bias. A second limitation is the possibility of selection bias due to the fact that the survey must be completed using a smart device or computer with internet access. This may preclude participants from lower socioeconomic statuses from participating. There is also a risk that findings from the JHCLS are not generalisable as the majority of participants self-reported white race, female gender and are from a higher socioeconomic status. To address this, we plan to do stratified-specific analyses that may be better representative of individuals within that same stratum. However, whether a study is representative or not depends not on demographics but on potential effect measure modifiers that may or may not include demographics.27 Additionally, results that may not necessarily be generalisable in the effect estimate may still be generalisable in the direction of effect (eg, protective or increased risk) of an exposure on outcome.27

Additionally, many individuals enrol many months after their acute infection when they already have long COVID. A potential selection bias would include increased likelihood of participation among those with more severe long COVID. However, it is important to note that the JHCLS has a subset of individuals (n=2020) who enrolled during their acute infection (within 4 weeks of infection). The high percentage of participants in our study with long COVID (63%) also likely reflects a selection bias on those willing to participate in COVID-19 research. However, those with and without a history of SARS-CoV-2 infection are similar in their demographic characteristics (table 1). Another limitation is the possibility of recall bias, especially among participants with a history of COVID-19 illness experiencing mental fatigue and/or other cognitive limitations at the time of survey completion.

A final limitation is the use of non-validated instruments to collect COVID-19-related data, including COVID-19 diagnosis, treatments and symptoms. We were limited by the unavailability of validated instruments to capture these domains at the time of study initiation. However, when available, we used validated instruments that targeted several specific domains (eg, anxiety, mental fatigue, etc), and when unavailable, we drew upon experience and validated instruments developed for other infectious diseases to develop questions used by our group and others across multiple COVID-19 studies.

Future plans

Moving forward, the JHCLS will continue to enrol additional participants with and without a history of SARS-CoV-2 infection and collect data from the baseline and longitudinal surveys. The study team is in the initial stages of analysing the longitudinal data collected thus far, focusing on the progression and resolution of long COVID symptoms over time. In addition, the study team is planning a cluster analysis of both initial and new/continuing COVID-19 symptoms to help address the broad WHO definition of long COVID. We plan to do this by bringing together the rich symptom data we have in our study with data on the impact each reported symptom has on daily functioning. In the future, the study team may apply for funding to answer additional research questions and to continue following the longitudinal participants for a longer period of time.

The JHCLS has the potential to impact both our overall understanding of long COVID and our ability to identify subgroups of individuals for targeted interventions. We can also capture real-time changes by SARS-CoV-2 variants (based on calendar time), location (using geospatial data), birth/age cohorts and/or vaccine data.

This post was originally published on https://bmjopen.bmj.com