COVID-19 profiles in general practice: a latent class analysis


COVID-19 emerged in China in late 2019. In France, around 38.5 million people have been infected to date, and around 161 000 of these died.1 General practitioners (GPs) were on the front line of the COVID-19 epidemic; they managed the less severe cases and referred the more serious ones to a hospital. Identifying patient profiles in COVID-19 and especially those at a greater risk of a poor outcome might help to improve initial care and manage complications as early as possible. Despite the scale of the COVID-19 epidemic, we are aware of only one study in which COVID-19 symptoms were self-reported via a smartphone application with a view to identifying patient profiles and the associated need for respiratory support.2 Hence, there are no literature data on COVID-19 profiles among patients consulting a GP and the corresponding associations with disease progression. Latent class analysis (LCA) is a patient-centred approach specifically designed to reliably identify subgroups of patients when they exist. LCA has been used successfully to investigate, characterise and validate disease subtypes, stratify patients into risk groups and predict treatment responses.3

We, therefore, sought to identify COVID-19 profiles (based on combinations of initial clinical symptoms and/or signs) in a population of adult primary care patients, using a hypothesis-free LCA. We then determined whether or not these profiles were associated with negative outcomes (COVID-19-related hospital admissions and deaths) at 3 months and persistent symptoms at 3 and 6 months.


Setting, design and participants

From 6 March 2020 to 12 May 2020 and from 19 September 2020 to 18 January 2021 (ie, during the first two waves of COVID-19 in France), we conducted a multicentre, prospective study in four counties in the Paris area (France): Val-de-Marne, Seine-et-Marne, Essonne and Seine-Saint-Denis. Forty-four GPs were recruited from multiprofessional primary care practices affiliated with the Faculty of Health at Université Paris-Est Créteil (France). During the first wave, the GPs recruited consecutive adult patients consulting for suspected COVID-19.4 During the second wave, only patients with a confirmed diagnosis of COVID-19 (ie, a positive RT-PCR test and/or a positive serology test, and/or a chest CT result suggestive of COVID-19, according to the French national guidelines)5–7 were included. The exclusion criteria were age under 18 years and residence in an institution. In the present analysis, we considered all patients with a confirmed diagnosis (figure 1). Patients were followed up for 3 months, and those with persistent symptoms at 3 months were followed up for 6 months.

Figure 1

Study flow chart. GP, general practitioner.

Data collection

Data were extracted from the GPs’ electronic medical records in November 2021. The extracted variables included demographic characteristics, comorbidities and the symptoms and signs of COVID-19 documented for the LCA (documented fever, chills, body aches, cough, sputum, respiratory discomfort, dyspnoea (on effort or at rest), chest pain, rhinorrhoea, odynophagia, ageusia, anosmia, headache, abdominal pain, diarrhoea, nausea, vomiting, asthenia, poor general condition, lung auscultation findings, blood pressure and heart rate).

Patients were followed up as usual by their GP, and all other consultations with healthcare professionals were recorded. Three months after inclusion, the GP phoned or visited the patient and collected data on persistent COVID-19 symptoms, related deaths and hospital admissions. Persistent COVID-19 symptoms were identified according to the GP’s usual clinical practice. We asked the GPs three questions: ‘Do you consider that the patient has recovered from COVID-19? If not, which symptoms persisted? Do you attribute these symptoms to the initial disease?’ Persistent symptoms (if any) were not rated on a scale or using a questionnaire. Patients with persistent symptoms at 3 months were contacted again at 6 months, and the same variables were recorded.

Illustrative variables and outcomes

To characterise the COVID-19 profiles identified in the LCA, we considered the comorbidities at baseline. To investigate the prognostic value of these profiles, we considered the following two outcomes: (1) a 3-month composite outcome that included COVID-19-related hospital admissions and deaths (the relatedness to COVID-19 was judged by consulting the hospital’s records) and (2) the persistence of COVID-19-related signs or symptoms 3 months and 6 months after inclusion. Lastly, we noted whether a patient had been referred to hospital by the GP in the month following the first consultation.

Statistical analysis

Quantitative and qualitative variables were described, respectively, as the median (IQR) and the number (%). The prevalence of persistent symptoms was calculated at 3 months (as a proportion of the whole study population) and at 6 months (for patients who had symptoms at 3 and 6 months, as a proportion of the whole study population less those lost to follow-up).

Indicators used to determine COVID-19 health profiles

We first considered all COVID-19 signs and symptoms and the lung auscultation findings as indicators. Given that some indicators were highly correlated, the investigators (EF, SB-G, EA and EM) selected the indicators that they considered to be most relevant. Very poorly documented variables (such as tachycardia and blood pressure) were not considered, and highly correlated variables were grouped together in relevant health domains; for example, abdominal pain, diarrhoea and nausea and/or vomiting were grouped together.

The investigators reached a consensus on 11 indicators, which were then used in the LCA (table 1). To characterise the LCs and predict class membership, we considered the following three active covariates: age, sex and the presence of at least one comorbidity.

Table 1

Indicator estimates for the six-class solution among the 340 COVID-19 patients

Latent class analysis

Based on the selected indicators, we iteratively fitted models comprising up to eight LCs. We then selected the model with the best statistical properties by using a variety of parsimony indices (table 2), since no single approach is universally accepted.8 9 Our one-step approach involved the simultaneous estimation of (1) the LC model of interest and (2) a logistic regression model in which the LCs were related to the active covariates listed above. The LCA was performed using Latent Gold software (V.5.0, Statistical Innovations, Belmont, Massachusetts, USA). We performed a sensitivity analysis with imputation for missing lung auscultation data, using Latent Gold’s multiple imputation procedure. In a second sensitivity analysis, the LC data were stratified on the wave of COVID-19.
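As an illustration only, the logic of fitting models with an increasing number of classes and comparing them with a parsimony index can be sketched as follows. This is not the Latent Gold procedure used in the study. It is a minimal expectation-maximisation sketch that assumes binary indicators, omits the active covariates and the bootstrapped likelihood ratio test, and uses the standard Bayesian information criterion (BIC) rather than the sample size-adjusted criterion. The function names (`e_step`, `fit_lca`, `bic`) and the toy data are hypothetical.

```python
import math
import random


def e_step(data, weights, probs):
    """Return posterior class probabilities for each row and the log-likelihood."""
    posteriors, loglik = [], 0.0
    for row in data:
        joint = []
        for w, class_probs in zip(weights, probs):
            p = w
            for x, q in zip(row, class_probs):
                # q is P(indicator present | class); indicators are assumed
                # to be independent within a class.
                p *= q if x else 1.0 - q
            joint.append(p)
        total = sum(joint)
        loglik += math.log(total)
        posteriors.append([p / total for p in joint])
    return posteriors, loglik


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model with binary (0/1) indicators by EM."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(n_classes)]
    for _ in range(n_iter):
        posteriors = e_step(data, weights, probs)[0]
        # M-step: update class sizes and conditional probabilities.
        for k in range(n_classes):
            nk = sum(post[k] for post in posteriors)
            weights[k] = nk / n
            for j in range(m):
                num = sum(post[k] * row[j] for post, row in zip(posteriors, data))
                # Light smoothing keeps estimates strictly inside (0, 1).
                probs[k][j] = (num + 1e-6) / (nk + 2e-6)
    posteriors, loglik = e_step(data, weights, probs)
    return weights, probs, posteriors, loglik


def bic(loglik, n_obs, n_classes, n_indicators):
    """Standard BIC; lower values indicate a better trade-off of fit and parsimony."""
    n_params = (n_classes - 1) + n_classes * n_indicators
    return -2.0 * loglik + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Toy data (hypothetical): two clearly separated symptom patterns.
    toy = [[1, 1, 1, 0]] * 60 + [[0, 0, 0, 1]] * 60
    for k in (1, 2, 3):
        loglik = fit_lca(toy, k)[3]
        print(k, round(bic(loglik, len(toy), k, len(toy[0])), 1))
```

In practice, several random starts per model would be needed to limit the risk of local maxima, and no single index should be relied on to choose the number of classes.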

Table 2

Goodness-of-fit indices for latent class models comprising one to eight classes (n=340)

Characterisation of the patient profiles

To characterise the identified profiles, we used posterior probabilities to assign patients to their most likely LC. The prevalence of comorbidities and outcomes was compared across LCs using the χ2 test, Fisher’s exact test or the Kruskal-Wallis test, as appropriate. Post hoc pairwise comparisons were performed when the p value was ≤0.05.
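The assignment of each patient to the most likely LC (modal assignment) can be sketched as follows. The function name `assign_modal_class` and the posterior probabilities shown are hypothetical and serve only to illustrate the rule; they are not study data.

```python
from collections import Counter


def assign_modal_class(posteriors):
    """Return, for each patient, the 1-based index of the class with the
    highest posterior probability (ties go to the lowest-numbered class)."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in posteriors]


if __name__ == "__main__":
    # Hypothetical posterior probabilities for three patients and three classes.
    example = [
        [0.10, 0.70, 0.20],
        [0.60, 0.30, 0.10],
        [0.05, 0.15, 0.80],
    ]
    labels = assign_modal_class(example)
    print(labels, Counter(labels))
```

Modal assignment ignores the uncertainty carried by the posterior probabilities, which is one reason why the classification quality is reported alongside the class sizes.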

All tests were two sided, and the threshold for statistical significance was set to p≤0.05. P values from multiple pairwise comparisons were corrected using the false discovery rate method.10 Statistical analyses were performed with Stata software (version 14.2, StataCorp, College Station, TX).
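For readers unfamiliar with the correction, a minimal sketch is given below. It assumes that the false discovery rate method corresponds to the Benjamini-Hochberg step-up procedure; the function name `fdr_adjust` and the p values are hypothetical. With six LCs, each global comparison gives rise to 15 pairwise comparisons.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never decrease as the raw p values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical pairwise p values
    for p, q in zip(raw, fdr_adjust(raw)):
        print(p, round(q, 3), "significant" if q <= 0.05 else "not significant")
```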

Patient and public involvement

All patients received a study information sheet and gave their verbal consent to participation. Patients were not involved in the study design, conduct or reporting or plans for dissemination.


Study population

During the study period, 340 COVID-19 patients were included (figure 1). The median (IQR) age was 47 years (35–57), 202 were female (59.4%) and 163 (47.9%) had at least one comorbidity (table 1). Of the 340 patients, 24 (7.4%) were hospitalised in the first 3 months of follow-up, and 58 (out of 323 with data, 18%) still had persistent symptoms at 3 months. The most frequent symptoms were asthenia (6.8%), anosmia (5.6%) and dyspnoea (5%). At 6 months, 20 (6.4%) of the 311 patients still had persistent symptoms.

Determination of COVID-19 profiles

A six-class solution showed the best fit, with a non-significant bootstrap p value, the lowest sample size-adjusted Bayesian information criterion, a significant improvement in fit compared with the five-class solution and no improvement in fit for a seven-class solution, using the bootstrapped likelihood ratio test (table 2). The classification quality was good (entropy=0.83).
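The entropy statistic can be illustrated with the sketch below, which assumes the commonly used relative (normalised) entropy: 1 minus the summed entropy of the posterior class probabilities divided by N ln K, where N is the number of patients and K the number of classes. Values close to 1 indicate well-separated classes. Whether this is the exact formula reported by the software is an assumption, and the function name `relative_entropy` is hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy of a classification; requires at least two classes."""
    n, k = len(posteriors), len(posteriors[0])
    # Terms with p = 0 contribute nothing (p * ln p tends to 0).
    total = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))


if __name__ == "__main__":
    # Hypothetical posterior probabilities for three patients and two classes.
    example = [[0.95, 0.05], [0.10, 0.90], [0.80, 0.20]]
    print(round(relative_entropy(example), 2))
```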

The conditional probabilities of indicators and covariates for each of the six LCs (ie, the probability of each indicator being present in class members) are summarised in table 1. Of the 340 patients, 30 (9.0%) belonged to LC1, 43 (12.9%) to LC2, 52 (15.5%) to LC3, 91 (24.5%) to LC4, 65 (20.4%) to LC5 and 59 (17.7%) to LC6. We named the LCs as follows: LC1=‘paucisymptomatic’; LC2=‘anosmia and/or ageusia’; LC3=‘influenza-like syndrome with anosmia and ageusia’; LC4=‘influenza-like syndrome without anosmia or ageusia’; LC5=‘influenza-like syndrome with respiratory impairment’ and LC6=‘complete form’. LC1 and LC2 were also characterised by the lowest probabilities of asthenia and poor general condition (0.12 and 0.01, respectively), and LC5 was characterised by the highest probability of abnormal lung auscultation findings (0.37). Among the active covariates, age and the presence of at least one comorbidity were significantly associated with class membership (p=0.017 and p=0.04, respectively). More than half of the members of LC1, LC5 and LC6 had at least one comorbidity. The members of LC1, LC4, LC5 and LC6 were significantly older (with a median age ranging from 49 (39–59) to 51 (40–61)) than members of LC2 and LC3 (median age 40 (28–49) and 37.5 (32–50), respectively) (tables 3 and 4). The sensitivity analysis with imputation of missing data for lung auscultation (n=91) gave similar results (online supplemental tables S1 and S2). After stratification on the wave of COVID-19, all LCs were present. LC4, LC5 and LC6 were the most prevalent in wave 1, whereas LC4 was the most prevalent in wave 2 (online supplemental table S3).


Table 3

Global comparison of illustrative variables and 3-month and 6-month outcomes in the six latent classes (LCs) identified (using posterior probability classification; n=340)

Table 4

Pairwise comparison of illustrative variables and 3-month and 6-month outcomes in the six latent classes (LCs) identified (using posterior probability classification; n=340)

Characteristics of the clinical profiles

The prevalence of hypertension (but not that of other comorbidities) significantly differed from one LC to another (p=0.006); the patients in LC1, LC4 and LC5 were more likely to have hypertension than those in LC2 and LC3. At the 3-month follow-up time point, none of the patients had died but 24 had been admitted to hospital. The LCs differed with regard to the frequency of hospital admission (p=0.016), with a higher rate in LC5 than in LC2 and LC6. Two-thirds of the referrals to hospital by a GP occurred in the month after the first consultation (tables 3 and 4).

The LCs also differed with regard to the prevalence of persistent symptoms (p=0.002). Patients in LC5 and LC6 were more likely to have persistent symptoms at 3 months than those in LC1, LC2 and LC4. The most prevalent persistent symptoms were asthenia and anosmia in LC6 and asthenia and dyspnoea in LC5. At 6 months, there were no significant differences in the prevalence of persistent symptoms between the LCs (p=0.096).



Our data-driven analysis of a population of COVID-19 patients managed by GPs identified six profiles, namely, ‘paucisymptomatic’ (LC1, 9.0% of the participants), ‘anosmia and/or ageusia’ (LC2, 12.9%), ‘influenza-like syndrome with anosmia and ageusia’ (LC3, 15.5%), ‘influenza-like syndrome without anosmia or ageusia’ (LC4, 24.5%), ‘influenza-like syndrome with respiratory impairment’ (LC5, 20.4%) and a ‘complete form’ (LC6, 17.7%). Age and the presence of at least one comorbidity were associated with class membership. At 3 months, 7.4% of the patients had been admitted to hospital (with a higher incidence in LC5 than in LC2 and LC6), and 18% had persistent symptoms. Persistent symptoms at 3 months were more prevalent in LC5 and LC6 than in LC1, LC2 and LC4. At 6 months, 20 patients (6.4%) still had persistent symptoms; the LCs did not differ significantly in this respect.

Strengths and limitations

To the best of our knowledge, this study is the first to have defined clusters of COVID-19 patients followed up in general practice. The one-step clustering method used here is known to minimise classification errors, and the use of a bootstrap method and several parsimony indices has increased the reliability of our six-class solution.9 The first and major limitation of our study relates to the population size and the limited number of events used to validate our classification. Second, our study period did not encompass all the COVID-19 variants (such as Omicron, found in subsequent waves). Although the COVID-19 variants had different prevalences,11 the signs and symptoms of disease were always the same; hence, the presence of other COVID-19 variants is unlikely to have affected the nature and number of LCs obtained in our analysis. After stratification on the wave of COVID-19, only the frequency of the LCs differed. Although vaccination may prevent long-term symptoms,12 the literature data on the nature and frequency of long-term symptoms associated with the different variants are contradictory.13–15 Long-term symptoms have also been reported in people infected with the Omicron variant. Third, the hospitalisation dates were not available. However, we did have information on the date of referral to the hospital by the patient’s GP. Lastly, our study was limited to the greater Paris region and so might not be representative of the French population as a whole.

Comparison with the literature

In line with a previous longitudinal study in which symptoms were self-reported via a smartphone application,2 we identified six clinical profiles. However, the two studies differed substantially with regard to the clinical profiles. These disparities might be due to differences in methodology (consultation with a GP vs self-reporting of symptoms via an application; an LCA vs unsupervised 5-day time series clustering; and the indicators used), and especially in the characteristics of the study population (patients consulting their GP vs patients able and willing to record their symptoms at least three times over 4 days or more). Sudre et al reported on more severe cases because about 20% of their study population had a hospital visit and 6% needed respiratory support (vs 7.4% and 0.9%, respectively, in our study). In summary, Sudre et al identified two mild forms (clusters 1 and 2, characterised by upper respiratory tract symptoms) and four clusters (3–6) with higher proportions of patients requiring respiratory support (ranging from 8.6% to 19.8%). Cluster 3 was chiefly characterised by gastrointestinal symptoms, and the patients in clusters 5 and 6 (which could have been called ‘complete forms’) had the highest number of symptoms and the highest hospital admission rates (27.2% and 45.5%, respectively). In our study, the highest hospital admission rate was observed for LC5 (chiefly characterised by the highest prevalence of abnormal lung auscultation findings), rather than for the ‘complete form’ (LC6). It is well known that respiratory impairment with abnormal lung auscultation findings is associated with more severe COVID-19.16 This observation underlines the relevance of the clinical examination performed in our study. Moreover, most of the hospital admissions in our study occurred early in the course of disease and were, therefore, more likely to be related to the severity of COVID-19 than to long-term symptoms.

The form with ‘anosmia and/or ageusia’ (LC2) was characterised by the lowest prevalence of asthenia and was similar to Sudre et al’s cluster 1.2 Apart from LC6 (the complete form), the patients in the classes with the highest probabilities of anosmia and/or ageusia (LC2–3) were younger than those in all the other classes.17 It should be noted that in another LCA-based study of COVID-19 symptoms, the study population comprised both COVID-19 patients and non-COVID-19 patients.18

The demographic characteristics of our COVID-19 patients consulting a GP were similar to those reported in the literature—especially with regard to age and the most frequent comorbidities (hypertension and diabetes).19 Likewise, the prevalences of COVID-19 symptoms in our study (including asthenia, fever, cough, myalgia and headache) were similar to those in the literature.19–22 Anosmia and ageusia were also common and are considered to be specific for COVID-19.17 22 In line with the literature data, our patients’ course of disease was generally mild and did not often require hospital admission.5 19 23 Various studies have shown that hospital admission and deaths are associated with older age,24 male sex25 and the presence of comorbidities.19 26

At the 3-month time point, we observed persistent COVID-19 symptoms in 18% of the patients. This result is in line with the values (from 10% to 30%) observed in other French studies.24 27 The main risk factor for developing persistent symptoms was membership of LC5 (‘influenza-like syndrome with respiratory impairment’) or LC6 (‘complete form’). These findings are also consistent with literature data showing that the presence of more than six initial symptoms28 and hospital admission (most common in LC5)28–30 are risk factors for persistent symptoms. As found in the literature, the most frequent symptoms observed at 3 and 6 months in this study were asthenia, anosmia and dyspnoea.31–33

Applicability of the findings

Some studies have shown that loss of smell in COVID-19 patients was less common during the Omicron wave than during the Delta wave.11 In contrast, sore throat was more common during the Omicron wave than during the Delta wave.11 Furthermore, the hospital admission rate was lower during the Omicron wave than during the Delta wave.11 However, although the prevalence of signs and symptoms may have differed across variants, the signs and symptoms themselves remained similar. We, therefore, believe that these potential differences had little effect on the nature of the LCs but might have influenced their prevalence. The literature data on the long-term symptoms associated with the different variants are contradictory, although vaccination might prevent these symptoms.12–15 One study showed that persistent symptoms after SARS-CoV-2 infection were more common before the Delta wave than during the Delta and Omicron waves.13 However, the fact that these differences were no longer statistically significant after adjustment for vaccination status suggested that COVID-19 vaccines reduced the risk of long-term symptoms.13 A review found that compared with previous variants of SARS-CoV-2, Omicron infections were associated with fewer long-COVID symptoms; however, the small number of studies and the lack of controls for potentially confounding variables (eg, reinfections and vaccination status) in some studies limited the generalisability of the results.14 It appears that individuals infected with the wild-type variant were more likely to develop long-COVID symptoms. In contrast, the results of another review suggested that there are no significant intervariant differences in long COVID-19 other than for certain general symptoms (with the Alpha and Omicron variants) and difficulty sleeping (for the wild-type variant).15 These findings emphasise the need to identify patients with an elevated risk of developing long-term symptoms (eg, non-vaccinated patients and/or patients infected with previous variants).


By using a data-driven approach to analyse COVID-19 signs and symptoms, we identified six clinical profiles among patients managed by their GP. Our results highlighted associations with hospital admission and the persistence of symptoms at 3 months. Since most studies of the presentation and clinical course of COVID-19 have been hospital-based, it is important to provide primary care-specific data that might help GPs to optimise patient management. GPs diagnose the majority of patients with COVID-19 and thus have an essential role in combating the ongoing pandemic.34 35 Our results might help GPs to (1) identify at-risk profiles for hospital admission and persistent symptoms, (2) set up procedures for closer follow-up, (3) anticipate possible worsening35 and (4) manage complications as early as possible. The higher prevalence of persistent symptoms in some COVID-19 profiles suggests that the corresponding patients should be followed up by their GPs, who are well placed to take account of the disease’s impact on quality of life and overall health via a patient-centred approach.36 Thus, our findings may help GPs to improve the follow-up of COVID-19 patients in primary care.


We thank Camille Hug and Faïka Bacar for help with data collection and interpretation. We also thank all the study participants and the participating health centres. Lastly, we thank Dr David Fraser (Biotech Communication SARL, Ploudalmézeau, France) for copy-editing assistance.
