Does the effect of adolescent health behaviours on adult cardiometabolic health differ by socioeconomic background? Protocol for a population-based cohort study


Optimal levels of cardiometabolic health factors, such as waist circumference, blood pressure and blood glucose, reduce the risk of diabetes and cardiovascular diseases.1–3 Yet, cardiometabolic health factors are unequally distributed in the population.4 Individuals experiencing socioeconomic disadvantage have, on average, worse cardiometabolic health and bear a higher burden from cardiovascular disease and death compared with those with less socioeconomic disadvantage.5 6 Part of this social gradient can be explained by differences in health behaviours, such as smoking, diet, physical activity and alcohol use.7 While it is well established that adolescence is a particularly sensitive period for initiating certain health behaviours, it remains unclear whether and to what extent the effect of health behaviours during adolescence on adult cardiometabolic health may vary across socioeconomic strata. Better understanding of the potential social patterning of these effects could inform policy initiatives aimed at improving population health and mitigating social inequalities in health.

Cardiometabolic health is an important aspect of positive health as it encompasses the proper functioning of the heart, blood vessels and liver, extending beyond the mere absence of manifest disease.8 9 Labarthe9 called the shift in focus from cardiometabolic diseases to cardiometabolic health a revolution in public health. Whether revolution or not, the concept of cardiovascular health can contribute to a shift from a high risk to a population approach, including prevention of risk factors before they manifest, that is primordial prevention. Good cardiometabolic health means that established biological risk factors for cardiometabolic disease are in the normal population ranges, and there is no evidence of manifest cardiovascular disease.10 These risk factors can be objectively measured through biomarkers, such as blood pressure, waist circumference, cholesterol, fasting glucose and insulin. Some of these biomarkers are also used as diagnostic factors (eg, fasting plasma glucose as an indicator of diabetes). However, even at subclinical levels, changes in these biomarkers can affect the development of disease.3 One example is overweight and subclinical elevated cholesterol levels as risk factors for cardiovascular disease.

At least two hypothetical mechanisms may explain how health behaviours contribute to social inequality in cardiometabolic health: the differential exposure to health behaviours and the differential effects of health behaviours.11 Empirical evidence for differential exposure overwhelmingly shows that health behaviours are unequally distributed across socioeconomic groups in childhood and adolescence.12 The differential effect is not a competing but complementary mechanism.13 Also known as differential susceptibility, it indicates that socioeconomic backgrounds may modify the effects of behaviours on cardiometabolic health. While differential exposure to health behaviours has received much attention, relatively few studies have assessed differential susceptibility to health behaviours.13–15

Through differential exposure and differential susceptibility to health behaviours, adolescence might be a pivotal contributor to adult health inequalities.16 To our knowledge, no prior study has investigated differential susceptibility to multiple health behaviours in adolescents. This is a significant knowledge gap because adolescence is a sensitive period of growth and development in which several health-related behaviours are established. We propose to examine behaviour clusters rather than single behaviours, as it has been shown that health behaviours such as smoking, alcohol use, diet and physical activity cluster within individuals.17 18 Health behaviours and their clusters appear to have high stability throughout the life course.19–21 Importantly, adolescent health behaviours might be particularly affected by socioeconomic contexts.16 The theory of health lifestyles indicates that health behaviours cluster as they are embedded in structural factors based on social norms and group identities,22 operating through socialisation in the family23 and with peer-group relationships.24 Therefore, analysing health behaviours without considering the underlying structural components would provide only a partial picture of the overall effects of health behaviours on health.25 Knowledge of whether differential susceptibility to behaviour clusters is at play might influence the choice of preventive strategies to reduce inequalities in cardiometabolic health.26 For instance, in the presence of a differential effect, it might be particularly important to combine population approaches with vulnerable group approaches, targeting groups and contexts that have increased susceptibility to risk factors.27

Socioeconomic factors act at different levels. In adolescence, as an individual’s own socioeconomic position is not yet established, socioeconomic influences come mainly from the family and neighbourhood. For example, parental education and occupational class may influence health behaviours via internalised habitus, health literacy, health control beliefs, access to social networks and means to achieve health goals.25 28 29 Parental education and parental occupational class represent distinct dimensions of family-level social influences.30 In this study, we use the term socioeconomic background as an umbrella. However, we build on a causal model where we assume parental education to precede parental occupational class which in turn is linked to neighbourhood socioeconomic deprivation. Neighbourhood socioeconomic deprivation can have effects on behaviours that are independent of the parental socioeconomic background, for instance, through deterring or facilitating access to physical activity or healthy food environments,31 32 or through peer-group relationships during socialisation.33 34 We, therefore, aim to examine both family and neighbourhood levels of socioeconomic background.

The analysis of differential susceptibility often relies on causal mediation analysis.26 However, contemporary causal mediation analysis requires numerous identifying assumptions. These include, among others, the absence of confounding in three relationships: exposure-outcome, exposure-mediator and mediator-outcome. Additionally, there is the cross-world independence assumption, which cannot be verified in real experiments.35 This complexity arises because the estimands related to mediation address both the direct and indirect effects of socioeconomic exposures on cardiometabolic health via health behaviours.

In our approach, we steer away from causal mediation analysis for estimating differential susceptibility to health behaviours. Instead, we operationalise the differential effect of health behaviours by considering an average causal effect conditional on the socioeconomic backgrounds under examination. By grounding our estimand in first principles, we manage to reduce the number of required identifying assumptions. The use of a counterfactual framework aids us in contrasting hypothetical worlds where everyone or no one is exposed. The resulting counterfactual estimand allows us to explicitly acknowledge and be transparent about the assumptions necessary for valid causal interpretation.

Methods and analysis

Our target population will be adolescents from high-income European countries. As the study population, we will draw on the Young Finns Study, which recruited 3596 individuals aged 3, 6, 9, 12, 15 and 18 in 1980 (83.2% response rate from a random sample of the Finnish population). Follow-ups were conducted in 1983, 1986, 1989, 1992, 2001, 2007 and 2011.32 Following a comprehensive school reform in the 1970s, those who were adolescents in Finland in the 1980s had a high participation rate in secondary education and school performance about average for high-income countries. Moreover, in the 1980s, Finland had a prevalence of cardiovascular diseases and risk factors such as overweight, which was high for the time but similar to that observed in many European countries in the recent past.37 38 Hence, Finnish adolescents during the 1980s may have experienced circumstances similar to those in other European high-income countries. The analytical sample will include everyone who participated as an adolescent (12–18 years of age) in the 1980 or 1989 survey and had any relevant biomarkers measured in 2001 or 2011 (33–40 years of age). Figure 1 illustrates the different measurement points used for those who were 3, 6 and 9 and those who were 12, 15 and 18 in 1980. Adolescent health behaviours and socioeconomic background will be measured in 1980 or 1989. Outcomes were measured in 2001 and 2011 (participants aged 33–40) after 21 or 22 years of follow-up.

Figure 1

Illustration of follow-up times and measurement points for the different birth cohorts. Baseline measures in 1980 and 1989 and measurement of the outcomes in 2001 and 2011. Those who were adolescents in 1980 were born in 1962, 1965 or 1968, while those who were adolescents in 1989 were born in 1971, 1974 or 1977. The duration of follow-up is 21 years for the former cohorts and 22 years for the latter cohorts.
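The cohort arithmetic behind figure 1 can be checked with a short sketch. It assumes only the years and ages stated above; the function name and dictionary keys are hypothetical and chosen for illustration.

```python
# Ages at the 1980 survey for the six birth cohorts (taken from the protocol).
AGES_IN_1980 = [3, 6, 9, 12, 15, 18]


def cohort_schedule(age_in_1980):
    """Return birth year, adolescent baseline and outcome timing for one cohort."""
    birth_year = 1980 - age_in_1980
    # Cohorts aged 12-18 in 1980 use the 1980 survey as baseline and 2001 for
    # outcomes; the younger cohorts reach adolescence in 1989 and use 2011.
    if age_in_1980 >= 12:
        baseline_year, outcome_year = 1980, 2001
    else:
        baseline_year, outcome_year = 1989, 2011
    return {
        "birth_year": birth_year,
        "baseline_year": baseline_year,
        "age_at_baseline": baseline_year - birth_year,
        "outcome_year": outcome_year,
        "age_at_outcome": outcome_year - birth_year,
        "follow_up_years": outcome_year - baseline_year,
    }


schedule = {age: cohort_schedule(age) for age in AGES_IN_1980}
```

Under these inputs, every cohort is aged 12, 15 or 18 at its baseline and between 33 and 40 at outcome measurement, in line with the ranges given in the text.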

Conceptual causal model

Based on existing literature and knowledge of the authors, we built a causal model around the main causal path of interest, namely the effect of health behaviour clusters in adolescence (exposure) on cardiometabolic health biomarkers in adulthood (outcome) (figure 2). To visualise the main confounding covariates considered, this graphical representation is a simplified version of the directed acyclic graph (DAG) described in online supplemental figure 1 and available online. Our a priori assumptions regarding the causal relationships between the considered variables are drawn in the detailed DAG, which informs our set of confounding variables to be included in the estimation models.


Figure 2

Conceptual causal model for adolescent health behaviours (exposure) to cardiometabolic health biomarkers (outcome). NM indicates not measured in this study. U stands for other factors.


Diet, physical activity, alcohol and smoking are key behavioural determinants of cardiometabolic biomarkers and common causes of a range of non-communicable diseases.39 40 These health behaviours may cluster within adolescents.17 41 While some behaviours (eg, diet and physical activity) might be established during childhood, adolescence is an important period of life to measure behavioural clusters because additional health risk behaviours such as alcohol use and smoking are established during this period. Furthermore, health behaviours track from adolescence into adulthood.19–21


We will measure cardiometabolic health with an outcome-wide approach.42 We will focus on eight biomarkers of multiple interconnected physiological systems that have been shown to be causative determinants of cardiometabolic disease. The first two are anthropometric biomarkers: (1) waist circumference43 44 and (2) body mass index (BMI). There is evidence that both waist circumference and BMI might causally affect cardiovascular diseases and diabetes.44 45 Moreover, we will examine (3) systolic and (4) diastolic blood pressure, as higher values increase the risk of cardiovascular disease.46–48 Findings from randomised controlled trials have established that lowering low-density lipoprotein cholesterol (LDL-c) can protect against cardiovascular disease.49 50 From blood samples, we will examine (5) fasting LDL-c, (6) fasting apolipoprotein B (apoB) and (7) fasting plasma glucose. Mendelian randomisation studies pointed to apoB as the primary lipid determinant of coronary heart disease.51 Fasting plasma glucose is one relevant diagnostic biomarker for type 2 diabetes. Finally, based on fasting plasma glucose and insulin, we will examine (8) the homeostasis model of insulin sensitivity (HOMA), as it is used to diagnose insulin resistance,52 which is a causal risk factor for cardiovascular diseases and type 2 diabetes.53 54

The other covariates reported in figure 2 are described in the online supplemental file A.

Minimally sufficient adjustment set

Through visual inspection of the DAG and the web-based tool dagitty, we identified the minimally sufficient adjustment set, that is the smallest set of confounding factors necessary to estimate the effect of interest based on the established causal model or DAG. The identified factors are a subset of all covariates in the DAG, namely sex, age at baseline survey, birth period, history of disability, adverse birth outcome, history of adverse social environment, history of chronic disease diagnosis, childhood overweight, adolescent mental disorder, adolescent peer position, neighbourhood deprivation, parental education in adolescence, parental occupational class in adolescence and parental smoking. Childhood overweight and mental health are considered unmeasured in the main analysis of this study as these are only assessed for half of the study participants, specifically those who were children (3–9 years of age) in 1980. Peer position is also unmeasured as there are no sociometric measures available in the data.

Operationalisation of the exposure

Health behaviours will be measured by means of self-reported questionnaires in individuals aged 12, 15 and 18 years or, when necessary, with support from parents. To identify clusters, we will consider smoking as current/past/never, alcohol use as hazardous/moderate/abstainer, physical activity as inactive/moderately active/sufficiently active and consumption of fruit and vegetables as sufficient/insufficient. For the effect estimation, we will also consider an alternative operationalisation based on dichotomous indicators for smoking (current/past or never), alcohol (user/abstainer) and physical activity (insufficiently active/sufficiently active).

Information on smoking was assessed in a solitary room to provide confidentiality. Gathered information included habitual and current smoking and the number of cigarettes, pipefuls, cigars or snus used up until the baseline assessment. Respondents who smoked daily or habitually will be considered current smokers. Those who reported never smoking across all questions will be classified as never smokers. Those who reported having tried smoking but are not current smokers will be classified as past smokers.

Regarding alcohol, participants were asked how often they drink beer, strong beer, wine, hard liqueurs, or spirits and how many times they had been heavily intoxicated by alcoholic beverages during their lifetime. Those reporting having never consumed any type of alcohol will be categorised as abstainers. Those who report either daily alcohol consumption or at least weekly consumption of strong alcohol or over 10 occasions of intoxication will be considered hazardous drinkers. Those who are neither abstainers nor hazardous drinkers will be categorised as moderate users.
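The three-way alcohol classification can be written as a small decision rule. The sketch below is illustrative only: the function and argument names are hypothetical and do not correspond to actual questionnaire variables, and it assumes the questionnaire items have already been reduced to the four inputs shown.

```python
def classify_alcohol(ever_drank, drinks_daily, strong_alcohol_weekly,
                     intoxication_count):
    """Classify alcohol use as 'abstainer', 'hazardous' or 'moderate'.

    All argument names are hypothetical placeholders for questionnaire items.
    """
    # Never consumed any type of alcohol.
    if not ever_drank:
        return "abstainer"
    # Daily consumption, at least weekly strong alcohol, or over 10
    # occasions of heavy intoxication.
    if drinks_daily or strong_alcohol_weekly or intoxication_count > 10:
        return "hazardous"
    # Neither abstainer nor hazardous drinker.
    return "moderate"
```

For example, a respondent who drinks occasionally and reports exactly 10 occasions of intoxication would be a moderate user, because the threshold is "over 10".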

Physical activity will be measured through a three-item questionnaire inquiring about the frequency of physical exercise of at least 30 min outside of school, participation in sport club training sessions and the intensity of normal exercise. Respondents who engaged in no regular sports club training and trained less than once per week outside school will be classified as physically inactive. Those who exercised either every day, 2–6 times weekly at a high intensity, or engaged in sport club training several times a week will be categorised as being sufficiently active. This classification of ‘sufficiently active’ is the closest possible to the physical activity guidelines for young people, which recommend at least 60 min of physical activity daily, and intense activities at least three times per week.55 Those who are neither sufficiently active nor inactive will be classified as moderately active.
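A minimal sketch of this classification follows. It makes two assumptions that are not in the protocol: the categorical questionnaire responses are represented as numeric weekly frequencies, and "several times a week" of sport club training is read as two or more sessions. All names are hypothetical.

```python
def classify_physical_activity(sessions_per_week, high_intensity,
                               club_sessions_per_week):
    """Classify activity as 'inactive', 'moderately active' or 'sufficiently active'.

    sessions_per_week: exercise sessions (>= 30 min) outside school (assumed numeric).
    high_intensity: whether normal exercise is of high intensity.
    club_sessions_per_week: sport club training sessions (assumed numeric).
    """
    daily = sessions_per_week >= 7
    vigorous_2_to_6 = 2 <= sessions_per_week < 7 and high_intensity
    # Assumption: "several times a week" is interpreted as >= 2 sessions.
    club_several = club_sessions_per_week >= 2
    if daily or vigorous_2_to_6 or club_several:
        return "sufficiently active"
    # No regular club training and less than one session per week.
    if club_sessions_per_week == 0 and sessions_per_week < 1:
        return "inactive"
    return "moderately active"
```

The two extreme categories are mutually exclusive under this reading, so the order of the checks does not change the result.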

Dietary habits will be measured through the frequency of fruit and vegetable consumption. These were obtained with a food frequency questionnaire asking about the frequency of consumption of selected foods during the last month. Among the 12 year olds, the parents answered the questionnaire, while the 15 and 18 year olds answered the questionnaire themselves. Intake of fruit and vegetables was assessed on a 6-point scale: 1 (daily), 2 (almost every day), 3 (a couple of times per week), 4 (about once a week), 5 (a couple of times per month) and 6 (more seldom). Guidelines recommend the promotion of daily fruit and vegetable intake.56 Fruit and vegetable consumption will be categorised into a dichotomous variable of sufficient/insufficient, with only daily or almost daily consumption of both fruit and vegetables considered sufficient.
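The dichotomisation of fruit and vegetable intake can be sketched as follows, using the 6-point scale described above (1 = daily, 6 = more seldom). The function and argument names are hypothetical.

```python
def classify_fruit_veg(fruit_score, vegetable_score):
    """Return 'sufficient' only if both items are daily (1) or almost daily (2)."""
    for score in (fruit_score, vegetable_score):
        if score not in (1, 2, 3, 4, 5, 6):
            raise ValueError("scores must be integers from 1 to 6")
    if fruit_score <= 2 and vegetable_score <= 2:
        return "sufficient"
    return "insufficient"
```

Note that lower scores mean more frequent consumption, so daily fruit combined with vegetables only a couple of times per week (score 3) is classified as insufficient.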

Operationalisation of the outcomes

All eight biomarkers will be examined as continuous variables. Waist circumference, height, weight and blood pressure were measured during a clinical examination. Waist circumference was measured three times and will be calculated as the average of those measurements in cm. BMI will be calculated as weight in kilograms divided by height in metres squared (kg/m²). Resting systolic and diastolic blood pressure were measured three times in a sitting position using a random zero sphygmomanometer. The average of the second and third measurements will be used in the analyses. Venous blood samples were taken after 12-hour fasting and used to estimate LDL-c, apolipoprotein B, insulin and glucose. LDL-c was calculated indirectly using the Friedewald formula in subjects with <4 mmol/L triglycerides. Standard methods were used to determine apolipoprotein B in g/L and fasting plasma glucose in mmol/L. Details of these measurements have been reported elsewhere.57 Serum insulin concentrations were measured using a microparticle enzyme immunoassay kit (Abbott Laboratories, Diagnostic Division, Dainabot, Tokyo, Japan). Subsequently, insulin resistance was estimated according to the homeostasis model assessment (HOMA index) as the product of fasting glucose and insulin divided by the constant 22.5.
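The derived outcome variables reduce to simple arithmetic, sketched below. The function names are hypothetical. The HOMA function assumes glucose in mmol/L and insulin in mU/L, which are the units conventionally paired with the constant 22.5; the protocol itself does not state the insulin unit.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def waist_circumference_cm(measurements_cm):
    """Average of the three waist measurements (cm)."""
    return mean(measurements_cm)


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def resting_blood_pressure(readings):
    """Average of the second and third of three readings."""
    if len(readings) != 3:
        raise ValueError("expected three readings")
    return mean(readings[1:3])


def homa_index(fasting_glucose_mmol_l, fasting_insulin_mu_l):
    """HOMA index: glucose x insulin / 22.5 (units assumed, see lead-in)."""
    return fasting_glucose_mmol_l * fasting_insulin_mu_l / 22.5
```

For instance, a fasting glucose of 5.0 mmol/L with an insulin of 9.0 mU/L gives a HOMA index of 2.0 under these assumptions.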

Operationalisation of the effect modifiers

We will operationalise socioeconomic backgrounds as dichotomised variables to optimise sample size in each stratum when estimating conditional effects.

Socioeconomic backgrounds will be considered at family and neighbourhood levels. At family level, the two contrasts will be manual versus non-manual occupational class and compulsory education versus continued education. Both education and occupational class were measured at the study baseline (1980 and 1989 surveys and, if not available, imputed from survey responses in 1983 or 1986). Parental highest attained education will be constructed by dichotomising the longer duration of education of the mother or father into less than 9 years, corresponding to compulsory education (ISCED-11: 0–2), and 9 years or more, corresponding to vocational, upper secondary or higher education (ISCED-11: ≥3).58 The original classification of 14 groups of parental occupations will be dichotomised into manual (including skilled and unskilled workers, and farmers) and non-manual (including higher and lower level administrative and clerical employees, entrepreneurs, managerial and professional occupations). In two-parent families, the data for the parent with the highest occupational class will be used.

At neighbourhood level, we will dichotomise neighbourhood socioeconomic deprivation into above the national mean versus at or below the national mean. A neighbourhood deprivation score was assigned to all Finnish residents in 250 m2 grids as described in detail elsewhere.32 The score is a mean value across three Z-scores derived from the proportion of adults with only primary education, unemployment rate and proportion of people living in rented housing. Participants’ exposure to neighbourhood deprivation was calculated as a residential time-weighted Z-score over the ages 6–21 years. The Z-score will be divided into two groups with 0 as the cut-off: yes (>0) or no (≤0) neighbourhood deprivation.
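As an illustration of how the neighbourhood measure is built up, the sketch below averages the three component Z-scores, weights the resulting score by years of residence, and dichotomises at the national mean. It treats scores above zero as deprived, consistent with higher component values indicating more deprivation. The function names and the input layout (a list of score and duration pairs) are assumptions made for this example; the actual score construction is described in the cited reference.

```python
def deprivation_score(z_primary_education_only, z_unemployment, z_rented_housing):
    """Mean of the three component Z-scores for one grid cell."""
    return (z_primary_education_only + z_unemployment + z_rented_housing) / 3


def time_weighted_score(residences):
    """Residential time-weighted score.

    residences: list of (deprivation_score, years_lived) pairs covering
    ages 6-21 (hypothetical input layout).
    """
    total_years = sum(years for _, years in residences)
    if total_years <= 0:
        raise ValueError("total residence time must be positive")
    return sum(score * years for score, years in residences) / total_years


def is_deprived(time_weighted_z):
    """Dichotomise at the national mean: above 0 counts as deprived."""
    return time_weighted_z > 0
```

A participant who lived 5 years in a grid scoring 0.6 and 10 years in a grid scoring -0.2 would have a time-weighted score of about 0.07 and fall in the deprived group.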

The operationalisation of the other covariates is described in online supplemental file A.

Statistical analysis

The analysis will be implemented in two steps corresponding to a descriptive and causal analytic task. First, we will run a latent class analysis to assess the presence of clusters of health behaviours (descriptive task). We will use the poLCA package in R. Latent class analysis aims to establish a categorical grouping (or latent trait) that underlies observed associations among the different variables of smoking, alcohol use, diet and physical activity. We will determine the best number of classes from model-fit statistics, entropy and using substantive interpretation. Results will inform the choice of single or multiple risk behaviours for the next step of the analysis.
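The class-enumeration criteria can be illustrated with a language-agnostic sketch. The analysis itself is planned in R with poLCA; the Python functions below are hypothetical helpers that only show commonly used definitions of the number of free parameters in an unconstrained latent class model, the Bayesian information criterion and relative entropy. They are not necessarily the exact statistics the authors will report.

```python
import math


def lca_n_parameters(n_classes, categories_per_item):
    """Free parameters: class proportions plus item-response probabilities."""
    return (n_classes - 1) + n_classes * sum(c - 1 for c in categories_per_item)


def bic(log_likelihood, n_parameters, n_obs):
    """Bayesian information criterion (lower indicates better fit)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_obs)


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means perfectly separated classes.

    posteriors: one row per individual with posterior class probabilities.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("relative entropy needs at least two classes")
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # 0 * log(0) is taken as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


# Indicators in this protocol: smoking (3), alcohol (3), activity (3), diet (2).
N_PARAMS_3_CLASSES = lca_n_parameters(3, [3, 3, 3, 2])
```

With the four indicators described above, a three-class model has 23 free parameters, which gives a sense of how quickly model complexity grows as classes are added.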

Second, we will assess the differential effect of the single health behaviours or behaviour clusters so defined on cardiometabolic biomarkers conditional on family-level and neighbourhood-level socioeconomic backgrounds (causal task). Specifically, our empirical estimand will be the difference in conditional average causal effects (cACE) between the two levels of socioeconomic background variables. Colloquially, the average causal effect among individuals of a certain socioeconomic background can be formulated as the average difference in the biomarker value that would have been observed if those adolescents had a cluster of more risky health behaviours (potentially contrary to the fact) versus engaging in more health-promoting behaviours (potentially contrary to the fact) and none were lost or died during the 21 or 22 years of follow-up. The difference in cACE across levels of socioeconomic background will allow us to assess effect modification and thus result in the differential effect of interest. To estimate the cACE, we will fit inverse probability-weighted marginal structural models under representative interventions59 (stratified and with an interaction term between exposure and a chosen socioeconomic background). Internal validity of the estimates requires various assumptions: no residual confounding, consistency, no interference, positivity, no measurement error, correct specification of estimation models and no selection bias from potential losses during follow-up.60 We address confounding via the causal model and the ensuing selection of the minimally sufficient adjustment set, and by considering quantitative bias analyses for omitted confounders. We will address the positivity assumption by computing stabilised inverse probability exposure weights, analysing their distribution and possibly truncating the weights.
We will address the no-model misspecification assumption by varying the model specification (1) of the exposure model used to obtain the inverse probability weights and (2) of the marginal structural model for the outcome. For addressing potential selection bias due to non-random losses during follow-up, we will consider the adjustment provided by inverse probability of censoring weights. The other assumptions will be discussed in the manuscript. CIs will be computed via bootstrapping.
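The logic of the estimand can be illustrated on a toy dataset. The sketch below is not the planned implementation: it assumes a binary exposure, a single binary confounder and a binary socioeconomic background, estimates stabilised weights nonparametrically from cell frequencies instead of fitting exposure models, and omits weight truncation, censoring weights and bootstrapped CIs. All names and the simulated values are hypothetical. With saturated strata, the weighted difference in means within each stratum corresponds to a marginal structural model with an exposure by background interaction.

```python
from collections import Counter


def stabilised_weights(rows):
    """Stabilised weights P(A=a | S) / P(A=a | L, S) from cell frequencies.

    rows: dicts with keys 'ses' (S), 'l' (confounder), 'a' (exposure), 'y'.
    """
    n_s = Counter(r["ses"] for r in rows)
    n_sa = Counter((r["ses"], r["a"]) for r in rows)
    n_sl = Counter((r["ses"], r["l"]) for r in rows)
    n_sla = Counter((r["ses"], r["l"], r["a"]) for r in rows)
    weights = []
    for r in rows:
        s, l, a = r["ses"], r["l"], r["a"]
        numerator = n_sa[(s, a)] / n_s[s]
        denominator = n_sla[(s, l, a)] / n_sl[(s, l)]
        weights.append(numerator / denominator)
    return weights


def conditional_ace(rows, weights, ses):
    """Weighted mean outcome difference (exposed minus unexposed) in one stratum."""
    def weighted_mean(a):
        pairs = [(w, r["y"]) for r, w in zip(rows, weights)
                 if r["ses"] == ses and r["a"] == a]
        return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)
    return weighted_mean(1) - weighted_mean(0)


def toy_rows():
    """Toy data: true effect 2.0 in stratum 0 and 4.0 in stratum 1.

    The confounder raises both the chance of exposure and the outcome.
    """
    rows = []
    for ses, effect in ((0, 2.0), (1, 4.0)):
        for l, n_exposed, n_unexposed in ((0, 2, 6), (1, 6, 2)):
            for a, n in ((1, n_exposed), (0, n_unexposed)):
                y = 10.0 + effect * a + 5.0 * l
                rows.extend({"ses": ses, "l": l, "a": a, "y": y}
                            for _ in range(n))
    return rows


rows = toy_rows()
weights = stabilised_weights(rows)
cace_stratum_0 = conditional_ace(rows, weights, ses=0)
cace_stratum_1 = conditional_ace(rows, weights, ses=1)
difference_in_cace = cace_stratum_1 - cace_stratum_0
```

In this toy example the unweighted comparison would overstate each stratum-specific effect by 2.5 units because of confounding, whereas the weighted contrasts recover the simulated effects, and their difference is the differential effect of interest.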

Missing data

We will evaluate the amount of missing data and the differences between the participants with and without missing data via standardised mean differences across completely measured variables. When necessary and as auxiliary variables are available (eg, own achieved education, adult profession, measurements from other waves, etc), we will consider multiple imputation by chained equations for missing at random data via the R package mice or inverse probability weighting of complete cases.
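For a continuous variable, one common definition of the standardised mean difference divides the difference in group means by the square root of the average of the two group variances. The sketch below uses that definition; the protocol does not specify which variant will be used, and the function name is hypothetical.

```python
import math
import statistics


def standardised_mean_difference(group_a, group_b):
    """SMD for a continuous variable using the average of the sample variances."""
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance")
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

Because the measure is scale-free, it can be compared across covariates measured in different units, such as age in years and a deprivation Z-score.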

Loss to follow-up

We will assess differences between the participants lost during follow-up and those who remained in the study across the measured covariates. If standardised mean differences are substantial (eg, >0.05), we will consider inverse probability of censoring weighting to adjust for non-random loss to follow-up.
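The idea behind inverse probability of censoring weighting can be shown with a frequency-based sketch. It assumes a single categorical covariate stratum per participant and unstabilised weights; in practice the retention probability would more likely come from a regression model on several covariates. The names and input layout are hypothetical.

```python
from collections import Counter


def censoring_weights(rows):
    """Weights 1 / P(retained | stratum); participants lost get weight 0.

    rows: dicts with keys 'stratum' (covariate pattern) and 'retained' (bool).
    """
    n_total = Counter(r["stratum"] for r in rows)
    n_retained = Counter(r["stratum"] for r in rows if r["retained"])
    weights = []
    for r in rows:
        if not r["retained"]:
            weights.append(0.0)
        else:
            # Each retained participant also stands in for those lost
            # from the same covariate stratum.
            weights.append(n_total[r["stratum"]] / n_retained[r["stratum"]])
    return weights
```

The weights of retained participants sum to the original sample size, so the weighted sample reproduces the baseline covariate distribution within the strata used.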

Sensitivity analyses

In the main analyses, we will estimate effect modification when the population is split into relatively even groups of higher and lower socioeconomic backgrounds. As Finland has a relatively strong middle class, such dichotomisation could potentially conceal inequalities between socioeconomic groups that are more at the margins of the socioeconomic distributions. Therefore, additional robustness checks will be conducted contrasting the lowest and the highest socioeconomic groups of three-level operationalisations of the socioeconomic variables. For parental education, we will use the following three groups: less than 9 years, 9–12 years and more than 12 years. For occupational class, we will use the following: manual, lower grade non-manual and higher-grade non-manual. For the neighbourhood level, in accordance with previous studies,32 we will add a third group around the mean and then set the cut-off for high neighbourhood socioeconomic disadvantage to >0.5 SD above the national mean and the cut-off for low neighbourhood socioeconomic disadvantage to 0.5 SD or more below the national mean.
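The three-level neighbourhood grouping for this sensitivity analysis can be sketched as below, assuming the deprivation score is expressed in SD units relative to the national mean. The function name and category labels are hypothetical.

```python
def neighbourhood_three_level(z_score):
    """Three-level grouping with cut-offs at 0.5 SD either side of the mean."""
    if z_score > 0.5:
        return "high disadvantage"
    if z_score <= -0.5:
        return "low disadvantage"
    return "around the mean"
```

The robustness check would then contrast the "high disadvantage" and "low disadvantage" groups and leave out the middle group.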

To address unmeasured confounding through childhood overweight, we will use a reduced sample having childhood overweight measured through BMI at age 9 years. The same will be done for childhood mental health. About half of the sample entered the study before age 12 and thus has a measurement of BMI before adolescence and one of mental health in 1989. Given the potential for confounding factors not included in the causal model, we will also consider applying a negative control exposure as part of the quantitative bias analyses.

In our study, the presence of two baselines (1980 and 1989) might lead to some participant loss between the two surveys. This could reduce the representativeness of our study sample. We will consider inverse probability weighting to adjust for these potential losses.

Data analysis is planned for March–April 2024.

Patient and public involvement

This project will not involve any patients or the public.
