Measurement properties of the Mental Health Literacy Scale (MHLS) validation studies: a systematic review protocol


Mental health is an integral part of overall health and well-being. Global rates of mental disorders are significant, with depression alone affecting over 280 million people.1 Personal health literacy (HL) is defined as ‘the degree to which individuals have the ability to find, understand, and use information and services to inform health-related decisions and actions for themselves and others’.2 Mental HL (MHL), a derivative and component of HL,3 is defined as the ‘knowledge and beliefs about mental disorders which aid their recognition, management or prevention’.4 Jorm elaborated on the original definition of MHL to encompass the following: understanding ways to prevent mental illness, recognising early signs and symptoms of mental illness, being aware of various help-seeking choices and treatments, awareness of self-help methods, and mental health first aid skills to help and support people who have mental illness.5 Accordingly, MHL consists of the following attributes: the ability to identify specific disorders, knowledge of how to obtain mental health information, knowledge of risk factors and causes, self-care methods and available professional assistance, and attitudes that encourage recognition and appropriate help-seeking.4 Research regarding MHL has covered a wide range of topics, including stigma, help-seeking behaviours and the mental health difficulties experienced by different vulnerable groups.6 MHL therefore plays a crucial role in enhancing individuals’ mental well-being by helping them identify their symptoms, find available resources, and obtain the necessary support.7 8

Using validated instruments to assess MHL is vital for developing successful strategies to promote mental health. These instruments can also assist academics and policymakers in identifying knowledge gaps in MHL and designing culturally appropriate solutions tailored to various individual and community needs.9 Developing an MHL instrument requires having a clear operational definition of the construct.3 10 Historically, this construct has been evaluated using two approaches, namely the Vignette Approach and the Scale-based Measurements.11 The Vignette Approach is described as ‘stories about individuals and situations which refer to important points in the study of perceptions, beliefs, and attitudes’.12 This approach has limitations, such as the inability to compare items within the scale, understand the differences between MHL components, and track improvement over time. Scale-based measurements, also called patient-reported outcome measures (PROMs), are ‘measurement instruments that patients complete to provide information on aspects of their health status that are relevant to their quality of life, including symptoms, functionality, and physical, mental and social health’.13

Following a systematic assessment of MHL instruments in 2014, O’Connor and Casey designed the Mental Health Literacy Scale (MHLS) to address these limitations and to produce a valid and reliable assessment tool for MHL.11 The rigour with which the MHLS was developed and its subsequent psychometric properties have made it the most reliable and validated instrument for assessing MHL.14 The scale showed adequate content and structural validity, good test–retest reliability and internal consistency (α=0.873).11 In addition, the MHLS is the only available instrument to measure all aspects of MHL.15

The MHLS is a unidimensional measurement scale with 35 items and 6 attributes based on Jorm’s six MHL attributes.4 The scale items were generated using a combination of adaptation of existing MHL items, descriptors from the Diagnostic and Statistical Manual of Mental Disorders DSM-IV-TR21, national and international data, and the clinical experience of the authors and their clinical panel, who advised on item generation. The scale score ranges from 35 to 160, with a higher score implying a higher level of MHL. The scale has the following sections: Recognition of Disorders (8 items measured on a 4-point Likert scale), Knowledge of Risk Factors and Causes (2 items measured on a 4-point Likert scale), Self-Treatment Knowledge (2 items measured on a 4-point Likert scale), Knowledge of Professional Help Available (3 items measured on a 4-point Likert scale), Knowledge of How to Seek Mental Health Information (4 items measured on a 5-point Likert scale) and Attitudes that Promote Recognition and Appropriate Help-Seeking (16 items measured on a 5-point Likert scale), with items 10, 12, 15 and 20–28 as reverse-scored items.11
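The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes that items are numbered 1 to 35 in the order the sections are listed (so items 1–15 use the 4-point scale and items 16–35 the 5-point scale) and that responses are coded from 1 upwards; the function and constant names are hypothetical and are not taken from the MHLS documentation.

```python
# Illustrative sketch only. Assumptions: items are numbered 1-35 in section
# order, responses are coded 1..4 or 1..5, and all names are hypothetical.
FOUR_POINT_ITEMS = range(1, 16)    # 8 + 2 + 2 + 3 items on a 4-point scale
FIVE_POINT_ITEMS = range(16, 36)   # 4 + 16 items on a 5-point scale
REVERSE_ITEMS = {10, 12, 15, *range(20, 29)}  # items 10, 12, 15 and 20-28


def item_max(item):
    """Return the highest response option for an item (4 or 5)."""
    return 4 if item in FOUR_POINT_ITEMS else 5


def mhls_total(responses):
    """Sum a complete set of responses, reversing the reverse-scored items.

    `responses` maps item number (1-35) to the raw response value.
    """
    if set(responses) != set(range(1, 36)):
        raise ValueError("responses must cover items 1 to 35 exactly")
    total = 0
    for item, raw in responses.items():
        top = item_max(item)
        if not 1 <= raw <= top:
            raise ValueError(f"item {item}: response {raw} is out of range")
        # A reversed item maps 1 -> top, 2 -> top - 1, and so on.
        total += (top + 1 - raw) if item in REVERSE_ITEMS else raw
    return total
```

Under these assumptions the lowest and highest possible totals are 35 and 160, which matches the score range stated in the text.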

The scale has been used in various cultural and language contexts, making it a valuable instrument for cross-cultural research studies.16 Modification and cultural adaptation of research instruments have numerous advantages over creating new ones. They permit comparisons of research outcomes from different cultures, facilitate international scientific collaboration, and reduce costs and time.17 18 According to Arafat et al,17 cross-cultural validation involves translation, adaptation, measurement of reliability (repeatability and internal consistency), evaluation of validity (content validity, face validity, construct validity and criterion validity) and responsiveness.

While the MHLS has been culturally adapted and translated into numerous languages, comprehensive reviews of the adapted versions are lacking, leaving minimal evidence regarding their measurement properties.16 19 This study therefore aims to critically examine, summarise and compare the measurement properties of all language versions of the MHLS by systematically examining the methodological quality and findings of the available publications. The review is important to researchers aiming to measure MHL in diverse settings, and its findings will help academics, clinicians and policymakers understand the MHLS’s reliability and validity in various cultural and language contexts. Furthermore, the review will contribute to the theoretical framework surrounding MHLS validation, guide future research initiatives and facilitate collaboration among researchers in the field of MHLS validation.

The objectives of this study are:

  1. To summarise the adaptation/validation processes employed in MHLS validation studies.

  2. To assess the methodological quality of the measurement properties of the MHLS across several language versions.

  3. To compare and synthesise the findings of studies that examined the measurement properties of the MHLS in different language versions, such as its reliability, validity and responsiveness, by qualitatively summarising or quantitatively pooling the results.


This systematic review will be conducted between September 2023 and December 2023. This protocol adheres to items outlined under the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) protocol.20 The proposed systematic review will adhere to the Joanna Briggs Institute (JBI) Manual for Evidence Synthesis (Chapter 12: Systematic Reviews of Measurement Properties)21 and the COSMIN methodology for systematic reviews of PROMs.22 The results will be presented according to PRISMA 2020.23 The systematic review methodology is summarised in figure 1. The study is registered at PROSPERO.

Figure 1

Systematic review methodology summary. MHLS, Mental Health Literacy Scale. SD, Standard Deviation.

Patient and public involvement


Search strategy

The review will begin with forming a research team of individuals with content and methodological competencies.24 The team will advise on the overarching research question and the entire study protocol, including identifying the search terms and databases. The review will be conducted in four stages per the JBI Standards.21

In the first stage, an initial search of the PubMed database will be conducted using a sensitive search filter25 to find studies on the measurement properties of the MHLS (see online supplemental file 1A). The initial search will follow ‘Filter 1: Sensitive search filter for measurement properties’, which has a reported sensitivity of 97.4% and precision of 4.4% (table 1). In the second stage, we will search the electronic scientific databases PsycINFO, CINAHL, Scopus, MEDLINE, Embase (Elsevier), PubMed (NLM) and ERIC using the final Boolean expression created in the first stage (see online supplemental file 1B). In the third stage, the reference lists of all papers included in the second stage will be examined to locate additional relevant publications for inclusion. In the final stage, the MHLS creators will be contacted to identify validation studies not retrieved in the previous searches.


Table 1

Systematic review search strategy

We have already identified the search filters (see online supplemental file 1A). These were combined with search terms for the construct of interest (mental health literacy) ‘AND’ the measurement instrument of interest (MHLS). No population search terms were added because there were no restrictions on population type, age or setting. These searches were paired with the measurement properties search filter to locate all studies on the measurement properties of the MHLS in any population. We used the sensitive filter for a more thorough search and the exclusion filter to eliminate irrelevant records, such as case studies and animal studies.

Study screening and selection

The screening and selection approach will be summarised using the PRISMA flowchart.23 Our review question and inclusion criteria are framed using the PICO (Population, Instruments, Construct, Outcomes) method.21 Eligibility criteria, as shown in table 2, are as follows: (1) Participants: The review will consider studies that validate the MHLS in any population (eg, community samples, students, perinatal patients or health professionals) without restricting participants’ age group; (2) Context: The review will consider all primary research that validated the MHLS in any setting worldwide (eg, acute care, primary healthcare or the community); (3) Instrument and Construct: The review will focus solely on the O’Connor and Casey MHLS;11 (4) Outcomes: The measurement properties (reliability, validity and responsiveness) of adapted versions of the MHLS, listed in table 3, will be assessed and reported for each individual study;21 (5) Types of Sources: The review will consider published primary studies that empirically validate the MHLS, including translation and cultural adaptation, and reliability and validity testing using various statistical analyses.17 The aim of the included studies should be the evaluation of one or more measurement properties.22 Studies that only use the MHLS as an outcome measure will be excluded; (6) Language: Only papers published in English will be eligible for review. Non-English publications will be excluded during the screening phase; (7) Date: Since the MHLS was created in 2015, only studies published between 2015 and 2022 will be considered.

Table 2

Systematic review inclusion and exclusion criteria

Table 3

Systematic review outcomes: measurement properties

The retrieved literature will be imported into Covidence, which will be used to identify and remove duplicates. The publications will then be screened in two steps: titles and abstracts will be reviewed first, followed by the full text. Two reviewers (RE and MA) will independently screen the retrieved abstracts against this review’s prespecified eligibility criteria. The author of the MHLS will be contacted to identify additional studies, and citations will be searched for additional articles. The two reviewers will meet at the beginning, midpoint and end of the abstract review process to discuss concerns and uncertainties relating to study selection and, if necessary, alter the search approach. Two researchers (RE and MB) will then independently review the full manuscripts. A third reviewer (IE) will make the final judgement when there is disagreement over study inclusion. IE and MA are experienced professionals and scholars in public health, and RE and MB are doctoral candidates in public health, so the team is well placed to select and review articles for this study. EM will provide methodological guidance to the research team. The reasons for excluding full-text papers that do not meet the inclusion criteria will be documented and reported. Finally, the included articles will be retained for synthesis.

Data charting

Using the Microsoft Excel 365 spreadsheet template that the reviewers adapted from the COSMIN website,26 two independent reviewers will perform the data extraction and the methodological quality assessment of full-text articles that meet the inclusion criteria. Before beginning the review, we will conduct calibration exercises, such as piloting the forms on two studies, to ensure consistency among reviewers.26 The data charting instruments (see online supplemental file 1C) were adapted from the COSMIN methodology for systematic reviews of PROMs user manual.22 Disagreements between the reviewers will be resolved through discussion or with the assistance of a third reviewer. We will contact the study authors to resolve any uncertainties. Three focus areas, namely the validation/adaptation process, risk of bias assessment and measurement properties evaluation, will guide our data charting. We will chart data by publication year, instrument administration (country, target language, setting), included sample characteristics (population group, mean age (SD), gender (% female), sample size and its calculation), amount of missing data, response rates, interpretability (distribution (skewness and/or kurtosis), percentage of missing items, percentage of missing total scores, floor and ceiling effects), feasibility (completion time, comprehensibility for patients, and type and ease of administration), MHLS score, and reported MHLS item modifications.

Assessment of risk of bias

We will use the COSMIN Risk of Bias (RoB) checklist to evaluate the methodological quality of each study, that is, the risk of bias in the study’s findings. The following ten boxes from the checklist will be used: PROM development, Content validity, Structural validity, Internal consistency, Cross‐cultural validity/Measurement invariance, Reliability, Measurement error, Criterion validity, Hypothesis testing for construct validity and Responsiveness. Because the RoB checklist is designed as a modular tool, only the boxes for the measurement properties assessed in a given article will be completed.27 The rating options for the items in each box are ‘very good’, ‘adequate’, ‘doubtful’, ‘inadequate’ or ‘Not Applicable’. To establish the overall quality of a study, the lowest rating of any standard in the box will be used (ie, ‘the worst score counts’ principle). For example, if one item in a box is scored as ‘inadequate’ for a reliability study, the overall methodological quality of that reliability study is graded as ‘inadequate’. The methodological quality of the translation process will be determined using the COSMIN Study Design checklist, which provides standards for translating an existing PROM in the box Translation process.28
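The ‘worst score counts’ principle can be expressed as a short sketch. The rating labels come from the text above; the function name and the choice to ignore ‘Not Applicable’ items when taking the minimum are assumptions made for illustration, not COSMIN code.

```python
# Illustrative sketch only; the function name and the handling of
# 'not applicable' items are assumptions, not part of the COSMIN checklist.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]  # worst first


def box_rating(item_ratings):
    """Return the overall rating for one box: the lowest item rating.

    Items rated 'Not Applicable' are ignored. If every item is not
    applicable, the box itself is reported as 'not applicable'.
    """
    normalised = [r.strip().lower() for r in item_ratings]
    applicable = [r for r in normalised if r != "not applicable"]
    if not applicable:
        return "not applicable"
    # list.index raises ValueError for an unknown label, which surfaces typos.
    return min(applicable, key=RATING_ORDER.index)
```

For instance, a reliability box whose items are rated ‘very good’, ‘adequate’ and ‘inadequate’ would be graded ‘inadequate’ overall, as in the example given in the text.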

Evaluation of measurement properties

The results for each measurement property will be rated against the criteria presented in table 4. Each result from an individual study will be rated as positive (+), negative (−) or indeterminate (?).22 The rating criteria for content validity are based on the COSMIN methodology guidelines for assessing the content validity of PROMs.22 29 Specific MHLS hypotheses for ‘Hypothesis Testing for Construct Validity’ and ‘Responsiveness’ were developed (online supplemental file 1D).

Table 4

Quality criteria for measurement properties

Data synthesis and levels of evidence

The results will either be quantitatively or qualitatively combined. We will present these pooled or summarised results per measurement property (see online supplemental file 1C), together with a grade for the quality of the evidence (high, moderate, low or very low) and a rating of the pooled or summarised results (+/−/?).

Quantitative pooling of the results

If more than two studies are available per measurement property and language version, meta-analyses will be conducted and the findings will be statistically pooled. Pooled estimates of measurement properties will be obtained by calculating weighted averages (weighted by the number of participants in each study) and 95% CIs. For test–retest reliability, weighted mean intraclass correlation coefficients (ICCs) and 95% CIs can be calculated using a standard generic inverse variance random effects model.30 ICC values can be combined based on estimates obtained from a Fisher transformation, z = 0.5 × ln((1+ICC)/(1−ICC)), which has an approximate variance of Var(z) = 1/(N−3), where N is the sample size.31 For construct validity, we will aggregate all correlations between a PROM and other PROMs that measure a similar construct. Cronbach’s alpha will be reported as weighted means. We will consult a statistician to conduct the meta-analyses.
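The Fisher transformation and inverse variance pooling described above can be sketched as follows. The text does not name a specific random effects estimator, so this sketch assumes the DerSimonian-Laird estimate of between-study variance; the function names are hypothetical, and the final analysis would follow the statistician’s advice rather than this illustration.

```python
import math

# Illustrative sketch only. The DerSimonian-Laird estimator is an assumption;
# the protocol specifies only a generic inverse variance random effects model.


def fisher_z(icc):
    """Fisher transformation: z = 0.5 * ln((1 + ICC) / (1 - ICC))."""
    return 0.5 * math.log((1 + icc) / (1 - icc))


def pool_iccs(iccs, sample_sizes):
    """Pool ICCs on the z scale; return (pooled ICC, lower 95% CI, upper 95% CI)."""
    zs = [fisher_z(r) for r in iccs]
    variances = [1 / (n - 3) for n in sample_sizes]  # Var(z) = 1 / (N - 3)
    weights = [1 / v for v in variances]

    # Fixed-effect estimate, needed for the heterogeneity statistic Q.
    fixed = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - fixed) ** 2 for w, z in zip(weights, zs))
    df = len(zs) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance

    # Random-effects weights add tau2 to each within-study variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled_z = sum(w * z for w, z in zip(re_weights, zs)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))

    # Back-transform with tanh, the inverse of the Fisher transformation.
    return (math.tanh(pooled_z),
            math.tanh(pooled_z - 1.96 * se),
            math.tanh(pooled_z + 1.96 * se))


def weighted_mean(values, sample_sizes):
    """Sample-size-weighted mean, eg, for Cronbach's alpha across studies."""
    return sum(v * n for v, n in zip(values, sample_sizes)) / sum(sample_sizes)
```

Pooling on the z scale and back-transforming keeps the confidence limits inside the admissible range for a correlation-type coefficient, which is the usual reason for applying the transformation before averaging.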

Qualitative summary of the results

If it is impossible to pool the results statistically, the results of each measurement property will be summarised qualitatively. For example, we will provide the range (lowest and highest) of Cronbach’s alpha values found for internal consistency, the percentage of confirmed hypotheses for construct validity or the range of each model fit parameter on a consistently found factor structure in structural validity studies.
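The two simplest summaries mentioned above, a range of values and a percentage of confirmed hypotheses, amount to the following. This is a minimal sketch with hypothetical function names; the input values shown in any usage would come from the charted data, not from this protocol.

```python
# Illustrative sketch only; function names are hypothetical.


def value_range(values):
    """Return the lowest and highest value, eg, of Cronbach's alpha."""
    return min(values), max(values)


def percent_confirmed(hypothesis_results):
    """Percentage of hypotheses confirmed, given a list of True/False results."""
    if not hypothesis_results:
        raise ValueError("at least one hypothesis result is required")
    return 100 * sum(hypothesis_results) / len(hypothesis_results)
```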

Applying measurement properties criteria to the pooled or summarised results

The pooled or summarised result per measurement property per language version of the MHLS will again be rated against the same quality criteria for good measurement properties (table 4). The overall rating of the pooled or summarised result may be positive (+), negative (−) or indeterminate (?). The ratings will be provided in the summary of findings tables (see online supplemental file 1C).

Using the GRADE approach, which is a systematic approach to rating the certainty of evidence in systematic reviews, the following four factors will be considered when evaluating measurement properties to determine the quality of the evidence in this systematic review (table 5): (1) risk of bias (ie, quality of the studies’ methodology), (2) inconsistency (ie, unexplained, inconsistent results across studies), (3) imprecision (ie, the total sample size of the available studies) and (4) indirectness (ie, evidence from different populations than the population of interest in the review).22

Table 5

Definitions of quality levels
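The grading logic can be sketched as a simple downgrading rule over the four quality levels named earlier (high, moderate, low, very low). This sketch assumes that the evidence starts at ‘high’ and that each of the four factors contributes a whole number of downgrade levels supplied by the reviewers; how many levels a given concern warrants is not specified here and is left as an input. The function name is hypothetical.

```python
# Illustrative sketch only. Assumptions: grading starts at 'high', and the
# number of levels to downgrade per factor is decided by the reviewers.
LEVELS = ["high", "moderate", "low", "very low"]
FACTORS = {"risk of bias", "inconsistency", "imprecision", "indirectness"}


def grade_quality(downgrades):
    """Return the quality level after applying downgrades per factor.

    `downgrades` maps a factor name to the number of levels to downgrade.
    Factors that are not listed count as zero.
    """
    unknown = set(downgrades) - FACTORS
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if any(d < 0 for d in downgrades.values()):
        raise ValueError("downgrades must be non-negative")
    total = sum(downgrades.values())
    # The grade cannot fall below the lowest level, 'very low'.
    return LEVELS[min(total, len(LEVELS) - 1)]
```

For example, one level of downgrading for risk of bias plus one for imprecision would move the evidence from ‘high’ to ‘low’ under these assumptions.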

Data presentation

The data gathered from the included papers will be presented in a tabular format, with the table reporting essential findings relevant to the review topic. The tabulated data will be accompanied by a narrative summary describing how the results relate to the review objective and question.


MHL is essential for enhancing mental health and decreasing treatment disparities. It helps healthcare professionals comprehend the educational requirements for mental health among patients and communities. Additionally, it assists individuals in understanding their symptoms, locating relevant resources and receiving appropriate healthcare assistance.8 Improving and maintaining healthcare provision is a challenge for practitioners and policymakers. Patients also possess distinct perspectives on healthcare quality; however, their potential for measuring it remains untapped.13 This systematic review will provide a unique insight into the measurement properties of the MHLS in a cross-cultural context. The review will use a rigorous approach to summarise the evidence on MHLS reliability and validity and to assess bias and heterogeneity in the results. It will provide academics, clinicians and policymakers with the evidence needed to adopt the MHLS in their research or practice based on its reliability and validity levels and will guide them in selecting the most appropriate version for their specific context. In addition, it will assist in assessing the consistency of results across different populations, settings and study designs.

Furthermore, the review will provide a robust model and a transparent review of measurement properties using COSMIN guidelines.21 A notable strength of this review is that it will analyse the measurement properties of all language versions of the MHLS, which matters to researchers measuring MHL in various settings. Additionally, the review will adhere to the JBI Manual for Evidence Synthesis (Chapter 12: Systematic reviews of measurement properties)21 and the COSMIN methodology for systematic reviews of PROMs user manual22 and will be reported according to the PRISMA guideline.23 However, this systematic review will be limited by the temporal gap between the development of the MHLS in 2015 and the availability of resources for evaluating the quality of measurement properties, which became available after 2018. In addition, excluding non-English papers due to logistical constraints could be a limitation. We anticipate that the heterogeneity of the studies will limit the ability to conduct meta-analyses.
