Evidence on psychometric properties of self-report questionnaires in evaluating blended learning in health sciences university students: research protocol for systematic review and meta-analysis

Introduction

Amidst the COVID-19 pandemic, blended teaching and learning (BTL) has become a prevailing educational model for allied health science university students worldwide. BTL, characterised as the intentional fusion of traditional face-to-face (F2F) classroom instruction with online learning experiences,1–3 has provided educational institutions with a flexible approach to adapt to the challenges posed by the pandemic.

This approach ensures the safety of allied health science students, faculty and staff by minimising in-person interactions.4 It allows universities to transition seamlessly between fully online and in-person instruction. Nevertheless, BTL introduces unique challenges for allied health science university students who heavily rely on hands-on patient care experiences as a fundamental part of their professional training.5–7

To optimise the effectiveness of BTL in health professional education (HPE), accurate assessment of students’ perceptions of this blended learning approach is crucial. While previous studies have employed self-report questionnaires like the Classroom Environment Questionnaire,8 Dental Clinical Learning Environment Instrument9 10 and Student Course Experience Questionnaire11 12 to gauge students’ perspectives on their learning environments, these instruments were primarily designed for traditional F2F settings. In BTL, where students frequently transition between in-person and online learning modes, their perceptions significantly impact their learning approaches, utilisation of online learning technologies13 and engagement with online course materials alongside in-person activities.14

Despite the growing relevance of BTL in HPE, there remains a noticeable gap in the literature regarding the availability of reliable and valid self-report questionnaires explicitly tailored to assess students’ perceptions of BTL from a relational student-learning perspective. Notably, systematic reviews are needed to synthesise the evidence on reliable, valid and psychometrically sound instruments for BTL assessment. This systematic review protocol addresses this critical gap by identifying and evaluating existing self-report questionnaires designed to assess students’ perceptions of BTL.

By conducting a comprehensive review of the available literature, this study intends to provide valuable insights for educators, administrators and policymakers seeking to enhance the quality of education in the modern era, especially in the context of health professional education during and beyond the COVID-19 pandemic.

Objective

Our primary objective is to critically appraise, compare and summarise the psychometric property scores of self-report questionnaires evaluating the quality of BTL delivery among health science university students. Specifically, this review aims to determine the following measurement properties of the tools used to assess BTL:

  1. Reliability, measurement error and internal consistency;

  2. Content validity, criterion validity and construct validity; and

  3. Responsiveness.

Study design

A systematic review and meta-analysis design enables a comprehensive examination of the psychometric properties of the self-report questionnaires used by university students to evaluate blended teaching and learning in health sciences programmes. The review will adhere to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidance, and the researchers will follow a 10-step process for conducting systematic reviews of patient-reported outcome measures.

The University of the Philippines’ Research Grants Administration Office exempted this research protocol from ethics review under protocol number UPMREB 2022–0259-EX, and the protocol has been registered with PROSPERO. Study procedures began with our PROSPERO registration on 11 December 2022, and the methods described here will be applied up to the last search date. Figure 1 illustrates the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) 10-step procedure.

Figure 1

The 10-step procedure for conducting systematic reviews and meta-analyses of patient-reported outcome measures by COSMIN.21 COSMIN, COnsensus-based Standards for the selection of health Measurement INstruments; GRADE, Grading of Recommendations Assessment, Development and Evaluation.

Patient and public involvement statement

None.

Data gathering procedure

The following databases will be searched: PubMed, EMBASE, Web of Science, MEDLINE (Ovid), PsycInfo (via ProQuest), CINAHL, EBSCOhost, ERIC, Scopus, ScienceDirect, Google Scholar, JSTOR, Acta Medica Philippina, Philippine Journal on Health Research and Development and HERDIN. Pearling will be used to identify relevant studies from the reference lists of the included articles. If pearling uncovers many new studies, the reviewers will refine the search strategy and repeat the search. Zotero will be used to manage the references retrieved from each database.

In addition to employing Zotero for reference management, the reviewers will use MS Excel to document the search strategy and critical appraisal processes throughout the study. MS Excel is advantageous for systematic reviews because of its cost-effectiveness, user-friendly interface, flexibility and widespread acceptance; its compatibility with other tools further enhances its practicality.

Eligibility criteria

The inclusion criteria are as follows:

  1. A study population consisting of health sciences university students (ie, medicine, physical therapy, occupational therapy, speech-language pathology, psychology, nutrition and nursing) engaged in BTL.

  2. Studies that reported the development of self-report questionnaires for health sciences university students evaluating the quality of BTL delivery.

  3. Studies that determined the measurement properties of self-report questionnaires for health sciences university students on BTL.

  4. Studies that reported the distribution of scores, percentage of missing items, floor and ceiling effects, the availability of scores and change of scores or a minimally significant difference of self-reported questionnaires used in evaluating BTL delivery among health sciences university students.

No time or language restrictions will be used.

The exclusion criteria are as follows: (1) studies reporting students’ perceptions, attitudes, learning experience, self-efficacy, satisfaction and learning outcomes on BTL delivery; and (2) biographies, case reports, editorials, newspaper articles, handouts, consensus development conferences, practice guidelines, short communications, abstracts and meetings.

Search strategy and study selection

The PCC (Population, Concept and Context) framework, adapted from the Joanna Briggs Institute, offers a precise and structured approach to formulating the search strategy.

Search terms will be developed for each of the three concepts:

  1. Context (BTL): Examples of search terms are: blended learning OR blended teaching OR flexible learning delivery.

  2. Population (Health Sciences University Students): All university students currently enrolled in health-related courses and engaged in BTL, namely medicine, physical therapy, occupational therapy, speech-language pathology, psychology, nutrition and nursing. Examples of search terms are: Students, Medical OR medical student OR medicine student OR intern OR interns.

  3. Construct (Psychometric Properties, including reports on validity and reliability): Examples of search terms are: instrumentation OR methods OR validation studies OR comparative study OR psychometrics.

The search strategy will also include sample keywords for exclusion: Delphi-technique OR cross-sectional OR biography OR case reports.

The search strategy aims to locate both published and unpublished studies. An initial limited search of MEDLINE (PubMed) and CINAHL (EBSCOhost) was undertaken to identify articles on the topic. The text words contained in the titles and abstracts of relevant articles, and the index terms used to describe them, informed the development of a full search strategy for PubMed, EBSCO, ProQuest, Google Scholar and ScienceDirect (see online supplemental appendix A). The search strategy, including all identified keywords and index terms, was adapted for each included information source. The reference lists of all studies selected for critical appraisal were screened for additional studies. Keywords were combined with Boolean operators and truncation to build the search strategy.
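To illustrate how keywords, Boolean operators and truncation combine, a PubMed-style query might look like the hypothetical sketch below. The terms shown are illustrative only; the final, database-specific strings appear in online supplemental appendix A.

```
(("blended learning" OR "blended teaching" OR "flexible learning")
 AND (student* OR intern*)
 AND (psychometric* OR validat* OR reliab* OR "measurement properties"))
NOT ("Delphi technique" OR biograph* OR "case report*")
```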

Supplemental material

An initial pilot search was undertaken to identify pertinent articles for our review. Here is a breakdown of the findings from different databases:

  • PubMed yielded 272 hits, out of which 55 papers were potentially relevant.

  • From EBSCO, 54 articles were retrieved and eight were found to be potentially relevant.

  • ProQuest had three hits with one article being potentially relevant.

  • A search on Google Scholar returned two hits and one potentially relevant paper.

  • Lastly, ScienceDirect produced 569 hits with 55 potentially relevant to our review.

Overall, the results of this pilot search indicate that many articles fit our inclusion criteria. This underscores the feasibility of our systematic review, confirming that a sufficient volume of articles is available for thorough evaluation.
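The per-database counts above can be tallied directly; as a minimal sketch using the figures reported in the bullets:

```python
# Pilot search results per database: (total hits, potentially relevant articles)
pilot = {
    "PubMed": (272, 55),
    "EBSCO": (54, 8),
    "ProQuest": (3, 1),
    "Google Scholar": (2, 1),
    "ScienceDirect": (569, 55),
}

total_hits = sum(hits for hits, _ in pilot.values())
total_relevant = sum(rel for _, rel in pilot.values())
print(total_hits, total_relevant)  # 900 hits, 120 potentially relevant
```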

Drawing from the COSMIN search filter and Biomedische Informatie search blocks, the search terms will efficiently identify relevant measurement property studies and outcome measurement instruments. These strategies are detailed in online supplemental appendix A and streamline the search process.

Using the eligibility criteria, two independent reviewers will examine the titles and abstracts of the studies to identify relevant articles (online supplemental appendix B). Two independent reviewers will reassess the relevance of the studies by reading full-text articles (online supplemental appendix C). Throughout the process, the reviewers will reach a consensus by discussion and a third independent reviewer will arbitrate if needed. The PRISMA flow diagram shows the selection process (online supplemental appendix D).

The COSMIN risk of bias checklist

Two independent reviewers will assess each study’s methodological quality using the COSMIN Risk of Bias checklist (online supplemental appendix E) comprising 10 boxes covering various measurement properties. Depending on the properties evaluated by the study authors, not all boxes need to be completed. Focusing on the questionnaire’s development and content validity as crucial properties, self-report questionnaires with inadequate content validity will be excluded from further assessment. If content validity is considered adequate, reviewers will continue appraising papers for more properties such as structural validity, internal consistency and responsiveness. The 10-step procedure for evaluating measurement properties is shown in figure 1, outlining the quality appraisal process using the COSMIN Risk of Bias checklist.

Using the COSMIN checklist is the optimal approach for assessing the methodological quality of studies concerning measurement properties. This preference arises from its versatility as it serves various essential purposes:

  1. The COSMIN checklist can provide valuable guidance for designing and reporting studies on measurement properties.

  2. It is instrumental in evaluating the potential risk of bias within individual studies within a systematic review of outcome measurement instruments.

  3. The COSMIN checklist is instrumental for reviewers and journal editors when critically evaluating the methodological quality of articles or grant applications pertaining to studies involving measurement properties.

Consequently, the multifaceted utility of the COSMIN checklist renders it the preeminent choice for such assessments.15

In step 1, two independent reviewers will assess the content validity of the self-report questionnaires using COSMIN boxes 1 and 2 (online supplemental appendices E.1 and E.2). When rating these boxes, the guidelines classify standards as ‘very good’ for adequate quality evidence, ‘adequate’ for unreported but assumed sufficient quality, ‘doubtful’ when quality requirements are unclear, ‘inadequate’ for insufficient quality and ‘not applicable’ when a standard is unnecessary in the study.16 Following online supplemental appendix F, the reviewers will evaluate relevance, comprehensiveness and comprehensibility by assigning sufficient (+) ratings for the 10 criteria in appendix E.3. Next, they will use online supplemental appendix G, which summarises specific criteria from Terwee et al (2018),16 to guide their assessment of each study’s ratings. Finally, they will determine the overall content validity ratings using online supplemental appendix H, adhering to Terwee et al’s guidelines and applying the ‘worst score counts’ method within the COSMIN boxes.16

In step 2, reviewers will evaluate the internal structure of the questionnaire, focusing on structural validity, internal consistency and cross-cultural validity. They will use COSMIN boxes 3–10 (online supplemental appendix E.4) and the guidelines from Terwee et al16 to maintain consistency in evaluating measurement properties across the included studies. The quality of each study will be rated by taking the lowest rating of any standard in the box.

In step 3, the reviewers will assess the remaining measurement properties, including reliability, measurement error, criterion validity, hypothesis testing for construct validity and responsiveness. They will follow the guidelines in online supplemental appendix E.4 and Terwee et al.16 If there is no gold standard for measuring the construct of interest, reviewers will not use the box for criterion validity or the criterion approach for responsiveness. Instead, they will formulate hypotheses about the expected direction and magnitude of the correlations between the self-report questionnaire and a well-defined comparator questionnaire. The study results will then be compared with these hypotheses to determine the construct validity of the questionnaire.

Independent reviewers will assess the quality of the evidence using the modified Grading of Recommendations Assessment, Development and Evaluation approach, grading the trustworthiness of the overall ratings (online supplemental appendix I).16 Content validity evaluation considers three factors: risk of bias, inconsistency and indirectness. After the bias assessment, blinded reviewers will extract measurement property results from self-report questionnaires used by health sciences students to evaluate BTL (online supplemental appendix J). The study results will be compared against the criteria for good measurement properties (online supplemental appendix K), with the evidence summarised for each property per questionnaire.

In step 4, reviewers will describe the interpretability and feasibility of the questionnaire, assign qualitative meaning to scores and assess factors affecting its practical application. For interpretability, they will report the distribution of scores in the study population, including floor and ceiling effects, minimally important change values and response shift for change in scores (online supplemental appendix L), as per Terwee et al.16 For feasibility, they will consider aspects such as comprehensibility for students and teachers, type and ease of administration, instrument length, completion time, ease of standardisation, score calculation, copyright, cost, availability and regulatory requirements (online supplemental appendix M), as outlined by Terwee et al.16

Data extraction

To avoid missing relevant data, two independent reviewers will extract the following data from the studies: (1) the characteristics of the included self-report questionnaires of health sciences university students evaluating BTL (ie, constructs, target population, recall period, number of items, response options, scoring, original language, available translations) (online supplemental appendix N) and (2) characteristics of the health sciences university students (ie, number of samples, age, sex, setting, country, language, response rate) (online supplemental appendix O). The characteristics of the study samples contain all the information necessary for judging the generalisability of the results and for determining the similarities or dissimilarities of the study samples.16 17 This review will not quantify the proportion of F2F and online classes within BTL. It will include all studies combining F2F and online learning regardless of allocation method. Eligibility criteria will not explicitly consider allocation so that all relevant studies will be included.

Content comparison

Two independent reviewers will compare the contents of the self-report questionnaire for university health sciences students evaluating BTL in an academic setting across studies. Content comparison can help decide the best available measurement by checking the differences in content between several self-report questionnaires.17 Online supplemental appendix P presents a content comparison.

Summary of finding tables

The two reviewers will pool all results per measurement property of the self-report questionnaires (online supplemental appendix Q). The table will show the pooled results, the overall rating (+/−/±/?) and the quality of evidence (ie, high, moderate, low or very low). The summary of findings table will inform recommendations on the most appropriate self-report questionnaires for health sciences university students to evaluate BTL. The pooled or summarised results will again be evaluated against the criteria for good measurement properties to obtain an overall rating (online supplemental appendix R).

Meta-analysis

When pooling measurement property results from different studies, sufficient similarity in the study population, setting, instrument (language) version and administration form is required. The MedCalc statistical software will be used to integrate quantitative findings from similar studies, providing a numerical estimate of the overall effect.18 The weights assigned to studies will be based on the inverse of the squared SE (ie, inverse variance) and sample size, so studies with smaller standard errors and larger sample sizes carry more weight.18

A random-effects model will be used, assuming the studies are distinct and do not share a single common effect. The summary effect of the meta-analysis therefore estimates the mean of the distribution of true effects.19 For test–retest reliability, weighted mean intraclass correlation coefficients and 95% CIs will be calculated using a standard generic inverse variance random-effects model. For construct validity, correlations between self-report questionnaires measuring similar constructs, such as student attitudes, will be pooled.18
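The generic inverse variance random-effects pooling described above can be sketched as follows. This is an illustrative stand-in, not the MedCalc implementation: the DerSimonian–Laird estimator is assumed for the between-study variance, the function name and example numbers are hypothetical, and in practice correlations would first be Fisher-z transformed.

```python
import math

def pool_random_effects(estimates, ses):
    """DerSimonian-Laird random-effects pooling via generic inverse variance.

    estimates: per-study effects (eg, ICCs or Fisher-z transformed correlations)
    ses: matching standard errors
    Returns (pooled effect, 95% CI lower bound, 95% CI upper bound).
    """
    # Fixed-effect (inverse variance) weights: smaller SE -> larger weight
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q quantifies heterogeneity; tau^2 is the between-study variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
```

For test–retest reliability the estimates would be the per-study ICCs; for construct validity, pooled Fisher-z values would be back-transformed to correlations after pooling.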

In the event of inconsistent results, we will investigate the underlying reasons and determine overall ratings per subgroup. Subgroup analyses will be performed in this order: study quality (adequate/very good vs inadequate), course and country. The study quality analysis will encompass heterogeneity assessment, control of the quality of evidence, identification of sources of variation, development of guidelines and recommendations, consideration of publication bias and support for informed decision-making. Subgroup analysis by course will be conducted because different degrees present varying technical requirements that may or may not be met in an online setting. Subgroup analysis by country will account for the varying quality and policies of national education systems. Results from studies rated doubtful will be excluded from the pooling. If the reasons for inconsistency remain unclear, the overall rating will be based on the majority of the study results, with the quality of evidence downgraded for inconsistency.20

This post was originally published on https://bmjopen.bmj.com