Strengths and limitations of this study

This is the first systematic review to identify statistical methods that attempt to account for the impact of nonadherence to interventions in randomised noninferiority trials.

A description and critique of the statistical methods identified is provided, along with their target estimands.

Publications from any year, journal or disease area/patient population were reviewed independently by two authors.

One author extracted the data from the eligible papers.

While statistical analysis plans were requested for eligible trials, these could not be obtained for all included trials.
Introduction
Noninferiority trials, which assess whether a new intervention is not worse than a proven comparator by more than a clinically acceptable amount, are becoming increasingly common.1–3 They are principally used when it is hoped that the new intervention may convey some advantage other than better efficacy (its effect under ideal conditions), such as improved safety, tolerability, convenience or reduced cost.4 5
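As a minimal illustration of the confidence interval (CI) approach to declaring noninferiority, the Python sketch below makes several assumptions. The outcome is binary and higher proportions are worse (eg, treatment failure). The effect measure is the risk difference (experimental minus comparator) with a Wald 95% CI, and a margin is prespecified. Noninferiority is concluded when the upper CI limit lies below the margin. The function name, the counts and the margin of 10 percentage points are hypothetical, and real trials may use other interval methods or effect measures.

```python
import math


def noninferiority_wald(fail_exp, n_exp, fail_ctl, n_ctl, margin, z=1.959964):
    """Illustrative CI rule for a harmful binary outcome (eg, treatment failure).

    Returns the risk difference (experimental minus comparator), its Wald 95% CI
    and whether noninferiority is concluded, ie, whether the upper CI limit lies
    below the prespecified margin. Hypothetical helper, not from any trial.
    """
    p_exp = fail_exp / n_exp
    p_ctl = fail_ctl / n_ctl
    diff = p_exp - p_ctl
    se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_ctl * (1 - p_ctl) / n_ctl)
    lower, upper = diff - z * se, diff + z * se
    return diff, (lower, upper), upper < margin


# Hypothetical counts: 10% v 9% failure, margin of 10 percentage points
diff, ci, noninferior = noninferiority_wald(30, 300, 27, 300, margin=0.10)
print(f"difference {diff:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}, noninferior: {noninferior}")
```

Raising the experimental failures to 45 of 300 in the same call moves the upper limit above the margin, so noninferiority would not be concluded.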
One of the challenges in these studies, and the focus of this review, is how participants not receiving their randomly assigned intervention according to the trial protocol (termed nonadherence or noncompliance) should be handled in the statistical analysis.6 Examples of nonadherence include not receiving a surgical intervention as planned, not taking all of the prescribed doses of a medication, or not attending all of the sessions of an exercise rehabilitation programme. Such nonadherence is common in trials and has been associated with poorer health outcomes.7–9 It can bias estimates of efficacy in either direction, so obtaining an accurate and reliable measure of adherence, and accounting for any nonadherence in the statistical analysis of these studies, is essential.10 11 Nonadherence may also be linked with missing outcome data if, for example, the trial protocol stipulates that further follow-up is no longer required once adherence drops below a specific threshold or if nonadherent participants become lost to follow-up. The terms adherence and compliance are often used interchangeably, though adherence is preferred here since it is felt to better reflect the partnership between the healthcare provider and participant.
A simple approach to handling nonadherence is to define and analyse different analysis sets based on participants’ observed levels of adherence, with consistent results providing greater confidence in the trial conclusions.1 In the setting of noninferiority trials, the intention-to-treat (ITT) and per-protocol (PP) populations have been advocated and are commonly used.4 12 13 However, agreement between the ITT and PP results of these trials does not guarantee that conclusions regarding noninferiority are free from bias caused by differential, or nonrandom, nonadherence (where the factors leading to nonadherence are associated with outcomes).14–16
Standard ITT analyses typically include all participants in their randomised groups irrespective of the intervention actually received.17 Thus, they reflect the effect of assigning individuals to interventions in clinical practice, where not everyone is fully adherent (also known as the ‘effectiveness’ of an intervention). This approach preserves the balance in known and unknown prognostic factors afforded by randomisation, so any difference in outcomes between study arms can be attributed solely to the interventions as assigned.18 However, in the presence of nonadherence, ITT analyses may yield biased estimates of efficacy (also known as the ‘causal effect’ of an intervention).19 In noninferiority trials, where efficacy and effectiveness may be considered equally important, this can increase the probability of falsely claiming noninferiority and, therefore, of accepting a worse intervention.11
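The Python sketch below illustrates how nonadherence can dilute an ITT estimate towards the null, and hence towards noninferiority. It rests on hypothetical assumptions:

- the outcome is a harmful binary outcome;
- the comparator failure risk is 0.10, and the experimental risk under full adherence is 0.16;
- 40% of the experimental arm receive the comparator instead;
- nonadherence is unrelated to prognosis.

With an illustrative margin of 0.05, the true difference of 0.06 would indicate an unacceptably worse intervention. Yet the expected ITT difference is only (1 − 0.40) × 0.06 = 0.036. The function name and all parameter values are hypothetical.

```python
import random


def simulate_itt_difference(n_per_arm=200_000, risk_ctl=0.10, risk_exp=0.16,
                            p_nonadhere_exp=0.40, seed=1):
    """Hypothetical trial with a harmful binary outcome.

    Under full adherence the experimental intervention raises the failure risk
    by risk_exp - risk_ctl. Nonadherent experimental-arm participants are
    ASSUMED to receive the comparator instead, and nonadherence is ASSUMED to
    be unrelated to prognosis. Returns the ITT risk difference.
    """
    rng = random.Random(seed)
    fails_ctl = sum(rng.random() < risk_ctl for _ in range(n_per_arm))
    fails_exp = 0
    for _ in range(n_per_arm):
        adheres = rng.random() >= p_nonadhere_exp
        risk = risk_exp if adheres else risk_ctl  # switchers take on comparator risk
        fails_exp += rng.random() < risk
    return (fails_exp - fails_ctl) / n_per_arm


itt = simulate_itt_difference()
print(f"true difference under full adherence: 0.060; ITT estimate: {itt:.3f}")
```

Whether noninferiority would actually be declared also depends on the CI. The simulation isolates the bias in the point estimate only.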
Modified ITT (mITT) analyses are commonly used to address some of the limitations of standard ITT methods.20 This approach allows some randomised participants, such as those who never receive any of the allocated intervention or who are identified as ineligible after randomisation, to be excluded according to prespecified rules.18 However, across trials, there is substantial variability in how this population is defined and bias may be introduced by subjectively excluding individuals from analysis.18 20 In addition, mITT analyses are not typically used to account for the impact of nonadherence.
PP analyses estimate the efficacy of interventions typically by excluding or censoring individuals with major protocol violations, including those who are nonadherent to their allocated intervention.1 6 17 Excluding participants in this way can lead to selection bias because nonadherent individuals generally differ from those who are fully adherent with respect to prognostic factors.21 22 Furthermore, using a PP analysis to address differential nonadherence is likely to reduce the protection provided by randomisation, so that trial arms are not fully comparable; this potentially biases the study results in either direction.11 In other words, any difference in outcomes between trial arms may no longer be due to the experimental intervention only. To obtain valid results from a PP analysis, we need to recover the protection due to randomisation, typically through a statistical method that (given certain assumptions) correctly adjusts for factors associated with both adherence and outcome (confounders).21
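The Python sketch below illustrates two points: the selection bias of a PP analysis, and how weighting can recover the protection of randomisation when the confounders are measured. It simulates a hypothetical trial in which a measured prognostic factor (frailty) raises the failure risk in both arms but lowers adherence in the experimental arm only. It then compares two risk differences:

- a naive PP difference;
- an inverse-probability-weighted difference, in which adherent participants are weighted by 1 / P(adherence | frailty), estimated within each arm.

All probabilities and function names are assumptions chosen for illustration. The weighted estimate is unbiased here only because frailty is the sole confounder and is measured. The expected values are a PP difference of about 0.024 and a weighted difference of about 0.040, against a true difference under full adherence of 0.040. The PP bias is therefore towards noninferiority.

```python
import random


def simulate_arm(rng, n, experimental):
    """Rows of (frail, adhered, failed) for one hypothetical arm.

    Frailty is a measured prognostic factor ASSUMED to lower adherence to the
    experimental intervention only. Adherent experimental-arm participants have
    0.04 added to their failure risk; outcomes of nonadherent participants are
    not used by either estimator below.
    """
    rows = []
    for _ in range(n):
        frail = rng.random() < 0.30
        p_adhere = (0.50 if frail else 0.90) if experimental else 0.95
        adhered = rng.random() < p_adhere
        risk = 0.05 + 0.15 * frail + (0.04 if experimental and adhered else 0.0)
        rows.append((frail, adhered, rng.random() < risk))
    return rows


def pp_and_ipw_risk(rows):
    """Per-protocol risk (adherent participants only) and a weighted version in
    which adherent participants are reweighted by 1 / P(adhere | frailty)."""
    adherent = [r for r in rows if r[1]]
    pp = sum(r[2] for r in adherent) / len(adherent)
    p_adhere = {}
    for level in (False, True):  # adherence probability within each frailty stratum
        stratum = [r for r in rows if r[0] == level]
        p_adhere[level] = sum(r[1] for r in stratum) / len(stratum)
    weights = [1 / p_adhere[r[0]] for r in adherent]
    ipw = sum(w * r[2] for w, r in zip(weights, adherent)) / sum(weights)
    return pp, ipw


rng = random.Random(2)
pp_exp, ipw_exp = pp_and_ipw_risk(simulate_arm(rng, 200_000, experimental=True))
pp_ctl, ipw_ctl = pp_and_ipw_risk(simulate_arm(rng, 200_000, experimental=False))
print("true difference under full adherence: 0.040")
print(f"PP difference: {pp_exp - pp_ctl:.3f}; IPW difference: {ipw_exp - ipw_ctl:.3f}")
```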
Statistical techniques exist that attempt to account for the impact of nonadherence and thus estimate the causal effects of experimental interventions. These range from simple approaches, such as including observed adherence as a covariate within a regression model, which, like PP analyses, is susceptible to selection bias, to more sophisticated techniques, such as instrumental variable (IV) methods and inverse-probability weighting, which allow for nonadherence while attempting to maintain the balance produced by randomisation.23 24 Several of these methods attempt to estimate the complier average causal effect (CACE), which is the causal effect of an intervention for individuals who would fully adhere to whichever intervention they were assigned (known as compliers).25 In other words, it is a comparison of the average outcome among those who are fully adherent in the experimental arm with the average outcome among the comparable group in the control arm who would have fully adhered to the experimental intervention, had it been offered.
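Two sets of conditions are needed. The IV assumptions discussed in this paper (exclusion restriction, monotonicity) must hold. Nonadherence must also be one-sided, meaning that control-arm participants cannot receive the experimental intervention. Under these conditions the CACE can be estimated by the Wald, or ratio, estimator: the ITT effect on the outcome divided by the ITT effect on the proportion actually receiving the experimental intervention.

The Python sketch below checks this on a hypothetical simulated trial. In it, 70% of participants are compliers, nonadherent participants are frailer and receive the comparator, and the true CACE is a 0.06 increase in failure risk. For contrast, it also shows a naive comparison of adherent experimental-arm participants with the whole control arm. All parameter values and function names are assumptions.

```python
import random


def simulate_trial(n_per_arm=200_000, seed=3):
    """Hypothetical trial with one-sided nonadherence and a harmful binary outcome.

    z = 1 if randomised to the experimental intervention, d = 1 if it was received.
    70% are compliers; the rest are ASSUMED frailer and receive the comparator
    whatever their allocation. Among compliers, the experimental intervention
    raises the failure risk by 0.06 (the true CACE).
    """
    rng = random.Random(seed)
    data = []
    for z in (0, 1):
        for _ in range(n_per_arm):
            complier = rng.random() < 0.70
            d = 1 if (z == 1 and complier) else 0
            risk = (0.08 if complier else 0.20) + (0.06 if d else 0.0)
            data.append((z, d, 1 if rng.random() < risk else 0))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_cace(data):
    """ITT effect on the outcome divided by the ITT effect on intervention received."""
    itt_y = mean(y for z, d, y in data if z == 1) - mean(y for z, d, y in data if z == 0)
    itt_d = mean(d for z, d, y in data if z == 1) - mean(d for z, d, y in data if z == 0)
    return itt_y / itt_d


data = simulate_trial()
naive_pp = (mean(y for z, d, y in data if z == 1 and d == 1)
            - mean(y for z, d, y in data if z == 0))
print(f"true CACE: 0.060; Wald estimate: {wald_cace(data):.3f}; naive PP: {naive_pp:.3f}")
```

In expectation the naive comparison gives about 0.024. Adherent experimental-arm participants are healthier than the control arm as a whole, so the naive comparison understates the harm.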
It is unclear which of the alternative methods have been applied in the setting of noninferiority trials, to what extent, and with what results. Therefore, this systematic review aimed to identify statistical methods that can be used to account for the impact of nonadherence to interventions (thereby estimating the causal effects of experimental interventions) in randomised noninferiority trials. Secondary aims were to quantify the use of such methods in these studies and examine their impact on noninferiority conclusions.
Methods
The Ovid MEDLINE database was searched for terms related to adherence, noninferiority trials and statistical methods for handling nonadherence in the titles, abstracts and keywords of papers published up to 31 December 2020 (the full search strategy is provided in online supplemental appendix 1). Eligibility based on identifying appropriate statistical methods was assessed using a three-stage process. First, two authors independently reviewed the title and abstract of each paper. Those where the comparison was not randomised, the primary analysis was not for noninferiority, or the analysis assessed cost-effectiveness were excluded (cost-effectiveness analyses were not of interest because the focus of this review was on estimating the efficacy of interventions). Papers not published in English were also excluded. If the full text was unavailable, the abstract was reviewed against the eligibility criteria to ensure that key papers were not excluded. Next, an automated search of the full texts was performed in order to identify those containing the terms ‘compliance’, ‘adherence’ or ‘complier’. Finally, full-text reviews of the remaining papers were performed independently by two authors to identify (1) randomised trials with a primary analysis for noninferiority that applied (or planned to apply, for protocol papers) statistical methods to account for the impact of nonadherence to interventions, and (2) methodology papers that described such statistical methods and included a noninferiority trial application. Any discrepancies between reviewer pairs were discussed with a third author in order to reach a consensus. In addition, statisticians within the field were consulted in order to identify key publications, and the reference lists and citations of eligible papers were searched for relevant analyses (performed by one author (MD)). Meta-analyses and systematic reviews identified were also searched for eligible noninferiority trials.
Where a trial’s published protocol and results paper were both eligible and reported the same statistical method of interest, the protocol paper was excluded to avoid double counting. Statistical analysis plans were requested for all eligible trials.
A standardised electronic form was used to extract the relevant information from each paper considered eligible. This included details of the trial characteristics (journal, year of publication, disease area or patient population, unit of randomisation, type of experimental intervention, type of primary outcome and noninferiority margin), nonadherence to the interventions (definitions and estimated levels of nonadherence), the statistical method attempting to account for nonadherence (name of the method, estimand, estimate of effect and confidence interval (CI), conclusion regarding noninferiority and any advantages/disadvantages of the method stated) and any other analyses applied to the same outcome (analysis population, estimand, estimate of effect and CI, and conclusion regarding noninferiority). Data extraction was performed by one author (MD). The primary outcome was the statistical method applied (or planned to be applied) in order to account for nonadherence to the interventions. Other outcomes were the impact of applying these methods on the trial conclusions (compared with other analyses applied to the same outcome, where available) and the advantages and disadvantages of the methods where stated by the authors. The impact of applying the methods of interest was assessed using trial results papers only.
This systematic review was registered with PROSPERO and conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.26 Information was largely combined using a narrative synthesis approach, that is, ‘synthesis of findings from multiple studies that relies primarily on the use of words and text to summarise and explain the findings of the synthesis’.27 All analyses were conducted using Stata V.15.1.
Patient and public involvement
Patients or the public were not involved in the design, conduct, reporting, or dissemination plans of our research.
Results
After removing duplicate publications, our search identified 3235 papers. Of these, 934 were excluded following review of the titles and abstracts, 790 did not contain any keywords in the full texts and 1489 were excluded after full-text review, leaving 22 papers whose citations and reference lists contained a further 5 papers meeting eligibility criteria. After removing publications of the same trial reporting identical statistical methods of interest, 24 papers remained (figure 1).
The 24 publications, which consisted of 4 protocols, 13 results papers and 7 methodology papers, reported relevant methods on 26 occasions (2 methodology papers both contained 2 relevant analyses). Four of the analyses included in methodology papers were reanalyses of noninferiority trials, one included a simulation study based on a noninferiority trial and four included simulation studies not based on real trials. Fifteen of the 24 papers included (63%) were published within the last 5 years and the most common type of experimental intervention studied was drug interventions (35%) (table 1; online supplemental table A1).
Nonadherence to interventions
Nonadherence to the randomly assigned interventions was defined in the methods, statistical analysis plan or results section of most analyses (n=19, 73%). Fifteen (79%) used a binary definition of adherence, whereas 3 (16%) used a continuous measure (one was unclear). Of the 19 analyses that defined nonadherence to the interventions, 13 reported estimates of nonadherence (the remaining 6 were protocols or simulation studies). More than half reported estimates of nonadherence that were no more than 10%, though the range was wide (1.7%–51.3%) (table 2). For reasons that were not reported, two papers provided data on nonadherence to the interventions in only one arm of the trial.
Statistical methods for handling nonadherence to interventions
In total, 11 different statistical methods that attempt to account for nonadherence to interventions were identified (table 3). The most common were IV approaches (n=9, 35%), inclusion of observed adherence as a covariate within a regression model (n=3, 12%) and modelling of adherence as a time-varying covariate in a time-to-event analysis (n=3, 12%). Other methods included rank-preserving structural failure time models and G-estimation (n=2, 8%), inverse-probability-of-treatment weighting (n=2, 8%) and the tipping point approach (n=2, 8%, both in the same methodology paper). The other five techniques identified were each reported once. Further details of the methods reported more than once are provided in table 4 and online supplemental table A2. The techniques identified in the 17 protocols and results papers were more commonly specified as sensitivity analyses (n=13, 76%) than primary analyses (n=3, 18%) (one was unclear).
Advantages and disadvantages of the statistical methods
The advantages and disadvantages of the methods identified (as stated by the authors) are given in table 3. Advantages or disadvantages of the techniques used were stated in 8 (33%) of the 24 papers included; 6 were methodological papers and 2 were results papers. No advantages or disadvantages were stated for 5 of the 11 methods identified.
Impact of the statistical methods on noninferiority conclusions
Twelve of the 13 results papers (92%) also included an alternative analysis of the same outcome (online supplemental table A3). All 12 performed an ITT or mITT analysis. In addition, some reported results from PP (n=6, 50%) or as-treated (AT; n=2, 17%) analyses. Noninferiority conclusions from the alternative analyses agreed with those from the methods of interest on six occasions and could not be compared on six occasions (owing to different measures of effect or results not being provided in full). Five of the six analyses where the different methods agreed concluded noninferiority of the experimental intervention versus the comparator. The remaining trial provided mixed findings regarding noninferiority across the two countries included, though the interpretation of this study appeared inconsistent with its design (a CI approach to determining noninferiority was stated in the methods but not used).
Statistical analysis plans
Statistical analysis plans were requested for all 17 noninferiority trials where the protocol or results paper was included in the review, and obtained for nine of these trials.
Discussion
To the best of our knowledge, this is the first systematic review undertaken both to identify statistical methods that adjust for the impact of nonadherence to interventions in randomised noninferiority trials and to assess the frequency and consequences of their use. We found that few papers reported such methods (less than 2% of those reaching full-text review). This may be partly due to unfamiliarity with such techniques among trialists and statisticians as a result of the long lead time for statistical methodology to make its way into routine practice. The most common techniques identified were IV approaches, inclusion of observed adherence as a covariate within a regression model, and modelling of adherence as a time-varying covariate in a time-to-event analysis. Overall, the number of trials implementing relevant statistical methods was too small to draw firm inferences about their impact on noninferiority conclusions. In the six analyses where the results from methods of interest could be compared directly with those from an alternative analysis, conclusions regarding noninferiority were consistent across the different approaches.
Almost half of the methods identified focus on estimating CACE (also known as the local average treatment effect (LATE)). This is the average effect of the experimental intervention within the subpopulation of compliers.25 We argue that this is the natural estimation focus when attempting to account for nonadherence to interventions in the context of noninferiority trials, because we want to be confident that there is noninferiority among those who would comply with either intervention. By contrast, including participants who would not fully adhere to one or both interventions may bias estimation towards noninferiority (in a similar way that, in the context of noninferiority, ITT analyses may be biased towards noninferiority under nonadherence). For similar reasons, we believe that the CACE is preferable to the population average treatment effect (ATE). Lastly, we note that when adjusting for observed adherence within a regression model or modelling adherence as a time-varying covariate in a time-to-event analysis, the target estimand is unclear.
The infrequent use of statistical methods for handling nonadherence seen in the current review has also been observed more generally in randomised controlled trials (RCTs). A review of 100 RCTs randomly selected from those published in 4 high-impact journals during 2008 found only 1 that attempted to account for nonadherence to interventions using a causal inference framework (in which inverse-probability-of-censoring weighting was applied).6 More recently, Mostazir et al conducted a review of statistical approaches for handling nonadherence to interventions in RCTs published between 1991 and 2015, which identified 88 analyses incorporating 9 different methods.28 IV methods were among the most common and accounted for almost one in four applications of suitable techniques. However, some of the other methods identified (including CACE analyses using maximum-likelihood estimation and adjusted treatment-received models) were not captured in the current review focusing on noninferiority trials. Similarly, we did not identify all 12 approaches included in a recent review of methodological papers containing statistical techniques for handling nonadherence to interventions in the context of time-to-event outcomes.29 This suggests that other relevant methods are available but that they are either not suitable for comparing active interventions, as is often required in noninferiority trials, or have not been applied within these studies. The three aforementioned reviews did not focus specifically on noninferiority trials.
It is perhaps not surprising that IV approaches were the most common method identified in the current review, given that their assumptions are well suited to many double-blind trials, they can be applied across a range of trial designs, and they are relatively simple to implement in standard statistical software.30 IV methods use randomisation as the instrument in order to account for unmeasured confounders of the outcome and intervention received (ie, adherence). Their main assumptions are: (1) randomisation affects the outcome only through its influence on the intervention received (the exclusion restriction), (2) randomisation does not share common causes with the outcome (the exchangeability assumption), (3) randomisation causes some participants to receive their assigned intervention (the relevance assumption) and, in order to estimate CACE, (4) there are no participants who would always receive the opposite of their random allocation (the monotonicity assumption).23 31 In individually randomised trials, the exclusion restriction and monotonicity assumptions are typically satisfied by effective double blinding and/or use of objective outcomes, and the exchangeability assumption is usually valid since randomisation is expected to produce trial arms that are balanced with respect to prognostic factors.
When these assumptions hold, it is relatively straightforward to show that if we regress the intervention received (ie, adherence) on randomisation, and then use this model to predict each participant’s adherence, these predictions are orthogonal to (independent of) all adherence–outcome confounders. Therefore, if in a second step we regress the outcome on these predictions, we obtain an unconfounded estimate of the effect of adherence on outcome. It follows that, in contrast to techniques that involve inverse-probability weighting, when the above four IV assumptions hold, IV methods enable us to estimate CACE even in the presence of unmeasured confounding (although inclusion of measured confounders can improve precision).32 33 While IV methods may thus appear a panacea, as usual in statistics, there are no free lunches: a lack of precision and statistical power is often a challenge with IV techniques and with methods used to adjust for nonadherence more generally.5 30 34 35
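The two-step logic above can be written out directly. The Python sketch below simulates a hypothetical trial in which an unmeasured factor u increases the outcome but reduces the intervention received, so a naive regression of outcome on intervention received is confounded. Regressing intervention received on randomisation, and then regressing the outcome on the predicted values, recovers the assumed true effect of 0.5. The linear models, the coefficients, the constant effect and the function names are all assumptions for illustration. Because the second stage uses predicted rather than observed values, its naive standard errors are not valid; dedicated IV routines in standard software correct them.

```python
import random


def ols_fit(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=100_000, seed=4):
    """Hypothetical trial: u is an UNMEASURED confounder of adherence and outcome."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0        # randomisation (the instrument)
        u = rng.gauss(0, 1)                        # unmeasured confounder
        di = 2 + 3 * zi - u + rng.gauss(0, 1)      # intervention received
        yi = 0.5 * di + 2 * u + rng.gauss(0, 1)    # ASSUMED true effect of d on y: 0.5
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


def two_stage_least_squares(z, d, y):
    a, b = ols_fit(z, d)                  # stage 1: regress d on z
    d_hat = [a + b * zi for zi in z]      # predicted intervention received
    return ols_fit(d_hat, y)[1]           # stage 2: regress y on the predictions


z, d, y = simulate()
print(f"naive regression of y on d: {ols_fit(d, y)[1]:.3f}")
print(f"2SLS estimate: {two_stage_least_squares(z, d, y):.3f} (true effect 0.5)")
```

With a single binary instrument and no covariates, this two-step estimate equals the ratio of the covariance of randomisation with the outcome to its covariance with the intervention received, ie, the Wald estimator.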
The two-stage least-squares (2SLS) regression approach sketched above can also be applied when the intervention is not all or nothing. Suppose that a noninferiority trial is conducted to assess whether prescribing one dose of a medication per week is noninferior to prescribing two doses per week over the course of 4 weeks. For each participant, the monotonicity assumption requires that the potential number of doses taken would be no greater if the participant was randomly assigned to receive one dose per week than if they were randomised to receive two doses per week. Assuming there are no covariates and the monotonicity assumption holds, it can be shown that the 2SLS estimator converges to a weighted average of the causal effects of one-unit increases in the intervention among compliers (individuals whose intervention intensity is affected by randomisation (the instrument)).36 37 The weight attached to each one-unit increase (eg, from the third to the fourth dose) is proportional to the share of participants whose number of doses taken would cross that increment as a result of randomisation, so increments shifted by more compliers receive greater weight.
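The weighting result described in the paragraph above can be checked numerically. The Python sketch below builds a hypothetical population of potential dose counts:

- d0 is the number of doses taken if allocated one dose per week (at most 4);
- d1 is the number taken if allocated two doses per week (at most 8);
- d1 >= d0 for everyone, so that monotonicity holds.

For simplicity it assumes that the effect of each additional dose depends only on the dose number, with diminishing returns. The sketch then confirms an identity. The population Wald estimand, which 2SLS targets with a binary instrument and no covariates, equals the average of the per-dose effects, each weighted by the share of participants whose doses taken would cross that increment as a result of randomisation. The dose distributions, effects and function names are assumptions, not data from any trial.

```python
import random


def wald_and_weighted_average(people, tau):
    """people: (d0, d1) potential doses taken if allocated one or two doses per
    week, with d0 <= d1 (monotonicity). tau[j - 1]: ASSUMED common effect of
    raising the number of doses taken from j - 1 to j.

    Returns the population Wald/2SLS estimand, the average of tau weighted by
    P(d0 < j <= d1), and those weights.
    """
    n = len(people)

    def outcome(doses):
        # Outcome relative to an individual baseline, which cancels in the contrast
        return sum(tau[:doses])

    itt_y = sum(outcome(d1) - outcome(d0) for d0, d1 in people) / n
    itt_d = sum(d1 - d0 for d0, d1 in people) / n
    shares = [sum(d0 < j <= d1 for d0, d1 in people) / n for j in range(1, len(tau) + 1)]
    total = sum(shares)
    weights = [s / total for s in shares]
    weighted = sum(w * t for w, t in zip(weights, tau))
    return itt_y / itt_d, weighted, weights


rng = random.Random(5)
people = []
for _ in range(10_000):
    d0 = rng.randint(0, 4)              # at most 4 doses in 4 weeks if allocated 1/week
    d1 = d0 + rng.randint(0, 8 - d0)    # monotonicity: never fewer if allocated 2/week
    people.append((d0, d1))
tau = [0.50, 0.40, 0.30, 0.25, 0.20, 0.10, 0.05, 0.00]  # hypothetical diminishing returns
wald, weighted, weights = wald_and_weighted_average(people, tau)
print(f"Wald/2SLS estimand: {wald:.4f}; weighted average of unit effects: {weighted:.4f}")
print("weights by dose increment:", [round(w, 3) for w in weights])
```

Participants with d1 equal to d0 are unaffected by randomisation and contribute to neither side of the identity.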
A limitation of IV methods is that when interventions are administered at multiple timepoints, standard approaches are susceptible to time-varying confounding and selection bias.21 These biases occur when previous values of a covariate predict the current intervention received and the current value of the covariate predicts outcome.38 If the time-varying confounders are themselves affected by previous intervention received, so-called G-methods, such as inverse-probability weighting or G-estimation, are required to allow for the feedback loop occurring between the intervention received and confounders over time.21 24 39 G-methods were seldom reported in the current review, perhaps because they can be more complex to implement than alternative approaches and also rely on assumptions which may be vulnerable to violations. When considering whether to apply an IV approach or a G-method, statisticians might consider whether the exclusion restriction and monotonicity assumptions are realistic given the context of the trial, and whether randomisation is a sufficiently strong instrument. Where outcomes are collected at multiple timepoints, inverse-probability weighting may be a more attractive approach if data on potential confounders are also collected throughout follow-up.
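One common weighting approach for time-varying confounding in this setting is inverse-probability-of-censoring weighting. Participants are artificially censored when they become nonadherent, and those who remain adherent are upweighted by the inverse of their cumulative probability of remaining adherent given their covariate history. The Python sketch below shows only the weight calculation for one participant, using stabilised weights: the probability given baseline covariates goes in the numerator, and the probability given baseline and time-varying covariates goes in the denominator. The per-visit probabilities and the function name are hypothetical. In practice the probabilities would be estimated, for example with pooled logistic regression models fitted to the trial data, and the weights would then be used in a weighted outcome analysis.

```python
def cumulative_stabilised_weights(p_num, p_den):
    """Hypothetical per-visit probabilities for one participant who stays adherent.

    p_num[k]: P(remains adherent at visit k | adherent so far, baseline covariates)
    p_den[k]: the same probability additionally conditioning on time-varying
              covariates measured up to visit k
    Returns the stabilised weight in force after each visit (a running product).
    """
    weights, w = [], 1.0
    for num, den in zip(p_num, p_den):
        w *= num / den
        weights.append(w)
    return weights


# Hypothetical participant whose condition worsened before visit 2, making
# continued adherence less likely given their covariate history (0.70 v 0.90)
print(cumulative_stabilised_weights([0.95, 0.90, 0.90], [0.98, 0.70, 0.85]))
```

This participant's weight rises after visit 2 because they stand in for similar participants whose worsening condition led them to stop adhering and be censored.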
In order to estimate the effect of the experimental intervention in the absence of (full) protection by randomisation, additional assumptions must be made, most of which are inherently untestable. Each of the statistical methods identified in the current review makes slightly different assumptions in order to estimate the effects of interventions under full adherence, and hence each has a different method of estimation; both the assumptions and the estimation methods have associated advantages and disadvantages. Crucially, all of the methods require reliable information regarding adherence to the randomly assigned interventions, which is often challenging to measure, particularly for long-term therapies.40
Despite these limitations, it is our view that the methods identified have an important role in noninferiority trials with nonadherence to interventions and should be applied as sensitivity analyses alongside other techniques (such as ITT, PP and AT analyses). Given that agreement between ITT and PP analyses cannot guarantee unbiased conclusions in these studies, those with nontrivial nonadherence should assess the sensitivity of trial results to different assumptions in order to guard against falsely claiming noninferiority and accepting a worse intervention. Careful consideration needs to be given to the assumptions that are most plausible given the trial context and planned design, before selecting the appropriate statistical method to adjust for potential nonadherence. Relevant data needed to implement the chosen technique should then be collected as fully as possible. Clearly, the best approach to reducing the potential biases introduced by nonadherence to the interventions is to design trials that minimise such nonadherence. Future work should compare the performance of the methods identified under different nonadherence scenarios in noninferiority trials to facilitate understanding of when they might be applied appropriately.
Strengths and limitations
This is the first systematic review (protocol published on PROSPERO) to identify statistical methods that attempt to account for the impact of nonadherence to interventions in randomised noninferiority trials, quantify the use of such methods in these studies and examine their impact on noninferiority conclusions. The review included publications from any year, journal or disease area and involved two authors agreeing the eligibility of each paper identified in the search. However, it has some limitations. First, the search for eligible papers had to be restricted to those containing terms related to adherence or statistical methods for handling nonadherence in the titles, abstracts and keywords. Publications applying suitable methods in sensitivity analyses may not have referred to the techniques within these fields and, if so, would not have been captured by the search. A wide range of search terms was used to try to mitigate this problem and, therefore, a large number of papers were reviewed. Second, only one database was searched and papers not published in English were excluded, so some eligible papers may not have been captured by the search. However, most major noninferiority trials are likely to be published in English within one of the MEDLINE journals and, therefore, should have been captured. Third, one author performed the data extraction from eligible publications, though other authors were consulted where necessary. Finally, while statistical analysis plans were requested, these could not be obtained for eight of the trials included. For these studies, we cannot be sure that the details provided in the publications reviewed are accurate accounts of the planned analyses.
Conclusion
In noninferiority trials with nonadherence to interventions, ITT and PP analyses are often performed but may result in biased estimates of efficacy and, therefore, agreement between these approaches does not guarantee that conclusions regarding noninferiority are unbiased. Statistical methods that attempt to account for the impact of nonadherence and thereby estimate the causal effects of interventions are available, but their use in noninferiority trials remains extremely infrequent. It is our view that the methods identified should be applied more widely within sensitivity analyses of noninferiority trials. In particular, those with nontrivial nonadherence should assess the sensitivity of trial results to different assumptions in order to guard against falsely claiming noninferiority and accepting a worse intervention.
Data availability statement
No data are available. No additional data are available.
Ethics statements
Patient consent for publication
Ethics approval
Ethical approval was not required because this review used publicly accessible documents.