Statistical methods for non-adherence in non-inferiority trials: useful and used? A systematic review

Strengths and limitations of this study

  • This is the first systematic review to identify statistical methods that attempt to account for the impact of non-adherence to interventions in randomised non-inferiority trials.

  • A description and critique of the statistical methods identified is provided, along with their target estimands.

  • Publications from any year, journal or disease area/patient population were reviewed independently by two authors.

  • One author extracted the data from the eligible papers.

  • While statistical analysis plans were requested for eligible trials, these could not be obtained for all included trials.


Non-inferiority trials, which assess whether a new intervention is not worse than a proven comparator by more than a clinically acceptable amount, are becoming increasingly common.1–3 They are principally used when it is hoped that the new intervention may convey some advantage other than better efficacy (its effect under ideal conditions), such as improved safety, tolerability, convenience or reduced cost.4 5
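The following is a minimal Python sketch of the confidence-interval approach to declaring non-inferiority. It assumes a binary outcome where a higher event risk is worse, a Wald interval for the risk difference (new minus comparator) and a prespecified margin. The function names, counts and margins are hypothetical and do not come from any reviewed trial.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_new, n_new, events_ctl, n_ctl, level=0.95):
    """Risk difference (new minus comparator) with a two-sided Wald CI."""
    p_new = events_new / n_new
    p_ctl = events_ctl / n_ctl
    diff = p_new - p_ctl
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctl * (1 - p_ctl) / n_ctl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff, diff - z * se, diff + z * se


def is_non_inferior(upper_bound, margin):
    """Non-inferior if the CI excludes an excess risk as large as the margin."""
    return upper_bound < margin


# Hypothetical trial: 30/300 events on the new intervention, 27/300 on the comparator.
diff, lower, upper = risk_difference_ci(30, 300, 27, 300)
print(f"difference={diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
print("non-inferior at margin 0.08:", is_non_inferior(upper, 0.08))
print("non-inferior at margin 0.05:", is_non_inferior(upper, 0.05))
```

The same data can support or fail to support non-inferiority depending on the margin. Any bias that shifts the interval towards zero can therefore change the trial conclusion.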

One of the challenges in these studies, and the focus of this review, is how participants not receiving their randomly assigned intervention according to the trial protocol (termed non-adherence or non-compliance) should be handled in the statistical analysis.6 Examples of non-adherence include not receiving a surgical intervention as planned, not taking all of the prescribed doses of a medication, or not attending all of the sessions of an exercise rehabilitation programme. Such non-adherence is common in trials and has been associated with poorer health outcomes.7–9 It can bias estimates of efficacy in either direction and so obtaining an accurate and reliable measure of adherence and accounting for any non-adherence in the statistical analysis of these studies is essential.10 11 Non-adherence may also be linked with missing outcome data if, for example, the trial protocol stipulates that further follow-up is no longer required once adherence drops below a specific threshold or if non-adherent participants become lost to follow-up. The terms adherence and compliance are often used interchangeably, though adherence is preferred here since it is felt to better reflect the partnership between the healthcare provider and participant.

A simple approach to handling non-adherence is to define and analyse different analysis sets based on participants’ observed levels of adherence, with consistent results providing greater confidence in the trial conclusions.1 In the setting of non-inferiority trials, the intention-to-treat (ITT) and per-protocol (PP) populations have been advocated and are commonly used.4 12 13 However, agreement between the ITT and PP results of these trials does not guarantee that conclusions regarding non-inferiority are free from bias caused by differential, or non-random, non-adherence (where the factors leading to non-adherence are associated with outcomes).14–16

Standard ITT analyses typically include all participants in their randomised groups irrespective of the intervention actually received.17 Thus, they reflect the effect of assigning individuals to interventions in clinical practice where not everyone is fully adherent (also known as the ‘effectiveness’ of an intervention). This approach preserves the balance in known and unknown prognostic factors afforded by randomisation and so any difference in outcomes between study arms can be attributed solely to the experimental intervention.18 However, in the presence of non-adherence, ITT analyses may yield biased estimates of efficacy (also known as the ‘causal effect’ of an intervention).19 In non-inferiority trials, where efficacy and effectiveness may be considered equally important, this can increase the probability of falsely claiming non-inferiority and, therefore, accepting a worse intervention.11
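The following worked arithmetic sketch rests on simplifying assumptions that are ours rather than the review's:

  • Non-adherers in the experimental arm receive the comparator instead.
  • Non-adherence is unrelated to prognosis.
  • Under full adherence, the experimental intervention increases event risk by a hypothetical 6 percentage points.

Under these assumptions, the expected ITT difference is the adherence proportion multiplied by that effect. It therefore shrinks towards zero as adherence falls and eventually drops below a hypothetical margin of 5 percentage points.

```python
def expected_itt_difference(effect_if_adherent, adherence):
    """Expected ITT risk difference when only adherers in the experimental arm
    experience its effect and non-adherers fare as they would on the comparator."""
    return adherence * effect_if_adherent


MARGIN = 0.05        # hypothetical non-inferiority margin (risk difference)
TRUE_EXCESS = 0.06   # hypothetical excess risk under full adherence

for adherence in (1.0, 0.9, 0.8, 0.7):
    itt = expected_itt_difference(TRUE_EXCESS, adherence)
    print(f"adherence {adherence:.0%}: ITT difference {itt:.3f}, "
          f"below margin: {itt < MARGIN}")
```

Only point estimates are shown here. In practice, non-inferiority is judged on the confidence limit, but the same dilution shifts the whole interval towards zero.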

Modified ITT (mITT) analyses are commonly used to address some of the limitations with standard ITT methods.20 This approach allows some randomised participants, such as those who never receive any of the allocated intervention or who are identified as ineligible after randomisation, to be excluded according to prespecified rules.18 However, across trials, there is substantial variability in how this population is defined and bias may be introduced by subjectively excluding individuals from analysis.18 20 In addition, mITT analyses are not typically used to account for the impact of non-adherence.

PP analyses estimate the efficacy of interventions typically by excluding or censoring individuals with major protocol violations, including those who are non-adherent to their allocated intervention.1 6 17 Excluding participants in this way can lead to selection bias because non-adherent individuals generally differ from those who are fully adherent with respect to prognostic factors.21 22 Furthermore, using a PP analysis to address differential non-adherence is likely to reduce the protection provided by randomisation, so that trial arms are not fully comparable; this potentially biases the study results in either direction.11 In other words, any difference in outcomes between trial arms may no longer be due to the experimental intervention only. To obtain valid results from a PP analysis, we need to recover the protection due to randomisation, typically through a statistical method that (given certain assumptions) correctly adjusts for factors associated with both adherence and outcome (confounders).21
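This selection bias can be illustrated with a small Python simulation. Every element is an assumption made for illustration:

  • There is no true difference between the interventions.
  • An unmeasured prognostic factor drives both the outcome and stopping the experimental intervention.
  • All control participants adhere.

The PP set drops sicker participants from the experimental arm only. The experimental intervention therefore appears better than it is, which in a non-inferiority trial pushes towards a claim of non-inferiority.

```python
import random
from statistics import fmean


def simulate_trial(n_per_arm=20000, seed=1):
    """Return (ITT difference, PP difference) in mean outcome, experimental
    minus control, for a hypothetical trial with no true treatment effect."""
    rng = random.Random(seed)
    itt = {0: [], 1: []}
    pp = {0: [], 1: []}
    for arm in (0, 1):  # 0 = control, 1 = experimental
        for _ in range(n_per_arm):
            u = rng.gauss(0, 1)               # prognostic factor (higher = sicker)
            y = 2.0 * u + rng.gauss(0, 1)     # outcome (higher = worse); no treatment effect
            # Assumed mechanism: sicker participants often stop the experimental intervention.
            p_stop = 0.0 if arm == 0 else (0.5 if u > 0.5 else 0.05)
            itt[arm].append(y)
            if rng.random() >= p_stop:        # adherent participants stay in the PP set
                pp[arm].append(y)
    return fmean(itt[1]) - fmean(itt[0]), fmean(pp[1]) - fmean(pp[0])


itt_diff, pp_diff = simulate_trial()
print(f"ITT difference {itt_diff:.3f}; PP difference {pp_diff:.3f}")
```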

Statistical techniques exist that attempt to account for the impact of non-adherence and thus estimate the causal effects of experimental interventions. These range from simple approaches, such as including observed adherence as a covariate within a regression model (which, like PP analyses, is susceptible to selection bias), to more sophisticated techniques, such as instrumental variable (IV) methods and inverse-probability weighting. These more sophisticated techniques allow for non-adherence while attempting to maintain the balance produced by randomisation.23 24 Several of these methods attempt to estimate the complier average causal effect (CACE). This is the causal effect of an intervention among individuals who would fully adhere to whichever intervention they were assigned (known as compliers).25 In other words, it compares the average outcome among those who are fully adherent in the experimental arm with the average outcome among the comparable group in the control arm who would have fully adhered to the experimental intervention, if offered.
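For a binary intervention with one-sided non-adherence (control participants cannot obtain the experimental intervention), the CACE can be estimated with the Wald ratio: the ITT effect on the outcome divided by the ITT effect on the intervention received. The Python sketch below assumes a hypothetical trial with the following features:

  • 70% of participants are compliers.
  • The remainder are never-takers with a worse prognosis.
  • The true CACE is 1.0 (higher outcomes are worse).

It contrasts the Wald estimate with a naive PP-style comparison.

```python
import random
from statistics import fmean


def simulate(n=40000, cace=1.0, seed=7):
    """Hypothetical trial with compliers and never-takers (one-sided non-adherence)."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        complier = rng.random() < 0.7
        # Assumption: never-takers have a worse baseline prognosis than compliers.
        baseline = rng.gauss(0.0 if complier else 1.5, 1.0)
        zi = rng.randint(0, 1)              # randomised assignment (1 = experimental)
        di = zi if complier else 0          # never-takers never receive the experimental intervention
        z.append(zi)
        d.append(di)
        y.append(baseline + cace * di)      # higher outcome = worse
    return z, d, y


def wald_cace(z, d, y):
    """ITT effect on the outcome divided by ITT effect on intervention received."""
    def arm_mean(values, arm):
        return fmean(v for v, zi in zip(values, z) if zi == arm)
    return (arm_mean(y, 1) - arm_mean(y, 0)) / (arm_mean(d, 1) - arm_mean(d, 0))


def naive_per_protocol(z, d, y):
    """Adherent experimental participants versus the whole control arm."""
    treated = fmean(yi for zi, di, yi in zip(z, d, y) if zi == 1 and di == 1)
    control = fmean(yi for zi, yi in zip(z, y) if zi == 0)
    return treated - control


z, d, y = simulate()
print(f"Wald CACE {wald_cace(z, d, y):.3f}; naive PP {naive_per_protocol(z, d, y):.3f}")
```

Under these assumptions, the Wald estimate recovers the assumed CACE. The naive comparison understates the harm of the experimental intervention because its adherent participants are healthier than the control arm as a whole.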

It is unclear which of the alternative methods have been applied in the setting of non-inferiority trials, to what extent, and with what results. Therefore, this systematic review aimed to identify statistical methods that can be used to account for the impact of non-adherence to interventions (thereby estimating the causal effects of experimental interventions) in randomised non-inferiority trials. Secondary aims were to quantify the use of such methods in these studies and examine their impact on non-inferiority conclusions.


The Ovid MEDLINE database was searched for terms related to adherence, non-inferiority trials and statistical methods for handling non-adherence in the titles, abstracts and keywords of papers published up to 31 December 2020 (the full search strategy is provided in online supplemental appendix 1). Eligibility based on identifying appropriate statistical methods was assessed using a three-stage process. First, two authors independently reviewed the title and abstract of each paper. Papers were excluded if the comparison was not randomised, the primary analysis was not for non-inferiority, or the analysis assessed cost-effectiveness. Cost-effectiveness analyses were not of interest because the focus of this review was on estimating the efficacy of interventions. Papers not published in English were also excluded. If the full text was unavailable, the abstract was reviewed against the eligibility criteria to ensure that key papers were not excluded. Next, an automated search of the full texts was performed to identify those containing the terms ‘compliance’, ‘adherence’ or ‘complier’. Finally, two authors independently performed full-text reviews of the remaining papers to identify:

  • randomised trials with a primary analysis for non-inferiority that applied (or planned to apply, for protocol papers) statistical methods to account for the impact of non-adherence to interventions;

  • methodology papers that described such statistical methods and included a non-inferiority trial application.

Any discrepancies between reviewer pairs were discussed with a third author to reach a consensus. In addition, statisticians within the field were consulted to identify key publications, and one author (MD) searched the reference lists and citations of eligible papers for relevant analyses. Meta-analyses and systematic reviews identified were also searched for eligible non-inferiority trials.
Where a trial’s published protocol and results paper were both eligible and reported the same statistical method of interest, the protocol paper was excluded to avoid double counting. Statistical analysis plans were requested for all eligible trials.


A standardised electronic form was used to extract the relevant information from each paper considered eligible. This included details of the trial characteristics (journal, year of publication, disease area or patient population, unit of randomisation, type of experimental intervention, type of primary outcome and non-inferiority margin), non-adherence to the interventions (definitions and estimated levels of non-adherence), the statistical method attempting to account for non-adherence (name of the method, estimand, estimate of effect and confidence interval (CI), conclusion regarding non-inferiority and any advantages/disadvantages of the method stated) and any other analyses applied to the same outcome (analysis population, estimand, estimate of effect and CI, and conclusion regarding non-inferiority). Data extraction was performed by one author (MD). The primary outcome was the statistical method applied (or planned to be applied) in order to account for non-adherence to the interventions. Other outcomes were the impact of applying these methods on the trial conclusions (compared with other analyses applied to the same outcome, where available) and the advantages and disadvantages of the methods where stated by the authors. The impact of applying the methods of interest was assessed using trial results papers only.

This systematic review was registered with PROSPERO and conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.26 Information was largely combined using a narrative synthesis approach, that is, ‘synthesis of findings from multiple studies that relies primarily on the use of words and text to summarise and explain the findings of the synthesis’.27 All analyses were conducted using Stata V.15.1.

Patient and public involvement

Patients or the public were not involved in the design, conduct, reporting, or dissemination plans of our research.


After removing duplicate publications, our search identified 3235 papers. Of these, 934 were excluded following review of the titles and abstracts, 790 did not contain any keywords in the full texts and 1489 were excluded after full-text review, leaving 22 papers whose citations and reference lists contained a further 5 papers meeting eligibility criteria. After removing publications of the same trial reporting identical statistical methods of interest, 24 papers remained (figure 1).

Figure 1

Flow chart showing the eligibility of papers reviewed.

The 24 publications (4 protocols, 13 results papers and 7 methodology papers) reported relevant methods on 26 occasions, because two methodology papers each contained two relevant analyses. Of the analyses in methodology papers, four were re-analyses of non-inferiority trials, one included a simulation study based on a non-inferiority trial and four included simulation studies not based on real trials. Fifteen of the 24 included papers (63%) were published within the last 5 years. The most common type of experimental intervention studied was drugs (35%) (table 1; online supplemental table A1).

Table 1

Characteristics of eligible analyses (n=26)

Non-adherence to interventions

Non-adherence to the randomly assigned interventions was defined in the methods, statistical analysis plan or results section of most analyses (n=19, 73%). Fifteen (79%) used a binary definition of adherence, whereas 3 (16%) used a continuous measure (one was unclear). Of the 19 analyses that defined non-adherence to the interventions, 13 reported estimates of non-adherence (the remaining 6 were protocols or simulation studies). More than half reported estimates of non-adherence that were no more than 10%, though the range was wide (1.7%–51.3%) (table 2). For reasons that were not reported, two papers provided data on non-adherence to the interventions in only one arm of the trial.

Table 2

Estimates of non-adherence to interventions reported in methodology and results papers, combined across trial arms unless reported (n=13)

Statistical methods for handling non-adherence to interventions

In total, 11 different statistical methods that attempt to account for non-adherence to interventions were identified (table 3). The most common were:

  • IV approaches (n=9, 35%);

  • inclusion of observed adherence as a covariate within a regression model (n=3, 12%);

  • modelling of adherence as a time-varying covariate in a time-to-event analysis (n=3, 12%).

Other methods included rank preserving structural failure time models and G-estimation (n=2, 8%), inverse-probability-of-treatment weighting (n=2, 8%) and the tipping point approach (n=2, 8%, both in the same methodology paper). The remaining five techniques were each reported once. Further details of the methods reported more than once are provided in table 4 and online supplemental table A2. In the 17 protocols and results papers, the techniques identified were more commonly specified as sensitivity analyses (n=13, 76%) than as primary analyses (n=3, 18%); one was unclear.

Table 3

Statistical methods that were identified as attempting to account for non-adherence to interventions

Table 4

Details of the statistical methods reported on more than one occasion

Advantages and disadvantages of the statistical methods

The advantages and disadvantages of the methods identified (as stated by the authors) are given in table 3. Advantages or disadvantages of the techniques used were stated in 8 (33%) of the 24 papers included; 6 were methodological papers and 2 were results papers. No advantages or disadvantages were stated for 5 of the 11 methods identified.

Impact of the statistical methods on non-inferiority conclusions

Twelve of the 13 results papers (92%) also included an alternative analysis of the same outcome (online supplemental table A3). All 12 performed an ITT or mITT analysis. In addition, some reported results from PP (n=6, 50%) or as-treated (AT; n=2, 17%) analyses. Non-inferiority conclusions from the alternative analyses agreed with those from the methods of interest on six occasions. On the other six occasions, the conclusions could not be compared because different measures of effect were used or the results were not provided in full. Five of the six analyses where the different methods agreed concluded non-inferiority of the experimental intervention versus the comparator. The remaining trial provided mixed findings regarding non-inferiority across the two countries included. Its interpretation also appeared inconsistent with its design: a CI approach to determining non-inferiority was stated in the methods but not used.

Statistical analysis plans

Statistical analysis plans were requested for all 17 non-inferiority trials where the protocol or results paper was included in the review, and obtained for nine of these trials.


To the best of our knowledge, this is the first systematic review both to identify statistical methods that adjust for the impact of non-adherence to interventions in randomised non-inferiority trials and to assess the frequency and consequences of their use. Few papers reported such methods (less than 2% of those reaching full-text review). This may be partly because trialists and statisticians are unfamiliar with these techniques, reflecting the long lead time for statistical methodology to reach routine practice. The most common techniques identified were:

  • IV approaches;

  • inclusion of observed adherence as a covariate within a regression model;

  • modelling of adherence as a time-varying covariate in a time-to-event analysis.

Overall, too few trials implemented relevant statistical methods to draw firm inferences about their impact on non-inferiority conclusions. In the six analyses where results from the methods of interest could be compared directly with those from an alternative analysis, conclusions regarding non-inferiority were consistent across approaches.

Almost half of the methods identified focus on estimating CACE (also known as the local average treatment effect (LATE)). This is the average effect of the experimental intervention within the subpopulation of compliers.25 We argue that this is the natural estimation focus when attempting to account for non-adherence to interventions in the context of non-inferiority trials. This is because we want to be confident that there is non-inferiority among those who would comply with either intervention. By contrast, including participants who would not fully adhere to both interventions may bias estimation towards non-inferiority (in a similar way that, in the context of non-inferiority, ITT analyses may be biased towards non-inferiority under non-adherence). For similar reasons, we believe that the CACE is preferable to the population average treatment effect (ATE). Lastly, we note that when adjusting for observed adherence within a regression model or modelling adherence as a time-varying covariate in a time-to-event analysis, the target estimand is unclear.

The infrequent use of statistical methods for handling non-adherence seen in the current review has also been observed more generally in randomised controlled trials (RCTs). A review of 100 RCTs randomly selected from those published in 4 high-impact journals during 2008 found only 1 that attempted to account for non-adherence to interventions using a causal inference framework (in which inverse-probability-of-censoring weighting was applied).6 More recently, Mostazir et al conducted a review of statistical approaches for handling non-adherence to interventions in RCTs published between 1991 and 2015, which identified 88 analyses incorporating 9 different methods.28 IV methods were among the most common and accounted for almost one in four applications of suitable techniques. However, some of the other methods identified (including CACE analyses using maximum-likelihood estimation and adjusted treatment received models) were not captured in the current review focusing on non-inferiority trials. Similarly, we did not identify all 12 approaches included in a recent review of methodological papers containing statistical techniques for handling non-adherence to interventions in the context of time-to-event outcomes.29 This suggests that other relevant methods are available but either they are not suitable for comparing active interventions, as is often required in non-inferiority trials, or they may not have been applied within these studies. The three aforementioned reviews did not focus specifically on non-inferiority trials.

It is perhaps not surprising that IV approaches were the most common method identified in the current review, given that their assumptions are well suited to many double-blind trials, they can be applied across a range of trial designs, and they are relatively simple to implement in standard statistical software.30 IV methods use randomisation as the instrument in order to account for unmeasured confounders of the outcome and intervention received (ie, adherence). Their main assumptions are: (1) randomisation affects the outcome only through its influence on the intervention received (the exclusion restriction), (2) randomisation does not share common causes with the outcome (the exchangeability assumption), (3) randomisation causes some participants to receive their assigned intervention (the relevance assumption) and, in order to estimate CACE, (4) there are no participants who would always receive the opposite of their random allocation (the monotonicity assumption).23 31 In individually randomised trials, the exclusion restriction and monotonicity assumptions are typically satisfied by effective double blinding and/or use of objective outcomes, and the exchangeability assumption is usually valid since randomisation is expected to produce trial arms that are balanced with respect to prognostic factors.

When these assumptions hold, the following two-step argument applies:

  • If we regress the intervention received (ie, adherence) on randomisation and then use this model to predict each participant’s adherence, these predictions are orthogonal to (independent of) all adherence–outcome confounders.

  • If we then regress the outcome on these predictions, we obtain an unconfounded estimate of the effect of adherence on outcome.

It follows that, when the four IV assumptions above hold, IV methods enable us to estimate CACE even in the presence of unmeasured confounding, unlike techniques that involve inverse-probability weighting (although including measured confounders can improve precision).32 33 IV methods may thus appear to be a panacea, but, as usual in statistics, there is no free lunch: lack of precision and statistical power is often a challenge with IV techniques, and with methods used to adjust for non-adherence more generally.5 30 34 35
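This two-step logic can be sketched in Python with ordinary least squares written out by hand. The data-generating model is hypothetical:

  • An unmeasured factor u makes participants both more likely to adhere and healthier.
  • Only participants randomised to the experimental arm can receive it.
  • Receiving it changes the outcome by 0.5.

Regressing the outcome directly on the intervention received is confounded by u, whereas the two-stage estimate recovers the assumed effect.

```python
import random
from statistics import fmean


def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(11)
z, d, y = [], [], []
for _ in range(20000):
    u = rng.gauss(0, 1)                       # unmeasured adherence-outcome confounder
    zi = rng.randint(0, 1)                    # randomisation: the instrument
    di = 1 if (zi == 1 and u < 0.5) else 0    # assumed: only healthier participants adhere
    z.append(zi)
    d.append(di)
    y.append(0.5 * di + u + rng.gauss(0, 1))  # assumed true effect of receipt: 0.5

# Stage 1: regress intervention received on randomisation and predict it.
a, b = ols(z, d)
d_hat = [a + b * zi for zi in z]
# Stage 2: regress the outcome on the predicted intervention received.
_, iv_estimate = ols(d_hat, y)
_, naive_estimate = ols(d, y)                 # confounded by u
print(f"2SLS {iv_estimate:.3f}; naive regression on received {naive_estimate:.3f}")
```

This hand-coded version gives the correct point estimate but not correct standard errors, because the second stage ignores the uncertainty in the first-stage predictions. Dedicated IV routines, such as Stata's `ivregress 2sls`, make that correction.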

The two-stage least-squares (2SLS) regression approach sketched in the previous paragraph can also be applied when the intervention is not all or nothing. Suppose that a non-inferiority trial assesses whether prescribing one dose of a medication per week is non-inferior to prescribing two doses per week over 4 weeks. For each participant, the monotonicity assumption requires that the potential number of doses taken would be no higher if the participant were randomly assigned to one dose per week than if they were randomised to two doses per week. If there are no covariates and the monotonicity assumption holds, the 2SLS estimator can be shown to converge to a weighted average of the causal effects of one-unit increases in the intervention among compliers, that is, individuals whose intervention intensity is affected by randomisation (the instrument).36 37 The weighting arises because 2SLS implicitly gives greater weight to levels of the intervention (here, numbers of doses) at which there are more compliers, that is, more participants for whom randomisation changes whether that dose is taken.
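This weighting can be checked with a small population-level calculation in Python for the dosing example above. The compliance types, their shares and the incremental effect of each successive dose are all hypothetical. Monotonicity holds because no type takes more doses when assigned one dose per week than when assigned two. The calculation compares two quantities:

  • the 2SLS (Wald) estimand, E[Y(D1) − Y(D0)] / E[D1 − D0];
  • the average of the per-dose effects, weighted by the share of participants whose k-th dose is switched on by randomisation.

```python
# Hypothetical potential doses taken over 4 weeks: (doses if assigned one per week,
# doses if assigned two per week), with the share of participants of each type.
# Monotonicity holds because the first value never exceeds the second.
TYPES = [((4, 8), 0.4), ((3, 6), 0.3), ((2, 4), 0.2), ((0, 0), 0.1)]

# Hypothetical incremental effect of the k-th dose (k = 1..8) on the outcome.
BETA = [0.5, 0.5, 0.5, 0.5, 0.3, 0.2, 0.1, 0.05]


def potential_outcome(doses):
    """Outcome change from taking `doses` doses: the sum of the first `doses` increments."""
    return sum(BETA[:doses])


def wald_estimand(types):
    """Population 2SLS/Wald estimand with a binary instrument:
    E[Y(D1) - Y(D0)] / E[D1 - D0]."""
    num = sum(p * (potential_outcome(d1) - potential_outcome(d0)) for (d0, d1), p in types)
    den = sum(p * (d1 - d0) for (d0, d1), p in types)
    return num / den


def complier_weights(types, max_dose=8):
    """Weight for dose k proportional to P(D0 < k <= D1): the share of
    participants whose k-th dose is switched on by randomisation."""
    raw = [sum(p for (d0, d1), p in types if d0 < k <= d1) for k in range(1, max_dose + 1)]
    total = sum(raw)
    return [r / total for r in raw]


weights = complier_weights(TYPES)
weighted_average = sum(w * b for w, b in zip(weights, BETA))
print(round(wald_estimand(TYPES), 4), round(weighted_average, 4))
```

The two quantities coincide. The first and second doses receive zero weight because, in this hypothetical population, randomisation never changes whether anyone takes them.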

A limitation of IV methods is that when interventions are administered at multiple timepoints, standard approaches are susceptible to time-varying confounding and selection bias.21 These biases occur when previous values of a covariate predict the current intervention received and the current value of the covariate predicts outcome.38 If the time-varying confounders are themselves affected by previous intervention received, so-called G-methods, such as inverse-probability weighting or G-estimation, are required to allow for the feedback loop occurring between the intervention received and confounders over time.21 24 39 G-methods were seldom reported in the current review, perhaps because they can be more complex to implement than alternative approaches and also rely on assumptions which may be vulnerable to violations. When considering whether to apply an IV approach or a G-method, statisticians might consider whether the exclusion restriction and monotonicity assumptions are realistic given the context of the trial, and whether randomisation is a sufficiently strong instrument. Where outcomes are collected at multiple timepoints, inverse-probability weighting may be a more attractive approach if data on potential confounders are also collected throughout follow-up.
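The following is a minimal single-timepoint sketch of inverse-probability weighting in Python. It assumes:

  • one measured binary confounder (being sick) that affects both the outcome and adherence to the experimental intervention;
  • no unmeasured confounding;
  • no true treatment effect.

Each adherent participant is weighted by the inverse of the estimated probability of adhering given their arm and confounder value. This makes the PP set again resemble the randomised arms. With multiple timepoints, the weights would instead be products of visit-specific probabilities, which this sketch does not attempt.

```python
import random
from statistics import fmean


def simulate(n_per_arm=20000, seed=3):
    """Rows of (arm, sick, adherent, outcome) for a hypothetical trial with no true effect."""
    rng = random.Random(seed)
    rows = []
    for arm in (0, 1):  # 0 = control, 1 = experimental
        for _ in range(n_per_arm):
            sick = rng.random() < 0.4                        # measured confounder
            p_adhere = 1.0 if arm == 0 else (0.5 if sick else 0.9)
            adherent = rng.random() < p_adhere
            outcome = 2.0 * sick + rng.gauss(0, 1)           # higher = worse
            rows.append((arm, sick, adherent, outcome))
    return rows


def pp_difference(rows, weighted):
    """Experimental minus control mean outcome among adherers, optionally
    weighted by 1 / P(adherent | arm, sick) estimated within each stratum."""
    means = {}
    for arm in (0, 1):
        arm_rows = [r for r in rows if r[0] == arm]
        p_adhere = {s: fmean(r[2] for r in arm_rows if r[1] == s) for s in (False, True)}
        kept = [r for r in arm_rows if r[2]]
        w = [1 / p_adhere[r[1]] if weighted else 1.0 for r in kept]
        means[arm] = sum(wi * r[3] for wi, r in zip(w, kept)) / sum(w)
    return means[1] - means[0]


rows = simulate()
print(f"unweighted PP {pp_difference(rows, False):.3f}; "
      f"IPW PP {pp_difference(rows, True):.3f}")
```

Estimating adherence probabilities within each arm-by-confounder stratum is a saturated model. With several or continuous confounders, a logistic regression would typically take its place.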

To estimate the effect of the experimental intervention in the absence of (full) protection by randomisation, additional assumptions must be made, most of which are inherently untestable. Each of the statistical methods identified in the current review makes slightly different assumptions to estimate the effects of interventions under full adherence, and so each uses a different method of estimation. Both the assumptions and the estimation methods have advantages and disadvantages. Crucially, all of the methods require reliable information on adherence to the randomly assigned interventions, which is often challenging to measure, particularly for long-term therapies.40

Despite these limitations, it is our view that the methods identified have an important role in non-inferiority trials with non-adherence to interventions and should be applied as sensitivity analyses alongside other techniques (such as ITT, PP and AT analyses). Given that agreement between ITT and PP analyses cannot guarantee unbiased conclusions in these studies, those with non-trivial non-adherence should assess the sensitivity of trial results to different assumptions in order to guard against falsely claiming non-inferiority and accepting a worse intervention. Careful consideration needs to be given to the assumptions that are most plausible given the trial context and planned design, before selecting the appropriate statistical method to adjust for potential non-adherence. Relevant data needed to implement the chosen technique should then be collected as fully as possible. Clearly, the best approach to reducing the potential biases introduced by non-adherence to the interventions is to design trials that minimise such non-adherence. Future work should compare the performance of the methods identified under different non-adherence scenarios in non-inferiority trials to facilitate understanding of when they might be applied appropriately.

Strengths and limitations

This is the first systematic review (protocol registered on PROSPERO) to identify statistical methods that attempt to account for the impact of non-adherence to interventions in randomised non-inferiority trials, to quantify the use of such methods in these studies and to examine their impact on non-inferiority conclusions. The review included publications from any year, journal or disease area, and two authors agreed on the eligibility of each paper identified in the search. However, it has some limitations.

  • First, the search had to be restricted to papers containing terms related to adherence, or to statistical methods for handling non-adherence, in their titles, abstracts and keywords. Publications applying suitable methods in sensitivity analyses may not have mentioned the techniques in these fields and, if so, would not have been captured. A wide range of search terms was used to mitigate this problem, so a large number of papers were reviewed.

  • Second, only one database was searched and papers not published in English were excluded, so some eligible papers may have been missed. However, most major non-inferiority trials are likely to be published in English in a MEDLINE-indexed journal and should therefore have been captured.

  • Third, one author performed the data extraction from eligible publications, though other authors were consulted where necessary.

  • Finally, statistical analysis plans were requested but could not be obtained for eight of the included trials. For these studies, we cannot be sure that the details in the publications reviewed accurately reflect the planned analyses.


In non-inferiority trials with non-adherence to interventions, ITT and PP analyses are often performed but may result in biased estimates of efficacy and, therefore, agreement between these approaches does not guarantee that conclusions regarding non-inferiority are unbiased. Statistical methods that attempt to account for the impact of non-adherence and thereby estimate the causal effects of interventions are available, but their use in non-inferiority trials remains extremely infrequent. It is our view that the methods identified should be applied more widely within sensitivity analyses of non-inferiority trials. In particular, those with non-trivial non-adherence should assess the sensitivity of trial results to different assumptions in order to guard against falsely claiming non-inferiority and accepting a worse intervention.

Data availability statement

No data are available. No additional data are available.

Ethics statements

Patient consent for publication

Ethics approval

Ethical approval was not required because this review used publicly accessible documents.
