Potential impact on cost-effectiveness estimates of using immature survival data: a case study based on transcatheter edge-to-edge repair (TEER) used for patients with severe mitral regurgitation at high surgical risk

Introduction

The EVEREST II High Surgical Risk (HSR) Study investigated the effectiveness of transcatheter edge-to-edge repair (TEER) of mitral valve malfunction with the MitraClip system (Abbott Vascular)+medical management (MM) in patients judged at high surgical risk (HSR). The outcome of primary interest was mortality rate. Study results were first reported by Whitlow et al,1 after 12 months of follow-up. Early studies on the MitraClip system had no comparator group; however, although the Food and Drug Administration (FDA)-approved EVEREST II HSR Study protocol described it as a single-arm investigation, Whitlow et al reported the follow-up of 36 ‘control’ patients who were screened for TEER eligibility but were eventually contraindicated to TEER. Since alternative treatment was unavailable, these patients received MM only and were presented as a comparator group to TEER patients. It is difficult to identify a satisfactory comparator in this small HSR population and this was undertaken after results for the TEER group were known. Selection of controls was based on clinically judged demographic similarity with the TEER group; the number of screened contraindicated patients not selected as control was reported as 22 (38% exclusion).

To recover development costs and expand adoption by decision-makers, it is desirable for a manufacturer to demonstrate cost-effectiveness of an expensive device such as MitraClip at an early stage following marketing approval.

The EVEREST II HSR Study was included in the manufacturer submission to support the approval and/or reimbursement of the device by several regulatory agencies, including the FDA.2 Despite its ‘quasi-comparative nature’, the EVEREST II HSR Study was also used as a main source of evidence to support the cost-effectiveness of TEER in HSR patients, especially those affected by primary mitral regurgitation (MR).3 More broadly, several industry-sponsored cost-effectiveness analyses of the MitraClip intervention4–6 have been conducted, each based on the 1-year follow-up results from EVEREST II HSR,1 and, with various caveats, their authors have described the intervention as cost-effective. With such short-term follow-up, it is unlikely for the upfront one-off device cost to be sufficiently offset so as to yield a cost-effective estimate within reimbursement norms (eg, cost/life-year gained) unless a survival advantage for the device over the comparator can be shown. Survival and cost-effectiveness estimates are generally best served with the use of mature survival data, preferably from a large data set in a randomised controlled trial (RCT). Five-year follow-up of the EVEREST II HSR Study TEER recipients was published by Kar et al.7 This follow-up is sufficiently longer than the 12-month follow-up that it seemed reasonable to assess the potential impact of more mature survival data on the survival estimates published in cost-effectiveness studies.

Methods

Overall methods

Our focus was on the survival modelling employed in economic evaluations of TEER, published as original papers, that used results from EVEREST II HSR as the main source of clinical effectiveness evidence. The aim was to review this survival modelling in the light of more mature 5-year mortality data. For this purpose, we conducted a non-systematic literature search using the PubMed electronic database, retrieving articles on economic evaluations of TEER published from 2012 (date of publication of the paper by Whitlow et al1) to September 2020.

Three articles (Mealing et al,6 Cameron et al4 and Guerin et al5) met our eligibility criteria. We were able to confirm that no articles were omitted from our search based on a systematic review by Rezapour et al,8 which became publicly available in December 2020.

In the light of the 5-year follow-up data reported by Kar et al,7 we re-examined survival modelling in the three cost-effectiveness studies that used the 12-month follow-up results from the EVEREST II HSR Study of Whitlow et al1; these cost-effectiveness studies were all undertaken before Kar et al’s 5-year follow-up study was available.

The EVEREST II HSR Study consisted of a cohort of 78 patients who had TEER (77 recruited in the USA and 1 in Canada). Whitlow et al1 also reported the results of a control group of ‘concurrently screened’ patients (N=36), which was identified retrospectively after the 12-month results for the TEER arm (N=78) were known. This control group was used to model survival in the three cost-effectiveness studies.

Most patients were older than 75 years (mean age 76.7±9.8 years and 77.2±13.0 years for TEER and control groups, respectively).

At 12 months, Whitlow et al1 reported a gain in overall survival for MitraClip group over the retrospective control group (75.4% vs 55.3%). Transitions between MR grades (baseline–12 months) were also reported.

Published survival plots from EVEREST II HSR at 1 year1 and 5 years7 were digitised using DigitizeIt software. Parametric models were developed in Stata V.15 software (StataCorp, College Station, Texas, USA) using the streg command, and the stgenreg9 and stpm210 packages. The method of Guyot et al11 was used to reconstruct individual patient data (IPD). Restricted mean survival was estimated in Stata using the strmst2 package of Cronin et al.12
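For readers less familiar with the estimator underlying the reconstructed plots, the following is a minimal Python sketch of a Kaplan-Meier calculation from individual patient data. It is illustrative only: it does not implement the reconstruction algorithm of Guyot et al, it is not the Stata code used in this study, and the follow-up times and event indicators are hypothetical values, not EVEREST II HSR data.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical IPD in months (assumption for illustration only).
times = [2, 3, 3, 5, 8, 8, 10, 12, 12, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
km = kaplan_meier(times, events)
for t, s in km:
    print(f"month {t}: S = {s:.3f}")
```

Censored patients leave the risk set without lowering the curve, which is why small groups with early censoring produce the wide CIs discussed in the Results.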

Results are presented graphically and in narrative description.

The Guerin et al5 paper did not report several inputs for their economic model including the baseline distribution of patients between health states, the mortality rates and transitions between MR grades. Guerin et al referenced source data for these inputs as Whitlow et al1 and Cioffi et al.13

The relevant data from these publications are summarised in online supplemental data 1.

Patient and public involvement

No patients were involved.

Results

Survival based on 12-month and on updated clinical data

Figure 1A depicts the reconstructed 12-month Kaplan-Meier survival plots and associated uncertainty for each group in EVEREST II HSR. With such small groups, 95% CIs are wide and any modelled extrapolation beyond the observed survival will be associated with very substantial uncertainty.

Figure 1

Reconstructed Kaplan-Meier plots and 95% CI (EVEREST II High Surgical Risk Study) for 1-year (A) and 5-year follow-up (B). Red line=medical management group (N=36); black line=TEER+medical management group (N=78). TEER, transcatheter edge-to-edge repair.

The Kar et al7 5-year analysis of EVEREST II HSR reported survival data only for the TEER arm and not for the control arm (figure 1B); surprisingly, the paper stated that in this study, there was ‘a lack of medical control group’.

The 5-year Kaplan-Meier plot for the TEER arm published by Kar et al7 exhibited two fairly distinct phases (figure 1B). Up to about 2–2.5 years, the slope of the Kaplan-Meier curve decreases continually, so that the curve gradually tends to flatten out. Thereafter, in contrast, the curve becomes steeper and the flattening tendency ceases. At 24 months, 59% of the patients remained at risk; with such a small sample, the downturn in trajectory may be an artefact of the small numbers remaining for analysis. However, other studies (figure 2) of post-TEER survival14–16 in various populations exhibit a similar pattern, with a gradually decreasing slope to around 2 years followed by a steeper trajectory thereafter.

Figure 2

Reconstructed Kaplan-Meier plots of survival in TEER recipients: Adamo et al14 (cohort, N=304); Velu et al16 (cohort, N=326); Mack et al15 (COAPT RCT, N=302). Horizontal lines indicate survival at change in trajectory. RCT, randomised controlled trial; TEER, transcatheter edge-to-edge repair.

Conventional parametric models (eg, Weibull) based on EVEREST II HSR 5-year TEER data produced poor fit because of the trajectory change in the observed data (online supplemental data 2).

Survival modelling in cost-effectiveness studies

Unusually, and contrary to common practice, none of the three cost-effectiveness studies provided a graphical representation of its final survival models; each only described the procedures used to obtain them. Both Mealing et al6 and Cameron et al4 used Weibull models based on short-term survival data from EVEREST II HSR. Mealing et al modelled the arms independently, whereas Cameron et al assumed proportional hazards but did not report a test of this assumption. Guerin et al5 did not use parametric modelling but combined transition probabilities between MR grades (0–4 and dead) from EVEREST II HSR (TEER arm) with mortality rate data for MR grades taken from a non-randomised study (Cioffi et al13) in a different population. MR grade transitions were only reported in TEER patients in Whitlow et al,1 and for only 54 of the 78. Guerin et al assumed MR grades remained stable over 5 years in the control arm, an assumption that does not seem reasonable for an HSR group with comorbidities.

Mealing et al

Mealing et al6 provided their Weibull models for each arm over 1 year only; models were extrapolated to a lifetime horizon to generate final survival models. In the TEER arm, a Weibull model generates unrealistic survival when compared with the 5-year follow-up reported by Kar et al7 (online supplemental data 2). In the absence of observed data beyond 1 year for the MM arm, it is impossible to assess whether a Weibull model for the MM arm represents a true estimate, an underestimate or an overestimate of post 12-month survival; in view of the small size and unusual nature of the sample, we can infer that the model is associated with great uncertainty. After 3.5% annual discounting, the Mealing et al modelling generated 3.2 extra life-years for the TEER group relative to the MM ‘control’ group (5.1 years vs 1.9 years). This is almost three times the 1.13 years reported by Baron et al,17 based on data from the COAPT RCT (5.05 vs 3.92, with 3% annual discounting). Relative to Baron et al,17 the TEER arm life-years are slightly larger from EVEREST II HSR (5.1 vs 5.05) despite greater discounting, greater age and greater comorbidity load. Relative to COAPT, the MM arm life-years from EVEREST II HSR are considerably lower (1.9 vs 3.92), as could be expected.
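To illustrate how discounted life-years follow from an extrapolated parametric curve, the sketch below integrates a Weibull survival function, S(t)=exp(−λt^γ), weighted by the discount factor (1+r)^−t. It is a minimal Python sketch under stated assumptions: the Weibull parameters are hypothetical placeholders, not the values fitted by Mealing et al, and the trapezoidal integration with monthly steps is our own simplification rather than the method used in any of the published models.

```python
import math

def weibull_survival(t, lam, gamma):
    """Weibull survival in the proportional hazards form S(t) = exp(-lam * t**gamma)."""
    return math.exp(-lam * t ** gamma)

def discounted_life_years(surv, horizon, rate, step=1 / 12):
    """Trapezoidal integral of surv(t) / (1 + rate)**t from 0 to horizon (years)."""
    n = round(horizon / step)
    total = 0.0
    for i in range(n):
        t0, t1 = i * step, (i + 1) * step
        f0 = surv(t0) / (1 + rate) ** t0
        f1 = surv(t1) / (1 + rate) ** t1
        total += 0.5 * (f0 + f1) * step
    return total

# Hypothetical parameters (assumptions for illustration only).
teer = lambda t: weibull_survival(t, lam=0.25, gamma=0.8)
mm = lambda t: weibull_survival(t, lam=0.60, gamma=0.8)

ly_teer = discounted_life_years(teer, horizon=30, rate=0.035)
ly_mm = discounted_life_years(mm, horizon=30, rate=0.035)
print(f"TEER {ly_teer:.2f}, MM {ly_mm:.2f}, increment {ly_teer - ly_mm:.2f}")
```

A shape parameter below 1 implies a hazard that falls over time, so a curve fitted to 12 months of data can carry a long tail of survivors into the extrapolation; this is the mechanism by which short follow-up may inflate life-year estimates.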

Cameron et al

Cameron et al4 generated a Weibull model for the EVEREST II HSR control arm (N=36) using 12-month follow-up results (Whitlow et al1) and provided the model parameters (online supplemental data 3). The Weibull model for the TEER arm was generated by applying an HR of 0.492 to the MM arm model (the source of the HR is unavailable in the public domain and no test of proportional hazards assumption was reported). Online supplemental data 3 compares the Cameron et al TEER arm Weibull model extrapolated to 20 years with the observed 5-year TEER survival reported by Kar et al.7 For a population mean age 76.7 years with a heavy burden of comorbidities, this TEER model generates unrealistic >10% survivors after 20 years.
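Under proportional hazards, applying an HR to a baseline curve amounts to raising the baseline survival to the power of the HR, S_TEER(t)=S_MM(t)^HR. The following minimal Python sketch shows the mechanics. The HR of 0.492 is the value quoted above; the Weibull parameters for the MM arm are hypothetical placeholders and are not those reported by Cameron et al (online supplemental data 3).

```python
import math

HR = 0.492  # hazard ratio quoted in the text for TEER versus MM

def mm_survival(t, lam=0.55, gamma=0.7):
    """Baseline (MM arm) Weibull survival; lam and gamma are hypothetical."""
    return math.exp(-lam * t ** gamma)

def teer_survival(t, hr=HR):
    """TEER arm survival under proportional hazards: S_MM(t) ** HR."""
    return mm_survival(t) ** hr

for year in (1, 5, 10, 20):
    print(f"year {year}: MM {mm_survival(year):.3f}, TEER {teer_survival(year):.3f}")
```

For a Weibull baseline this is equivalent to multiplying λ by the HR while leaving the shape parameter unchanged, so any implausibly long tail in the baseline extrapolation is carried over, and lengthened, in the treatment arm.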

Cameron et al4 adjusted the extrapolated Weibull survival models using age-adjusted and sex-adjusted Canadian general population survival data so as to ensure there were no survivors after the age of 100 years. The resulting model was not shown. After 5% annual discounting, the lifetime life-years gained were 3.93 and 2.09 for TEER and MM arms, respectively, yielding an incremental life-year gain of 1.89 years. With 3.5% discounting, the yields are approximately 4.05 and 2.12 life-years gained for TEER and control arms, respectively, giving an increment of 1.93 life-years gained. The Cameron et al analysis thus generates a smaller incremental gain than the Mealing et al analysis (1.93 vs 3.2 life-years); the Mealing et al increment is approximately 66% larger. Relative to the incremental gain from TEER versus MM in the COAPT RCT (1.13 life-years gained) reported by Baron et al,17 these are large gains but are inevitably associated with great uncertainty because of the small numbers, short follow-up and questionable nature of the control arm in EVEREST II HSR.

Guerin et al

Guerin et al5 reported on the cost-effectiveness of MitraClip therapy in MR based on a ‘fictional population of 1000 patients over a 5-year period’. Differences from the Mealing et al6 and Cameron et al4 analyses were manifold, including discount rate (4%), time horizon (5 years), perspective and, most relevantly, the methodology used for modelling survival (see online supplemental data 1 for details). Over the 5-year time horizon and after 4% annual discounting, Guerin et al reported 3.26 life-years gained from TEER, 1.54 life-years gained from MM and an incremental gain of 1.7 life-years for TEER over MM. With 3.5% discounting, these values equate to approximately 3.28 (TEER), 1.55 (MM) and 1.73 (increment), respectively. Unsurprisingly, these estimates are less than the lifetime estimates of Mealing et al and Cameron et al. The Guerin et al 5-year MM arm fares poorly but cannot be compared with any 5-year observed data since Kar et al7 did not report any survival for the control group. The Guerin et al estimate for the TEER arm (3.28 life-years gained) exceeds that estimated from the observed 5-year Kar et al Kaplan-Meier plot (~2.6 life-years gained with 4% discounting) by approximately 25%.
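An estimate of life-years over a fixed horizon can be read off an observed Kaplan-Meier plot as the discounted area under the step curve. The following minimal Python sketch shows one way to do this. It is an assumption-laden illustration: the step values are hypothetical and are not the digitised Kar et al data, and the function name discounted_rmst is our own label, not part of the strmst2 package.

```python
import math

def discounted_rmst(curve, tau, rate):
    """Discounted area under a step survival curve from 0 to tau (years).

    curve -- list of (time, survival) steps sorted by time; S(t) = 1 before the first step
    rate  -- annual discount rate (0 gives the ordinary restricted mean survival time)
    """
    def discounted_length(a, b):
        # Integral of (1 + rate)**(-t) from a to b.
        if rate == 0:
            return b - a
        k = math.log(1 + rate)
        return (math.exp(-k * a) - math.exp(-k * b)) / k

    total, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        total += prev_s * discounted_length(prev_t, t)
        prev_t, prev_s = t, s
    total += prev_s * discounted_length(prev_t, tau)
    return total

# Hypothetical step curve (assumption for illustration only).
steps = [(1, 0.8), (3, 0.5)]
print(discounted_rmst(steps, tau=5, rate=0.0))
print(discounted_rmst(steps, tau=5, rate=0.04))
```

Because the curve is piecewise constant, each interval can be integrated exactly, so the only approximation lies in the digitisation of the published plot.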

In summary, the three cost-effectiveness studies generate survival models for the TEER group of EVEREST II HSR that are optimistic when compared with the observed 5-year survival reported by Kar et al. The corresponding estimates for the control arm vary between cost-effectiveness studies and must be associated with great uncertainty, both because of the nature and small size of this group and because no comparison is possible between 12-month and more mature data.

In table 1, we have summarised the life-year gains obtained using the survival models from Mealing et al, Cameron et al and Guerin et al, and compared them with the life-year gains modelled by Baron et al17 for the COAPT trial (5.05 vs 3.92 life-years, 1.13 life-years gained). In comparing these studies, it is important to keep in mind that the two populations have different baseline characteristics (mixed aetiologies of MR in EVEREST II HSR vs functional MR in COAPT, age and prevalence of comorbidities). Even allowing for the fact that the cost-effectiveness studies were undertaken from different payer perspectives, the broad range of variation in their estimated life-year gains is remarkable (table 1). This emphasises the influence that survival modelling methods have on estimates of life-year benefit in cost-effectiveness studies.

Table 1

Life-year gains reported in CE studies of patients receiving TEER or medical management (analyses based on the EVEREST II High Surgical Risk Study; mean age 77.6, 59% secondary MR/41% primary MR)

Survival using 5-year follow-up of EVEREST II HSR

Because of the trajectory change in the observed data (figure 1B), conventional parametric models (online supplemental data 2) generate poor fit to the 5-year follow-up survival of the TEER group reported by Kar et al.7 Only the bathtub model generated reasonable fit and realistic survival in extrapolation with almost no survivors after 16 years.

Piecewise modelling from ~24 months and flexible parametric models seem more appropriate since they can accommodate the trajectory change in survival. These options generated a combination of better visual fit and more plausible proportions of survivors in extrapolation beyond the 5-year observation (figure 3). It should be emphasised that all models are associated with very substantial uncertainty.
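The way a piecewise model accommodates a change in trajectory can be sketched with the simplest case, a two-piece exponential with a cut point at 24 months. This is a minimal Python illustration under stated assumptions: the hazards are hypothetical, and the piecewise models fitted in this study were not necessarily exponential.

```python
import math

def piecewise_exponential_survival(t, h1, h2, cut=2.0):
    """Survival with constant hazard h1 up to `cut` years and h2 thereafter."""
    if t <= cut:
        return math.exp(-h1 * t)
    # Cumulative hazard accrued before the cut point plus that accrued after it.
    return math.exp(-h1 * cut - h2 * (t - cut))

# Hypothetical annual hazards (assumptions for illustration only).
H1, H2 = 0.20, 0.30
for year in (1, 2, 5, 10, 16):
    s = piecewise_exponential_survival(year, H1, H2)
    print(f"year {year}: S = {s:.3f}")
```

Allowing the later hazard to differ from the early one lets the extrapolation follow the post-24-month slope rather than the early flattening, which is what keeps the long-term proportion of survivors plausible.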

Figure 3

Weibull models of survival to 20 years based on 1-year and 5-year follow-up of the TEER arm of EVEREST II High Surgical Risk; comparison with piecewise and flexible parametric models. TEER, transcatheter edge-to-edge repair.

After 3.0% annual discounting, the flexible model delivers 4.57 life-years gained; this compares with Baron et al’s17 estimate of 5.05 life-years gained in the TEER arm of COAPT (figure 4 and online supplemental data 4). The flexible model output may appear optimistic for an older population with a greater load of morbidities. Piecewise models (figure 3) generate fewer life-years gained.

Figure 4

Comparison of the flexible parametric model of survival after TEER in EVEREST II HSR with the Baron et al17 modelled survival after TEER in the COAPT RCT. Note the populations in the studies differ considerably, the EVEREST population being older and bearing a heavier burden of comorbidities. HSR, High Surgical Risk; KM, Kaplan-Meier; RCT, randomised controlled trial; TEER, transcatheter edge-to-edge repair.

Discussion

Manufacturers of expensive devices or pharmaceuticals desire early uptake so as to recoup development costs of their intervention and, in the case of devices, to enable the pursuit of improvement and testing of design. A regrettable corollary is that, in reality, premature cost-effectiveness analyses may be undertaken using immature trial data obtained in small non-randomised populations, usually sponsored by industry, which holds guardianship of the trial data. Results from such studies should be viewed with caution, and perhaps should support only temporary reimbursement decisions. In such circumstances, good practice would encompass regular updating of the cost-effectiveness estimates as the underlying trial results mature. The undesirable increasing use of immature studies in submissions to decision-making bodies has been emphasised by various authors.18 19 With regard to drugs for cancer, authors18 have remarked that while ‘regulatory authorities continue to approve cancer drugs that have uncertain benefits at the time of licensing, it increasingly becomes the responsibility of every national HTA [health technology assessment] agency to ensure that the post-marketing survival benefits of these drugs are closely monitored. Indeed, willingness to address these uncertainties must be a priority for all parties involved in HTA.’ Submissions based on short follow-up single-arm studies have occurred (eg, GARNET—NCT02715284,20 dostarlimab for previously treated endometrial cancer).

The upfront cost of the MitraClip device varies by jurisdiction (eg, €21 100 in France, $36 625 in the USA, $C30 000 in Canada and £16 500 in the UK). One hundred thousand MitraClip TEER implantations worldwide were reached in late September 2019. At the US price of approximately US$36 600/implantation, this corresponds to expenditure by payers of approximately US$3.7 billion up to 2019. Since publication of the COAPT trial in September 2018, which reported a spectacular benefit for TEER in patients with secondary MR,21 adoption of TEER with the MitraClip device is likely to have grown steadily.

Early adoption of TEER with the MitraClip device was likely supported by the favourable cost-effectiveness analyses published in the years 2013–2016 and discussed in our paper. We do not think the estimates in these studies are likely to be reliable, being based on 12-month results from the EVEREST II HSR Study in which a ‘control group’ of 36 individuals was retrospectively selected from ‘concurrently screened’ patients (Whitlow et al1). Resource use and quality of life outcomes for the control group beyond 1 year are undocumented and the uncertainty inherent in the way survival modelling was undertaken is substantial. As far as we can ascertain, there is no guideline-supported treatment option for such a high-risk population: the accepted treatment for degenerative severe MR is CT surgery, for which these patients were ineligible. The selection of a valid control group is therefore problematic. These considerations challenge the likely reliability of the output from the cost-effectiveness analyses.

It is unfortunate that no further follow-up data beyond 12 months were published for the EVEREST II HSR ‘control group’ even though follow-up of the MitraClip arm at 3 years and at 5 years was reported by Cameron et al4 (figure 2) and documented by Kar et al,7 respectively. It is also remarkable that in the 5-year study of Kar et al,7 the authors stated there was an absence of an MM group in EVEREST II HSR. Whether the MM arm survival beyond 12 months has ever been recorded or is as yet unpublished is unclear. The protocol for the study (identifier: NCT01940120) described it as a single-arm study.

For many cost-effectiveness studies, the modelling of survival is a pre-eminently important effectiveness input. All other things being equal, the more mature the observed survival data, the more likely it is for the cost-effectiveness estimate to be reliable. For the MitraClip arm of the EVEREST II HSR Study, the comparison of 5-year observed survival with the survival modelled from short-term data indicates that the modelling of immature data undertaken in these cost-effectiveness studies4–6 is not ideal and may not be the most appropriate approach. The accompanying absence of data beyond 12 months for the MM arm cannot be remedied and renders the results untestable in the context of mature data. We agree with Cameron et al, when they state ‘follow-up of MitraClip patients in the EVEREST II HSR is ongoing and will provide more certainty regarding the long-term clinical benefits of MitraClip therapy’ but are not encouraged by the fact that such studies remain to be undertaken and no published data exist for the MM arm beyond 12 months.

The present work was undertaken using studies based on the EVEREST II HSR Study, which enrolled patients at high surgical risk and with mixed aetiologies of MR (41% of patients had degenerative MR/59% of patients had functional MR). Despite the non-exclusive nature of MR aetiology, the study was included in the submission by the company to support approval and/or reimbursement of MitraClip in patients with degenerative MR deemed at prohibitive surgical risk. Other clinical effectiveness evidence submitted to regulatory agencies comprised the results of the EVEREST II RCT,22 which compared TEER with surgery, as well as the REALISM High Risk Study,2 another single-arm study evaluating the safety and effectiveness of the MitraClip in HSR patients. This means no RCT has been conducted to support the effectiveness of the MitraClip in patients with degenerative MR at prohibitive surgical risk. This modest requirement on clinical effectiveness evidence to grant approval for the MitraClip in 2013 contrasts with that for transcatheter aortic valve replacement (TAVR) in 2011, which relied on the PARTNER RCT comparing TAVR with standard therapy in inoperable patients.23 Following the EVEREST II clinical programme, which has contributed to adoption of TEER in primary/degenerative MR, the success story of adoption of the MitraClip system has continued, this time in patients with secondary/functional MR. For this population, evidence supportive of clinical effectiveness has mainly come from the COAPT RCT21 published in 2018 with 2 years of follow-up. As for the EVEREST II HSR Study, we have conducted similar work on COAPT that equally emphasises the importance of updating cost-effectiveness estimates using more mature survival data.24

Our study has some limitations. The main one is that we have focused mainly on survival analysis and did not consider the uncertainty associated with the resource costs used in the three cost-effectiveness studies. However, in the present case, we assume that the main factor driving the cost-effectiveness is the incremental survival benefit, which must be sufficiently high to offset the substantial incremental cost generated from the use of an expensive health technology compared with standard of care. Another limitation is that we used reconstructed IPD rather than the patient data from the EVEREST II HSR. However, due to the high quality of the plots available from Kar et al,7 and the use of robust methods, we believe our Kaplan-Meier survival plots are very similar to those published. Last, as previously emphasised, our analyses on the control group rely on a cohort of 36 patients with no follow-up beyond 12 months.

Our work has important implications: it suggests that the availability of updated survival analysis is likely to have some impact on decision-making as part of HTA, which argues for a more continuous approach to assessing health technologies as mature clinical data are released. Such reassessment could be conducted by companies/manufacturers upon request by HTA bodies before the renewal of reimbursement.

This post was originally published on https://bmjopen.bmj.com