Availability of evidence and comparative effectiveness for surgical versus drug interventions: an overview of systematic reviews and meta-analyses


Many diseases are treated or managed with surgery. Some of them may also be addressed by pharmaceutical interventions and studying the effectiveness of these different interventions is important in optimising shared decision-making for patients and physicians. However, the amount and certainty of the evidence we hold in healthcare is limited,1 and this situation is likely worse for surgical interventions due to serious challenges in running placebo-controlled or comparative effectiveness trials.2 Challenges to controlled trials include unique patient anatomy, operator-dependent variables such as the skill or experience of the surgeon,3–5 and the difficulty of successful blinding.6 Due to these challenges, randomised controlled trials (RCTs) in surgery are less common than in non-surgical medical specialties. Although there have been calls to strengthen the quality of the evidence in surgery,2 7 8 these have resulted in relatively few RCTs assessing surgical interventions, particularly in comparison to medical treatments.

No summary of the existing body of evidence on surgical versus medical interventions, mapping its gaps across diseases, exists in the literature. A synthesis of this evidence is important to guide evidence-based care and inform decisions in the clinic where surgery and medical management are both reasonable options. We hypothesised that there may be a dearth of randomised evidence comparing surgery versus drugs and that even in topics where such RCTs exist, the evidence provided by them might be weak. To find RCTs comparing surgical versus pharmaceutical interventions, we conducted an umbrella review (an overview of systematic reviews)9 10 by searching the Cochrane Database of Systematic Reviews for reviews considering comparisons of surgery to drugs. We aimed to examine the prevalence of intended comparisons of surgery to drug regimens, how often such comparisons had any RCTs, and, whenever RCTs were available, what the strength of evidence of such comparisons was and whether surgery or the drug intervention was favoured.

Materials and methods

This systematic review of systematic reviews (umbrella review) was structured based on the guidance provided by Belbasis et al10 (for more information on reviews of reviews, see also Cochrane Handbook Chapter V: Overviews of Reviews11). For reporting, we adapted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines12 and the checklists are found as supplements. The protocol for the data collection and analysis was preregistered on the Open Science Framework website,13 together with the raw data and code.

Search strategy and selection criteria

We queried the Cochrane Database of Systematic Reviews using the term “surg*” in “Title/Abstract/Keywords” (“surg*(ti;ab;kw)”) on 25 April 2022. Reviews were eligible if they searched for RCTs comparing a surgical intervention with a drug intervention.

A surgical intervention was defined as a procedural technique aiming to change anatomy to treat or alleviate a pathology or symptom (including dermatological excisions). We excluded endoscopic and endovascular procedures since many of them are performed by medical rather than surgical specialists. A drug intervention was defined as a treatment that used a non-supplement and non-vitamin, pharmaceutical agent. Dental procedures, radiation treatment and comparisons of surgery versus no treatment or only placebo were excluded from our study. Cochrane reviews that intended to compare surgical and pharmaceutical interventions were considered even in cases where the review was unsuccessful in finding any such comparisons.

As many surgical procedures also require drug regimens (eg, preoperatively or as background treatment), we also allowed comparisons in which a surgical arm that included a drug intervention was compared with a drug intervention arm. Comparisons of surgery to surgery plus drugs were not eligible, as both arms used surgery.

The articles’ abstracts were reviewed by EAZ and JV who coded the reviews independently for eligibility (include, exclude and unsure) first and then sought to reach a consensus among the reviews coded as unsure by either reviewer. If either reviewer included the review, it was included directly. The remaining differences were mediated by JPI, and a final check of all included studies was performed by JPI, EAZ and JV.

Main outcomes

The main outcome assessed was the percentage of Cochrane systematic reviews that found eligible RCTs comparing head-to-head surgical and pharmacological interventions among all the reviews aiming to look for such studies. The strength of evidence of the existing comparisons was also treated as a main outcome, as was the direction of effects in the review assessments, both in the original Cochrane analysis and in our standardised reanalysis.

Data extraction

EAZ extracted data for the included systematic reviews. The included systematic reviews were further classified into their corresponding surgical specialty field: cardiac surgery, dermatology, general surgery, neurosurgery, obstetrics and gynaecology, ophthalmology, orthopaedic surgery, otolaryngology, plastic surgery, thoracic surgery, urology and vascular surgery.

Whenever data were available from at least one RCT comparing a surgical to a drug arm, we identified the primary outcome(s) of the systematic review for the eligible comparison(s) by examining the methods section of the systematic review, and classified it as either mortality, composite or non-mortality. Data, in the form of contingency tables or means, SD and number of participants in each arm, from individual RCTs were then collected from eligible Cochrane reviews. We also collected available Grading of Recommendations, Assessment, Development, and Evaluations assessments (GRADE)14 for the eligible comparisons and outcomes and the summary effect size as well as the 95% CI of the effect for the eligible comparison outcomes. Reviews that found no RCT of drugs to surgery were tabulated as having no data.


As Cochrane reviewers may have used different statistical models in each topic to combine the results of RCTs in meta-analyses, we aimed for standardisation. To achieve it, we recalculated the summary effect size and heterogeneity for each topic using a random effects model following the Hartung-Knapp-Sidik-Jonkman approach15 16 so that all outcomes/topics would be analysed with the same statistical methods. The modified Haldane-Anscombe continuity correction was used, that is, when studies had no event in either the surgical or the drug arm we added 0.5 to the entire contingency table of the specific study.17
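The standardised reanalysis described above can be illustrated with a minimal sketch. It is not the authors' code (their analysis was run in R), and it rests on several assumptions that the text does not state: the effect measure is taken to be the log odds ratio, the between-study variance is estimated with the DerSimonian-Laird method, and the 0.5 correction is applied whenever any cell of a study's table is zero. The function names are hypothetical. The Hartung-Knapp-Sidik-Jonkman step is the variance formula at the end; the CI would then use a t distribution with k-1 degrees of freedom, which is left to the caller.

```python
import math


def corrected_log_odds_ratio(a, b, c, d):
    """Return (log odds ratio, variance) for one trial's 2x2 table.

    a, b: events and non-events in the surgical arm
    c, d: events and non-events in the drug arm
    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def hksj_random_effects(effects, variances):
    """Pool study effects with a random effects model and the HKSJ variance.

    Returns (pooled effect, HKSJ standard error, tau squared, degrees of freedom).
    Assumption: tau squared comes from the DerSimonian-Laird estimator.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed effect weights and Cochran's Q, used only to estimate tau squared.
    w_fixed = [1 / v for v in variances]
    sum_w = sum(w_fixed)
    mu_fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum_w
    q = sum(w * (y - mu_fixed) ** 2 for w, y in zip(w_fixed, effects))
    c = sum_w - sum(w * w for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights and pooled estimate.
    w = [1 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w, effects)) / sum(w)

    # HKSJ variance: weighted residual variance scaled by (k - 1).
    var_hk = sum(wi * (y - mu) ** 2 for wi, y in zip(w, effects)) / ((k - 1) * sum(w))
    return mu, math.sqrt(var_hk), tau2, k - 1
```

A 95% CI would be the pooled effect plus or minus the standard error multiplied by the appropriate t quantile for the returned degrees of freedom.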

The analysis of the data was performed using R V.4.1.3 (10 March 2022),18 with the assessment of statistical significance using a threshold for α of 0.005, as previously proposed.19 The Wilson approach was used for CIs (99.5%) created for the primary outcomes.
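For readers who want to see how a Wilson interval at the 99.5% level is obtained, the following is a minimal sketch in Python rather than the R code used for the analysis. The function name is hypothetical, and the sketch uses the plain Wilson score interval without continuity correction, which is an assumption because the text does not say which variant was applied.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.995):
    """Return (lower, upper) bounds of the Wilson score interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width
```

Calling `wilson_interval(41, 188)`, for example, gives the interval for 41 reviews with data out of 188 eligible reviews. Small differences from published bounds can arise from rounding or from the variant of the method used.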

Additions to the protocol

The original preregistered protocol can be found at www.doi.org/10.17605/OSF.IO/3QVW9.

Some additions were made during the process of conducting this umbrella review. For each review, we noted the search date to understand how old the reviews may be. We assessed inter-rater reliability using Cohen’s κ. We also probed for hints of bias by using the test of excess significance for each topic with two or more RCTs (and for the composite of observed and expected statistically significant results across all topics),20 and Egger’s regression for small-study effects for meta-analyses with three or more RCTs.21
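Cohen’s κ compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer's marginal frequencies. The sketch below shows the calculation on made-up eligibility codes. The function name and the example labels are illustrative assumptions, not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equally long sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Observed proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four abstracts.
codes_a = ["include", "include", "exclude", "exclude"]
codes_b = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(codes_a, codes_b)
```

Because κ adjusts for chance agreement, a high raw agreement on a dominant category (such as exclusion) can coexist with a modest κ.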

For each RCT in the included reviews, we extracted their year of publication to capture how recent the evidence was. Then, we extracted the specialty orientation of the journal, in which the RCT was published, using the categories ‘mostly surgical’, ‘general’ and ‘mostly non-surgical’. The category ‘mostly surgical’ includes those journals that have ‘surgery’ in their title, those that have the name of a surgical specialty in their title and those affiliated with a surgical society. The category ‘general’ pertains to journals that cover all of medicine and its specialties, surgical and non-surgical. The category ‘mostly non-surgical’ includes all the remaining journals. We assessed whether the direction of effects (favouring surgery or favouring drug) was associated with the type of journal, hypothesising that RCTs published in mostly surgical journals may be more likely than other journals to favour surgery. We also examined whether the eligible RCTs that were included in the systematic reviews might have any overlap between different reviews. Finally, we extracted information on risk of bias assessments of the eligible RCTs, as these assessments had been performed in the Cochrane systematic reviews that had included the RCTs.

Patient and public involvement

No patients were involved in the design and conduct of this umbrella review.


Search results

The selection flow chart for Cochrane systematic reviews is represented in figure 1. The search strategy retrieved 2495 articles from the Cochrane Database of Systematic Reviews. Among them, 440 were excluded by an automated filter that removed withdrawn reviews and reviews with no mention of the word surgery or any of its variations in the abstract. Further manual assessment of titles and abstracts in duplicate resulted in 223 Cochrane reviews being potentially eligible. The inter-rater reliability was fair with a κ of 0.36 and 90% agreement on exclusion. All reviewer differences were in the articles classified as ‘unsure’ by either reviewer.

Figure 1

PRISMA study selection flow chart. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses. *filtered for [surg*] in the abstract and removed withdrawn publications

On full-text evaluation, 35 were excluded: in 5 reviews, the surgical and drug treatments were not in separate arms and hence they were not an eligible head-to-head comparison22–26; in 7 reviews, there was no surgical intervention arm27–33; in 17 reviews, there was no drug intervention34–39 39–49; 2 reviews were excluded for evaluating an endoscopic intervention50 51; 3 reviews were excluded for evaluating an endovascular intervention52–54; and finally, 1 review was excluded for being an umbrella review.55

Therefore, 188 Cochrane reviews were found to meet the inclusion criteria (online supplemental file 1). Of those, 147 Cochrane reviews aimed to investigate surgical versus drug interventions but were unable to find any RCTs meeting their selection criteria. The remaining 41 reviews contained data for at least one RCT in at least one head-to-head comparison of a surgical versus a drug intervention arm (22% (99.5% CI 14% to 31%)).

The 188 reviews covered all major surgical specialties (online supplemental table 1), with the most commonly represented specialties being general surgery (n=35), obstetrics and gynaecology (n=31), ophthalmology (n=25), orthopaedic surgery (n=23) and otolaryngology (n=23). When examining whether any specialty had compared surgery to drugs more than others, no significant difference was found (Fisher’s exact p=0.62).

Eligible RCTs for surgery versus drug comparisons

The 41 eligible reviews with data included 103 comparisons of surgery versus drug treatments with data on various primary outcomes (table 1), and they included data from a total of 165 RCTs with a total of 295 primary outcome assessments. For the 165 trials, the median publication year was 2005 and the IQR was 1994–2016. The median search date year of the eligible reviews was 2016 (IQR 2010–2022). Nineteen of the 165 trials were part of two different Cochrane reviews. Fourteen of these 19 trials also overlapped in terms of addressing the same outcome and treatment arms. The overlapping studies comprised >50% of the included RCTs in 2 of 103 meta-analyses.

Table 1

Eligible comparisons of surgical versus medical interventions

Risk of bias in eligible RCTs

Risk of bias assessments of the 165 eligible RCTs by the authors of the original Cochrane systematic reviews did not always include the same elements. Specifically, for the generation of the randomisation sequence, information had been extracted in 141 trials and of those 6 (4%) were deemed to be at high risk of bias, 42 (30%) were unclear and 93 (66%) were at low risk of bias. The respective numbers were 9 (6%) high risk, 63 (39%) unclear and 89 (55%) low risk among 161 RCTs extracted for risk of allocation bias; 101 (73%) high risk, 29 (21%) unclear and 9 (6%) low risk among 139 RCTs extracted for performance bias; 47 (34%) high risk, 71 (51%) unclear and 21 (15%) low risk among 139 RCTs extracted for detection bias; 20 (16%) high risk, 15 (12%) unclear and 90 (72%) low risk among 125 RCTs extracted for attrition bias; 17 (12%) high risk, 56 (41%) unclear and 64 (47%) low risk among 137 RCTs extracted for reporting bias, and 17 (13%) high risk, 29 (23%) unclear and 80 (64%) low risk among 126 extracted for other risk of bias.

Comparative effectiveness of surgery versus drugs

Based on the 95% CI of the summary estimate obtained by the Cochrane review authors, surgery was more effective in 36 of the 103 outcomes of various comparisons (35% (99.5% CI 23% to 49%)), and drugs were more effective in 15 (15% (99.5% CI 6% to 26%)). Fifty-two (50% (99.5% CI 37% to 64%)) outcomes were inconclusive. The respective numbers were 1/12 (8%), 1/12 (8%) and 10/12 (83%) for mortality outcomes; 3/11 (27%), 3/11 (27%) and 5/11 (46%) for composite outcomes; and 32/80 (40%), 11/80 (14%) and 37/80 (46%) for non-mortality outcomes.

When we standardised the meta-analyses to use the same random effects method for all analyses, surgery was favoured in 28/103 outcomes (27%), drugs were favoured in 9/103 (9%) outcomes and 66/103 (64%) outcomes were inconclusive. The respective numbers were 1/12 (8%), 0/12 (0%) and 11/12 (92%) for mortality outcomes; 3/11 (27%), 2/11 (18%) and 6/11 (55%) for composite outcomes; and 24/80 (30%), 7/80 (9%) and 49/80 (61%) for non-mortality outcomes.

Table 2 shows the topics for which the surgical intervention was found to be more effective and table 3 shows those where the drug arm was found to be more effective, all according to the Cochrane authors’ analysis. Online supplemental table 2 does the same for the topics for which the comparisons were inconclusive.

Table 2

Comparisons where the surgical treatment was superior to the drug treatment

Table 3

Comparisons where the drug treatment was superior to the surgical treatment

Tests of bias and heterogeneity

Of the 103 comparisons, only 31 had ≥3 studies to be able to run an Egger regression for small-study effects and only 5 had at least 10 studies to allow a meaningful application of this regression test. Three of the five comparisons with 10 or more studies had a small-study effects signal suggestive of potential publication bias (p<0.05); all three compared surgical to pharmacological methods of abortion. The test of excess significance applied to all outcomes with ≥2 studies gave signals of potential bias in 16/53 outcomes (245 individual study outcomes) and across all outcomes the expected number of statistically significant results was 74 vs an observed 84 across 245 study outcomes (p=0.27). Among the 50 topics with 2 or more studies, the median of I² was 43% (IQR 0%–80%).
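The idea behind the Egger regression can be sketched briefly: each study's standardised effect (effect divided by its SE) is regressed on its precision (1 divided by SE), and an intercept far from zero suggests that smaller studies report systematically different effects. The code below is an illustrative ordinary least squares implementation with a hypothetical function name, not the authors' R code. It returns the intercept and its SE but does not compute the p value, which would require a t distribution with k-2 degrees of freedom.

```python
import math


def egger_intercept(effects, standard_errors):
    """Return (intercept, SE of intercept, slope) of Egger's regression."""
    k = len(effects)
    if k < 3 or k != len(standard_errors):
        raise ValueError("need at least three studies with matching SEs")

    # Predictor: precision. Response: standardised effect.
    x = [1 / se for se in standard_errors]
    z = [y / se for y, se in zip(effects, standard_errors)]

    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")

    # Ordinary least squares fit of z on x.
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar

    # Standard error of the intercept from the residual variance.
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    residual_variance = rss / (k - 2)
    se_intercept = math.sqrt(residual_variance * (1 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept, slope
```

With very few studies the intercept is estimated imprecisely, which is why the test is usually considered meaningful only with about 10 or more studies.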

Strength of evidence according to GRADE

GRADE assessment of the strength of the evidence showed high rating for 4 outcomes (4%), moderate for 22 (21%), low for 27 (26%) and very low for 33 (32%). No GRADE assessment was performed for 17 (17%) outcomes.

According to GRADE assessments, only cardiac surgery, obstetrics and gynaecology and general surgery interventions had high GRADE ratings. Otolaryngology and dermatology had many moderate ratings. Almost all other GRADE ratings were low or very low (table 4).

Table 4

GRADE assessment across specialties

Of the four outcomes with high GRADE rating, sphincterotomy for anal fissure showed superiority over medical treatment while the other three comparisons were inconclusive. Of the 22 outcomes with moderate GRADE rating, 6 (27%) were inconclusive, 14 (64%) were in favour of surgery and 2 (9%) were in favour of the drug regimen according to the calculations of the Cochrane authors (14 (64%) were inconclusive, 7 (32%) favoured the surgical arm and 1 (5%) was in favour of the drug regimen according to our standard random-effects calculations).

Results of RCTs according to journal of publication

Of the 165 eligible RCTs (295 outcome assessments), 73 RCTs (133 assessments) were published in mostly surgical journals, 38 RCTs (69 assessments) in general journals and 54 RCTs (93 assessments) in mostly non-surgical journals. Based on 95% CIs for the assessments of RCTs published in mostly surgical journals, 40/133 (30%) were in favour of surgery, 14/133 (11%) were in favour of drugs and 79/133 (59%) were inconclusive. The respective numbers for the assessments of RCTs published in general journals were 27/69 (39%), 5/69 (7%) and 37/69 (54%); and for the assessments of RCTs published in mostly non-surgical journals they were 22/93 (24%), 15/93 (16%) and 56/93 (60%), respectively. The proportion of RCTs favouring surgery was not significantly higher in mostly surgical journals (30%) compared with other journals (39% and 24% for general and non-surgical journals, respectively) (p=0.18 by Fisher’s exact test).


Main findings

In a subset of Cochrane reviews that aimed to compare surgery to drugs, we found that only about one in five systematic reviews that had shown interest in such comparisons eventually found data from any RCTs comparing the two modes of intervention. Furthermore, the majority of the comparisons for which RCTs of surgery versus drugs existed had inconclusive results, had few studies per meta-analytical outcome (30% with 3 or more studies) and had low or very low strength of evidence on GRADE assessments, and many trials had high risk of performance and detection bias.

Anal fissure was the only disease in our sample that had high GRADE evidence and a direction of effect indicating that one intervention (sphincterotomy) was more effective. Consequently, in the vast majority of cases where surgical and pharmaceutical interventions are available for treatment, an evidence-based decision in the clinic is difficult. Our secondary post hoc analysis of the type of journal where the eligible RCTs were published showed that results published in surgical journals were not necessarily more prone to favour the surgical arm of an RCT over the pharmaceutical arm.


This study covers the entire Cochrane database, which is considered a high-quality comprehensive collection of systematic reviews. Cochrane reviews tend to address questions typically asked in routine clinical practice and underpin many clinical guideline recommendations, making this sample all the more relevant to everyday practice.56 Another strength of this study is that all surgical specialties were included. This is, therefore, to our knowledge the first project aiming to assess the extent of comparative evidence for surgery versus pharmacotherapy for a diverse spectrum of diseases.


Our analysis has several limitations. First, our predefined inclusion criteria excluded non-pharmacological medical interventions. Several comparisons may be found in the literature where surgery is compared against non-surgical non-pharmacological medical interventions, such as continuous positive airway pressure (CPAP) or radiotherapy. We also excluded endovascular and endoscopic procedures since they may be performed by surgical and medical specialists. These eligibility choices aimed to achieve some homogeneity in a project that is by definition already very heterogeneous. The use of an algorithm to filter out papers with no mention of the word surgery, as well as the search strategy itself, may have led us to miss reviews that discuss a particular surgical procedure by the name of the intervention alone, without ever explicitly mentioning the word surgery.

Second, we focused exclusively on RCTs, but other types of evidence, for example, non-RCTs, or uncontrolled clinical trials may also exist and sometimes their results may be compelling enough to deem a randomised study unnecessary. Such unquestionable superiority in the absence of randomised evidence is however unlikely.57 Efforts such as IDEAL8 have laid out much of the groundwork for performing RCTs in surgical research, yet a dearth of RCTs in the surgical realm of research persists to this day.

Third, only one database (Cochrane Database of Systematic Reviews) was used for this study, and we did not examine non-Cochrane meta-analyses published as journal articles. While the database aims to be all inclusive, there are still some topics in medical and surgical care that have not been covered by Cochrane reviews.

However, the Cochrane database is more meticulous in describing its methods and it will routinely publish systematic reviews that have found no eligible articles, while this is unlikely in systematic reviews published in traditional journals. Therefore, including systematic reviews from journals may have distorted the picture and also caused a problem of overlapping systematic reviews. Moreover, we did not assess the methodological rigour or reporting quality of the Cochrane systematic reviews,58 as this was not the focus of our study. Cochrane systematic reviews score very highly in standard tools like the assessing the methodological quality of systematic reviews tool (AMSTAR),59 both because they are very meticulous and also because AMSTAR and AMSTAR-2 were developed with inspiration from the Cochrane Handbook.

Fourth, it is possible that within the same disease, subgroups of patients may be eligible only for medical or only for surgical treatment, or that one or the other approach is much better only for specific subgroups. With the dearth of evidence we found for the overall analysis, identification of such subgroup effects would be unlikely and error-prone.

Context of these findings

Sequestration between different disciplines and specialties60 may lead to isolation of specialists who use different tools, and this may lead to a lack of comparisons of the treatments that each specialty uses. Each specialty may have its own community, journals, meetings and research agenda, limiting communication between different specialists even though they may be dealing with the same disease from different angles and with different therapeutic sets. This lack of communication may also be due to differences in mentorship and the trend of subspecialisation in medical training separating clinicians and their practices even further,61 or to differing incentive structures.

Prior literature comparing surgical and medical interventions has assessed specific treatments, such as that for basal cell carcinoma,60 and demonstrated that sequestration was prominent. Despite a large number of trials, almost all of them compared medical interventions among themselves, or surgical interventions among themselves, rather than comparing between these two groups of treatment even though both groups of treatment could have been used. Our work shows that this issue of sequestration is widespread in surgical versus pharmaceutical interventions, and that even where comparisons exist, the trials are too few and often biased.


This study suggests that comparisons of pharmaceutical and surgical interventions are infrequent. The available comparisons have very few included studies, which makes heterogeneity and bias hard to quantify and may yield spurious results with the normality assumptions underpinning common frequentist meta-analytical approaches.62 That is, even for the comparisons that have been retrieved, the evidence is not sufficient.

Even accepting the difficulties in performing RCTs involving surgical interventions, our results still indicate a need for more comparative effectiveness research and for improved communication between surgical and medical specialties to bridge this gap in evidence. There are, of course, barriers to this. Head-to-head comparisons of treatments are often disfavoured by manufacturers leery of jeopardising their product against that of a competitor,63 64 and incentives unfortunately exist for both surgical and medical practitioners to promote treatments they are able to offer. Moving forward, both medical and surgical professional societies should collaborate to design fair and unbiased trials, and funders should also keep such research on their radars to try and overcome these structural obstacles.

Future research

Future clinical research should try to expand the scope, volume and methodological rigour of comparative evidence on surgical versus medical interventions. This work should involve both surgical and medical specialists and should also incorporate patient preferences. Long-term patient-centred outcomes, including both benefits and harms, should become available to put surgical and medical practices into proper perspective.

This post was originally published on https://bmjopen.bmj.com