Critical appraisal and assessment of bias among studies evaluating risk prediction models for in-hospital and 30-day mortality after percutaneous coronary intervention: a systematic review

Model performance and risk of bias need improvement

All models constructed in the studies included in this review demonstrated good predictive performance (AUC >0.75) and effectively identified patients at high risk of short-term mortality after PCI. Among them, the model by Gao et al showed the best predictive performance after external validation.20 However, most studies were assessed as having a high risk of bias, with only one study having an unclear overall risk of bias.

In terms of study design, four retrospective cohort studies were assessed as having a high risk of bias because potential data omissions, input errors and differences in evaluation methods could compromise data quality and integrity. Model construction should therefore prioritise prospective data to improve data quality and reduce bias and error.35 Two studies that enrolled high-risk patients with high mortality rates were assessed as having a high risk of bias because their models might rely excessively on these patients’ specific high-risk characteristics and neglect predictors that matter for the broader patient population, leading to overfitting and overestimation of mortality risk in general patients. Study samples should therefore be broadly representative, including patients across risk levels, to enhance model generalisability and clinical applicability. Three studies using multicentre data were assessed as having a high risk of bias because differing data collection standards across centres could introduce substantial data bias. Before conducting multicentre studies, researchers should establish clear protocols and standardise data collection and statistical procedures to reduce the risk of bias and enhance the scientific value of the research.36

In model construction, five studies were assessed as having a high risk of bias due to an events per variable (EPV) ratio of less than 10. A low EPV can lead to overestimation or underestimation of predictor effects and increases statistical instability; to improve accuracy and stability, an EPV of at least 20 is generally recommended.37 Eighteen studies were assessed as having a high risk of bias for categorising continuous variables. This simplification can discard crucial information and distort relationships between variables, reducing model accuracy; unnecessary discretisation of continuous variables should therefore be avoided. Where categorisation is necessary, thresholds should be chosen on clinical and statistical grounds, and multicategory or ordinal classifications should be considered to retain more information.

Two studies introduced bias by excluding certain participants, which could alter the distribution of key variables and create differences between the remaining sample and the overall patient population. Participants should not be excluded without sufficient justification; where exclusion is necessary, the potential impact of the resulting bias on study results should be thoroughly assessed and reported. Eleven studies were assessed as having a high risk of bias due to improper handling of missing data, which can cause information loss or introduce misleading values that distort predictions. Advanced techniques such as multiple imputation should be used rather than simple single-value replacement.38 Seventeen studies were assessed as having a high risk of bias for selecting predictors solely on the basis of univariate analysis. This approach ignores interactions between variables and the complex relationships between predictors and outcomes, such as non-linear relationships and threshold effects, leading to poor model performance in practical applications. Multivariate analysis techniques should be used so that the model is built on the joint, multidimensional effects of the candidate variables.

Five studies did not explicitly address non-cardiac death as a competing risk, resulting in a high risk of bias. This omission can lead to non-cardiac deaths being mistakenly attributed to cardiac causes, obscuring the true impact of cardiac-related risk factors and yielding inaccurate estimates of cardiac death risk.39 Accounting for all relevant competing risks during model development is crucial. Four studies were assessed as having a high risk of bias for not recalibrating predictor coefficients after variable selection in multivariate regression analysis. This can overlook interactions between variables, preventing the model from accurately reflecting the true effects of the predictors and reducing predictive ability; coefficients should be re-estimated in the final multivariate model to ensure accuracy and stability and reduce the risk of bias.
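To make two of these recommendations concrete, the following is a minimal sketch in Python (the cohort, feature layout and event rate are synthetic, and scikit-learn ≥0.21 is assumed; it is illustrative, not drawn from any reviewed study) showing an EPV check before model fitting and multiple imputation in place of single-value replacement:

```python
# Illustrative only: synthetic data, hypothetical variable layout.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical cohort: 1000 patients, 8 candidate predictors, ~5% mortality,
# with ~10% of predictor values missing at random.
X = rng.normal(size=(1000, 8))
y = (rng.random(1000) < 0.05).astype(int)
X[rng.random(X.shape) < 0.10] = np.nan

# EPV = number of outcome events / number of candidate predictors.
epv = y.sum() / X.shape[1]
print(f"EPV = {epv:.1f}")  # well below 20: trim predictors or enlarge the sample

# Multiple imputation: draw several completed datasets from the posterior and
# average the fitted coefficients (the point-estimate step of Rubin's rules).
coefs = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    X_imputed = imputer.fit_transform(X)
    coefs.append(LogisticRegression(max_iter=1000).fit(X_imputed, y).coef_[0])
print("pooled coefficients:", np.mean(coefs, axis=0).round(3))
```

A full application of Rubin’s rules would also pool the variance estimates across imputations; the sketch shows only the pooling of point estimates.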

In terms of model validation and presentation, three studies were assessed as having a high risk of bias for not reporting calibration plots or tables. This omission prevents assessment of calibration across different risk levels and may allow calibration problems to go unnoticed.40 Calibration plots or tables provide visual information that helps identify miscalibration within specific probability ranges, detect trends and systematic errors, increase transparency, and support comprehensive model evaluation and improvement. It is therefore recommended to provide calibration plots or tables in addition to the Hosmer-Lemeshow test when assessing calibration.

Ten studies were assessed as having a high risk of bias due to using a single random data split for internal validation. Random splitting can produce imbalance between the training and validation sets, or data leakage, undermining model stability and generalisability. To improve reliability and reduce the risk of bias, more robust internal validation methods are recommended, such as cross-validation (eg, k-fold cross-validation) or leave-one-out validation, combined with external validation covering diverse populations and clinical settings to reflect the model’s feasibility in actual clinical practice.41 Ensuring sample independence between the construction and validation of different models is also crucial, both to enhance the applicability of models across patient groups and to avoid bias from duplicate samples.

Two studies were assessed as having a high risk of bias due to inconsistencies between reported predictors and their regression coefficients, reflecting potential bias in the variable selection process, possibly from overfitting or improper statistical handling. Such inconsistencies reduce model interpretability and make the results difficult to understand and apply. To avoid this risk of bias, data quality should be assured, and the variable selection process and coefficient calculations should be reported transparently so that the analysis is reproducible by other researchers. Models should also be regularly updated and refined as new data and methods become available.
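As a sketch of the validation approach recommended here (synthetic data; scikit-learn assumed; not taken from any reviewed study), the following Python fragment computes a cross-validated AUC from out-of-fold predictions and tabulates observed against predicted risk by decile, the information a calibration plot or table conveys beyond a single Hosmer-Lemeshow p value:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic cohort with roughly a 5% event rate.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95],
                           random_state=0)

model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predicted probabilities: every patient is scored by a model
# that never saw them, avoiding the optimism of a single random split.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
print(f"cross-validated AUC = {roc_auc_score(y, proba):.3f}")

# Calibration table: observed versus mean predicted risk per risk decile.
observed, predicted = calibration_curve(y, proba, n_bins=10, strategy="quantile")
for o, p in zip(observed, predicted):
    print(f"predicted {p:.3f}  observed {o:.3f}")
```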

Predictors warrant further discussion

The predictors included in the reviewed models vary with data source availability and the timing of model development. Key predictors include age, cardiogenic shock, low ejection fraction, high-risk myocardial infarction sites and ST-segment elevation on ECG. Together, these indicators signal a preoperative state of reduced tolerance and significant cardiovascular impairment, which increases the risk of procedural complications and raises short-term postoperative mortality.42 This aligns with the findings of Komorova et al.43 These predictors therefore underline the need for clinicians to rigorously assess cardiovascular status perioperatively, adhere to the latest clinical guidelines and best practices, and strictly manage procedural indications. Other studies indicate that mechanical complications of myocardial infarction, such as ventricular septal defects, carry high mortality that is unaffected by reperfusion therapy.44 Future research should evaluate their impact on the accuracy of postoperative mortality risk prediction. Furthermore, because some predictors in 21 studies can only be assessed during or after the procedure, they cannot inform preoperative planning. To better meet diverse risk prediction needs, future research should focus on developing precise models tailored to different clinical stages, as exemplified by the series of models developed by Peterson and colleagues.13 19 26 27

Prospects for model construction

In this review, only three of the included models were based on machine learning algorithms. Doll et al employed machine learning for variable reduction.12 Al’Aref et al compared the predictive performance of several machine learning algorithms with traditional logistic regression,14 finding that AdaBoost and XGBoost outperformed logistic regression. By contrast, the decision tree model constructed by Negassa et al showed only moderate predictive performance relative to the logistic regression models, owing to the inclusion of fewer predictors.34 In recent years, machine learning algorithms have been widely applied and developed for data processing, model construction and performance evaluation, and relevant studies show that they can effectively handle high-dimensional, non-linear and complex data, improving model accuracy and generalisation.45 Although the predictive performance of machine learning algorithms in existing studies is generally moderate, future research should address challenges such as algorithmic bias and model interpretability to fully exploit their potential in constructing clinical risk prediction models.
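The kind of head-to-head comparison reported by Al’Aref et al can be sketched as follows (synthetic data; scikit-learn’s GradientBoostingClassifier stands in for XGBoost so the example needs no third-party dependency; nothing here reproduces the studies’ results):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, class-imbalanced cohort (~5% event rate).
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           weights=[0.95], random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "gradient boosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
}

# Cross-validated AUC puts all three algorithms on the same footing.
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} ± {auc.std():.3f}")
```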

Ten studies included in this review constructed models from datasets of high-risk patients, such as those with STEMI or cardiogenic shock, in whom postoperative mortality is markedly increased. While this enhances predictive accuracy for those specific populations, it compromises generalisability in wider clinical practice. Clinicians should integrate both population-specific and generalisable models to provide comprehensive prediction and decision support.46 Before application, the scope and limitations of a model must be carefully considered to avoid indiscriminate generalisation to patient populations that differ from the development dataset. Additionally, dynamic risk prediction models could be developed by incorporating changes in patient electronic health records in real time or periodically, enabling early identification of high-risk patients and continuous, comprehensive risk monitoring across the entire care continuum.47 This would give clinical care teams timely information to improve patient outcomes and reduce adverse events.48
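A dynamic model of this kind might, speculatively, be built on incremental learning, refreshing the model as new EHR-derived batches arrive. Everything below, from the batch source to the feature layout, is hypothetical, and scikit-learn ≥1.1 is assumed for the "log_loss" option:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Incremental logistic regression; classes must be declared up front.
model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.array([0, 1])

def next_batch(n=200, n_features=10):
    """Hypothetical stand-in for a periodic pull of new records and outcomes."""
    X = rng.normal(size=(n, n_features))
    y = (rng.random(n) < 0.05).astype(int)
    return X, y

# Each cycle folds the newest data into the model without full retraining,
# so risk estimates can track changes in the patient population.
for cycle in range(5):
    X, y = next_batch()
    model.partial_fit(X, y, classes=classes)
    print(f"cycle {cycle}: mean predicted risk = "
          f"{model.predict_proba(X)[:, 1].mean():.3f}")
```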

Limitations

This review has certain limitations. First, only Chinese- and English-language literature was included, potentially omitting studies in other languages and introducing language bias. Second, because the modelling studies differed in sample characteristics, variable selection and construction algorithms, only qualitative analysis was conducted and meta-analysis was not feasible. Third, the included machine learning models involve complex algorithms and data processing procedures whose risks of bias may not be fully captured by PROBAST. Fourth, sample overlap across seven studies could overemphasise characteristics specific to the shared datasets, limiting the models’ applicability and introducing bias into the findings.
