Traumatic brain injury (TBI) is a significant health burden worldwide.1 It is the leading cause of mortality and disability among young individuals.2 Patients with TBI are vulnerable to hypoxia and hypotension in the early period of their course, and these insults are associated with poor outcomes.3 4 Prehospital assessment and management of patients with TBI are important,5 as early prediction of TBI and correction of hypoxia and hypotension during the prehospital stage could be beneficial.3 However, the identification of TBI can often be challenging in the prehospital setting.5 Vulnerable patients, including the elderly and patients who take medications such as antiplatelet or anticoagulant drugs, often sustain TBI from low-energy insults.6 Prehospital clinical signs are also reported to have poor sensitivity for raised intracranial pressure following TBI.7
Several prediction models to identify patients with TBI have been reported.8–12 However, most incorporate information that is available only in hospital, such as laboratory results or imaging findings.8 9 13 In addition, most previous prediction models focused on the outcomes of patients with TBI,14–16 not the identification of TBI. A previous study identified predictors for older adult patients with TBI who required transport to a trauma centre; however, these predictors were consensus based and therefore lack supporting clinical data.17 Accurate prehospital prediction of TBI and its severity could prevent delays to definitive care for patients with TBI. Most emergency medical service (EMS) providers collect various information, including demographics, previous medical history, circumstances of the trauma and clinical signs including vital signs; but these variables have not been evaluated together as predictors of TBI and its severity. Using a variety of prehospital information and adopting newly emerging machine learning algorithms for predicting the diagnosis, disposition and outcome of TBI might improve the accuracy of identification of TBI and its severity.
The aim of this study was to develop and test prediction models for the diagnosis and prognosis of TBI using prehospital information and machine learning algorithms among patients with severe trauma. We hypothesised that incorporating prehospital information could achieve acceptable performance in predicting TBI, and machine learning algorithms could contribute to performance improvement.
Materials and methods
Study design and settings
This was a multicentre retrospective study conducted at three tertiary academic emergency departments (EDs) located in an urban area (Seoul and Bundang) of South Korea. These EDs received 50 000–90 000 visits annually and are not designated trauma centres. We adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement on reporting predictive models.18
The EMS system in South Korea is operated by the National Fire Agency. The EMS level is considered intermediate, as EMS providers can perform bleeding control, spinal motion restriction, immobilisation and splinting, and advanced airway management, and can administer fluid intravenously. As only physicians can declare death in South Korea, EMS providers cannot stop resuscitation and must transport all patients, including those in cardiac arrest, to the ED. For all EMS transports, EMS providers are required by law to record an ambulance run sheet. Since 2012, the National Fire Agency has adapted the field triage decision scheme of the US Centers for Disease Control and Prevention to evaluate patients with trauma,19 and it developed an EMS severe trauma in-depth registry. For these patients, EMS providers evaluate whether the patient meets the trauma centre transport criteria of the field triage decision scheme. If the criteria are met, the in-depth registry should be recorded, and the EMS transport protocol recommends, but does not mandate, that the patient is transported to a nearby regional trauma centre.
The Ministry of Health and Welfare designates three ED levels according to resource and functional requirements: level 1 (n=36) and level 2 (n=118) EDs have more resources and better facilities for emergency care and must be staffed by emergency physicians 24 hours a day, 365 days a year, whereas level 3 EDs (n=248) can be staffed by general physicians. In accordance with the EMS Act, all EDs participate annually in a nationwide functional performance evaluation programme administered by the Ministry of Health and Welfare. The three participating hospitals in this study were all level 1 EDs that can perform acute trauma care for patients with TBI 24 hours a day, 365 days a year, including emergency neurosurgical operations and angiographic interventions. The Ministry of Health and Welfare also designates trauma centres in Korea. A total of 16 trauma centres were designated in 2018; among them, 15 were level 1 EDs.
We used the EMS ambulance run sheet, the EMS trauma in-depth registry and the ED administrative database. The EMS database information, including the ambulance run sheet and the trauma in-depth registry, was collected electronically by EMS providers using tablets. EMS record review for each severe trauma case has been performed by the EMS medical directors of each fire department since 2012. The ED administrative database contains patients' demographic characteristics, route of visit, time of visit, diagnosis and disposition. We merged the EMS database with the ED administrative database based on patients' arrival time, age and sex.
We included adult (age ≥15 years) EMS users transported to the participating hospitals with severe trauma from 1 January 2014 to 31 December 2018. Severe trauma was assessed by EMS providers and defined as fulfilling the trauma centre transport criteria (physiological criteria, anatomical criteria, mechanism of injury criteria, or special patient or system consideration criteria) of the field triage decision scheme.20 Patients were excluded if they had out-of-hospital cardiac arrest or if the main cause of the EMS call was medical or a non-traumatic injury, including choking, drowning, fire, flame, heat, cold, poisoning, chemical exposure, sexual assault, weather or natural disaster. Patients with an unknown outcome were also excluded.
The primary outcome measure was the diagnosis of TBI. TBI diagnosis was defined as a diagnostic code between S06.0 and S06.9 according to the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10).21 22 Although S06.7 codes for the duration of unconsciousness, we included S06.7 in our study outcome in line with previous studies.21–23 However, no patient in our study had only the S06.7 code for TBI diagnosis. The ED administrative database has two types of primary diagnostic codes: the final diagnostic codes at ED discharge and at hospital discharge. We extracted up to 20 codes for each. We defined the diagnostic code as positive for TBI if a confirmative diagnostic code was found at any level of the discharge record. Because ICD-10 codes are not directly linked to the severity of TBI, we included additional outcome measures to account for severity. The secondary outcome measure was TBI diagnosis with intracranial haemorrhage or injury (TBI-I), defined as TBI excluding concussion (ICD-10 code S06.0). The tertiary outcome was TBI with non-discharge (TBI-ND), defined as TBI excluding patients discharged from the ED. Because patients with TBI-ND needed further management through hospitalisation or transfer, we considered this group to have clinically significant severity. The quaternary outcome measure was TBI with death (TBI-D), defined as patients with TBI who died in the ED or hospital. Because patients with TBI-D are the most severe group, they were also included in TBI-ND.
Variables and preprocessing
We collected patients’ demographic data, circumstances of trauma, chief complaints, EMS vital sign assessments, EMS management and hospital outcomes. Detailed descriptions of each variable are provided in online supplemental table 1. Categorical variables were preprocessed with the one-hot encoding (dummy variable encoding) method. Continuous variables were divided into four quantiles, and unknown or missing values were categorised as a fifth category. One-hot encoding was also applied to the discretised continuous variables. Preprocessing measures, including the discretisation results of continuous variables, are presented in online supplemental table 1.
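As a rough sketch of this preprocessing step (a minimal illustration using pandas; the study's actual pipeline was implemented in R, and the variable name below is hypothetical), a continuous variable can be cut into quartiles, missing values assigned to a fifth category and the result one-hot encoded:

```python
import numpy as np
import pandas as pd

def discretise_and_encode(series: pd.Series, n_bins: int = 4) -> pd.DataFrame:
    """Bin a continuous variable into quantiles, treat missing values as an
    extra category, then one-hot encode the result."""
    binned = pd.qcut(series, q=n_bins, labels=[f"q{i + 1}" for i in range(n_bins)])
    binned = binned.cat.add_categories("missing").fillna("missing")
    return pd.get_dummies(binned, prefix=series.name)

# hypothetical systolic blood pressure readings with one missing value
sbp = pd.Series([80, 95, 110, 120, 135, np.nan, 150, 90], name="sbp")
encoded = discretise_and_encode(sbp)
# encoded has five indicator columns: sbp_q1 ... sbp_q4 and sbp_missing,
# with exactly one indicator set per patient
```

Each row of the encoded frame activates exactly one column, so downstream models see missingness as its own signal rather than an imputed value.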
We developed prediction models for the outcomes using five machine learning algorithms: traditional logistic regression (LR), extreme gradient boosting (XGB), random forest (RF), support vector machine (SVM) and elastic net (EN). The LR algorithm was chosen as the baseline comparison because it is widely used in the medical field and has been used to develop previous TBI prediction models.12 Backward stepwise LR was selected for feature selection, using the default parameters of the stepAIC function from the MASS package (V.7.3–53.1) in R. The other four algorithms were selected based on their ability to model non-linear associations, their relative ease of implementation and their general acceptance in the machine learning community.24–26 All algorithms provide a method to calculate the probability of the outcome occurring, and all algorithms other than LR require hyperparameter tuning for proper training and prediction.
The study population was split into development and test cohorts. The development cohort comprised a training cohort, from which each machine learning prediction model was derived, and a validation cohort, in which the prediction models were applied to adjust the hyperparameters of the algorithms. The test cohort was used for the final evaluation of the performance of the prediction models. A chronological split was used: patients enrolled from 1 January 2014 to 31 December 2016 formed the training cohort, patients from 1 January 2017 to 31 December 2017 formed the validation cohort and patients from 1 January 2018 to 31 December 2018 formed the test cohort. Hyperparameter tuning using the validation data was conducted by, first, a random search over 10 000 randomly generated hyperparameter sets; then, a grid search over candidates chosen from the random search, with five candidates per hyperparameter. Finally, the hyperparameters with the best area under the receiver operating characteristic curve (AUROC) in the validation cohort were selected. The test data were kept separate during the training and tuning processes and used only to measure algorithm performance.
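The chronological split and random-search step can be sketched as follows. This is a minimal illustration on synthetic data with a single random forest and a small search budget; the study tuned four algorithms in R with a 10 000-point random search followed by a grid search, so the details here are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# hypothetical data: X is the feature matrix, y the TBI label, year the visit year
n = 600
X = rng.normal(size=(n, 10))
y = (X[:, 0] + rng.normal(size=n) > 1).astype(int)
year = rng.choice([2014, 2015, 2016, 2017, 2018], size=n)

# chronological split, mirroring the study design
train = year <= 2016
valid = year == 2017
test = year == 2018

# random search scored by AUROC on the validation cohort only
best_auc, best_params = -1.0, None
for _ in range(20):
    params = {"n_estimators": int(rng.integers(50, 300)),
              "max_depth": int(rng.integers(2, 10))}
    model = RandomForestClassifier(random_state=0, **params).fit(X[train], y[train])
    auc = roc_auc_score(y[valid], model.predict_proba(X[valid])[:, 1])
    if auc > best_auc:
        best_auc, best_params = auc, params

# the held-out test cohort is touched only once, with the tuned hyperparameters
final = RandomForestClassifier(random_state=0, **best_params).fit(X[train], y[train])
test_auc = roc_auc_score(y[test], final.predict_proba(X[test])[:, 1])
```

Keeping the test years completely out of the tuning loop is what makes the final AUROC an honest estimate of prospective performance.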
Demographic findings and outcomes of the study population are described. Additionally, the baseline characteristics of the training cohort and the validation cohort were compared. Continuous variables were compared using Student's t test or the Wilcoxon rank sum test, and categorical variables were compared using the χ2 test or Fisher's exact test, as appropriate.
We assessed discrimination performance by comparing the AUROC for each model in the test cohort. We considered an AUROC of 0.5 as no discrimination, 0.7–0.8 as acceptable, 0.8–0.9 as excellent and more than 0.9 as outstanding.27 The area under the precision-recall curve (AUPRC) was also assessed for each model in the test cohort. We assessed calibration using the Hosmer–Lemeshow test, the scaled Brier score and a calibration plot in the test cohort. To delineate test characteristics, the sensitivity, specificity and positive and negative predictive values with 95% CIs were determined using a cut-off probability at a sensitivity of 80%. Given the poor sensitivity of clinical predictors for TBI in previous studies,7 and the sensitivity of approximately 75% reported for the prediction of other severe diseases in prehospital settings,28 29 we considered 80% sensitivity an appropriate target for our prediction model. We calculated the false-positive rate as 1 − specificity. The added prognostic power of each prediction model compared with the LR model was also evaluated by the continuous net reclassification index (NRI). The NRI is a statistical method that quantifies how well a new model correctly reclassifies the study population relative to another model. Details of the NRI are described elsewhere.30
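Selecting a probability cut-off that reaches a target sensitivity, and reading off the resulting specificity and false-positive rate, can be sketched as below (illustrative synthetic scores in Python; the study's analysis used R):

```python
import numpy as np
from sklearn.metrics import roc_curve

def cutoff_at_sensitivity(y_true, y_prob, target_sens=0.80):
    """Find the first ROC operating point whose sensitivity reaches the target,
    and report the test characteristics at that threshold."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    idx = int(np.argmax(tpr >= target_sens))  # tpr is non-decreasing along the curve
    return {"threshold": thresholds[idx],
            "sensitivity": tpr[idx],
            "specificity": 1 - fpr[idx],
            "false_positive_rate": fpr[idx]}

# hypothetical predicted probabilities: positives tend to score higher
rng = np.random.default_rng(1)
y = np.concatenate([np.ones(50), np.zeros(150)])
p = np.concatenate([rng.beta(4, 2, 50), rng.beta(2, 4, 150)])
ops = cutoff_at_sensitivity(y, p)
```

Fixing sensitivity at 80% and comparing the specificity each model achieves there is what allows the false-positive rates of the five algorithms to be compared on equal footing.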
The variable importance of each model was assessed using a model-specific metric, except for the SVM algorithm. For the LR model, variable importance was determined by the coefficient effect sizes. For the XGB and RF models, variables were ranked by their selection frequency as decision nodes. For the EN algorithm, the absolute values of the coefficients of the tuned model were used to measure variable importance. To compare the variable importance of the prediction models efficiently, the top five variables of each model are presented.
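The two families of importance metrics can be illustrated on synthetic data (using scikit-learn for brevity; note that scikit-learn's tree importance is mean impurity decrease, a related but not identical measure to the split selection frequency used in the study):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))
# x0 carries the strongest signal, x3 a weaker one, the rest are noise
y = (2 * X[:, 0] - X[:, 3] + rng.normal(size=400) > 0).astype(int)
names = [f"x{i}" for i in range(6)]

# coefficient-based importance, as used for the LR (and similarly EN) models
lr = LogisticRegression(max_iter=1000).fit(X, y)
lr_rank = [names[i] for i in np.argsort(-np.abs(lr.coef_[0]))]

# tree-based importance for the forest model (impurity decrease here;
# the study ranked variables by how often they were chosen as split nodes)
rf = RandomForestClassifier(random_state=0).fit(X, y)
rf_rank = [names[i] for i in np.argsort(-rf.feature_importances_)]
```

Both rankings should place the dominant predictor first, which is the sanity check one expects before comparing the top five variables across models.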
All statistical analyses were performed with R Statistical Software (V.4.0.1; R Foundation for Statistical Computing, Vienna, Austria). Packages included caret, e1071, xgboost, randomForest and glmnet for the analysis of the machine learning algorithms.
Patient and public involvement
This research was done without patient involvement. Patients were not invited to comment on the study design and were not consulted to develop patient relevant outcomes or interpret the results. Patients were not invited to contribute to the writing or editing of this document for readability or accuracy.
Results
Among the 157 134 EMS users transported to the three hospitals from 2014 to 2018, 1169 patients were included in the final analysis (figure 1). Patients were split into two data sets: data from 2014 to 2017, comprising 867 patients (74.2%), formed the development cohort, and the remaining data from 2018, comprising 302 patients (25.8%), formed the test cohort (figure 1). Within the development cohort, data from 2014 to 2016 (661 patients) were used as the training cohort, and 2017 data (206 patients) were used as the validation cohort.
Table 1 shows key demographic findings of the development and test cohorts. The median (IQR) age was 52 years (35–66) in the development cohort and 56 years (40–69) in the test cohort. Traffic accident was the most common mechanism of trauma (43.3% in the development cohort and 41.4% in the test cohort). The proportion of patients with alert mental status was 58.1% in the development cohort and 69.5% in the test cohort. Overall, TBI, TBI-I, TBI-ND and TBI-D occurred in 215 (24.8%), 195 (22.5%), 192 (22.1%) and 32 (3.7%) patients, respectively, in the development cohort; and in 66 (21.9%), 56 (18.5%), 57 (18.9%) and 11 (3.6%) patients, respectively, in the test cohort. All demographic characteristics of the development and test cohorts are described in online supplemental table 2.
The final hyperparameters of the prediction models are described in online supplemental table 3. The discrimination and NRI of the prediction models in the test cohort are presented in table 2. The AUROC was 0.770–0.809 for TBI, 0.812–0.844 for TBI-I, 0.767–0.811 for TBI-ND and 0.664–0.889 for TBI-D (table 2 and online supplemental figure 1). Compared with LR, XGB performed significantly better in predicting TBI, and RF and EN performed well in predicting TBI-ND and TBI-D. The EN model generally performed well on all outcomes; its AUROC was 0.799 (95% CI 0.732 to 0.867), 0.844 (95% CI 0.779 to 0.910), 0.811 (95% CI 0.741 to 0.882) and 0.871 (95% CI 0.764 to 0.978) for TBI, TBI-I, TBI-ND and TBI-D, respectively. The machine learning models generally achieved significant reclassification improvement compared with LR for TBI, TBI-I and TBI-ND. For the prediction of TBI-D, the AUROC difference and the reclassification improvement compared with LR were non-significant for all machine learning models. The precision-recall curves are shown in online supplemental figure 2. The AUPRC was 0.479–0.564 for TBI, 0.469–0.606 for TBI-I, 0.477–0.551 for TBI-ND and 0.094–0.293 for TBI-D. The EN model showed the highest AUPRC among all prediction models. Online supplemental figure 3 shows the calibration plots of the prediction models according to outcomes; all prediction models generally showed poor calibration. Given its high AUROC and AUPRC and its reclassification improvement compared with LR, we determined EN to be the best-performing prediction model in our analysis.
Using a cut-off at 80% sensitivity, specificity was 47.5%–72.5% for TBI, 71.1%–81.3% for TBI-I, 46.1%–74.3% for TBI-ND and 42.6%–79.0% for TBI-D. EN showed the highest specificity and PPV for all outcomes. The false-positive rate (1 − specificity) of the EN model was approximately 19.7%–39.0%, depending on the outcome. The 95% CI of the specificity of the EN model did not overlap with that of LR for the TBI, TBI-ND and TBI-D predictions. The NPV was approximately 89%–99% for all outcomes across the prediction models (table 3).
Table 4 shows the top five variables by importance for the prediction models according to outcomes. Variables related to the patient's reported loss of consciousness, the Glasgow Coma Scale components and the light reflex were the three most important variables for predicting all outcomes. Compared with the other outcomes, the variable importance profile for TBI-D was distinctly different: the mechanism of injury, heart rate and age showed the highest importance for predicting TBI-D.
Discussion
By using prehospital data from EMS users visiting three teaching hospitals, we developed and validated prediction models for the diagnosis and prognosis of TBI using machine learning algorithms among patients with severe trauma, as identified by EMS providers in South Korea. We found that 24% of patients were diagnosed with TBI, 22% showed intracranial injury, 21% could not be discharged from the ED with a TBI diagnosis and 4% showed TBI-related death. The machine learning models showed acceptable-to-excellent discrimination performance (AUROCs of 0.799–0.871 across outcomes for the best-performing EN model). When identifying 80% of target patients with TBI, the false-positive rate was approximately 19.7%–39.0%. Consciousness-status-related variables, ranging from patients' symptoms to EMS providers' assessments, showed the highest importance for predicting all outcomes. This study adds considerably to the understanding of prehospital prediction performance for TBI among patients with severe trauma. The use of comprehensive prehospital information and certain machine learning approaches led to increased performance with a diminished false-positive rate compared with the traditional statistical model.
Several studies have reported that EMS providers' assessment using prehospital information is effective for identifying patients with severe trauma who require direct transport to a trauma centre.31–33 Because TBI accounts for a significant portion of severe trauma,32 and the majority of patients have poor access to trauma centres,34 identification of TBI among patients with severe trauma by EMS providers could contribute to proper prehospital management and destination hospital decisions.3 However, prehospital identification of TBI is challenging.35 Prehospital clinical signs have shown poor predictive performance for differentiating patients with TBI,7 and previous TBI-related prediction models mostly focused on TBI outcomes.8 9 13 One study reported predictors for mild TBI with persistent symptoms, but its single-centre case–control design and ED-based model development limit its applicability to prehospital settings.36 In this study, we developed and tested TBI prediction models that used prehospital information, and we found acceptable discrimination power for predicting the diagnosis and prognosis of TBI. Uniquely, we incorporated various demographic variables, trauma circumstances, patients' complaints and EMS assessment information into the prediction models, and we adopted machine learning algorithms.
When using a cut-off at 80% sensitivity for TBI detection, the false-positive rate was 19.7%–39.0% (table 2). These false-positive rates are plausible for detecting severe diseases in EMS settings: previous studies reported a false-positive rate of 26% for EMS triage of myocardial infarction at a sensitivity of 74%, and a false-positive rate of 50% for EMS recognition of stroke at a sensitivity of 74%.28 29 Considering the prevalence of the outcomes (24% for TBI, 22% for TBI-I, 21% for TBI-ND and 4% for TBI-D; table 1), there would be 16, 9, 12 and 67 false-positive patients for every 10 patients accurately identified as having TBI, TBI-I, TBI-ND and TBI-D, respectively (online supplemental table 4). Because of the low prevalence of TBI-D, a similar specificity of the prediction model resulted in a very low positive predictive value and a high proportion of false-positive cases, which suggests the limited applicability of prediction models for TBI-D in prehospital settings.
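The false-positive burden per correctly identified patient follows directly from prevalence, sensitivity and specificity. A small worked example (with illustrative values, not the study's exact operating points) shows why the rare TBI-D outcome generates so many more false positives than the common TBI outcome at a similar specificity:

```python
def false_positives_per_10_true_positives(prevalence, sensitivity, specificity):
    """Expected false positives for every 10 correctly identified cases."""
    tp = prevalence * sensitivity              # true positives per patient screened
    fp = (1 - prevalence) * (1 - specificity)  # false positives per patient screened
    return 10 * fp / tp

# illustrative operating point: 80% sensitivity, 75% specificity
common = false_positives_per_10_true_positives(0.24, 0.80, 0.75)  # TBI-like prevalence
rare = false_positives_per_10_true_positives(0.04, 0.80, 0.75)    # TBI-D-like prevalence
# common ≈ 9.9 false positives per 10 true positives; rare = 75.0
```

Dropping the prevalence from 24% to 4% multiplies the false-positive burden roughly sevenfold at the same operating point, which is exactly the pattern seen in the TBI-D predictions.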
Consciousness-status-related variables ranging from patients’ complaints to EMS assessment showed the highest importance regardless of models and outcomes in our study. Consciousness status is closely associated with head trauma. Head trauma can result in structural brain injury or physiological disruption of brain function, which could result in altered mental status.37 Mental status is also associated with TBI severity38 and its association with TBI outcomes has been reported.8 9 13 History taking and physical examination for altered mental status are key to early diagnosis and proper management of TBI in prehospital settings.39
We adopted machine learning algorithms for the prediction of TBI-related outcomes and found an improvement in discrimination and an increase in specificity at the same sensitivity threshold. However, depending on the outcome, the LR model also showed acceptable or similar performance compared with the machine learning models. A previous systematic review of clinical prediction models reported no performance benefit of machine learning models over LR.40 That review noted that machine learning models tend to perform well in problems with a strong signal-to-noise ratio, such as gaming or image recognition, whereas clinical prediction problems often have a poor signal-to-noise ratio.40 If we could use unstructured data with a strong signal-to-noise ratio, such as continuous vital sign monitoring data or audiovisual data of patients' appearance, machine learning models might perform better than LR models. In addition, with more patient data, the performance advantage of machine learning models might become clearer.
Precise assessment in the prehospital field could contribute to improved patient-related outcomes. The high demand for EMS calls and responses, regional disparities in access to hospitals capable of definitive care34 and the importance of timely management in acute disease care are the chief reasons why accurate assessment by EMS providers is necessary. Although information acquisition and processing are difficult in prehospital settings, various instruments and information systems could help mitigate these problems. Complex data acquisition, such as mobile CT or other unstructured data,41 information sharing through telemedicine42 and decision support tools in prehospital environments43 could contribute to the accurate assessment of EMS providers. Acquiring more information and processing those data in real time could improve clinical prediction models in prehospital settings, which could in turn improve patient safety and outcomes.
Our study had several limitations. First, our data were collected at three teaching hospitals in urban areas of South Korea; external validation in other areas should therefore be conducted to generalise the developed prediction models. Second, we used a retrospective analysis of electronically collected prehospital and hospital data, so information loss and missing data are possible. We treated missing status as a separate category in our analysis;44 however, there could be different reasons for missing data. Third, the prediction models may have been overfitted or underfitted, and the use of a large number of predictors can also contribute to overfitting. To minimise this issue, we rigorously searched for hyperparameters and carefully chose them according to performance in an independent validation cohort. Fourth, we selected our study population using the trauma centre transport criteria for EMS providers in Korea. Although these criteria are based on the field triage decision scheme, the most widely used prehospital trauma triage protocol,6 extrapolation to other EMS settings or to general trauma patients may be limited. Fifth, Abbreviated Injury Scale codes were not used to identify our study outcomes because of a lack of information. To compensate for this limitation, we further identified patients with TBI-I, TBI-ND and TBI-D to account for severity; however, different definitions of clinical severity, including ICU admission or emergency operation, are possible. Finally, this study was performed in an intermediate-service-level EMS system, so the generalisation of our findings to different EMS settings should be made with caution.
In conclusion, we presented data on TBI among patients with severe trauma assessed by EMS providers, and our results inform the development of prediction models for the diagnosis and prognosis of TBI in our population. We used a variety of information obtainable in prehospital settings and showed acceptable predictive performance. The consistent importance of consciousness-status-related variables emphasises the importance of assessing and monitoring consciousness status in prehospital settings. Although prospective and implementation studies of TBI prediction in prehospital settings are needed, our study outlines a novel method for the precise assessment of EMS providers using machine learning-based prediction models. Further collection of various types of patient-related data would contribute to enhanced performance of clinical prediction models in prehospital settings.