A novel XGBoost method with entity embeddings for feature analysis and classification of traffic crash types


Ozinal Avsar Y., Yildirim Z. B., Avsar E., Özuysal M., Çalışkanelli S. P.

INTERNATIONAL JOURNAL OF INJURY CONTROL AND SAFETY PROMOTION, 2026 (SSCI, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI: 10.1080/17457300.2026.2678331
  • Journal Name: INTERNATIONAL JOURNAL OF INJURY CONTROL AND SAFETY PROMOTION
  • Journal Indexes: Social Sciences Citation Index (SSCI), Scopus, CINAHL, EMBASE, Environment Index, MEDLINE, Academic Search Ultimate (EBSCO), Biomedical Reference Collection: Corporate Edition (EBSCO), Health Research Premium Collection (ProQuest)
  • Affiliated with Dokuz Eylül University: Yes

Abstract

Crash type is an important factor in understanding crash severity, as certain types lead to higher mortality rates. Predicting crash type for specific road sections can therefore support road safety assessments. This study examines the relationships between geometric road elements at crash sites and identifies key features in crash type classification. Crash data from a 10-year period in 10 central districts of İzmir, Türkiye, were analysed. Among these districts, the three with the highest number of crashes, namely Bornova, Karşıyaka and Konak, were used as geographically distinct test districts, while model training was performed using data from the remaining central districts within each temporal group. Feature importance ranking was conducted using the Extreme Gradient Boosting (XGBoost) method. As the main contribution, we propose Embedding-XGBoost (E-XGB), a novel two-stage dimensionality reduction approach that integrates entity embeddings with XGBoost to improve classification performance. E-XGB enables modelling crash data in a lower-dimensional feature space, allowing predictions with reduced computational effort and robustness against missing data. The superiority of E-XGB was demonstrated by comparing its performance with four machine learning algorithms: XGBoost, support vector machine, K-nearest neighbours and multilayer perceptron. Results show that E-XGB achieves classification performance values, in terms of accuracy, F1-score and precision up to 85.42%, 85.09% and 86.03%, respectively, when 10 features are used.
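
The geographic hold-out described in the abstract, where the districts with the most crashes are reserved for testing and the rest are used for training, can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `split_by_top_districts`, the record layout, the placeholder districts "District D" and "District E", and all crash counts are assumptions made up for illustration. The temporal grouping mentioned in the abstract is not defined there and is left out.

```python
from collections import Counter


def split_by_top_districts(records, n_test=3):
    """Hold out the n_test districts with the most crashes as the test set.

    `records` is a list of dicts with a "district" key (hypothetical layout).
    """
    counts = Counter(r["district"] for r in records)
    test_districts = {name for name, _ in counts.most_common(n_test)}
    train = [r for r in records if r["district"] not in test_districts]
    test = [r for r in records if r["district"] in test_districts]
    return train, test, test_districts


# Toy data: the counts are invented and do not come from the study.
toy_counts = {
    "Bornova": 5,
    "Karşıyaka": 4,
    "Konak": 6,
    "District D": 2,  # placeholder name
    "District E": 1,  # placeholder name
}
records = [
    {"district": name, "crash_id": f"{name}-{i}"}
    for name, n in toy_counts.items()
    for i in range(n)
]

train, test, test_districts = split_by_top_districts(records)
print(sorted(test_districts))
print(len(train), len(test))
```

Splitting by district rather than by random rows keeps every test crash in a location the model never saw during training, which is what makes the test districts "geographically distinct".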