Predicting Injury Severity in Vehicle-Non-Motorist Crashes: A Comparative Machine Learning Framework with Interpretability Analysis
DOI:
https://doi.org/10.31224/6973
Keywords:
Crash severity prediction, vulnerable road users, machine learning, pedestrian safety, SHAP interpretability, traffic safety, bicyclist safety, class imbalance
Abstract
Pedestrians and bicyclists account for a disproportionate share of traffic fatalities, yet predicting crash severity remains challenging due to class imbalance and inconsistent benchmarking. This study analyzes 12,563 vehicle-non-motorist crashes from Florida’s Signal4 database, comparing statistical models (Ordered Probit, Multinomial Logit), machine learning classifiers (Random Forest, XGBoost, LightGBM, SVM), and deep learning models (MLP, CNN) under identical conditions. Tree-based ensembles achieve the best performance (macro-F1: 0.48, ROC-AUC: 0.65 multiclass; 0.68 and 0.77 binary). Class-weighted training outperforms synthetic resampling, and tree ensembles match or exceed deep learning on tabular data. SHAP analysis identifies non-motorist age, violation history, lighting, and roadway type as the strongest severity predictors, with injury probability rising sharply beyond age 60. Calibration shows Gradient Boosting and SVM yield the most reliable probability estimates, while top-performing tree ensembles may need post-hoc calibration. The findings support prioritized infrastructure for elderly pedestrians and improved lighting on high-exposure corridors.
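To make two terms from the abstract concrete, the following is a minimal sketch of how macro-F1 is computed and how class weights can be derived for imbalanced severity labels. The labels, the three-level severity coding, and the function names are hypothetical and are not taken from the study. The "balanced" weighting heuristic shown here (n_samples / (n_classes * class_count)) is an assumption; the study may have used a different weighting scheme.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed heuristic for illustration; rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (not the study's data):
# 0 = no injury, 1 = injury, 2 = severe injury.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

weights = balanced_class_weights(y_true)
score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("class weights:", weights)
print("accuracy:", round(accuracy, 3), "macro-F1:", round(score, 3))
```

In this toy case the classifier never predicts the rarest class, so macro-F1 falls well below accuracy. This is why macro-F1 is a common headline metric when severe outcomes are rare.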
License
Copyright (c) 2026 Parvez Anowar

This work is licensed under a Creative Commons Attribution 4.0 International License.