This preprint has been submitted for publication in a journal
Preprint / Version 1

Predicting Injury Severity in Vehicle-Non-Motorist Crashes: A Comparative Machine Learning Framework with Interpretability Analysis

Authors

  • Parvez Anowar, University of Central Florida

DOI:

https://doi.org/10.31224/6973

Keywords:

Crash severity prediction, vulnerable road users, machine learning, pedestrian safety, SHAP interpretability, traffic safety, bicyclist safety, class imbalance

Abstract

Pedestrians and bicyclists account for a disproportionate share of traffic fatalities, yet predicting crash severity remains challenging due to class imbalance and inconsistent benchmarking. This study analyzes 12,563 vehicle-non-motorist crashes from Florida’s Signal4 database, comparing statistical models (Ordered Probit, Multinomial Logit), machine learning classifiers (Random Forest, XGBoost, LightGBM, SVM), and deep learning models (MLP, CNN) under identical conditions. Tree-based ensembles achieve the best performance (macro-F1: 0.48, ROC-AUC: 0.65 multiclass; 0.68 and 0.77 binary). Class-weighted training outperforms synthetic resampling, and tree ensembles match or exceed deep learning on tabular data. SHAP analysis identifies non-motorist age, violation history, lighting, and roadway type as the strongest severity predictors, with injury probability rising sharply beyond age 60. Calibration shows Gradient Boosting and SVM yield the most reliable probability estimates, while top-performing tree ensembles may need post-hoc calibration. The findings support prioritized infrastructure for elderly pedestrians and improved lighting on high-exposure corridors.
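As a rough illustration of why the abstract reports macro-F1 and class-weighted training, the sketch below computes inverse-frequency class weights and macro-F1 on a toy label set. Everything in it is an assumption for illustration: the three severity labels, the label counts, and the function names `balanced_class_weights` and `macro_f1` are hypothetical, and the weighting formula (n_samples / (n_classes * class_count)) is a common heuristic rather than the scheme confirmed by the study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (not data from the study):
# 0 = no injury, 1 = injury, 2 = severe injury.
y_true = [0] * 6 + [1] * 3 + [2] * 1
y_majority = [0] * 10  # a model that always predicts the majority class

weights = balanced_class_weights(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
score = macro_f1(y_true, y_majority)

print("class weights:", weights)
print("accuracy:", accuracy)
print("macro-F1:", score)
```

In this toy case the majority-class predictor looks acceptable on accuracy but scores poorly on macro-F1, because the two minority classes each contribute an F1 of zero. The rarest class also receives the largest weight, which is the effect class-weighted training relies on.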

Posted

2026-05-03