Improving Construction Cost Prediction and Uncertainty Quantification with a Machine Learning Imputation Framework
DOI:
https://doi.org/10.31224/7294
Keywords:
Construction Cost, Data Imputation, Machine Learning, Ensemble Methods, Uncertainty Quantification
Abstract
Large-scale public procurement databases offer substantial opportunities for construction cost modeling, yet their utility is frequently compromised by pervasive missing data, which often exceeds 78% for critical financial outcomes. Whereas existing literature typically handles missingness through listwise deletion or simplistic substitution, this paper presents a novel methodological framework that treats data imputation as a core predictive task rather than a preliminary preprocessing step. Using a dataset of approximately 4.3 million records, we propose a multi-stage pipeline with three components: Isolation Forest for outlier detection, a 'nullify and re-impute' strategy that preserves valid structural metadata, and an optimized XGBoost model for reconstructing missing final prices. The primary contribution of this research is the empirical validation of the framework's impact on both imputation accuracy and downstream predictive utility. Results show that the XGBoost imputation model reduced Root Mean Squared Error (RMSE) by 45.5% compared with standard mean substitution. Furthermore, in a downstream cost prediction task using Quantile Gradient Boosting, the model trained on the XGBoost-imputed dataset outperformed baselines trained on listwise-deleted data, reducing Mean Absolute Error (MAE) by 14.6% and RMSE by 8.3%. Critically, the proposed framework improved uncertainty quantification, achieving a well-calibrated 79.85% empirical coverage for an 80% nominal prediction interval, whereas conventional methods produced poorly calibrated bounds. This study equips construction management practitioners with an empirically validated methodology for processing heavily incomplete datasets, enabling more reliable cost forecasting and mathematically sound contingency planning.
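To make the reported quantities concrete, the following is a minimal standard-library Python sketch of the evaluation measures the abstract relies on. It is not the authors' implementation. The following parts are assumptions: the outlier flags are taken as given from an upstream detector such as Isolation Forest; the 80% interval is assumed to be the band between 10th- and 90th-percentile predictions; `mean_impute` is only the mean-substitution baseline, not the XGBoost imputer. The helper names (`nullify`, `mean_impute`, `relative_reduction`, `empirical_coverage`) are hypothetical. The pinball loss is included because it is the objective a quantile gradient boosting model minimizes for each quantile level.

```python
"""Sketch of the evaluation quantities named in the abstract.

Assumptions (not from the paper): outlier flags come from an upstream
detector such as Isolation Forest; the 80% interval spans the 10th and
90th percentile predictions; mean_impute is only the mean-substitution
baseline, not the XGBoost imputer. All helper names are hypothetical.
"""
import math
from typing import List, Optional, Sequence


def nullify(values: Sequence[Optional[float]], is_outlier: Sequence[bool]) -> List[Optional[float]]:
    """'Nullify' step: blank out flagged prices (set to None) so they can be
    re-imputed later, while the record itself is kept."""
    return [None if flag else v for v, flag in zip(values, is_outlier)]


def mean_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Mean-substitution baseline: fill every None with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` lowers an error metric vs. `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def pinball_loss(y_true: Sequence[float], q_pred: Sequence[float], tau: float) -> float:
    """Mean quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, q_pred):
        diff = y - q
        # Under-prediction costs tau per unit; over-prediction costs (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def empirical_coverage(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    prices = [100.0, 120.0, None, 5000.0, 110.0]
    flags = [False, False, False, True, False]  # hypothetical outlier flags
    print(mean_impute(nullify(prices, flags)))  # [100.0, 120.0, 110.0, 110.0, 110.0]
    print(empirical_coverage([10, 20, 30, 40, 50],
                             [8, 21, 25, 35, 45],
                             [12, 25, 35, 45, 49]))  # 0.6
```

In a full pipeline, the nullified entries would presumably be filled by a trained imputation model (XGBoost in this paper) rather than by `mean_impute`, and `empirical_coverage` would be computed on held-out data. A well-calibrated 80% interval should then return a value close to 0.80, as with the 79.85% reported above.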
License
Copyright (c) 2026 Amr A. Mohy, ElBadr O. Elgendi

This work is licensed under a Creative Commons Attribution 4.0 International License.