Preprint / Version 1

COVID-19 Spread Prediction: A Comparative Study




Machine Learning, Random Forest, Linear Regression, COVID-19, XGBoost


COVID-19 is the latest infectious disease to become a global pandemic, and it has brought economies worldwide to their knees. Precise analysis and forecasting of the disease spread can help with resource planning and with strategies to slow the progress of this deadly virus. This paper explores a variety of machine learning models, from heuristic statistical techniques to advanced deep learning methods, to forecast COVID-19 dynamics. To measure the daily spread of COVID-19, we opt for two target variables: the number of daily positive cases and the number of daily deaths. Although these daily counts are prone to irregularities and reporting lags, they are better suited to short-term time series forecasting. With these two variables we seek stable and reliable estimates of COVID-19 spread. The peculiarity of the data is that it is a time series that does not yet cover one complete period, which prevents us from directly using established forecasting methods. Our analysis therefore uses some non-time-series methods extended with time factors, and a few time series methods extended with exogenous variables, after tailoring the data into the appropriate format. We aim to find an optimal model for each family of models where possible. To illustrate the results, India has been chosen as the case study, as this country recorded the fastest pace of COVID-19 spread in the first six months of the pandemic. A comparative study with different evaluation metrics is included. Mean absolute error (MAE), mean squared error (MSE), median squared error (MEME), and mean squared log error (MSLE) have been used to evaluate the forecasts of COVID-19 spread. We have compared methods such as Linear Regression, Elastic net regularization, Random-forest regressor, XGBoost regressor, Simple exponential smoothing, and so on. 
Among these methods, the Random-forest regressor shows the highest MAE (11351.8833), MSE (11827.2160), MEME (9998.6333), and MSLE (0.0220) values among the compared methods. Our study indicates that more complex models may not be more reliable than simpler ones for forecasting COVID-19 spread. We have used Python to analyze our results.
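
The four evaluation metrics named above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `forecast_errors` and the sample counts are hypothetical, MEME is assumed to be the median of the squared errors (as the abstract's wording suggests), and MSLE is assumed to use the common `log(1 + x)` convention.

```python
import math
from statistics import mean, median


def forecast_errors(actual, predicted):
    """Compute MAE, MSE, MEME and MSLE for one forecast (hypothetical helper)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")

    residuals = [a - p for a, p in zip(actual, predicted)]
    squared = [r * r for r in residuals]
    # Assumption: MSLE compares log(1 + value), which requires non-negative counts.
    log_squared = [
        (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)
    ]

    return {
        "MAE": mean(abs(r) for r in residuals),
        "MSE": mean(squared),
        # Assumption: MEME is the median of the squared errors.
        "MEME": median(squared),
        "MSLE": mean(log_squared),
    }


# Made-up daily case counts, for illustration only (not data from the study).
actual = [100, 120, 150, 170]
predicted = [110, 115, 140, 180]
scores = forecast_errors(actual, predicted)
```

For all four metrics a lower value means a closer fit. MSLE measures relative rather than absolute deviation, which makes it less dominated by days with very large counts.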

