Preprint / Version 1

Stage-Specific Comparative Assessment of Machine Learning Techniques within the Lean Six Sigma DMAIC Methodology for Industrial Process Performance Enhancement

Authors

  • Charles Onyeka Nwamekwe, Industrial and Production Engineering, P.M.B. 5025, Faculty of Engineering, Nnamdi Azikiwe University, Awka. https://orcid.org/0009-0002-1918-1350
  • Raphael Olumese Edokpia, Department of Production Engineering, University of Benin, Benin City, Edo State, Nigeria.
  • Christopher Igbinosa Eboigbe, Department of Production Engineering, University of Benin, Benin City, Edo State, Nigeria.

DOI:

https://doi.org/10.31224/7082

Keywords:

DMAIC, Lean Six Sigma, Phase-Wise Model Evaluation, Machine Learning, Concept Drift Monitoring

Abstract

Lean Six Sigma projects increasingly generate nonlinear, multivariate production data, yet machine-learning models are often selected using pooled predictive accuracy that ignores the decision logic of the Define, Measure, Analyse, Improve, and Control (DMAIC) phases. This practice weakens critical-to-quality (CTQ) prioritization, diagnostic reliability, prescriptive action, and monitoring. This study developed a phase-wise framework for model selection within DMAIC process optimization. An industrial dataset of 2,190 batch-level observations was restructured into five DMAIC-specific datasets. Define used binary CTQ failure indicators and ranking stability metrics. Measure expanded the data into 6,570 replicated observations with injected missingness and noise. Analyse preserved the multivariate CTQ structure for diagnostic regression across tablet weight, hardness, thickness, friability, and disintegration. Improve used 500 operating scenarios and composite desirability to assess prescriptive capability. Control incorporated temporal segmentation and drift to evaluate monitoring readiness. At least five models were evaluated per phase using phase-aligned metrics, including AUC, F1-score, balanced accuracy, robust RMSE, degradation index, R², desirability gain, drift sensitivity, and rolling error stability. The results confirmed that model superiority is phase-dependent. Logistic Regression led the Define phase on discrimination, with AUC = 0.90 ± 0.16, but required threshold calibration because positive-class retrieval remained weak. Random Forest was the most robust model in Measure, with robust RMSE = 4.31 ± 1.06 and degradation index = -17.98, and was also the most stable Control-phase model, with an aggregate control score of 5.34. Analyse showed CTQ-specific dominance, with the leading model varying by CTQ, while XGBoost achieved the highest Improve-phase desirability gain of 0.16. The framework provides defensible, phase-adaptive guidance for ML-enabled Lean Six Sigma deployment.
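
The abstract does not state how composite desirability or desirability gain is computed. As a rough illustration only, the sketch below assumes the common Derringer-Suich form: each CTQ is mapped to a desirability in [0, 1] with linear ramps around a target, and the composite is the geometric mean of those values. The function names, the linear ramp shape, and the definition of gain as a simple difference from a baseline are all assumptions made for this example, not the authors' stated method.

```python
import math


def target_desirability(y, low, target, high):
    """Two-sided desirability with linear ramps (assumed form).

    Returns 1.0 at the target and 0.0 at or beyond the limits.
    """
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)


def composite_desirability(ds):
    """Geometric mean of individual desirabilities.

    A single zero makes the composite zero, so one failed CTQ
    cannot be compensated by the others.
    """
    if not ds:
        raise ValueError("need at least one desirability value")
    if any(d <= 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


def desirability_gain(baseline_ds, candidate_ds):
    """Hypothetical gain: candidate composite minus baseline composite."""
    return composite_desirability(candidate_ds) - composite_desirability(baseline_ds)
```

Under these assumptions, a scenario that raises every individual desirability from 0.25 to 0.5 would show a gain of about 0.25, while any scenario that pushes one CTQ outside its limits scores a composite of zero regardless of the others.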

Posted

2026-05-26