Preprint / Version 1

The Information Dilution Paradox in Low-Dimensional Thermodynamic Manifolds

Authors

  • Zulman Arif, Independent Researcher

DOI: https://doi.org/10.31224/7499

Abstract

Predicting the net hourly electrical energy output (PE) of Combined Cycle Power Plants (CCPP) is fundamental to grid dispatch optimization and is typically addressed with tree-based ensembles. Recently, hybrid architectures incorporating Transformer-based self-attention have been proposed to enhance tabular regression by capturing latent feature interactions. This study empirically evaluates a Residual Hybrid Transformer-GBDT architecture against a standalone LightGBM baseline on the UCI CCPP dataset. Contrary to the prevailing hypothesis that expanding features into a latent embedding improves predictive accuracy, our results reveal a performance degradation: the Root Mean Squared Error (RMSE) increases from 3.2777 MW (baseline) to 3.2853 MW (hybrid). Error segmentation by operating regime identifies a pronounced performance collapse in low-load regimes (< 430 MW), where the Mean Absolute Error (MAE) increases by 36.5% relative to the baseline. We characterize this phenomenon as the 'Information Dilution Paradox': mapping a low-dimensional physical manifold (d = 4) into a higher-dimensional latent space (d = 16) introduces stochastic noise and feature redundancy rather than discriminative signal. These findings offer a critical counter-narrative to the adoption of high-capacity deep learning for low-dimensional tabular data, suggesting that raw physical feature representations remain superior for thermodynamic manifolds governed by strong intrinsic correlations.
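The regime-wise error comparison described in the abstract can be illustrated with a minimal sketch. It assumes that operating regimes are split on the observed PE value at 430 MW, and the helper names (`rmse`, `mae`, `segmented_mae`, `relative_change_pct`) are hypothetical, not taken from the study's code. The toy arrays are synthetic and do not reproduce the study's data; only the two RMSE figures come from the abstract.

```python
import math
from typing import Dict, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (MW for PE)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def segmented_mae(
    y_true: Sequence[float], y_pred: Sequence[float], threshold: float = 430.0
) -> Dict[str, float]:
    """MAE per operating regime.

    Assumption: a sample is 'low_load' when its observed PE is below
    `threshold`; everything else is 'high_load'. An empty regime yields NaN.
    """
    pairs = list(zip(y_true, y_pred))
    regimes = {
        "low_load": [(t, p) for t, p in pairs if t < threshold],
        "high_load": [(t, p) for t, p in pairs if t >= threshold],
    }
    out: Dict[str, float] = {}
    for name, subset in regimes.items():
        if subset:
            ts, ps = zip(*subset)
            out[name] = mae(ts, ps)
        else:
            out[name] = float("nan")
    return out


def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percentage change of `candidate` relative to `baseline`."""
    return 100.0 * (candidate - baseline) / baseline


if __name__ == "__main__":
    # Synthetic toy data (hypothetical, not from the UCI CCPP dataset).
    y_true = [420.0, 425.0, 440.0, 450.0]
    y_base = [422.0, 423.0, 441.0, 449.0]
    y_hyb = [423.0, 422.0, 441.0, 449.0]

    seg_base = segmented_mae(y_true, y_base)
    seg_hyb = segmented_mae(y_true, y_hyb)
    low_change = relative_change_pct(seg_base["low_load"], seg_hyb["low_load"])
    print(f"low-load MAE change: {low_change:.1f}%")  # → 50.0%

    # Overall RMSE change using the values reported in the abstract.
    print(f"overall RMSE change: {relative_change_pct(3.2777, 3.2853):.2f}%")  # → 0.23%
```

Applied to the reported figures, the overall RMSE gap is roughly 0.23%, which is why the abstract relies on regime segmentation to expose the much larger low-load MAE increase. Splitting on observed rather than predicted PE keeps the regime assignment identical for both models, so their per-regime errors are compared on the same samples.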


Posted: 2026-07-03