Temporal Instability Phases Precede and Predict Reasoning Failure in Generative Pre-Trained Transformers
DOI:
https://doi.org/10.31224/6533

Keywords:
failure forecasting, hallucination prediction, Large Language Models (LLMs), LLM behavior

Abstract
Large language model failures are often treated as isolated turn-level events. Here we show that this view is incomplete. Analyzing multi-turn GPT conversations, we identify persistent latent risk states, inferred from a frozen prospective instability signal, that are temporally non-random, persist across multiple consecutive turns, and forecast elevated future failure probability over subsequent horizons. Higher latent states are associated with systematically higher future failure risk, a monotone ordering that replicates across four disjoint held-out GPT datasets, with state-transition $\chi^{2}$ ranging from 91.64 to 486.15 and high-versus-low risk ratios reaching 2.76. Within-conversation analyses further show that instability rises before failure events, arguing against a purely cross-sectional explanation. Because the latent states are inferred from observable behavioral features without access to ground-truth failure labels at inference time, the resulting signal is prospective and potentially usable during deployment. These findings suggest that reasoning failure in large language models is better understood not as isolated noise, but as entry into temporally persistent high-risk regimes. This reframes model unreliability as a dynamical systems problem and has direct implications for real-time monitoring, safety evaluation and training-time intervention.
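The abstract outlines an analysis pipeline: latent risk states inferred from observable per-turn behavioral features without failure labels, a state-transition $\chi^{2}$ test for temporal non-randomness, and a forward-looking failure rate per state. The paper's own implementation is not reproduced on this page; the sketch below is an illustrative reconstruction under stated assumptions. The Gaussian hidden Markov model, the three-state choice, the five-turn horizon, and all function names (`fit_latent_states`, `transition_chi2`, `future_failure_rate_by_state`) are assumptions for exposition, not the authors' code.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM          # assumed modeling choice, not stated in the abstract
from scipy.stats import chi2_contingency

N_STATES = 3   # assumed number of latent risk states
HORIZON = 5    # assumed look-ahead window (turns) for "future failure"


def fit_latent_states(conversations):
    """conversations: list of (features, failures) pairs, one per conversation,
    where features is a (T, D) array of observable per-turn signals and
    failures is a length-T boolean array. Failure labels are NOT used for
    fitting; they are consulted only later, for evaluation."""
    X = np.vstack([f for f, _ in conversations])
    lengths = [f.shape[0] for f, _ in conversations]
    hmm = GaussianHMM(n_components=N_STATES, covariance_type="full",
                      n_iter=200, random_state=0)
    hmm.fit(X, lengths=lengths)
    # Decode one latent-state sequence per conversation.
    state_seqs, start = [], 0
    for T in lengths:
        state_seqs.append(hmm.predict(X[start:start + T]))
        start += T
    return hmm, state_seqs


def transition_chi2(state_seqs):
    """Chi-square test of independence on turn-to-turn transition counts:
    a large statistic indicates temporally non-random (persistent) states."""
    counts = np.zeros((N_STATES, N_STATES))
    for s in state_seqs:
        for a, b in zip(s[:-1], s[1:]):
            counts[a, b] += 1
    chi2, p, _, _ = chi2_contingency(counts)
    return chi2, p


def future_failure_rate_by_state(state_seqs, conversations):
    """Estimate P(failure within the next HORIZON turns | current state)."""
    hits = np.zeros(N_STATES)
    totals = np.zeros(N_STATES)
    for s, (_, failures) in zip(state_seqs, conversations):
        failures = np.asarray(failures, dtype=bool)
        for t, state in enumerate(s):
            totals[state] += 1
            if failures[t + 1:t + 1 + HORIZON].any():
                hits[state] += 1
    return hits / np.maximum(totals, 1)
```

Under these assumptions, ordering the states by their estimated future failure rate and comparing the highest-rate state to the lowest-rate state yields a high-versus-low risk ratio of the kind the abstract reports, while the $\chi^{2}$ statistic tests whether consecutive states are independent, i.e., whether the inferred risk regimes persist across turns.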
Versions
- 2026-03-19 (3)
- 2026-03-09 (2)
- 2026-03-02 (1)
License
Copyright (c) 2026 Venkata Siddharth Pendyala

This work is licensed under a Creative Commons Attribution 4.0 International License.