Preprint / Version 3

Temporal Instability Phases Precede and Predict Reasoning Failure in Generative Pre-Trained Transformers

Authors

  • Venkata Siddharth Pendyala, Interlake High School

DOI:

https://doi.org/10.31224/6533

Keywords:

failure forecasting, hallucination prediction, Large Language Models (LLMs), LLM Behavior

Abstract

Large language model failures are often treated as isolated turn-level events. Here we show that this view is incomplete. Analyzing multi-turn GPT conversations, we identify persistent latent risk states, inferred from a frozen prospective instability signal, that are temporally non-random, persist across multiple consecutive turns, and forecast elevated failure probability over subsequent horizons. Higher latent states are associated with systematically higher future failure risk, a monotone ordering that replicates across four disjoint held-out GPT datasets, with state-transition $\chi^{2}$ statistics ranging from 91.64 to 486.15 and high-versus-low risk ratios reaching 2.76. Within-conversation analyses further show that instability rises before failure events, arguing against a purely cross-sectional explanation. Because the latent states are inferred from observable behavioral features without access to ground-truth failure labels at inference time, the resulting signal is prospective and potentially usable during deployment. These findings suggest that reasoning failure in large language models is better understood not as isolated noise, but as entry into temporally persistent high-risk regimes. This reframes model unreliability as a dynamical systems problem and has direct implications for real-time monitoring, safety evaluation, and training-time intervention.
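The two statistics the abstract reports can be sketched in a few lines: a chi-squared test on the latent-state transition table (non-random, "sticky" dynamics) and a high-versus-low-state risk ratio for future failure. The sketch below is illustrative only, not the paper's code; the state sequence and failure labels are synthetic, and the state count, persistence probability, and failure rates are invented for the example.

```python
# Illustrative sketch (synthetic data, not the paper's pipeline):
# test whether discrete latent risk states are temporally non-random
# via a chi-squared test on transition counts, and compute a
# high-vs-low-state risk ratio for failure.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Synthetic per-turn latent states (0 = low risk ... 2 = high risk),
# made "sticky" so states persist across consecutive turns.
n_states, n_turns = 3, 2000
states = np.zeros(n_turns, dtype=int)
for t in range(1, n_turns):
    if rng.random() < 0.8:              # persist in the current state
        states[t] = states[t - 1]
    else:                               # otherwise jump to a random state
        states[t] = rng.integers(n_states)

# Transition contingency table: under the null of independence,
# the next state does not depend on the current one.
transitions = np.zeros((n_states, n_states))
for a, b in zip(states[:-1], states[1:]):
    transitions[a, b] += 1
chi2, p, _, _ = chi2_contingency(transitions)

# Synthetic failure labels whose probability rises with state index,
# mirroring a monotone ordering of risk by latent state.
fail_prob = np.array([0.05, 0.10, 0.15])    # invented rates for illustration
failures = rng.random(n_turns) < fail_prob[states]

risk_ratio = failures[states == 2].mean() / failures[states == 0].mean()
print(f"transition chi2 = {chi2:.1f} (p = {p:.2e}), "
      f"high/low risk ratio = {risk_ratio:.2f}")
```

With persistent states, the chi-squared statistic is large and the empirical risk ratio exceeds 1, which is the qualitative pattern the abstract reports on real conversations.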

Posted

2026-03-02 — Updated on 2026-03-19

Version justification

Larger dataset, more rigorous statistical analysis, human review of labels, and addition of two holdout sets.