Preprint / Version 1

No Single Basis Wins: A Cross-Family Study of Diffusion Feature Forecasting and the Limits of Training-Free Basis Selection

DOI:

https://doi.org/10.31224/7309

Keywords:

diffusion models, feature caching, training-free acceleration, dynamic mode decomposition, benchmark methodology

Abstract

Feature-caching methods accelerate diffusion and flow-matching samplers by skipping the network on most denoising steps and forecasting the cached features from recent compute-step anchors. The literature has treated the forecast basis as a ladder to climb: monomials (TaylorSeer), scaled Hermite polynomials (HiCache), rational and multistep variants (FoCa, HyCa), and global Chebyshev fits (Spectrum), with each rung validated on a single sampler family. We went to the ladder's mathematical endpoint, the exponential solution class of the local feature ODE. We fitted it with rank-truncated Dynamic Mode Decomposition (Prony's method in its modern multivariate form) and expected it to win everywhere the polynomials do. It does not. The exponential basis wins on every flow-matching 3D generator we test (+0.13/+0.24 F-score over the deployed Hermite arm at intervals 5/6 on Hunyuan3D-2.1; geometry-lossless through interval 6 on SAM3D at 1.56x speedup). On DiT-XL/2 ImageNet-256 with the literature-standard 250-step DDPM sampler, however, the ranking inverts outright. A sign-correct TaylorSeer is near-lossless (paired-noise FID drift 2.27 at 3.81x speedup), the corrected Hermite arm sweeps the interval ladder (FID drift 3.54/6.46/10.74 at intervals 4/6/8), and the exponential basis drifts 5-9x more than the corrected polynomial at every interval. No basis in our study wins both families. The obvious remedy fails too. Training-free holdout selection is exact on synthetic regime switches (120/120 windows) and harmless to helpful on the 3D generators. On DiT, however, it selects the losing exponential arm in both holdout modes, landing at an FID drift of 18.11 where the corrected polynomial sits at 3.54. This refutes a prediction we pre-registered before the deciding run.
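A minimal NumPy sketch of the two bases contrasted above, plus a one-step holdout selector. It assumes equally spaced anchors and a fixed DMD rank of 2. The names `taylor_forecast`, `dmd_forecast`, and `holdout_select` are hypothetical and do not come from the released code. The holdout scheme shown here, which forecasts the newest anchor from the older ones, is only an assumed stand-in for the paper's two holdout modes. The DMD arm uses the standard projected operator `U^T X2 V S^{-1}` from a truncated SVD of the snapshot matrix.

```python
import numpy as np


def taylor_forecast(anchors, k, order=1):
    """Monomial (TaylorSeer-style) extrapolation via Newton backward differences.

    anchors: (m, d) features cached at equally spaced compute steps, oldest first.
    k: horizon, in anchor intervals past the newest anchor.
    """
    anchors = np.asarray(anchors, dtype=float)
    if anchors.shape[0] < order + 1:
        raise ValueError("need at least order + 1 anchors")
    forecast = anchors[-1].copy()
    diffs, coef = anchors, 1.0
    for j in range(1, order + 1):
        diffs = diffs[1:] - diffs[:-1]   # j-th backward differences
        coef *= (k + j - 1) / j          # running binomial C(k + j - 1, j)
        forecast += coef * diffs[-1]
    return forecast


def dmd_forecast(anchors, k, rank=2):
    """Rank-truncated DMD: fit x_{t+1} ~ A x_t on the anchors, then apply A^k."""
    anchors = np.asarray(anchors, dtype=float)
    X1, X2 = anchors[:-1].T, anchors[1:].T          # (d, m-1) snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].T    # keep the leading modes
    A_r = U.T @ X2 @ V / s                          # reduced operator U^T A U
    z = U.T @ anchors[-1]                           # newest anchor in mode coordinates
    return U @ np.linalg.matrix_power(A_r, k) @ z


def holdout_select(anchors, forecasters):
    """Fit each arm on all but the newest anchor, forecast that anchor one
    interval ahead, and return the name of the arm with the smallest error."""
    anchors = np.asarray(anchors, dtype=float)
    errors = {name: np.linalg.norm(f(anchors[:-1], 1) - anchors[-1])
              for name, f in forecasters.items()}
    return min(errors, key=errors.get)


# Synthetic feature trajectory with two decaying exponential modes.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(8), rng.standard_normal(8)
feature = lambda t: a * 0.9**t + b * 0.5**t
anchors = np.stack([feature(t) for t in range(5)])
arms = {"taylor": taylor_forecast, "dmd": dmd_forecast}
print(holdout_select(anchors, arms))                      # dmd
print(np.allclose(dmd_forecast(anchors, 2), feature(6)))  # True
```

Backward differences are exact for polynomials up to degree `order`. Rank-`r` DMD is exact for any trajectory that is a sum of at most `r` exponential modes. On this synthetic trajectory the holdout therefore correctly picks DMD. The sketch does not reproduce the DiT failure, which the abstract attributes to real features, where in-window fit and forward extrapolation quality come apart.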
The mechanism deserves to be stated plainly: the richer exponential parameterization fits, and therefore backcasts, the in-window history better at any holdout distance while extrapolating forward worse, so in-window backcast fit carries no signal about forward extrapolation quality on real features. We close with an uncomfortable methodological finding. A one-character sign bug made our Hermite baseline anti-extrapolative, and it survived every end-to-end benchmark we and our integrations ran, because near-reuse fails safe: quality metrics cannot distinguish a forecaster from a damped cache, and hyperparameter tuning actively conceals the defect. We propose directional closed-form regression tests as a community-level remedy, and report every affected number as-released alongside corrected values. Code, ledgers, and all benchmarks: https://github.com/Archerkattri/hicache-plus-plus
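The directional closed-form regression test proposed above can be sketched as follows. The sketch assumes that a forecaster is any function `forecast(anchors, k)`. The first-order forecaster, its flipped-sign variant, the reuse cache, and `directional_regression_test` are illustrative names and are not the repository's tests. The check feeds in a known linear trajectory and requires two things: the exact closed-form continuation, and a positive step along the trajectory's direction. Plain reuse and an anti-extrapolative sign bug both fail, even though end-to-end quality metrics may not separate them.

```python
import numpy as np


def first_order_forecast(anchors, k):
    """Correct first-order extrapolation: continue along the last difference."""
    last, prev = anchors[-1], anchors[-2]
    return last + k * (last - prev)


def flipped_sign_forecast(anchors, k):
    """The same formula with a one-character sign bug: steps back toward prev."""
    last, prev = anchors[-1], anchors[-2]
    return last - k * (last - prev)


def reuse_forecast(anchors, k):
    """Plain cache reuse: returns the newest anchor unchanged."""
    return anchors[-1].copy()


def directional_regression_test(forecast, k=3, atol=1e-9):
    """Closed-form check on a known linear trajectory x(t) = x0 + t * v.

    Passes only if the forecast k intervals past the newest anchor (i) equals
    the closed form and (ii) moves beyond that anchor along +v, instead of
    staying put (reuse) or moving backward (anti-extrapolation).
    """
    x0 = np.array([1.0, -2.0, 0.5])
    v = np.array([0.3, 0.1, -0.2])
    anchors = np.stack([x0 + t * v for t in range(4)])   # t = 0, 1, 2, 3
    pred = forecast(anchors, k)
    exact = x0 + (3 + k) * v
    moves_forward = float(np.dot(pred - anchors[-1], v)) > 0.0
    return bool(moves_forward and np.allclose(pred, exact, atol=atol))


for fn in (first_order_forecast, flipped_sign_forecast, reuse_forecast):
    print(fn.__name__, directional_regression_test(fn))
# first_order_forecast True
# flipped_sign_forecast False
# reuse_forecast False
```

Requiring the exact continuation, and not only the direction, also catches damped forecasters. A forecaster that moves forward by only half the true step passes check (ii) but fails check (i).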

Posted

2026-06-12