Voice Stress Markers Are Orthogonal to Speech Disfluency Labels: A Large-Scale Analysis on SEP-28K
DOI:
https://doi.org/10.31224/6767
Keywords:
voice stress analysis, disfluency detection, stuttering, SEP-28K, speech assessment, jitter, shimmer, F0 variability, correlation analysis
Abstract
The relationship between voice stress markers and speech disfluency events has not been systematically quantified at scale, even though both are targets of clinical assessment in stuttering populations. We examine correlations between four acoustic stress features (jitter, shimmer, fundamental frequency (F0) standard deviation, and a composite stress score) and five disfluency types (prolongation, block, sound repetition, word repetition, interjection) across the 14,645 three-second clips from the SEP-28K dataset that have valid pitch estimates. Using both Pearson and point-biserial correlations with Bonferroni correction for 20 comparisons (4 stress features × 5 disfluency types), we find that no absolute correlation exceeds 0.05, so every effect size is negligible by Cohen's convention (|r| < 0.10). The strongest observed association (composite stress × prolongation, r = -0.050) explains only 0.25% of variance. Distribution comparisons between fluent and disfluent clips yield Cohen's d < 0.10 for all stress features. These findings suggest that, at least in terms of linear associations in this dataset, acoustic voice stress markers and disfluency labels carry largely non-overlapping information. While non-linear or conditional dependencies cannot be ruled out from marginal correlations alone, the negligible effect sizes indicate that multimodal speech assessment systems may benefit from treating disfluency detection and stress monitoring as separate modules rather than modeling them jointly. We release analysis code and detailed statistical outputs to support reproducibility.
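The statistics named in the abstract can be illustrated with a minimal sketch. This is not the released analysis code: the function names are hypothetical, the data are synthetic stand-ins for one stress feature and one binary disfluency label, and the p-value uses a large-sample normal approximation to the t statistic, which is an assumption made here to keep the example dependency-free. It relies on two standard facts: the point-biserial correlation equals the Pearson correlation when one variable is coded 0/1, and the variance explained is r² (so r = -0.050 gives 0.0025, the 0.25% quoted above).

```python
import math
import random
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this equals the point-biserial r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from a normal approximation to the t statistic.

    Assumption: n is large, so the t distribution is close to normal,
    and |r| < 1 so the t statistic is finite.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Synthetic illustration only: an independent stress feature and binary label.
random.seed(0)
n_clips = 5000
stress = [random.gauss(0.0, 1.0) for _ in range(n_clips)]
label = [1 if random.random() < 0.3 else 0 for _ in range(n_clips)]

r = pearson_r(stress, label)
p = approx_p_value(r, n_clips)
threshold = bonferroni_alpha(0.05, 20)  # 4 stress features x 5 disfluency types

disfluent = [s for s, l in zip(stress, label) if l == 1]
fluent = [s for s, l in zip(stress, label) if l == 0]
d = cohens_d(disfluent, fluent)

print(f"r = {r:.4f}, r^2 = {r * r:.6f}, p ~ {p:.4f}")
print(f"significant after Bonferroni: {p < threshold}")
print(f"Cohen's d = {d:.4f}")
```

With 20 comparisons the per-test threshold is 0.05 / 20 = 0.0025. At sample sizes like the 14,645 clips described above, even very small correlations can pass a significance threshold, which is why the abstract's conclusion rests on effect sizes (|r| and Cohen's d) rather than on p-values.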
License
Copyright (c) 2026 Nazar Kozak

This work is licensed under a Creative Commons Attribution 4.0 International License.