On-Device Multi-Type Disfluency Detection with Sub-Millisecond Inference on Apple Silicon
DOI: https://doi.org/10.31224/6814

Keywords: speech disfluency detection, stuttering, on-device inference, CoreML, Apple Neural Engine, voice stress analysis, SEP-28K, mobile speech processing

Abstract
Published multi-type disfluency detection systems achieve their best results with 300M+ parameter server-class backbones, leaving speech-therapy applications without a concrete reference for the detection performance and inference latency achievable on a smartphone.
We present DisfluoSDK, a multi-type disfluency classifier running entirely on-device on Apple Silicon. On SEP-28K (20,131 clips, episode-grouped 5-fold cross-validation) a 617K-parameter CNN achieves macro-F1 0.382 (1.2 MB CoreML) and an adapted ResNet-18 achieves 0.404 (11.2M parameters, 21 MB)—occupying an otherwise unpopulated region of the accuracy–efficiency Pareto frontier where on-device deployment is feasible.
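The episode-grouped protocol mentioned above keeps every clip from the same podcast episode inside a single fold, so speaker and recording conditions never leak across the train/test boundary. A minimal sketch of such a grouping, with hypothetical episode IDs and a greedy size-balancing heuristic (the paper does not specify its exact assignment procedure):

```python
from collections import defaultdict

def episode_grouped_folds(episode_ids, n_folds=5):
    """Assign whole episodes to folds so no episode's clips span a
    train/test boundary; largest episodes go to the smallest fold."""
    counts = defaultdict(int)
    for ep in episode_ids:
        counts[ep] += 1
    fold_sizes = [0] * n_folds
    fold_of = {}
    for ep, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        f = fold_sizes.index(min(fold_sizes))  # currently smallest fold
        fold_of[ep] = f
        fold_sizes[f] += n
    return [fold_of[ep] for ep in episode_ids]

# Hypothetical clip-to-episode mapping: 12 clips from 4 episodes.
eps = ["ep1"] * 5 + ["ep2"] * 3 + ["ep3"] * 2 + ["ep4"] * 2
folds = episode_grouped_folds(eps, n_folds=5)
# Every clip of a given episode lands in the same fold.
assert all(len({f for e, f in zip(eps, folds) if e == ep}) == 1
           for ep in set(eps))
```

In practice the same effect can be obtained with scikit-learn's GroupKFold, passing episode IDs as the `groups` argument.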
A four-way CoreML compute-unit sweep across four hardware generations (M1 Max, A19 Pro, A18, A15; 16,000+ timed trials) shows that the Neural Engine delivers sub-millisecond mean inference across all tested devices (CNN 0.225–0.635 ms), providing ample real-time headroom for speech processing. The sweep also surfaces a desktop/mobile CoreML scheduler divergence in GPU routing with a direct consequence for deployment practice. PyTorch-to-CoreML export fidelity is numerically verified on 500 test-fold spectrograms (cell-level agreement 99.96%/100.00%, ΔF1 ≤ 0.003).
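The export-fidelity metrics above (cell-level agreement and ΔF1) can be reproduced with a short script. This is an illustrative sketch on toy data, not the paper's evaluation code; the variable names and the 4-clip example are invented:

```python
def cell_agreement(a, b):
    """Fraction of (clip, disfluency-type) cells where the PyTorch
    and CoreML predictions agree exactly."""
    cells = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / cells

def macro_f1(y_true, y_pred, n_types):
    """Macro-averaged F1 over per-type binary predictions."""
    f1s = []
    for t in range(n_types):
        tp = sum(yt[t] and yp[t] for yt, yp in zip(y_true, y_pred))
        fp = sum((not yt[t]) and yp[t] for yt, yp in zip(y_true, y_pred))
        fn = sum(yt[t] and (not yp[t]) for yt, yp in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_types

# Toy data: 4 clips, 3 disfluency types; CoreML flips one cell.
torch_out = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
coreml_out = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(round(cell_agreement(torch_out, coreml_out), 4))  # → 0.9167
```

ΔF1 is then simply `abs(macro_f1(labels, torch_out, 3) - macro_f1(labels, coreml_out, 3))` for the shared ground-truth labels.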
As an auxiliary empirical result, voice-stress features show no practically meaningful linear association with any disfluency type across 14,645 clips (|r| < 0.05; all negligible by Cohen's conventions), supporting the architectural separation of the stress and disfluency modules.
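The negligible-association claim rests on Pearson correlations interpreted against Cohen's conventional |r| cut-offs. A self-contained sketch of that check, with invented example data (the paper's actual stress features and labels are not shown here):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences
    (assumes neither sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cohen_label(r):
    """Cohen's conventional effect-size labels for |r|."""
    a = abs(r)
    if a < 0.1:
        return "negligible"
    if a < 0.3:
        return "small"
    if a < 0.5:
        return "medium"
    return "large"

# Hypothetical per-clip stress scores vs. a binary disfluency label;
# with a binary label this reduces to a point-biserial correlation.
stress = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7]
has_block = [0, 1, 0, 0, 1, 1]
print(cohen_label(pearson_r(stress, has_block)))
```

Under the paper's criterion, every disfluency type would need to fall in the "negligible" band (|r| < 0.1, and in fact |r| < 0.05) for the modules to be kept separate.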
License
Copyright (c) 2026 Nazar Kozak

This work is licensed under a Creative Commons Attribution 4.0 International License.