Generalising AI-Music Detection: Decoder Artefacts, Self-Supervised Features, and the Limits of Hand-Crafted Cues
DOI: https://doi.org/10.31224/7433
Keywords: AI-generated music, Synthetic audio detection, Decoder artifacts, Temporal stationarity, Spectral periodicity, Audio forensics
Abstract
A follow-up to "Detecting AI-Generated Piano Music with a Spectrogram CNN: A Proof of Concept, and a Study in Shortcuts."
Our previous study showed that a spectrogram CNN can distinguish one generator's piano output from human recordings with high in-distribution confidence, but fails to generalise: it collapsed on an unseen generator and on an unseen genre, and part of its signal was traceable to data-collection confounds. This follow-up turns from that single working classifier to the broader question it raised — how can AI-music detection be made to generalise across generators? — and reports a small empirical study of the most-discussed answer: detecting the artefacts left by the neural decoders that all current generators use to render audio. We first survey the candidate approaches (supervised spectrogram classification, self-supervised foundation features, neural-decoder artefact detection, musical-structure analysis, and watermarking) with their respective strengths and limitations. We then test, on a multi-generator dataset spanning seven systems (AudioLDM, MusicGen, Mustango, Riffusion, Stable Audio, Suno, Udio) plus real music, whether the decoder artefact can be captured by a simple, training-free signal-processing feature. A naive spectral-periodicity ("comb-strength") feature failed (AUC 0.37): it measures musical harmonicity, which real instrumental music has in abundance. A temporal-stationarity feature — exploiting that decoder artefacts are frozen in frequency while music moves — recovered a real but modest signal (AUC 0.68). Crucially, it separated codec-based generators (AudioLDM, MusicGen) well but was blind to the two most prominent commercial systems, Suno and Udio, whose polished output suppresses the artefact. We conclude that the decoder-artefact route is the most principled path to generator-agnostic detection, but that the artefact in today's best generators is too subtle for hand-crafted features and requires a learned model (e.g. the autoencoder round-trip of Afchar et al.). 
No single approach is a silver bullet; a practical detector is an ensemble, and the field is an arms race.
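As a rough illustration of the temporal-stationarity idea described in the abstract (artefacts stay fixed in frequency while musical content moves), the following is a minimal sketch on synthetic signals. It is not the feature used in the study: the function names `stft_magnitude` and `stationarity_score`, the mean-to-standard-deviation ratio, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude spectrogram with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def stationarity_score(x, n_fft=1024, hop=256, eps=1e-8):
    """Hypothetical feature: largest per-bin ratio of temporal mean to temporal std.

    A component that keeps the same frequency and level over time gives a bin
    with a high mean and a low std, hence a large ratio.
    """
    mag = stft_magnitude(x, n_fft, hop)
    steadiness = mag.mean(axis=0) / (mag.std(axis=0) + eps)
    return float(steadiness.max())


# Synthetic check (assumed toy signals, not data from the study).
sr = 16000
t = np.arange(2 * sr) / sr
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(t.size)

# "Frozen" component: a tone that never changes frequency.
tone = np.sin(2 * np.pi * 3000.0 * t) + noise

# "Moving" component: a linear chirp sweeping from f0 to f1.
f0, f1 = 500.0, 6000.0
chirp = np.sin(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / t[-1] * t ** 2)) + noise

tone_score = stationarity_score(tone)
chirp_score = stationarity_score(chirp)
print(tone_score, chirp_score)
```

In this toy setting the fixed tone scores far higher than the sweep. A score this simple would also respond to any sustained note in real music, so it only illustrates the intuition and should not be expected to reproduce the reported AUC.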
License
Copyright (c) 2026 Daniel Bordovský

This work is licensed under a Creative Commons Attribution 4.0 International License.