Benchmarking Self-Supervised Speech Models on Multilingual Nigerian Speech
DOI: https://doi.org/10.31224/6650

Keywords: Automatic Speech Recognition (ASR), Self-Supervised Learning, Nigerian Languages, Low-Resource Languages, Whisper Model, Wav2vec 2.0, Multilingual Benchmarking

Abstract
Self-supervised speech models such as Whisper and wav2vec 2.0 have significantly advanced automatic speech recognition (ASR) performance for high-resource languages. However, their robustness and generalization to underrepresented African languages remain insufficiently studied.
In this work, we present a systematic benchmark of modern self-supervised ASR models on a multilingual Nigerian speech corpus comprising English, Hausa, Igbo, and Yoruba. Using the Nigerian Common Voice dataset (158 hours), we evaluate zero-shot performance of pretrained models and compare it with supervised adaptation using fine-tuning of multilingual speech encoders. We report Word Error Rate (WER) and Character Error Rate (CER) across languages and analyze the effect of supervised adaptation and cross-language transfer.
Our results show that zero-shot ASR performance is substantially degraded for Nigerian languages compared to widely represented benchmark languages. Supervised fine-tuning consistently improves recognition accuracy, although the magnitude of improvement varies across languages and depends on the compatibility between the pretrained checkpoint and the target language. In particular, adaptation from a Hausa-pretrained XLS-R model yields strong gains for Hausa but more limited improvements for Igbo, highlighting the importance of language-specific training data.
These findings demonstrate that multilingual pretraining alone is insufficient for reliable ASR in underrepresented African languages and that supervised adaptation remains necessary for robust deployment. The study provides reproducible benchmarks for multilingual ASR evaluation in African contexts and offers practical guidance for adapting large-scale speech models to underrepresented languages.
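The WER and CER figures reported above are standard edit-distance metrics: insertions, deletions, and substitutions counted against a reference transcript, normalized by its length. A minimal self-contained sketch, assuming whitespace word tokenization (function names are illustrative; published evaluations typically rely on a library such as jiwer):

```python
def _edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences via dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    return _edit_distance(ref, hypothesis.split()) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return _edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

CER is often the more informative of the two for tonal, diacritic-rich orthographies such as Yoruba and Igbo, where a single misplaced diacritic turns an otherwise correct word into a full word error.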
License
Copyright (c) 2026 Omotayo Omoyemi, Ifeoluwa Oladeni

This work is licensed under a Creative Commons Attribution 4.0 International License.