Benchmarking Self-Supervised Speech Models on Multilingual Nigerian Speech
DOI: https://doi.org/10.31224/6650

Keywords: Automatic Speech Recognition (ASR), Self-Supervised Learning, Nigerian Languages, Low-Resource Languages, Whisper Model, Wav2vec 2.0, Multilingual Benchmarking

Abstract
Self-supervised speech models such as Whisper and wav2vec 2.0 have significantly advanced automatic speech recognition (ASR) performance for high-resource languages. However, their robustness and generalization to underrepresented African languages remain insufficiently studied.
In this work, we present a systematic benchmark of modern self-supervised ASR models on a multilingual Nigerian speech corpus comprising English, Hausa, Igbo, and Yoruba. Using the Nigerian Common Voice dataset (158 hours), we evaluate zero-shot performance of pretrained models and compare it with supervised adaptation using fine-tuning of multilingual speech encoders. We report Word Error Rate (WER) and Character Error Rate (CER) across languages and analyze the effect of supervised adaptation and cross-language transfer.
Our results show that zero-shot ASR performance is substantially degraded for Nigerian languages compared to widely represented benchmark languages. Supervised fine-tuning consistently improves recognition accuracy, although the magnitude of improvement varies across languages and depends on the compatibility between the pretrained checkpoint and the target language. In particular, adaptation from a Hausa-pretrained XLS-R model yields strong gains for Hausa but more limited improvements for Igbo, highlighting the importance of language-specific training data.
These findings demonstrate that multilingual pretraining alone is insufficient for reliable ASR in underrepresented African languages and that supervised adaptation remains necessary for robust deployment. The study provides reproducible benchmarks for multilingual ASR evaluation in African contexts and offers practical guidance for adapting large-scale speech models to underrepresented languages.
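The WER and CER figures reported above are standard edit-distance metrics: insertions, deletions, and substitutions counted against a reference transcript, normalized by its length. A minimal self-contained sketch, assuming whitespace word tokenization (function names are illustrative; published evaluations typically rely on a library such as jiwer):

```python
def _edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences via dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    return _edit_distance(ref, hypothesis.split()) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return _edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

CER is often the more informative of the two for tonal, diacritic-rich orthographies such as Yoruba and Igbo, where a single misplaced diacritic turns an otherwise correct word into a full word error.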
License
Copyright (c) 2026 Omotayo Omoyemi, Ifeoluwa Oladeni

This work is licensed under a Creative Commons Attribution 4.0 International License.