Enhancing Multi-codebook Vector Quantization for Knowledge Distillation via Multi-layer Supervision and Label Smoothing
DOI: https://doi.org/10.31224/6551

Keywords: knowledge distillation, vector quantization, label smoothing, multi-layer distillation

Abstract
This paper addresses two limitations of Multi-codebook Vector Quantization (MVQ) for knowledge distillation: single-layer supervision and overconfident one-hot targets. To overcome them, we enhance MVQ by integrating multi-layer supervision and label smoothing. The approach involves two key steps: during the knowledge extraction phase, knowledge is drawn from multiple teacher layers instead of a single one; during the knowledge transfer phase, label smoothing is applied to the one-hot codebook-index targets. Cross-modal experiments on image (CIFAR-100) and speech (AISHELL-1) tasks show that the two techniques improve student performance in a complementary manner: multi-layer supervision provides a direct and robust gain, whereas the benefit of label smoothing depends on careful tuning of its noise parameter. Our work provides a straightforward enhancement for MVQ-based knowledge distillation and suggests that future work could explore dynamic noise scheduling for further gains.
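To make the two steps concrete, below is a minimal PyTorch sketch of the knowledge-transfer loss. All names, tensor shapes, the uniform-smoothing distribution, and the simple averaging over teacher layers are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def mvq_distill_loss(student_logits, codebook_targets, smoothing=0.1):
    """Label-smoothed cross-entropy over MVQ codebook indices.

    Assumed (hypothetical) shapes:
      student_logits:   (batch, num_codebooks, codebook_size) -- the student's
                        predicted distribution over each codebook's entries.
      codebook_targets: (batch, num_codebooks) -- integer codebook indices
                        produced by quantizing a teacher layer's features.
    """
    b, n, k = student_logits.shape
    # Treat each (sample, codebook) pair as one classification problem.
    logits = student_logits.reshape(b * n, k)
    targets = codebook_targets.reshape(b * n)
    # With label_smoothing, the one-hot target mass is replaced by
    # (1 - smoothing) on the true index plus smoothing / k spread
    # uniformly over all codebook entries.
    return F.cross_entropy(logits, targets, label_smoothing=smoothing)

def multilayer_mvq_loss(student_heads, per_layer_targets, smoothing=0.1):
    """Multi-layer supervision: one prediction head per supervised
    teacher layer; per-layer losses are averaged (one simple choice)."""
    losses = [mvq_distill_loss(h, t, smoothing)
              for h, t in zip(student_heads, per_layer_targets)]
    return torch.stack(losses).mean()
```

Averaging the per-layer losses is only one plausible aggregation; the abstract does not specify how the layers are weighted, and the smoothing value would correspond to the noise parameter the paper reports as requiring careful tuning.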
License
Copyright (c) 2026 Tongtong Zhao, Liangxun Shuo

This work is licensed under a Creative Commons Attribution 4.0 International License.