Enhancing Multi-codebook Vector Quantization for Knowledge Distillation via Multi-layer Supervision and Label Smoothing
DOI: https://doi.org/10.31224/6551

Keywords: knowledge distillation, vector quantization, label smoothing, multi-layer distillation

Abstract
This paper addresses two limitations of Multi-codebook Vector Quantization (MVQ) for knowledge distillation: single-layer supervision and overconfident one-hot targets. To overcome them, we enhance MVQ by integrating multi-layer supervision and label smoothing. The approach involves two key steps: during the knowledge extraction phase, knowledge is drawn from multiple teacher layers instead of a single one; during the knowledge transfer phase, label smoothing is applied to the one-hot codebook-index targets. Cross-modal experiments on image (CIFAR-100) and speech (AISHELL-1) tasks show that the two techniques improve student performance in a complementary manner: multi-layer supervision provides a direct and robust gain, whereas the benefit of label smoothing depends on careful tuning of its noise parameter. Our work provides a straightforward enhancement for MVQ-based knowledge distillation and suggests that future work could explore dynamic noise scheduling for further gains.
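To make the two steps concrete, below is a minimal PyTorch sketch of the knowledge-transfer loss. All names, tensor shapes, the uniform-smoothing distribution, and the simple averaging over teacher layers are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def mvq_distill_loss(student_logits, codebook_targets, smoothing=0.1):
    """Label-smoothed cross-entropy over MVQ codebook indices.

    Assumed (hypothetical) shapes:
      student_logits:   (batch, num_codebooks, codebook_size) -- the student's
                        predicted distribution over each codebook's entries.
      codebook_targets: (batch, num_codebooks) -- integer codebook indices
                        produced by quantizing a teacher layer's features.
    """
    b, n, k = student_logits.shape
    # Treat each (sample, codebook) pair as one classification problem.
    logits = student_logits.reshape(b * n, k)
    targets = codebook_targets.reshape(b * n)
    # With label_smoothing, the one-hot target mass is replaced by
    # (1 - smoothing) on the true index plus smoothing / k spread
    # uniformly over all codebook entries.
    return F.cross_entropy(logits, targets, label_smoothing=smoothing)

def multilayer_mvq_loss(student_heads, per_layer_targets, smoothing=0.1):
    """Multi-layer supervision: one prediction head per supervised
    teacher layer; per-layer losses are averaged (one simple choice)."""
    losses = [mvq_distill_loss(h, t, smoothing)
              for h, t in zip(student_heads, per_layer_targets)]
    return torch.stack(losses).mean()
```

Averaging the per-layer losses is only one plausible aggregation; the abstract does not specify how the layers are weighted, and the smoothing value would correspond to the noise parameter the paper reports as requiring careful tuning.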
License
Copyright (c) 2026 Tongtong Zhao, Liangxun Shuo

This work is licensed under a Creative Commons Attribution 4.0 International License.