Preprint / Version 1

Towards Robust and Scalable Mixture of Experts Architectures for Large Language and Vision Models

Authors

  • Aamina Yousra, King Abdullah University of Science and Technology
  • Jumanah Fawziya, King Abdullah University of Science and Technology
  • Fawzi Gamal, King Abdullah University of Science and Technology, https://orcid.org/0009-0007-0487-9330

DOI:

https://doi.org/10.31224/4764

Keywords:

Mixture of Experts

Abstract

The advent of foundation-scale deep learning models, characterized by unprecedented model sizes and multi-modal capabilities, has revitalized interest in Mixture of Experts (MoE) architectures because of their potential for efficient conditional computation and scalability. However, robustness challenges, including routing instability, expert overload, and vulnerability to distributional shifts and adversarial attacks, pose significant barriers to the reliable deployment of MoE-based large language and vision models. This survey presents a comprehensive and mathematically rigorous overview of robust MoE methods in the era of foundation models. We systematically examine foundational theory, algorithmic advances in capacity-aware routing and auxiliary regularization, and state-of-the-art training strategies designed to improve robustness and scalability. Empirical evaluations across diverse language, vision, and multi-modal benchmarks highlight the strengths and limitations of current approaches. We further identify critical open problems spanning theoretical guarantees, differentiable routing optimization, multi-modal consistency, and efficient training under resource constraints. By synthesizing recent developments and articulating future directions, this survey aims to provide a unified framework for advancing robust MoE research and to facilitate the broader adoption of robust MoE models in next-generation AI systems.
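The abstract names capacity-aware routing and auxiliary regularization without spelling them out. As a hedged illustration only, the sketch below follows the widely used Switch Transformer recipe (top-1 gating, a per-expert capacity derived from a capacity factor, and a load-balancing auxiliary loss). It is not presented as the survey's own method; the function name `route_top1_with_capacity`, the first-come-first-served overflow policy, and the default `capacity_factor=1.25` are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1_with_capacity(
    gate_logits: List[List[float]],
    capacity_factor: float = 1.25,  # assumed default, for illustration only
) -> Tuple[List[Optional[int]], float]:
    """Hypothetical helper: top-1 routing with a per-expert capacity limit.

    Returns the expert index per token (None if the token overflowed its
    expert's capacity and was dropped) and a Switch-style load-balancing loss.
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    # Each expert may process at most this many tokens in the batch.
    capacity = math.ceil(capacity_factor * num_tokens / num_experts)

    probs = [softmax(row) for row in gate_logits]
    # Top-1 choice: the expert with the highest router probability.
    choices = [max(range(num_experts), key=p.__getitem__) for p in probs]

    # Capacity-aware assignment: tokens are admitted in order; overflow is dropped.
    load = [0] * num_experts
    assignment: List[Optional[int]] = []
    for e in choices:
        if load[e] < capacity:
            load[e] += 1
            assignment.append(e)
        else:
            assignment.append(None)

    # Auxiliary loss: num_experts * sum_i f_i * P_i, where f_i is the fraction of
    # tokens whose top-1 choice is expert i and P_i is its mean router probability.
    f = [choices.count(e) / num_tokens for e in range(num_experts)]
    P = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    aux_loss = num_experts * sum(fi * pi for fi, pi in zip(f, P))
    return assignment, aux_loss


# Usage: four tokens that all prefer expert 0 overload it, so two are dropped.
skewed, skewed_loss = route_top1_with_capacity([[2.0, 0.0]] * 4, capacity_factor=1.0)
# A balanced batch fits within capacity and attains the minimum loss of 1.0.
balanced, balanced_loss = route_top1_with_capacity(
    [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0]], capacity_factor=1.0
)
print(skewed, round(skewed_loss, 4))
print(balanced, round(balanced_loss, 4))
```

The auxiliary loss equals 1.0 when routing is perfectly uniform and grows as tokens concentrate on fewer experts, which is why it is added to the training objective to discourage the expert overload mentioned in the abstract. Dropping overflow tokens is only one possible policy; rerouting them to a second-choice expert is a common alternative.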


Posted

2025-07-02