Preprint / Version 1

What is attention mechanism? A comprehensive survey of attention methods and transformer models

Authors

  • Farhad Mortezapour Shiri, Faculty of Computer Science and Information Technology, University Putra Malaysia (UPM), Serdang, Malaysia
  • Fateme Memar, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, Kansas, USA
  • Maryam Parhizgar, Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Qazvin, Iran

DOI:

https://doi.org/10.31224/7307

Keywords:

Attention Mechanism, Transformer, Deep Learning, Computer Vision, Natural Language Processing (NLP)

Abstract

The attention mechanism is a fundamental component widely used in deep learning models across numerous domains and tasks. It enables models to selectively focus on the most relevant parts of the input, rather than processing all elements equally, by assigning weights according to their importance for the task. This paper presents a comprehensive overview of the attention mechanism, outlining its general framework and offering a taxonomy of attention models, including Hierarchical, Bidirectional, Multi-Head, Multi-Query, Group-Query, Graph, Channel, Spatial, Channel-Spatial, Temporal, Spatial-Temporal, Cross, Axial, and Flash Attention. The computational trade-offs of these models are also analyzed, and new developments in efficient attention are highlighted. We also discuss the Transformer architecture, one of the most influential deep learning frameworks that makes effective use of the attention mechanism, and review its major variants, including BERT, GPT, Transformer XL, XLNet, BART, Fast Transformer, T5, Longformer, BIGBIRD, Performer, Linformer, Reformer, RoFormer, ALiBi, Switch Transformer, LLaMA, ViT, DETR, DeepViT, DeiT, T2T ViT, CrossViT, PVT, Swin Transformer, TNT, MViT, ViViT, DAT, and Spiking Transformer. Additionally, we provide a comparative overview of these models, highlighting their key ideas, results, advantages, and limitations. Through this survey, we emphasize the pivotal role of attention-based models, especially those built on the Transformer architecture, in shaping diverse application domains such as natural language processing, computer vision, recommender systems, and sensor data analysis.
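
The idea of assigning weights by importance can be illustrated with a minimal sketch of scaled dot-product attention for a single query. This is the widely used formulation from the Transformer literature and is assumed here for illustration; it is not taken from the survey's own text, and the function names and toy inputs are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns (output, weights); weights has one entry per key and sums to 1.
    """
    scale = math.sqrt(len(query))
    # Score each key by its scaled dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Toy input (made up for illustration): the query aligns with the second key.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
output, weights = attention([0.0, 4.0], keys, values)
print(weights)
print(output)
```

In this toy case the second key receives the largest weight because it is the only one aligned with the query, so the output is dominated by the second value vector. Many of the variants listed in the abstract can be read as changes to how the scores are computed, shared, or restricted.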

Posted

2026-06-11