What is attention mechanism? A comprehensive survey of attention methods and transformer models
DOI:
https://doi.org/10.31224/7307
Keywords:
Attention Mechanism, Transformer, Deep Learning, Computer Vision, Natural Language Processing (NLP)
Abstract
The attention mechanism is a fundamental component widely used in deep learning models across numerous domains and tasks. It enables models to selectively focus on the most relevant parts of the input, rather than processing all elements equally, by assigning weights according to their importance for the task. This paper presents a comprehensive overview of the attention mechanism, outlining its general framework and offering a taxonomy of attention models, including Hierarchical, Bidirectional, Multi-Head, Multi-query, Group-query, Graph, Channel, Spatial, Channel-Spatial, Temporal, Spatial-Temporal, Cross, Axial and Flash Attention. The computational trade-offs of these models are also analyzed, and new developments in efficient attention are highlighted. We also discuss the Transformer architecture, one of the most influential deep learning frameworks that makes effective use of the attention mechanism, and review its major variants, including BERT, GPT, Transformer XL, XLNet, BART, Fast Transformer, T5, Longformer, BIGBIRD, Performer, Linformer, Reformer, RoFormer, ALiBi, Switch Transformer, LLaMA, ViT, DETR, DeepViT, DeiT, T2T ViT, CrossViT, PVT, Swin Transformer, TNT, MViT, ViViT, DAT, and Spiking Transformer. Additionally, we provide a comparative overview of these models, highlighting their key ideas, results, advantages, and limitations. Through this survey, we emphasize the pivotal role of attention-based models, especially those built on the Transformer architecture, in shaping diverse application domains such as natural language processing, computer vision, recommender systems, and sensor data analysis.
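As a rough illustration of "assigning weights according to their importance", the sketch below implements scaled dot-product attention, the form commonly associated with the Transformer. The abstract does not specify a formulation, so the choice of this variant, the function names, and the toy shapes are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v)
    # Illustrative single-head version without masking or batching.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # importance weights, one row per query
    return weights @ v, weights          # weighted sum of values, plus the weights

# Toy inputs (hypothetical sizes): 2 queries attending over 3 input elements.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 5))
output, weights = scaled_dot_product_attention(q, k, v)
```

Each row of `weights` is non-negative and sums to 1, so every output row is a weighted average of the value vectors, with larger weights on the input elements that score as more relevant to that query.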
License
Copyright (c) 2026 Farhad Mortezapour Shiri, Fateme Memar, Maryam Parhizgar

This work is licensed under a Creative Commons Attribution 4.0 International License.