FaseehGPT: A Lightweight Transformer Model for Arabic Text Generation with Enhanced Morphological Understanding
DOI:
https://doi.org/10.31224/5287
Keywords:
Arabic Natural Language Processing, Transformer Architecture, Text Generation, Low-Resource NLP, Morphological Analysis, Dialectal Arabic, Modern Standard Arabic
Abstract
We present FaseehGPT, a specialized transformer-based language model designed for high-quality Arabic text generation in resource-constrained environments. Unlike existing Arabic language models that primarily focus on understanding tasks, FaseehGPT is optimized for generative applications while maintaining computational efficiency suitable for deployment on consumer-grade hardware. The model employs a decoder-only transformer architecture with 70.7 million parameters, trained on a carefully curated corpus of 8.7 million Arabic texts spanning colloquial tweets, formal news articles, and classical literature. Our approach leverages the morphological richness of Arabic through strategic tokenization using a pre-trained Arabic BERT tokenizer, enabling effective handling of the language’s complex derivational and inflectional patterns. Extensive evaluation demonstrates FaseehGPT’s capability to generate coherent, contextually appropriate text across multiple Arabic varieties and registers. The model achieves competitive performance while requiring significantly fewer computational resources than comparable systems, with training completed on a single NVIDIA T4 GPU. We provide comprehensive technical details, reproducible training procedures, and make the complete model and codebase publicly available to advance Arabic NLP research. Evaluation metrics show consistent improvement across training epochs, with final perplexity scores indicating strong language modeling performance comparable to larger models in the Arabic domain.
https://huggingface.co/alphatechlogics/FaseehGPT
License
Copyright (c) 2025 Ahsan Umar

This work is licensed under a Creative Commons Attribution 4.0 International License.