Efficient Model Pruning for Large-Scale Deep Learning Models: Enhancing Performance and Reducing Computational Overhead
DOI:
https://doi.org/10.31224/5216

Keywords:
Model Pruning, Large Language Models (LLMs), Deep Learning, Deep Learning Efficiency, Sparse Neural Networks, Computational Overhead Reduction, AI Model Compression, Inference Optimization

Abstract
Deep learning models, particularly large-scale language and vision architectures, are computationally intensive due to their vast parameter counts and complex neural network designs. This paper presents an improved model pruning method that reduces computational burden while maintaining performance comparable to unpruned models. By analyzing weights, biases, activations, and other key indicators, we propose a novel algorithm that identifies and removes neurons or connections contributing minimally to the model’s output quality. Our approach achieves higher pruning efficiency than existing methods across a range of pruning ratios, yielding smaller, faster, and more cost-effective models. Experimental results demonstrate that our method significantly outperforms state-of-the-art (SOTA) pruning techniques in both inference speed and memory usage, with negligible degradation in accuracy. This work contributes to the development of resource-efficient models suitable for deployment in environments with limited computational resources, paving the way for more scalable and sustainable deep-learning applications.
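The magnitude-style criterion the abstract alludes to (removing connections with minimal contribution to output quality) can be sketched as follows. This is an illustrative, minimal example of unstructured magnitude pruning, not the paper's actual algorithm; the function name, pure-Python representation, and tie-handling are assumptions made for clarity.

```python
def prune_by_magnitude(weights, pruning_ratio):
    """Zero out the fraction `pruning_ratio` of weights with the smallest
    absolute magnitude (unstructured pruning sketch, not the paper's method).

    weights: list of lists of floats (one inner list per neuron/row).
    Returns (pruned_weights, threshold). Ties at the threshold are all pruned.
    """
    # Collect all magnitudes and find the cutoff for the requested sparsity.
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * pruning_ratio)
    threshold = flat[k - 1] if k > 0 else float("-inf")
    # Keep weights strictly above the threshold; zero the rest.
    pruned = [[0.0 if abs(w) <= threshold else w for w in row]
              for row in weights]
    return pruned, threshold


# Usage: prune half of a tiny 2x2 weight matrix.
weights = [[0.5, -0.05], [0.9, 0.01]]
pruned, threshold = prune_by_magnitude(weights, 0.5)
print(pruned)      # the two smallest-magnitude weights are zeroed
print(threshold)
```

In practice, pruning criteria may also weigh activations or gradient information rather than magnitude alone, and frameworks such as PyTorch provide built-in pruning utilities for production use.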
License
Copyright (c) 2025 Dinesh Kumar Koilada

This work is licensed under a Creative Commons Attribution 4.0 International License.