Preprint / Version 1

Efficient Model Pruning for Large-Scale Deep Learning Models: Enhancing Performance and Reducing Computational Overhead

Authors

  • Dinesh Kumar Koilada, JNTU Hyderabad

DOI:

https://doi.org/10.31224/5216

Keywords:

Model Pruning, Large Language Models (LLMs), Deep Learning, Deep Learning Efficiency, Sparse Neural Networks, Computational Overhead Reduction, AI Model Compression, Inference Optimization

Abstract

Deep learning models, particularly large-scale language and vision architectures, are computationally intensive due to their vast parameter counts and complex neural network designs. This paper presents an improved method for model pruning aimed at reducing the computational burden while maintaining performance comparable to unpruned models. By analyzing weights, biases, activations, and other key indicators, we propose a novel algorithm that identifies and removes neurons or connections with minimal contribution to the model's output quality. Our approach achieves higher pruning efficiency across a range of pruning ratios, resulting in smaller, faster, and more cost-effective models. Experimental results demonstrate that our method significantly outperforms state-of-the-art (SOTA) pruning techniques in both inference speed and memory usage, with negligible degradation in accuracy. This work contributes to the development of resource-efficient models suitable for deployment in environments with limited computational resources, paving the way for more scalable and sustainable deep learning applications.
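The abstract does not specify the paper's pruning criterion, so as a generic illustration only, the sketch below shows the most common baseline the paper compares against: magnitude-based pruning, which zeroes out the fraction of weights with the smallest absolute values. The function name `magnitude_prune` and the use of NumPy are assumptions for demonstration, not the authors' method.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, ratio: float):
    """Zero out the `ratio` fraction of weights with smallest magnitude.

    Returns the pruned weight array and the boolean keep-mask.
    (Illustrative baseline only; not the paper's proposed algorithm.)
    """
    # Threshold below which weights are considered negligible.
    threshold = np.quantile(np.abs(weights), ratio)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Example: prune half the weights of a random 256x256 layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
pruned, mask = magnitude_prune(w, 0.5)
sparsity = 1.0 - mask.mean()  # close to the requested 0.5 ratio
```

In practice such a mask is applied per layer (or globally across layers), after which the model is typically fine-tuned to recover any lost accuracy; the paper's contribution lies in choosing a better importance score than raw magnitude.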


Posted

2025-09-03