Interpreting BERT Using LIME and SHAP
DOI: https://doi.org/10.31224/5078

Keywords:
Artificial Intelligence, BERT, Explainable AI, LIME (Local Interpretable Model-Agnostic Explanations), SHAP (Shapley Additive Explanations), Interpretability, Classification

Abstract
Transformer-based language models such as BERT have achieved state-of-the-art performance on diverse natural language processing tasks, yet their decision processes remain opaque. This paper presents a comprehensive framework for interpreting BERT’s predictions in multi-label text classification using two leading model-agnostic explainability techniques—Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). An end-to-end pipeline for fine-tuning BERT and producing token-level attributions is introduced. We systematically compare the explainers with respect to local fidelity, global consistency, stability and computational cost. Experimental results suggest that LIME generates intuitive, case-specific explanations while SHAP provides theoretically grounded and globally consistent attributions. By integrating the complementary strengths of both methods, we propose a hybrid interpretation strategy that balances interpretability, scalability and accuracy. The methodology is illustrated through a case study on multi-label genre classification from movie plot summaries. Detailed guidelines and synthetic visualisations are provided to enable practitioners to apply these techniques effectively and responsibly.
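The two attribution methods named in the abstract can be illustrated in miniature. The sketch below is a hypothetical, self-contained toy: a keyword-based scoring function stands in for a fine-tuned BERT classifier (any probability function over token subsets plugs in the same way), a LIME-style weighted linear surrogate is fit over random token masks, and exact Shapley values are enumerated, which is feasible only because the example has four tokens. All names and the kernel width are illustrative assumptions, not the paper's implementation.

```python
import itertools
import math
import numpy as np

# Toy stand-in for a fine-tuned genre classifier (hypothetical):
# returns P(horror) from two keyword cues via a sigmoid.
def predict_proba(tokens):
    score = 2.0 * ("vampire" in tokens) + 0.5 * ("castle" in tokens)
    return 1.0 / (1.0 + np.exp(-(score - 1.0)))

tokens = ["a", "vampire", "haunts", "castle"]

# --- LIME-style local surrogate ---------------------------------------
# Sample binary masks over tokens, query the model on each perturbed
# input, and fit a distance-weighted linear model; the coefficients are
# the token attributions for this one instance.
rng = np.random.default_rng(0)
n_samples = 500
masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
masks[0] = 1  # keep the unperturbed instance in the sample
y = np.array([predict_proba([t for t, m in zip(tokens, row) if m])
              for row in masks])
# Exponential kernel: perturbations closer to the full instance weigh more.
sim = masks.sum(axis=1) / len(tokens)
w = np.exp(-((1.0 - sim) ** 2) / 0.25)
X = np.hstack([masks, np.ones((n_samples, 1))])  # intercept column
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
lime_attr = dict(zip(tokens, coef[:-1]))

# --- Exact Shapley values ---------------------------------------------
# Average each token's marginal contribution over all coalitions of the
# remaining tokens (2^(n-1) subsets per token -- toy-sized inputs only).
def shapley_values(tokens, f):
    n = len(tokens)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                weight = (math.factorial(len(S))
                          * math.factorial(n - len(S) - 1)
                          / math.factorial(n))
                with_i = f([tokens[j] for j in sorted(S + (i,))])
                without = f([tokens[j] for j in S])
                phi[i] += weight * (with_i - without)
    return phi

shap_attr = shapley_values(tokens, predict_proba)
for t, l, s in zip(tokens, coef[:-1], shap_attr):
    print(f"{t:>8}  LIME={l:+.3f}  Shapley={s:+.3f}")
```

Both methods agree that "vampire" dominates the prediction, and the Shapley values additionally satisfy the efficiency property: they sum exactly to the gap between the model's output on the full instance and on the empty input, which is the theoretical-consistency advantage the abstract attributes to SHAP.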
License
Copyright (c) 2025 Manish Shukla

This work is licensed under a Creative Commons Attribution 4.0 International License.