Explainable Multimodal Deep Learning Framework for Dental Disease Diagnosis
DOI: https://doi.org/10.31224/6984

Keywords: Deep Learning, Multi-modal Learning, Explainable AI, ResNet, BERT, Attention Mechanism, Dental Disease Diagnosis

Abstract
Early and accurate diagnosis of dental diseases is essential for preventing disease progression and improving patient outcomes. This paper proposes an explainable multimodal deep learning framework that integrates intraoral RGB images and patient-reported symptom descriptions for automated dental disease diagnosis. The framework combines a convolutional neural network (ResNet) for visual feature extraction and a transformer-based model (BERT) for contextual understanding of symptoms. A cross-modal attention-based fusion mechanism is employed to effectively integrate image and text representations, enabling more robust and reliable predictions.
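The cross-modal fusion step described above can be illustrated with a minimal NumPy sketch: image-region features act as queries that attend over text-token features, and the attended text representation is concatenated with the visual features. This is an assumption-laden toy of the general technique (single-head scaled dot-product attention with random toy inputs), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(img_feats, txt_feats):
    """Image features (queries) attend over text token features (keys/values).

    img_feats: (n_img, d) visual region features (e.g. from a CNN backbone)
    txt_feats: (n_txt, d) token features (e.g. from a text encoder)
    """
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)   # (n_img, n_txt) similarity
    weights = softmax(scores, axis=-1)              # attention over text tokens
    attended = weights @ txt_feats                  # (n_img, d) text summary per region
    fused = np.concatenate([img_feats, attended], axis=-1)  # (n_img, 2d)
    return fused, weights

# toy example: 4 image-region features and 6 text-token features, dim 8
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(6, 8))
fused, w = cross_modal_attention(img, txt)
print(fused.shape)  # (4, 16)
```

The attention weights `w` are also what a symptom-level attribution would inspect: each row shows which text tokens most influenced a given visual region.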
To enhance clinical interpretability, the system incorporates Grad-CAM for visual explanations and attention-based textual attribution for symptom-level reasoning. Experimental results demonstrate that the proposed multimodal model achieves an accuracy of 97%, outperforming both image-only and text-only approaches. Overall, the proposed framework provides a scalable, low-cost, and explainable solution for clinical decision support and early dental disease screening.
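Grad-CAM, the visual-explanation method named above, weights a convolutional layer's activation maps by the gradient of the class score with respect to those maps, then applies a ReLU. A minimal NumPy sketch of that computation (with synthetic activations and gradients standing in for a real backward pass) looks like:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's output and the gradients of the
    target class score w.r.t. that output.

    activations, gradients: arrays of shape (C, H, W)
    returns: (H, W) heatmap normalized to [0, 1]
    """
    weights = gradients.mean(axis=(1, 2))              # (C,) per-channel importance
    cam = np.tensordot(weights, activations, axes=1)   # (H, W) weighted sum of maps
    cam = np.maximum(cam, 0)                           # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # scale to [0, 1] for overlay
    return cam

# toy example: a 16-channel 7x7 feature map with matching synthetic gradients
rng = np.random.default_rng(1)
acts = rng.normal(size=(16, 7, 7))
grads = rng.normal(size=(16, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the heatmap is upsampled to the input image size and overlaid on the intraoral photograph, highlighting the regions that drove the diagnosis.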
License
Copyright (c) 2026 Jayani Malsha Katugampala Kankanamalage, Nishantha J. Chandrasena

This work is licensed under a Creative Commons Attribution 4.0 International License.