Enhancing Query Expansion for Rare Diseases in PubMed Using Embedding-Based Semantic Representations
DOI:
https://doi.org/10.31224/4535
Keywords:
Query Expansion, latent semantic analysis, AI adaptive learning, Contrastive Learning
Abstract
Searching for rare diseases in scholarly databases such as PubMed remains challenging because of terminology variability and low-frequency terms. Traditional keyword-based methods (e.g., TF-IDF, BM25) often fail to capture semantic relationships, leading to suboptimal recall. This paper proposes an embedding-based query expansion framework that leverages pre-trained biomedical language models (e.g., BioBERT, SciBERT) to improve retrieval of rare disease literature. We demonstrate that contextual embeddings can effectively expand queries for rare diseases (e.g., Gaucher disease) with synonymous or related terms, outperforming baseline PubMed searches in precision@k and recall. Our approach bridges the gap between sparse lexical matching and semantic understanding in biomedical information retrieval.
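
The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the three-dimensional vectors below are hand-made stand-ins for embeddings that a model such as BioBERT or SciBERT would produce, and the names `TOY_EMBEDDINGS`, `expand_query`, `to_boolean_query`, `top_k`, and `min_similarity` are hypothetical. The sketch ranks candidate terms by cosine similarity to the query, keeps the closest ones above a threshold, and joins them into an OR query.

```python
import math

# Hypothetical toy vectors; real embeddings would come from a biomedical
# language model and have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "gaucher disease": [0.90, 0.10, 0.05],
    "glucocerebrosidase deficiency": [0.85, 0.15, 0.10],
    "lysosomal storage disorder": [0.70, 0.30, 0.10],
    "influenza": [0.05, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query, embeddings, top_k=2, min_similarity=0.8):
    """Return up to top_k vocabulary terms most similar to the query term."""
    query_vec = embeddings[query]
    scored = [
        (term, cosine(query_vec, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    # Highest similarity first, then drop anything below the threshold.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, score in scored[:top_k] if score >= min_similarity]


def to_boolean_query(query, expansions):
    """Join the original term and its expansions into a quoted OR query."""
    return " OR ".join(f'"{term}"' for term in [query] + expansions)


if __name__ == "__main__":
    expansions = expand_query("gaucher disease", TOY_EMBEDDINGS)
    print(to_boolean_query("gaucher disease", expansions))
```

With these toy vectors the unrelated term falls below the threshold and is excluded from the expanded query. The threshold and `top_k` values are illustrative assumptions; in practice they would be tuned against retrieval metrics such as precision@k and recall.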
License
Copyright (c) 2025 Sam Kerr Kelly

This work is licensed under a Creative Commons Attribution 4.0 International License.