Taxonomy Classification using Machine Learning Based Model
DOI:
https://doi.org/10.31224/3967
Keywords:
Data Exploration, Data Visualization, Classifiers, LLM Survey, TF-IDF, Confusion Matrix
Abstract
Large language model (LLM) trends and taxonomies have changed rapidly in the last few years, driven primarily by advances in fields such as natural language processing (NLP) and deep learning, and by ever-growing computational resources. These models increasingly aim to go beyond pattern recognition toward stronger logical and mathematical reasoning. This work explores trends in LLM survey papers over time and analyzes their associated taxonomies through data exploration, visualization, and machine learning modeling. First, the dataset of survey papers is preprocessed by grouping the number of surveys by year and month, revealing publication trends over time. A detailed analysis of the taxonomy distribution is then performed to identify the prevalence of the various survey categories. The titles and summaries of the papers are vectorized with TF-IDF (term frequency-inverse document frequency), transforming textual information into numerical features, and one-hot encoding is applied to the survey categories to give the machine learning models a better feature representation. The results show that the Random Forest Classifier and the Support Vector Machine achieved the highest accuracies in classifying survey papers by taxonomy. Beyond highlighting trends in survey publication, this research offers an automated approach to classifying surveys that could help future researchers organize and categorize survey literature efficiently.
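The preprocessing steps summarized above (monthly grouping, TF-IDF on titles and summaries, one-hot encoding of categories) and the confusion-matrix evaluation can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the record fields (`date`, `title`, `summary`, `categories`, `taxonomy`), the toy records, and all helper names are hypothetical, the tokenizer is a simple regex, and the IDF term uses the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default.

```python
import math
import re
from collections import Counter

# Hypothetical toy records: field names and values are illustrative
# assumptions, not the schema or contents of the paper's dataset.
PAPERS = [
    {"date": "2023-03-02", "title": "Toy survey of prompting methods",
     "summary": "Reviews prompting and in-context learning.",
     "categories": ["cs.CL"], "taxonomy": "Prompting"},
    {"date": "2023-03-20", "title": "Toy survey of model compression",
     "summary": "Reviews quantization and pruning of language models.",
     "categories": ["cs.CL", "cs.LG"], "taxonomy": "Efficiency"},
    {"date": "2024-01-08", "title": "Toy survey of evaluation benchmarks",
     "summary": "Reviews benchmarks for language model evaluation.",
     "categories": ["cs.CL", "cs.AI"], "taxonomy": "Evaluation"},
]


def monthly_counts(papers):
    """Number of surveys per "YYYY-MM", sorted chronologically."""
    counts = Counter(p["date"][:7] for p in papers)
    return dict(sorted(counts.items()))


def tokenize(text):
    # Simple lowercase alphanumeric tokenizer (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """Raw-count TF times smoothed IDF ln((1 + n) / (1 + df)) + 1, rows L2-normalised."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by 0
        rows.append([v / norm for v in row])
    return vocab, rows


def one_hot(category_lists):
    """One binary column per distinct category (multi-hot if a paper has several)."""
    labels = sorted({c for cats in category_lists for c in cats})
    return labels, [[1 if c in cats else 0 for c in labels] for cats in category_lists]


def confusion_matrix(y_true, y_pred):
    """Rows are true classes, columns are predicted classes."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


# Feature matrix X (TF-IDF text features + category indicators) and targets y.
vocab, X_text = tfidf([p["title"] + " " + p["summary"] for p in PAPERS])
cat_labels, X_cat = one_hot([p["categories"] for p in PAPERS])
X = [text_row + cat_row for text_row, cat_row in zip(X_text, X_cat)]
y = [p["taxonomy"] for p in PAPERS]
print(monthly_counts(PAPERS))  # {'2023-03': 2, '2024-01': 1}
```

In a pipeline like the one described, `X` and `y` would then be split into training and test sets and passed to classifiers such as scikit-learn's `RandomForestClassifier` and `SVC`, with `confusion_matrix` applied to the held-out predictions to show which taxonomy classes are confused with one another.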
Versions
- 2024-10-22 (2)
- 2024-09-30 (1)
License
Copyright (c) 2024 Anup Majumder
This work is licensed under a Creative Commons Attribution 4.0 International License.