Exploring Large Language Model Survey Papers Using Random Forest Classification with Oversampling of Minority Classes
DOI:
https://doi.org/10.31224/4061

Abstract
The rapid advancement of Large Language Models (LLMs) has made it challenging for researchers to keep up with new models and innovations. Although many scholars have published survey papers to synthesize this work, the growing number of surveys itself adds complexity, making it harder to stay current with recent developments. In this report, I present an analysis of a dataset comprising 144 survey papers published between 2023 and January 2024. To address the dataset's class imbalance, I applied two oversampling techniques: the Synthetic Minority Over-sampling Technique (SMOTE) and Random Over-Sampling. Combined with a Random Forest classifier, these methods were used to predict the taxonomy class of new papers, improving the model's precision score from 0.35 to 0.42. Future work will focus on enhancing classification accuracy and exploring other machine learning algorithms to broaden the applicability of this approach.
License
Copyright (c) 2024 Thi Lan Anh Bui
This work is licensed under a Creative Commons Attribution 4.0 International License.