Preprint / Version 1

Exploring Large Language Model Survey Papers Using Random Forest Classification with Oversampling of the Minority Classes

Authors

  • Thi Lan Anh Bui, Boise State University

DOI:

https://doi.org/10.31224/4061

Abstract

The rapid advancement of Large Language Models (LLMs) has made it challenging for researchers to keep up with new models and innovations. While many scholars have published survey papers to synthesize this work, the growing number of surveys has itself added complexity, making it harder to stay current on recent developments. In this report, I present an analysis of a dataset comprising 144 survey papers published between 2023 and January 2024. To address the dataset's class imbalance, I applied oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and Random Oversampling. These methods, combined with a Random Forest classifier, were used to predict the 'taxonomy class' of new papers, improving the model's precision score from 0.35 to 0.42. Future work will focus on enhancing classification accuracy and exploring other machine learning algorithms to broaden the applicability of this approach.
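The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual code: the data here is synthetic, the class labels are placeholders for the taxonomy classes, and random oversampling is implemented with scikit-learn's `resample` (the paper also uses SMOTE, available separately as `imblearn.over_sampling.SMOTE`).

```python
# Hedged sketch of the approach: oversample the minority class to balance
# the training set, then fit a Random Forest classifier and measure precision.
# All data below is synthetic; it does not reproduce the paper's dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.RandomState(0)
# Imbalanced toy data: 120 majority-class vs 24 minority-class samples
X = rng.randn(144, 10)
y = np.array([0] * 120 + [1] * 24)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: resample minority-class rows with replacement
# until both classes are the same size in the training set
X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=int((y_tr == 0).sum()), random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], X_up])
y_bal = np.concatenate([y_tr[y_tr == 0], y_up])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_bal, y_bal)
prec = precision_score(y_te, clf.predict(X_te), average="macro")
print(f"macro precision: {prec:.2f}")
```

With SMOTE, the resampling step would instead synthesize new minority-class points by interpolating between existing ones, rather than duplicating rows.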

Posted

2024-11-07