Survey Trends using LLM Models
DOI: https://doi.org/10.31224/3977
Abstract
This report outlines a comprehensive analysis of survey papers within a specific dataset using various data science techniques. The primary objective is to explore, manipulate, and evaluate the data to understand the trends and taxonomy distributions of surveys in this domain.
Data exploration began with a time-series analysis of survey releases, visualizing trends over time. Taxonomy distributions were then examined using bar charts and pie charts to uncover the most frequent categories.
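As a rough illustration of the exploration step, the sketch below counts survey releases per year and computes taxonomy shares, which are the numbers a time-series plot and a bar or pie chart would display. The records, the label names, and the helper functions `releases_per_year` and `taxonomy_shares` are hypothetical and are not taken from the report's dataset.

```python
from collections import Counter

# Hypothetical records: (publication year, taxonomy label).
# These values are made up for illustration only.
surveys = [
    (2021, "NLP"),
    (2022, "NLP"),
    (2022, "Vision"),
    (2023, "NLP"),
    (2023, "NLP"),
    (2023, "Robotics"),
]


def releases_per_year(records):
    """Count surveys per year, ordered chronologically."""
    counts = Counter(year for year, _ in records)
    return dict(sorted(counts.items()))


def taxonomy_shares(records):
    """Return each taxonomy's share of all surveys, most frequent first."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


yearly = releases_per_year(surveys)
shares = taxonomy_shares(surveys)

# Text stand-in for the time-series plot: one '#' per survey released.
for year, n in yearly.items():
    print(year, "#" * n)
```

The same dictionaries could be passed to a plotting library to draw the charts the abstract mentions.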
In the data manipulation phase, we constructed a feature matrix by applying TF-IDF vectorization to the text fields (titles and summaries) and using one-hot encoding for the categorical variables. These features were then normalized and split into training and testing sets to prepare for model evaluation.
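The feature pipeline described above might look roughly like the following with scikit-learn. This is a minimal sketch under assumptions: the toy titles, summaries, the `primary_category` field, the labels, the choice of `MaxAbsScaler` for normalization, and the 75/25 split are all illustrative, since the abstract does not specify them.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

# Hypothetical toy data standing in for the survey dataset.
titles = [
    "A survey of language models", "Survey on image segmentation",
    "Graph neural networks: a review", "Survey of prompt engineering",
    "Object detection survey", "A review of graph embeddings",
    "Survey of text summarization", "Survey of video understanding",
]
summaries = [
    "We review large language models.", "We review segmentation methods.",
    "We review graph learning.", "We review prompting techniques.",
    "We review detection methods.", "We review embedding methods.",
    "We review summarization systems.", "We review video models.",
]
primary_category = [["cs.CL"], ["cs.CV"], ["cs.LG"], ["cs.CL"],
                    ["cs.CV"], ["cs.LG"], ["cs.CL"], ["cs.CV"]]
labels = ["nlp", "vision", "graphs", "nlp",
          "vision", "graphs", "nlp", "vision"]

# TF-IDF over the concatenated title and summary text.
texts = [t + " " + s for t, s in zip(titles, summaries)]
text_features = TfidfVectorizer().fit_transform(texts)

# One-hot encoding of the categorical variable (sparse output by default).
cat_features = OneHotEncoder(handle_unknown="ignore").fit_transform(primary_category)

# Combine both blocks into one sparse feature matrix and scale each
# column to [-1, 1]; MaxAbsScaler preserves sparsity.
X = hstack([text_features, cat_features]).tocsr()
X = MaxAbsScaler().fit_transform(X)

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)
```

The sketch follows the order given in the abstract (build, normalize, then split). Fitting the vectorizer and scaler on the training rows only would be the stricter choice, since it keeps test-set statistics out of the features.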
The data evaluation process employed a Random Forest classifier to predict the taxonomy of each survey from the extracted features. Performance was measured using accuracy, precision, recall, and F1-score, with the model achieving an accuracy of 34.48%. Although this result leaves considerable room for improvement, the analysis demonstrates the potential of machine learning for automating the classification of survey papers based on their content.
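To make the four reported metrics concrete for a multi-class task, the sketch below computes them by hand on a small made-up set of predictions. The labels and predictions are hypothetical, and macro averaging is an assumption: the abstract does not say how precision, recall, and F1 were averaged across taxonomy classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical true and predicted taxonomy labels.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "A", "C"]

acc = accuracy(y_true, y_pred)
scores = per_class_scores(y_true, y_pred)
macro_precision, macro_recall, macro_f1 = macro_average(scores)
print(f"accuracy={acc:.4f} precision={macro_precision:.4f} "
      f"recall={macro_recall:.4f} f1={macro_f1:.4f}")
```

Reporting per-class scores alongside accuracy helps show whether a low overall figure comes from a few hard taxonomy classes or from uniformly weak predictions.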
This study illustrates how data science techniques, including natural language processing (NLP) and machine learning, can be applied to understand trends, perform feature engineering, and evaluate models in the context of survey data. Future work could involve the use of more advanced models and feature selection techniques to enhance predictive accuracy.
License
Copyright (c) 2024 Tasvi Adappa
This work is licensed under a Creative Commons Attribution 4.0 International License.