Ben-Sarc: A Self-Annotated Corpus for Sarcasm Detection from Bengali Social Media Comments and Its Baseline Evaluation

Sanzana Karim Lora; G. M. Shahariar; Tamanna Nazmin; Noor Nafeur Rahman; Rafsan Rahman; Miyad Bhuiyan; Faisal Muhammad Shah

doi:10.31224/osf.io/7yb4c

Authors:

Sanzana Karim Lora https://orcid.org/0000-0001-6647-1639
G. M. Shahariar https://orcid.org/0000-0001-9757-7663
Tamanna Nazmin
Noor Nafeur Rahman
Rafsan Rahman
Miyad Bhuiyan
Faisal Muhammad Shah https://orcid.org/0000-0002-5118-8571

DOI:

https://doi.org/10.31224/osf.io/7yb4c

Keywords:

Bengali sarcasm, Bengali sarcasm detection, sarcasm, sarcasm detection

Abstract

Sarcasm detection research for the Bengali language has so far been narrow because of a lack of resources. In this paper, we introduce 'Ben-Sarc', a large-scale self-annotated Bengali corpus for sarcasm detection containing 25,636 comments, manually collected from different public Facebook pages and evaluated by external evaluators. We then present a complete strategy for applying traditional machine learning, deep learning, and transfer learning models to detect sarcasm from text using the Ben-Sarc corpus. Finally, we compare the performance of these three families of models on the corpus. Transfer learning with Indic-Transformers Bengali BERT as the pre-trained source model achieves the highest accuracy, 75.05%. Among deep learning models, LSTM obtains the second-highest accuracy at 72.48%, and among traditional machine learning models, Multinomial Naive Bayes obtains the third-highest at 72.36%. The Ben-Sarc corpus is made publicly available in the hope of advancing the Bengali Natural Language Processing community.
