This study aims to provide an overview of the current research on detecting abusive language in Indonesian social media. The study examines existing datasets, methods, and challenges and opportunities in this field. The research found that most existing datasets for detecting abusive language were col-lected from social media platforms such as Twitter, Facebook, and Instagram, with Twitter being the most commonly used source. The study also found that hate speech is the most researched type of abusive language. Various models, including traditional machine learning and deep learning approaches, have been im-plemented for this task, with deep learning models showing more competitive results. However, the use of transformer-based models is less popular in Indonesian hate speech studies. The study also emphasizes the importance of exploring more diverse phenomena, such as islamophobia and political hate speech. Additionally, the study suggests crowdsourcing as a potential solution for the annotation approach for labeling datasets. Furthermore, it encourages researchers to consider code-mixing issues in abusive language datasets in Indonesia, as it could improve the overall model performance for detecting abusive language in Indonesian data. The study also suggests that the lack of effective regulations and the anonymity afforded to users on most social networking sites, as well as the increasing number of Twitter users in Indonesia, have contributed to the rising prevalence of hate speech in Indonesian social media. The study also notes the importance of considering code-mixed language, out-of-vocabulary words, grammatical errors, and limited context when working with social media data.
Pamungkas, E., Putri, D., Fatmawati, A. (2023). Hate Speech Detection in Bahasa Indonesia: Challenges and Opportunities. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 14(6), 1175-1181 [10.14569/IJACSA.2023.01406125].
Hate Speech Detection in Bahasa Indonesia: Challenges and Opportunities
Putri D. G. P.;
2023
Abstract
This study aims to provide an overview of the current research on detecting abusive language in Indonesian social media. The study examines existing datasets, methods, and challenges and opportunities in this field. The research found that most existing datasets for detecting abusive language were col-lected from social media platforms such as Twitter, Facebook, and Instagram, with Twitter being the most commonly used source. The study also found that hate speech is the most researched type of abusive language. Various models, including traditional machine learning and deep learning approaches, have been im-plemented for this task, with deep learning models showing more competitive results. However, the use of transformer-based models is less popular in Indonesian hate speech studies. The study also emphasizes the importance of exploring more diverse phenomena, such as islamophobia and political hate speech. Additionally, the study suggests crowdsourcing as a potential solution for the annotation approach for labeling datasets. Furthermore, it encourages researchers to consider code-mixing issues in abusive language datasets in Indonesia, as it could improve the overall model performance for detecting abusive language in Indonesian data. The study also suggests that the lack of effective regulations and the anonymity afforded to users on most social networking sites, as well as the increasing number of Twitter users in Indonesia, have contributed to the rising prevalence of hate speech in Indonesian social media. The study also notes the importance of considering code-mixed language, out-of-vocabulary words, grammatical errors, and limited context when working with social media data.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.