Fontanarava, J., Pasi, G., Viviani, M. (2017). Feature analysis for fake review detection through supervised classification. In Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on (pp. 658-666). Institute of Electrical and Electronics Engineers Inc. [10.1109/DSAA.2017.51].
Feature analysis for fake review detection through supervised classification
Pasi, G.; Viviani, M.
2017
Abstract
Review sites increasingly face the spread of misinformation, i.e., opinion spam, which aims to promote or damage target businesses by misleading either human readers or automated opinion mining and sentiment analysis systems. For this reason, several data-driven approaches have been proposed in recent years to assess the credibility of user-generated content spread through social media in the form of online reviews. Different approaches often consider different subsets of characteristics, i.e., features, related to reviews and reviewers, as well as to the network structure linking distinct entities on the review site under examination. This article analyzes the main review-centric and reviewer-centric features proposed so far in the literature to detect fake reviews, in particular by approaches that employ supervised machine learning techniques. These solutions generally perform better than purely unsupervised approaches, which often rely on graph-based methods that exploit relational ties in review sites. Furthermore, this work proposes and evaluates some new features that may be suitable for classifying reviews as genuine or fake. For this purpose, a supervised classifier based on Random Forests has been implemented using both well-known and new features, all extracted from a large-scale labeled dataset. The good results obtained show the effectiveness of the new features, in particular for detecting singleton fake reviews, and the general utility of this study.
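To make the distinction between review-centric and reviewer-centric features more concrete, the following is a minimal sketch of feature extraction over a collection of reviews. It is not the article's feature set: the abstract does not list the features, so the names and definitions below (review length, deviation from the product's average rating, extreme rating, reviewer review count, maximum reviews per day, singleton flag) are illustrative assumptions of the kind commonly discussed in the fake-review literature. The `Review` record and `extract_features` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    # Hypothetical review record; field names are illustrative assumptions.
    reviewer_id: str
    product_id: str
    rating: int  # star rating, e.g. 1-5
    text: str
    day: date


def extract_features(reviews):
    """Return one feature dict per review, in the same order as the input.

    Illustrative features only, not the set used in the article.
    """
    # Average rating per product, used for the rating-deviation feature.
    ratings_by_product = defaultdict(list)
    for r in reviews:
        ratings_by_product[r.product_id].append(r.rating)
    avg_rating = {p: sum(rs) / len(rs) for p, rs in ratings_by_product.items()}

    # Reviewer activity: total number of reviews and reviews posted per day.
    count_by_reviewer = defaultdict(int)
    per_day = defaultdict(int)
    for r in reviews:
        count_by_reviewer[r.reviewer_id] += 1
        per_day[(r.reviewer_id, r.day)] += 1

    # Maximum number of reviews a reviewer posted on any single day.
    max_per_day = defaultdict(int)
    for (reviewer, _day), n in per_day.items():
        max_per_day[reviewer] = max(max_per_day[reviewer], n)

    features = []
    for r in reviews:
        n_reviews = count_by_reviewer[r.reviewer_id]
        features.append({
            # Review-centric features.
            "length_words": len(r.text.split()),
            "rating_deviation": abs(r.rating - avg_rating[r.product_id]),
            "is_extreme_rating": int(r.rating in (1, 5)),
            # Reviewer-centric features.
            "reviewer_review_count": n_reviews,
            "reviewer_max_reviews_per_day": max_per_day[r.reviewer_id],
            "is_singleton": int(n_reviews == 1),  # reviewer wrote one review
        })
    return features


if __name__ == "__main__":
    sample = [
        Review("u1", "p1", 5, "Great food, great service!", date(2017, 1, 1)),
        Review("u2", "p1", 1, "Terrible", date(2017, 1, 2)),
        Review("u2", "p2", 1, "Awful place never again", date(2017, 1, 2)),
        Review("u3", "p1", 4, "Nice", date(2017, 1, 3)),
    ]
    for f in extract_features(sample):
        print(f)
```

In a supervised setting, each feature dict would be turned into a fixed-order numeric vector and paired with a genuine/fake label, after which any Random Forest implementation (for example scikit-learn's `RandomForestClassifier`) could be trained on the vectors. The singleton flag illustrates why singleton reviews are hard: reviewer-centric statistics carry little information when a reviewer has written only one review, so review-centric features must do most of the work.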