ARCELLI FONTANA, F., Mäntylä, M., Zanoni, M., Marino, A. (2016). Comparing and experimenting machine learning techniques for code smell detection. EMPIRICAL SOFTWARE ENGINEERING, 21(3), 1143-1191 [10.1007/s10664-015-9378-4].

Comparing and experimenting machine learning techniques for code smell detection

ARCELLI FONTANA, FRANCESCA (first author); ZANONI, MARCO
2016

Abstract

Several code smell detection tools have been developed, and they provide different results because smells can be subjectively interpreted, and hence detected, in different ways. To the best of our knowledge, this paper reports the largest experiment applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, using 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, and the worst by support vector machines. However, the lower prevalence of code smells in the entire data set (i.e., imbalanced data) caused performance to vary, an issue that needs to be addressed in future studies. We conclude that applying machine learning to the detection of these code smells can provide high accuracy (>96 %), and that only a hundred training examples are needed to reach at least 95 % accuracy.
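The caveat about imbalanced data can be made concrete with a small sketch. Assume a hypothetical data set in which only 5% of the samples are smelly. Under stratified 10-fold cross-validation, a classifier that always answers "not smelly" still scores about 95% accuracy while finding none of the smells. The helper names (`stratified_folds`, `majority_baseline`, `cross_validate`), the synthetic labels, and the choice of stratified 10-fold cross-validation are illustrative assumptions. This is not the paper's data set, its classifiers (such as J48 or Random Forest), or its evaluation protocol.

```python
# Hypothetical sketch, not the paper's method: synthetic labels and a
# majority-class baseline stand in for the real data and classifiers.
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that keep the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share of it.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample: majority


def cross_validate(labels, k=10):
    """Mean accuracy and mean recall on the smelly class (label 1)."""
    accs, recalls = [], []
    for test_idx in stratified_folds(labels, k):
        test_set = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in test_set]
        predict = majority_baseline(train)
        truth = [labels[i] for i in test_idx]
        preds = [predict(i) for i in test_idx]
        accs.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
        # Recall = smelly samples predicted smelly / all smelly samples.
        on_smelly = [p for p, t in zip(preds, truth) if t == 1]
        recalls.append(sum(on_smelly) / len(on_smelly) if on_smelly else 0.0)
    return sum(accs) / k, sum(recalls) / k


if __name__ == "__main__":
    # Synthetic data set: 50 of 1000 samples (5%) are smelly.
    labels = [1] * 50 + [0] * 950
    acc, rec = cross_validate(labels, k=10)
    print(f"accuracy={acc:.2f}, smelly-class recall={rec:.2f}")
    # prints: accuracy=0.95, smelly-class recall=0.00
```

A real comparison would replace `majority_baseline` with classifiers trained on code metrics. It would also report a class-sensitive measure such as recall on the smelly class next to accuracy, because accuracy alone rewards ignoring a rare smell.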
Journal article - Scientific article
Benchmark for code smell detection; Code smells detection; Machine learning techniques;
English
Year: 2016; Volume: 21; Issue: 3; Pages: 1143-1191; Access: reserved
Files in this record:

ERA-EXTENSION-SUBMITTED-3.docx
Access: archive managers only
Description: article
Attachment type: Author’s Accepted Manuscript, AAM (Post-print)
Size: 601.83 kB
Format: Microsoft Word XML

9-Comparing ML-ESE-Springer-2016.pdf
Access: archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
Size: 1.9 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/84895
Citations
  • Scopus: 315
  • ISI (Web of Science): 225