Bicocca Open Archive

In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Unreliable labelled observations, namely outliers and data with incorrect labels, can therefore strongly undermine classifier performance, especially if the training set is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles noise in both response and explanatory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
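The two ingredients named in the abstract, impartial trimming and an eigenvalue-ratio constraint, can be illustrated with a minimal one-dimensional sketch. This is not the authors' algorithm: it assumes univariate Gaussian classes, where the variance plays the role of the scatter-matrix eigenvalue, and it enforces the ratio bound with a crude variance floor. All function names, the parameters `alpha` (trimming fraction) and `c` (maximum allowed variance ratio), and the toy data are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def constrain_variances(variances, c):
    """Raise the smallest variances so that max(variance) / min(variance) <= c.

    Assumption: a crude stand-in for a constrained update, for illustration only.
    """
    floor = max(max(variances.values()) / c, 1e-12)  # tiny floor avoids a zero variance
    return {g: max(v, floor) for g, v in variances.items()}


def estimate(xs, labels, keep, c):
    """Estimate class proportions, means and constrained variances from the kept units."""
    props, means, variances = {}, {}, {}
    for g in sorted(set(labels)):
        vals = [xs[i] for i in keep if labels[i] == g]
        if not vals:
            raise ValueError(f"all observations labelled {g!r} were trimmed")
        props[g] = len(vals) / len(keep)
        means[g] = sum(vals) / len(vals)
        variances[g] = sum((v - means[g]) ** 2 for v in vals) / len(vals)
    return props, means, constrain_variances(variances, c)


def trimmed_fit(xs, labels, alpha, c, n_iter=20):
    """Alternate between estimation and discarding the alpha fraction of least plausible units."""
    n = len(xs)
    n_keep = n - int(alpha * n)
    keep = list(range(n))
    for _ in range(n_iter):
        props, means, variances = estimate(xs, labels, keep, c)
        # Plausibility of each labelled unit under the class its label claims.
        score = [props[labels[i]] * normal_pdf(xs[i], means[labels[i]], variances[labels[i]])
                 for i in range(n)]
        new_keep = sorted(sorted(range(n), key=lambda i: score[i], reverse=True)[:n_keep])
        if new_keep == keep:
            break
        keep = new_keep
    trimmed = sorted(set(range(n)) - set(keep))
    return estimate(xs, labels, keep, c), trimmed


def classify(x, props, means, variances):
    """Assign x to the class with the largest proportion-weighted density."""
    return max(props, key=lambda g: props[g] * normal_pdf(x, means[g], variances[g]))


# Toy training set: two clean classes, one mislabelled unit (index 16)
# and one gross outlier (index 17).
xs = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.05, -0.05, 0.15,
      4.8, 4.9, 5.0, 5.1, 5.2, 4.95, 5.05, 5.15,
      5.0, 50.0]
labels = ["a"] * 8 + ["b"] * 8 + ["a", "b"]

(props, means, variances), trimmed = trimmed_fit(xs, labels, alpha=0.12, c=100.0)
print(trimmed)                                  # [16, 17]
print(classify(4.7, props, means, variances))   # b
```

In this toy run the mislabelled unit and the outlier are the two observations discarded, so the class means are estimated from the clean units only. The trimming is "impartial" in the sense that the data, not the analyst, determine which units are dropped.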

Cappozzo, A., Greselin, F., Murphy, T. (2020). A robust approach to model-based classification based on trimming and constraints: Semi-supervised learning in presence of outliers and label noise. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 14(2), 327-354 [10.1007/s11634-019-00371-w].

A robust approach to model-based classification based on trimming and constraints: Semi-supervised learning in presence of outliers and label noise

Cappozzo, Andrea; Greselin, Francesca; Murphy, Thomas Brendan

2020

Subtype: Journal article - Scientific article
Keywords: Eigenvalues restrictions; Impartial trimming; Label noise; Model-based classification; Outliers detection; Robust estimation
Content language: English
Ahead-of-print or first online publication date: 14 August 2019
Publication date: 2020
Journal: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume: 14
Issue: 2
First page: 327
Last page: 354
Article DOI: https://dx.doi.org/10.1007/s11634-019-00371-w
Full text: none
Appears in types: 01 - Journal article

Files in this item:

There are no files associated with this item.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/240136
