Bicocca Open Archive

Background: With the growing popularity of QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, and an essential part of this validation is a clearly defined Applicability Domain (AD) for the model. Although several different approaches have been proposed in recent years to address this goal, no optimal approach to defining a model's AD has yet been recognized.

Results: This study proposes a novel descriptor-based AD method that accounts for the data distribution and exploits the k-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure that addresses several key aspects relevant to judging the reliability of QSAR predictions. Inspired by the adaptive kernel method for probability density function estimation, the first stage defines a pattern of thresholds, one for each training sample; these thresholds are later used to derive the decision rule. The second stage defines the criterion that decides whether a given test sample is retained within the AD. Finally, the third stage assesses the reliability of the derived results, taking model statistics and prediction error into account.

Conclusions: The proposed approach introduces a novel strategy that integrates the kNN principle to define the AD of QSAR models. Relevant features of the proposed AD approach include: a) adaptability to the local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness also in high-dimensional spaces; c) low sensitivity to the smoothing parameter k; and d) versatility in implementing various distance measures. The results obtained on a case study provide a clear understanding of how the approach works and how it defines the model's AD for reliable predictions.
© 2013 Sahigara et al.; licensee Chemistry Central Ltd.
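The first two stages described in the abstract (per-sample thresholds, then a decision rule for test samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The specific threshold rule used here (each training sample's threshold is its mean distance to the training samples lying within a reference distance, with the reference taken as the upper boxplot whisker Q3 + 1.5·IQR of the mean kNN distances), the fallback for isolated samples, and the function names `training_thresholds`, `in_domain`, `euclidean` and `manhattan` are all assumptions made for illustration. The distance function is passed in as a parameter, echoing point d) of the abstract.

```python
import math
import statistics
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan (city-block) distance between two descriptor vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def training_thresholds(train: List[Vector], k: int,
                        dist: Distance = euclidean) -> List[float]:
    """Stage 1 (hypothetical sketch): one threshold per training sample.

    Requires at least two training samples and 1 <= k < len(train).
    """
    n = len(train)
    d = [[dist(train[i], train[j]) for j in range(n)] for i in range(n)]

    # Mean distance of each training sample to its k nearest training neighbours.
    mean_knn = []
    for i in range(n):
        nearest = sorted(d[i][j] for j in range(n) if j != i)[:k]
        mean_knn.append(sum(nearest) / k)

    # Assumed reference distance: upper boxplot whisker Q3 + 1.5 * IQR.
    q1, _, q3 = statistics.quantiles(mean_knn, n=4)
    ref = q3 + 1.5 * (q3 - q1)

    # Assumed threshold of sample i: mean distance to the training samples within ref.
    raw: List[Optional[float]] = []
    for i in range(n):
        close = [d[i][j] for j in range(n) if j != i and d[i][j] <= ref]
        raw.append(sum(close) / len(close) if close else None)

    # Assumed fallback: isolated samples get the smallest defined threshold.
    defined = [t for t in raw if t is not None]
    t_min = min(defined) if defined else 0.0
    return [t_min if t is None else t for t in raw]


def in_domain(x: Vector, train: List[Vector], thresholds: List[float],
              dist: Distance = euclidean) -> bool:
    """Stage 2 (hypothetical sketch): x is inside the AD if it lies within
    the threshold of at least one training sample."""
    return any(dist(x, t) <= th for t, th in zip(train, thresholds))
```

For example, with training samples at the four corners and the centre of a unit square and k = 2, a query at (0.5, 0.6) falls inside the domain while (5, 5) falls outside, with either distance function. Because each training sample carries its own threshold, the boundary roughly follows the local density of the training data rather than a single global radius, which is the kind of adaptivity the abstract lists under point a).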

Sahigara, F., Ballabio, D., Todeschini, R., Consonni, V. (2013). Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions. JOURNAL OF CHEMINFORMATICS, 5(5) [10.1186/1758-2946-5-27].

Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions

Sahigara, F.; Ballabio, Davide; Todeschini, Roberto; Consonni, Viviana

2013



Subtype: Journal article - Scientific article
Keywords: QSAR, Applicability domain, kNN, Nearest neighbour, Model validation
Content language: English
Publication date: 2013
Journal: JOURNAL OF CHEMINFORMATICS
Volume: 5
Issue: 5
Article number: 27
Article DOI: https://dx.doi.org/10.1186/1758-2946-5-27
Full text: open
Appears in type: 01 - Journal article

Files in this item:

File	Access	Size	Format
Sahigara_Ballabio_JCheminformatics_2013.pdf	open access	750.86 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/44582
