Bicocca Open Archive

Predicting molecular activity on nuclear receptors by multitask neural networks

Valsecchi C.; Collarile M.; Grisoni F.; Todeschini R.; Ballabio D.; Consonni V.

2022

Abstract

Interest in multitask and deep learning strategies has grown in the last few years, in application to large and complex datasets for quantitative structure-activity relationship (QSAR) analysis. Multitask approaches allow the simultaneous prediction of related molecular properties through information sharing, whereas deep learning strategies increase the potential for capturing nonlinear relationships. In this work, we compare the binary classification capability of multitask deep and shallow neural networks with that of single-task strategies used as benchmarks (i.e., k-nearest neighbours, N-nearest neighbours, random forest and Naïve Bayes), as well as multitask supervised self-organizing maps. The comparison was carried out on an extended QSAR dataset containing annotations of molecular binding, agonism and antagonism activity on 11 nuclear receptors, for a total of 14,963 molecules, divided into training and test sets and labelled for their bioactivity on at least one of 30 binary tasks. An additional 304 chemicals were used as an external evaluation set to further validate the models. Although no approach systematically outperformed the others, task-specific differences were found, suggesting a benefit of multitask learning for tasks that are less represented. On average, some of the single-task approaches and the multitask deep learning strategies performed similarly. However, the latter can have advantages, such as simpler management of predictions and of applicability domain assessment for future samples. On the other hand, the parameter tuning required by neural networks is generally time-consuming, suggesting that the modelling strategy should be evaluated case by case.
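The dataset description implies that a molecule may carry labels for only some of the 30 binary tasks. One common way to train a multitask classifier on such data, not necessarily the one used in the paper, is to average the binary cross-entropy only over the annotated molecule-task pairs. The following is a minimal pure-Python sketch under that assumption; the function name `masked_bce` and the use of `None` to mark a missing label are illustrative choices, not taken from the article.

```python
import math


def masked_bce(probs, labels):
    """Mean binary cross-entropy over labelled (molecule, task) pairs only.

    probs  : list of rows, one per molecule; each row holds one predicted
             probability per task
    labels : same shape; 1 = active, 0 = inactive, None = no annotation
    """
    eps = 1e-12  # keeps log() away from zero
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                # Unlabelled task: contributes nothing to the loss.
                continue
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    # Return 0.0 when no pair is labelled, to avoid dividing by zero.
    return total / count if count else 0.0


# Toy example (made-up numbers): 2 molecules, 3 tasks, partial annotations.
probs = [[0.9, 0.2, 0.5], [0.1, 0.7, 0.5]]
labels = [[1, None, None], [0, 1, None]]
loss = masked_bce(probs, labels)
print(round(loss, 4))
```

Only three of the six molecule-task pairs in the toy example are labelled, so the loss is the mean over those three terms; the predictions for unlabelled pairs have no influence on it.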

Subtype: Journal article - Scientific article
Keywords: classification; deep learning; genetic algorithms; multitask; nuclear receptors; QSAR
Content language: English
Ahead-of-print or first online publication date: 9 Dec 2020
Publication date: 2022
Journal: JOURNAL OF CHEMOMETRICS
Volume: 36
Issue: 2 (February 2022)
Article number: e3325
Article DOI: https://dx.doi.org/10.1002/cem.3325
Fulltext: partially_open
Citation: Valsecchi, C., Collarile, M., Grisoni, F., Todeschini, R., Ballabio, D., Consonni, V. (2022). Predicting molecular activity on nuclear receptors by multitask neural networks. JOURNAL OF CHEMOMETRICS, 36(2 (February 2022)) [10.1002/cem.3325].
Appears in types: 01 - Journal article

Files in this item:

File	Access	Description	Attachment type	License	Size	Format
Valsecchi-2022-J Chemometrics-VoR.pdf	Archive managers only	Article	Publisher's Version (Version of Record, VoR)	All rights reserved	3.82 MB	Adobe PDF
Valsecchi-2022-J Chemometrics-AAM.pdf	Open access	This is the peer reviewed version of the article. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified.	Author's Accepted Manuscript, AAM (Post-print)	Other	2.52 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/354371
