Bicocca Open Archive

On the Use of Genetic Programming for the Prediction of Survival in Cancer

Farinaccio, Antonella; Giacobini, M.; Mauri, Giancarlo; Provero, P.; Vanneschi, Leonardo

2010

Abstract

The classification of cancer patients into risk classes is a very active field of research with direct clinical applications. We recently compared several machine learning methods on the well-known 70-gene signature dataset. In that study, genetic programming showed promising results, outperforming all the other techniques. Nevertheless, the study was preliminary, mainly because the validation dataset was preprocessed and all its features were binarized so that logical operators could be used as genetic programming functional nodes. Although this choice allowed a simple interpretation of the solutions from the biological viewpoint, binarizing the data was limiting, since it amounts to a sizable loss of information. The goal of this paper is to overcome this limitation by using the 70-gene signature dataset with real-valued expression data. The results show that genetic programming, when it uses the number of incorrectly classified instances as its fitness function, is not able to outperform the other machine learning methods. However, when a weighted average of false positives and false negatives is used to calculate fitness values, genetic programming performs comparably to the other methods in minimizing incorrectly classified instances and outperforms all of them in minimizing false negatives, which is one of the main goals in breast cancer clinical applications. In this case too, the solutions returned by genetic programming are simple, easy to understand, and use a rather limited subset of the available features.
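
The two fitness measures mentioned in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the label encoding (1 = poor prognosis, 0 = good prognosis), and the weight `w_fn` are hypothetical, since this record does not state the weights or encoding used in the paper. Lower values are treated as better.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for binary labels.

    Assumed encoding (not from the source): 1 = poor prognosis, 0 = good prognosis.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def error_count_fitness(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """First measure: the number of incorrectly classified instances."""
    fp, fn = confusion_counts(y_true, y_pred)
    return fp + fn


def weighted_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     w_fn: float = 0.75) -> float:
    """Second measure: a weighted average of false negatives and false positives.

    w_fn is a hypothetical weight in [0, 1]; values above 0.5 penalize
    false negatives more heavily than false positives.
    """
    if not 0.0 <= w_fn <= 1.0:
        raise ValueError("w_fn must lie in [0, 1]")
    fp, fn = confusion_counts(y_true, y_pred)
    return w_fn * fn + (1.0 - w_fn) * fp


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1]
    y_pred = [0, 1, 1, 0, 0]
    print(error_count_fitness(y_true, y_pred))
    print(weighted_fitness(y_true, y_pred, w_fn=0.75))
```

With `w_fn` above 0.5, a classifier that misses a poor-prognosis patient is penalized more than one that raises a false alarm, which matches the abstract's emphasis on minimizing false negatives.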

Type of contribution: slide + paper
Keywords: Genetic Programming; machine learning; clustering
Content language: English
Conference name: Gecco – Annual Conference on Genetic and Evolutionary Computation
Conference year: 2010
Proceedings title: Gecco 2010. Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation
ISBN of the proceedings volume: 978-1-4503-0072-8
Publication date: 2010
First page: 163
Last page: 170
DOI of the contribution: https://dx.doi.org/10.1145/1830483.1830514
Full text: none
Citation: Farinaccio, A., Giacobini, M., Mauri, G., Provero, P., Vanneschi, L. (2010). On the Use of Genetic Programming for the Prediction of Survival in Cancer. In Gecco 2010. Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (pp. 163-170). New York: ACM Press [10.1145/1830483.1830514].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/17887
