The classification of cancer patients into risk classes is a very active field of research with direct clinical applications. We recently compared several machine learning methods on the well-known 70-gene signature dataset. In that study, genetic programming showed promising results, outperforming all the other techniques. The study was preliminary, however, mainly because the validation dataset was preprocessed and all its features were binarized so that logical operators could be used as the functional nodes of genetic programming. Although this choice allowed a simple interpretation of the solutions from the biological viewpoint, binarizing the data was limiting, since it amounts to a sizable loss of information. The goal of this paper is to overcome this limitation by using the 70-gene signature dataset with real-valued expression data. The results show that genetic programming, when it uses the number of incorrectly classified instances as its fitness function, is not able to outperform the other machine learning methods. However, when fitness is computed as a weighted average of false positives and false negatives, genetic programming performs comparably to the other methods in minimizing incorrectly classified instances and outperforms all of them in minimizing false negatives, which is one of the main goals in breast cancer clinical applications. In this case too, the solutions returned by genetic programming are simple, easy to understand, and use a rather limited subset of the available features.
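
The difference between the two fitness functions mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label convention (1 = poor prognosis, 0 = good prognosis), and the weight `fn_weight=0.75` are all assumptions chosen for illustration, since the abstract does not state the weights actually used. Lower fitness is taken to be better.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for binary labels.

    Assumed convention: 1 = poor prognosis (positive), 0 = good prognosis.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def error_count_fitness(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fitness as the plain number of misclassified instances (lower is better)."""
    fp, fn = confusion_counts(y_true, y_pred)
    return float(fp + fn)


def weighted_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     fn_weight: float = 0.75) -> float:
    """Fitness as a weighted average of false negatives and false positives.

    fn_weight is a hypothetical value, not taken from the paper; any value
    above 0.5 penalizes false negatives more heavily than false positives.
    """
    fp, fn = confusion_counts(y_true, y_pred)
    return fn_weight * fn + (1.0 - fn_weight) * fp


# Toy labels: two candidate classifiers that each make exactly one mistake.
y_true = [1, 1, 1, 0, 0, 0]
pred_a = [0, 1, 1, 0, 0, 0]  # one false negative
pred_b = [1, 1, 1, 1, 0, 0]  # one false positive

print(error_count_fitness(y_true, pred_a), error_count_fitness(y_true, pred_b))
print(weighted_fitness(y_true, pred_a), weighted_fitness(y_true, pred_b))
```

Under the plain error count the two candidates are indistinguishable, while the weighted variant ranks the one that misses a poor-prognosis patient as worse. This is the kind of selection pressure toward fewer false negatives that the abstract describes.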

Farinaccio, A., Giacobini, M., Mauri, G., Provero, P., Vanneschi, L. (2010). On the Use of Genetic Programming for the Prediction of Survival in Cancer. In Gecco 2010. Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (pp. 163-170). New York: ACM Press [10.1145/1830483.1830514].

On the Use of Genetic Programming for the Prediction of Survival in Cancer

Farinaccio, Antonella; Mauri, Giancarlo; Vanneschi, Leonardo
2010

Type: slide + paper
Keywords: Genetic Programming; machine learning; clustering
Language: English
Conference: Gecco, Annual Conference on Genetic and Evolutionary Computation (2010)
Proceedings: Gecco 2010. Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation
ISBN: 978-1-4503-0072-8
Year: 2010
Pages: 163-170

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/17887