Tasks related to Natural Language Processing (NLP) have recently been the focus of a large research endeavor by the machine learning community. The increased interest in this area is mainly due to the success of deep learning methods. Genetic Programming (GP), however, was not under the spotlight with respect to NLP tasks. Here, we propose a first proof-of-concept that combines GP with the well established NLP tool word2vec for the next word prediction task. The main idea is that, once words have been moved into a vector space, traditional GP operators can successfully work on vectors, thus producing meaningful words as the output. To assess the suitability of this approach, we perform an experimental evaluation on a set of existing newspaper headlines. Individuals resulting from this (pre-)training phase can be employed as the initial population in other NLP tasks, like sentence generation, which will be the focus of future investigations, possibly employing adversarial co-evolutionary approaches.
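The main idea in the abstract, applying GP operators to word vectors and mapping the result back to a word, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 3-dimensional embeddings are invented (a real setup would load vectors from a trained word2vec model), and the function set (`add`, `sub`), the tuple-based tree encoding, and the cosine-based `nearest_word` lookup are hypothetical choices.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real setup would load vectors from a trained word2vec model.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}

# Hypothetical GP function set: element-wise operators on vectors.
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

FUNCTIONS = {"add": add, "sub": sub}

def evaluate(tree, context):
    """Evaluate a GP tree on a list of context words.

    A leaf is an integer index into `context`; an inner node is a
    tuple (operator_name, left_subtree, right_subtree).
    """
    if isinstance(tree, int):
        return EMBEDDINGS[context[tree]]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, context), evaluate(right, context))

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_word(vector):
    """Map an output vector back to the most similar vocabulary word."""
    return max(EMBEDDINGS, key=lambda w: cosine(vector, EMBEDDINGS[w]))

# One hand-written individual encoding (w0 - w1) + w2.
individual = ("add", ("sub", 0, 1), 2)
context = ["king", "man", "woman"]

predicted = nearest_word(evaluate(individual, context))
print(predicted)
```

In an evolutionary run, trees like `individual` would be generated and varied by crossover and mutation, with fitness measuring how close the output vector is to the embedding of the actual next word; that loop is omitted here.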

Manzoni, L., Jakobovic, D., Mariot, L., Picek, S., Castelli, M. (2020). Towards an evolutionary-based approach for natural language processing. In GECCO 2020 - Proceedings of the 2020 Genetic and Evolutionary Computation Conference (pp.985-993). Association for Computing Machinery [10.1145/3377930.3390248].

Towards an evolutionary-based approach for natural language processing

Manzoni, L.; Mariot, L.
2020

Type: paper
Keywords: Genetic programming; Natural language processing; Next word prediction
Language: English
Conference: 2020 Genetic and Evolutionary Computation Conference, GECCO 2020, 8 July 2020 through 12 July 2020
Proceedings: GECCO 2020 - Proceedings of the 2020 Genetic and Evolutionary Computation Conference
ISBN: 9781450371285
Pages: 985-993
Access: partially open
Files in this item:

Manzoni-2020-GECCO-Arxix-Preprint.pdf
Access: open access
Attachment type: Submitted Version (Pre-print)
License: Other
Size: 740.55 kB
Format: Adobe PDF

Manzoni-2020-GECCO-VoR.pdf
Access: repository managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 893.12 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/501799
Citations
  • Scopus 6
  • ISI (Web of Science) 3