Saletta, M., Ferretti, C. (2024). Exploring the Prompt Space of Large Language Models through Evolutionary Sampling. In GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1345-1353). doi: 10.1145/3638529.3654049
Exploring the Prompt Space of Large Language Models through Evolutionary Sampling
Saletta, Martina; Ferretti, Claudio
2024
Abstract
Large language models (LLMs) are increasingly gaining relevance in everyday life, owing to their apparent ability to solve tasks that demand intricate linguistic comprehension. Recent studies indicate that one of the key factors affecting their output is the quality of the prompt used to interact with them. This work proposes a grammar-based evolutionary approach for exploring the prompt space of LLMs, driven by a fitness function that aims to optimize performance on a given task. We tested our technique by steering two state-of-the-art models through evolved prompts and comparing their performance on 8 benchmark tasks with that obtained using other baseline prompts on the same tasks, showing that in most cases our prompts yield better results. Further, we defined a constrained mutation operator that limits changes to specific grammar non-terminals, allowing us to study and highlight the elements of a prompt that most affect the output of the LLM. Finally, a thorough discussion points out some issues that limit the relevance of the emerging prompt-engineering discipline, given the existence of many effective prompt structures and the diversity that can be observed in the LLM output for the same model input.
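To illustrate the kind of machinery the abstract describes, the sketch below shows a grammar-based representation of prompts with a mutation operator constrained to chosen non-terminals. The toy grammar, symbol names, and absence of any real LLM/fitness evaluation are all assumptions for illustration; the paper's actual grammar, benchmarks, and fitness function are not reproduced here.

```python
import random

# Hypothetical toy grammar (assumption, not the paper's grammar).
# Each non-terminal maps to a list of productions; a production is a
# sequence of terminals and non-terminals.
GRAMMAR = {
    "<prompt>": [["<role>", " ", "<instruction>"]],
    "<role>": [["You are an expert."], ["You are a careful reasoner."]],
    "<instruction>": [["Answer ", "<style>", "."],
                      ["Think step by step, then answer ", "<style>", "."]],
    "<style>": [["concisely"], ["in detail"]],
}

def derive(symbol, rng):
    """Build a random derivation tree (symbol, children) rooted at `symbol`."""
    if symbol not in GRAMMAR:
        return (symbol, [])  # terminal
    production = rng.choice(GRAMMAR[symbol])
    return (symbol, [derive(s, rng) for s in production])

def render(node):
    """Flatten a derivation tree into the prompt string it encodes."""
    sym, children = node
    return sym if not children else "".join(render(c) for c in children)

def constrained_mutate(node, allowed, rng):
    """Walk a random path down the tree; re-expand the first subtree whose
    root non-terminal is in `allowed`, leaving everything else intact.
    (If the path ends at a terminal, the tree is returned unchanged.)"""
    sym, children = node
    if sym in allowed:
        return derive(sym, rng)
    if not children:
        return node
    i = rng.randrange(len(children))
    new_children = list(children)
    new_children[i] = constrained_mutate(children[i], allowed, rng)
    return (sym, new_children)
```

In a full evolutionary loop, `render` would feed each candidate prompt to the LLM and a task-specific fitness function would score the resulting outputs; restricting `allowed` to a single non-terminal (e.g. only `"<style>"`) is what makes it possible to attribute performance changes to that one prompt element.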
| File | Description | Attachment type | License | Size | Format |
|---|---|---|---|---|---|
| Saletta-2024-GECCO-VoR.pdf (open access) | CC BY-ND 4.0 | Publisher's Version (Version of Record, VoR) | Creative Commons | 831.36 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.