Bicocca Open Archive

de Varda, A., Gatti, D., Marelli, M., Günther, F. (2024). Meaning beyond lexicality: Capturing Pseudoword Definitions with Language Models. COMPUTATIONAL LINGUISTICS, 50(4), 1313-1343 [10.1162/coli_a_00527].

Meaning beyond lexicality: Capturing Pseudoword Definitions with Language Models

de Varda, Andrea Gregor; Gatti, Daniele; Marelli, Marco; Günther, Fritz

2024

Abstract

Pseudowords such as “knackets” or “spechy”—letter strings that are consistent with the orthotactical rules of a language but do not appear in its lexicon—are traditionally considered to be meaningless, and employed as such in empirical studies. However, recent studies that show specific semantic patterns associated with these words as well as semantic effects on human pseudoword processing have cast doubt on this view. While these studies suggest that pseudowords have meanings, they provide only extremely limited insight as to whether humans are able to ascribe explicit and declarative semantic content to unfamiliar word forms. In the present study, we employed an exploratory-confirmatory study design to examine this question. In a first exploratory study, we started from a pre-existing dataset of words and pseudowords alongside human-generated definitions for these items. Employing 18 different language models, we showed that the definitions actually produced for (pseudo)words were closer to their respective (pseudo)words than the definitions for the other items. Based on these initial results, we conducted a second, pre-registered, high-powered confirmatory study collecting a new, controlled set of (pseudo)word interpretations. This second study confirmed the results of the first one. Taken together, these findings support the idea that meaning construction is supported by a flexible form-to-meaning mapping system based on statistical regularities in the language environment that can accommodate novel lexical entries as soon as they are encountered.
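The comparison described in the abstract (definitions lying closer to their own item than to other items) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the three-dimensional vectors are invented toy values, the function names are hypothetical, and cosine similarity is used as a generic closeness measure. This record does not specify which 18 language models or which metric the study used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matched_vs_mismatched(item_vecs, definition_vecs):
    """Return (mean matched similarity, mean mismatched similarity).

    Both arguments are dicts keyed by item. A pair is "matched" when the
    definition was written for that item, "mismatched" otherwise.
    """
    matched, mismatched = [], []
    for item, item_vec in item_vecs.items():
        for other, def_vec in definition_vecs.items():
            sim = cosine(item_vec, def_vec)
            if other == item:
                matched.append(sim)
            else:
                mismatched.append(sim)
    return sum(matched) / len(matched), sum(mismatched) / len(mismatched)


# Hypothetical toy embeddings; real ones would come from a language model.
item_vecs = {"knackets": [1.0, 0.0, 0.0], "spechy": [0.0, 1.0, 0.0]}
definition_vecs = {"knackets": [0.9, 0.1, 0.0], "spechy": [0.1, 0.9, 0.0]}

matched_mean, mismatched_mean = matched_vs_mismatched(item_vecs, definition_vecs)
print(matched_mean > mismatched_mean)
```

With these toy values the matched pairs are more similar than the mismatched ones, which is the pattern the abstract reports. A real analysis would also need a statistical test across items and models.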

Subtype: Journal article - Scientific article
Keywords: novel words; definitions; distributional semantics; large language models
Content language: English
Ahead-of-print or first online publication date: 16 Sep 2024
Publication date: 2024
Journal: COMPUTATIONAL LINGUISTICS
Volume: 50
Issue: 4
First page: 1313
Last page: 1343
Article DOI: https://dx.doi.org/10.1162/coli_a_00527
Fulltext: partially_open
Appears in categories: 01 - Journal article

Files in this record:

File: deVarda-2024-Computational Linguistics-AAM.pdf
Access: archive managers only
Attachment type: Author’s Accepted Manuscript, AAM (Post-print)
License: All rights reserved
Size: 1.48 MB
Format: Adobe PDF

File: deVarda-2024-Computational Linguistics-VoR.pdf
Access: open access
Description: Uncorrected Proof
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 1.85 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/512720
