Automating Gender-Inclusive Language Modification in Italian University Administrative Documents

Cerabolini, A; Pasi, G; Viviani, M

doi:10.1007/978-3-031-70239-6_23

In this work, we address the automatic identification of non-inclusive language in the administrative documents of Italian universities and the generation of gender-inclusive corrections. To this end, data were gathered from several Italian universities and used to build a dictionary of potentially non-inclusive terms, together with a dataset of gender non-inclusive sentences paired with their inclusive versions. Three approaches were then defined and evaluated: one rule-based and two neural. The rule-based approach uses Italian Part-of-Speech tagging, dependency parsing, and morphologization techniques to detect masculine trigger words within sentences, determine whether they function as generic masculine terms, and offer gender-inclusive alternatives. The two neural approaches use the mT5 model and ChatGPT, respectively, and the rewritten sentences they generated were compared. The experimental evaluations suggest the effectiveness of the proposed solutions.
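The dictionary-driven detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain phrase lookup in place of Part-of-Speech tagging and dependency parsing, so it cannot tell whether a masculine form is actually generic, and it does not fix agreement on surrounding words. The dictionary entries and the function names `find_triggers` and `suggest_inclusive` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical dictionary: masculine trigger phrase -> inclusive alternative.
# Illustrative entries only; they are not taken from the paper's dictionary.
INCLUSIVE_ALTERNATIVES = {
    "gli studenti": "le studentesse e gli studenti",
    "i professori": "le professoresse e i professori",
    "i ricercatori": "le ricercatrici e i ricercatori",
}


def find_triggers(sentence):
    """Return sorted (start, end, matched_text) tuples for each trigger found."""
    hits = []
    for trigger in INCLUSIVE_ALTERNATIVES:
        pattern = r"\b" + re.escape(trigger) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0)))
    return sorted(hits)


def suggest_inclusive(sentence):
    """Replace every trigger phrase with its inclusive alternative."""
    result = sentence
    # Replace from right to left so earlier character offsets stay valid.
    for start, end, text in reversed(find_triggers(sentence)):
        replacement = INCLUSIVE_ALTERNATIVES[text.lower()]
        if text[0].isupper():
            # Preserve sentence-initial capitalisation.
            replacement = replacement[0].upper() + replacement[1:]
        result = result[:start] + replacement + result[end:]
    return result


if __name__ == "__main__":
    print(suggest_inclusive("Gli studenti devono presentare la domanda."))
```

A full rule-based system along the lines of the abstract would add a parsing step before replacement, to decide whether each match is a generic masculine, and a morphological step afterwards, to re-inflect dependent adjectives and participles.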

Cerabolini, A., Pasi, G., Viviani, M. (2024). Automating Gender-Inclusive Language Modification in Italian University Administrative Documents. In Natural Language Processing and Information Systems: 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Turin, Italy, June 25–27, 2024, Proceedings, Part I (pp. 333-347). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-70239-6_23].

Cerabolini, Aurora; Pasi, Gabriella; Viviani, Marco

2024

Contribution type: paper
Keywords: ChatGPT; Gender Bias; Inclusive Language; Large Language Models (LLMs); Natural Language Processing (NLP)
Content language: English
Conference name: 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024 - June 25–27, 2024
Conference year: 2024
Proceedings title: Natural Language Processing and Information Systems: 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Turin, Italy, June 25–27, 2024, Proceedings, Part I
ISBN of the proceedings volume: 9783031702389
Series: Lecture Notes in Computer Science
Publication date: 2024
Volume number: 14762 LNCS
First page: 333
Last page: 347
DOI of the contribution: https://dx.doi.org/10.1007/978-3-031-70239-6_23
Full text: none
Appears in types: 02 - Conference contribution

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/517899

Bicocca Open Archive
