Bicocca Open Archive

Bianchi, F., Terragni, S., Hovy, D. (2021). Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 759-766). 209 N Eighth Street, Stroudsburg, PA 18360, USA: Association for Computational Linguistics (ACL) [10.18653/v1/2021.acl-short.96].

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

Bianchi, Federico; Terragni, Silvia; Hovy, Dirk

2021

Abstract

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret. Recently, neural topic models have shown improvements in overall coherence. Concurrently, contextual embeddings have advanced the state of the art of neural models in general. In this paper, we combine contextualized representations with neural topic models. We find that our approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models. Our results indicate that future improvements in language models will translate into better topic models.

Type of contribution: slides + paper
Keywords: topic modeling; natural language processing
Content language: English
Conference name: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021, 1 August 2021 through 6 August 2021
Conference year: 2021
Proceedings title: ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
ISBN of the proceedings volume: 978-195408552-7
Publication date: 2021
Volume number: 2
First page: 759
Last page: 766
DOI of the contribution: https://dx.doi.org/10.18653/v1/2021.acl-short.96
Full text: open access
Appears in collections: 02 - Conference contribution

Files in this record:

File	Size	Format
unpaywall-bitstream-979328762.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	275.31 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/332958
