This paper describes the ICoN corpus, a corpus of academic written Italian, some of the research directions it could open, and some of the first outcomes of research conducted on it. The ICoN corpus includes 2,115,000 tokens written by students with Italian as L2 (level B2 or higher) and 1,769,000 tokens written by students with Italian as L1; this makes it the largest corpus of its kind. The texts come from the online examinations taken by 787 different students for the ICoN Degree Program in Italian Language and Culture for foreign students and Italian citizens residing abroad. The texts were produced by students with 41 different L1s, 18 of which are represented in the corpus by more than 20,000 tokens. The corpus is encoded in XML files; it can be freely queried online and is available upon request for research purposes. The paper also discusses preliminary research on collocations, showing that, in the texts included in the corpus, learners and native speakers use multiword expressions in a similar way, but learners can overuse relatively infrequent forms of multiword adverbials or use some adverbials in a non-standard way.

Cominetti, F., Tavosanis, M. (2018). The ICoN Corpus of Academic Written Italian (L1 and L2). In H. Isahara, B. Maegaard, S. Piperidis, C. Cieri, T. Declerck, K. Hasida, et al. (Eds.), LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp. 4077-4083). European Language Resources Association (ELRA).

The ICoN Corpus of Academic Written Italian (L1 and L2)

Cominetti, F;
2018

Book chapter or essay
Collocations; Corpus linguistics; Italian; Learners; Multiword expressions
English
LREC 2018 - 11th International Conference on Language Resources and Evaluation
Isahara, H; Maegaard, B; Piperidis, S; Cieri, C; Declerck, T; Hasida, K; Mazo, H; Choukri, K; Goggi, S; Mariani, J; Moreno, A; Calzolari, N; Odijk, J; Tokunaga, T
2018
9791095546009
European Language Resources Association (ELRA)
4077
4083
open
Files in this item:
Cominetti-2018-LREC 2018-VoR.pdf (open access)
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 379.21 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/549362