Word embeddings go to Italy: A comparison of models and training datasets

2015

Conference article Restricted

Berardi G., Esuli A., Marcheggiani D.

Keywords: GloVe, word embeddings, word2vec

In this paper we present some preliminary results on the generation of word embeddings for the Italian language. We compare two popular word representation models, word2vec and GloVe, and train them on two datasets with different stylistic properties. We test the generated word embeddings on a word analogy test derived from the one originally proposed for word2vec, adapted to capture some of the linguistic aspects that are specific to Italian. Results show that the tested models are able to create syntactically and semantically meaningful word embeddings despite the higher morphological complexity of Italian with respect to English. Moreover, we have found that the stylistic properties of the training dataset play a relevant role in the type of information captured by the produced vectors.
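
As a rough illustration of what a word analogy test measures, the sketch below answers a question of the form "a is to b as c is to ?" by vector arithmetic and cosine similarity, the scoring commonly used with the original word2vec analogy set. The vectors are invented toy values, not embeddings from the paper, and the names `toy_vectors`, `solve_analogy` and `analogy_accuracy` are hypothetical. This record does not describe the paper's exact evaluation procedure, so treat this as a sketch under those assumptions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT embeddings produced by the models in the paper.
toy_vectors = {
    "re":     [0.9, 0.8, 0.1],  # "king"
    "regina": [0.9, 0.1, 0.8],  # "queen"
    "uomo":   [0.1, 0.9, 0.1],  # "man"
    "donna":  [0.1, 0.1, 0.9],  # "woman"
    "pizza":  [0.5, 0.5, 0.5],  # unrelated distractor
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the word closest to b - a + c, excluding the question words."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(questions, vectors):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(a, b, c, vectors) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


# "uomo" is to "re" as "donna" is to ...?
questions = [("uomo", "re", "donna", "regina")]
answer = solve_analogy("uomo", "re", "donna", toy_vectors)
accuracy = analogy_accuracy(questions, toy_vectors)
```

With real embeddings the candidate set is the whole vocabulary, and accuracy is usually reported separately for the syntactic and semantic question groups.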

Source: 6th Italian Information Retrieval Workshop, Cagliari, 25-26/05/2015

Cite as

BibTeX entry

@inproceedings{oai:it.cnr:prodotti:344516,
	title = {Word embeddings go to Italy: A comparison of models and training datasets},
	author = {Berardi G. and Esuli A. and Marcheggiani D.},
	booktitle = {6th Italian Information Retrieval Workshop, Cagliari, 25-26/05/2015},
	year = {2015}
}

CNR authors and affiliations

CNR authors

Berardi, Giacomo
Esuli, Andrea (ORCID: 0000-0002-5725-4322)
Marcheggiani, Diego

Laboratories

Networked Multimedia Information System (2002-2020)

Download

CNR ExploRA

Bibliographic record

Also available from

ceur-ws.org
