Document - Italian word embeddings for the medical domain

2024

Conference article Open Access

Italian word embeddings for the medical domain

Cardillo F. A., Debole F.

NLP Distributed Representations

Neural word embeddings have proven valuable in the development of medical applications. For the Italian language, however, there are no publicly available corpora, embeddings, or evaluation resources tailored to this domain. In this paper, we introduce an Italian corpus for the medical domain that includes texts from Wikipedia, medical journals, drug leaflets, and specialized websites. Using this corpus, we train neural word embeddings from scratch. We then evaluate these embeddings on standard evaluation resources that we translated into Italian by exploiting the concept graph in the UMLS Metathesaurus. Despite the relatively small size of the corpus, our experimental results indicate that the new embeddings correlate well with human judgments of the similarity and relatedness of medical concepts. Moreover, these domain-specific embeddings outperform a baseline model trained on the full Wikipedia corpus, which includes the medical pages we used. We believe that our embeddings and the newly introduced textual resources will foster further advances in Italian medical Natural Language Processing.
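The evaluation described in the abstract, comparing embedding-based scores with human judgments on pairs of medical concepts, can be illustrated with a minimal sketch. It assumes cosine similarity as the model score and Spearman rank correlation as the agreement measure; the abstract names neither, so both are assumptions. The vectors, term pairs, and ratings below are invented toy values, not data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical 3-dimensional embeddings (toy values for illustration only).
embeddings = {
    "cuore": [0.9, 0.1, 0.0],
    "miocardio": [0.8, 0.2, 0.1],
    "polmone": [0.3, 0.8, 0.1],
    "aspirina": [0.0, 0.2, 0.9],
}

# Hypothetical human ratings for term pairs (higher means more similar).
rated_pairs = [
    ("cuore", "miocardio", 3.8),
    ("cuore", "polmone", 2.5),
    ("cuore", "aspirina", 1.2),
    ("polmone", "aspirina", 1.0),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

A rank-based measure is a common choice for this kind of benchmark because human ratings and cosine values live on different scales; only the ordering of the pairs is compared.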

Cite as

BibTeX entry

@inproceedings{oai:iris.cnr.it:20.500.14243/505144,
	title = {Italian word embeddings for the medical domain},
	author = {Cardillo, F. A. and Debole, F.},
	year = {2024}
}

CNR authors and affiliations

CNR authors

Cardillo, Franco Alberto
0000-0003-0940-4768
Debole, Franca
0000-0002-0369-6045

Laboratories

Infrastructures for Science (2021-ongoing)

Groups/Services

Servizio Infrastruttura Informatica ISTI e Supporto ai Servizi (2018-ongoing)

Download

CNR IRIS

Bibliographic record
Deposited version

Also available from

aclanthology.org

Projects (via OpenAIRE)

DeepHealth
Deep-Learning and HPC to Boost Biomedical Applications for Health
TAILOR
Foundations of Trustworthy AI - Integrating Reasoning, Learning and Optimization
STARWARS
STormwAteR and WastewAteR networkS heterogeneous data AI-driven management
