Document - Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification

2021

Conference article, Open Access

Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification

Moreo A., Pedrotti A., Sebastiani F.

Transfer learning, Cross-lingual text classification, Ensemble learning, Word embeddings

Funnelling (Fun) is a method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is made by a metaclassifier that takes this vector as its input. In this paper we describe Generalized Funnelling (gFun), a generalization of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the document. We describe an instance of gFun in which the metaclassifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated with other embedded representations that embody other types of correlations. We report preliminary results obtained on a large standard dataset for multilingual multilabel text classification.
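
The two-tier data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: `CentroidClassifier`, `fit_gfun`, `shared_space_view` and `PROJECTIONS` are hypothetical names; the toy classifier outputs softmax scores rather than truly calibrated posteriors; views are aggregated by concatenation; the task is single-label rather than multilabel; and the metaclassifier is trained on the training documents' own posteriors.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn arbitrary scores into a probability vector."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CentroidClassifier:
    """Toy probabilistic classifier (hypothetical stand-in, not calibrated):
    softmax over negative squared distances to per-class centroids."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] += 1
        self.classes_ = sorted(sums)  # same class order for every language
        self.centroids_ = [[v / counts[c] for v in sums[c]] for c in self.classes_]
        return self

    def predict_proba(self, x):
        return softmax([-sum((a - b) ** 2 for a, b in zip(x, c))
                        for c in self.centroids_])

    def predict(self, x):
        probs = self.predict_proba(x)
        return self.classes_[probs.index(max(probs))]


def fit_gfun(train_by_lang, extra_views=()):
    """train_by_lang maps language -> (X, y); X may have a different
    dimensionality per language. Returns (represent, meta)."""
    first_tier = {lang: CentroidClassifier().fit(X, y)
                  for lang, (X, y) in train_by_lang.items()}

    def posterior_view(lang, x):
        # The view used by Fun: one posterior probability per class.
        return first_tier[lang].predict_proba(x)

    views = [posterior_view, *extra_views]

    def represent(lang, x):
        # Assumption: views are aggregated by concatenation.
        vector = []
        for view in views:
            vector.extend(view(lang, x))
        return vector

    meta_X, meta_y = [], []
    for lang, (X, y) in train_by_lang.items():
        for x, label in zip(X, y):
            meta_X.append(represent(lang, x))
            meta_y.append(label)
    meta = CentroidClassifier().fit(meta_X, meta_y)
    return represent, meta


# Hypothetical second view: fixed per-language projections into a shared 2-D space.
PROJECTIONS = {
    "en": [[1.0, 0.0], [0.0, 1.0]],
    "it": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
}


def shared_space_view(lang, x):
    return [sum(w * v for w, v in zip(row, x)) for row in PROJECTIONS[lang]]


# Toy data: English documents are 2-D, Italian documents are 3-D.
train = {
    "en": ([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
           ["sport", "sport", "politics", "politics"]),
    "it": ([[1.0, 0.0, 0.0], [0.8, 0.1, 0.1], [0.0, 0.0, 1.0], [0.1, 0.2, 0.9]],
           ["sport", "sport", "politics", "politics"]),
}

represent, meta = fit_gfun(train, extra_views=[shared_space_view])
prediction_it = meta.predict(represent("it", [0.9, 0.0, 0.1]))
prediction_en = meta.predict(represent("en", [0.1, 0.8]))
print(prediction_it, prediction_en)
```

Calling `fit_gfun(train)` with no extra views reduces the sketch to the Fun setting, where the metaclassifier sees only the posterior-probability vector. In both cases a single metaclassifier serves all languages because every view has the same dimensionality regardless of the input language.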

Source: IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21

Cite as

BibTeX entry

@inproceedings{oai:it.cnr:prodotti:457947,
	title = {Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification},
	author = {Moreo A. and Pedrotti A. and Sebastiani F.},
	booktitle = {IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21},
	year = {2021}
}

CNR authors and affiliations

CNR authors

Moreo Fernandez, Alejandro David
0000-0002-0377-1025
Pedrotti, Andrea
0000-0002-2322-7043
Sebastiani, Fabrizio
0000-0003-4221-6427

Laboratories

Artificial Intelligence for Media and Humanities (2021-ongoing)

Download

CNR ExploRA

Bibliographic record

ISTI Repository

Published version

Also available from

ceur-ws.org
