2021
Conference article  Open Access

Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification

Moreo A., Pedrotti A., Sebastiani F.

Keywords: Transfer learning, Cross-lingual text classification, Ensemble learning, Word embeddings

Funnelling (Fun) is a method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input. In this paper we describe Generalized Funnelling (gFun), a generalization of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the document. We describe an instance of gFun in which the metaclassifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated with other embedded representations that embody other types of correlations. We describe preliminary results that we have obtained on a large standard dataset for multilingual multilabel text classification.
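
The two-tier data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the label set, the keyword lists, the logistic squashing (a stand-in for trained, calibrated 1st-tier classifiers), and the thresholding rule (a stand-in for a trained metaclassifier) are all hypothetical. The sketch only shows that language-dependent components map documents into one shared, class-indexed vector space, on which a single language-independent decision step operates.

```python
import math
from typing import Callable, Dict, List

# Hypothetical label set; one output dimension per class.
CLASSES = ["sports", "politics"]


def toy_first_tier(keywords: Dict[str, List[str]]) -> Callable[[str], List[float]]:
    """Build a toy language-specific scorer.

    Stand-in for a trained 1st-tier classifier: it returns one score in (0, 1)
    per class. The scores are NOT truly calibrated posteriors.
    """
    def score(doc: str) -> List[float]:
        tokens = doc.lower().split()
        scores = []
        for cls in CLASSES:
            hits = sum(tokens.count(k) for k in keywords[cls])
            # Logistic squashing: 0 hits gives about 0.27, 1 hit about 0.73.
            scores.append(1.0 / (1.0 + math.exp(-(2.0 * hits - 1.0))))
        return scores
    return score


# One 1st-tier component per language, each with its own feature space
# (here: its own vocabulary).
FIRST_TIER: Dict[str, Callable[[str], List[float]]] = {
    "en": toy_first_tier({"sports": ["match", "goal"], "politics": ["vote", "senate"]}),
    "it": toy_first_tier({"sports": ["partita", "gol"], "politics": ["voto", "senato"]}),
}


def meta_classifier(view: List[float], threshold: float = 0.5) -> List[str]:
    """Toy metaclassifier: a single rule shared by all languages.

    It sees only the language-independent vector, never the raw text.
    Multilabel: every class whose score reaches the threshold is assigned.
    """
    return [cls for cls, p in zip(CLASSES, view) if p >= threshold]


def classify(doc: str, lang: str) -> List[str]:
    view = FIRST_TIER[lang](doc)   # language-dependent step
    return meta_classifier(view)   # language-independent step
```

For instance, `classify("the match ended with a late goal", "en")` and `classify("il voto al senato", "it")` go through different 1st-tier components but the same metaclassifier. In gFun, further language-independent views would be combined with this vector before the second tier; how they are aggregated is not specified in this record, so the sketch leaves that step out.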

Source: IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:457947,
	title = {Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification},
	author = {Moreo A. and Pedrotti A. and Sebastiani F.},
	booktitle = {IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21},
	year = {2021}
}
CNR ExploRA: Bibliographic record
ISTI Repository: Published version (Open Access)
Also available from: ceur-ws.org (Open Access)