Moreo A., Pedrotti A., Sebastiani F.
Transfer learning; Cross-lingual text classification; Ensemble learning; Word embeddings
Funnelling (Fun) is a method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input. In this paper we describe Generalized Funnelling (gFun), a generalization of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the document. We describe an instance of gFun in which the metaclassifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated with other embedded representations that embody other types of correlations. We report preliminary results obtained on a large standard dataset for multilingual multilabel text classification.
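The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-word scorer stands in for trained, calibrated 1st-tier classifiers, the `ALIGNED` table stands in for a learned cross-lingual embedding space, concatenation is assumed as the aggregation operator, and the hand-set averaging rule stands in for a trained metaclassifier. All names (`posterior_view`, `embedding_view`, `aggregate`, `metaclassify`, `classify`) are hypothetical.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
# A view-generating function maps a document to a language-independent vector.
ViewFn = Callable[[str], Vector]

CLASSES = ["sports", "politics"]

# Hypothetical per-language cue words; a stand-in for trained 1st-tier classifiers.
CUES: Dict[str, Dict[str, set]] = {
    "en": {"sports": {"match", "goal"}, "politics": {"vote", "election"}},
    "it": {"sports": {"partita", "gol"}, "politics": {"voto", "elezioni"}},
}

# Hypothetical shared embedding space covering words of both languages.
ALIGNED: Dict[str, Vector] = {
    "goal": [1.0, 0.0], "gol": [1.0, 0.0],
    "match": [0.8, 0.1], "partita": [0.8, 0.1],
    "vote": [0.0, 1.0], "voto": [0.0, 1.0],
    "election": [0.1, 0.9], "elezioni": [0.1, 0.9],
}

def posterior_view(lang: str) -> ViewFn:
    """Language-dependent function returning one probability-like score per class."""
    def view(doc: str) -> Vector:
        tokens = doc.lower().split()
        scores = []
        for c in CLASSES:
            hits = sum(1 for t in tokens if t in CUES[lang][c])
            # Sigmoid squashing; a toy substitute for real probability calibration.
            scores.append(1.0 / (1.0 + math.exp(-(2.0 * hits - 1.0))))
        return scores
    return view

def embedding_view(doc: str) -> Vector:
    """Second view: average of the shared-space vectors of the known words."""
    vecs = [ALIGNED[t] for t in doc.lower().split() if t in ALIGNED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def aggregate(views: List[Vector]) -> Vector:
    """Aggregate views by concatenation (an assumption of this sketch)."""
    return [x for v in views for x in v]

def metaclassify(z: Vector, threshold: float = 0.5) -> List[str]:
    """Toy multilabel metaclassifier: average both views per class, then threshold."""
    n = len(CLASSES)
    scores = [(z[i] + z[n + i]) / 2 for i in range(n)]
    return [c for c, s in zip(CLASSES, scores) if s > threshold]

def classify(doc: str, lang: str) -> List[str]:
    views = [posterior_view(lang)(doc), embedding_view(doc)]
    return metaclassify(aggregate(views))

print(classify("the match ended with a late goal", "en"))
print(classify("le elezioni e il voto", "it"))
```

The point of the sketch is that only the view-generating functions depend on the language; the aggregated vector has the same meaning for every language, so a single metaclassifier can serve documents in all of them.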
Source: IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21
@inproceedings{oai:it.cnr:prodotti:457947,
  title = {Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification},
  author = {Moreo A. and Pedrotti A. and Sebastiani F.},
  booktitle = {IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21},
  year = {2021}
}