Journal article  Open Access

Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification

Esuli A., Moreo Fernandez A. D., Sebastiani F.

Computer Science - Machine Learning, Machine Learning (stat.ML), Statistics - Machine Learning, Management and Accounting, Information Systems, Utility Theory, Information Retrieval (cs.IR), Semi-automated Text Classification, FOS: Computer and information sciences, Computer Science - Information Retrieval, Computer Science Applications, Artificial Intelligence (cs.AI), General Business, E-discovery, Technology-Assisted Review, Machine Learning (cs.LG), Computer Science - Artificial Intelligence

Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when "naïvely" classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle "multilabel" CLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespectively of language, are classified by the same (2nd-tier) classifier. For this classifier all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available multilingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
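The two-tier scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: the logistic-regression base learner, the toy data, and all function names (`fit_binary`, `fit_multilabel`, `fit_funnelling`, `classify`) are assumptions made for illustration. As a further simplification, the 2nd-tier training vectors are obtained here by re-applying the 1st-tier classifiers to their own training documents.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_binary(X, y, epochs=500, lr=0.5):
    """Fit a logistic regression by batch gradient descent.

    Returns the feature weights followed by a bias term.
    (Assumed base learner; any classifier that outputs posteriors would do.)
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = posterior(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def posterior(w, x):
    # zip stops at the end of x, so the trailing bias is added separately.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def fit_multilabel(X, Y):
    """Train one independent binary classifier per class (multilabel setting)."""
    return [fit_binary(X, [row[c] for row in Y]) for c in range(len(Y[0]))]


def posteriors(models, x):
    """Map a document to its vector of |C| posterior probabilities."""
    return [posterior(w, x) for w in models]


def fit_funnelling(train):
    """train maps language -> (X, Y); feature spaces may differ per language."""
    # 1st tier: one language-dependent multilabel classifier per language.
    first_tier = {lang: fit_multilabel(X, Y) for lang, (X, Y) in train.items()}
    # Project every training document into the shared posterior space.
    Z, Y_all = [], []
    for lang, (X, Y) in train.items():
        Z.extend(posteriors(first_tier[lang], x) for x in X)
        Y_all.extend(Y)
    # 2nd tier: a single classifier trained on documents of all languages.
    second_tier = fit_multilabel(Z, Y_all)
    return first_tier, second_tier


def classify(first_tier, second_tier, lang, x, threshold=0.5):
    """Classify a document of language `lang` through both tiers."""
    z = posteriors(first_tier[lang], x)
    return [int(p >= threshold) for p in posteriors(second_tier, z)]


# Toy data (hypothetical): two languages, two classes, different vocabularies.
train = {
    "en": ([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]],
           [[1, 0], [0, 1], [1, 1], [0, 0]]),
    "it": ([[1, 0], [0, 1], [1, 1], [0, 0]],
           [[1, 0], [0, 1], [1, 1], [0, 0]]),
}
first_tier, second_tier = fit_funnelling(train)
labels = classify(first_tier, second_tier, "it", [1, 0])
print(labels)
```

Because every document is mapped to a vector of |C| posterior probabilities, the English and Italian toy documents, which live in feature spaces of different dimensionality, can be pooled to train a single 2nd-tier classifier.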

Source: ACM Transactions on Information Systems 37 (2019): 1–30. doi:10.1145/3326065

Publisher: Association for Computing Machinery, New York, NY, United States of America



BibTeX entry
	title = {Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification},
	author = {Esuli A. and Moreo Fernandez A. D. and Sebastiani F.},
	publisher = {Association for Computing Machinery, New York, NY, United States of America},
	doi = {10.1145/3326065 and 10.48550/arxiv.1901.11459},
	journal = {ACM transactions on information systems},
	volume = {37},
	pages = {1--30},
	year = {2019}

Advanced Research Infrastructure for Archaeological Data Networking in Europe - plus