Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification

2019

Journal article Open Access

Esuli A., Moreo Fernandez A. D., Sebastiani F.

Keywords: Computer Science - Machine Learning; Machine Learning (stat.ML); Statistics - Machine Learning; Management and Accounting; Information Systems; Utility Theory; Information Retrieval (cs.IR); Semi-automated Text Classification; FOS: Computer and information sciences; Computer Science - Information Retrieval; Computer Science Applications; Artificial Intelligence (cs.AI); General Business; E-discovery; Technology-Assisted Review; Machine Learning (cs.LG); Computer Science - Artificial Intelligence

Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when "naïvely" classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle "multilabel" CLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespectively of language, are classified by the same (2nd-tier) classifier. For this classifier, all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available multilingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
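
The two-tier idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the class `OneVsRestLogReg`, the functions `train_funnel` and `classify`, the toy data, and the choice of logistic regression as the base learner are all assumptions made for illustration. The abstract does not say how the 2nd-tier training vectors are produced, so the sketch takes the simplest route and re-classifies each training document with the 1st-tier classifier trained on it.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneVsRestLogReg:
    """Hypothetical tiny multilabel learner: one logistic regressor per class."""

    def __init__(self, n_features, n_classes, lr=0.5, epochs=300):
        # One weight vector per class; the last entry is the bias term.
        self.w = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        self.lr, self.epochs = lr, epochs

    def _score(self, w, x):
        return sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))

    def fit(self, X, Y):
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(self.epochs):
            for x, y in zip(X, Y):
                for c, w in enumerate(self.w):
                    err = self._score(w, x) - y[c]
                    for j, xj in enumerate(x):
                        w[j] -= self.lr * err * xj
                    w[-1] -= self.lr * err
        return self

    def predict_proba(self, x):
        # One posterior probability per class.
        return [self._score(w, x) for w in self.w]


def train_funnel(train_by_lang, n_classes):
    """train_by_lang maps a language to (X, Y); X lives in that language's own feature space."""
    first_tier, Z, Y_all = {}, [], []
    for lang, (X, Y) in train_by_lang.items():
        clf = OneVsRestLogReg(len(X[0]), n_classes).fit(X, Y)
        first_tier[lang] = clf
        # Simplification (assumption): project training documents with the
        # classifier that was trained on them.
        Z.extend(clf.predict_proba(x) for x in X)
        Y_all.extend(Y)
    # The 2nd-tier classifier sees |C|-dimensional posterior vectors from all languages.
    meta = OneVsRestLogReg(n_classes, n_classes).fit(Z, Y_all)
    return first_tier, meta


def classify(first_tier, meta, lang, x, threshold=0.5):
    z = first_tier[lang].predict_proba(x)  # language-independent representation
    return [int(p >= threshold) for p in meta.predict_proba(z)]


# Toy data (assumption): two classes, two languages with different feature spaces.
labels = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
train = {
    "en": ([[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]], labels),
    "it": ([[0, 0, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0]], labels),
}
first_tier, meta = train_funnel(train, n_classes=2)
print(classify(first_tier, meta, "en", [1, 0, 1]))
print(classify(first_tier, meta, "it", [1, 0, 0, 0]))
```

The design point the sketch tries to capture is that the two languages never share a vocabulary: the "en" vectors have three features and the "it" vectors have four, yet both are mapped to vectors of length |C| before reaching the shared classifier, which is therefore trained on the documents of every language.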

Source: ACM Transactions on Information Systems 37 (2019): 1–30. doi:10.1145/3326065

Publisher: Association for Computing Machinery, New York, NY, United States of America

Cite as

BibTeX entry

@article{oai:it.cnr:prodotti:403485,
	title = {Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification},
	author = {Esuli, A. and Moreo Fernandez, A. D. and Sebastiani, F.},
	publisher = {Association for Computing Machinery, New York, NY, United States of America},
	doi = {10.1145/3326065},
	note = {arXiv DOI: 10.48550/arxiv.1901.11459},
	journal = {ACM Transactions on Information Systems},
	volume = {37},
	pages = {1--30},
	year = {2019}
}