2020
Journal article  Open Access

Evaluation measures for quantification: an axiomatic approach

Sebastiani F.

Computer Science - Machine Learning  Statistics - Machine Learning  Machine Learning (stat.ML)  Quantification  Supervised prevalence estimation  Library and Information Sciences  Information Systems  Prevalence estimation  Information Retrieval (cs.IR)  Evaluation measures  Computer Science - Information Retrieval  FOS: Computer and information sciences  Artificial Intelligence (cs.AI)  Supervised learning  Machine Learning (cs.LG)  Computer Science - Artificial Intelligence

Quantification is the task of estimating, given a set σ of unlabelled items and a set of classes C = {c1, ..., c|C|}, the prevalence (or "relative frequency") in σ of each class ci ∈ C. While quantification may in principle be solved by classifying each item in σ and counting how many such items have been labelled with ci, it has long been shown that this "classify and count" method yields suboptimal quantification accuracy. As a result, quantification is no longer considered a mere byproduct of classification, and has evolved as a task of its own. While the scientific community has devoted a lot of attention to devising more accurate quantification methods, it has devoted far less to discussing what properties an evaluation measure for quantification (EMQ) should enjoy, and which EMQs should be adopted as a result. This paper lays down a number of properties that an EMQ may or may not enjoy, discusses if (and when) each of these properties is desirable, surveys the EMQs that have been used so far, and discusses whether or not they enjoy these properties. As a result of this investigation, some of the EMQs that have been used in the literature turn out to be severely unfit, while others emerge as closer to what the quantification community actually needs. However, a significant result is that no existing EMQ satisfies all the properties identified as desirable, which indicates that more research is needed in order to identify (or synthesize) a truly adequate EMQ.
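
As a rough illustration of the task described in the abstract, the sketch below estimates class prevalences with the "classify and count" method and scores the estimate against the true prevalences. All names here (`prevalence`, `classify_and_count`, `absolute_error`, the toy threshold classifier, and the toy data) are hypothetical. The abstract does not name any specific EMQ, so mean absolute error is used only as an assumed example of such a measure.

```python
from collections import Counter


def prevalence(labels, classes):
    """Relative frequency of each class in a non-empty list of labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def classify_and_count(classifier, items, classes):
    """Classify every item, then report the fraction assigned to each class."""
    return prevalence([classifier(x) for x in items], classes)


def absolute_error(p_true, p_hat):
    """Mean absolute difference between true and estimated prevalences.

    Assumption: chosen purely for illustration, not taken from the abstract.
    """
    return sum(abs(p_true[c] - p_hat[c]) for c in p_true) / len(p_true)


# Toy data (hypothetical): each item is a score, the classifier is a threshold.
classes = ["pos", "neg"]
items = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
true_labels = ["pos"] * 3 + ["neg"] * 7


def toy_classifier(score):
    return "pos" if score > 0.5 else "neg"


p_true = prevalence(true_labels, classes)
p_hat = classify_and_count(toy_classifier, items, classes)
print(p_true, p_hat, absolute_error(p_true, p_hat))
```

In this toy case the classifier labels five items as "pos" although only three are, so the estimated prevalence of "pos" is inflated. This is the kind of error an EMQ is meant to quantify.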

Source: Information retrieval (Boston) 23 (2020): 255–288. doi:10.1007/s10791-019-09363-y

Publisher: Kluwer Academic Publishers, Boston, United States of America



BibTeX entry
@article{oai:it.cnr:prodotti:424696,
	title = {Evaluation measures for quantification: an axiomatic approach},
	author = {Sebastiani, F.},
	publisher = {Kluwer Academic Publishers, Boston, United States of America},
	doi = {10.1007/s10791-019-09363-y},
	note = {Also available as arXiv preprint, doi:10.48550/arxiv.1809.01991},
	journal = {Information retrieval (Boston)},
	volume = {23},
	pages = {255--288},
	year = {2020}
}