2001
Conference article
Unknown
Adaptive web caching using decision trees
Bonchi F., Fenu R., Giannotti F., Gozzi C., Manco G., Nanni M., Pedreschi D., Renso C., Ruggieri S., Sannais L.
An abstract is not available.
Source: SDM01 Workshop on Web Mining, Chicago, April 2001
See at:
CNR ExploRA
2000
Report
Unknown
MineFAST: intelligent web caching based on data mining
Bonchi F., Fenu R., Giannotti F., Manco G., Nanni M., Pedreschi D., Renso C., Ruggieri S., Sannais L.
An abstract is not available.
Source: Project report, MineFAST, pp. 1–101, 2000
See at:
CNR ExploRA
2004
Conference article
Closed Access
YaDT: Yet another Decision Tree builder
Salvatore Ruggieri
YaDT is a from-scratch main-memory implementation of the C4.5-like decision tree algorithm. Our presentation will be focused on the design principles that allowed for obtaining an extremely efficient system. Experimental results are reported comparing YaDT with Weka, dti, Xelopes and (E)C4.5.
Source: 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004), pp. 260–265, Boca Raton, FL, USA, 15-17 November 2004
DOI: 10.1109/ictai.2004.123
See at:
doi.org | www.computer.org | CNR ExploRA
2011
Contribution to book
Open Access
Who/where are my new customers?
Rinzivillo Salvatore, Ruggieri Salvatore
We present a knowledge discovery case study on customer classification having the objective of mining the distinctive characteristics of new customers of a service of tax return. Two general approaches are described. The first one, a symbolic approach, is based on extracting and ranking classification rules on the basis of significativeness measures defined on the 4-fold contingency table of a rule. The second one, a spatial approach, is based on extracting geographic areas with predominant presence of new customers.
Source: Emerging Intelligent Technologies in Industry, edited by Dominik Ryżko, Henryk Rybiński, Piotr Gawrysiak, Marzena Kryszkiewicz, pp. 307. Berlin/Heidelberg: Springer-Verlag, 2011
DOI: 10.1007/978-3-642-22732-5_25
See at:
www.di.unipi.it | doi.org | link.springer.com | CNR ExploRA
2012
Conference article
Restricted
Computational complexities of inclusion queries over polyhedral sets
Eirinakis P., Ruggieri S., Subramani K., Wojciechowski P.
In this paper we discuss the computational complexities of procedures for inclusion queries over polyhedral sets. The polyhedral sets that we consider occur in a wide range of applications, ranging from logistics to program verification. The goal of our study is to establish boundaries between hard and easy problems in this context.
Source: International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA, 9-11 January 2012
See at:
www.cs.uic.edu | CNR ExploRA
2013
Conference article
Open Access
Learning from polyhedral sets
Ruggieri S.
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on, can readily be defined by means of linear systems with parameters. In this paper, we investigate the problem of learning a parameterized linear system whose class of polyhedra includes a given set of example polyhedral sets and it is minimal.
Source: IJCAI 2013 - Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1069–1075, Beijing, China, 3-9 August 2013
See at:
ijcai.org | CNR ExploRA
2001
Journal article
Restricted
Web log data warehousing and mining for intelligent web caching
Bonchi F., Giannotti F., Gozzi C., Manco G., Nanni M., Pedreschi D., Renso C., Ruggieri S.
We introduce intelligent web caching algorithms that employ predictive models of web requests; the general idea is to extend the least recently used (LRU) policy of web and proxy servers by making it sensitive to web access models extracted from web log data using data mining techniques. Two approaches have been studied in particular, frequent patterns and decision trees. The experimental results of the new algorithms show substantial improvement over existing LRU-based caching techniques, in terms of hit rate. We designed and developed a prototypical system, which supports data warehousing of web log data, extraction of data mining models and simulation of the web caching algorithms.
Source: Data & knowledge engineering 39 (2001): 165–189. doi:10.1016/S0169-023X(01)00038-6
DOI: 10.1016/s0169-023x(01)00038-6
See at:
Data & Knowledge Engineering | CNR ExploRA
2010
Conference article
Closed Access
DCUBE: Discrimination Discovery in Databases
Pedreschi D., Turini F., Ruggieri S.
Discrimination discovery in databases consists in finding unfair practices against minorities which are hidden in a dataset of historical decisions. The DCUBE system implements the approach of [5], which is based on classification rule extraction and analysis, by centering the analysis phase around an Oracle database. The proposed demonstration guides the audience through the legal issues about discrimination hidden in data, and through several legally-grounded analyses to unveil discriminatory situations. The SIGMOD attendees will freely pose complex discrimination analysis queries over the database of extracted classification rules, once they are presented with the database relational schema, a few ad-hoc functions and procedures, and several snippets of SQL queries for discrimination discovery.
Source: ACM International Conference on Management of Data (SIGMOD 2010), pp. 1127–1130, Indianapolis, IN, 6-11 June 2010
DOI: 10.1145/1807167.1807298
See at:
dl.acm.org | doi.org | CNR ExploRA
2012
Conference article
Restricted
Subtree replacement in decision tree simplification
Ruggieri S.
The current availability of efficient algorithms for decision tree induction makes intricate post-processing techniques worth to be investigated both for efficiency and effectiveness. We study the simplification operator of subtree replacement, also known as grafting, originally implemented in the C4.5 system. We present a parametric bottom-up algorithm integrating grafting with the standard pruning operator, and analyze its complexity in terms of the number of nodes visited. Immediate instances of the parametric algorithm include extensions of error based, reduced error, minimum error, and pessimistic error pruning. Experimental results show that the computational cost of grafting is paid off by statistically significant smaller trees without accuracy loss.
Source: 12th SIAM Conference on Data Mining, pp. 379–390, Anaheim, California USA, 26-28 April 2012
See at:
siam.omnibooksonline.com | CNR ExploRA
2013
Contribution to book
Restricted
Discrimination Data Analysis: A Multi-disciplinary Bibliography
Romei A., Ruggieri S.
Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on discrimination data analysis, with the intended objective to provide a common basis to researchers from a multi-disciplinary perspective. We cover legal, sociological, economic and computer science references.
Source: Discrimination and Privacy in the Information Society, edited by Bart Custers, Toon Calders, Bart Schermer, Tal Zarsky, pp. 109–135, 2013
DOI: 10.1007/978-3-642-30487-3_6
See at:
doi.org | CNR ExploRA
2013
Conference article
Open Access
Data anonymity meets non-discrimination
Ruggieri S.
We investigate the relation between t-closeness, a well-known model of data anonymization, and alpha-protection, a model of data discrimination. We show that t-closeness implies bd(t)-protection, for a bound function bd() depending on the discrimination measure at hand. This allows us to adapt an inference control method, the Mondrian multidimensional generalization technique, to the purpose of non-discrimination data protection. The parallel between the two analytical models raises intriguing issues on the interplay between data anonymization and nondiscrimination research in data mining.
Source: ICDMW 2013 - IEEE 13th International Conference on Data Mining Workshops, pp. 875–882, Dallas, Texas, USA, 7-10 December 2013
DOI: 10.1109/icdmw.2013.56
See at:
www.di.unipi.it | doi.org | ieeexplore.ieee.org | CNR ExploRA
2013
Contribution to book
Restricted
The discovery of discrimination
Pedreschi D., Ruggieri S., Turini F.
Discrimination discovery from data consists in the extraction of discriminatory situations and practices hidden in a large amount of historical decision records. We discuss the challenging problems in discrimination discovery, and present, in a unified form, a framework based on classification rules extraction and filtering on the basis of legally-grounded interestingness measures. The framework is implemented in the publicly available DCUBE tool. As a running example, we use a public dataset on credit scoring.
Source: Discrimination and Privacy in the Information Society. Data Mining and Profiling in Large Databases, edited by Bart Custers, Toon Calders, Bart Schermer, Tal Zarsky, pp. 91–108. Berlin Heidelberg: Springer, 2013
DOI: 10.1007/978-3-642-30487-3_5
See at:
doi.org | link.springer.com | CNR ExploRA
2018
Report
Open Access
Assessing the stability of interpretable models
Guidotti R., Ruggieri S.
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned.
Source: ISTI Technical reports, 2018
Project(s): SoBigData
See at:
arxiv.org | ISTI Repository | CNR ExploRA
2019
Conference article
Open Access
On the stability of interpretable models
Guidotti R., Ruggieri S.
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model's construction may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise awareness and attention of the scientific community on the need of a stability impact assessment of interpretable models.
Source: IJCNN 2019 - International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14-19 July, 2019
DOI: 10.1109/ijcnn.2019.8852158
DOI: 10.48550/arxiv.1810.09352
Project(s): SoBigData
See at:
arXiv.org e-Print Archive | arxiv.org | ISTI Repository | doi.org | doi.org | ieeexplore.ieee.org | CNR ExploRA
2016
Journal article
Open Access
Big data research in Italy: a perspective
Bergamaschi S., Carlini E., Ceci M., Furletti B., Giannotti F., Malerba D., Mezzanzanica M., Monreale A., Pasi G., Pedreschi D., Perego R., Ruggieri S.
The aim of this article is to synthetically describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article has the objective of offering a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation to diverse domains.
Source: Engineering (Beijing) 2 (2016): 163–170. doi:10.1016/J.ENG.2016.02.011
DOI: 10.1016/j.eng.2016.02.011
See at:
doi.org | ISTI Repository | Engineering | CNR ExploRA
2023
Conference article
Open Access
Trustworthy AI at KDD Lab
Giannotti F., Guidotti R., Monreale A., Pappalardo L., Pedreschi D., Pellungrini R., Pratesi F., Rinzivillo S., Ruggieri S., Setzu M., Deluca R.
This document summarizes the activities regarding the development of Responsible AI (Responsible Artificial Intelligence) conducted by the Knowledge Discovery and Data mining group (KDD-Lab), a joint research group of the Institute of Information Science and Technologies "Alessandro Faedo" (ISTI) of the National Research Council of Italy (CNR), the Department of Computer Science of the University of Pisa, and the Scuola Normale Superiore of Pisa.
Source: Ital-IA 2023, pp. 388–393, Pisa, Italy, 29-30 May 2023
Project(s): SoBigData-PlusPlus
See at:
ceur-ws.org | ISTI Repository | CNR ExploRA
2011
Conference article
Open Access
k-NN as an implementation of situation testing for discrimination discovery and prevention
Luong Binh Thanh, Ruggieri Salvatore, Turini Franco
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we can observe a significant difference of treatment among its neighbors belonging to a protected-by-law group and its neighbors not belonging to it. Discrimination discovery boils down to extracting a classification model from the labeled tuples. Discrimination prevention is tackled by changing the decision value for tuples labeled as discriminated before training a classifier. The approach of this paper overcomes legal weaknesses and technical limitations of existing proposals.
Source: 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pp. 502–510, San Diego, California, USA, August 21-24 2011
DOI: 10.1145/2020408.2020488
See at:
www.di.unipi.it | doi.org | CNR ExploRA
2018
Contribution to book
Open Access
How data mining and machine learning evolved from relational data base to data science
Amato G., Candela L., Castelli D., Esuli A., Falchi F., Gennaro C., Giannotti F., Monreale A., Nanni M., Pagano P., Pappalardo L., Pedreschi D., Pratesi F., Rabitti F., Rinzivillo S., Rossetti G., Ruggieri S., Sebastiani F., Tesconi M.
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today.
Source: A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, edited by Sergio Flesca, Sergio Greco, Elio Masciari, Domenico Saccà, pp. 287–306, 2018
DOI: 10.1007/978-3-319-61893-7_17
See at:
arpi.unipi.it | ISTI Repository | doi.org | link.springer.com | CNR ExploRA