2001 Conference article Unknown

Adaptive web caching using decision trees
Bonchi F., Fenu R., Giannotti F., Gozzi C., Manco G., Nanni M., Pedreschi D., Renso C., Ruggieri S., Sannais L.
An abstract is not available. Source: SDM01 Workshop on Web Mining, Chicago, April 2001

See at: CNR ExploRA

2000 Report Unknown

MineFAST: intelligent web caching based on data mining
Bonchi F., Fenu R., Giannotti F., Manco G., Nanni M., Pedreschi D., Renso C., Ruggieri S., Sannais L.
An abstract is not available. Source: Project report, MineFAST, pp.1–101, 2000

See at: CNR ExploRA

2004 Conference article Closed Access

YaDT: Yet another Decision Tree builder
Salvatore Ruggieri
YaDT is a from-scratch main-memory implementation of the C4.5-like decision tree algorithm. Our presentation will be focused on the design principles that allowed for obtaining an extremely efficient system. Experimental results are reported comparing YaDT with Weka, dti, Xelopes and (E)C4.5. Source: 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004), pp. 260–265, Boca Raton, FL, USA, 15-17 November 2004
DOI: 10.1109/ictai.2004.123

See at: doi.org Restricted | www.computer.org | CNR ExploRA

2011 Contribution to book Open Access

Who/where are my new customers?
Rinzivillo Salvatore, Ruggieri Salvatore
We present a knowledge discovery case study on customer classification having the objective of mining the distinctive characteristics of new customers of a service of tax return. Two general approaches are described. The first one, a symbolic approach, is based on extracting and ranking classification rules on the basis of significativeness measures defined on the 4-fold contingency table of a rule. The second one, a spatial approach, is based on extracting geographic areas with predominant presence of new customers. Source: Emerging Intelligent Technologies in Industry, edited by Dominik Ryżko, Henryk Rybiński, Piotr Gawrysiak, Marzena Kryszkiewicz, pp. 307. Berlin/Heidelberg: Springer-Verlag, 2011
DOI: 10.1007/978-3-642-22732-5_25

See at: www.di.unipi.it Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2012 Conference article Open Access

Discovering gender discrimination in project funding
Romei A., Ruggieri S., Turini F.
The selection of projects for funding can hide discriminatory decisions. We present a case study investigating gender discrimination in a dataset of scientific research proposals submitted to an Italian national call. The method for the analysis relies on a data mining classification strategy that is inspired by a legal methodology for proving evidence of social discrimination against protected-by-law groups. Source: IEEE, 12th International Conference on Data Mining Workshops, ICDMW 2012, Brussels, Belgium, 10 December 2012
DOI: 10.1109/icdmw.2012.39

See at: www.di.unipi.it Open Access | doi.org Restricted | ieeexplore.ieee.org | CNR ExploRA

2012 Conference article Restricted

Computational complexities of inclusion queries over polyhedral sets
Eirinakis P., Ruggieri S., Subramani K., Wojciechowski P.
In this paper we discuss the computational complexities of procedures for inclusion queries over polyhedral sets. The polyhedral sets that we consider occur in a wide range of applications, ranging from logistics to program verification. The goal of our study is to establish boundaries between hard and easy problems in this context. Source: International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA, 9-11 January 2012

See at: www.cs.uic.edu Restricted | CNR ExploRA

2013 Conference article Open Access

Learning from polyhedral sets
Ruggieri S.
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on, can readily be defined by means of linear systems with parameters. In this paper, we investigate the problem of learning a parameterized linear system whose class of polyhedra includes a given set of example polyhedral sets and is minimal. Source: IJCAI 2013 - Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1069–1075, Beijing, China, 3-9 August 2013

See at: ijcai.org Open Access | CNR ExploRA

2001 Journal article Restricted

Web log data warehousing and mining for intelligent web caching
Bonchi F., Giannotti F., Gozzi C., Manco G., Nanni M., Pedreschi D., Renso C., Ruggieri S.
We introduce intelligent web caching algorithms that employ predictive models of web requests; the general idea is to extend the least recently used (LRU) policy of web and proxy servers by making it sensitive to web access models extracted from web log data using data mining techniques. Two approaches have been studied in particular, frequent patterns and decision trees. The experimental results of the new algorithms show substantial improvement over existing LRU-based caching techniques, in terms of hit rate. We designed and developed a prototypical system, which supports data warehousing of web log data, extraction of data mining models and simulation of the web caching algorithms. Source: Data & knowledge engineering 39 (2001): 165–189. doi:10.1016/S0169-023X(01)00038-6
DOI: 10.1016/s0169-023x(01)00038-6

See at: Data & Knowledge Engineering Restricted | CNR ExploRA

2010 Conference article Closed Access

DCUBE: Discrimination Discovery in Databases
Pedreschi D., Turini F., Ruggieri S.
Discrimination discovery in databases consists in finding unfair practices against minorities which are hidden in a dataset of historical decisions. The DCUBE system implements the approach of [5], which is based on classification rule extraction and analysis, by centering the analysis phase around an Oracle database. The proposed demonstration guides the audience through the legal issues about discrimination hidden in data, and through several legally-grounded analyses to unveil discriminatory situations. The SIGMOD attendees will freely pose complex discrimination analysis queries over the database of extracted classification rules, once they are presented with the database relational schema, a few ad-hoc functions and procedures, and several snippets of SQL queries for discrimination discovery. Source: ACM International Conference on Management of Data (SIGMOD 2010), pp. 1127–1130, Indianapolis, IN, 6-11 June 2010
DOI: 10.1145/1807167.1807298

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2012 Conference article Restricted

Subtree replacement in decision tree simplification
Ruggieri S.
The current availability of efficient algorithms for decision tree induction makes intricate post-processing techniques worth investigating both for efficiency and effectiveness. We study the simplification operator of subtree replacement, also known as grafting, originally implemented in the C4.5 system. We present a parametric bottom-up algorithm integrating grafting with the standard pruning operator, and analyze its complexity in terms of the number of nodes visited. Immediate instances of the parametric algorithm include extensions of error based, reduced error, minimum error, and pessimistic error pruning. Experimental results show that the computational cost of grafting is paid off by statistically significant smaller trees without accuracy loss. Source: 12th SIAM Conference on Data Mining, pp. 379–390, Anaheim, California USA, 26-28 April 2012

See at: siam.omnibooksonline.com Restricted | CNR ExploRA

2013 Contribution to book Restricted

Discrimination Data Analysis: A Multi-disciplinary Bibliography
Romei A., Ruggieri S.
Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on discrimination data analysis, with the intended objective to provide a common basis to researchers from a multi-disciplinary perspective. We cover legal, sociological, economic and computer science references. Source: Discrimination and Privacy in the Information Society, edited by Custers, Bart and Calders, Toon and Schermer, Bart and Zarsky, Tal, pp. 109–135, 2013
DOI: 10.1007/978-3-642-30487-3_6

See at: doi.org Restricted | CNR ExploRA

2013 Conference article Open Access

Data anonymity meets non-discrimination
Ruggieri S.
We investigate the relation between t-closeness, a well-known model of data anonymization, and alpha-protection, a model of data discrimination. We show that t-closeness implies bd(t)-protection, for a bound function bd() depending on the discrimination measure at hand. This allows us to adapt an inference control method, the Mondrian multidimensional generalization technique, to the purpose of non-discrimination data protection. The parallel between the two analytical models raises intriguing issues on the interplay between data anonymization and nondiscrimination research in data mining. Source: ICDMW 2013 - IEEE 13th International Conference on Data Mining Workshops, pp. 875–882, Dallas, Texas, USA, 7-10 December 2013
DOI: 10.1109/icdmw.2013.56

See at: www.di.unipi.it Open Access | doi.org Restricted | ieeexplore.ieee.org | CNR ExploRA

2013 Contribution to book Restricted

The discovery of discrimination
Pedreschi D., Ruggieri S., Turini F.
Discrimination discovery from data consists in the extraction of discriminatory situations and practices hidden in a large amount of historical decision records. We discuss the challenging problems in discrimination discovery, and present, in a unified form, a framework based on classification rules extraction and filtering on the basis of legally-grounded interestingness measures. The framework is implemented in the publicly available DCUBE tool. As a running example, we use a public dataset on credit scoring. Source: Discrimination and Privacy in the Information Society. Data Mining and Profiling in Large Databases, edited by Bart Custers, Toon Calders, Bart Schermer, Tal Zarsky, pp. 91–108. Berlin Heidelberg: Springer, 2013
DOI: 10.1007/978-3-642-30487-3_5

See at: doi.org Restricted | link.springer.com | CNR ExploRA

2014 Journal article Open Access

Decision tree building on multi-core using FastFlow
Aldinucci M., Ruggieri S., Torquati M.
The whole computer hardware industry embraced the multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an in-depth study of the parallelisation of an implementation of the C4.5 algorithm for multi-core architectures. We characterise elapsed time lower bounds for the forms of parallelisations adopted and achieve close to optimal performance. Our implementation is based on the FastFlow parallel programming environment, and it requires minimal changes to the original sequential code. Copyright © 2013 John Wiley & Sons, Ltd. Source: Concurrency and computation 26 (2014): 800–820. doi:10.1002/cpe.3063
DOI: 10.1002/cpe.3063

See at: Concurrency and Computation Practice and Experience Open Access | Concurrency and Computation Practice and Experience Restricted | onlinelibrary.wiley.com | CNR ExploRA

2018 Report Open Access

Assessing the stability of interpretable models
Guidotti R., Ruggieri S.
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned. Source: ISTI Technical reports, 2018
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository | CNR ExploRA

2019 Conference article Open Access

On the stability of interpretable models
Guidotti R., Ruggieri S.
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model's construction may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise awareness and attention of the scientific community on the need of a stability impact assessment of interpretable models. Source: IJCNN 2019 - International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14-19 July, 2019
DOI: 10.1109/ijcnn.2019.8852158
DOI: 10.48550/arxiv.1810.09352
Project(s): SoBigData via OpenAIRE

2016 Journal article Open Access

Big data research in Italy: a perspective
Bergamaschi S., Carlini E., Ceci M., Furletti B., Giannotti F., Malerba D., Mezzanzanica M., Monreale A., Pasi G., Pedreschi D., Perego R., Ruggieri S.
The aim of this article is to synthetically describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article has the objective of offering a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation to diverse domains. Source: Engineering (Beijing) 2 (2016): 163–170. doi:10.1016/J.ENG.2016.02.011
DOI: 10.1016/j.eng.2016.02.011

See at: doi.org Open Access | ISTI Repository | Engineering | CNR ExploRA

2023 Conference article Open Access

Trustworthy AI at KDD Lab
Giannotti F., Guidotti R., Monreale A., Pappalardo L., Pedreschi D., Pellungrini R., Pratesi F., Rinzivillo S., Ruggieri S., Setzu M., Deluca R.
This document summarizes the activities regarding the development of Responsible AI (Responsible Artificial Intelligence) conducted by the Knowledge Discovery and Data mining group (KDD-Lab), a joint research group of the Institute of Information Science and Technologies "Alessandro Faedo" (ISTI) of the National Research Council of Italy (CNR), the Department of Computer Science of the University of Pisa, and the Scuola Normale Superiore of Pisa. Source: Ital-IA 2023, pp. 388–393, Pisa, Italy, 29-30 May 2023
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2011 Conference article Open Access

k-NN as an implementation of situation testing for discrimination discovery and prevention
Luong Binh Thanh, Ruggieri Salvatore, Turini Franco
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we can observe a significant difference of treatment among its neighbors belonging to a protected-by-law group and its neighbors not belonging to it. Discrimination discovery boils down to extracting a classification model from the labeled tuples. Discrimination prevention is tackled by changing the decision value for tuples labeled as discriminated before training a classifier. The approach of this paper overcomes legal weaknesses and technical limitations of existing proposals. Source: 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pp. 502–510, San Diego, California, USA, August 21-24 2011
DOI: 10.1145/2020408.2020488

See at: www.di.unipi.it Open Access | doi.org Restricted | CNR ExploRA

2018 Contribution to book Open Access

How data mining and machine learning evolved from relational data base to data science
Amato G., Candela L., Castelli D., Esuli A., Falchi F., Gennaro C., Giannotti F., Monreale A., Nanni M., Pagano P., Pappalardo L., Pedreschi D., Pratesi F., Rabitti F., Rinzivillo S., Rossetti G., Ruggieri S., Sebastiani F., Tesconi M.
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today. Source: A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, edited by Sergio Flesca, Sergio Greco, Elio Masciari, Domenico Saccà, pp. 287–306, 2018
DOI: 10.1007/978-3-319-61893-7_17

See at: arpi.unipi.it Open Access | ISTI Repository | doi.org Restricted | link.springer.com | CNR ExploRA