2004
Conference article
Closed Access
YaDT: Yet another Decision Tree builder
Salvatore Ruggieri
YaDT is a from-scratch main-memory implementation of the C4.5-like decision tree algorithm. Our presentation focuses on the design principles that allowed for obtaining an extremely efficient system. Experimental results are reported comparing YaDT with Weka, dti, Xelopes and (E)C4.5.
Source: 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004), pp. 260–265, Boca Raton, FL, USA, 15-17 November 2004
DOI: 10.1109/ictai.2004.123
See at:
doi.org
| www.computer.org
| CNR ExploRA
2010
Conference article
Open Access
Frequent regular itemset mining
Ruggieri S.
Concise representations of frequent itemsets sacrifice the readability and direct interpretability of the extracted patterns for a data analyst. In this paper, we introduce an extension of itemsets, called regular, with an immediate semantics and interpretability, and a conciseness comparable to closed itemsets. Regular itemsets allow for specifying that an item may or may not be present; that any subset of an itemset may be present; and that any non-empty subset of an itemset may be present. We devise a procedure, called RegularMine, for mining a set of regular itemsets that is a concise representation of frequent itemsets. The procedure computes a covering, in terms of regular itemsets, of the frequent itemsets in the equivalence class of a closed one. We report experimental results on several standard dense and sparse datasets that validate the proposed approach.
Source: 16th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp. 263–272, Washington D.C., USA, 25-28 July 2010
DOI: 10.1145/1835804.1835840
See at:
www.di.unipi.it
| dl.acm.org
| doi.org
| CNR ExploRA
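The three constructs listed in the abstract above (optional item, any subset, any non-empty subset) can be illustrated by enumerating the plain itemsets that a regular itemset denotes. The following is a minimal sketch only: the tagged-tuple encoding (`req`, `opt`, `any`, `some`) and the function names `subsets` and `expand` are assumptions made here for illustration, not the paper's notation and not the RegularMine procedure.

```python
from itertools import chain, combinations, product


def subsets(items, min_size=0):
    """All subsets of `items` with at least `min_size` elements."""
    items = sorted(items)
    return [frozenset(c)
            for r in range(min_size, len(items) + 1)
            for c in combinations(items, r)]


def expand(regular_itemset):
    """Enumerate the plain itemsets denoted by a regular itemset.

    Hypothetical encoding: a list of (kind, items) parts, where kind is
      'req'  - all items must be present
      'opt'  - a single item that may or may not be present
      'any'  - any subset of the items may be present
      'some' - any non-empty subset of the items may be present
    """
    choices = []
    for kind, items in regular_itemset:
        if kind == "req":
            choices.append([frozenset(items)])
        elif kind in ("opt", "any"):
            choices.append(subsets(items, 0))
        elif kind == "some":
            choices.append(subsets(items, 1))
        else:
            raise ValueError(f"unknown part kind: {kind}")
    # Pick one alternative per part; their union is one denoted itemset.
    return {frozenset(chain.from_iterable(combo))
            for combo in product(*choices)}


# 'a' is required, 'b' is optional, and a non-empty subset of {c, d} is present.
pattern = [("req", {"a"}), ("opt", {"b"}), ("some", {"c", "d"})]
denoted = expand(pattern)
```

Under this encoding a single pattern stands for several frequent itemsets at once, which is the sense in which the representation is concise.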
2012
Journal article
Restricted
A complexity perspective on entailment of parameterized linear constraints
Eirinakis P., Ruggieri S., Subramani K., Wojciechowski P.
Extending linear constraints by admitting parameters allows for more abstract problem modeling and reasoning. Much research has focused on demonstrating the usefulness of parameterized linear constraints and on implementing tools that utilize their modeling strength. However, there is no approach that considers basic theoretical tools related to such constraints that allow for reasoning over them. Hence, in this paper we introduce satisfiability with respect to polyhedral sets and entailment for the class of parameterized linear constraints. In order to study the computational complexities of these problems, we relate them to classes of quantified linear implications. The problem of satisfiability with respect to polyhedral sets is then shown to be co-NP-hard. The entailment problem is also shown to be co-NP-hard in its general form. Nevertheless, we characterize some subclasses for which this problem is in P. Furthermore, we examine a weakening and a strengthening extension of the entailment problem. The weak entailment problem is proved to be NP-complete. On the other hand, the strong entailment problem is shown to be co-NP-hard.
Source: Constraints (Dordrecht. Online) 17 (2012): 461–487. doi:10.1007/s10601-012-9127-x
DOI: 10.1007/s10601-012-9127-x
See at:
Constraints
| link.springer.com
| CNR ExploRA
2012
Conference article
Restricted
Computational complexities of inclusion queries over polyhedral sets
Eirinakis P., Ruggieri S., Subramani K., Wojciechowski P.
In this paper we discuss the computational complexities of procedures for inclusion queries over polyhedral sets. The polyhedral sets that we consider occur in a wide range of applications, ranging from logistics to program verification. The goal of our study is to establish boundaries between hard and easy problems in this context.
Source: International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA, 9-11 January 2012
See at:
www.cs.uic.edu
| CNR ExploRA
2012
Conference article
Restricted
Deciding membership in a class of polyhedra
Ruggieri S.
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on can readily be defined by means of linear systems with parameters in constant terms. In this paper, we consider the membership problem of deciding whether a given polyhedron belongs to the class defined by a parameterized linear system. As an example, we are interested in questions such as: "does a given polytope belong to the class of hypercubes?" We show that the membership problem is NP-complete, even when restricting to the 2-dimensional plane. Despite the negative result, the constructive proof allows us to devise a concise decision procedure using constraint logic programming over the reals, namely CLP(R), which searches for a characterization of all instances of a parameterized system that are equivalent to a given polyhedron.
Source: 20th European Conference on Artificial Intelligence, pp. 702–707, Montpellier, France, 27-31 August 2012
DOI: 10.3233/978-1-61499-098-7-702
See at:
ebooks.iospress.nl
| CNR ExploRA
2012
Conference article
Restricted
Subtree replacement in decision tree simplification
Ruggieri S.
The current availability of efficient algorithms for decision tree induction makes intricate post-processing techniques worth investigating both for efficiency and effectiveness. We study the simplification operator of subtree replacement, also known as grafting, originally implemented in the C4.5 system. We present a parametric bottom-up algorithm integrating grafting with the standard pruning operator, and analyze its complexity in terms of the number of nodes visited. Immediate instances of the parametric algorithm include extensions of error based, reduced error, minimum error, and pessimistic error pruning. Experimental results show that the computational cost of grafting is paid off by statistically significantly smaller trees without accuracy loss.
Source: 12th SIAM Conference on Data Mining, pp. 379–390, Anaheim, California, USA, 26-28 April 2012
See at:
siam.omnibooksonline.com
| CNR ExploRA
2013
Contribution to book
Restricted
Discrimination Data Analysis: A Multi-disciplinary Bibliography
Romei A., Ruggieri S.
Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on discrimination data analysis, with the intended objective of providing a common basis to researchers from a multi-disciplinary perspective. We cover legal, sociological, economic and computer science references.
Source: Discrimination and Privacy in the Information Society, edited by Bart Custers, Toon Calders, Bart Schermer, and Tal Zarsky, pp. 109–135, 2013
DOI: 10.1007/978-3-642-30487-3_6
See at:
doi.org
| CNR ExploRA
2013
Conference article
Open Access
Data anonymity meets non-discrimination
Ruggieri S.
We investigate the relation between t-closeness, a well-known model of data anonymization, and alpha-protection, a model of data discrimination. We show that t-closeness implies bd(t)-protection, for a bound function bd() depending on the discrimination measure at hand. This allows us to adapt an inference control method, the Mondrian multidimensional generalization technique, to the purpose of non-discrimination data protection. The parallel between the two analytical models raises intriguing issues on the interplay between data anonymization and non-discrimination research in data mining.
Source: ICDMW 2013 - IEEE 13th International Conference on Data Mining Workshops, pp. 875–882, Dallas, Texas, USA, 7-10 December 2013
DOI: 10.1109/icdmw.2013.56
See at:
www.di.unipi.it
| doi.org
| ieeexplore.ieee.org
| CNR ExploRA
2013
Conference article
Open Access
Learning from polyhedral sets
Ruggieri S.
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on, can readily be defined by means of linear systems with parameters. In this paper, we investigate the problem of learning a parameterized linear system whose class of polyhedra includes a given set of example polyhedral sets and is minimal.
Source: IJCAI 2013 - Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1069–1075, Beijing, China, 3-9 August 2013
See at:
ijcai.org
| CNR ExploRA
2014
Journal article
Open Access
A multidisciplinary survey on discrimination analysis
Romei A., Ruggieri S.
The collection and analysis of observational and experimental data represent the main tools for assessing the presence, the extent, the nature, and the trend of discrimination phenomena. Data analysis techniques have been proposed in the last 50 years in the economic, legal, statistical, and, recently, in the data mining literature. This is not surprising, since discrimination analysis is a multidisciplinary problem, involving sociological causes, legal argumentations, economic models, statistical techniques, and computational issues. The objective of this survey is to provide guidance and a glue for researchers and anti-discrimination data analysts on concepts, problems, application areas, datasets, methods, and approaches from a multidisciplinary perspective. We organize the approaches according to their method of data collection as observational, quasi-experimental, and experimental studies. A fourth line of recently blooming research on knowledge discovery based methods is also covered. Observational methods are further categorized on the basis of their application context: labor economics, social profiling, consumer markets, and others.
Source: Knowledge engineering review (Print) 29 (2014): 582–638. doi:10.1017/S0269888913000039
DOI: 10.1017/s0269888913000039
See at:
The Knowledge Engineering Review
| journals.cambridge.org
| CNR ExploRA
2017
Conference article
Open Access
Efficiently clustering very large attributed graphs
Baroni A., Conte A., Patrignani M., Ruggieri S.
Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and the structure of the graph. However, time and space complexities of state-of-the-art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible both with categorical and with quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess in advance the number of clusters. SToC relies on well-known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large-scale attributed graphs.
Source: 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, July-August 2017
DOI: 10.1145/3110025.3110030
DOI: 10.48550/arxiv.1703.08590
See at:
arXiv.org e-Print Archive
| arxiv.org
| ISTI Repository
| dl.acm.org
| doi.org
| CNR ExploRA
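The abstract above names bottom-k sketches as one of the approximation techniques SToC relies on. As a rough illustration of that general technique (not of the SToC code itself), the sketch below estimates the Jaccard similarity of two sets from their bottom-k sketches. The hash function (BLAKE2b truncated to 64 bits), the value of `k`, and all function names are assumptions chosen here for the example.

```python
import hashlib
import heapq


def h(x):
    """Deterministic 64-bit hash of an element (assumed choice: BLAKE2b)."""
    digest = hashlib.blake2b(repr(x).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k(items, k):
    """Bottom-k sketch: the k smallest distinct hash values of the set."""
    return set(heapq.nsmallest(k, {h(x) for x in items}))


def jaccard_estimate(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches.

    The k smallest values of the merged sketches form the bottom-k sketch
    of A ∪ B; the fraction of them present in both input sketches
    estimates the Jaccard similarity.
    """
    union_sketch = heapq.nsmallest(k, sketch_a | sketch_b)
    if not union_sketch:
        return 0.0
    shared = sum(1 for v in union_sketch if v in sketch_a and v in sketch_b)
    return shared / len(union_sketch)


# Example: neighbourhoods of two nodes, represented as sets of node ids.
k = 256
a = set(range(0, 60))
b = set(range(30, 90))
estimate = jaccard_estimate(bottom_k(a, k), bottom_k(b, k), k)
```

When both sets are smaller than `k`, as in this example, the sketches retain every element and the estimate coincides with the exact Jaccard similarity; for larger sets it is an approximation whose accuracy improves as `k` grows.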
2011
Contribution to book
Open Access
Who/where are my new customers?
Rinzivillo Salvatore, Ruggieri Salvatore
We present a knowledge discovery case study on customer classification, with the objective of mining the distinctive characteristics of new customers of a tax return service. Two general approaches are described. The first one, a symbolic approach, is based on extracting and ranking classification rules on the basis of significance measures defined on the 4-fold contingency table of a rule. The second one, a spatial approach, is based on extracting geographic areas with a predominant presence of new customers.
Source: Emerging Intelligent Technologies in Industry, edited by Dominik Ryżko, Henryk Rybiński, Piotr Gawrysiak, Marzena Kryszkiewicz, pp. 307. Berlin/Heidelberg: Springer-Verlag, 2011
DOI: 10.1007/978-3-642-22732-5_25
See at:
www.di.unipi.it
| doi.org
| link.springer.com
| CNR ExploRA
2011
Conference article
Open Access
k-NN as an implementation of situation testing for discrimination discovery and prevention
Luong Binh Thanh, Ruggieri Salvatore, Turini Franco
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we can observe a significant difference of treatment among its neighbors belonging to a protected-by-law group and its neighbors not belonging to it. Discrimination discovery boils down to extracting a classification model from the labeled tuples. Discrimination prevention is tackled by changing the decision value for tuples labeled as discriminated before training a classifier. The approach of this paper overcomes legal weaknesses and technical limitations of existing proposals.
Source: 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pp. 502–510, San Diego, California, USA, August 21-24 2011
DOI: 10.1145/2020408.2020488
See at:
www.di.unipi.it
| doi.org
| CNR ExploRA
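The labeling step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the record fields (`x`, `protected`, `decision`), the encoding of a denied benefit as `decision == 0`, the threshold `t`, and the restriction to protected tuples with a negative decision are all choices made here for the example.

```python
import math


def knn(target, candidates, k):
    """The k candidates closest to target (Euclidean distance on field 'x')."""
    return sorted(candidates, key=lambda c: math.dist(target["x"], c["x"]))[:k]


def denial_rate(records):
    """Fraction of records with a negative decision (decision == 0)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["decision"] == 0) / len(records)


def treatment_diff(target, dataset, k):
    """Denial rate among protected neighbours minus that among unprotected ones."""
    others = [r for r in dataset if r is not target]
    protected = [r for r in others if r["protected"]]
    unprotected = [r for r in others if not r["protected"]]
    p1 = denial_rate(knn(target, protected, k))
    p2 = denial_rate(knn(target, unprotected, k))
    return p1 - p2


def label_discriminated(dataset, k=2, t=0.5):
    """Label a protected, denied tuple as discriminated when the difference
    of treatment between its two neighbourhoods reaches the threshold t."""
    return [
        bool(r["protected"] and r["decision"] == 0
             and treatment_diff(r, dataset, k) >= t)
        for r in dataset
    ]


# Toy data: similar applicants, denied when protected and granted otherwise.
dataset = [
    {"x": (1.0, 1.0), "protected": True, "decision": 0},
    {"x": (1.1, 1.0), "protected": True, "decision": 0},
    {"x": (0.9, 1.0), "protected": True, "decision": 0},
    {"x": (1.0, 1.1), "protected": False, "decision": 1},
    {"x": (1.0, 0.9), "protected": False, "decision": 1},
    {"x": (5.0, 5.0), "protected": False, "decision": 0},
]
labels = label_discriminated(dataset, k=2, t=0.5)
```

The resulting boolean labels could then serve as the class attribute for training a classifier (discovery), or as the set of tuples whose decision value is changed before training (prevention), as the abstract outlines.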
2013
Journal article
Open Access
Discrimination discovery in scientific project evaluation: a case study
Romei A., Ruggieri S., Turini F.
Discovering contexts of unfair decisions in a dataset of historical decision records is a non-trivial problem. It requires the design of ad hoc methods and techniques of analysis, which have to comply with existing laws and with legal argumentations. While some data mining techniques have been adapted to the purpose, the state of the art of research still needs methodological refinements, the consolidation of a Knowledge Discovery in Databases (KDD) process, and, most of all, experimentation with real data. This paper contributes by presenting a case study on gender discrimination in a dataset of scientific research proposals, and by distilling from the case study a general discrimination discovery process. Gender bias in scientific research is a challenging problem that has been tackled in the social sciences literature by means of statistical regression. However, this approach is limited to testing a hypothesis of discrimination over the whole dataset under analysis. Our methodology couples data mining, for unveiling previously unknown contexts of possible discrimination, with statistical regression, for testing the significance of such contexts, thus obtaining the best of the two worlds.
Source: Expert systems with applications 40 (2013): 6064–6079. doi:10.1016/j.eswa.2013.05.016
DOI: 10.1016/j.eswa.2013.05.016
See at:
Expert Systems with Applications
| www.sciencedirect.com
| CNR ExploRA
2018
Report
Open Access
Assessing the stability of interpretable models
Guidotti R., Ruggieri S.
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned.
Source: ISTI Technical reports, 2018
Project(s): SoBigData
See at:
arxiv.org
| ISTI Repository
| CNR ExploRA