36 result(s)
Page Size: 10, 20, 50
Export: bibtex, xml, json, csv
Order by:

CNR Author operator: and / or
more
Typology operator: and / or
Language operator: and / or
Date operator: and / or
more
Rights operator: and / or
2004 Conference article Closed Access
YaDT: Yet another Decision Tree builder
Salvatore Ruggieri
YaDT is a from-scratch main-memory implementation of the C4.5-like decision tree algorithm. Our presentation will be focused on the design principles that allowed for obtaining an extremely efficient system. Experimental results are reported comparing YaDT withWeka, dti, Xelopes and (E)C4.5.Source: 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004), pp. 260–265, Boca Raton, FL, USA, 15-17 November 2004
DOI: 10.1109/ictai.2004.123
Metrics:


See at: doi.org Restricted | www.computer.org Restricted | CNR ExploRA


2010 Conference article Open Access OPEN
Frequent regular itemset mining
Ruggieri S.
Concise representations of frequent itemsets sacrifice readability and direct interpretability by a data analyst of the concise patterns extracted. In this paper, we introduce an extension of itemsets, called regular, with an immediate semantics and interpretability, and a conciseness comparable to closed itemsets. Regular itemsets allow for specifying that an item may or may not be present; that any subset of an itemset may be present; and that any non-empty subset of an itemset may be present. We devise a procedure, called {\bf RegularMine}, for mining a set of regular itemsets that is a concise representation of frequent itemsets. The procedure computes a covering, in terms of regular itemsets, of the frequent itemsets in the class of equivalence of a closed one. We report experimental results on several standard dense and sparse datasets that validate the proposed approach.Source: 16th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp. 263–272, Washington D.C., USA, 25-28 July 2010
DOI: 10.1145/1835804.1835840
Metrics:


See at: www.di.unipi.it Open Access | dl.acm.org Restricted | doi.org Restricted | CNR ExploRA


2012 Journal article Restricted
A complexity perspective on entailment of parameterized linear constraints
Eirinakis P., Ruggieri S., Subramani K., Wojciechowski P.
Extending linear constraints by admitting parameters allows for more abstract problem modeling and reasoning. A lot of focus has been given to conducting research that demonstrates the usefulness of parameterized linear constraints and implementing tools that utilize their modeling strength. However, there is no approach that considers basic theoretical tools related to such constraints that allow for reasoning over them. Hence, in this paper we introduce satisf iability with respect to polyhedral sets and entailment for the class of parameterized linear constraints. In order to study the computational complexities of these problems, we relate them to classes of quantified linear implications. The problem of satisfiability with respect to polyhedral sets is then shown to be co-NP hard. The entailment problem is also shown to be co-NP hard in its general form. Nevertheless, we characterize some subclasses for which this problem is in P. Furthermore, we examine a weakening and a strengthening extension of the entailment problem. The weak entailment problem is proved to be NP complete. On the other hand, the strong entailment problem is shown to be co-NP hard.Source: Constraints (Dordrecht. Online) 17 (2012): 461–487. doi:10.1007/s10601-012-9127-x
DOI: 10.1007/s10601-012-9127-x
Metrics:


See at: Constraints Restricted | link.springer.com Restricted | CNR ExploRA


2012 Conference article Open Access OPEN
Discovering gender discrimination in project funding
Romei A., Ruggieri S., Turini F.
The selection of projects for funding can hide discriminatory decisions. We present a case study investigating gender discrimination in a dataset of scientific research proposals submitted to an Italian national call. The method for the analysis relies on a data mining classification strategy that is inspired by a legal methodology for proving evidence of social discrimination against protected-by-law groups.Source: IEEE, 12th International Conference on Data Mining Workshops, ICDMW 2012., Brussels, Belgium, 10 December 2012
DOI: 10.1109/icdmw.2012.39
Metrics:


See at: www.di.unipi.it Open Access | doi.org Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA


2012 Conference article Restricted
Computational complexities of inclusion queries over polyhedral sets
Eirinakis P., Ruggieri S., Subramani K., Wojciechowski P.
In this paper we discuss the computational complexities of procedures for inclusion queries over polyhedral sets. The polyhedral sets that we consider occur in a wide range of applications, ranging from logistics to program verification. The goal of our study is to establish boundaries between hard and easy problems in this context.Source: International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA, 9-11 January 2012

See at: www.cs.uic.edu Restricted | CNR ExploRA


2012 Conference article Restricted
Deciding membership in a class of polyhedra
Ruggieri S.
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on can readily be defined by means of linear systems with parameters in constant terms. In this paper, we consider the membership problem of deciding whether a given polyhedron belongs to the class defined by a parameterized linear system. As an example, we are interested in questions such as: "does a given polytope belong to the class of hypercubes?" We show that the membership problem is NP-complete, even when restricting to the 2-dimensional plane. Despite the negative result, the constructive proof allows us to devise a concise decision procedure using constraint logic programming over the reals, namely CLP(R), which searches for a characterization of all instances of a parameterized system that are equivalent to a given polyhedron.Source: 20th European Conference on Artificial Intelligence, pp. 702–707, Montpellier, Francia, 27-31 agosto 2012
DOI: 10.3233/978-1-61499-098-7-702
Metrics:


See at: ebooks.iospress.nl Restricted | CNR ExploRA


2012 Conference article Restricted
Subtree replacement in decision tree simplication
Ruggieri S.
The current availability of efficient algorithms for decision tree induction makes intricate post-processing tech- niques worth to be investigated both for eciency and effectiveness. We study the simplification operator of subtree replacement, also known as grafting, originally implemented in the C4.5 system. We present a parametric bottom-up algorithm integrating grafting with the standard pruning operator, and analyze its complexity in terms of the number of nodes visited. Immediate instances of the parametric algorithm include extensions of error based, reduced error, minimum error, and pessimistic error pruning. Experimental results show that the computational cost of grafting is paid of by statis- tically significant smaller trees without accuracy loss.Source: 12th SIAM Conference on Data Mining, pp. 379–390, Anaheim, California USA, 26,28 April 2012

See at: siam.omnibooksonline.com Restricted | CNR ExploRA


2013 Contribution to book Restricted
Discrimination Data Analysis: A Multi-disciplinary Bibliography
Romei A., Ruggieri S.
Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on discrimination data analysis, with the intended objective to provide a common basis to researchers from a multi-disciplinary perspective. We cover legal, sociological, economic and computer science referencesSource: Discrimination and Privacy in the Information Society, edited by Custers, Bart and Calders, Toon and Schermer, Bart and Zarsky, Tal, pp. 109–135, 2013
DOI: 10.1007/978-3-642-30487-3_6
Metrics:


See at: doi.org Restricted | CNR ExploRA


2010 Journal article Open Access OPEN
Typing linear constraints
Ruggieri S., Mesnard F.
We present a type system for linear constraints over the reals intended for reasoning about the input-output directionality of variables. Types model the properties of definiteness, range width or approximation, lower and upper bounds of variables in a linear constraint. Several proof procedures are presented for inferring the type of a variable and for checking validity of type assertions. We rely on theory and tools for linear programming problems, linear algebra, parameterized polyhedra and negative constraints. An application of the type system is proposed in the context of the static analysis of constraint logic programs. Type assertions are at the basis of the extension of wellmoding from pure logic programming. The proof procedures (both for type assertion validity and for well-moding) are implemented and their computational omplexity is discussed. We report experimental results demonstrating the efficiency in practice of the proposed approach. © 2010 ACM.Source: ACM transactions on programming languages and systems (Online) 32 (2010): 21–42. doi:10.1145/1749608.1749610
DOI: 10.1145/1749608.1749610
Metrics:


See at: ACM Transactions on Programming Languages and Systems Open Access | dl.acm.org Restricted | ACM Transactions on Programming Languages and Systems Restricted | Hyper Article en Ligne Restricted | CNR ExploRA


2013 Conference article Open Access OPEN
Data anonimity meets non-discrimination
Ruggieri S.
We investigate the relation between t-closeness, a well-known model of data anonymization, and alpha-protection, a model of data discrimination. We show that t-closeness implies bd(t)-protection, for a bound function bd() depending on the discrimination measure at hand. This allows us to adapt an inference control method, the Mondrian multidimensional generalization technique, to the purpose of non-discrimination data protection. The parallel between the two analytical models raises intriguing issues on the interplay between data anonymization and nondiscrimination research in data mining.Source: ICDMW 2013 - IEEE 13th International Conference on Data Mining Workshops, pp. 875–882, Dallas, Texas, USA, 7-10 December 2013
DOI: 10.1109/icdmw.2013.56
Metrics:


See at: www.di.unipi.it Open Access | doi.org Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA


2013 Conference article Open Access OPEN
Learning from polyhedral sets
Ruggieri S.
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on, can readily be defined by means of linear systems with parameters. In this paper, we investigate the problem of learning a parameterized linear system whose class of polyhedra includes a given set of example polyhedral sets and it is minimal.Source: IJCAI 2013 - Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1069–1075, Beijing, China, 3-9 August 2013

See at: ijcai.org Open Access | CNR ExploRA


2014 Journal article Open Access OPEN
A multidisciplinary survey on discrimination analysis
Romei A., Ruggieri S.
The collection and analysis of observational and experimental data represent the main tools for assessing the presence, the extent, the nature, and the trend of discrimination phenomena. Data analysis techniques have been proposed in the last 50 years in the economic, legal, statistical, and, recently, in the data mining literature. This is not surprising, since discrimination analysis is a multidisciplinary problem, involving sociological causes, legal argumentations, economic models, statistical techniques, and computational issues. The objective of this survey is to provide a guidance and a glue for researchers and anti-discrimination data analysts on concepts, problems, application areas, datasets, methods, and approaches from a multidisciplinary perspective. We organize the approaches according to their method of data collection as observational, quasi-experimental, and experimental studies. A fourth line of recently blooming research on knowledge discovery based methods is also covered. Observational methods are further categorized on the basis of their application context: labor economics, social profiling, consumer markets, and others.Source: Knowledge engineering review (Print) 29 (2014): 582–638. doi:10.1017/S0269888913000039
DOI: 10.1017/s0269888913000039
Metrics:


See at: The Knowledge Engineering Review Open Access | The Knowledge Engineering Review Restricted | journals.cambridge.org Restricted | CNR ExploRA


2017 Conference article Open Access OPEN
Efficiently clustering very large attributed graphs
Baroni A., Conte A., Patrignani M., Ruggieri S.
Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and to the structure of the graph. However, time and space complexities of state of the art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible both with categorical and with quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess in advance the number of clusters. SToC relies on well known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large scale attributed graphs.Source: 2017 IEEE/ACM: International Conference on Advances in Social Networks Analysis and Mining 2017, July-August 2017
DOI: 10.1145/3110025.3110030
DOI: 10.48550/arxiv.1703.08590
Metrics:


See at: arXiv.org e-Print Archive Open Access | arxiv.org Open Access | ISTI Repository Open Access | dl.acm.org Restricted | doi.org Restricted | doi.org Restricted | CNR ExploRA


2004 Journal article Open Access OPEN
Bounded nondeterminism of logic programs
Pedreschi D., Ruggieri S.
We introduce the notion of bounded nondeterminism for logic programs and queries. A program and a query have bounded nondeterminism if there are finitely many refutations for them via any selection rule. We offer a declarative characterization of the class of programs and queries that have bounded nondeterminism by defining bounded programs and queries. The characterization is provided in terms of Herbrand interpretations and level mappings, in the style of existing characterizations of universal termination. A direct application of the theoretical framework is concerned with the automatic generation of a terminating control. We present a transformational approach that given a bounded program and a bounded query yields a terminating program and query with the same set of refutations. Concerning the issue of automating the approach, by means of an example we sketch how an automatic method for proving left termination can be adapted to the purpose of inferring boundedness. Such an adaption reveals that the method does not scale up to medium/large sized programs due to scarce modularity of the required proof obligations. We provide then a modular refinement of boundedness for the significant class of well-moded programs and queries.Source: Annals of mathematics and artificial intelligence 42 (2004): 313–343. doi:10.1023/B:AMAI.0000038175.53999.3a
DOI: 10.1023/b:amai.0000038175.53999.3a
Metrics:


See at: Annals of Mathematics and Artificial Intelligence Open Access | Annals of Mathematics and Artificial Intelligence Restricted | www.springerlink.com Restricted | CNR ExploRA


2011 Contribution to book Open Access OPEN
Who/where are my new customers?
Rinzivillo Salvatore, Ruggieri Salvatore
We present a knowledge discovery case study on customer classification having the objective of mining the distinctive characteristics of new customers of a service of tax return. Two general approaches are described. The first one, a symbolic approach, is based on extracting and ranking classification rules on the basis of significativeness measures defined on the 4-fold contingency table of a rule. The second one, a spatial approach, is based on extracting geographic areas with predominant presence of new customers.Source: Emerging Intelligent Technologies in Industry, edited by Dominik Ry?ko, Henryk Rybi?ski, Piotr Gawrysiak, Marzena Kryszkiewicz, pp. 307. Berlin/Heidelberg: Springer-Verlag, 2011
DOI: 10.1007/978-3-642-22732-5_25
Metrics:


See at: www.di.unipi.it Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA


2011 Conference article Open Access OPEN
k-NN as an implementation of situation testing for discrimination discovery and prevention
Luong Binh Thanh, Ruggieri Salvatore, Turini Franco
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classifi cation. A tuple is labeled as discriminated if we can observe a signi ficant di erence of treatment among its neighbors belonging to a protected-by-law group and its neighbors not belonging to it. Discrimination discovery boils down to extracting a classi fication model from the labeled tuples. Discrimination prevention is tackled by changing the decision value for tuples labeled as discriminated before training a classi fier. The approach of this paper overcomes legal weaknesses and technical limitations of existing proposals.Source: 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pp. 502–510, San Diego, California, USA, August 21-24 2011
DOI: 10.1145/2020408.2020488
Metrics:


See at: www.di.unipi.it Open Access | doi.org Restricted | CNR ExploRA


2013 Journal article Open Access OPEN
Discrimination discovery in scientific project evaluation: a case study
Romei A., Ruggieri S., Turini F.
Discovering contexts of unfair decisions in a dataset of historical decision records is a non-trivial problem. It requires the design of ad hoc methods and techniques of analysis, which have to comply with existing laws and with legal argumentations. While some data mining techniques have been adapted to the purpose, the state-of-the-art of research still needs both methodological refinements, the consolidation of a Knowledge Discovery in Databases (KDD) process, and, most of all, experimentation with real data. This paper contributes by presenting a case study on gender discrimination in a dataset of scientific research proposals, and by distilling from the case study a general discrimination discovery process. Gender bias in scientific research is a challenging problem, that has been tackled in the social sciences literature by means of statistical regression. However, this approach is limited to test an hypothesis of discrimination over the whole dataset under analysis. Our methodology couples data mining, for unveiling previously unknown contexts of possible discrimination, with statistical regression, for testing the significance of such contexts, thus obtaining the best of the two worlds. (C) 2013 Elsevier Ltd. All rights reserved.Source: Expert systems with applications 40 (2013): 6064–6079. doi:10.1016/j.eswa.2013.05.016
DOI: 10.1016/j.eswa.2013.05.016
Metrics:


See at: Expert Systems with Applications Open Access | Expert Systems with Applications Restricted | www.sciencedirect.com Restricted | CNR ExploRA


2014 Journal article Open Access OPEN
Decision tree building on multi-core using FastFlow
Aldinucci M., Ruggieri S., Torquati M.
The whole computer hardware industry embraced the multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an in-depth study of the parallelisation of an implementation of the C4.5 algorithm for multi-core architectures. We characterise elapsed time lower bounds for the forms of parallelisations adopted and achieve close to optimal performance. Our implementation is based on the FastFlow parallel programming environment, and it requires minimal changes to the original sequential code. Copyright © 2013 John Wiley & Sons, Ltd. Copyright © 2013 John Wiley & Sons, Ltd.Source: Concurrency and computation 26 (2014): 800–820. doi:10.1002/cpe.3063
DOI: 10.1002/cpe.3063
Metrics:


See at: Concurrency and Computation Practice and Experience Open Access | Concurrency and Computation Practice and Experience Restricted | onlinelibrary.wiley.com Restricted | CNR ExploRA


2018 Report Open Access OPEN
Assessing the stability of interpretable models
Guidotti R., Ruggieri S.
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned.Source: ISTI Technical reports, 2018
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA


2020 Journal article Open Access OPEN
Causal inference for social discrimination reasoning
Qureshi B., Kamiran F., Karim A., Ruggieri S., Pedreschi D.
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery based on propensity score analysis, a statistical tool for filtering out the effect of confounding variables. We introduce causal measures of discrimination which quantify the effect of group membership on the decisions, and highlight causal discrimination/favoritism patterns by learning regression trees over the novel measures. We validate our approach on two real world datasets. Our proposed framework for causal discrimination has the potential to enhance the transparency of machine learning with tools for detecting discriminatory bias both in the training data and in the learning algorithms.Source: Journal of intelligent information systems 54 (2020): 425–437. doi:10.1007/s10844-019-00580-x
DOI: 10.1007/s10844-019-00580-x
DOI: 10.48550/arxiv.1608.03735
Metrics:


See at: arXiv.org e-Print Archive Open Access | Journal of Intelligent Information Systems Open Access | ISTI Repository Open Access | Journal of Intelligent Information Systems Restricted | doi.org Restricted | link.springer.com Restricted | CNR ExploRA