2020
Contribution to book
Open Access
Explaining multi-label black-box classifiers for health applications
Panigutti C., Guidotti R., Monreale A., Pedreschi D.
Today the state-of-the-art performance in classification is achieved by the so-called “black boxes”, i.e. decision-making systems whose internal logic is obscure. Such models could revolutionize the health-care system; however, their deployment in real-world diagnosis decision support systems is subject to several risks and limitations due to their lack of transparency. The typical classification problem in health care requires a multi-label approach, since the possible labels, e.g. diagnoses, are not mutually exclusive. We propose MARLENA, a model-agnostic method that explains multi-label black box decisions. MARLENA explains an individual decision in three steps. First, it generates a synthetic neighborhood around the instance to be explained, using a strategy suitable for multi-label decisions. It then learns a decision tree on this neighborhood and finally derives from it a decision rule that explains the black box decision. Our experiments show that MARLENA performs well in mimicking the black box behavior while at the same time gaining a notable amount of interpretability through compact decision rules, i.e. rules of limited length.
Source: Precision Health and Medicine. A Digital Revolution in Healthcare, edited by Arash Shaban-Nejad, Martin Michalowski, pp. 97–110, 2020
DOI: 10.1007/978-3-030-24409-5_9
See at:
media.springer.com
| doi.org
| link.springer.com
| CNR ExploRA
2019
Journal article
Open Access
A survey of methods for explaining black box models
Guidotti R., Monreale A., Ruggieri S., Turini F., Giannotti F., Pedreschi D.
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. Black box decision systems are used in a wide variety of applications, and each approach is typically developed to solve a specific problem; as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective.
Source: ACM Computing Surveys 51 (2019). doi:10.1145/3236009
DOI: 10.1145/3236009
DOI: 10.48550/arxiv.1802.01933
Project(s): SoBigData
See at:
arXiv.org e-Print Archive
| Archivio istituzionale della Ricerca - Scuola Normale Superiore
| dl.acm.org
| ACM Computing Surveys
| Archivio della Ricerca - Università di Pisa
| ISTI Repository
| CNR ExploRA
| doi.org
2019
Journal article
Open Access
PRIMULE: Privacy risk mitigation for user profiles
Pratesi F., Gabrielli L., Cintia P., Monreale A., Giannotti F.
The availability of mobile phone data has encouraged the development of different data-driven tools, supporting social science studies and providing new data sources for standard official statistics. However, this kind of data raises privacy concerns, because it can enable the inference of personal and private information. In this paper, we address the privacy issues related to sharing user profiles derived from mobile phone data by proposing PRIMULE, a privacy risk mitigation strategy. The method relies on PRUDEnce (Pratesi et al., 2018), a privacy risk assessment framework that provides a methodology for systematically identifying risky users in a dataset. Extensive experiments on real-world data show the effectiveness of the PRIMULE strategy in terms of both the quality of the mobile user profiles and the utility of these profiles for analytical services such as the Sociometer (Furletti et al., 2013), a data mining tool for classifying city users.
Source: Data & Knowledge Engineering 125 (2019). doi:10.1016/j.datak.2019.101786
DOI: 10.1016/j.datak.2019.101786
Project(s): SoBigData
See at:
ISTI Repository
| Archivio istituzionale della Ricerca - Scuola Normale Superiore
| Data & Knowledge Engineering
| CNR ExploRA
| www.sciencedirect.com
2019
Conference article
Closed Access
Exploring students eating habits through individual profiling and clustering analysis
Natilli M., Monreale A., Guidotti R., Pappalardo L.
Individual well-being strongly depends on food habits; therefore, it is important to educate the general population, and especially young people, about the importance of a healthy and balanced diet. To this end, understanding people's actual eating habits is fundamental for designing better and more effective interventions to improve students' diets. In this paper we present two exploratory analyses based on centroid-based clustering, with the goal of understanding the food habits of university students. The first clustering analysis exploits only information about the students' consumption of specific food categories, while the second also includes the temporal dimension in order to capture when the students consume specific foods. The second approach enables the study of how the time of consumption affects food choice.
Source: PAP 2018 - The 2nd International Workshop on Personal Analytics and Privacy, pp. 156–171, Dublin, Ireland, 10-14 September 2018
DOI: 10.1007/978-3-030-13463-1_12
Project(s): SoBigData
See at:
doi.org
| link.springer.com
| CNR ExploRA
2018
Contribution to book
Open Access
How data mining and machine learning evolved from relational data base to data science
Amato G., Candela L., Castelli D., Esuli A., Falchi F., Gennaro C., Giannotti F., Monreale A., Nanni M., Pagano P., Pappalardo L., Pedreschi D., Pratesi F., Rabitti F., Rinzivillo S., Rossetti G., Ruggieri S., Sebastiani F., Tesconi M.
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have made relational databases pervasive in every kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today.
Source: A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, edited by Sergio Flesca, Sergio Greco, Elio Masciari, Domenico Saccà, pp. 287–306, 2018
DOI: 10.1007/978-3-319-61893-7_17
See at:
arpi.unipi.it
| ISTI Repository
| doi.org
| link.springer.com
| CNR ExploRA
2018
Journal article
Open Access
Discovering temporal regularities in retail customers' shopping behavior
Guidotti R., Gabrielli L., Monreale A., Pedreschi D., Giannotti F.
In this paper we investigate the regularities characterizing the temporal purchasing behavior of the customers of a retail market chain. Most of the literature on purchasing behavior focuses on what customers buy, while giving little importance to the temporal dimension. As a consequence, the state of the art does not capture the temporal purchasing patterns of each customer. These patterns should describe the customer's temporal habits, highlighting when she typically makes a purchase together with information about the amount spent, the number of purchased items and other similar aggregates. This knowledge could be exploited for different purposes: setting temporal discounts to make customers' purchases more regular over time, setting personalized discounts in the customer's preferred day and time window, providing recommendations for shopping schedules, etc. To this end, we introduce a framework for extracting from personal retail data a temporal purchasing profile able to summarize whether and when a customer makes her distinctive purchases. The individual profile describes a set of regular and characterizing shopping behavioral patterns, and the sequences in which these patterns take place. We show how to compare different customers by providing a collective perspective on their individual profiles, and how to group the customers with respect to these comparable profiles. By analyzing real datasets containing millions of shopping sessions, we found that a limited number of patterns summarizes the temporal purchasing behavior of all the customers, and that these patterns are sequentially followed in a finite number of ways. Moreover, we identified regular customers, characterized by a small number of temporal purchasing behaviors, and changing customers, characterized by various types of temporal purchasing behaviors. Finally, we discuss how the profiles can be exploited both by customers, to enable personalized services, and by the retail market chain, to provide tailored discounts based on temporal purchasing regularity.
Source: EPJ Data Science 7 (2018): 6. doi:10.1140/epjds/s13688-018-0133-0
DOI: 10.1140/epjds/s13688-018-0133-0
Project(s): SoBigData
See at:
EPJ Data Science
| epjdatascience.springeropen.com
| Archivio della Ricerca - Università di Pisa
| ISTI Repository
| CNR ExploRA
2018
Report
Open Access
Local rule-based explanations of black box decision systems
Guidotti R., Monreale A., Ruggieri S., Pedreschi D., Turini F., Giannotti F.
Recent years have witnessed the rise of accurate but obscure decision systems, which hide the logic of their internal decision processes from the users. The lack of explanations for the decisions of black box systems is a key ethical issue and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. Therefore, we need explanations that reveal the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons for the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first learns a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of a decision rule, which explains the reasons for the decision, and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Extensive experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box.
Source: ISTI Technical reports, 2018
Project(s): SoBigData 
See at:
arxiv.org
| ISTI Repository
| CNR ExploRA
2018
Journal article
Open Access
Gastroesophageal reflux symptoms among Italian university students: epidemiology and dietary correlates using automatically recorded transactions
Martinucci I., Natilli M., Lorenzoni V., Pappalardo L., Monreale A., Turchetti G., Pedreschi D., Marchi S., Barale R., De Bortoli N.
Gastroesophageal reflux disease (GERD) is one of the most common gastrointestinal disorders worldwide, with a relevant impact on quality of life and health care costs. The aim of our study is to assess the prevalence of GERD based on self-reported symptoms among university students in central Italy. The secondary aim is to evaluate lifestyle correlates, particularly eating habits, in students with GERD, using transactions automatically recorded by the cashiers at the university canteen.
Source: BMC Gastroenterology (Online) 18 (2018): 116. doi:10.1186/s12876-018-0832-9
DOI: 10.1186/s12876-018-0832-9
Project(s): SoBigData
See at:
bmcgastroenterol.biomedcentral.com
| BMC Gastroenterology
| Archivio della ricerca della Scuola Superiore Sant'Anna
| DOAJ-Articles
| ISTI Repository
| CNR ExploRA
2018
Report
Open Access
Open the black box data-driven explanation of black box decision systems
Pedreschi D., Giannotti F., Guidotti R., Monreale A., Pappalardo L., Ruggieri S., Turini F.
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only because of the lack of transparency, but also because of possible biases hidden in the algorithms, due to human prejudices and collection artifacts in the training data, which may lead to unfair or wrong decisions. We introduce the local-to-global framework for black box explanation, a novel approach with promising early results, which paves the road for a wide spectrum of future developments along three dimensions: (i) the language for expressing explanations in terms of highly expressive logic-based rules, with a statistical and causal interpretation; (ii) the inference of local explanations aimed at revealing the logic of the decision adopted for a specific instance by querying and auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of the many local explanations into simple global ones, with algorithms that optimize the quality and comprehensibility of explanations.
Source: ISTI Technical reports, 2018
Project(s): SoBigData 
See at:
arxiv.org
| ISTI Repository
| CNR ExploRA
2017
Conference article
Open Access
Clustering individual transactional data for masses of users
Guidotti R., Monreale A., Nanni M., Giannotti F., Pedreschi D.
Mining a large number of datasets recording human activities, in order to make sense of individual data, is the key enabler of a new wave of personalized knowledge-based services. In this paper we focus on the problem of clustering individual transactional data for a large mass of users. Transactional data is a very pervasive kind of information that is collected by several services, often involving huge pools of users. We propose txmeans, a parameter-free clustering algorithm able to efficiently partition transactional data in a completely automatic way. Txmeans is designed for the case where clustering must be applied to a massive number of different datasets, for instance when a large set of users needs to be analyzed individually and each of them has generated a long history of transactions. Extensive experiments on both real and synthetic datasets show the practical effectiveness of txmeans for the mass clustering of different personal datasets, and suggest that txmeans outperforms existing methods in terms of quality and efficiency. Finally, we present a personal cart assistant application based on txmeans.
Source: International Conference on Knowledge Discovery and Data Mining, pp. 195–204, Halifax, Canada, 13-17/08/2017
DOI: 10.1145/3097983.3098034
Project(s): SoBigData
See at:
arpi.unipi.it
| Archivio della Ricerca - Università di Pisa
| ISTI Repository
| dl.acm.org
| doi.org
| CNR ExploRA
2017
Contribution to book
Open Access
Personal Analytics and Privacy. An Individual and Collective Perspective
Guidotti R., Monreale A., Pedreschi D., Abiteboul S.
The First International Workshop on Personal Analytics and Privacy (PAP) was held in Skopje, Macedonia, on September 18, 2017. The purpose of the workshop is to encourage principled research that will lead to the advancement of personal data analytics, personal services development, privacy, data protection, and privacy risk assessment, with the intent of bringing together researchers and practitioners interested in personal analytics and privacy. The workshop, co-located with the conference ECML/PKDD 2017, sought top-quality submissions addressing important issues related to personal analytics, personal data mining, and privacy in contexts where real individual data (spatio-temporal data, call detail records, tweets, mobility data, transactional data, social networking data, etc.) are used for developing data-driven services, for conducting social studies aimed at understanding today's society, and for publication purposes.
Source: Personal Analytics and Privacy. An Individual and Collective Perspective: First International Workshop, PAP 2017, Held in Conjunction with ECML PKDD 2017, Skopje, Macedonia, September 18, 2017, Revised Selected Papers, edited by Guidotti, R.; Monreale, A.; Pedreschi, D.; Abiteboul, S., pp. V–VI, 2017
DOI: 10.1007/978-3-319-71970-2
Project(s): SoBigData
See at:
ISTI Repository
| doi.org
| link.springer.com
| CNR ExploRA
2017
Conference article
Restricted
Fast estimation of privacy risk in human mobility data
Pellungrini R., Pappalardo L., Pratesi F., Monreale A.
Mobility data are an important proxy for understanding the patterns of human movement, developing analytical services, and designing models for the simulation and prediction of human dynamics. Unfortunately, mobility data are also very sensitive, since they may contain personal information about the individuals involved. Existing frameworks for privacy risk assessment enable data providers to quantify and mitigate privacy risks, but they suffer from two main limitations: (i) they have a high computational complexity; (ii) the privacy risk must be re-computed for each new set of individuals, geographic areas or time windows. In this paper we explore a fast and flexible solution for estimating privacy risk in human mobility data, using predictive models to capture the relation between an individual's mobility patterns and her privacy risk. We show the effectiveness of our approach through experiments on a real-world GPS dataset and provide a comparison with traditional methods.
Source: SAFECOMP 2017 - International Conference on Computer Safety, Reliability, and Security, pp. 415–426, Trento, Italy, 12 September 2017
DOI: 10.1007/978-3-319-66284-8_35
Project(s): SoBigData
See at:
Lecture Notes in Computer Science
| link.springer.com
| CNR ExploRA
2017
Contribution to book
Open Access
Personal Analytics and Privacy. An Individual and Collective Perspective: First International Workshop, PAP 2017, Held in Conjunction with ECML PKDD 2017, Skopje, Macedonia, September 18, 2017, Revised Selected Papers
Guidotti R., Monreale A., Pedreschi D., Abiteboul S.
This book constitutes the thoroughly refereed post-conference proceedings of the First International Workshop on Personal Analytics and Privacy, PAP 2017, held in Skopje, Macedonia, in September 2017. The 14 papers presented in this volume, together with 2 invited talks, were carefully reviewed and selected for inclusion in this book. They handle topics such as personal analytics, personal data mining and privacy in contexts where real individual data are used for developing data-driven services, for conducting social studies aimed at understanding today's society, and for publication purposes.
DOI: 10.1007/978-3-319-71970-2
Project(s): SoBigData
See at:
ISTI Repository
| doi.org
| CNR ExploRA
| www.springer.com
2016
Journal article
Open Access
Big data research in Italy: a perspective
Bergamaschi S., Carlini E., Ceci M., Furletti B., Giannotti F., Malerba D., Mezzanzanica M., Monreale A., Pasi G., Pedreschi D., Perego R., Ruggieri S.
The aim of this article is to briefly describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article offers a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation to diverse domains.
Source: Engineering (Beijing) 2 (2016): 163–170. doi:10.1016/J.ENG.2016.02.011
DOI: 10.1016/j.eng.2016.02.011
See at:
doi.org
| ISTI Repository
| CNR ExploRA
| Engineering
2015
Conference article
Open Access
Quantification in social networks
Milli L., Monreale A., Rossetti G., Pedreschi D., Giannotti F., Sebastiani F.
In many real-world applications there is a need to monitor the distribution of a population across different classes, and to track changes in this distribution over time. For example, an important task is to monitor the percentage of unemployed adults in a given region. When the membership of an individual in a class cannot be established deterministically, a typical solution is classification. However, in the above applications the final goal is not determining which class the individuals belong to, but estimating the prevalence of each class in the unlabeled data. This task is called quantification. Most of the work in the literature has addressed the quantification problem for data presented in conventional attribute format. With the ever-growing availability of web and social media, network data have become an important new source of information, and network quantification techniques could be used to quantify collective behavior, i.e., the number of users involved in certain types of activities, preferences, or behaviors. In this paper we exploit the homophily effect observed in many social networks in order to construct a quantifier for networked data. Our experiments show the effectiveness of the proposed approaches, and the comparison with existing state-of-the-art quantification methods shows that they are more accurate.
Source: IEEE International Conference on Data Science and Advanced Analytics, Paris, France, 19-21/10/2015
DOI: 10.1109/dsaa.2015.7344845
Project(s): CIMPLEX
See at:
ISTI Repository
| doi.org
| ieeexplore.ieee.org
| CNR ExploRA
2015
Contribution to book
Open Access
Retrieving points of interest from human systematic movements
Guidotti R., Monreale A., Rinzivillo S., Pedreschi D., Giannotti F.
Human mobility analysis is becoming an increasingly fundamental task for understanding human behavior in depth. In the last decade this kind of study has become feasible thanks to the massive increase in the availability of mobility data. A crucial step for many mobility applications and analyses is extracting locations that are of interest to people. In this paper, we propose a novel methodology to efficiently retrieve significant places of interest from movement data. Using car drivers' systematic movements, we mine everyday interesting locations, that is, places around which people's lives gravitate. The results provide empirical evidence that these places capture nearly all of the mobility, even though they are derived only from abstractions of systematic movements.
Source: Software Engineering and Formal Methods, edited by Carlos Canal, Akram Idani, pp. 294–308, 2015
DOI: 10.1007/978-3-319-15201-1_19
Project(s): PETRA
See at:
ISTI Repository
| doi.org
| link.springer.com
| CNR ExploRA