2015
Conference article
Restricted
Social or green? A data-driven approach for more enjoyable carpooling
Guidotti R, Sassi A, Berlingerio M, Pascale A
Carpooling, i.e. the sharing of vehicles to reach common destinations, is often performed to reduce costs and pollution. Recent works on carpooling and journey planning take into account, besides mobility match, also social aspects and, more generally, non-monetary rewards. In line with this, we present a data-driven methodology for a more enjoyable carpooling. We introduce a measure of enjoyability based on people's interests, social links, and tendency to connect to people with similar or dissimilar interests. We devise a methodology to compute enjoyability from crowd-sourced data, and we show how this can be used on real-world datasets to optimize for both mobility and enjoyability. Our methodology was tested on real data from Rome and San Francisco. We compare the results of an optimization model minimizing the number of cars and a greedy approach maximizing the enjoyability. We evaluate them in terms of cars saved and average enjoyability of the system. We also present the results of a user study, with more than 200 users reporting an interest of 39% in the enjoyable solution. Moreover, 24% of people declared that sharing the car with interesting people would be the primary motivation for carpooling.
DOI: 10.1109/itsc.2015.142
Project(s): PETRA
See at:
doi.org
| CNR IRIS
| ieeexplore.ieee.org
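The enjoyability measure in this work combines topic similarity between two users with each user's tendency to connect to people with similar or dissimilar interests. A minimal sketch of one way such a combination could look (the formula, weights, and function names below are illustrative assumptions, not the paper's actual definition):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two interest vectors (dicts topic -> weight)."""
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in set(a) | set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def enjoyability(u_interests, v_interests, u_homophily, v_homophily):
    """Toy enjoyability: a homophilous user (tendency near 1) enjoys
    similar people; a heterophilous one (near 0) enjoys dissimilar people."""
    sim = cosine_similarity(u_interests, v_interests)
    score_u = u_homophily * sim + (1 - u_homophily) * (1 - sim)
    score_v = v_homophily * sim + (1 - v_homophily) * (1 - sim)
    return (score_u + score_v) / 2

alice = {"music": 1.0, "cycling": 0.5}
bob = {"music": 0.8, "movies": 0.7}
print(round(enjoyability(alice, bob, 0.9, 0.9), 3))
```

Note that under this sketch two identical homophilous users score 1.0, but so do two heterophilous users with disjoint interests: enjoyability is not plain similarity.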
2016
Conference article
Open Access
Where is my next friend? Recommending enjoyable profiles in location based services
Guidotti R, Berlingerio M
How many of your friends, with whom you enjoy spending some time, live close by? How many people are within your reach, with whom you could have a nice conversation? We introduce a measure of enjoyability that may be the basis for a new class of location-based services aimed at maximizing the likelihood that two persons, or a group of people, would enjoy spending time together. Our enjoyability takes into account both the topic similarity between two users and the users' tendency to connect to people with similar or dissimilar interests. We computed the enjoyability on two datasets of geo-located tweets, and we reasoned on the applicability of the obtained results for producing friend recommendations. We aim at suggesting pairs of users who are not friends yet, but who are frequently co-located and maximize our enjoyability measure. By taking into account the spatial dimension, we show how 50% of users may find at least one enjoyable person within 10 km of their two most visited locations. Our results are encouraging, and open the way for a new class of recommender systems based on enjoyability.
Source: STUDIES IN COMPUTATIONAL INTELLIGENCE (PRINT), vol. 644, pp. 65-78. Dijon, France, 23-25 March, 2016
DOI: 10.1007/978-3-319-30569-1_5
Project(s): PETRA
See at:
www.springer.com
| doi.org
| CNR IRIS
| link.springer.com
2017
Journal article
Open Access
The GRAAL of carpooling: GReen And sociAL optimization from crowd-sourced data
Berlingerio M, Ghaddar B, Guidotti R, Pascale A, Sassi A
Carpooling, i.e. the sharing of vehicles to reach common destinations, is often performed to reduce costs and pollution. Recent work on carpooling takes into account, besides mobility matches, also social aspects and, more generally, non-monetary incentives. In line with this, we present GRAAL, a data-driven methodology for GReen And sociAL carpooling. GRAAL optimizes a carpooling system not only by minimizing the number of cars needed at the city level, but also by maximizing the enjoyability of people sharing a trip. We introduce a measure of enjoyability based on people's interests, social links, and tendency to connect to people with similar or dissimilar interests. GRAAL computes the enjoyability within a set of users from crowd-sourced data, and then uses it on real-world datasets to optimize a weighted linear combination of the number of cars and enjoyability. To tune this weight, and to investigate the users' interest in the social aspects of carpooling, we conducted an online survey on potential carpooling users. We present the results of applying GRAAL on real-world crowd-sourced data from the cities of Rome and San Francisco. Computational results are presented from both the city and the user perspective. Using the crowd-sourced weight, GRAAL is able to significantly reduce the number of cars needed, while keeping a high level of enjoyability on the tested dataset. From the user perspective, we show how the entire per-car distribution of enjoyability is increased with respect to the baselines.
Source: TRANSPORTATION RESEARCH. PART C, EMERGING TECHNOLOGIES, vol. 80, pp. 20-36
DOI: 10.1016/j.trc.2017.02.025
Project(s): PETRA
See at:
CNR IRIS
| ISTI Repository
| www.sciencedirect.com
| Transportation Research Part C Emerging Technologies
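GRAAL's optimization target is a weighted linear combination of the number of cars and the enjoyability of shared trips. A hedged sketch of scoring a candidate assignment under that idea (the ILP and greedy machinery are omitted; the variable names and the exact form of the combination are assumptions):

```python
from itertools import combinations

def graal_objective(cars, pair_enjoyability, weight):
    """Score a carpooling assignment to be maximized: `cars` is a list of
    passenger groups, `pair_enjoyability` maps frozenset pairs to scores,
    and `weight` in [0, 1] trades cars saved against enjoyability."""
    n_cars = len(cars)
    total_enjoy = sum(
        pair_enjoyability.get(frozenset(pair), 0.0)
        for car in cars
        for pair in combinations(car, 2)
    )
    # fewer cars and higher enjoyability both increase the score
    return -weight * n_cars + (1 - weight) * total_enjoy

enjoy = {frozenset({"a", "b"}): 0.9, frozenset({"a", "c"}): 0.1}
shared = [["a", "b", "c"]]      # one car for all three passengers
split = [["a", "b"], ["c"]]     # two cars, "c" rides alone
print(graal_objective(shared, enjoy, 0.5), graal_objective(split, enjoy, 0.5))
```

With a balanced weight, the single shared car scores higher than the split: it both saves a car and accumulates the pairwise enjoyability of everyone riding together.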
2017
Other
Restricted
Personal Data Analytics: Capturing Human Behavior to Improve Self-Awareness and Personal Services through Individual and Collective Knowledge
Guidotti R
In the era of Big Data, every single user of our hyper-connected world leaves behind a myriad of digital breadcrumbs while performing her daily activities. It is sufficient to think of a simple smartphone that enables each one of us to browse the Web, listen to music on online music services, post messages on social networks, perform online shopping sessions, acquire images and videos, and record our geographical locations. This enormous amount of personal data could be exploited to improve the lifestyle of each individual by extracting, analyzing and exploiting the user's behavioral patterns, like the items frequently purchased, the routine movements, the favorite sequences of songs listened to, etc. However, even though some user-centric models for data management named Personal Data Stores are emerging, there is currently still a significant lack of algorithms and models specifically designed to extract and capture knowledge from personal data. This thesis proposes an extension to the idea of the Personal Data Store through Personal Data Analytics. In practice, we describe parameter-free algorithms that do not need to be tuned by experts and are able to automatically extract patterns from the user's data. We define personal data models to characterize the user profile, which are able to capture and collect the user's behavioral patterns. In addition, we propose individual and collective services exploiting the knowledge extracted with Personal Data Analytics algorithms and models. The services are provided to users organized in a Personal Data Ecosystem, a peer-to-peer distributed network in which users share part of their own patterns in return for the services provided. We show how sharing with the collectivity enables or improves the services analyzed. Sharing enhances the level of service for individuals, for example by providing the user an invaluable opportunity to gain a better perception of her self-awareness. Moreover, at the same time, knowledge sharing can lead to forms of collective gain, like the reduction of the number of circulating cars. To prove the feasibility of Personal Data Analytics in terms of the algorithms, models and services proposed, we report an extensive experimentation on real-world data.
Project(s): CIMPLEX, PETRA, SoBigData
See at:
CNR IRIS
2018
Conference article
Open Access
On the Equivalence Between Community Discovery and Clustering
Guidotti R, Coscia M
Clustering is the subset of data mining techniques used to agnostically classify entities by looking at their attributes. Clustering algorithms specialized to deal with complex networks are called community discovery. Notwithstanding their common objectives, there are crucial assumptions in community discovery (edge sparsity and a single node type, among others) which make its mapping to clustering non-trivial. In this paper, we propose a community discovery to clustering mapping, by focusing on transactional data clustering. We represent a network as a transactional dataset, and we find communities by grouping nodes with common items (neighbors) in their baskets (neighbor lists). By comparing our results with ground truth communities and state-of-the-art community discovery methods, we show that transactional clustering algorithms are a feasible alternative to community discovery, and that a complete mapping of the two problems is possible.
Source: LECTURE NOTES OF THE INSTITUTE FOR COMPUTER SCIENCES, SOCIAL INFORMATICS AND TELECOMMUNICATIONS ENGINEERING, pp. 342-352. Pisa, Italy, 29-30/11/2017
DOI: 10.1007/978-3-319-76111-4_34
Project(s): SoBigData
See at:
CNR IRIS
| link.springer.com
| ISTI Repository
| Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
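The mapping described above turns each node into a transaction whose items are its neighbors, so that a transactional clustering algorithm can group nodes by shared neighbors. A toy sketch of the representation step plus a naive overlap-based grouping (not the clustering algorithm used in the paper):

```python
def network_to_transactions(edges):
    """Each node becomes a basket containing its neighbor list."""
    baskets = {}
    for u, v in edges:
        baskets.setdefault(u, set()).add(v)
        baskets.setdefault(v, set()).add(u)
    return baskets

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def naive_communities(baskets, threshold=0.3):
    """Greedy single-pass grouping of nodes with overlapping baskets."""
    communities = []
    for node, basket in baskets.items():
        for com in communities:
            if any(jaccard(basket, baskets[m]) >= threshold for m in com):
                com.append(node)
                break
        else:
            communities.append([node])
    return communities

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("d", "f"), ("e", "f")]
print(naive_communities(network_to_transactions(edges)))
```

On these two triangles the overlap-based grouping recovers the two communities, illustrating why baskets of neighbors are a workable stand-in for network structure.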
2019
Conference article
Restricted
Helping your docker images to spread based on explainable models
Guidotti R, Soldani J, Neri D, Brogi A, Pedreschi D
Docker is on the rise in today's enterprise IT. It permits shipping applications inside portable containers, which run from so-called Docker images. Docker images are distributed in public registries, which also monitor their popularity. The popularity of an image impacts its actual usage, and hence the potential revenues for its developers. In this paper, we present a solution based on interpretable decision trees and regression trees for estimating the popularity of a given Docker image, and for understanding how to improve an image to increase its popularity. The results presented in this work can provide valuable insights to Docker developers, helping them in spreading their images. Code related to this paper is available at: https://github.com/di-unipi-socc/DockerImageMiner.
DOI: 10.1007/978-3-030-10997-4_13
Project(s): SoBigData
See at:
doi.org
| CNR IRIS
| link.springer.com
2020
Conference article
Open Access
Black box explanation by learning image exemplars in the latent feature space
Guidotti R, Monreale A, Matwin S, Pedreschi D
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows to the user how the exemplars can be modified to either stay within their class, or to become counter-factuals by "morphing" into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and the areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
DOI: 10.1007/978-3-030-46150-8_12
DOI: 10.48550/arxiv.2002.03746
Project(s): AI4EU, Track and Know, PRO-RES, SoBigData
See at:
arXiv.org e-Print Archive
| arxiv.org
| CNR IRIS
| ISTI Repository
| www.springerprofessional.de
| doi.org
2020
Conference article
Restricted
Global explanations with local scoring
Setzu M, Guidotti R, Monreale A, Turini F
Artificial Intelligence systems often adopt machine learning models encoding complex algorithms with potentially unknown behavior. As the application of these "black box" models grows, it is our responsibility to understand their inner workings and formulate them in human-understandable explanations. To this end, we propose a rule-based model-agnostic explanation method that follows a local-to-global schema: it generalizes a global explanation summarizing the decision logic of a black box starting from the local explanations of single predicted instances. We define a scoring system based on a rule relevance score to extract global explanations from a set of local explanations in the form of decision rules. Experiments on several datasets and black boxes show the stability and low complexity of the global explanations provided by the proposed solution in comparison with baselines and state-of-the-art global explainers.
Source: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE (PRINT), pp. 159-171. Würzburg, Germany, 16-20 September, 2019
DOI: 10.1007/978-3-030-43823-4_14
Project(s): AI4EU, Track and Know, PRO-RES, XAI, SoBigData
See at:
Communications in Computer and Information Science
| CNR IRIS
| link.springer.com
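The local-to-global schema relies on a rule relevance score to decide which local decision rules generalize into a global explanation. A minimal sketch of one plausible such score, coverage times accuracy on a validation set (the paper's actual scoring system is richer; the names and rule encoding here are assumptions):

```python
def rule_matches(rule, record):
    """A rule is a dict feature -> (op, value) with op in {'<=', '>'}."""
    for feat, (op, val) in rule.items():
        x = record[feat]
        if (op == "<=" and not x <= val) or (op == ">" and not x > val):
            return False
    return True

def relevance(rule, outcome, dataset, labels):
    """Relevance = coverage * accuracy of the rule on a validation set."""
    covered = [i for i, r in enumerate(dataset) if rule_matches(rule, r)]
    if not covered:
        return 0.0
    coverage = len(covered) / len(dataset)
    accuracy = sum(labels[i] == outcome for i in covered) / len(covered)
    return coverage * accuracy

data = [{"age": 25}, {"age": 40}, {"age": 60}, {"age": 70}]
labels = ["deny", "deny", "grant", "grant"]
print(relevance({"age": (">", 50)}, "grant", data, labels))
```

Ranking local rules by a score of this kind lets a local-to-global method keep only the rules that both cover many instances and predict them correctly.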
2023
Journal article
Open Access
Solving imbalanced learning with outlier detection and features reduction
Lusito S, Pugnana A, Guidotti R
A critical problem for several real-world applications is class imbalance. Indeed, in contexts like fraud detection or medical diagnostics, standard machine learning models fail because they are designed to handle balanced class distributions. Existing solutions typically increase the rare class instances by generating synthetic records to achieve a balanced class distribution. However, these procedures generate implausible data and tend to create unnecessary noise. We propose a change of perspective where, instead of relying on resampling techniques, we depend on unsupervised feature engineering approaches to represent records with a combination of features that will help the classifier capture the differences among classes, even in the presence of imbalanced data. Thus, we combine a large array of outlier detection, feature projection, and feature selection approaches to augment the expressiveness of the dataset population. We show the effectiveness of our proposal in a deep and wide set of benchmarking experiments as well as in real case studies.
Source: MACHINE LEARNING
DOI: 10.1007/s10994-023-06448-0
Project(s): SoBigData-PlusPlus
See at:
CNR IRIS
| link.springer.com
| ISTI Repository
2023
Conference article
Restricted
Explaining black-boxes in federated learning
Corbucci L, Guidotti R, Monreale A
Federated Learning has witnessed increasing popularity in the past few years for its ability to train Machine Learning models in critical contexts, using private data without moving them. Most of the work in the literature proposes algorithms and architectures for training neural networks which, although they achieve high performance on different prediction tasks and are easy to learn with a cooperative mechanism, have obscure predictive reasoning. Therefore, in this paper, we propose a variant of SHAP, one of the most widely used explanation methods, tailored to Horizontal server-based Federated Learning. The basic idea is having the possibility to explain an instance's prediction, performed by the trained Machine Learning model, as an aggregation of the explanations provided by the clients participating in the cooperation. We empirically test our proposal on two different tabular datasets, and we observe interesting and encouraging preliminary results.
Source: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE (PRINT), pp. 151-163. Lisbon, Portugal, 26-28/07/2023
DOI: 10.1007/978-3-031-44067-0_8
Project(s): TAILOR, XAI, SoBigData-PlusPlus, Humane AI
See at:
doi.org
| CNR IRIS
| link.springer.com
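The proposed SHAP variant explains a prediction as an aggregation of the explanations computed locally by the participating clients. A minimal sketch of the aggregation step, assuming each client computes its own SHAP-style attribution vector locally (the weighting scheme and data structures are assumptions, not the paper's exact method):

```python
def aggregate_explanations(client_attributions, client_weights=None):
    """Combine per-client feature attributions into a single explanation.
    `client_attributions`: list of dicts feature -> attribution value.
    Optional weights (e.g. client dataset sizes) default to uniform."""
    if client_weights is None:
        client_weights = [1.0] * len(client_attributions)
    total = sum(client_weights)
    merged = {}
    for attr, w in zip(client_attributions, client_weights):
        for feat, val in attr.items():
            merged[feat] = merged.get(feat, 0.0) + w * val / total
    return merged

clients = [
    {"income": 0.4, "age": -0.1},   # attributions computed on client 1's data
    {"income": 0.2, "age": -0.3},   # attributions computed on client 2's data
]
print(aggregate_explanations(clients))
```

Weighting by client dataset size, rather than uniformly, is one natural design choice when clients hold very different amounts of data.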
2022
Conference article
Restricted
Effect of different encodings and distance functions on quantum instance-based classifiers
Berti A., Bernasconi A., Del Corso G. M., Guidotti R.
In recent years, we have witnessed the increasing usage of machine learning technologies. In parallel, we have observed the rise of quantum computing, a paradigm for computing making use of quantum theory. Quantum computing can empower machine learning with theoretical properties allowing it to overcome the limitations of classical computing. The translation of classical algorithms into their quantum counterparts is not trivial and hides many difficulties. We illustrate and implement alternatives for the quantum nearest neighbor classifier, focusing on the challenges related to data preparation and their effect on performance. We show that, with certain data preparation strategies, quantum algorithms are comparable with the classic version, yet allow for a theoretical reduction of the complexity of distance calculation.
Source: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE, vol. 13281, pp. 96-108. Chengdu, China, 16-19/05/2022
DOI: 10.1007/978-3-031-05936-0_8
See at:
doi.org
| Archivio della Ricerca - Università di Pisa
| IRIS Cnr
| CNR IRIS
| link.springer.com
2021
Conference article
Restricted
Explaining any time series classifier
Guidotti R., Monreale A., Spinnato F., Pedreschi D., Giannotti F.
We present a method to explain the decisions of black box models for time series classification. The explanation consists of factual and counterfactual shapelet-based rules revealing the reasons for the classification, and of a set of exemplars and counter-exemplars highlighting similarities and differences with the time series under analysis. The proposed method first generates exemplar and counter-exemplar time series in the latent feature space and learns a local latent decision tree classifier. Then, it selects and decodes those respecting the decision rules explaining the decision. Finally, it learns on them a shapelet-tree that reveals the parts of the time series that must, and must not, be contained to get the returned outcome from the black box. A wide experimentation shows that the proposed method provides faithful, meaningful and interpretable explanations.
DOI: 10.1109/cogmi50398.2020.00029
Project(s): AI4EU, TAILOR, XAI, SoBigData-PlusPlus
See at:
dblp.uni-trier.de
| doi.org
| Archivio istituzionale della Ricerca - Scuola Normale Superiore
| Archivio della Ricerca - Università di Pisa
| IRIS Cnr
| CNR IRIS
2020
Journal article
Open Access
Evaluating local explanation methods on ground truth
Guidotti R.
Evaluating local explanation methods is a difficult task due to the lack of a shared and universally accepted definition of explanation. In the literature, one of the most common ways to assess the performance of an explanation method is to measure the fidelity of the explanation with respect to the classification of a black box model adopted by an Artificial Intelligence system for making a decision. However, this kind of evaluation only measures the degree of adherence of the local explainer in reproducing the behavior of the black box classifier with respect to the final decision. Therefore, the explanation provided by the local explainer could be different in content even though it leads to the same decision of the AI system. In this paper, we propose an approach that allows measuring the extent to which the explanations returned by local explanation methods are correct with respect to a synthetic ground truth explanation. Indeed, the proposed methodology enables the generation of synthetic transparent classifiers for which the reason for the decision taken, i.e., a synthetic ground truth explanation, is available by design. Experimental results show how the proposed approach allows one to easily evaluate local explanations on the ground truth and to characterize the quality of local explanation methods.
Source: ARTIFICIAL INTELLIGENCE, vol. 291
DOI: 10.1016/j.artint.2020.103428
Project(s): AI4EU, TAILOR, HumanE-AI-Net, XAI, SoBigData-PlusPlus
See at:
Artificial Intelligence
| IRIS Cnr
| Archivio della Ricerca - Università di Pisa
| CNR IRIS
2020
Conference article
Open Access
Interpretable next basket prediction boosted with representative recipes
Guidotti R., Viotto S.
Food is an essential element of our lives and cultures, and a crucial part of the human experience. The study of food purchases can drive the design of practical services such as next basket predictors and shopping list reminders. Current approaches aimed at realizing these services do not exploit a contextual dimension involving food, i.e., recipes. To this aim, we design a next basket predictor based on representative recipes, able to exploit the interest of customers towards certain ingredients when making the recommendation. The proposed method first identifies the representative recipes of a customer by analyzing her purchases and then estimates the rating of the items for the prediction. The ratings are based on both the purchases and the ingredients of the representative recipes. In addition, through our method, it is easy to justify why a specific set of items is predicted, while such explanations are often not easily available in many other effective but opaque recommenders. Experimentation on a real-world dataset shows that the usage of recipes boosts the performance of existing next basket predictors.
DOI: 10.1109/cogmi50398.2020.00018
Project(s): AI4EU, TAILOR, SoBigData-PlusPlus
See at:
IRIS Cnr
| arpi.unipi.it
| doi.org
| Archivio della Ricerca - Università di Pisa
| CNR IRIS
2020
Conference article
Open Access
Explaining image classifiers generating exemplars and counter-exemplars from latent representations
Guidotti R., Monreale A., Matwin S., Pedreschi D.
We present an approach to explain the decisions of black box image classifiers through synthetic exemplars and counter-exemplars learnt in the latent feature space. Our explanation method exploits the latent representations learned through an adversarial autoencoder for generating a synthetic neighborhood of the image for which an explanation is required. A decision tree is trained on a set of images represented in the latent space, and its decision rules are used to generate exemplar images showing how the original image can be modified to stay within its class. Counterfactual rules are used to generate counter-exemplars showing how the original image can "morph" into another class. The explanation also comprehends a saliency map highlighting the areas that contribute to its classification, and the areas that push it into another class. A wide and deep experimental evaluation proves that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability, besides providing the most useful and interpretable explanations.
DOI: 10.1609/aaai.v34i09.7116
Project(s): AI4EU, PRO-RES, SoBigData, Humane AI
See at:
CNR IRIS
| ojs.aaai.org
2024
Conference article
Open Access
Generative model for decision trees
Guidotti R., Monreale A., Setzu M., Volpi G.
Decision trees are among the most popular supervised models due to their interpretability and knowledge representation resembling human reasoning. Commonly used decision tree induction algorithms are based on greedy top-down strategies. Although these approaches are known to be an efficient heuristic, the resulting trees are only locally optimal and tend to have overly complex structures. On the other hand, optimal decision tree algorithms attempt to create an entire decision tree at once to achieve global optimality. We place our proposal between these approaches by designing a generative model for decision trees. Our method first learns a latent decision tree space through a variational architecture using pre-trained decision tree models. Then, it adopts a genetic procedure to explore such latent space to find a compact decision tree with good predictive performance. We compare our proposal against classical tree induction methods, optimal approaches, and ensemble models. The results show that our proposal can generate accurate and shallow, i.e., interpretable, decision trees.
Source: PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 38 (issue 19), pp. 21116-21124. Vancouver, Canada, 20-27/02/2024
DOI: 10.1609/aaai.v38i19.30104
Project(s): TAILOR, Future Artificial Intelligence Research, HumanE-AI-Net, TANGO, MIMOSA, XAI, SoBigData-PlusPlus, Strengthening the Italian RI for Social Mining and Big Data Analytics
See at:
Proceedings of the AAAI Conference on Artificial Intelligence
| IRIS Cnr
| Archivio della Ricerca - Università di Pisa
| CNR IRIS