2020
Journal article
Open Access
An ethico-legal framework for social data science
Forgó N, Hänold S, Van Den Hoven J, Krügel T, Lishchuk I, Mahieu R, Monreale A, Pedreschi D, Pratesi F, Van Putten D
This paper presents a framework for research infrastructures enabling ethically sensitive and legally compliant data science in Europe. Our goal is to describe how to design and implement an open platform for big data social science, including, in particular, personal data. To this end, we discuss a number of infrastructural, organizational and methodological principles to be developed for a concrete implementation. These include not only tools and methodologies that systematically enable both the empirical evaluation of privacy risk and data transformations using privacy-preserving approaches, but also the development of training materials (a massive open online course) and organizational instruments based on legal and ethical principles. This paper provides, by way of example, the implementation that was adopted within the context of the SoBigData Research Infrastructure.
Source: INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, vol. 11, pp. 377-390
DOI: 10.1007/s41060-020-00211-7
Project(s): SoBigData, SoBigData-PlusPlus
See at:
Vrije Universiteit Brussel Research Portal
| CNR IRIS
| ISTI Repository
| NARCIS
2022
Book
Open Access
IAIL 2022 - Imagining the AI Landscape after the AI Act
Dushi D, Naretto F, Panigutti C, Pratesi F
We summarize the first Workshop on Imagining the AI Landscape after the AI Act (IAIL 2022), co-located with the 1st International Conference on Hybrid Human-Artificial Intelligence (HHAI 2022), held on June 13, 2022 in Amsterdam, Netherlands.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3221
Project(s): CoHuBiCoL, TAILOR, HumanE-AI-Net, SoBigData-PlusPlus
See at:
ceur-ws.org
| CNR IRIS
| ISTI Repository
2013
Conference article
Restricted
Privacy-aware distributed mobility data analytics
Pratesi F, Monreale A, Wang H, Rinzivillo S, Pedreschi D, Andrienko G, Andrienko N
We propose an approach to preserve privacy in analytical processing within a distributed setting, and tackle the problem of obtaining aggregated information about vehicle traffic in a city from movement data collected by individual vehicles and shipped to a central server. Movement data are sensitive because they may describe typical movement behaviors and therefore be used for re-identification of individuals in a database. We provide a privacy-preserving framework for movement data aggregation based on trajectory generalization in a distributed environment. The proposed solution, based on the differential privacy model and on sketching techniques for efficient data compression, provides a formal data protection safeguard. Using real-life data, we demonstrate the effectiveness of our approach also in terms of data utility preserved by the data transformation.
Project(s): LIFT
See at:
CNR IRIS
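Illustrative note: the entry above combines differential privacy with traffic-count aggregation; the following is a minimal Python sketch of that idea, assuming per-cell vehicle counts perturbed with Laplace noise before release. The grid layout, epsilon value and sensitivity are hypothetical and not taken from the paper.

import numpy as np

def dp_traffic_counts(cell_counts, epsilon=1.0, sensitivity=1.0):
    # Add Laplace noise with scale sensitivity/epsilon to each cell count,
    # then round and clip so the released counts remain non-negative integers.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(cell_counts))
    return np.clip(np.round(cell_counts + noise), 0, None)

# Hypothetical true counts for five cells of a city tessellation.
true_counts = np.array([120, 35, 0, 7, 560])
print(dp_traffic_counts(true_counts, epsilon=0.5))

Smaller epsilon means stronger protection but noisier counts; the paper's actual construction also relies on trajectory generalization and sketching, which this toy snippet does not cover.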
2013
Contribution to book
Restricted
Privacy-preserving Distributed Movement Data Aggregation
Monreale A, Wang Wh, Pratesi F, Rinzivillo S, Pedreschi D, Andrienko G, Andrienko N
We propose a novel approach to privacy-preserving analytical processing within a distributed setting, and tackle the problem of obtaining aggregated information about vehicle traffic in a city from movement data collected by individual vehicles and shipped to a central server. Movement data are sensitive because people's whereabouts have the potential to reveal intimate personal traits, such as religious or sexual preferences, and may allow re-identification of individuals in a database. We provide a privacy-preserving framework for movement data aggregation based on trajectory generalization in a distributed environment. The proposed solution, based on the differential privacy model and on sketching techniques for efficient data compression, provides a formal data protection safeguard. Using real-life data, we demonstrate the effectiveness of our approach also in terms of data utility preserved by the data transformation.
Source: LECTURE NOTES IN GEOINFORMATION AND CARTOGRAPHY, pp. 225-245
DOI: 10.1007/978-3-319-00615-4_13
Project(s): DATA SIM
See at:
doi.org
| CNR IRIS
| link.springer.com
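Illustrative note: the entry above mentions sketching techniques for efficient data compression; below is a minimal sketch, assuming a Count-Min summary of visited-cell counts built locally by each vehicle. Width, depth and the hashing scheme are illustrative assumptions, not the chapter's exact construction.

import numpy as np

class CountMinSketch:
    # Fixed-size summary of a stream of items; estimates counts with bounded
    # overestimation while using far less space than exact per-item counters.
    def __init__(self, width=256, depth=4, seed=7):
        self.width, self.depth = width, depth
        rng = np.random.default_rng(seed)
        self.salts = rng.integers(1, 2**31 - 1, size=depth)
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _index(self, item, row):
        return hash((int(self.salts[row]), item)) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row, self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row, self._index(item, row)] for row in range(self.depth))

# Hypothetical usage: a vehicle summarises the grid cells it visited.
cms = CountMinSketch()
for cell in ["c12", "c12", "c40", "c07", "c12"]:
    cms.add(cell)
print(cms.estimate("c12"))  # at least 3; equal to 3 unless hash collisions occur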
2013
Other
Restricted
Differential privacy in distributed mobility analytics
Monreale A, Wang Wh, Pratesi F, Rinzivillo S, Pedreschi D, Andrienko G, Andrienko N
Movement data are sensitive, because people's whereabouts may allow re-identification of individuals in a de-identified database and thus can potentially reveal intimate personal traits, such as religious or sexual preferences. In this paper, we focus on a distributed setting in which movement data from individual vehicles are collected and aggregated by a centralized station. We propose a novel approach to privacy-preserving analytical processing within such a distributed setting, and tackle the problem of obtaining aggregated traffic information while preventing privacy leakage from data collection and aggregation. We study and analyze three different solutions based on the differential privacy model and on sketching techniques for efficient data compression. Each solution achieves a different trade-off between privacy protection and utility of the transformed data. Using real-life data, we demonstrate the effectiveness of our approaches in terms of data utility preserved by the data transformation, thus bringing empirical evidence to the fact that the "privacy-by-design" paradigm in big data analytics has the potential of delivering high data protection combined with high quality even in massively distributed techno-social systems.
See at:
CNR IRIS
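Illustrative note: the entry above compares solutions by their privacy/utility trade-off; the short sketch below, assuming the Laplace mechanism with sensitivity 1, shows how the expected absolute error per released count grows as epsilon shrinks. The values are purely illustrative.

# For Laplace noise with scale b = sensitivity / epsilon, E[|noise|] = b,
# so halving epsilon (stronger privacy) doubles the expected per-count error.
sensitivity = 1.0
for epsilon in (2.0, 1.0, 0.5, 0.1):
    expected_abs_error = sensitivity / epsilon
    print(f"epsilon={epsilon:>4}: expected |error| ~ {expected_abs_error:.1f} vehicles per cell")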
2018
Conference article
Open Access
Privacy Preserving Multidimensional Profiling
Pratesi F, Monreale A, Giannotti F, Pedreschi D
Recently, big data has become central in the analysis of human behavior and the development of innovative services. In particular, a new class of services is emerging that takes advantage of different sources of data in order to consider the multiple aspects of human beings. Unfortunately, these data can lead to re-identification problems and other privacy leaks, as widely reported in both the scientific literature and the media. The risk is even more pressing if multiple sources of data are linked together, since a potential adversary could know information related to each dataset. For this reason, it is necessary to accurately evaluate and mitigate the individual privacy risk before releasing personal data. In this paper, we propose a methodology for the first task, i.e., assessing privacy risk, in a multidimensional scenario, defining some possible privacy attacks and simulating them using real-world datasets.
Source: LECTURE NOTES OF THE INSTITUTE FOR COMPUTER SCIENCES, SOCIAL INFORMATICS AND TELECOMMUNICATIONS ENGINEERING, pp. 142-152. Pisa, Italy, 29-30/11/2017
DOI: 10.1007/978-3-319-76111-4_15
Project(s): SoBigData
See at:
CNR IRIS
| link.springer.com
| ISTI Repository
| Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
| Archivio istituzionale della Ricerca - Scuola Normale Superiore
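Illustrative note: the entry above proposes simulating privacy attacks to assess risk; below is a minimal sketch in which the empirical re-identification risk of a target is 1 over the number of records compatible with the adversary's background knowledge. The dataset, dimensions and values are hypothetical.

# Hypothetical multidimensional profiles: (home city, mobility profile, retail profile).
records = [
    ("Pisa", "weekday_commuter", "supermarket"),
    ("Pisa", "weekday_commuter", "bookshop"),
    ("Pisa", "night_owl", "supermarket"),
    ("Livorno", "weekday_commuter", "supermarket"),
]

def reidentification_risk(records, background_knowledge):
    # Risk = 1 / number of records matching everything the adversary knows.
    matches = sum(
        all(rec[i] == value for i, value in background_knowledge.items())
        for rec in records
    )
    return 0.0 if matches == 0 else 1.0 / matches

# Adversary knows the target lives in Pisa and is a weekday commuter.
print(reidentification_risk(records, {0: "Pisa", 1: "weekday_commuter"}))  # 0.5

Linking more dimensions shrinks the set of compatible records and therefore raises the risk, which is exactly why the multidimensional setting needs a dedicated assessment.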
2022
Journal article
Open Access
Where do migrants and natives belong in a community: a Twitter case study and privacy risk analysis
Kim J, Pratesi F, Rossetti G, Sîrbu A, Giannotti F
Today, many users are actively using Twitter to express their opinions and to share information. Thanks to the availability of the data, researchers have studied the behaviours and social networks of these users. International migration studies have also benefited from this social media platform to improve migration statistics. Although diverse types of social networks have been studied so far on Twitter, the social networks of migrants and natives have not been studied before. This paper aims to fill this gap by studying the characteristics and behaviours of migrants and natives on Twitter. To do so, we perform a general assessment of features, including profiles and tweets, and an extensive network analysis. We find that migrants have more followers than friends. They have also tweeted more, even though both groups have similar account ages. More interestingly, the assortativity scores show that users tend to connect based on nationality more than on country of residence, and this is more the case for migrants than for natives. Furthermore, both natives and migrants tend to connect mostly with natives. The homophilic behaviours of users are also well reflected in the communities that we detected. Our additional privacy risk analysis shows that Twitter data can be used without exposing sensitive information about the users, minimising the risk of re-identification while respecting the GDPR.
Source: SOCIAL NETWORK ANALYSIS AND MINING, vol. 13 (issue 15)
DOI: 10.1007/s13278-022-01017-0
Project(s): SoBigData-PlusPlus
See at:
CNR IRIS
| link.springer.com
| ISTI Repository
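Illustrative note: the entry above reports assortativity by nationality and country of residence; the following sketch, assuming networkx is available, shows how attribute assortativity can be computed on a toy follower graph. The graph, attribute name and values are hypothetical.

import networkx as nx

# Hypothetical follower graph with a nationality attribute per user.
G = nx.Graph()
G.add_nodes_from([
    (1, {"nationality": "IT"}), (2, {"nationality": "IT"}),
    (3, {"nationality": "FR"}), (4, {"nationality": "FR"}),
    (5, {"nationality": "IT"}),
])
G.add_edges_from([(1, 2), (1, 5), (2, 5), (2, 3), (3, 4)])

# Values near +1 indicate homophily (ties mostly within the same nationality);
# values near -1 indicate ties mostly across nationalities.
print(nx.attribute_assortativity_coefficient(G, "nationality"))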
2022
Contribution to book
Open Access
Ethics in smart information systems
Pratesi F, Trasarti R, Giannotti F
This chapter analyses some of the ethical implications of recent developments in artificial intelligence (AI), data mining, machine learning and robotics. In particular, we start by summarising the more consolidated issues and solutions related to privacy in data management systems, moving towards the novel concept of explainability. The chapter reviews the development of the right to privacy and the right to explanation, culminating in the General Data Protection Regulation. However, the new kinds of big data (such as internet logs or GPS tracking) require a different approach to managing privacy requirements. Several solutions have been developed and are reviewed here. Our view is that data protection must generally be considered from the beginning, as novel AI solutions are developed, using the Privacy-by-Design paradigm. This involves a shift in perspective away from remedying problems towards preventing them instead. We conclude by covering the main requirements necessary to achieve a trustworthy scenario, as advised also by the European Commission. A step towards Trustworthy AI was achieved in the Ethics Guidelines for Trustworthy Artificial Intelligence produced by an expert group for the European Commission. The key elements of these guidelines are reviewed in this chapter. To ensure European independence and leadership, we must invest wisely by bundling, connecting and opening our AI resources, while also keeping in mind ethical priorities such as transparency and fairness.
DOI: 10.51952/9781447363972.ch009
DOI: 10.56687/9781447363972-012
DOI: 10.2307/j.ctv2tbwqd5.14
Project(s): TAILOR, PRO-RES, SoBigData-PlusPlus
See at:
bristoluniversitypressdigital.com
| doi.org
| doi.org
| CNR IRIS
| ISTI Repository
| doi.org
2024
Conference article
Open Access
Operationalizing the fundamental rights impact assessment for AI systems: the FRIA project
Savella R., Pratesi F., Trasarti R., Gatt L., Gaeta M. C., Caggiano I. A., Aulino L., Troisi E., Izzo L.
This paper presents the FRIA Project, a multidisciplinary research study which connects the legal and ethical aspects related to the impact of Artificial Intelligence systems on fundamental rights with the technical issues that arise in creating an automated tool for the Fundamental Rights Impact Assessment, which is the ultimate objective of this work.
Project(s): SoBigData-PlusPlus
See at:
CNR IRIS
| ital-ia2024.it
2013
Other
Restricted
Privacy-by-design in big data analytics and social mining
Monreale A, Rinzivillo S, Pratesi F, Giannotti F, Pedreschi D
Privacy is an ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices means that their adoption is often more limited than expected. Moreover, people feel reluctant to provide true personal data unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Unfortunately, it is increasingly hard to transform the data in a way that protects sensitive information: we live in the era of big data, characterized by unprecedented opportunities to sense, store and analyze social data describing human activities in great detail and resolution. As a result, privacy preservation simply cannot be accomplished by de-identification alone. In this paper, we propose the privacy-by-design paradigm to develop technological frameworks for countering the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of social mining and big data analytical technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start.
See at:
CNR IRIS
2016
Other
Restricted
PRISQUIT: a system for assessing privacy risk versus quality in data sharing
Pratesi F, Monreale A, Trasarti R, Giannotti F, Pedreschi D, Yanagihara T
Data describing human activities are an important source of knowledge, useful for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people's whereabouts may allow re-identification of individuals in a de-identified database. Therefore, Data Providers, before sharing those data, must apply some form of anonymization to lower the privacy risks, but they must also be aware of, and able to control, the resulting data quality, since these two factors are often a trade-off. In this paper we propose PRISQUIT (Privacy RISk versus QUalITy), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated with the users represented in the data, and the data quality guaranteed when only users not at risk are retained. Our proposal supports the Data Provider in the exploration of a repertoire of possible data transformations, with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal over three data formats underlying many services, defined on real mobility data: presence data, trajectory data and road segment data.
Project(s): SoBigData
See at:
CNR IRIS
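Illustrative note: the entry above describes exploring data transformations to balance privacy risk and data quality; below is a toy sketch of that exploration, assuming precomputed per-user risks and a simple quality score per candidate transformation. Transformation names, risk values and the quality measure are illustrative assumptions, not PRISQUIT's actual components.

# Hypothetical per-user re-identification risks after each candidate transformation,
# paired with a quality score (1.0 = original data, lower = more distortion).
candidates = {
    "raw trajectories":   {"risks": [1.0, 0.5, 1.0, 0.3], "quality": 1.00},
    "500 m grid, hourly": {"risks": [0.5, 0.2, 0.5, 0.1], "quality": 0.85},
    "2 km grid, daily":   {"risks": [0.1, 0.05, 0.1, 0.05], "quality": 0.60},
}

RISK_THRESHOLD = 0.2  # users with risk above this value are considered at risk

for name, candidate in candidates.items():
    at_risk = sum(r > RISK_THRESHOLD for r in candidate["risks"]) / len(candidate["risks"])
    print(f"{name:<20} at-risk users: {at_risk:.0%}  quality: {candidate['quality']:.2f}")

The Data Provider would pick the coarsest transformation whose residual risk is acceptable while its quality is still adequate for the intended analysis.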
2017
Journal article
Open Access
A data mining approach to assess privacy risk in human mobility data
Pellungrini R, Pappalardo L, Pratesi F, Monreale A
Human mobility data are an important proxy to understand human mobility dynamics, develop analytical services, and design mathematical models for simulation and what-if analysis. Unfortunately, mobility data are very sensitive since they may enable the re-identification of individuals in a database. Existing frameworks for privacy risk assessment provide data providers with tools to control and mitigate privacy risks, but they suffer from two main shortcomings: (i) they have a high computational complexity; (ii) the privacy risk must be recomputed every time new data records become available and for every selection of individuals, geographic areas, or time windows. In this article, we propose a fast and flexible approach to estimate privacy risk in human mobility data. The idea is to train classifiers to capture the relation between individual mobility patterns and the level of privacy risk of individuals. We show the effectiveness of our approach through an extensive experiment on real-world GPS data in two urban areas and investigate the relations between human mobility patterns and the privacy risk of individuals.
Source: ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (PRINT), vol. 9 (issue 3), pp. 31:1-31:27
DOI: 10.1145/3106774
Project(s): SoBigData
See at:
ACM Transactions on Intelligent Systems and Technology
| doi.acm.org
| Archivio della Ricerca - Università di Pisa
| CNR IRIS
| ISTI Repository
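Illustrative note: the entry above trains classifiers to predict privacy risk from mobility features so that the expensive attack simulation need not be rerun for each new selection; the sketch below, assuming scikit-learn is available, shows that idea on hypothetical features and labels (they are not the paper's features or data).

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical per-user mobility features: [n_visits, n_distinct_locations,
# radius_of_gyration_km, location_entropy]; label 1 = high re-identification risk.
X = np.array([
    [120, 40, 12.3, 3.1],
    [ 15,  3,  1.2, 0.8],
    [300, 75, 25.0, 4.0],
    [ 20,  5,  2.0, 1.1],
    [ 90, 30,  9.5, 2.7],
    [ 10,  2,  0.9, 0.5],
])
y = np.array([1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

Once trained on risks computed for one snapshot, such a classifier can score new users or new selections almost instantly instead of re-running the attack simulation.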