2013
Contribution to book
Restricted
Anonymity: a comparison between the legal and computer science perspectives
Mascetti S, Monreale A, Ricci A, Gerino A
Privacy preservation has emerged as a major challenge in ICT. One possible solution for enforcing privacy is to guarantee anonymity. Indeed, according to international regulations, no restriction is applied to the handling of anonymous data. Consequently, in recent years the notion of anonymity has been extensively studied by two different communities: Law researchers and professionals who propose definitions of privacy regulations, and Computer Scientists attempting to provide technical solutions for enforcing the legal requirements. In this contribution we address the problem with an interdisciplinary approach, with the aim of encouraging reciprocal understanding and collaboration between researchers in the two areas. To achieve this, we compare the different notions of anonymity provided in European data protection law with the formal models proposed in Computer Science. This analysis allows us to identify the main similarities and differences between the two points of view, hence highlighting the need for a joint research effort.
See at:
CNR IRIS | CNR IRIS | CNR IRIS | link.springer.com
2010
Contribution to book
Restricted
Anonymity technologies for privacy-preserving data publishing and mining
Monreale A, Pedreschi D, Pensa R G
Data mining is gaining momentum in society, due to the ever-increasing availability of large amounts of data, easily gathered by a variety of collection technologies and stored via computer systems. Data mining is the key step in the process of Knowledge Discovery in Databases, the so-called KDD process. The knowledge discovered in data by means of sophisticated data mining techniques is leading to a new generation of personalized intelligent services. The dark side of this story is that the very same collection technologies gather personal, often sensitive, data, so that the opportunities of discovering knowledge increase hand in hand with the risks of privacy violation.
See at:
CNR IRIS | CNR IRIS | www.crcpress.com
2014
Journal article
Restricted
Anonymity preserving sequential pattern mining
Monreale A, Pedreschi D, Pensa R G, Pinelli F
The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if it is possible to reconstruct the identity of individuals from sequential data. Therefore, it is important to develop privacy-preserving techniques that support the publishing of truly anonymous data, without significantly altering the analysis results. In this paper we propose to apply the Privacy-by-design paradigm for designing a technological framework to counter the threats of undesirable, unlawful effects of privacy violation on sequence data, without obstructing the knowledge discovery opportunities of data mining technologies. First, we introduce a k-anonymity framework for sequence data, by defining the sequence linking attack model and its associated countermeasure, a k-anonymity notion for sequence datasets, which provides formal protection against the attack. Second, we instantiate this framework and provide a specific method for constructing the k-anonymous version of a sequence dataset, which preserves the results of sequential pattern mining, together with several basic statistics and other analytical properties of the original data, including the clustering structure. A comprehensive experimental study on realistic datasets of process logs, web logs and GPS tracks is carried out, which empirically shows how, in our proposed method, the protection of privacy meets analytical utility. © 2014 Springer Science+Business Media Dordrecht.
Source: ARTIFICIAL INTELLIGENCE AND LAW (DORDR., PRINT), vol. 22 (issue 2), pp. 141-173
See at:
CNR IRIS | CNR IRIS | link.springer.com
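For the sequence k-anonymity notion in "Anonymity preserving sequential pattern mining", a minimal sketch follows. It assumes a simplified reading of the notion: a dataset is k-anonymous when every sequence in it is contained, as an order-preserving (not necessarily contiguous) subsequence, in at least k sequences of the dataset. The function names are hypothetical, and the paper's method for constructing the k-anonymous dataset is not reproduced.

```python
from typing import List, Sequence

# Assumption: items are hashable symbols (locations, web pages, process steps).


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` in the same order, gaps allowed."""
    it = iter(seq)
    # `item in it` advances the iterator, so the order of items is enforced
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], dataset: List[Sequence[str]]) -> int:
    """Number of sequences in `dataset` that contain `pattern`."""
    return sum(1 for s in dataset if is_subsequence(pattern, s))


def is_k_anonymous(dataset: List[Sequence[str]], k: int) -> bool:
    """Simplified check (assumption): every sequence is supported by at least
    k sequences, itself included, so an attacker who knows a whole sequence
    still faces at least k candidate records."""
    return all(support(s, dataset) >= k for s in dataset)
```

For example, the dataset [["a","b"], ["a","b"], ["c"], ["c"]] passes the check for k = 2, while adding a single unique sequence such as ["a","b","c","d"] would make it fail.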
2017
Contribution to book
Open Access
Personal Analytics and Privacy. An Individual and Collective Perspective: First International Workshop, PAP 2017, Held in Conjunction with ECML PKDD 2017, Skopje, Macedonia, September 18, 2017, Revised Selected Papers
Guidotti R, Monreale A, Pedreschi D, Abiteboul S
This book constitutes the thoroughly refereed post-conference proceedings of the First International Workshop on Personal Analytics and Privacy, PAP 2017, held in Skopje, Macedonia, in September 2017. The 14 papers presented together with 2 invited talks in this volume were carefully reviewed and selected for inclusion in this book; they handle topics such as personal analytics, personal data mining and privacy in contexts where real individual data are used for developing a data-driven service, for realizing a social study aimed at understanding today's society, and for publication purposes.
Project(s): SoBigData
See at:
CNR IRIS | ISTI Repository | www.springer.com | CNR IRIS | CNR IRIS
2022
Conference article
Restricted
Uncovering student temporal learning patterns
Rotelli D., Monreale A., Guidotti R.
Because of the flexibility of online learning courses, students organise and manage their own learning time by choosing where, what, how, and for how long they study. Each individual has unique learning habits that characterise their behaviours and distinguish them from others. Nonetheless, to the best of our knowledge, the temporal dimension of student learning has received little attention on its own. Typically, when modelling trends, a chosen configuration is set to capture various habits, and a cluster analysis is undertaken. However, the selection of the variables to observe and of the algorithm used to conduct the analysis is a subjective process that reflects the researcher's thoughts and ideas. To explore how students behave over time, we present alternative ways of modelling student temporal behaviour. Our real-world data experiments reveal that the generated clusters may or may not differ based on the selected profile, and they unveil different student learning patterns.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 13450, pp. 340-353. Toulouse, France, 12-16/09/2022
See at:
CNR IRIS | CNR IRIS | link.springer.com
2021
Conference article
Restricted
Designing shapelets for interpretable data-agnostic classification
Guidotti R., Monreale A.
Time series shapelets are discriminatory subsequences which are representative of a class, and their similarity to a time series can be used for successfully tackling the time series classification problem. The literature shows that Artificial Intelligence (AI) systems adopting classification models based on time series shapelets can be interpretable, more accurate, and significantly faster. Thus, in order to design a data-agnostic and interpretable classification approach, in this paper we first extend the notion of shapelets to different types of data, i.e., images, tabular and textual data. Then, based on this extended notion of shapelets, we propose an interpretable data-agnostic classification method. Since shapelet discovery can be time-consuming, especially for data types more complex than time series, we exploit a notion of prototypes to find candidate shapelets, reducing both the time required to find a solution and the variance of shapelets. Extensive experimentation on datasets of different types shows that the data-agnostic prototype-based shapelets returned by the proposed method enable an interpretable classification which is also fast, accurate, and stable. In addition, we show and prove that shapelets can be at the basis of explainable AI methods.
Project(s): SoBigData-PlusPlus
See at:
dl.acm.org | CNR IRIS | CNR IRIS
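The abstract of "Designing shapelets for interpretable data-agnostic classification" rests on the similarity between a shapelet and a time series. A minimal sketch of the common time-series definition follows: the minimum Euclidean distance between the shapelet and every equally long window of the series, plus the usual "shapelet transform" that turns each series into a vector of such distances. This covers only the time-series case; the paper's extension to images, tabular and text data and its prototype-based candidate search are not reproduced, and z-normalization is omitted for brevity.

```python
import math
from typing import List, Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between `shapelet` and every window of
    `series` with the same length (no z-normalization in this sketch)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def shapelet_transform(dataset: List[Sequence[float]],
                       shapelets: List[Sequence[float]]) -> List[List[float]]:
    """Represent each series by its distances to the given shapelets; an
    interpretable classifier (e.g. a decision tree) can then be trained on it."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]
```

A distance of 0 means the shapelet occurs exactly somewhere in the series, which is what makes the transformed features readable as "this pattern is present / absent".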
2020
Conference article
Restricted
Data-agnostic local neighborhood generation
Guidotti R., Monreale A.
Synthetic data generation has been widely adopted in software testing, data privacy, imbalanced learning, machine learning explanation, etc. In such contexts, it is important to generate data samples located within 'local' areas surrounding specific instances. Local synthetic data can help the learning phase of predictive models, and it is fundamental for methods explaining the local behavior of obscure classifiers. The contribution of this paper is twofold. First, we introduce a method based on generative operators that produces a synthetic neighborhood by applying specific perturbations to a given input instance. The key factor consists in performing a data transformation that makes the method applicable to any type of data, i.e., data-agnostic. Second, we design a framework for evaluating the goodness of local synthetic neighborhoods exploiting both supervised and unsupervised methodologies. An extensive experimentation shows the effectiveness of the proposed method.
Project(s): SoBigData-PlusPlus
See at:
CNR IRIS | ieeexplore.ieee.org | CNR IRIS
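To make the idea of a "local synthetic neighborhood" in "Data-agnostic local neighborhood generation" concrete, here is a minimal sketch under strong assumptions: the instance is a plain numeric vector, and independent Gaussian noise stands in for the paper's generative operators (which, per the abstract, act after a data transformation that makes them data-agnostic). The function names and the mean-distance check are hypothetical illustrations, not the paper's evaluation framework.

```python
import math
import random
from typing import List, Sequence


def gaussian_neighborhood(x: Sequence[float], n: int, scale: float,
                          seed: int = 0) -> List[List[float]]:
    """Generate n synthetic points around instance x by adding independent
    Gaussian noise (standard deviation `scale`) to every numeric feature."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [[xi + rng.gauss(0.0, scale) for xi in x] for _ in range(n)]


def mean_distance(x: Sequence[float], neighborhood: List[List[float]]) -> float:
    """Average Euclidean distance from x: a crude check of how 'local'
    the generated neighborhood is."""
    if not neighborhood:
        return 0.0
    return sum(math.dist(x, z) for z in neighborhood) / len(neighborhood)
```

The `scale` parameter controls locality: smaller values keep the synthetic points closer to the instance being explained, at the cost of exploring less of the classifier's decision boundary.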
2024
Journal article
Open Access
Efficiency boosts in human mobility data privacy risk assessment: advancements within the PRUDEnce framework
Gomes F. O., Pellungrini R., Monreale A., Renso C., Martina J. E.
With the exponential growth of mobility data generated by IoT, social networks, and mobile devices, there is a pressing need to address privacy concerns. Our work proposes methods to reduce the computational cost of privacy risk evaluation on mobility datasets, focusing on reducing background knowledge configurations and matching functions, and on enhancing code performance. Leveraging the unique characteristics of trajectory data, we aim to minimize the size of combination sets and directly evaluate risk for trajectories with distinct values. Additionally, we optimize efficiency by storing essential information in memory to eliminate unnecessary computations. These approaches offer a more efficient and effective means of identifying and addressing privacy risks associated with diverse mobility datasets.
Source: APPLIED SCIENCES, vol. 14 (issue 17)
Project(s): Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—, PNRR-M4C2-Investimento 1.3, Partenariato Esteso PE00000013-“FAIR-Future Artificial Intelligence Research”-Spoke 1 “Human-centered AI”, funded by the, SoBigData.it
See at:
CNR IRIS | www.mdpi.com | CNR IRIS
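The cost that "Efficiency boosts in human mobility data privacy risk assessment" tries to reduce comes from enumerating background knowledge configurations. A minimal sketch, assuming the usual PRUDEnce-style formulation: an attacker knows h locations visited by a target, a user "matches" if they visited all of them, and the target's risk is the maximum, over all h-subsets of their locations, of 1 / (number of matching users). Order and time are ignored here, and the early exit at risk 1 is an illustrative shortcut of ours, not necessarily one of the paper's optimizations.

```python
from itertools import combinations
from typing import Dict, List


def privacy_risk(data: Dict[str, List[str]], user: str, h: int) -> float:
    """Re-identification risk of `user` when an attacker knows h of the
    locations they visited (order and time ignored in this sketch)."""
    visited = {u: set(traj) for u, traj in data.items()}
    target = sorted(visited[user])
    risk = 0.0
    for bk in combinations(target, min(h, len(target))):
        known = set(bk)
        # matching function: a user matches if they visited every known location
        matches = sum(1 for locs in visited.values() if known <= locs)
        risk = max(risk, 1.0 / matches)  # matches >= 1: the target always matches
        if risk == 1.0:
            break  # early exit: the risk cannot exceed 1
    return risk
```

The number of background knowledge configurations grows combinatorially with h and with trajectory length, which is why shrinking the combination sets, as the abstract describes, matters in practice.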
2010
Conference article
Restricted
As time goes by: discovering eras in evolving social networks
Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D
Within the large body of research in complex network analysis, an important topic is the temporal evolution of networks. Existing approaches aim at analyzing the evolution on the global and the local scale, extracting properties of either the entire network or local patterns. In this paper, we focus instead on detecting clusters of temporal snapshots of a network, to be interpreted as eras of evolution. To this aim, we introduce a novel hierarchical clustering methodology, based on a dissimilarity measure (derived from the Jaccard coefficient) between two temporal snapshots of the network. We devise a framework to discover and browse the eras, in either a top-down or a bottom-up fashion, supporting the exploration of the evolution at any level of temporal resolution. We show how our approach applies to real networks, by detecting eras in an evolving co-authorship graph extracted from a bibliographic dataset; we illustrate how the discovered temporal clustering highlights the crucial moments when the network had profound changes in its structure. Our approach is finally boosted by introducing a meaningful labeling of the obtained clusters, such as the characterizing topics of each discovered era, thus adding a semantic dimension to our analysis.
See at:
CNR IRIS | CNR IRIS | www.springerlink.com
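The era-discovery idea in "As time goes by: discovering eras in evolving social networks" combines a Jaccard-derived dissimilarity between snapshots with bottom-up clustering. A minimal sketch follows, with two assumptions that are ours rather than the paper's: only temporally adjacent clusters may merge (so eras are contiguous intervals), and a cluster is represented by the union of the edges of its snapshots. Edges are assumed to be stored in a canonical order so that undirected duplicates compare equal.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # assumption: endpoints stored in a canonical (sorted) order


def jaccard_distance(a: Set[Edge], b: Set[Edge]) -> float:
    """1 - (size of intersection / size of union); two empty snapshots
    are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def discover_eras(snapshots: List[Set[Edge]], n_eras: int) -> List[List[int]]:
    """Bottom-up clustering of adjacent snapshots into n_eras eras.

    Assumptions: only neighbouring clusters merge (eras are contiguous time
    intervals) and a cluster is represented by the union of its edges.
    """
    clusters = [[t] for t in range(len(snapshots))]
    edges = [set(s) for s in snapshots]  # copies, so the input is not mutated
    while len(clusters) > max(n_eras, 1):
        gaps = [jaccard_distance(edges[i], edges[i + 1])
                for i in range(len(clusters) - 1)]
        j = gaps.index(min(gaps))  # merge the most similar adjacent pair
        clusters[j].extend(clusters[j + 1])
        edges[j] |= edges[j + 1]
        del clusters[j + 1]
        del edges[j + 1]
    return clusters
```

Recording the sequence of merges instead of stopping at a fixed number of eras would give the full hierarchy, which is what supports browsing the eras top-down or bottom-up.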
2010
Journal article
Restricted
Movement data anonymity through generalization
Monreale A, Andrienko G, Andrienko N, Giannotti F, Pedreschi D, Rinzivillo S, Wrobel S
Wireless networks and mobile devices, such as mobile phones and GPS receivers, sense and track the movements of people and vehicles, producing society-wide mobility databases. This is a challenging scenario for data analysis and mining. On the one hand, exciting opportunities arise out of discovering new knowledge about human mobile behavior, and thus fuel intelligent info-mobility applications. On the other hand, new privacy concerns arise when mobility data are published. The risk is particularly high for GPS trajectories, which represent movement at very high precision and spatio-temporal resolution: the de-identification of such trajectories (i.e., forgetting the ID of their associated owners) is only a weak protection, as it is generally possible to re-identify a person by observing her routine movements. In this paper we propose a method for achieving true anonymity in a dataset of published trajectories, by defining a transformation of the original GPS trajectories based on spatial generalization and k-anonymity. The proposed method offers a formal data protection safeguard, quantified as a theoretical upper bound to the probability of re-identification. We conduct a thorough study on a real-life GPS trajectory dataset, and provide strong empirical evidence that the proposed anonymity techniques achieve the conflicting goals of data utility and data privacy. In practice, the achieved anonymity protection is much stronger than the theoretical worst case, while the quality of the cluster analysis on the trajectory data is preserved.
Source: TRANSACTIONS ON DATA PRIVACY, vol. 3, pp. 91-121
See at:
CNR IRIS | CNR IRIS | www.tdp.cat
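The combination of spatial generalization and k-anonymity described in "Movement data anonymity through generalization" can be illustrated with a deliberately simple sketch. Assumptions, not the paper's transformation: a uniform square grid replaces the paper's spatial generalization, and trajectories whose generalized form is shared by fewer than k others are simply suppressed. Under these assumptions each released trajectory is indistinguishable from at least k-1 others, which yields the 1/k upper bound on re-identification mentioned in the abstract.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def generalize(traj: List[Point], cell_size: float) -> Tuple[Cell, ...]:
    """Map each GPS point to a square grid cell, collapsing consecutive
    points that fall into the same cell."""
    cells: List[Cell] = []
    for x, y in traj:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return tuple(cells)


def k_anonymize(trajs: List[List[Point]], cell_size: float,
                k: int) -> List[Tuple[Cell, ...]]:
    """Generalize all trajectories and suppress those whose generalized form
    is shared by fewer than k trajectories, so that every released one has
    a re-identification probability of at most 1/k."""
    generalized = [generalize(t, cell_size) for t in trajs]
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]
```

Larger cells make more trajectories coincide (more privacy, less suppression) but blur the movement patterns that the subsequent cluster analysis relies on; this is the utility/privacy trade-off the abstract refers to.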
2008
Conference article
Restricted
Location prediction within the mobility data analysis environment Daedalus
Trasarti R, Monreale A, Pinelli F, Giannotti F
In this paper we propose a method to predict the next location of a moving object based on two recent results in the GeoPKDD project: DAEDALUS, a mobility data analysis environment, and Trajectory Pattern, a sequential pattern mining algorithm with temporal annotation integrated in DAEDALUS. The first is a DMQL environment for moving objects, where both data and patterns can be represented. The second extracts movement patterns as sequences of movements between locations with typical travel times. This paper proposes a prediction method which uses the local models extracted by Trajectory Pattern to build a global model called Prediction Tree. The future location of a moving object is predicted by visiting the tree and calculating the best matching function. The integration within the DAEDALUS system supports an interactive construction of the predictor on top of a set of spatio-temporal patterns. Other proposals in the literature base the definition of prediction methods for the future location of a moving object on previously extracted frequent patterns. They use the recent history of movements of the object itself and often use time only to order the events. Our work uses the movements of all moving objects in a certain area to learn a classifier built on the mined trajectory patterns, which are intrinsically equipped with temporal information.
See at:
dl.acm.org | CNR IRIS | CNR IRIS
2009
Conference article
Restricted
WhereNext: a location predictor on trajectory pattern mining
Monreale A, Pinelli F, Trasarti R, Giannotti F
The pervasiveness of mobile devices and location-based services is leading to an increasing volume of mobility data. This side effect provides the opportunity for innovative methods that analyse the behaviors of movements. In this paper we propose WhereNext, a method aimed at predicting with a certain level of accuracy the next location of a moving object. The prediction uses previously extracted movement patterns named Trajectory Patterns, which are a concise representation of behaviors of moving objects as sequences of regions frequently visited with a typical travel time. A decision tree, named T-pattern Tree, is built and evaluated with a formal training and test process. The tree is learned from the Trajectory Patterns that hold in a certain area, and it may be used as a predictor of the next location of a new trajectory by finding the best matching path in the tree. Three different best matching methods to classify a new moving object are proposed and their impact on the quality of prediction is studied extensively. Using Trajectory Patterns as predictive rules has the following implications: (I) the learning depends on the movement of all available objects in a certain area instead of on the individual history of an object; (II) the prediction tree intrinsically contains the spatio-temporal properties that have emerged from the data, and this allows us to define matching methods that strictly depend on the properties of such movements. In addition, we propose a set of other measures that evaluate, a priori, the predictive power of a set of Trajectory Patterns. These measures were tuned on a real-life case study. Finally, an exhaustive set of experiments and results on the real dataset is presented.
See at:
dl.acm.org | CNR IRIS | CNR IRIS
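The T-pattern Tree in "WhereNext: a location predictor on trajectory pattern mining" is built from Trajectory Patterns and used by finding the best matching path for a new trajectory. A minimal sketch of that idea follows, under simplifying assumptions: regions are plain labels, the typical travel times of the patterns are ignored, and the score is just the summed pattern support. The class and function names are hypothetical, and the paper's three best matching methods are not reproduced.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.support = 0  # summed support of the patterns passing through this node


def build_tree(patterns: List[Tuple[List[str], int]]) -> Node:
    """Merge (region sequence, support) patterns into a prefix tree."""
    root = Node()
    for regions, sup in patterns:
        node = root
        for region in regions:
            node = node.children.setdefault(region, Node())
            node.support += sup
    return root


def predict_next(root: Node, trajectory: List[str]) -> Optional[str]:
    """Match the longest suffix of `trajectory` against a path starting at
    the root and return the best-supported child of the matched node."""
    for start in range(len(trajectory)):
        node: Optional[Node] = root
        for region in trajectory[start:]:
            node = node.children.get(region)
            if node is None:
                break
        else:  # the whole suffix matched a path in the tree
            if node.children:
                return max(node.children.items(), key=lambda kv: kv[1].support)[0]
    return None
```

Because the tree is built from patterns mined over all objects in the area, a trajectory can be predicted even if its own history is short, which is the first implication listed in the abstract.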
2010
Conference article
Restricted
Location prediction through trajectory pattern mining
Monreale A, Pinelli F, Trasarti R, Giannotti F
The pervasiveness of mobile devices and location-based services produces, as a side effect, an increasing volume of mobility data, which in turn creates the opportunity for a novel generation of methods for analyzing movement behaviors. In this paper, we propose WhereNext, a method aimed at predicting with a certain accuracy the next location of a moving object. The prediction uses previously extracted movement patterns named Trajectory Patterns, which are a concise representation of behaviors of moving objects as sequences of regions frequently visited with typical travel times. A decision tree, named T-pattern Tree, is built and evaluated with a formal training and test process. Using Trajectory Patterns as predictive rules has the following implications: (I) the learning depends on the movement of all available objects in a certain area instead of on the individual history of an object; (II) the prediction tree intrinsically contains the spatio-temporal properties that emerged from the data, and this allows us to define matching methods strongly depending on such movement properties. Finally, an exhaustive set of experiments and results on the real dataset is presented.
See at:
CNR IRIS | CNR IRIS
2010
Conference article
Restricted
Towards discovery of eras in social networks
Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D
In recent decades, much research has been devoted to topics related to Social Network Analysis. One important direction in this area is to analyze the temporal evolution of a network. So far, previous approaches analyzed this setting at both the global and the local level. In this paper, we focus on finding a way to detect temporal eras in an evolving network. We lay the basis for a general framework that aims at helping the analyst browse the temporal clusters both in a top-down and in a bottom-up way, exploring the network at any level of temporal detail. We show the effectiveness of our approach on real data, by applying our proposed methodology to a co-authorship network extracted from a bibliographic dataset. Our first results are encouraging, and open the way for the definition and implementation of a general framework for discovering eras in evolving social networks.
See at:
CNR IRIS | CNR IRIS
2010
Conference article
Restricted
Preserving privacy in semantic-rich trajectories of human mobility
Monreale A, Trasarti R, Renso C, Pedreschi D, Bogorny V
The increasing abundance of data about the trajectories of personal movement is opening up new opportunities for analyzing and mining human mobility, but new risks emerge since it opens new ways of intruding into personal privacy. Representing personal movements as sequences of places visited by a person (semantic trajectories) poses even greater privacy threats w.r.t. raw geometric location data. In this paper we propose a privacy model defining the attack model of semantic trajectory linking, together with a privacy notion called c-safety. This method provides an upper bound to the probability of inferring that a given person, observed in a sequence of non-sensitive places, has also stopped in any sensitive location. Coherently with the privacy model, we propose an algorithm for transforming any dataset of semantic trajectories into a c-safe one. We report a study on a real-life GPS trajectory dataset to show how our algorithm preserves interesting quality/utility measures of the original trajectories, such as sequential pattern mining results.
See at:
dl.acm.org | CNR IRIS | CNR IRIS
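The c-safety notion in "Preserving privacy in semantic-rich trajectories of human mobility" bounds the probability of inferring a sensitive stop from observed non-sensitive places. The sketch below only checks that bound on a toy dataset; it does not reproduce the paper's transformation algorithm. Assumptions: the attacker observes an order-preserving subsequence of at most `max_len` non-sensitive places, the inference probability is estimated as the fraction of trajectories containing the observed places that also contain a given sensitive place, and a dataset is c-safe if no such probability exceeds c. The parameter `max_len` and all names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    it = iter(seq)
    return all(p in it for p in pattern)  # order-preserving, gaps allowed


def max_inference_probability(trajs: List[List[str]], sensitive: Set[str],
                              max_len: int = 2) -> float:
    """Worst-case probability that an attacker who observed up to `max_len`
    non-sensitive places of a person infers a visit to a sensitive place."""
    patterns: Set[Tuple[str, ...]] = set()
    for t in trajs:
        visible = [p for p in t if p not in sensitive]
        for n in range(1, min(max_len, len(visible)) + 1):
            patterns.update(combinations(visible, n))  # keeps visit order
    worst = 0.0
    for pat in patterns:
        holders = [t for t in trajs if is_subsequence(pat, t)]  # never empty
        for s in sensitive:
            prob = sum(1 for t in holders if s in t) / len(holders)
            worst = max(worst, prob)
    return worst


def is_c_safe(trajs: List[List[str]], sensitive: Set[str], c: float,
              max_len: int = 2) -> bool:
    return max_inference_probability(trajs, sensitive, max_len) <= c
```

In a toy dataset of three trajectories that all pass through "home" and "work", of which only one also stops at a sensitive "clinic", the worst-case inference probability is 1/3, so the data would be c-safe for c = 0.5 but not for c = 0.3.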
2010
Conference article
Restricted
Discovering Eras in Evolving Social Networks
Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D
An important topic in complex network research is the temporal evolution of networks. Existing approaches aim at analyzing the evolution extracting properties of either the entire network or local patterns. In this paper, we focus on detecting clusters of temporal snapshots of a network, to be interpreted as eras of evolution. To this aim, we introduce a novel hierarchical clustering methodology, based on a dissimilarity measure between two temporal snapshots of the network. We devise a framework to discover and browse the eras, supporting the exploration of the evolution at any level of temporal resolution. We show how our approach applies to real networks, by detecting eras in an evolving co-authorship graph; we illustrate how the discovered temporal clustering highlights the crucial moments when the network had profound changes in its structure. Our approach is finally boosted by introducing a meaningful labeling of the obtained clusters, such as the characterizing topics of each discovered era, thus adding a semantic dimension to our analysis.
See at:
CNR IRIS | CNR IRIS
2009
Other
Open Access
Analysis of hubs in large multidimensional networks
Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D
Hubs in complex networks are important nodes in terms of their connectivity to the whole network. In a mono-dimensional network, i.e., where only one kind of interaction is possible among nodes, the concept of hub has been widely studied, and it is at the basis of many important applications such as web search and the study of epidemic outbreaks. However, in real-world scenarios, networks are multidimensional, i.e., several possible kinds of connections exist among the nodes. In this setting, the concept of a hub should take into account the multiple dimensions, which can have varying influence on the connectivity of each node, and whose interplay can be relevant to assess the importance of an entity. In this paper, we tackle the problem of analyzing the relevance of dimensions for node connectivity, and how this relevance analysis can highlight hubs with peculiar, interesting behaviors in a large network. To this end, we consider the multidimensional generalization of the degree, namely the number of neighbors of a node, and a newly introduced class of measures that we call Dimension Relevance. We show how to efficiently compute these simple measures on one of the possible representations of a multidimensional network, the multigraph. Moreover, we illustrate the usage of our new measures on two different real-world networks: a word-word graph built on a search engine query log, and a popular large online social network, Flickr. In both cases, our proposed measures allow us to discover hubs for which one specific dimension is highly relevant and ensures a high connectivity of that node within the network. We argue that the presented methodology covers a wide range of possible applications, from search engines to computer networks, from biological to social networks, where the interplay among different dimensions can really make the difference for the behavior of specific important entities.
See at:
ISTI Repository | CNR IRIS | CNR IRIS
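"Analysis of hubs in large multidimensional networks" works on a multigraph whose edges carry a dimension label and uses a multidimensional degree, defined in the abstract as the number of neighbors of a node. A minimal sketch follows. The two Dimension Relevance variants are assumptions about the measures' form: the share of a node's neighbors reachable through a given set of dimensions, and the share reachable only through that set. All names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (node, node, dimension); undirected multigraph


def neighbors(edges: List[Edge], node: str,
              dims: Optional[Set[str]] = None) -> Set[str]:
    """Neighbors of `node` through edges whose dimension is in `dims`
    (all dimensions when `dims` is None)."""
    result: Set[str] = set()
    for u, v, d in edges:
        if dims is not None and d not in dims:
            continue
        if u == node:
            result.add(v)
        elif v == node:
            result.add(u)
    return result


def degree(edges: List[Edge], node: str, dims: Optional[Set[str]] = None) -> int:
    """Multidimensional degree: number of distinct neighbors through `dims`."""
    return len(neighbors(edges, node, dims))


def dimension_relevance(edges: List[Edge], node: str, dims: Set[str]) -> float:
    """Assumed form: share of the node's neighbors reachable through `dims`."""
    total = degree(edges, node)
    return degree(edges, node, dims) / total if total else 0.0


def dimension_relevance_xor(edges: List[Edge], node: str, dims: Set[str]) -> float:
    """Assumed form: share of the node's neighbors reachable ONLY through `dims`."""
    total = degree(edges, node)
    if not total:
        return 0.0
    others = {d for _, _, d in edges} - dims
    only = neighbors(edges, node, dims) - neighbors(edges, node, others)
    return len(only) / total
```

A node with a high "only through" score for one dimension is a hub whose connectivity would largely vanish without that dimension, which is the kind of behavior the abstract looks for.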
2009
Other
Open Access
Privacy preserving outsourcing of association rule mining
Giannotti F, Lakshmanan L V, Monreale A, Pedreschi D, Wang H
Spurred by developments such as cloud computing, there has been considerable recent interest in the paradigm of data-mining-as-a-service. A company (data owner) lacking in expertise or computational resources can outsource its mining needs to a third-party service provider (server). However, both the items in the outsourced database and the patterns of items that can be mined from the database are considered private corporate information of the data owner. To protect corporate privacy, the data owner transforms its data and ships it to the server. The server sends extracted patterns to the owner in response to the latter's mining queries. The owner recovers the true patterns from the extracted patterns received. In this paper, we study the problem of outsourcing the association rule mining task within a corporate privacy-preserving framework. We propose an attack model based on background knowledge and devise two schemes, namely Frugal and RobFrugal, for privacy-preserving outsourced mining, based on the concept of k-anonymity. The protection against the privacy violation attack comes from ensuring that each transformed item (itemset) is indistinguishable, w.r.t. the attacker's background knowledge, from at least k-1 other transformed items (itemsets). We show that the owner can recover the true patterns as well as their support by maintaining a compact synopsis. Finally, we empirically demonstrate, using comprehensive experiments on a real transaction database, that our techniques and ideas are effective, scalable, and protect privacy.
See at:
CNR IRIS | ISTI Repository | CNR IRIS
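"Privacy preserving outsourcing of association rule mining" requires each transformed item to be indistinguishable from at least k-1 others with respect to the attacker's background knowledge. A minimal sketch, assuming that knowledge is item support: items are sorted by support, cut into groups of k, and fake transactions raise every item's support to the maximum in its group, so supports within a group coincide. The item encryption, the RobFrugal variant and the synopsis used to recover true supports are omitted, and emitting one singleton fake transaction per missing unit of support is a simplification of ours.

```python
from collections import Counter
from typing import Iterable, List, Set


def item_support(db: Iterable[Set[str]]) -> Counter:
    """Number of transactions containing each item."""
    sup: Counter = Counter()
    for t in db:
        sup.update(set(t))
    return sup


def frugal_groups(db: List[Set[str]], k: int) -> List[List[str]]:
    """Sort items by decreasing support and cut them into groups of k;
    a trailing group smaller than k is merged into the previous one."""
    sup = item_support(db)
    items = sorted(sup, key=lambda i: (-sup[i], i))
    if len(items) < k:
        raise ValueError("need at least k distinct items")
    groups = [items[i:i + k] for i in range(0, len(items), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        last = groups.pop()
        groups[-1].extend(last)
    return groups


def fake_transactions(db: List[Set[str]], groups: List[List[str]]) -> List[Set[str]]:
    """Singleton fake transactions that raise every item's support to the
    highest support in its group, so supports are equal within a group."""
    sup = item_support(db)
    fakes: List[Set[str]] = []
    for group in groups:
        target = max(sup[i] for i in group)
        for item in group:
            fakes.extend({item} for _ in range(target - sup[item]))
    return fakes
```

Grouping items with similar support keeps the number of fake transactions small, which matters because every fake transaction is extra work for the server and must later be discounted by the owner.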