2005
Conference article
Restricted
Speeding-up hierarchical agglomerative clustering in presence of expensive metrics
Nanni M
In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity that limits the size of the datasets that can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In this paper we propose a pruning heuristic aimed at improving performance in these cases, which is well integrated in all the phases of the HAC process and can be applied to two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.
See at:
CNR IRIS
2010
Journal article
Open Access
Anonymization of moving objects databases by clustering and perturbation
Abul O, Bonchi F, Nanni M
Preserving individual privacy when publishing data is a problem that is receiving increasing attention. Thanks to its simplicity, the concept of k-anonymity, introduced by Samarati and Sweeney [1], established itself as one fundamental principle for privacy-preserving data publishing. According to the k-anonymity principle, each release of data must be such that each individual is indistinguishable from at least k − 1 other individuals. In this article we tackle the problem of anonymization of moving objects databases. We propose a novel concept of k-anonymity based on co-localization, which exploits the inherent uncertainty of the moving object's whereabouts. Due to sampling and imprecision of the positioning systems (e.g., GPS), the trajectory of a moving object is no longer a polyline in a three-dimensional space; instead, it is a cylindrical volume whose radius delta represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder, the two are indistinguishable from each other. This leads to the definition of (k, delta)-anonymity for moving objects databases. We first characterize the (k, delta)-anonymity problem, then we recall NWA (Never Walk Alone), a method that we introduced in [2] based on clustering and spatial perturbation. Starting from a discussion on the limits of NWA, we develop a novel clustering method that, being based on EDR distance [3], has the important feature of being time-tolerant. As a consequence, it perturbs trajectories both in space and time. The novel method, named W4M (Wait for Me), is empirically shown to produce higher quality anonymization than NWA, at the price of higher computational requirements.
Therefore, in order to make W4M scalable to large datasets, we introduce two variants based on a novel (and computationally cheaper) time-tolerant distance function, and on chunking. All the variants of W4M are empirically evaluated in terms of data quality and efficiency, and thoroughly compared to their predecessor NWA. Data quality is assessed both by means of objective measures of information distortion, and by more usability-oriented measures, i.e., by comparing the results of (i) spatio-temporal range queries and (ii) frequent pattern mining, executed on the original database and on the (k, delta)-anonymized one. Experimental results over both real-world and synthetic mobility data confirm that, for a wide range of values of delta and k, the relative distortion introduced by our anonymization methods is kept low. Moreover, the techniques introduced to make W4M scalable to large datasets achieve their goal without giving up data quality in the anonymization process.
Source: INFORMATION SYSTEMS, vol. 35 (issue 8), pp. 884-910
DOI: 10.1016/j.is.2010.05.003
Project(s): Veri Yayınlamada Hassas Bilgi Gizleme
See at:
Aperta - TÜBİTAK Açık Arşivi
| Information Systems
| CNR IRIS
| www.sciencedirect.com
2004
Conference article
Restricted
Mining literary texts using domain ontologies
Baglioni M, Nanni M, Giovannetti E
This paper describes a query system on texts and literary material with advanced information retrieval tools. As a test bed we chose the electronic version of Dante's Inferno, manually tagged using XML, enriched with a domain ontology describing the historical, social and cultural context represented as a separate XML document.
See at:
CNR IRIS
2008
Other
Open Access
A constraint-based approach for multispace clustering
Pensa R G, Nanni M
In many applications, a set of objects can be represented from different points of view (universes). Besides numeric, ordinal and nominal features, objects may be represented using spatio-temporal information, sequences, and more complex structures (e.g., graphs). Learning from all these different spaces is challenging, since different algorithms and metrics are often needed. In the case of data clustering, a partitional, hierarchical or density-based algorithm is often well suited for a specific type of data, but not for others. In this work we present a preliminary study on a framework that tries to link different clustering results by exploiting pairwise similarity constraints. We propose two algorithmic settings, and we present an application to a real-world dataset of trajectories.
See at:
CNR IRIS
| ISTI Repository
2006
Software
Metadata Only Access
MiSTA v2.1
Mirco Nanni
Algorithm for mining sequential patterns with temporal annotations (typical transition times), based on the tight integration of prefix-projection-based methods for sequential patterns and kernel-based density estimation methods.
See at:
CNR IRIS
2006
Software
Metadata Only Access
See at:
CNR IRIS
2005
Other
Open Access
Hierarchical agglomerative clustering in presence of expensive metrics (Extended Tech. Rep.)
Nanni M
In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity that limits the size of the datasets that can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In all such situations the standard approach to HAC, which first computes all object-to-object distances and then performs the actual clustering process, quickly yields high computational costs and long running times. One of the key means of containing this problem lies in methods that can save a significant portion of distance computations, resulting in a lower complexity. In this paper we propose a pruning heuristic well integrated in all the phases of the HAC process, developed for two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.
See at:
CNR IRIS
| ISTI Repository
2006
Journal article
Open Access
Time-focused clustering of trajectories of moving objects
Nanni M, Pedreschi D
Spatio-temporal, geo-referenced datasets are growing rapidly, and will grow further in the near future, due to both technological and social/commercial reasons. From the data mining viewpoint, spatio-temporal trajectory data introduce new dimensions and, correspondingly, novel issues in performing the analysis tasks. In this paper, we consider the clustering problem applied to the trajectory data domain. In particular, we propose an adaptation of a density-based clustering algorithm to trajectory data based on a simple notion of distance between trajectories. Then, a set of experiments on synthesized data is performed in order to test the algorithm and to compare it with other standard clustering approaches. Finally, a new approach to the trajectory clustering problem, called temporal focussing, is sketched, with the aim of exploiting the intrinsic semantics of the temporal dimension to improve the quality of trajectory clustering.
Source: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, vol. 27 (issue 3), pp. 267-289
DOI: 10.1007/s10844-006-9953-7
See at:
CNR IRIS
| ISTI Repository
| www.springerlink.com
| Journal of Intelligent Information Systems
2019
Journal article
Open Access
Car telematics big data analytics for insurance and innovative mobility services
Longhi L, Nanni M
Car telematics is a large and growing business sector aiming to collect mobility-related data (mainly from private and commercial vehicles) and to develop services of various kinds both for individual citizens and for other companies. Such services and applications include information systems to support car insurance, info-mobility services, ad hoc studies for planning purposes, etc. In this work we report and discuss some of the key challenges that a car telematics pilot application is facing within the EU project "Track and Know". The paper introduces the overall context, the main business goals identified as potentially benefiting from big data solutions, and the type of data sources that such applications can rely on (in particular, those available within the project for experimental studies), then discusses initial results of the solutions developed so far and ongoing lines of research. In particular, the discussion focuses on the most relevant applications identified for the project purposes, namely new services for car insurance, electric vehicle mobility, and car- and ride-sharing.
Source: JOURNAL OF AMBIENT INTELLIGENCE & HUMANIZED COMPUTING (PRINT), vol. 11 (issue 10), pp. 3989-3999
DOI: 10.1007/s12652-019-01632-4
Project(s): Track and Know
See at:
CNR IRIS
| link.springer.com
| ISTI Repository
| Journal of Ambient Intelligence and Humanized Computing
2020
Journal article
Open Access
Ranking places in attributed temporal urban mobility networks
Nanni M, Tortosa L, Vicent JF, Yeghikyan G
Drawing on the recent advances in complex network theory, urban mobility flow patterns, typically encoded as origin-destination (OD) matrices, can be represented as weighted directed graphs, with nodes denoting city locations and weighted edges the number of trips between them. Such a graph can further be augmented by node attributes denoting the various socio-economic characteristics at a particular location in the city. In this paper, we study the spatio-temporal characteristics of "hotspots" of different types of socio-economic activities as characterized by recently developed attribute-augmented network centrality measures within the urban OD network. The proposed workflow comprises the construction of temporal OD networks using two custom data sets on urban mobility in Rome and London, the addition of socio-economic activity attributes to the OD network nodes, the computation of network centrality measures, the identification of "hotspots" and, finally, the visualization and analysis of measures of their spatio-temporal heterogeneity. Our results show structural similarities and distinctions between the spatial patterns of different types of human activity in the two cities. Our approach produces simple indicators, thus opening up opportunities for practitioners to develop tools for real-time monitoring and visualization of interactions between mobility and economic activity in cities.
Source: PLOS ONE, vol. 15
DOI: 10.1371/journal.pone.0239319
Project(s): Track and Know
See at:
PLoS ONE
| Recolector de Ciencia Abierta, RECOLECTA
| CNR IRIS
| journals.plos.org
| ISTI Repository
2020
Conference article
Open Access
Crash prediction and risk assessment with individual mobility networks
Guidotti R, Nanni M
The massive and increasing availability of mobility data enables the study and the prediction of human mobility behavior and activities at various levels. In this paper, we address the problem of building a data-driven model for predicting car drivers' risk of experiencing a crash in the long-term future, for instance, in the next four weeks. Since the raw mobility data, although potentially large, typically lacks any explicit semantics or clear structure to help understand and predict such rare and difficult-to-grasp events, our work proposes to build concise representations of individual mobility that highlight mobility habits, driving behaviors and other factors deemed relevant for assessing the propensity to be involved in car accidents. The suggested approach is mainly based on a network representation of users' mobility, called Individual Mobility Networks, jointly with the analysis of descriptive features of the user's driving behavior related to driving style (e.g., accelerations) and characteristics of the mobility in the neighborhood visited by the user. The paper presents an extensive experimentation over a real dataset, showing comparative performance against baselines and competitors, and a study of some typical risk factors in the areas under analysis through the adoption of state-of-the-art model explanation techniques. Preliminary results show the effectiveness and usability of the proposed predictive approach.
DOI: 10.1109/mdm48529.2020.00030
Project(s): Track and Know
See at:
CNR IRIS
| ieeexplore.ieee.org
| ISTI Repository
| doi.org