
2005 Conference article Restricted

Speeding-up hierarchical agglomerative clustering in presence of expensive metrics
Nanni M
In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity that limits the size of the datasets that can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In this paper we propose a pruning heuristic aimed at improving performance in these cases, which is well integrated in all the phases of the HAC process and can be applied to two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.

See at: CNR IRIS Restricted | CNR IRIS

2007 Contribution to book Open Access

Extracting trees of quantitative serial episodes
Nanni M, Rigotti C
Among the family of the local patterns, episodes are commonly used when mining a single or multiple sequences of discrete events. An episode reflects a qualitative relation is-followed-by over event types, and the refinement of episodes to incorporate quantitative temporal information is still ongoing research, with many application opportunities. In this paper, focusing on serial episodes, we design such a refinement called quantitative episodes and give a corresponding extraction algorithm. The three most salient features of these quantitative episodes are: (1) their ability to characterize main groups of homogeneous behaviors among the occurrences, according to the duration of the is-followed-by steps, and providing quantitative bounds of these durations organized in a tree structure; (2) the possibility to extract them in a complete way; and (3) to perform such extractions at the cost of a limited overhead with respect to the extraction of standard episodes.
DOI: 10.1007/978-3-540-75549-4_11

2010 Journal article Open Access

Anonymization of moving objects databases by clustering and perturbation
Abul O, Bonchi F, Nanni M
Preserving individual privacy when publishing data is a problem that is receiving increasing attention. Thanks to its simplicity, the concept of k-anonymity, introduced by Samarati and Sweeney [1], established itself as one fundamental principle for privacy preserving data publishing. According to the k-anonymity principle, each release of data must be such that each individual is indistinguishable from at least k − 1 other individuals. In this article we tackle the problem of anonymization of moving objects databases. We propose a novel concept of k-anonymity based on co-localization, that exploits the inherent uncertainty of the moving object's whereabouts. Due to sampling and imprecision of the positioning systems (e.g., GPS), the trajectory of a moving object is no longer a polyline in a three-dimensional space, instead it is a cylindrical volume, where its radius delta represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder they are indistinguishable from each other. This leads to the definition of (k, delta)-anonymity for moving objects databases. We first characterize the (k, delta)-anonymity problem, then we recall NWA (Never Walk Alone), a method that we introduced in [2] based on clustering and spatial perturbation. Starting from a discussion on the limits of NWA we develop a novel clustering method that, being based on EDR distance [3], has the important feature of being time-tolerant. As a consequence it perturbs trajectories both in space and time. The novel method, named W4M (Wait for Me), is empirically shown to produce higher quality anonymization than NWA, at the price of higher computational requirements. Therefore, in order to make W4M scalable to large datasets, we introduce two variants based on a novel (and computationally cheaper) time-tolerant distance function, and on chunking.
All the variants of W4M are empirically evaluated in terms of data quality and efficiency, and thoroughly compared to their predecessor NWA. Data quality is assessed both by means of objective measures of information distortion, and by more usability-oriented measures, i.e., by comparing the results of (i) spatio-temporal range queries and (ii) frequent pattern mining, executed on the original database and on the (k, delta)-anonymized one. Experimental results over both real-world and synthetic mobility data confirm that, for a wide range of values of delta and k, the relative distortion introduced by our anonymization methods is kept low. Moreover, the techniques introduced to make W4M scalable to large datasets achieve their goal without giving up data quality in the anonymization process.
Source: INFORMATION SYSTEMS, vol. 35 (issue 8), pp. 884-910
DOI: 10.1016/j.is.2010.05.003
Project(s): Veri Yayınlamada Hassas Bilgi Gizleme via OpenAIRE

See at: Aperta - TÜBİTAK Açık Arşivi Open Access | Information Systems Restricted | CNR IRIS | CNR IRIS | www.sciencedirect.com

2010 Journal article Restricted

Dealing with interaction for complex systems modelling and prediction
Quattrociocchi W, Latorre D, Lodi E, Nanni M
The increasing complexity of problems in the context of system modelling is leading to a new epistemological approach able to provide a representation that allows, on the one hand, modelling complex phenomena with the support of mathematical and computational instruments and, on the other hand, capturing the global system description. This paper presents a methodology for complex dynamical systems modelling which is an extension of the supervised learning paradigm. The theoretical aspects of our methodology are introduced and then two different and heterogeneous case studies are presented.
DOI: 10.4018/jalr.2010102101

See at: International Journal of Artificial Life Research Restricted | CNR IRIS | CNR IRIS | www.igi-global.com

2004 Conference article Restricted

Mining literary texts using domain ontologies
Baglioni M, Nanni M, Giovannetti E
This paper describes a query system on texts and literary material with advanced information retrieval tools. As a test bed we chose the electronic version of Dante's Inferno, manually tagged using XML, enriched with a domain ontology describing the historical, social and cultural context represented as a separate XML document.

See at: CNR IRIS Restricted | CNR IRIS

2006 Other Open Access

Quantitative episode trees
Nanni M, Rigotti C
Among the family of the local patterns, episodes are commonly used when mining a single or multiple sequences of discrete events. An episode reflects a qualitative relation is-followed-by over event types, and the refinement of episodes to incorporate quantitative temporal information is still ongoing research, with many application opportunities. In this paper, focusing on serial episodes, we design such a refinement called quantitative episodes and give a corresponding extraction algorithm. The three most salient features of these quantitative episodes are: (1) their ability to characterize main groups of homogeneous behaviors among the occurrences, according to the duration of the is-followed-by steps, and providing quantitative bounds of these durations organized in a tree structure; (2) the possibility to extract them in a complete way; and (3) to perform such extractions at the cost of a limited overhead with respect to the extraction of standard episodes.

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2008 Other Open Access

A constraint-based approach for multispace clustering
Pensa R G, Nanni M
In many applications, a set of objects can be represented by different points of view (universes). Besides numeric, ordinal and nominal features, objects may be represented using spatio-temporal information, sequences, and more complex structures (e.g., graphs). Learning from all these different spaces is challenging, since often different algorithms and metrics are needed. In the case of data clustering, a partitional, hierarchical or density-based algorithm is often well suited for a specific type of data, but not for other ones. In this work we present a preliminary study on a framework that tries to link different clustering results by exploiting pairwise similarity constraints. We propose two algorithmic settings, and we present an application to a real-world dataset of trajectories.

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2010 Contribution to book Restricted

Forecast analysis for sales in large-scale retail trade
Nanni M, Spinsanti L
In large-scale retail trade, a very significant problem consists in analyzing the response of clients to product promotions. The aim of the project described in this work is the extraction of forecasting models able to estimate the volume of sales involving a product under promotion, together with a prediction of the risk of out-of-stock events, in which case the sales forecast should be considered potentially underestimated. Our approach consists in developing a multi-class classifier with ordinal classes (lower classes represent smaller numbers of items sold) as opposed to more traditional approaches that translate the problem to a binary-class classification. In order to do that, a proper discretization of sales values is studied, and ad hoc quality measures are provided in order to evaluate the accuracy of forecast models taking into consideration the order of classes. Finally, an overall system for end users is sketched, where the forecasting functionalities are organized in an integrated dashboard.
DOI: 10.4018/978-1-60566-906-9.ch012

See at: doi.org Restricted | CNR IRIS | CNR IRIS | www.igi-global.com

2006 Software Metadata Only Access

MiSTA v2.1
Mirco Nanni
Algorithm for extracting sequential patterns with temporal annotations (typical transition times), based on the tight integration of prefix-projection-based methods for sequential patterns and kernel-based density estimation methods.

See at: CNR IRIS Restricted

2006 Software Metadata Only Access

TF-OPTICS: Time-focused density based clustering of trajectories
Margherita D'Auria, Mirco Nanni
Density-based clustering algorithm for trajectories, with automatic search for the optimal interval on which to focus the analysis.

See at: CNR IRIS Restricted

2005 Other Open Access

Hierarchical agglomerative clustering in presence of expensive metrics (Extended Tech. Rep.)
Nanni M
In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity that limits the size of the datasets that can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In all such situations the standard approach to HAC, which first computes all object-to-object distances and then performs the real clustering process, quickly yields high computational costs and large running times. One of the key means for containing this problem naturally lies in methods that can save a significant portion of distance computations, resulting in a smaller complexity. In this paper we propose a pruning heuristic well integrated in all the phases of the HAC process, developed for two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2006 Journal article Open Access

Time-focused clustering of trajectories of moving objects
Nanni M, Pedreschi D
Spatio-temporal, geo-referenced datasets are growing rapidly, and will grow even more in the near future, due to both technological and social/commercial reasons. From the data mining viewpoint, spatio-temporal trajectory data introduce new dimensions and, correspondingly, novel issues in performing the analysis tasks. In this paper, we consider the clustering problem applied to the trajectory data domain. In particular, we propose an adaptation of a density-based clustering algorithm to trajectory data based on a simple notion of distance between trajectories. Then, a set of experiments on synthesized data is performed in order to test the algorithm and to compare it with other standard clustering approaches. Finally, a new approach to the trajectory clustering problem, called temporal focussing, is sketched, having the aim of exploiting the intrinsic semantics of the temporal dimension to improve the quality of trajectory clustering.
Source: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, vol. 27 (issue 3), pp. 267-289
DOI: 10.1007/s10844-006-9953-7

2013 Contribution to book Restricted

Mobility data mining
Nanni M

See at: CNR IRIS Restricted | CNR IRIS

2019 Journal article Open Access

Car telematics big data analytics for insurance and innovative mobility services
Longhi L, Nanni M
Car telematics is a large and growing business sector aiming to collect mobility-related data (mainly from private and commercial vehicles) and to develop services of various nature both for individual citizens and other companies. Such services and applications include information systems to support car insurances, info-mobility services, ad hoc studies for planning purposes, etc. In this work we report and discuss some of the key challenges that a car telematics pilot application is facing within the EU project "Track and Know". The paper introduces the overall context, the main business goals identified as potentially benefiting from big data solutions and the type of data sources that such applications can rely on (in particular, those available within the project for experimental studies), then discusses initial results of the solutions developed so far and ongoing lines of research. In particular, the discussion will focus on the most relevant applications identified for the project purposes, namely new services for car insurance, electric vehicles mobility and car- and ride-sharing.
Source: JOURNAL OF AMBIENT INTELLIGENCE & HUMANIZED COMPUTING (PRINT), vol. 11 (issue 10), pp. 3989-3999
DOI: 10.1007/s12652-019-01632-4
Project(s): Track and Know via OpenAIRE

2016 Journal article Open Access

Guest editors' introduction to the EcmlPkdd 2016 journal track special issue of Machine Learning
Gartner T, Nanni M, Passerini A, Robardet C
Source: DATA MINING AND KNOWLEDGE DISCOVERY, vol. 30 (issue 5), pp. 995-997
DOI: 10.1007/s10618-016-0476-8
DOI: 10.1007/s10994-016-5587-3

2016 Contribution to book Open Access

Partition-based clustering using constraint optimization
Grossi V, Guns T, Monreale A, Nanni M
Partition-based clustering is the task of partitioning a dataset into a number of groups of examples, such that examples in each group are similar to each other. Many criteria for what constitutes a good clustering have been identified in the literature; furthermore, the use of additional constraints to find more useful clusterings has been proposed. In this chapter, it will be shown that most of these clustering tasks can be formalized using optimization criteria and constraints. We demonstrate how a range of clustering tasks can be modelled in generic constraint programming languages with these constraints and optimization criteria. Using the constraint-based modeling approach we also relate the DBSCAN method for density-based clustering to the label propagation technique for community discovery.
DOI: 10.1007/978-3-319-50137-6_11

2016 Journal article Open Access

Guest editors' introduction to the EcmlPkdd 2016 journal track special issue of Machine Learning
Gartner T, Nanni M, Passerini A, Robardet C
Source: MACHINE LEARNING, vol. 104 (issue 2-3), pp. 149-150
DOI: 10.1007/s10994-016-5587-3
DOI: 10.1007/s10618-016-0476-8

2018 Conference article Restricted

Advancements in mobility data analysis
Nanni M
Some recent advancements in the area of Mobility Data Analysis are discussed, a field in which data mining and machine learning methods are applied to infer descriptive patterns and predictive models from digital traces of (human) movement.
Source: ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING, pp. 11-16. Rome, Italy, 25-26/10/2017
DOI: 10.1007/978-3-319-75608-0_2

See at: doi.org Restricted | CNR IRIS | CNR IRIS | link.springer.com

2020 Journal article Open Access

Ranking places in attributed temporal urban mobility networks
Nanni M, Tortosa L, Vicent Jf, Yeghikyan G
Drawing on the recent advances in complex network theory, urban mobility flow patterns, typically encoded as origin-destination (OD) matrices, can be represented as weighted directed graphs, with nodes denoting city locations and weighted edges the number of trips between them. Such a graph can further be augmented by node attributes denoting the various socio-economic characteristics at a particular location in the city. In this paper, we study the spatio-temporal characteristics of "hotspots" of different types of socio-economic activities as characterized by recently developed attribute-augmented network centrality measures within the urban OD network. The workflow of the proposed paper comprises the construction of temporal OD networks using two custom data sets on urban mobility in Rome and London, the addition of socio-economic activity attributes to the OD network nodes, the computation of network centrality measures, the identification of "hotspots" and, finally, the visualization and analysis of measures of their spatio-temporal heterogeneity. Our results show structural similarities and distinctions between the spatial patterns of different types of human activity in the two cities. Our approach produces simple indicators thus opening up opportunities for practitioners to develop tools for real-time monitoring and visualization of interactions between mobility and economic activity in cities.
Source: PLOS ONE, vol. 15
DOI: 10.1371/journal.pone.0239319
Project(s): Track and Know via OpenAIRE

2020 Conference article Open Access

Crash prediction and risk assessment with individual mobility networks
Guidotti R, Nanni M
The massive and increasing availability of mobility data enables the study and the prediction of human mobility behavior and activities at various levels. In this paper, we address the problem of building a data-driven model for predicting car drivers' risk of experiencing a crash in the long-term future, for instance, in the next four weeks. Since the raw mobility data, although potentially large, typically lacks any explicit semantics or clear structure to help understanding and predicting such rare and difficult-to-grasp events, our work proposes to build concise representations of individual mobility, that highlight mobility habits, driving behaviors and other factors deemed relevant for assessing the propensity to be involved in car accidents. The suggested approach is mainly based on a network representation of users' mobility, called Individual Mobility Networks, jointly with the analysis of descriptive features of the user's driving behavior related to driving style (e.g., accelerations) and characteristics of the mobility in the neighborhood visited by the user. The paper presents a large experimentation over a real dataset, showing comparative performances against baselines and competitors, and a study of some typical risk factors in the areas under analysis through the adoption of state-of-art model explanation techniques. Preliminary results show the effectiveness and usability of the proposed predictive approach.
DOI: 10.1109/mdm48529.2020.00030
Project(s): Track and Know via OpenAIRE
