2012
Conference article
Restricted
Mega-modeling for big data analytics
Ceri S, Della Valle E, Pedreschi D, Trasarti R
The availability of huge amounts of data ("big data") is changing our attitude towards science, which is moving from specialized to massive experiments and from very focused to very broad research questions. Models of all kinds, from analytic to numeric, from exact to stochastic, from simulative to predictive, from behavioral to ontological, from patterns to laws, enable massive data analysis and mining, often in real time. Scientific discovery in most cases stems from complex pipelines of data analysis and data mining methods on top of "big" experimental data, confronted and contrasted with state-of-the-art knowledge. In this setting, we propose mega-modelling as a new holistic data and model management system for the acquisition, composition, integration, management, querying and mining of data and models, capable of mastering the co-evolution of data and models and of supporting the creation of what-if analyses, predictive analytics and scenario explorations.
DOI: 10.1007/978-3-642-34002-4_1
See at:
doi.org
| CNR IRIS
| link.springer.com
2019
Journal article
Open Access
Computational modelling and data-driven techniques for systems analysis
Matwin S, Tesei L, Trasarti R
This JIIS Special Issue aimed to bring together contributions from academia, industry and research institutions interested in combining computational modelling methods with data-driven techniques from the areas of knowledge management, data mining and machine learning. Modelling methodologies of interest included automata, agents, Petri nets, process algebras and rewriting systems. Application domains included social systems, ecology, biology, medicine, smart cities, governance, education, software engineering, and any other field that deals with complex systems and large amounts of data.
Source: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, vol. 52 (issue 3), pp. 473-475
DOI: 10.1007/s10844-019-00554-z
Project(s): SoBigData
See at:
Journal of Intelligent Information Systems
| CNR IRIS
| ISTI Repository
2019
Journal article
Open Access
Finding roles of players in football using automatic particle swarm optimization-clustering algorithm
Behravan I, Zahiri Sh, Razavi Sm, Trasarti R
Recently, professional team sport organizations have been investing resources in analyzing their own and their opponents' performance. Consequently, developing methods and algorithms for analyzing team sports has become one of the most popular topics among data scientists. Football is hard to analyze because of its complexity, the number of events in each match, and the continuous circulation of the ball. Finding the roles of players is crucial for analyzing a team's performance or making a meaningful comparison between players. In this article, an automatic big data clustering method based on a swarm intelligence algorithm is proposed to automatically cluster the data set of players' performance centers in different matches and extract different kinds of roles in football. The proposed method, built on the particle swarm optimization algorithm, has two phases: in the first phase, the algorithm searches the solution space to find the number of clusters; in the second phase, it finds the positions of the centroids. To show the effectiveness of the algorithm, it is tested on six synthetic data sets and its performance is compared with two other conventional clustering methods. The algorithm is then used to cluster a data set of 93,000 objects, which are the centers of players' performance in about 4,900 matches in different European leagues.
Source: BIG DATA, vol. 7 (issue 1), pp. 35-56
DOI: 10.1089/big.2018.0069
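The second phase described above (placing centroids once the number of clusters is fixed) can be sketched with a plain global-best particle swarm that minimizes the within-cluster sum of squared errors (SSE). This is a minimal illustration under our own assumptions, not the authors' algorithm: the SSE fitness, the inertia and acceleration coefficients (`w`, `c1`, `c2`) and the function names are hypothetical choices, and the first phase (searching for the number of clusters) is not reproduced.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p[d] - c[d]) ** 2 for d in range(len(p))) for c in centroids)
        for p in points
    )


def pso_centroids(points, k, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical sketch: global-best PSO searching k centroid positions."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    size = k * dim  # each particle encodes all k centroids as one flat vector

    def decode(x):
        return [x[i * dim:(i + 1) * dim] for i in range(k)]

    # Random initial positions inside the data bounding box, zero velocities.
    pos = [[rng.uniform(lo[j % dim], hi[j % dim]) for j in range(size)]
           for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [list(x) for x in pos]
    pbest_f = [sse(points, decode(x)) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(size):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            f = sse(points, decode(pos[i]))
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(pos[i]), f
                if f < gbest_f:
                    gbest, gbest_f = list(pos[i]), f
    return decode(gbest), gbest_f
```

Because a candidate replaces the global best only when it lowers the SSE, the returned error never exceeds that of the best random initialization; encoding all k centroids in one particle is a common choice for PSO-based clustering.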
See at:
CNR IRIS
| ISTI Repository
| www.liebertpub.com
| Big Data
2023
Conference article
Open Access
Dataspaces: concepts, architectures and initiatives
Atzori M, Ciaramella A, Diamantini C, Di Martino B, Distefano S, Facchinetti T, Montecchiani F, Nocera A, Ruffo G, Trasarti R
Despite not being a new concept, dataspaces have become a prominent topic due to
the increasing availability of data and the need for efficient management and utilization
of diverse data sources. In simple terms, a dataspace refers to an environment where
data from various sources, formats, and domains can be integrated, shared, and
analyzed. It aims to provide a unified view of heterogeneous data by bridging the gap
between different data silos, enabling interoperability. The concept of dataspaces
promotes the idea that data should be treated as a cohesive entity, rather than being
fragmented across different systems and applications.
Dataspaces often involve the integration of structured and unstructured data, including
databases, documents, sensor data, social media feeds, and more. The goal is to
enable organizations to harness the full potential of their data assets by facilitating
data discovery, access, and analysis. By bringing together diverse data sources,
dataspaces can offer new insights, support decision-making processes, and drive
innovation.
In the context of European Commission-funded research projects, dataspaces are
often explored as part of initiatives focused on data management, data sharing, and
the development of data-driven technologies. These projects aim to address
challenges related to data integration, data privacy, data governance, and scalability.
The goal is to advance the state of the art in data management and enable
organizations to leverage data more effectively for societal, economic, and scientific
advancements.
It is important to note that while dataspaces offer potential benefits, they also come
with challenges. These challenges include data quality assurance, data privacy and
security, semantic interoperability, scalability, and the need for appropriate data
governance frameworks.
Overall, dataspaces represent an approach to managing and utilizing data that
emphasizes integration, interoperability, and accessibility. The concept is being
explored and researched to develop innovative solutions that can unlock the value of
data in various domains and sectors.
Source: CEUR WORKSHOP PROCEEDINGS. Naples, Italy, 11-13/09/2023
Project(s): SoBigData 
See at:
ceur-ws.org
| CNR IRIS
| ISTI Repository
2009
Conference article
Open Access
Mobility, data mining and privacy: the GeoPKDD paradigm
Trasarti R, Giannotti F
The technologies of mobile communications and ubiquitous computing pervade our society, and wireless networks sense the movement of people and vehicles, generating large volumes of mobility data. Miniaturization, wearability and pervasiveness are producing traces of our mobile activity with increasing positioning accuracy and semantic richness: location data from mobile phones (GSM cell positions), GPS tracks from mobile devices receiving geo-positions from satellites, etc. The objective of the GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery) project is to discover useful knowledge about human movement behaviour from mobility data while preserving the privacy of the people under observation. In pursuing this ambitious objective, the GeoPKDD project has opened an exciting new multidisciplinary research area at the crossroads of mobility, data mining and privacy. This paper gives a short overview of the envisaged research challenges and of the project's achievements.
See at:
CNR IRIS
| www.siam.org
2009
Conference article
Restricted
A new technique for sequential pattern mining under regular expressions
Trasarti R, Bonchi F, Goethals B
In this paper we study the problem of mining frequent sequences that satisfy a given regular expression. Previous approaches to this problem focused on its search space, pushing the given regular expression (in some way) into the search to prune unpromising candidate patterns. In contrast, we focus entirely on the given input data and regular expression. We introduce Sequence Mining Automata (SMA), a specialized kind of Petri net that, while reading the input sequences, produces for each sequence all and only the patterns that are contained in it and satisfy the given regular expression. Based on this automaton, we develop a family of algorithms. Our thorough experimentation on different datasets and application domains confirms that in many cases our methods outperform the current state-of-the-art algorithms for frequent sequence mining under regular expressions (in some cases by orders of magnitude).
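To make the problem statement concrete, the brute-force Python sketch below computes the output the abstract describes (for each input sequence, all and only the contained patterns that satisfy the regular expression, then their support across sequences), but it does so by enumerating every subsequence, which is exponential in the sequence length. It is not the SMA technique, which is designed to avoid this enumeration; representing items as single characters and the name `regex_frequent_sequences` are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def regex_frequent_sequences(database, pattern, min_support):
    """Naive baseline (hypothetical helper): patterns that fully match `pattern`
    and occur as subsequences of at least `min_support` input sequences.
    Items are assumed to be single characters."""
    regex = re.compile(pattern)
    support = Counter()
    for seq in database:
        found = set()
        # Enumerate all non-empty subsequences (order preserved, gaps allowed).
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidate = "".join(seq[i] for i in idx)
                if regex.fullmatch(candidate):
                    found.add(candidate)
        support.update(found)  # each pattern counts at most once per sequence
    return {p: s for p, s in support.items() if s >= min_support}
```

For example, with `["abcb", "acb", "bca"]`, the expression `a(b|c)*b` and a minimum support of 2, only `ab` and `acb` are returned, each supported by the first two sequences.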
See at:
CNR IRIS
2006
Software
Metadata Only Access
ConQueSt
Bonchi F, Lucchese C, Trasarti R
ConQueSt (a Constraint-based Querying System for Exploratory Pattern Discovery) is a complex mining software system developed to support every phase of the pattern discovery process. ConQueSt follows the Inductive Database vision, in which mining is seen as a more complex form of querying. Patterns extracted through mining queries are materialized in relational form alongside the data from which they were extracted, so that they can in turn be queried. ConQueSt is built around the ExAMinerGEN mining engine. Through JDBC it can connect to any commercial DBMS, so that the knowledge discovery process runs directly inside the database without moving the data, while exploiting all the data management features offered by commercial DBMSs. It is equipped with an expressive query language called SPQL (Simple Pattern Query Language, a superset of SQL for pattern discovery), which makes it possible to define the data source for pattern extraction, to perform various data pre-processing operations, and to define the constraints that patterns must satisfy to be considered interesting. It also provides a user interface that allows complex SPQL queries to be defined through a simple graphical paradigm, and lets users browse the data and the extracted patterns, displaying statistics.
See at:
CNR IRIS
2010
Conference article
Restricted
Querying and mining trajectories with gaps: a multi-path reconstruction approach (Extended Abstract)
Nanni M, Trasarti R
In this paper we propose a map matching method that overcomes the limitations of standard best-match reconstruction strategies. We use a more flexible approach that considers the k optimal alternative paths to reconstruct trajectories from raw GPS data. Preliminary results, obtained on a real dataset of car users in the Milan area, suggest that our method has beneficial effects on subsequent analyses such as KNN and clustering.
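The idea of keeping k alternative paths instead of a single best match can be illustrated with a brute-force search for the k cheapest simple paths between two road nodes (for instance, the nodes two consecutive GPS fixes were snapped to). This is a hypothetical sketch, not the paper's reconstruction procedure: the adjacency-dict graph format and the function name are assumptions, and exhaustive DFS only suits small candidate subgraphs (Yen's algorithm is the usual efficient alternative for k shortest loopless paths).

```python
import heapq


def k_shortest_paths(graph, source, target, k):
    """Return up to k cheapest simple paths as (cost, [nodes]) pairs.

    graph: dict mapping node -> dict of neighbour -> edge cost (assumed format).
    Brute force: enumerate every simple path by DFS, then keep the k cheapest.
    """
    paths = []

    def dfs(node, path, cost):
        if node == target:
            paths.append((cost, list(path)))
            return
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no repeated nodes)
                path.append(nxt)
                dfs(nxt, path, cost + weight)
                path.pop()

    dfs(source, [source], 0)
    # Sort by cost only; ties keep DFS discovery order.
    return heapq.nsmallest(k, paths, key=lambda p: p[0])
```

On a toy graph where A reaches D via C (cost 3), via B and C (cost 4), via B (cost 5) or directly (cost 6), asking for k = 3 returns the first three alternatives, which a downstream analysis could then weigh instead of trusting only the cheapest.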
See at:
CNR IRIS
2011
Journal article
Restricted
C-safety: a framework for the anonymization of semantic trajectories
Monreale Anna, Trasarti Roberto, Pedreschi Dino, Renso Chiara, Bogorny Vania
The increasing abundance of data about the trajectories of personal movement is opening new opportunities for analyzing and mining human mobility. However, new risks emerge, since these data open new ways of intruding into personal privacy. Representing personal movements as sequences of places visited by a person (semantic trajectories) poses serious privacy threats. In this paper we propose a privacy model defining the attack model of semantic trajectory linking, and a privacy notion, called c-safety, based on generalizing visited places according to a taxonomy. This method provides an upper bound on the probability of inferring that a given person, observed in a sequence of non-sensitive places, has also visited any sensitive location. Coherently with the privacy model, we propose an algorithm for transforming any dataset of semantic trajectories into a c-safe one. We report a study on two real-life GPS trajectory datasets showing how our algorithm preserves interesting quality/utility measures of the original trajectories, in terms of sequential pattern mining results on semantic trajectories. We also empirically show that the probability that the attacker's inference succeeds is much lower than the established theoretical upper bound.
Source: TRANSACTIONS ON DATA PRIVACY (INTERNET), vol. 4 (issue 2), pp. 73-101
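The privacy notion can be illustrated by a checker that measures the attacker's best inference on a dataset. This sketch rests on our own simplified reading of the abstract, not on the paper's formal definition: we assume the attacker knows some subsequence of the victim's non-sensitive places, and that the inference probability for a sensitive place is the fraction of trajectories containing that subsequence which also contain the place. The taxonomy-based generalization that turns a dataset into a c-safe one is not shown, and all names are hypothetical.

```python
from itertools import combinations


def is_subsequence(small, big):
    """True if `small` occurs in `big` in order, gaps allowed."""
    it = iter(big)
    return all(x in it for x in small)  # `in` consumes the iterator


def max_inference_probability(trajectories, sensitive):
    """Worst-case P(sensitive place visited | observed non-sensitive subsequence),
    under the simplified attack model assumed in the lead-in."""
    observed = set()
    for traj in trajectories:
        ns = [p for p in traj if p not in sensitive]
        for length in range(1, len(ns) + 1):
            for idx in combinations(range(len(ns)), length):
                observed.add(tuple(ns[i] for i in idx))
    worst = 0.0
    for q in observed:
        # Never empty: q was extracted from at least one trajectory.
        support = [t for t in trajectories if is_subsequence(q, t)]
        for s in sensitive:
            hits = sum(1 for t in support if s in t)
            worst = max(worst, hits / len(support))
    return worst


def is_c_safe(trajectories, sensitive, c):
    """Hypothetical check: no inference succeeds with probability above c."""
    return max_inference_probability(trajectories, sensitive) <= c
```

With the trajectories `home, gym, hospital`, `home, gym` and `home, bar`, an attacker who saw the victim at the gym infers a hospital visit with probability 0.5, so the toy dataset is 0.5-safe but not 0.4-safe.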
See at:
CNR IRIS
| www.tdp.cat
2013
Other
Restricted
Mob-Warehouse: a semantic approach for mobility analysis with a trajectory data warehouse
Wagner R, De Macedo J A F, Raffaetà A, Renso C, Roncato A, Trasarti R
The effective analysis and understanding of huge amounts of mobility data has been a hot research topic in the last few years. Some proposals addressed the definition of Trajectory Data Warehouses (TDW) as a way to represent and aggregate mobility data, where the basic object is the trajectory. In this paper, we introduce Mob-Warehouse, a TDW that goes a step further by modelling trajectories enriched with semantics. In Mob-Warehouse, the unit of movement is the (spatio-temporal) point enriched with several non-spatio-temporal dimensions, including the activity, the means of transportation and the mobility pattern. This model allows us to answer the classical Why, Who, When, Where, What and How questions, providing an aggregated view of different aspects of user movements that is no longer limited to space and time. We briefly present an experiment with Mob-Warehouse on a real dataset.
Project(s): SEEK
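A minimal Python sketch of the fact-table idea: each record is a spatio-temporal point carrying extra semantic dimensions, and a question such as "how do people travel to work?" becomes a group-by count over the chosen dimensions. The field names, the mapping of fields to the Who/When/Where/Why/How questions and the `roll_up` helper are illustrative assumptions, not Mob-Warehouse's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticPoint:
    """Hypothetical fact record: a point plus semantic dimensions."""
    user: str       # Who (assumed mapping)
    hour: int       # When: hour of day
    cell: str       # Where: spatial cell identifier
    activity: str   # Why: activity associated with the point
    transport: str  # How: means of transportation
    pattern: str    # mobility pattern label, e.g. "commuting"


def roll_up(points, *dims):
    """Count points grouped by the named dimensions (a simple cube roll-up)."""
    return Counter(tuple(getattr(p, d) for d in dims) for p in points)
```

Calling `roll_up(points, "activity", "transport")` aggregates away user, time and space, which is the kind of view "no longer limited to space and time" that the abstract refers to.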
See at:
CNR IRIS
2013
Conference article
Restricted
Estimating time-dependent speed functions using a gravity model over road network
Cintia P, Trasarti R, Macedo J A, Almada L, Ferreira C
The availability of inexpensive tracking devices, such as GPS-enabled devices, makes it possible to collect large amounts of trajectory data from vehicles. In this context, we are interested in the problem of generating traffic information in time-dependent networks from this kind of data. The problem is not trivial, since several works in the literature rely on strong assumptions about the error distribution; we drop these assumptions by proposing a gravitational model method to compute the average speed of road segments from trajectory data. Furthermore, we show how to generate, from the computed average speeds, travel-time functions useful for routing systems on time-dependent networks. Our approach allows creating an accurate picture of traffic conditions in time and space. The method presented in this paper tackles all these aspects, and we show its performance on a synthetic dataset and on a real case.
Project(s): SEEK
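The step from average speeds to travel-time functions can be sketched as follows, assuming (our assumption, not stated in the abstract) that the speed estimation yields one average speed per road segment for each fixed-length time slot of a repeating profile. Travel time is computed by advancing through the slots the vehicle actually crosses, so a later departure never yields an earlier arrival. The gravity model itself is not reproduced, and the function name and slot layout are hypothetical.

```python
def travel_time(length_m, depart_s, slot_len_s, speeds_mps):
    """Seconds needed to traverse a segment of `length_m` metres when departing
    at `depart_s`, given one average speed (m/s) per slot of `slot_len_s`
    seconds. The speed profile repeats cyclically (e.g. a daily profile)."""
    if any(v <= 0 for v in speeds_mps):
        raise ValueError("speeds must be positive")
    remaining = length_m
    t = depart_s
    n = len(speeds_mps)
    while True:
        slot = int(t // slot_len_s)
        speed = speeds_mps[slot % n]
        slot_end = (slot + 1) * slot_len_s
        reach = speed * (slot_end - t)  # distance coverable before the slot ends
        if reach >= remaining:
            t += remaining / speed
            return t - depart_s
        # Cross into the next slot with the distance still to cover.
        remaining -= reach
        t = slot_end
```

For a 1,000 m segment with 600 s slots at 10 m/s and then 5 m/s, departing at 0 s takes 100 s, while departing at 550 s takes 150 s because the second half of the segment falls in the slower slot. Evaluating the function over a grid of departure times gives a tabulated travel-time function of the kind a time-dependent router consumes.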
See at:
CNR IRIS