
2017 Journal article Restricted

Manycore GPU processing of repeated range queries over streams of moving objects observations
Lettich F., Orlando S., Silvestri C., Jensen C. S.
The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper, we focus on a specific data-intensive problem concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extent of queries and objects is continuously modified over time. To tackle this problem and significantly accelerate query processing, we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad-hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and effectively handles a broad range of spatial object distributions, even highly skewed ones. Also, to deal with the architectural peculiarities and limitations of GPUs, we adopt non-trivial GPU data structures that avoid the need for locked memory accesses while favouring coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge, this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, possibly characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method achieves significant speedups in the order of 10-20×, depending on the dataset.
Source: CONCURRENCY AND COMPUTATION, vol. 29 (issue 4)
DOI: 10.1002/cpe.3881

2024 Conference article Open Access

From text to locations: repurposing language models for spatial trajectory similarity assessment
De Melo Wilken C. D., Cruz L. A., Lettich F., Coelho Da Silva T. L., Magalhães R. P.
The proliferation of electronic devices with geopositioning capabilities has significantly increased trajectory data generation, thus opening up novel opportunities in mobility analysis. Our work considers the problem of assessing spatial similarity between trajectories, and focuses on deep learning-based approaches that discretize trajectories using a uniform grid to generate their embeddings. In this context, t2vec is the reference approach. Large Language Models (LLMs) show promise in capturing patterns in mobility data. In this paper, we investigate whether an LLM can be repurposed to generate high-quality trajectory embeddings for the considered task. Using two real-world trajectory datasets, we consider repurposing three language models: Word2Vec, Doc2Vec, and BERT. Our results show that BERT, trained on dense trajectory datasets, can generate high-quality embeddings, thus highlighting the potential of LLMs.
DOI: 10.5753/sbbd.2024.240212
Project(s): Spoke 1 "Human-centered AI" of the M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research"

See at: doi.org Open Access | IRIS Cnr | IRIS Cnr | doi.org Restricted | CNR IRIS

2024 Contribution to book Open Access

Message from the 1st GenAI4MoDa 2024 Workshop Chairs
Pinelli F., Lettich F.
DOI: 10.1109/mdm61037.2024.00010

See at: IRIS Cnr Open Access | IRIS Cnr | IRIS Cnr | doi.org Restricted | CNR IRIS

2020 Journal article Open Access

Leveraging feature selection to detect potential tax fraudsters
Matos T, Macedo Ja, Lettich F, Monteiro Jm, Renso C, Perego R, Nardini Fm
Tax evasion is any act that knowingly or unknowingly, legally or unlawfully, leads to non-payment or underpayment of tax due. Enforcing the correct payment of taxes by taxpayers is fundamental in maintaining investments that are necessary and benefit society as a whole. Indeed, without taxes it is not possible to guarantee basic services such as health-care, education, sanitation, transportation, and infrastructure, among other services essential to the population. This issue is especially relevant in developing countries such as Brazil. In this work we consider a real-world case study involving the Treasury Office of the State of Ceará (SEFAZ-CE, Brazil), the agency in charge of supervising more than 300,000 active taxpayer companies. SEFAZ-CE maintains a very large database containing vast amounts of information concerning such companies. Its enforcement team struggles to perform thorough inspections of taxpayers' accounts, as the underlying traditional human-based inspection processes involve the evaluation of countless fraud indicators (i.e., binary features), thus requiring burdensome amounts of time and being potentially prone to human errors. On the other hand, the vast amount of taxpayer information collected by fiscal agencies opens up the possibility of devising novel techniques able to tackle fiscal evasion much more effectively than traditional approaches. In this work we address the problem of using feature selection to select the most relevant binary features to improve the classification of potential tax fraudsters. Finding possible fraudsters from taxpayer data with binary features presents several challenges. First, taxpayer data typically have features with low linear correlation between themselves. Also, tax frauds may originate from intricate illicit tactics, which in turn requires uncovering non-linear relationships between multiple features. Finally, few features may be correlated with the targeted class.
In this work we propose Alicia, a new feature selection method based on association rules and propositional logic with a carefully crafted graph centrality measure that attempts to tackle the above challenges while, at the same time, being agnostic to specific classification techniques. Alicia is structured in three phases: first, it generates a set of relevant association rules from a set of fraud indicators (features). Subsequently, from such association rules Alicia builds a graph, whose structure is then used to determine the most relevant features. To achieve this, Alicia applies a novel centrality measure we call the Feature Topological Importance. We perform an extensive experimental evaluation to assess the validity of our proposal on four different real-world datasets, where we compare our solution with eight other feature selection methods. The results show that Alicia achieves F-measure scores up to 76.88%, and consistently outperforms its competitors.
Source: EXPERT SYSTEMS WITH APPLICATIONS, vol. 145, article no. 113128
DOI: 10.1016/j.eswa.2019.113128

See at: CNR IRIS Open Access | www.sciencedirect.com | Expert Systems with Applications Restricted | CNR IRIS | CNR IRIS

2022 Conference article Open Access

MAT-Builder: a system to build semantically enriched trajectories
Pugliese C, Lettich F, Renso C, Pinelli F
The notion of multiple aspect trajectory (MAT) has been recently introduced in the literature to represent movement data that is heavily semantically enriched with dimensions (aspects) representing various types of semantic information (e.g., stops, moves, weather, traffic, events, and points of interest). Aspects may be large in number, heterogeneous, or structurally complex. Although there is a growing volume of literature addressing the modelling and analysis of multiple aspect trajectories, the community suffers from a general lack of publicly available datasets. This is due to privacy concerns that make it difficult to publish this type of data, and to the lack of tools that are capable of linking raw spatio-temporal data to different types of semantic contextual data. In this work we aim to address this last issue by presenting MAT-Builder, a system that not only supports users during the whole semantic enrichment process, but also allows the use of a variety of external data sources. Furthermore, MAT-Builder has been designed with modularity and extensibility in mind, thus enabling practitioners to easily add new functionalities. The running example provided towards the end of the paper highlights how MAT-Builder's main features allow users to easily generate multiple aspect trajectories, hence benefiting the mobility data analysis community.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 175-182. Tirrenia, Pisa, Italy, 19-22/06/2022
Project(s): MobiDataLab via OpenAIRE, MASTER

See at: ceur-ws.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2022 Conference article Open Access

MAT-Builder: a system to build semantically enriched trajectories
Pugliese C, Lettich F, Renso C, Pinelli F
The notion of multiple aspect trajectory (MAT) has been recently introduced in the literature to represent movement data that is heavily semantically enriched with dimensions (aspects) representing various types of semantic information (e.g., stops, moves, weather, traffic, events, and points of interest). Aspects may be large in number, heterogeneous, or structurally complex. Although there is a growing volume of literature addressing the modelling and analysis of multiple aspect trajectories, the community suffers from a general lack of publicly available datasets. This is due to privacy concerns that make it difficult to publish this type of data, and to the lack of tools that are capable of linking raw spatio-temporal data to different types of semantic contextual data. In this work we aim to address this last issue by presenting MAT-BUILDER, a system that not only supports users during the whole semantic enrichment process, but also allows the use of a variety of external data sources. Furthermore, MAT-BUILDER has been designed with modularity and extensibility in mind, thus enabling practitioners to easily add new functionalities to the system and set up their own semantic enrichment process. The demonstration scenario, which will be showcased during the demo session, highlights how MAT-BUILDER's main features allow users to easily generate multiple aspect trajectories, hence benefiting the mobility data analysis community.
DOI: 10.1109/mdm55031.2022.00058
Project(s): MobiDataLab via OpenAIRE, MASTER

See at: CNR IRIS Open Access | ieeexplore.ieee.org | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2023 Conference article Open Access

A general methodology for building multiple aspect trajectories
Lettich F, Pugliese C, Renso C, Pinelli F
The massive use of personal location devices, the Internet of Mobile Things, and Location Based Social Networks enables the collection of vast amounts of movement data. Such data can be enriched with several semantic dimensions (or aspects), i.e., contextual and heterogeneous information captured in the surrounding environment, leading to the creation of multiple aspect trajectories (MATs). In this work, we present how the MAT-Builder system can be used for the semantic enrichment processing of movement data while being agnostic to aspects and external semantic data sources. This is achieved by integrating MAT-Builder into a methodology which encompasses three design principles and a uniform representation formalism for enriched data based on the Resource Description Framework (RDF) format. An example scenario involving the generation and querying of a dataset of MATs gives a glimpse of the possibilities that our methodology can open up.
DOI: 10.1145/3555776.3577832
Project(s): MobiDataLab via OpenAIRE, MASTER, SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2023 Journal article Open Access

Semantic enrichment of mobility data: a comprehensive methodology and the MAT-BUILDER system
Lettich F, Pugliese C, Renso C, Pinelli F
The widespread adoption of personal location devices, the Internet of Mobile Things, and Location Based Social Networks enables the collection of vast amounts of movement data. This data often needs to be enriched with a variety of semantic dimensions, or aspects, that provide contextual and heterogeneous information about the surrounding environment, resulting in the creation of multiple aspect trajectories (MATs). Common examples of aspects are points of interest, user photos, transportation means, weather conditions, social media posts, and many more. However, the literature does not currently provide a consensus on how to semantically enrich mobility data with aspects, particularly in dynamic scenarios where semantic information is extracted from numerous and heterogeneous external data sources. In this work, we aim to address this issue by presenting a comprehensive methodology to facilitate end users in instantiating their semantic enrichment processes of movement data. The methodology is agnostic to semantic aspects and external semantic data sources. The vision behind our methodology rests on three pillars: (1) three design principles which we argue are necessary for designing systems capable of instantiating arbitrary semantic enrichment processes; (2) the MAT-Builder system, which embodies these principles; (3) the use of an RDF knowledge graph-based representation to store MAT datasets, thereby enabling uniform querying and analysis of enriched movement data. We qualitatively evaluate the methodology in two complementary example scenarios, where we show both the potential in generating interesting and useful semantically enriched mobility datasets, and the expressive power in querying the resulting RDF trajectories with SPARQL.
Source: IEEE ACCESS, vol. 11, pp. 90857-90875
DOI: 10.1109/access.2023.3307824
Project(s): MobiDataLab via OpenAIRE, MASTER, SoBigData-PlusPlus via OpenAIRE

See at: CNR IRIS Open Access | ieeexplore.ieee.org | ISTI Repository | CNR IRIS Restricted

2023 Conference article Open Access

Summarizing trajectories using semantically enriched geographical context
Pugliese C, Lettich F, Pinelli F, Renso C
The proliferation of tracking sensors in today's devices has led to the generation of high-frequency, high-volume streams of mobility data capturing the movements of various objects. These movement data can be enriched with semantic contextual information, such as activities, events, user preferences, and more, generating semantically enriched trajectories. Creating and managing these types of trajectories presents challenges due to the massive data volume and the heterogeneous, complex semantic dimensions. To address these issues, we introduce a novel approach, MAT-Sum, which uses a location-centric enrichment perspective to summarize massive volumes of mobility data while preserving essential semantic information. Our approach enriches geographical areas with semantic aspects to provide the underlying context for trajectories, enabling effective data reduction through trajectory summarization. In the experimental evaluation, we show that MAT-Sum effectively minimizes trajectory volume while retaining a good level of semantic quality, thus presenting a viable solution to the relevant issue of managing massive mobility data.
DOI: 10.1145/3589132.3625587
Project(s): MobiDataLab via OpenAIRE, MASTER, SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2024 Conference article Restricted

Understanding human mobility dynamics: insights from summarized semantic trajectories
Pugliese C., Lettich F., Pinelli F., Renso C.
Mobility data analysis provides insights into human movement patterns, traffic flows, and urban planning strategies. Human dynamics analysis focuses on tracking people to investigate how individuals and groups behave, interact, and evolve. Various mobility data sources, such as GPS, mobile phone records, social media, and transportation logs, are often semantically enriched and used for these analyses. This results in the generation of new, complex datasets that require effective summarization methods to reduce data volume while preserving relevant information. In this work, we aim to demonstrate the effective use of summarized semantic trajectories in analyzing human mobility behaviours. We offer empirical evidence from a case study, showing how this type of trajectory helps in understanding human mobility, especially in distinguishing between routine and non-routine behaviours. Experimental results show that the analysis results are comparable with those obtained on the original (non-summarized) dataset.
DOI: 10.1109/mdm61037.2024.00039
Project(s): CAMEO, PRIN 2022 n. 2022ZLL7MW, SoBigData-PlusPlus via OpenAIRE, Spoke 1 "Human-centered AI" of the M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research"

See at: doi.org Restricted | IRIS Cnr | IRIS Cnr | CNR IRIS

2016 Conference article Open Access

GPU-based parallelization of QuickScorer to speed-up document ranking with tree ensembles
Lettich F, Lucchese C, Nardini Fm, Orlando S, Perego R, Tonellotto N, Venturini R
Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees currently represents one of the most effective solutions to rank query results returned by large scale Information Retrieval systems. However, such scoring models are very complex, and when deployed in real Web Search Engine infrastructures they are constrained within strict time budgets. This calls for very fast and efficient solutions, able to exploit all the computational resources offered by a given system. This paper investigates the opportunities offered by modern graphics cards (GPUs) to efficiently exploit complex LtR models based on tree ensembles to rank documents. To this end we propose GPUScorer, a GPU-based parallelization of the state-of-the-art algorithm QuickScorer to score documents with tree ensembles. GPUScorer takes advantage of the huge computational power of GPUs to perform tree ensemble traversal by evaluating multiple documents simultaneously. We provide a concise experimental evaluation, and show that GPUScorer is able to achieve speedups of up to 32x over the sequential version of QuickScorer.
Source: CEUR WORKSHOP PROCEEDINGS. Venezia, Italy, 30-31 May 2016

See at: ceur-ws.org Open Access | CNR IRIS | CNR IRIS Restricted

2019 Journal article Open Access

Speed prediction in large and dynamic traffic sensor networks
Magalhaes Rp, Lettich F, Macedo Ja, Nardini Fm, Perego R, Renso C, Trani R
Smart cities are nowadays equipped with pervasive networks of sensors that monitor traffic in real-time and record huge volumes of traffic data. These datasets constitute a rich source of information that can be used to extract knowledge useful for municipalities and citizens. In this paper we are interested in exploiting such data to estimate future speed in traffic sensor networks, as accurate predictions have the potential to enhance decision making capabilities of traffic management systems. Building effective speed prediction models in large cities poses important challenges that stem from the complexity of traffic patterns, the number of traffic sensors typically deployed, and the evolving nature of sensor networks. Indeed, sensors are frequently added to monitor new road segments or replaced/removed for various reasons (e.g., maintenance). Exploiting a large number of sensors for effective speed prediction thus requires smart solutions to collect vast volumes of data and train effective prediction models. Furthermore, the dynamic nature of real-world sensor networks calls for solutions that are resilient not only to changes in traffic behavior, but also to changes in the network structure, where the cold start problem represents an important challenge. We study three different approaches in the context of large and dynamic sensor networks: local, global, and cluster-based. The local approach builds a specific prediction model for each sensor of the network. Conversely, the global approach builds a single prediction model for the whole sensor network. Finally, the cluster-based approach groups sensors into homogeneous clusters and generates a model for each cluster. We provide a large dataset, generated from ~1.3 billion records collected by up to 272 sensors deployed in Fortaleza, Brazil, and use it to experimentally assess the effectiveness and resilience of prediction models built according to the three aforementioned approaches.
The results show that the global and cluster-based approaches provide very accurate prediction models that prove to be robust to changes in traffic behavior and in the structure of sensor networks.
Source: INFORMATION SYSTEMS, vol. 98
DOI: 10.1016/j.is.2019.101444
Project(s): BigDataGrapes via OpenAIRE, MASTER

See at: CNR IRIS Open Access | www.sciencedirect.com | Information Systems Restricted | CNR IRIS | CNR IRIS

2021 Contribution to conference Open Access

Cloud and data federation in MobiDataLab
Carlini E, Dazzi P, Lettich F, Perego R, Renso C
Today's innovative digital services dealing with the mobility of persons and goods produce huge amounts of data. To propose advanced and efficient mobility services, the collection and aggregation of new sources of data from various producers are necessary. The overall objective of the MobiDataLab H2020 project is to propose to the mobility stakeholders (transport organising authorities, operators, industry, government and innovators) reproducible methodologies and sustainable tools that foster the development of a data-sharing culture in Europe and beyond. This short paper introduces the key concepts driving the design and definition of the Cloud and Data Federation that stands at the basis of MobiDataLab.
DOI: 10.1145/3452369.3463819
Project(s): ACCORDION via OpenAIRE


See at: dl.acm.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2017 Conference article Restricted

Multicore/Manycore parallel traversal of large forests of regression trees
Lettich F, Lucchese C, Nardini Fm, Orlando S, Perego R, Tonellotto N, Venturini R
Machine-learnt models based on additive ensembles of binary regression trees are currently considered one of the best solutions to address complex classification, regression, and ranking tasks. To evaluate these complex models over a continuous stream of data items with high throughput requirements, we need to optimize, and possibly parallelize, the traversal of thousands of trees, each including hundreds of nodes. Document ranking in Web Search is a typical example of this challenging scenario, where complex tree-based models are used to score query-document pairs and finally rank lists of document results for each incoming query (a.k.a. Learning-to-Rank). In this extended abstract, we briefly discuss some preliminary results concerning the parallelization strategies for QUICKSCORER, the state-of-the-art scoring algorithm that exploits ensembles of decision trees, by using multicore CPUs (with SIMD coprocessors) and manycore GPUs. We show that QUICKSCORER, which transforms the traversal of thousands of decision trees into a linear access to array data structures, can be parallelized very effectively, achieving very interesting speedups.
DOI: 10.1109/hpcs.2017.154

See at: doi.org Restricted | CNR IRIS | ieeexplore.ieee.org | CNR IRIS

2019 Journal article Open Access

Parallel Traversal of Large Ensembles of Decision Trees
Lettich F, Lucchese C, Nardini Fm, Orlando S, Perego R, Tonellotto N, Venturini R
Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality-of-service are high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees, thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x.
Source: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (PRINT), vol. 30 (issue 9), pp. 2075-2089
DOI: 10.1109/tpds.2018.2860982
DOI: 10.5281/zenodo.2668378
DOI: 10.5281/zenodo.2668379
Project(s): BigDataGrapes via OpenAIRE


2022 Conference article Open Access

A federated cloud solution for transnational mobility data sharing
Carlini E, Chevalier T, Dazzi P, Lettich F, Perego R, Renso C, Trani S
Nowadays, innovative digital services are massively spreading both in the public and private sectors. In this work we focus on the digital data regarding the mobility of persons and goods, which are experiencing exponential growth thanks to the significant diffusion of telecommunication infrastructures and inexpensive GPS-equipped devices. The volume, velocity, and heterogeneity of mobility data call for advanced and efficient services to collect and integrate various data sources from different data producers. The MobiDataLab H2020 project aims to deal with these challenges by introducing an efficient and highly interoperable digital framework for mobility data sharing. In particular, the project aims to propose to the mobility stakeholders (i.e., transport organising authorities, operators, industry, governments, and innovators) reproducible methodologies and sustainable tools that can foster the development of a data-sharing culture in Europe and beyond. This paper introduces the key concepts driving the design and definition of a cloud-based data-sharing federation we call the Transport Cloud platform, which represents one of the main pillars of the MobiDataLab project. This platform aims to ensure transnational access to mobility data in a secure, efficient, and seamless way, and to ensure that FAIR principles (i.e., mobility data should be findable, accessible, interoperable, and reusable) are enforced.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 586-592. Tirrenia, Pisa, Italy, 19-22/06/2022
Project(s): ACCORDION via OpenAIRE, MobiDataLab via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted