2020
Journal article
Open Access
(So) Big Data and the transformation of the city
Andrienko G., Andrienko N., Boldrini C., Caldarelli G., Cintia P., Cresci S., Facchini A., Giannotti F., Gionis A., Guidotti R., Mathioudakis M., Muntean C. I., Pappalardo L., Pedreschi D., Pournaras E., Pratesi F., Tesconi M., Trasarti R.
The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the "City of Citizens" thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
Source: International Journal of Data Science and Analytics (Print) 1 (2020). doi:10.1007/s41060-020-00207-3
DOI: 10.1007/s41060-020-00207-3
Project(s): SoBigData
See at:
Aaltodoc Publication Archive | International Journal of Data Science and Analytics | White Rose Research Online | HELDA - Digital Repository of the University of Helsinki | Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari | link.springer.com | City Research Online | ISTI Repository | Fraunhofer-ePrints | CNR ExploRA
2019
Journal article
Open Access
PRIMULE: Privacy risk mitigation for user profiles
Pratesi F., Gabrielli L., Cintia P., Monreale A., Giannotti F.
The availability of mobile phone data has encouraged the development of different data-driven tools, supporting social science studies and providing new data sources to the standard official statistics. However, this kind of data is subject to privacy concerns because it can enable the inference of personal and private information. In this paper, we address the privacy issues related to the sharing of user profiles, derived from mobile phone data, by proposing PRIMULE, a privacy risk mitigation strategy. The method relies on PRUDEnce (Pratesi et al., 2018), a privacy risk assessment framework that provides a methodology for systematically identifying risky users in a set of data. Extensive experimentation on real-world data shows the effectiveness of the PRIMULE strategy in terms of both the quality of mobile user profiles and the utility of these profiles for analytical services such as the Sociometer (Furletti et al., 2013), a data mining tool for the classification of city users.
Source: Data & knowledge engineering 125 (2019). doi:10.1016/j.datak.2019.101786
DOI: 10.1016/j.datak.2019.101786
Project(s): SoBigData
See at:
ISTI Repository | Archivio istituzionale della Ricerca - Scuola Normale Superiore | Data & Knowledge Engineering | www.sciencedirect.com | CNR ExploRA
2019
Software
Unknown
PlayeRank
Cintia P., Pappalardo L.
PlayeRank is a data-driven algorithm that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. PlayeRank is designed to work with soccer-logs, in which a match consists of a sequence of events, each encoded as a tuple (id, type, position, timestamp), where id is the identifier of the player that originated or is referred to by the event, type is the event type (e.g., pass, shot, goal, tackle), and position and timestamp denote the spatio-temporal coordinates of the event on the soccer field. PlayeRank assumes that soccer-logs are stored in a database, which is updated with new events after each soccer match.
An exhaustive description of the PlayeRank framework is available in this paper:
Pappalardo, Luca, Cintia, Paolo, Ferragina, Paolo, Massucco, Emanuele, Pedreschi, Dino & Giannotti, Fosca (2019) PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technology 10(5), DOI: https://doi.org/10.1145/3343172
Project(s): SoBigData
See at:
github.com | CNR ExploRA
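The soccer-log event tuple described in the PlayeRank entry above can be sketched as a small data structure. This is only an illustrative sketch, not the PlayeRank code: the class name `Event`, the field names, the helper functions and the sample values are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """One soccer-log event: (id, type, position, timestamp). Hypothetical layout."""
    player_id: int                  # identifier of the player the event refers to
    event_type: str                 # e.g. "pass", "shot", "goal", "tackle"
    position: Tuple[float, float]   # (x, y) coordinates on the field
    timestamp: float                # seconds from the start of the match


def events_of_player(log: Iterable[Event], player_id: int) -> List[Event]:
    """Return the events of one player, ordered by time."""
    return sorted((e for e in log if e.player_id == player_id),
                  key=lambda e: e.timestamp)


def count_event_types(log: Iterable[Event]) -> Counter:
    """Count how many events of each type appear in the log."""
    return Counter(e.event_type for e in log)


# Made-up sample data, for illustration only.
sample_log = [
    Event(7, "pass", (40.0, 30.0), 12.5),
    Event(9, "shot", (88.0, 45.0), 14.0),
    Event(7, "tackle", (35.0, 20.0), 3.2),
    Event(7, "pass", (50.0, 35.0), 20.1),
]
```

Keeping each event immutable (`frozen=True`) fits the description of a log that is only appended to after each match.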
2019
Master thesis
Unknown
Injury forecasting in soccer utilizing machine learning and multivariate time series
Guerrini L. Supervisors: Paolo Ferragina, Luca Pappalardo, Paolo Cintia
Injuries have a great impact on professional soccer due to their influence on team performance and the considerable costs of rehabilitation for players. In this thesis, we use injury records and workload data describing the training sessions of players in a professional soccer club, spanning two entire seasons, to train and compare three classes of approaches to injury forecasting, i.e., predicting whether or not a player will get injured in the next matches or training sessions. The first class of approaches is based on traditional techniques used in sports science and industry, such as the Acute Chronic Workload Ratio. The second class is based on machine learning tools such as decision tree and k-nearest neighbor classifiers. The third class of approaches extends the second class by fully exploiting the temporal information present in the data through the usage of a multivariate time series representation of a player's workload history. We demonstrate that machine learning approaches significantly outperform traditional techniques still used in the sports industry, moving prediction accuracy from 4% up to 50%, paving the way to a more accurate monitoring of the health status of soccer players.
Project(s): SoBigData
See at:
etd.adm.unipi.it | CNR ExploRA
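The Acute Chronic Workload Ratio mentioned in the thesis entry above is commonly computed as the ratio between a player's recent (acute) average workload and a longer-term (chronic) average. The sketch below uses the rolling-average variant with 7-day and 28-day windows; the window lengths, the function name `acwr` and the input format are assumptions for illustration, and the thesis may define the ratio differently.

```python
from typing import Optional, Sequence


def acwr(daily_loads: Sequence[float],
         acute_days: int = 7,
         chronic_days: int = 28) -> Optional[float]:
    """Acute Chronic Workload Ratio from a list of daily workloads.

    daily_loads is ordered from oldest to most recent day. Returns None
    when there is not enough history or the chronic load is zero.
    Window lengths are assumed defaults, not taken from the thesis.
    """
    if acute_days <= 0 or chronic_days < acute_days:
        raise ValueError("need 0 < acute_days <= chronic_days")
    if len(daily_loads) < chronic_days:
        return None

    # Mean load over the most recent acute and chronic windows.
    acute = sum(daily_loads[-acute_days:]) / acute_days
    chronic = sum(daily_loads[-chronic_days:]) / chronic_days
    if chronic == 0:
        return None
    return acute / chronic
```

A ratio well above 1 indicates a recent workload spike relative to what the player is accustomed to, which is the kind of signal such traditional techniques rely on.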
2019
Master thesis
Unknown
Capturing football-teams behavior with a stochastic model
Barbone M. Supervisors: Paolo Ferragina, Luca Pappalardo, Paolo Cintia
This thesis aims to capture the behavior of soccer teams using a stochastic approach on a graph built on top of the dataset provided by Wyscout, a market-leading company in data scouting for soccer. The main contributions of the thesis are twofold. First, it proposes a stochastic representation of a soccer game via a weighted graph properly derived from the Wyscout dataset. Second, it analyses every game through a stochastic model to detect the way teams move the ball, together with the way they move on the field and the performance that they achieve.
Project(s): SoBigData
See at:
etd.adm.unipi.it | CNR ExploRA
2018
Journal article
Open Access
Effective injury forecasting in soccer with GPS training data and machine learning
Rossi A., Pappalardo L., Cintia P., Iaia F. M., Fernandez J., Medina D.
Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multi-dimensional approach to injury forecasting in professional soccer that is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We then construct an injury forecaster and show that it is both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.
Source: PloS one 13 (2018): 1–15. doi:10.1371/journal.pone.0201264
DOI: 10.1371/journal.pone.0201264
DOI: 10.48550/arxiv.1705.08079
Project(s): SoBigData
See at:
arXiv.org e-Print Archive | PLoS ONE | ISTI Repository | doi.org | CNR ExploRA
2017
Conference article
Open Access
Who is going to get hurt? Predicting injuries in professional soccer
Rossi A., Pappalardo L., Cintia P., Fernandez J., Iaia F. M., Medina D.
Injury prevention has a fundamental role in professional soccer due to the high cost of recovery for players and the strong influence of injuries on a club's performance. In this paper we provide a predictive model to prevent injuries of soccer players using a multidimensional approach based on GPS measurements and machine learning. In an evolutive scenario, where a soccer club starts collecting the data for the first time and updates the predictive model as the season goes by, our approach can detect around half of the injuries, allowing the soccer club to save 70% of a season's economic costs related to injuries. The proposed approach can be a valuable support for coaches, helping the soccer club to reduce injury incidence, save money and increase team performance.
Source: MLSA'17 - 4th Workshop on Machine Learning and Data Mining for Sports Analytics, pp. 21–30, Skopje, Macedonia, 18 September 2017
Project(s): SoBigData
See at:
ceur-ws.org | ISTI Repository | CNR ExploRA
2017
Journal article
Open Access
Discovering and understanding city events with big data: the case of Rome
Furletti B., Trasarti R., Cintia P., Gabrielli L.
The increasing availability of large amounts of data and digital footprints has given rise to ambitious research challenges in many fields, ranging from medical research and the financial and commercial world to people and environmental monitoring. Whereas traditional data sources and censuses fail to capture actual and up-to-date behaviors, Big Data integrate the missing knowledge, providing useful and hidden information to analysts and decision makers. In this paper, we focus on the identification of city events by analyzing mobile phone data (Call Detail Records), and we study and evaluate the impact of these events on the typical city dynamics. We present an analytical process able to discover, understand and characterize city events from Call Detail Records, designing a distributed computation to implement Sociometer, a profiling tool that categorizes phone users. The methodology provides a useful tool for city mobility managers to manage events and make future decisions on specific classes of users, i.e., residents, commuters and tourists.
Source: Information (Basel) 8 (2017). doi:10.3390/info8030074
DOI: 10.3390/info8030074
See at:
Information | ISTI Repository | www.mdpi.com | CNR ExploRA
2016
Contribution to conference
Open Access
Network-based performance indicators for football teams
Pappalardo L., Cintia P.
Sports analytics has evolved in recent years in an amazing way, thanks to the sensing technologies that provide data streams extracted from every game. Despite the increasing wealth of data, there is not yet a consolidated repertoire of indicators for the various facets of team and player performance. In this poster we propose two data-driven approaches to measure the performance of football teams and football players.
Source: International School and Conference on Network Science (Netsci-x), Wroclaw, Poland, 11-13/01/2016
See at:
netsci-x.net | ISTI Repository | CNR ExploRA
2016
Report
Unknown
ASAP - Telecommunication Data Analytics (TDA) specification and early prototype
Bertoldi R., Cintia P., Trasarti R.
The main objective of this Work Package (WP) is the design and development of an analytics application on WIND Telecommunications customer data, targeted towards tourism and mobility scenarios. The envisaged use cases will be integrated into the ASAP framework and will be evaluated using several measurement methods. At the end of the project's second year (M24), three tasks are involved: the end of task T9.2, task T9.3, and the beginning of task T9.4.
Source: Project report, ASAP, Deliverable D9.3, 2016
Project(s): ASAP
See at:
CNR ExploRA
2016
Report
Open Access
ProgettISTI 2016
Banterle F., Barsocchi P., Candela L., Carlini E., Carrara F., Cassarà P., Ciancia V., Cintia P., Dellepiane M., Esuli A., Gabrielli L., Germanese D., Girardi M., Girolami M., Kavalionak H., Lonetti F., Lulli A., Moreo Fernandez A., Moroni D., Nardini F. M., Monteiro De Lira V. C., Palumbo F., Pappalardo L., Pascali M. A., Reggianini M., Righi M., Rinzivillo S., Russo D., Siotto E., Villa A.
The ProgettISTI research project grant is an award for members of the Institute of Information Science and Technologies (ISTI) to provide support for innovative, original and multidisciplinary projects of high quality and potential. The choice of theme and the design of the research are entirely up to the applicants, yet (i) the theme must fall under the ISTI research topics, (ii) the proposers of each project must come from different laboratories of the Institute and must contribute different expertise to the project idea, and (iii) project proposals should have a duration of 12 months. This report documents the procedure, the proposals and the results of the 2016 edition of the award. In this edition, ten project proposals were submitted and three of them were awarded.
Source: ISTI Technical reports, 2016
See at:
ISTI Repository | CNR ExploRA
2016
Conference article
Restricted
The Haka network: Evaluating rugby team performance with dynamic graph analysis
Cintia P., Pappalardo L., Coscia M.
Real world events are intrinsically dynamic and analytic techniques have to take into account this dynamism. This aspect is particularly important in complex network analysis when relations are channels for interaction events between actors. Sensing technologies open the possibility of doing so for sport networks, enabling the analysis of team performance in a standard environment and rules. Useful applications are directly related to improving playing quality, but can also shed light on all forms of team efforts that are relevant for work teams, large firms with coordination and collaboration issues and, as a consequence, economic development. In this paper, we consider dynamics over networks representing the interaction between rugby players during a match. We build a pass network and we introduce the concept of disruption network, building a multilayer structure. We perform both a global and a micro-level analysis on game sequences. When deploying our dynamic graph analysis framework on data from 18 rugby matches, we discover that structural features that make networks resilient to disruptions are a good predictor of a team's performance, both at the global and at the local level. Using our features, we are able to predict the outcome of the match with a precision comparable to state of the art bookmaking.
Source: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1095–1102, San Francisco, CA, USA, 18-21 August 2016
DOI: 10.1109/asonam.2016.7752377
Project(s): SoBigData
See at:
doi.org | ieeexplore.ieee.org | CNR ExploRA
2015
Report
Open Access
An effective time-aware map matching process for low sampling GPS data
Cintia P., Nanni M.
In the era of the proliferation of Geo-Spatial Data, induced by the diffusion of GPS devices, the map matching problem still represents an important and valuable challenge. The process of associating a segment of the underlying road network to a GPS point gives us the chance to enrich raw data with the semantic layer provided by the roadmap, with all contextual information associated to it, e.g. the presence of speed limits, attraction points, changes in elevation, etc. Most state-of-the-art solutions for this classical problem simply look for the shortest or fastest path connecting any pair of consecutive points in a trip. While in some contexts that is reasonable, in this work we argue that the shortest/fastest path assumption can be in general erroneous. Indeed, we show that such approaches can yield travel times that are significantly incoherent with the real ones, and propose a Time-Aware Map matching process that tries to improve the state of the art by also taking into account this temporal aspect. Our algorithm proves to be very efficient, effective on low-sampling data and to outperform existing solutions, as shown by experiments on large datasets of real GPS trajectories. Moreover, our algorithm is parameter-free and does not depend on specific characteristics of the GPS localization error and of the road network (e.g. density of roads, road network topology, etc.).
Source: ISTI Technical reports, 2015
See at:
ISTI Repository | CNR ExploRA
2015
Contribution to book
Open Access
Towards a boosted route planner using individual mobility models
Guidotti R., Cintia P.
Route planners generally return routes that minimize either the distance covered or the time traveled. However, these routes are rarely considered by people who move in a certain area systematically. Indeed, due to their expertise, they very often prefer different solutions. In this paper we provide an analytic model to study the deviations of the systematic movements from the paths proposed by a route planner. As a proxy of human mobility we use real GPS traces and we analyze a set of users who move in the provinces of Pisa and Florence. By using appropriate mobility data mining techniques, we extract the GPS systematic movements and we transform them into sequences of road segments. Finally, we calculate the shortest and fastest path from the origin to the destination of each systematic movement and we compare them with the routes mapped on the road network. Our results show that about 30-35% of the systematic movements follow the shortest paths, while the others follow routes which are on average 7 km longer. In addition, we divided the study area into cells and we analyzed the deviations in the flows of systematic movements. We found that these deviations are not only driven by individual mobility behaviors but are a signal of an existing common sense that could be exploited by a route planner.
Source: Software Engineering and Formal Methods, edited by Domenico Bianculli, Radu Calinescu, Bernhard Rumpe, pp. 108–123. Berlin Heidelberg: Springer, 2015
DOI: 10.1007/978-3-662-49224-6_10
Project(s): PETRA
See at:
ISTI Repository | doi.org | link.springer.com | CNR ExploRA
2015
Conference article
Open Access
A network-based approach to evaluate the performance of football teams
Cintia P., Pappalardo L., Rinzivillo S.
The striking proliferation of sensing technologies that provide high-fidelity data streams extracted from every game has induced an amazing evolution of football statistics. Nowadays professional statistical analysis firms like ProZone and Opta provide data to football clubs, coaches and leagues, who are starting to analyze these data to monitor their players and improve team strategies. Standard approaches to evaluating and predicting team performance are based on history-related factors such as past victories or defeats, record in qualification games and margin of victory in past games. In contrast with traditional models, in this paper we propose a model based on the observation of players' behavior on the pitch. We model the game of a team as a network and extract simple network measures, showing the value of our approach in predicting the outcomes of a long-running tournament such as the Italian major league.
Source: Workshop on Machine Learning and Data Mining for Sports Analytics, pp. 46–54, Porto, Portugal, 11/09/2015
See at:
ceur-ws.org | CNR ExploRA
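The idea in the entry above of modeling a team's game as a network and extracting simple network measures can be illustrated with a small pass network. This is a sketch under assumptions: the abstract does not say which measures the paper uses, so the weighted degree, its mean and its variance shown here are illustrative choices, and all function names and the sample passes are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import Dict, Iterable, Tuple

Pass = Tuple[str, str]  # (passer, receiver)


def build_pass_network(passes: Iterable[Pass]) -> Counter:
    """Directed weighted graph: edge (passer, receiver) -> number of passes."""
    return Counter(passes)


def weighted_degrees(network: Counter) -> Dict[str, int]:
    """Passes made plus passes received, for every player in the network."""
    degree: Counter = Counter()
    for (passer, receiver), weight in network.items():
        degree[passer] += weight
        degree[receiver] += weight
    return dict(degree)


def summarize(network: Counter) -> Dict[str, float]:
    """A few simple team-level measures derived from the pass network."""
    degrees = list(weighted_degrees(network).values())
    return {
        "total_passes": sum(network.values()),
        "mean_degree": mean(degrees),
        # Population variance: how unevenly passing is spread among players.
        "degree_variance": pvariance(degrees),
    }


# Made-up passes between three players, for illustration only.
sample_passes = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
sample_network = build_pass_network(sample_passes)
```

Calling `summarize(sample_network)` gives one feature vector per team and match, which could then be fed to whatever predictive model is used downstream.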
2015
Conference article
Restricted
The harsh rule of the goals: Data-driven performance indicators for football teams
Cintia P., Pappalardo L., Pedreschi D., Giannotti F., Malvaldi M.
Sports analytics in general, and football (soccer in the USA) analytics in particular, have evolved in recent years in an amazing way, thanks to automated or semi-automated sensing technologies that provide high-fidelity data streams extracted from every game. In this paper we propose a data-driven approach and show that there is a large potential to boost the understanding of football team performance. From observational data of football games we extract a set of pass-based performance indicators and summarize them in the H indicator. We observe a strong correlation between the proposed indicator and the success of a team, and therefore perform a simulation on the four major European championships (78 teams, almost 1500 games). The outcome of each game in the championship was replaced by a synthetic outcome (win, loss or draw) based on the performance indicators computed for each team. We found that the final rankings in the simulated championships are very close to the actual rankings in the real championships, and show that teams with high ranking error show extreme values of a defense/attack efficiency measure, the Pezzali score. Our results are surprising given the simplicity of the proposed indicators, suggesting that a complex systems' view on football data has the potential of revealing hidden patterns and behavior of superior quality.
Source: IEEE International Conference on Data Science and Advanced Analytics, Paris, France, 19-21/10/2015
DOI: 10.1109/dsaa.2015.7344823
Project(s): CIMPLEX
See at:
doi.org | ieeexplore.ieee.org | CNR ExploRA
2015
Master thesis
Open Access
Storia e struttura del data Journalism
Locci P.
The aims of this thesis are to analyze the birth and development of data journalism, starting from the journalistic investigations that shaped its evolution, by examining the working methods of three Pulitzer Prize winners: Philip Meyer, Bill Dedman and Stephen K. Doig. It then examines the working methods and tools most widely used by the newsrooms that pay the most attention to data journalism, leading to the creation of an actual data journalism article, "La crisi economica e il declino del calcio italiano" (The economic crisis and the decline of Italian football), which relates data on the economic crisis to data on the decline of Italian football, which since 2010 has not lived up to its own footballing tradition. In this period, Italy has recorded a genuine collapse in results, attributable to a drop in investment that was not observed in the other European championships, where, despite the crisis, investment increased.
See at:
etd.adm.unipi.it | ISTI Repository | CNR ExploRA
2014
Contribution to book
Restricted
Mobility profiling
Nanni M., Trasarti R., Cintia P., Furletti B., Gabrielli L., Rinzivillo S., Giannotti F.
An abstract is not available.
Source: Data Science and Simulation in Transportation Research, edited by Davy Janssens, Ansar-Ul-Haque Yasar, Luk Knapen, pp. 1–29. Hershey: IGI Global, 2014
DOI: 10.4018/978-1-4666-4920-0.ch001
See at:
www.igi-global.com | CNR ExploRA
2014
Conference article
Open Access
Mining efficient training patterns of non-professional cyclists (Discussion Paper)
Cintia P., Pappalardo L., Pedreschi D.
The recent emergence of so-called online social fitness opens up new scenarios for fascinating challenges in the field of data science. Through these platforms, users can collect, monitor and share with friends their sport performance, with interesting details about heart rate, watt consumption and calories burned. The availability of this data, collected among a large number of users, gives us the possibility to explore new data mining applications. In the current work, we present the results of a study conducted on a sample of 29,284 cyclists downloaded via APIs from the social fitness platform Strava.com. We defined two basic metrics: a measure of training effort, that is, how much a cyclist struggled during the workout; and a measure of training performance, indicating the results achieved during the training. Although the average effort is weakly correlated with the average performance, a deeper investigation of the time evolution of workouts and of cyclists' training characteristics yields interesting findings. We found that athletes that better improve their performance follow precise training patterns, usually referred to as overcompensation theory, with alternation of stress peaks and rest periods. Studies and experiments related to such theory, up to now, have always been conducted by sports doctors on a few dozen professional athletes. To the best of our knowledge, our study is the first large-scale corroboration of this theory.
Source: SEBD 2014 - 22nd Italian Symposium on Advanced Database Systems, pp. 1–8, Sorrento Coast, Italy, 16-18 June 2014
See at:
toc.proceedings.com | CNR ExploRA