2013
Conference article
Restricted
"Engine matters": a first large scale data driven study on cyclists' performance
Cintia P, Pappalardo L, Pedreschi D
The recent emergence of so-called online social fitness platforms provides a good proxy for studying the patterns underlying success in sport. Through these platforms, users can collect, monitor and share with friends their sport performance, diet, and even calories burned, offering an unprecedented opportunity to answer fascinating questions: What are the main factors that shape sport performance? What characteristics distinguish successful athletes? Can we characterize the role of social influence on fitness behavior? In this work, we present the results of a study conducted on a sample of 29,284 cyclists downloaded via APIs from the social fitness platform Strava.com. We defined two basic metrics: a measure of training effort, i.e. how much a cyclist struggled during the workout, and a measure of training performance, indicating the results achieved during the training. When we analyze the relationship between these two metrics, an interesting result immediately emerges: at a global level, there is no correlation between effort and performance. This means that, in general, performance is not simply a function of training: two athletes with the same level of training can achieve different performance. However, by investigating in depth the time evolution of workouts and the cyclists' training characteristics, we found that the athletes who improve their performance the most follow precise training patterns, usually referred to as overcompensation theory, with an alternation of stress peaks and rest periods. Until now, studies and experiments related to this theory have always been conducted by sports doctors on a few dozen professional athletes. To the best of our knowledge, our study is the first large-scale corroboration of this theory, mainly confirming that "engine matters", but tuning is fundamental.
DOI: 10.1109/icdmw.2013.41
See at:
doi.org
| CNR IRIS
| www.dataminingcasestudies.com
2013
Conference article
Restricted
Estimating time-dependent speed functions using a gravity model over road network
Cintia P, Trasarti R, Macedo J A, Almada L, Ferreira C
The availability of inexpensive tracking devices, such as GPS-enabled devices, makes it possible to collect large amounts of trajectory data from vehicles. In this context, we are interested in the problem of generating traffic information in time-dependent networks from this kind of data. The problem is not trivial, since several works in the literature rely on strong assumptions about the error distribution. We drop these assumptions and propose a gravitational model method to compute the average speed of road segments from trajectory data. Furthermore, we show how to generate travel-time functions from the computed average speeds, useful for time-dependent network routing systems. Our approach allows creating an accurate picture of traffic conditions in time and space. The method presented in this paper tackles all these aspects, and we evaluate its performance on a synthetic dataset and on a real case.
Project(s): SEEK
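The abstract does not spell out the gravitational model, so the following is only an illustration of the general idea and not the authors' formulation: each GPS speed sample is shared among nearby road segments with weights that decay with the squared distance, speeds are averaged per segment and hour of day, and a travel-time function is derived by dividing segment length by speed. The 1/d² weighting, the hourly bins, the `eps` offset and the input layout are all assumptions.

```python
from collections import defaultdict


def gravity_speeds(samples, eps=1.0):
    """Weighted average speed (km/h) per (segment, hour) bin.

    Each sample is (hour, speed_kmh, {segment_id: distance_m}): a GPS point
    already paired with its candidate nearby segments. The weight follows an
    assumed gravity-like law w = 1 / (distance + eps)**2, so closer segments
    receive a larger share of the observation; eps avoids division by zero.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for hour, speed, candidates in samples:
        for segment, dist in candidates.items():
            w = 1.0 / (dist + eps) ** 2
            num[(segment, hour)] += w * speed
            den[(segment, hour)] += w
    return {key: num[key] / den[key] for key in num}


def travel_time_function(speeds, segment_id, length_m):
    """Piecewise-constant travel time (seconds) by hour of day for one segment."""
    return {
        hour: length_m / (speed / 3.6)  # km/h -> m/s
        for (segment, hour), speed in speeds.items()
        if segment == segment_id and speed > 0
    }


# Hypothetical GPS samples: (hour, observed speed, distances to nearby segments).
samples = [
    (8, 30.0, {"s1": 5.0, "s2": 40.0}),
    (8, 50.0, {"s2": 3.0}),
    (17, 20.0, {"s1": 2.0}),
]
speeds = gravity_speeds(samples)
print(travel_time_function(speeds, "s1", length_m=500.0))
```

The design choice being illustrated is that no sample is snapped to a single segment; its influence is spread, which is what makes the estimate less sensitive to positioning error than a hard assignment.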
See at:
CNR IRIS
2015
Other
Open Access
An effective time-aware map matching process for low sampling GPS data
Cintia P, Nanni M
In the era of the proliferation of geospatial data, driven by the diffusion of GPS devices, the map matching problem still represents an important and valuable challenge. Associating a segment of the underlying road network with each GPS point gives us the chance to enrich raw data with the semantic layer provided by the road map, with all the contextual information associated with it, e.g. speed limits, attraction points, changes in elevation, etc. Most state-of-the-art solutions to this classical problem simply look for the shortest or fastest path connecting each pair of consecutive points in a trip. While this is reasonable in some contexts, in this work we argue that the shortest/fastest path assumption can be erroneous in general. Indeed, we show that such approaches can yield travel times that are significantly inconsistent with the real ones, and we propose a Time-Aware Map matching process that aims to improve on the state of the art by also taking this temporal aspect into account. Our algorithm proves to be very efficient, effective on low-sampling data, and able to outperform existing solutions, as shown by experiments on large datasets of real GPS trajectories. Moreover, our algorithm is parameter-free and does not depend on specific characteristics of the GPS localization error or of the road network (e.g. road density, road network topology, etc.).
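The core argument is that a reconstructed path should be consistent with the time elapsed between two sparse GPS fixes, not merely be the shortest one. A toy sketch of that selection criterion, assuming candidate paths have already been enumerated and each edge carries a length and a typical speed (hypothetical inputs); it only contrasts the two criteria and is not the published algorithm:

```python
def expected_time(path):
    """Expected travel time (s) of a path given as (length_m, speed_mps) edges."""
    return sum(length / speed for length, speed in path)


def shortest_choice(candidates):
    """Baseline: pick the candidate path with the smallest total length."""
    return min(candidates, key=lambda name: sum(length for length, _ in candidates[name]))


def time_aware_choice(candidates, observed_dt):
    """Pick the candidate path whose expected time best matches the observed gap (s)."""
    return min(candidates, key=lambda name: abs(expected_time(candidates[name]) - observed_dt))


# Hypothetical candidates between two GPS fixes observed 95 s apart.
candidates = {
    "shortest": [(800.0, 5.0)],                  # short but slow side street
    "arterial": [(600.0, 13.9), (700.0, 13.9)],  # longer but faster road
}
print(shortest_choice(candidates))          # shortest
print(time_aware_choice(candidates, 95.0))  # arterial
```

The shortest path would need about 160 s, which is incoherent with the observed 95 s gap, while the longer arterial route needs about 94 s.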
See at:
CNR IRIS
| ISTI Repository
2016
Other
Restricted
ASAP - Telecommunication Data Analytics (TDA) specification and early prototype
Bertoldi R, Cintia P, Trasarti R
The main objective of this Work Package (WP) is the design and development of an analytics application on WIND Telecommunications customer data, targeted towards tourism and mobility scenarios. The envisaged use cases will be integrated into the ASAP framework and evaluated using several measurement methods. At the end of the project's second year (M24), three tasks are involved: the completion of task T9.2, task T9.3, and the start of task T9.4.
Project(s): ASAP
See at:
CNR IRIS
2016
Conference article
Restricted
The Haka network: Evaluating rugby team performance with dynamic graph analysis
Cintia P, Pappalardo L, Coscia M
Real-world events are intrinsically dynamic, and analytic techniques have to take this dynamism into account. This aspect is particularly important in complex network analysis when relations are channels for interaction events between actors. Sensing technologies make this possible for sport networks, enabling the analysis of team performance in a standardized environment governed by fixed rules. Useful applications relate directly to improving playing quality, but such analysis can also shed light on all forms of team effort relevant to work teams, large firms with coordination and collaboration issues and, as a consequence, economic development. In this paper, we consider dynamics over networks representing the interaction between rugby players during a match. We build a pass network and introduce the concept of disruption network, building a multilayer structure. We perform both a global and a micro-level analysis on game sequences. When deploying our dynamic graph analysis framework on data from 18 rugby matches, we discover that structural features that make networks resilient to disruptions are a good predictor of a team's performance, both at the global and at the local level. Using our features, we are able to predict match outcomes with a precision comparable to state-of-the-art bookmaking.
DOI: 10.1109/asonam.2016.7752377
Project(s): SoBigData
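The abstract names two layers, a pass network and a disruption network, without defining the latter precisely. A minimal sketch of how such a multilayer structure can be stored, assuming events arrive as simple tuples and reading a "disruption" as an opponent tackling a player (both the event format and that reading are hypothetical):

```python
from collections import Counter


def build_layers(events, team):
    """Two-layer (multilayer) network for `team` from a list of match events.

    Each event is (kind, src_player, dst_player, src_team).
    - 'pass' layer: passer -> teammate receiver, weighted by number of passes.
    - 'disruption' layer (assumed reading): opponent -> player of `team` whose
      action the opponent interrupted, e.g. with a tackle.
    """
    layers = {"pass": Counter(), "disruption": Counter()}
    for kind, src, dst, src_team in events:
        if kind == "pass" and src_team == team:
            layers["pass"][(src, dst)] += 1
        elif kind == "tackle" and src_team != team:
            layers["disruption"][(src, dst)] += 1
    return layers


# Hypothetical event sequence for teams "A" and "B".
events = [
    ("pass", "A1", "A2", "A"),
    ("pass", "A2", "A3", "A"),
    ("pass", "A1", "A2", "A"),
    ("tackle", "B7", "A3", "B"),
    ("pass", "B1", "B2", "B"),
]
layers = build_layers(events, "A")
print(dict(layers["pass"]))        # {('A1', 'A2'): 2, ('A2', 'A3'): 1}
print(dict(layers["disruption"]))  # {('B7', 'A3'): 1}
```

Keeping both layers over the same set of players is what allows features that relate the two, for instance how much of the pass structure survives when disrupted players are removed.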
See at:
doi.org
| CNR IRIS
| ieeexplore.ieee.org
2017
Conference article
Open Access
Who is going to get hurt? Predicting injuries in professional soccer
Rossi A, Pappalardo L, Cintia P, Fernandez J, Iaia F M, Medina D
Injury prevention plays a fundamental role in professional soccer due to the high cost of recovery for players and the strong influence of injuries on a club's performance. In this paper, we provide a predictive model to prevent injuries of soccer players using a multidimensional approach based on GPS measurements and machine learning. In an evolutive scenario, where a soccer club starts collecting data for the first time and updates the predictive model as the season goes by, our approach can detect around half of the injuries, allowing the club to save 70% of a season's economic costs related to injuries. The proposed approach can be a valuable support for coaches, helping the club to reduce injury incidence, save money and increase team performance.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 21-30. Skopje, Macedonia, 18 September 2017
Project(s): SoBigData 
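The "evolutive scenario" corresponds to an expanding-window evaluation: as the season goes by, the model is re-trained on everything collected so far and asked about the next period. A minimal sketch of that loop, assuming data grouped by week; `fit` and `predict` stand for whatever classifier is used, and the single-feature threshold rule below is a hypothetical placeholder, not the model from the paper:

```python
def evolutive_predictions(weeks, fit, predict, warmup=1):
    """Expanding-window loop: train on weeks[:i], then predict week i.

    `weeks` is a list (oldest first) of lists of (features, injured) pairs.
    Returns a list of (week_index, predictions) tuples.
    """
    results = []
    for i in range(warmup, len(weeks)):
        history = [row for week in weeks[:i] for row in week]
        model = fit(history)
        predictions = [predict(model, x) for x, _ in weeks[i]]
        results.append((i, predictions))
    return results


def fit_threshold(history):
    """Hypothetical model: the mean workload of injured samples seen so far."""
    injured = [x for x, y in history if y]
    return sum(injured) / len(injured) if injured else float("inf")


def predict_threshold(threshold, x):
    """Flag an injury risk when the workload reaches the learned threshold."""
    return x >= threshold


# Hypothetical weekly data: (workload, injured) per player-session.
weeks = [
    [(300, False), (520, True)],
    [(310, False), (600, True)],
    [(290, False), (550, True)],
]
print(evolutive_predictions(weeks, fit_threshold, predict_threshold))
# [(1, [False, True]), (2, [False, False])]
```

The point of this protocol is that no prediction ever uses data from its own week or later, which mirrors a club that starts with no history.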
See at:
ceur-ws.org
| CNR IRIS
| ISTI Repository
2018
Journal article
Open Access
Effective injury forecasting in soccer with GPS training data and machine learning
Rossi A, Pappalardo L, Cintia P, Iaia F M, Fernandez J, Medina D
Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide only a preliminary understanding of which factors most affect injury risk, while an evaluation of the potential of statistical models to forecast injuries is still missing. In this paper, we propose a multidimensional approach to injury forecasting in professional soccer based on GPS measurements and machine learning. Using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We then construct an injury forecaster and show that it is both accurate and interpretable, providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.
Source: PLOS ONE, vol. 13 (issue 7), pp. 1-15
DOI: 10.1371/journal.pone.0201264
DOI: 10.48550/arxiv.1705.08079
Project(s): SoBigData
See at:
arXiv.org e-Print Archive
| PLoS ONE
| CNR IRIS
| ISTI Repository
| doi.org
2019
Software
Metadata Only Access
PlayeRank
Cintia P, Pappalardo L
PlayeRank is a data-driven algorithm that offers a principled, multi-dimensional and role-aware evaluation of the performance of soccer players. PlayeRank is designed to work with soccer-logs, in which a match consists of a sequence of events, each encoded as a tuple (id, type, position, timestamp), where id is the identifier of the player who originated or is referred to by the event, type is the event type (e.g., passes, shots, goals, tackles), and position and timestamp denote the spatio-temporal coordinates of the event on the soccer field. PlayeRank assumes that soccer-logs are stored in a database, which is updated with new events after each soccer match.
An exhaustive description of the PlayeRank framework is available in this paper:
Pappalardo, Luca, Cintia, Paolo, Ferragina, Paolo, Massucco, Emanuele, Pedreschi, Dino & Giannotti, Fosca (2019) PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technology 10(5), DOI: https://doi.org/10.1145/3343172
Project(s): SoBigData
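The soccer-log format described above maps naturally onto a small record type. A minimal sketch, assuming (x, y) field coordinates and timestamps in seconds (the units are not specified here); it only shows the input format and a per-player count of event types, not PlayeRank's rating model:

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Tuple


class Event(NamedTuple):
    """One soccer-log event, following the (id, type, position, timestamp) tuple."""
    id: int                        # identifier of the player the event refers to
    type: str                      # event type, e.g. "pass", "shot", "goal", "tackle"
    position: Tuple[float, float]  # (x, y) on the field; coordinate system assumed
    timestamp: float               # seconds since kick-off; unit assumed


def event_counts(match):
    """Per-player counts of each event type in one match."""
    counts = defaultdict(Counter)
    for event in match:
        counts[event.id][event.type] += 1
    return counts


match = [
    Event(10, "pass", (50.0, 30.0), 12.4),
    Event(10, "shot", (88.0, 34.0), 15.1),
    Event(7, "pass", (40.0, 20.0), 30.0),
    Event(10, "pass", (60.0, 35.0), 42.0),
]
counts = event_counts(match)
print(counts[10])  # Counter({'pass': 2, 'shot': 1})
```

Because a new match only appends events, per-player aggregates like these can be updated incrementally after each game, in line with the database-backed setup described above.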
See at:
github.com
| CNR IRIS
2020
Other
Metadata Only Access
Predicting soccer game evolution through AI-based tracking data analysis
Quasso E., Pappalardo L., Cintia P.
Nowadays, technology is increasingly used in soccer. An open challenge is how to use the massive data it produces to create a framework that simulates different match situations and helps trainers better understand the dynamics on the field. This thesis aims to extract logical patterns that describe how the ball moves on the field in different game situations. We use tracking and event data from several matches to extract the positions of the players and the ball on the field. Then, we build two machine learning approaches. The first uses handcrafted features fed to a Random Forest classifier. The second is a Convolutional Neural Network that automatically highlights valuable features to make a prediction. We show that the Random Forest provides a better understanding of the rules governing the movement of the ball than the Convolutional Neural Network. This result emphasizes that conditional control statements based on the position of objects on the field, together with handcrafted features, work better than an automated feature extraction method based on deep learning.
Project(s): SoBigData
See at:
etd.adm.unipi.it
| CNR IRIS
2019
Other
Metadata Only Access
Injury forecasting in soccer utilizing machine learning and multivariate time series
Guerrini L (candidate); supervisors: Paolo Ferragina, Luca Pappalardo, Paolo Cintia
Injuries have a great impact on professional soccer due to their influence on team performance and the considerable costs of rehabilitation for players. In this thesis, we use injury records and workload data describing the training sessions of players in a professional soccer club, spanning two entire seasons, to train and compare three classes of approaches to injury forecasting, i.e., predicting whether or not a player will get injured in the next matches or training sessions. The first class of approaches is based on traditional techniques used in sports science and industry, such as the Acute Chronic Workload Ratio. The second class is based on machine learning tools such as decision trees and k-nearest neighbor classifiers. The third class extends the second by fully exploiting the temporal information present in the data through a multivariate time series representation of a player's workload history. We demonstrate that machine learning approaches significantly outperform traditional techniques still used in the sports industry, raising prediction accuracy from 4% to 50% and paving the way to more accurate monitoring of the health status of soccer players.
Project(s): SoBigData
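The Acute Chronic Workload Ratio used as the traditional baseline compares a player's recent load with a longer-term average. A minimal sketch of one common formulation, assuming daily load values and the widely used 7-day acute and 28-day chronic windows (the exact variant used in the thesis is not stated here):

```python
def acwr(daily_loads, acute_days=7, chronic_days=28):
    """Acute:Chronic Workload Ratio for the most recent day.

    Rolling-average form: mean load over the last `acute_days` days divided by
    the mean load over the last `chronic_days` days. Returns None when the
    history is shorter than `chronic_days` or the chronic load is zero.
    """
    if len(daily_loads) < chronic_days:
        return None
    acute = sum(daily_loads[-acute_days:]) / acute_days
    chronic = sum(daily_loads[-chronic_days:]) / chronic_days
    return acute / chronic if chronic > 0 else None


# Hypothetical daily loads: three steady weeks, then a heavier week.
loads = [300.0] * 21 + [450.0] * 7
print(round(acwr(loads), 3))  # 1.333
```

A ratio above 1 means the last week was heavier than the four-week average; a single-number rule like this is the kind of baseline the machine learning approaches are compared against.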
See at:
etd.adm.unipi.it
| CNR IRIS
2018
Other
Metadata Only Access
Capturing football-teams behavior with a stochastic model
Barbone M (candidate); supervisors: Paolo Ferragina, Luca Pappalardo, Paolo Cintia
This thesis aims to capture the behavior of soccer teams using a stochastic approach on a graph built on top of the Wyscout dataset; Wyscout is a market-leading company in data scouting for soccer. The main contributions of the thesis are twofold. First, it proposes a stochastic representation of a soccer game via a weighted graph derived from the Wyscout dataset. Second, it analyzes every game through a stochastic model to detect how teams move the ball, how they move on the field, and the performance they achieve.
Project(s): SoBigData
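The thesis abstract only outlines the stochastic model. A common building block for this kind of analysis, used here as an assumption rather than the thesis's exact construction, is a first-order Markov chain over field zones: the weighted graph counts how often the ball moves from one zone to another, and normalizing each node's outgoing weights gives transition probabilities. Zone labels are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next zone | current zone) from ball-zone sequences.

    `sequences` is a list of possession sequences, each a list of zone labels
    in the order the ball visited them. Transition counts act as the weights
    of a directed graph; each row is normalized into probabilities.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Hypothetical possessions over three coarse zones.
sequences = [
    ["def", "mid", "att"],
    ["def", "mid", "mid", "att"],
    ["def", "def", "mid"],
]
P = transition_probabilities(sequences)
print(P["def"])  # {'mid': 0.75, 'def': 0.25}
```

Zones with no outgoing transitions in the data (here "att", where every possession ends) simply have no row.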
See at:
etd.adm.unipi.it
| CNR IRIS
2013
Conference article
Restricted
Inferring human activities from GPS tracks
Furletti B., Cintia P., Renso C., Spinsanti L.
The collection of huge amounts of tracking data, made possible by the widespread use of GPS devices, has enabled the analysis of such data in several application domains, ranging from traffic management to advertising and social studies. However, raw positioning data, as detected by GPS devices, lack semantic information, since these data do not natively provide any additional contextual information such as the places people visited or the activities they performed. Traditionally, this information is collected through hand-filled questionnaires in which a limited number of users are asked to annotate their tracks with the activities they performed. With the purpose of obtaining large amounts of semantically rich trajectories, we propose an algorithm for automatically annotating raw trajectories with the activities performed by the users. To do this, we analyse the stop points, trying to infer the Point Of Interest (POI) the user has visited. Based on the category of the POI and a probability law, we infer the activity performed. We exploit the Gravity law and the nearby POIs to infer the most probable activity performed by a user during a stop. We experimented with and evaluated the method in a real case study of car trajectories, manually annotated by users with their activities. Experimental results are encouraging and will drive our future work.
Source: UrbComp'13 - 2nd ACM SIGKDD International Workshop on Urban Computing, pp. 5–8, Chicago, USA, 11-14 August 2013
DOI: 10.1145/2505821.2505830
Project(s): DATA SIM
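The abstract combines the category of nearby POIs with the Gravity law. A minimal sketch of that combination, assuming each POI has a category, an "attractiveness" mass and a distance from the stop, and using the classic mass/distance² form; the masses, the category-to-activity mapping and the exact law used in the paper are assumptions:

```python
from collections import defaultdict


def activity_probabilities(nearby_pois, activity_of):
    """Probability of each activity at a stop, from the POIs around it.

    `nearby_pois` is a list of (category, mass, distance_m); `activity_of`
    maps a POI category to an activity. Each POI attracts the stop with an
    assumed gravity-like force mass / distance**2 (distance floored at 1 m);
    forces are summed per activity and normalized to sum to 1.
    """
    score = defaultdict(float)
    for category, mass, dist in nearby_pois:
        score[activity_of[category]] += mass / max(dist, 1.0) ** 2
    total = sum(score.values())
    return {activity: s / total for activity, s in score.items()} if total else {}


# Hypothetical stop with three POIs nearby.
pois = [("restaurant", 1.0, 20.0), ("supermarket", 2.0, 100.0), ("cafe", 1.0, 40.0)]
activity_of = {"restaurant": "eating", "cafe": "eating", "supermarket": "shopping"}
probs = activity_probabilities(pois, activity_of)
print(max(probs, key=probs.get))  # eating
```

Even though the supermarket has twice the mass, its larger distance makes the two nearby food places dominate, which is the intuition behind using a gravity law rather than the single closest POI.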
See at:
dl.acm.org
| doi.org
| CNR ExploRA
2014
Conference article
Open Access
Mining efficient training patterns of non-professional cyclists (Discussion Paper)
Cintia P, Pappalardo L, Pedreschi D
The recent emergence of so-called online social fitness platforms opens up new scenarios for fascinating challenges in the field of data science. Through these platforms, users can collect, monitor and share with friends their sport performance, with interesting details about heart rate, power output in watts, and calories burned. The availability of this data, collected from a large number of users, gives us the possibility to explore new data mining applications. In this work, we present the results of a study conducted on a sample of 29,284 cyclists downloaded via APIs from the social fitness platform Strava.com. We defined two basic metrics: a measure of training effort, i.e. how much a cyclist struggled during the workout, and a measure of training performance, indicating the results achieved during the training. Although the average effort is only weakly correlated with the average performance, investigating in depth the time evolution of workouts and the cyclists' training characteristics yielded interesting findings. We found that the athletes who improve their performance the most follow precise training patterns, usually referred to as overcompensation theory, with an alternation of stress peaks and rest periods. Until now, studies and experiments related to this theory have always been conducted by sports doctors on a few dozen professional athletes. To the best of our knowledge, our study is the first large-scale corroboration of this theory.
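The training pattern credited in this abstract, an alternation of stress peaks and rest periods, can be made concrete with a simple labeling of an athlete's workout series. A minimal sketch, assuming one effort value per workout and a hypothetical rule that calls a workout a peak or a rest when it deviates from the athlete's own mean by more than k standard deviations (the criteria used in the paper are not given here):

```python
from statistics import mean, pstdev


def label_workouts(efforts, k=1.0):
    """Label workouts 'peak', 'rest' or 'normal' relative to the athlete's own mean.

    A workout is a peak if its effort exceeds mean + k*std and a rest if it is
    below mean - k*std; k is an illustrative choice.
    """
    mu, sd = mean(efforts), pstdev(efforts)
    labels = []
    for effort in efforts:
        if effort > mu + k * sd:
            labels.append("peak")
        elif effort < mu - k * sd:
            labels.append("rest")
        else:
            labels.append("normal")
    return labels


def peak_rest_alternations(labels):
    """Count peak -> rest transitions, skipping 'normal' workouts in between."""
    extremes = [label for label in labels if label != "normal"]
    return sum(1 for a, b in zip(extremes, extremes[1:]) if a == "peak" and b == "rest")


# Hypothetical effort values for one cyclist's consecutive workouts.
efforts = [100, 180, 40, 110, 190, 30, 100]
labels = label_workouts(efforts)
print(labels)
print(peak_rest_alternations(labels))  # 2
```

Normalizing against each athlete's own history, rather than a global threshold, matches the observation that effort levels differ widely between cyclists.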
See at:
CNR IRIS
| toc.proceedings.com
2015
Conference article
Open Access
A network-based approach to evaluate the performance of football teams
Cintia P, Pappalardo L, Rinzivillo S
The striking proliferation of sensing technologies that provide high-fidelity data streams extracted from every game has driven a remarkable evolution of football statistics. Nowadays, professional statistical analysis firms such as ProZone and Opta provide data to football clubs, coaches and leagues, who are starting to analyze these data to monitor their players and improve team strategies. Standard approaches to evaluating and predicting team performance are based on history-related factors such as past victories or defeats, record in qualification games and margin of victory in past games. In contrast with traditional models, in this paper we propose a model based on the observation of players' behavior on the pitch. We model the game of a team as a network and extract simple network measures, showing the value of our approach in predicting the outcomes of a long-running tournament such as the Italian major league.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 46-54. Porto, Portugal, 11/09/2015
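A minimal sketch of turning one team's passes into a network and reading off a few simple measures, in plain Python with no graph library. The measures shown (distinct connections, density and each player's share of pass involvement) are generic examples and not necessarily those used in the paper; passes are assumed to be available as (passer, receiver) pairs:

```python
from collections import Counter


def pass_network_measures(passes):
    """Simple measures on a team's directed, weighted pass network.

    `passes` is a list of (passer, receiver) pairs from one match.
    """
    edges = Counter(passes)                  # (passer, receiver) -> number of passes
    players = {p for edge in edges for p in edge}
    n = len(players)
    possible = n * (n - 1)                   # directed edges, no self-loops
    density = len(edges) / possible if possible else 0.0
    involvement = Counter()
    for (src, dst), weight in edges.items():
        involvement[src] += weight
        involvement[dst] += weight
    total = sum(edges.values())
    # Each pass involves two players, so shares over 2 * total sum to 1.
    share = {p: involvement[p] / (2 * total) for p in players}
    return {"edges": len(edges), "density": density, "share": share}


passes = [("A", "B"), ("B", "C"), ("A", "B"), ("C", "A")]
m = pass_network_measures(passes)
print(m["edges"], m["density"])  # 3 0.5
```

Measures like these are computed per match and per team, so they can be compared between opponents without any reference to past results.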
See at:
ceur-ws.org
| CNR IRIS
2015
Conference article
Restricted
The harsh rule of the goals: Data-driven performance indicators for football teams
Cintia P, Pappalardo L, Pedreschi D, Giannotti F, Malvaldi M
Sports analytics in general, and football (soccer in the USA) analytics in particular, have evolved remarkably in recent years, thanks to automated or semi-automated sensing technologies that provide high-fidelity data streams extracted from every game. In this paper, we propose a data-driven approach and show that there is a large potential to boost the understanding of football team performance. From observational data of football games, we extract a set of pass-based performance indicators and summarize them in the H indicator. We observe a strong correlation between the proposed indicator and the success of a team, and therefore perform a simulation on the four major European championships (78 teams, almost 1500 games). The outcome of each game in the championship was replaced by a synthetic outcome (win, loss or draw) based on the performance indicators computed for each team. We found that the final rankings in the simulated championships are very close to the actual rankings in the real championships, and that teams with a high ranking error show extreme values of a defense/attack efficiency measure, the Pezzali score. Our results are surprising given the simplicity of the proposed indicators, suggesting that a complex systems view of football data has the potential to reveal hidden patterns and behavior of superior quality.
DOI: 10.1109/dsaa.2015.7344823
Project(s): CIMPLEX
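The simulation step replaces each real result with one derived from the teams' indicators. A minimal sketch under stated assumptions: each team is summarized by one indicator value standing in for H, a hypothetical `draw_margin` decides when the difference is too small to call a winner, and points follow the usual 3/1/0 scheme. The paper's actual decision rule is not reproduced here.

```python
from collections import Counter


def simulate_outcome(h_home, h_away, draw_margin=0.05):
    """Synthetic result from two indicator values: 'H' (home win), 'A' (away win) or 'D'."""
    if abs(h_home - h_away) <= draw_margin:
        return "D"
    return "H" if h_home > h_away else "A"


def simulated_table(fixtures, indicator, draw_margin=0.05):
    """Final ranking from synthetic outcomes; 3 points for a win, 1 for a draw."""
    points = Counter({team: 0 for game in fixtures for team in game})
    for home, away in fixtures:
        result = simulate_outcome(indicator[home], indicator[away], draw_margin)
        if result == "H":
            points[home] += 3
        elif result == "A":
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return sorted(points, key=lambda team: (-points[team], team))


# Hypothetical three-team round robin with indicator values per team.
indicator = {"X": 0.9, "Y": 0.6, "Z": 0.62}
fixtures = [("X", "Y"), ("Y", "Z"), ("Z", "X"), ("Y", "X"), ("Z", "Y"), ("X", "Z")]
print(simulated_table(fixtures, indicator))  # ['X', 'Y', 'Z']
```

Comparing such a simulated ranking with the real one, team by team, gives the ranking error discussed in the abstract.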
See at:
doi.org
| CNR IRIS
| ieeexplore.ieee.org
2017
Journal article
Open Access
Discovering and understanding city events with big data: the case of Rome
Furletti B, Trasarti R, Cintia P, Gabrielli L
The increasing availability of large amounts of data and digital footprints has given rise to ambitious research challenges in many fields, spanning medical research, the financial and commercial world, and people and environmental monitoring. Whereas traditional data sources and censuses fail to capture actual and up-to-date behaviors, Big Data fill in the missing knowledge, providing useful and hidden information to analysts and decision makers. In this paper, we focus on the identification of city events by analyzing mobile phone data (Call Detail Records), and we study and evaluate the impact of these events on typical city dynamics. We present an analytical process able to discover, understand and characterize city events from Call Detail Records, designing a distributed computation to implement Sociometer, a profiling tool that categorizes phone users. The methodology provides a useful tool for city mobility managers to manage events and make future decisions on specific classes of users, i.e., residents, commuters and tourists.
Source: INFORMATION, vol. 8 (issue 3)
DOI: 10.3390/info8030074
See at:
Information
| CNR IRIS
| ISTI Repository
| www.mdpi.com
2020
Other
Metadata Only Access
A Computer Vision Approach for Pass Detection on Soccer Broadcast Video
Sorano D., Pappalardo L., Cintia P., Carrara F.
The annotation of the events that occur during a soccer match is a primary issue for companies that produce data for analytical purposes. Nowadays, annotation is mostly manual, i.e., human operators use proprietary software to annotate the events. This thesis aims to automate part of the annotation process with a computer vision approach that can recognize one of the most frequent events in soccer: passes. To achieve this purpose, we combine soccer broadcast videos and event data. Broadcast videos are the input of the models, while the event data define the labels of the videos. We propose a model that combines the pre-trained ResNet18 model, applied to extract features from single frames, with a Bidirectional LSTM model that analyzes the temporal evolution of the extracted features. Moreover, we use the real-time object detection method YOLO to extract the positional information of the ball and the players inside each frame. This information is concatenated to the features extracted by the ResNet18 model and used as input to the bidirectional LSTM. Our results show a significant improvement in pass detection accuracy with respect to baseline classifiers applied to the same task, indicating that our approach is a first step towards the automation of event annotation in soccer.
Project(s): SoBigData
See at:
etd.adm.unipi.it
| CNR IRIS