2018
Journal article  Open Access

Quantifying the relation between performance and success in soccer

Pappalardo L., Cintia P.

Machine Learning (stat.ML); Statistics - Machine Learning; Data science; Sports analytics; Predictive analytics; Complex systems; J.3; Control and Systems Engineering; FOS: Computer and information sciences; Sports science; Statistics - Applications; H.2.8; Applications (stat.AP)

The availability of massive data about sports activities nowadays offers the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6,000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team's position in a competition's final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover, we find that, while victories and defeats can be explained by the team's performance during a game, it is difficult to detect draws using a machine learning approach. We then simulate the outcomes of an entire season of each league, relying only on technical data and exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking that is similar to the actual ranking, suggesting that a complex systems view of soccer has the potential to reveal hidden patterns regarding the relation between performance and success.
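Once every match of a season has a predicted outcome, turning those predictions into a simulated final ranking is simple bookkeeping. The sketch below illustrates that step only; it is not the authors' code. It assumes outcomes are labelled 'H' (home win), 'D' (draw) and 'A' (away win), assumes the standard 3/1/0 point scheme, breaks ties on points alphabetically (a hypothetical rule, since real leagues use goal difference and other criteria), and measures how "similar" the simulated and actual rankings are with Spearman's rank correlation, which is our choice rather than a metric stated in the abstract. The functions `simulate_ranking` and `spearman_rho`, the team names, and the predictions are all hypothetical.

```python
from collections import defaultdict

# Assumed point scheme: 3 for a win, 1 for a draw, 0 for a loss.
# Each label maps to (home points, away points).
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}


def simulate_ranking(matches, predicted):
    """Build a final ranking from predicted match outcomes (hypothetical helper).

    matches:   list of (home_team, away_team) pairs
    predicted: list of labels 'H', 'D' or 'A', one per match, e.g. the
               output of a classifier trained on past seasons
    Returns team names ordered from first to last place.
    """
    points = defaultdict(int)
    for (home, away), outcome in zip(matches, predicted):
        home_pts, away_pts = POINTS[outcome]
        points[home] += home_pts  # adding 0 still registers the team
        points[away] += away_pts
    # Ties on points are broken alphabetically only for determinism.
    return sorted(points, key=lambda team: (-points[team], team))


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two tie-free rankings of the same teams."""
    pos_b = {team: i for i, team in enumerate(ranking_b)}
    n = len(ranking_a)
    d2 = sum((i - pos_b[team]) ** 2 for i, team in enumerate(ranking_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy single round-robin among four hypothetical teams.
matches = [("Red", "Blue"), ("Green", "Gold"), ("Red", "Green"),
           ("Blue", "Gold"), ("Red", "Gold"), ("Blue", "Green")]
predicted = ["H", "D", "H", "A", "D", "H"]

simulated = simulate_ranking(matches, predicted)
print(simulated)  # ['Red', 'Gold', 'Blue', 'Green']
print(spearman_rho(simulated, ["Red", "Blue", "Gold", "Green"]))  # 0.8
```

A correlation of 1 means the simulated ranking matches the actual one exactly and -1 means it is fully reversed; in this toy example only two adjacent teams swap places, giving 0.8.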

Source: Advances in Complex Systems 21 (2018). doi:10.1142/S021952591750014X

Publisher: World Scientific Publishing, Singapore, Singapore



BibTeX entry
@article{oai:it.cnr:prodotti:385725,
	title = {Quantifying the relation between performance and success in soccer},
	author = {Pappalardo L. and Cintia P.},
	publisher = {World Scientific Publishing, Singapore, Singapore},
	doi = {10.1142/S021952591750014X},
	eprint = {1705.00885},
	archivePrefix = {arXiv},
	journal = {Advances in Complex Systems},
	volume = {21},
	year = {2018}
}

SoBigData
SoBigData Research Infrastructure

