2023
Conference article  Open Access

Score vs. winrate in score-based games: which reward for reinforcement learning?

Pasqualini L., Parton M., Morandin F., Amato G., Gini R., Metta C., Fantozzi M., Marchetti A.

AlphaZero-like algorithms  FOS: Computer and information sciences  Artificial Intelligence (cs.AI)  Reinforcement learning  Computer Science - Artificial Intelligence  Score-based games  I.2.6 

In recent years, DeepMind's algorithm AlphaZero has become the state of the art for efficiently tackling perfect information two-player zero-sum games with a win/lose outcome. However, when the win/lose outcome is decided by a final score difference, AlphaZero may play score-suboptimal moves, because all winning final positions are equivalent from the win/lose outcome perspective. This can be an issue, for instance when the agent is used for teaching, or when trying to understand whether there is a better move. Moreover, there is the theoretical quest for the perfect game. A naive approach would be to train an AlphaZero-like agent to predict score differences instead of win/lose outcomes. Since the game of Go is deterministic, this should also produce outcome-optimal play. However, it is a folklore belief that "this does not work". In this paper we first provide empirical evidence for this belief. We then give a theoretical interpretation of this suboptimality in a general perfect information two-player zero-sum game where the complexity of a game like Go is replaced by randomness of the environment. We show that an outcome-optimal policy has a different preference for uncertainty depending on whether it is winning or losing. In particular, when in a losing state, an outcome-optimal agent chooses actions leading to a higher variance of the score. We then posit that when approximation is involved, a deterministic game behaves like a nondeterministic game, where the score variance is modeled by how uncertain the position is. We validate this hypothesis in an AlphaZero-like software with a human expert.
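The variance preference described in the abstract can be illustrated with a small sketch. It is not the paper's model: it assumes, purely for illustration, that the final score difference after an action is normally distributed, and the names `win_probability`, `best_for_outcome`, `best_for_score`, and the example actions are all hypothetical. Under that assumption the probability of winning is the probability that the score difference is positive, so an agent that maximizes winrate and an agent that maximizes expected score can disagree.

```python
import math


def win_probability(mean_score: float, score_std: float) -> float:
    """P(score difference > 0), assuming a Normal(mean_score, score_std**2) score."""
    return 0.5 * (1.0 + math.erf(mean_score / (score_std * math.sqrt(2.0))))


def best_for_outcome(actions):
    """Action chosen by an agent that maximizes the probability of winning."""
    return max(actions, key=lambda a: win_probability(a["mean"], a["std"]))


def best_for_score(actions):
    """Action chosen by an agent that maximizes the expected score difference."""
    return max(actions, key=lambda a: a["mean"])


# Hypothetical losing position: every action has a negative expected score.
losing = [
    {"name": "safe", "mean": -2.0, "std": 1.0},
    {"name": "risky", "mean": -3.0, "std": 6.0},
]

# Hypothetical winning position: the same actions with the signs flipped.
winning = [
    {"name": "safe", "mean": 2.0, "std": 1.0},
    {"name": "risky", "mean": 3.0, "std": 6.0},
]

for label, actions in (("losing", losing), ("winning", winning)):
    print(
        label,
        "outcome-optimal:", best_for_outcome(actions)["name"],
        "score-optimal:", best_for_score(actions)["name"],
    )
```

In this toy setting the winrate-maximizing agent picks the high-variance action when losing and the low-variance action when winning, while the score-maximizing agent ignores variance altogether and makes the opposite choice in both positions.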

Source: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 573–578, Nassau, Bahamas, 12-14/12/2022


BibTeX entry
@inproceedings{oai:it.cnr:prodotti:482368,
	title = {Score vs. winrate in score-based games: which reward for reinforcement learning?},
	author = {Pasqualini L. and Parton M. and Morandin F. and Amato G. and Gini R. and Metta C. and Fantozzi M. and Marchetti A.},
	doi = {10.1109/icmla55696.2022.00099},
	note = {Conference held 12-14 December 2022. Also available as a preprint, doi:10.48550/arxiv.2201.13176},
	booktitle = {2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)},
	pages = {573--578},
	address = {Nassau, Bahamas},
	year = {2023}
}