Document - GSM-Identity: evaluating mathematical reasoning in LLMs via equivalence transformations

2026

Journal article Open Access

Negi Kajal, Puccetti Giovanni, Esuli Andrea

Large language models; Mathematical understanding; Reasoning; AI and human

We introduce GSM-Identity, a pipeline that modifies existing mathematical reasoning benchmarks by adding complexity to the questions while preserving their fundamental meaning. By systematically transforming numerical values in the GSM8K dataset into mathematically equivalent but less obvious expressions, we create a benchmark that measures the mathematical understanding of Large Language Models (LLMs). We evaluate LLMs ranging from 7 billion to 72 billion parameters using multiple prompting strategies, including standard, notice-based, and chain-of-thought approaches. We find that math-oriented models retain most of their GSM8K performance when evaluated on GSM-Identity, while general-purpose models show significant performance degradation. A comparison with human evaluations reveals that models in the 7-billion-parameter range perform similarly to humans when exposed to the kind of modifications we study, while models with more than 70 billion parameters answer the questions more accurately than humans and are also more resilient to the modifications. Our findings highlight GSM-Identity as a valuable tool for distinguishing reasoning from memorization, offering insights into the ability of LLMs to understand higher-level mathematical concepts.
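To make the idea of an equivalence transformation concrete, here is a minimal sketch in Python. It is an illustration only: the abstract does not specify which transformations GSM-Identity applies, so the identity used here (n = (n + k) - k), the function names, and the sample question are all assumptions, not the authors' pipeline.

```python
import re


def to_equivalent_expression(n: int, offset: int = 3) -> str:
    """Rewrite integer n as the equivalent expression "(n + offset - offset)".

    Hypothetical identity chosen for illustration; the transformations used
    by the paper are not described in the abstract.
    """
    return f"({n + offset} - {offset})"


def evaluate_expression(expr: str) -> int:
    """Evaluate an expression produced by to_equivalent_expression."""
    a, b = re.fullmatch(r"\((\d+) - (\d+)\)", expr).groups()
    return int(a) - int(b)


def transform_question(question: str, offset: int = 3) -> str:
    """Replace every standalone integer in the question with an equivalent expression.

    Assumption: the question contains only plain integers; decimals and
    numbers with thousands separators are not handled by this sketch.
    """
    return re.sub(
        r"\b\d+\b",
        lambda m: to_equivalent_expression(int(m.group()), offset),
        question,
    )


# Made-up sample question in the style of a grade-school word problem.
question = "Tom has 12 apples and buys 5 more. How many apples does he have?"
transformed = transform_question(question)
print(transformed)
```

The transformed question has the same answer as the original, since every rewritten value evaluates back to the number it replaced; only the surface form of the numbers changes.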

Source: MACHINE LEARNING, vol. 115 (issue 4)

Cite as

BibTeX entry

@article{oai:iris.cnr.it:20.500.14243/575001,
	title = {GSM-Identity: evaluating mathematical reasoning in LLMs via equivalence transformations},
	author = {Negi, Kajal and Puccetti, Giovanni and Esuli, Andrea},
	journal = {Machine Learning},
	volume = {115},
	number = {4},
	doi = {10.1007/s10994-026-07029-7},
	year = {2026}
}

CNR authors and affiliations

CNR authors

Esuli, Andrea
0000-0002-5725-4322
Puccetti, Giovanni

Laboratories

Artificial Intelligence for Media and Humanities (2021-ongoing)

DOI

10.1007/s10994-026-07029-7

Also available from

link.springer.com

Projects

Future Artificial Intelligence Research
Italian Strengthening of ESFRI RI RESILIENCE
Word EMBeddings: From Cognitive Linguistics to Language Engineering, and Back