2024
Conference article  Open Access

INVALSI - mathematical and language understanding in Italian: a CALAMITA challenge

Puccetti G., Cassese M., Esuli A.

Invalsi  Italian language models  Language understanding  Large Language Models  Mathematical understanding  Math  Benchmark 

While Italian is a high-resource language, there are few Italian-native benchmarks for evaluating the generative abilities of Language Models (LMs) in this language. This work presents two new benchmarks: Invalsi MATE, which evaluates models' performance on mathematical understanding in Italian, and Invalsi ITA, which evaluates language understanding in Italian. These benchmarks are based on the Invalsi tests, which are administered to students aged between 6 and 18 within the Italian school system. The tests are prepared by expert pedagogists and have the explicit goal of testing average students' performance over time across Italy. Therefore, the questions are well written, appropriate for the age of the students, and developed with the goal of assessing skills that are essential in the learning process, ensuring that the benchmark proposed here measures key knowledge for undergraduate students. Invalsi MATE is composed of 420 questions about mathematical understanding, ranging from simple money-counting problems to Cartesian geometry questions, e.g. determining whether a point belongs to a given line. They are divided into 4 types: scelta multipla (multiple choice), vero/falso (true/false), numero (number), and completa frase (fill the gap). Invalsi ITA is composed of 1279 questions about language understanding, covering both the ability to extract information and answer questions about a text passage and grammatical knowledge. They are divided into 4 types: scelta multipla (multiple choice), binaria (binary), domanda aperta (open question), and altro (other). We evaluate 4 powerful language models, both English-first and tuned for Italian, and find that the best accuracy on Invalsi MATE is 55%, while the best accuracy on Invalsi ITA is 80%.

Source: CEUR WORKSHOP PROCEEDINGS, vol. 3878. Pisa, Italy, 4-6/12/2024



BibTeX entry
@inproceedings{oai:iris.cnr.it:20.500.14243/528783,
	title = {INVALSI - mathematical and language understanding in Italian: a CALAMITA challenge},
	author = {Puccetti G. and Cassese M. and Esuli A.},
	booktitle = {CEUR WORKSHOP PROCEEDINGS, vol. 3878. Pisa, Italy, 4-6/12/2024},
	year = {2024}
}