7 result(s)

2021 Conference article Open Access
Garbled-word embeddings for jumbled text
Sperduti G, Moreo A, Sebastiani F
"Aoccdrnig to a reasrech at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny itmopnrat tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe". We investigate the extent to which this phenomenon applies to computers as well. Our hypothesis is that computers are able to learn distributed word representations that are resilient to character reshuffling, without incurring a significant loss in performance in tasks that use these representations. If our hypothesis is confirmed, this may form the basis for a new and more efficient way of encoding character-based representations of text in deep learning, and one that may prove especially robust to misspellings, or to corruption of text due to OCR. This paper discusses some fundamental psycho-linguistic aspects that lie at the basis of the phenomenon we investigate, and reports on a preliminary proof of concept of the above idea.Source: CEUR WORKSHOP PROCEEDINGS. Bari, Italy, 13-15/09/21

See at: ceur-ws.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted
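The perturbation this paper studies (interior letters of each word reshuffled, first and last letters kept in place) is easy to reproduce. The following Python sketch is only an illustration of that jumbling scheme, with function names of our own choosing; it is not the authors' preprocessing code and ignores punctuation for simplicity.

import random

def jumble_word(word, rng):
    """Shuffle the interior characters of a word, keeping the first and
    last characters fixed; words of up to three characters are unchanged."""
    if len(word) <= 3:
        return word
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]

def jumble_text(text, seed=0):
    """Apply interior-character shuffling to every whitespace-separated token."""
    rng = random.Random(seed)
    return " ".join(jumble_word(token, rng) for token in text.split())

print(jumble_text("According to a research at Cambridge University"))
# e.g. "Anodccrig to a rsreaech at Cgmbadire Unviesrity" (exact output depends on the seed)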


2022 Conference article Open Access
A detailed overview of LeQua 2022: learning to quantify
Esuli A, Moreo A, Sebastiani F, Sperduti G
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest Y = {y1, ..., yn} in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting; this is the first time that an evaluation exercise solely dedicated to quantification is organized. For both the binary setting and the single-label multiclass setting, data were provided to participants both in ready-made vector form and in raw document form. In this overview article we describe the structure of the lab, we report the results obtained by the participants in the four proposed tasks and subtasks, and we comment on the lessons that can be learned from these results.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 1849-1868. Bologna, Italy, 5-8/9/2022
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted
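The "classify and count" baseline that this overview calls suboptimal fits in a few lines. The sketch below is a generic illustration built on scikit-learn, not the LeQua baseline code; the function name and model choice are our own.

from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def classify_and_count(train_docs, train_labels, test_docs):
    """Estimate class prevalences in an unlabelled test set by classifying
    every document and counting the predicted labels (the CC baseline)."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_docs)
    X_test = vectorizer.transform(test_docs)
    classifier = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    predicted = classifier.predict(X_test)
    counts = Counter(predicted)
    return {label: counts.get(label, 0) / len(test_docs)
            for label in sorted(set(train_labels))}

Quantification methods proper typically replace the final counting step with an estimator that corrects for the classifier's bias, which is what the better methods mentioned in the abstract aim to do.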


2022 Conference article Open Access
A concise overview of LeQua@CLEF 2022: Learning to Quantify
Esuli A, Moreo A, Sebastiani F, Sperduti G
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest Y = {y1, ..., yn} in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting; this is the first time that an evaluation exercise solely dedicated to quantification is organized. For both the binary setting and the single-label multiclass setting, data were provided to participants both in ready-made vector form and in raw document form. In this overview article we describe the structure of the lab, we report the results obtained by the participants in the four proposed tasks and subtasks, and we comment on the lessons that can be learned from these results.
DOI: 10.1007/978-3-031-13643-6_23
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: CNR IRIS Open Access | link.springer.com Open Access | ISTI Repository Open Access | CNR IRIS Restricted
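The overview reports the participants' results across tasks; quantification systems are usually scored by comparing the predicted class-prevalence vector against the true one. The functions below sketch two measures commonly used for this purpose, absolute error and relative absolute error; they are a generic illustration, not the official LeQua evaluation script.

import numpy as np

def absolute_error(true_prev, pred_prev):
    """Mean absolute difference between true and predicted class prevalences."""
    true_prev, pred_prev = np.asarray(true_prev), np.asarray(pred_prev)
    return float(np.mean(np.abs(true_prev - pred_prev)))

def relative_absolute_error(true_prev, pred_prev, eps=1e-12):
    """Absolute error with each class weighted by the inverse of its true
    prevalence, so that errors on rare classes weigh more."""
    true_prev, pred_prev = np.asarray(true_prev), np.asarray(pred_prev)
    return float(np.mean(np.abs(true_prev - pred_prev) / (true_prev + eps)))

# Example: three classes, predictions slightly off.
print(absolute_error([0.5, 0.3, 0.2], [0.45, 0.35, 0.20]))  # 0.0333...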


2024 Conference article Open Access
An overview of LeQua 2024, the 2nd international data challenge on Learning to Quantify
Esuli A., Moreo Fernandez A., Sebastiani F., Sperduti G.
LeQua 2024 is a data challenge about methods and systems for “learning to quantify” (a.k.a. “quantification”, or “class prior estimation”), i.e., for training predictors of the relative frequencies of classes Y = {y1, ..., yn} in sets of unlabelled datapoints. While these predictions could be easily achieved by first classifying all datapoints via a classifier and then counting how many datapoints have been assigned to each class, a growing body of literature has shown this approach to be suboptimal, especially when the training data and the test data are affected by some form of dataset shift, and has proposed better methods. The goal of this data challenge is to provide a setting for the comparative evaluation of methods for learning to quantify. LeQua 2024 is the 2nd edition of the LeQua challenge, following the successful 1st edition of 2022. In LeQua 2024, four tasks were offered. The first three tasks (T1, T2, T3) tackle learning to quantify under prior probability shift, while the fourth task (T4) tackles learning to quantify under covariate shift; T1 and T4 are about binary quantification, T2 is about single-label multiclass quantification, while T3 is about ordinal quantification. For all such tasks, data are provided to participants in ready-made vector form. In this overview article we describe in detail the structure of the data challenge and the results obtained by the participating teams.
Project(s): Future Artificial Intelligence Research, Quantification in the Context of Dataset Shift, SoBigData-PlusPlus via OpenAIRE, SoBigData.it

See at: CNR IRIS Open Access | lq-2024.github.io Open Access | CNR IRIS Restricted
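As the abstract notes, plain classify and count degrades under dataset shift. A classical correction for the prior-probability-shift case is "adjusted classify and count" (ACC), which rescales the raw estimate using the classifier's true and false positive rates measured on held-out data. The binary sketch below is a generic textbook version under our own naming, not the LeQua 2024 baseline implementation.

import numpy as np

def adjusted_classify_and_count(classifier, X_val, y_val, X_test):
    """Binary ACC: correct the raw classify-and-count estimate with the
    classifier's true/false positive rates, estimated on validation data.
    Assumes binary labels {0, 1} and that both classes occur in y_val."""
    val_pred = np.asarray(classifier.predict(X_val))
    y_val = np.asarray(y_val)
    tpr = np.mean(val_pred[y_val == 1] == 1)  # true positive rate
    fpr = np.mean(val_pred[y_val == 0] == 1)  # false positive rate
    cc = np.mean(np.asarray(classifier.predict(X_test)) == 1)  # raw CC estimate
    if tpr == fpr:
        return float(cc)  # correction undefined; fall back to the raw estimate
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))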


2021 Other Open Access
AIMH research activities 2021
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Coccomini D., Ciampi L., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2021 activities of the research group.
DOI: 10.32079/isti-ar-2021/003

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
AIMH research activities 2022
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A. D., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2022 activities of the research group.
DOI: 10.32079/isti-ar-2022/002

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2023 Other Open Access
AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi Lenzi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. The lab's objective is to enhance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
DOI: 10.32079/isti-ar-2023/001

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted