Page 1 of 1

2024 Conference article Open Access

ViLMA: a zero-shot benchmark for linguistic and temporal grounding in video-language models
Kesen I., Pedrotti A., Dogan M., Cafagna M., Can Acikgoz E., Parcalabescu L., Calixto I., Frank A., Gatt A., Erdem A., Erdem E.
With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm footing. Task-based evaluations, while valuable, fail to capture the complexities and specific temporal aspects of moving images that VidLMs need to process. Through carefully curated counterfactuals, ViLMA offers a controlled evaluation suite that sheds light on the true potential of these models, as well as their performance gaps compared to human-level understanding. ViLMA also includes proficiency tests, which assess basic capabilities deemed essential to solving the main counterfactual tests. We show that current VidLMs' grounding abilities are no better than those of vision-language models which use static images. This is especially striking once the performance on proficiency tests is factored in. Our benchmark serves as a catalyst for future research on VidLMs, helping to highlight areas that still need to be explored.Project(s): AI4Media via OpenAIRE

See at: CNR IRIS Open Access | openreview.net | CNR IRIS Restricted

2021 Conference article Open Access

Heterogeneous document embeddings for cross-lingual text classification
Moreo A, Pedrotti A, Sebastiani F
Funnelling (Fun) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In Fun, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The metaclassifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLC systems where these correlations cannot be leveraged. We here describe Generalized Funnelling (gFun), a learning ensemble where the metaclassifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show that gFun improves on Fun by describing experiments on two large, standard multilingual datasets for multi-label text classification.DOI: 10.1145/3412841.3442093
Project(s): AI4Media via OpenAIRE

, ARIADNEplus via OpenAIRE

, SoBigData-PlusPlus via OpenAIRE

Metrics:

2021 Conference article Open Access

See at: ceur-ws.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2022 Journal article Open Access

Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification
Moreo A, Pedrotti A, Sebastiani F
Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLTC systems in which these correlations cannot be brought to bear. In this paper we describe Generalized Funnelling (gFun), a generalisation of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the (monolingual) document. We describe an instance of gFun in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated to other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings), word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings), and word-context correlations (as encoded by multilingual BERT ). We show that this instance of gFun substantially improves over Fun and over state-of-the-art baselines, by reporting experimental results obtained on two large, standard datasets for multilingual multilabel text classification. Our code that implements gFun is publicly available.Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 41 (issue 2)
DOI: 10.1145/3544104
Project(s): AI4Media via OpenAIRE

, ARIADNEplus via OpenAIRE

, SoBigData-PlusPlus via OpenAIRE

Metrics:

See at: dl.acm.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2024 Conference article Open Access

Multimodal heterogeneous transfer learning for multilingual image-text classification
Pedrotti A., Moreo Fernandez A., Sebastiani F.
The Multilingual Image-Text Classification (MITC) task is a specific instance of the Image-Text Classification (ITC) task, where each item to be classified consists of a visual representation and a textual description written in one of several possible languages. In this paper we propose MM-gFun, an extension of the gFun learning architecture originally developed for cross-lingual text classification. We extend its original text-only implementation to handle perceptual modalities.Source: CEUR WORKSHOP PROCEEDINGS, vol. 3928. Pisa, Italy, 14-16/10/2024
Project(s): SoBigData via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS | CNR IRIS Restricted

2020 Other Open Access

AIMH research activities 2020
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.DOI: 10.32079/isti-ar-2020/001
Metrics:

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2021 Other Open Access

AIMH research activities 2021
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Coccomini D., Ciampi L., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and taking also into account issues related to scalability. This report summarize the 2021 activities of the research group.DOI: 10.32079/isti-ar-2021/003
Metrics:

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2022 Other Open Access

AIMH research activities 2022
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A. D., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and taking also into account issues related to scalability.This report summarize the 2022 activities of the research group.DOI: 10.32079/isti-ar-2022/002
Metrics:

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2023 Other Open Access

AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi Lenzi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. This lab's objective is to enhance the current state of AI technology particularly on deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.DOI: 10.32079/isti-ar-2023/001
Metrics:

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted