13 result(s)
2025 Conference article Open Access
Stress-testing machine generated text detection: shifting language models writing style to fool detectors
Pedrotti A., Papucci M., Ciaccio C., Miaschi A., Puccetti G., Dell'Orletta F., Esuli A.
Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we evaluate the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. We develop a pipeline that fine-tunes language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT), obtaining generations that are more challenging for current models to detect. Additionally, we analyze the linguistic shifts induced by the alignment and how detectors rely on “linguistic shortcuts” to detect texts. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detection performance. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts. We release code, models, and data to support future research on more robust MGT detection benchmarks.
DOI: 10.18653/v1/2025.findings-acl.156
Project(s): SoBigData via OpenAIRE

See at: aclanthology.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2025 Other Open Access
ISTI-day 2025 Proceedings
Del Corso G., Pedrotti A., Federico G., Gennaro C., Carrara F., Amato G., Di Benedetto M., Gabrielli E., Belli D., Matrullo Z., Miori V., Tolomei G., Waheed T., Marchetti E., Calabrò A., Rossetti G., Stella M., Cazabet R., Abramski K., Cau E., Citraro S., Failla A., Mesina V., Morini V., Pansanella V., Colantonio S., Germanese D., Pascali M. A., Bianchi L., Messina N., Falchi F., Barsellotti L., Pacini G., Cassese M., Puccetti G., Esuli A., Volpi L., Moreo A., Sebastiani F., Sperduti G., Nguyen D., Broccia G., Ter Beek M. H., Ferrari A., Massink M., Belmonte G., Ciancia V., Papini O., Canapa G., Catricalà B., Manca M., Paternò F., Santoro C., Zedda E., Gallo S., Maenza S., Mattioli A., Simeoli L., Rucci D., Carlini E., Dazzi P., Kavalionak H., Mordacchini M., Rulli C., Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Lettich F., Renso C., Pugliese C., Casini G., Haldimann J., Meyer T., Assante M., Candela L., Dell'Amico A., Frosini L., Mangiacrapa F., Oliviero A., Pagano P., Panichi G., Peccerillo B., Procaccini M., Mannocci A., Manghi P., Lonetti F., Kang D., Di Giandomenico F., Jee E., Lazzini G., Conti F., Scopigno R., D'Acunto M., Moroni D., Cafiso M., Paradisi P., Callieri M., Pavoni G., Corsini M., De Falco A., Sala F., Saraceni Q., Gattiglia G.
ISTI-Day is an annual information and networking event organized by the Institute of Information Science and Technologies "A. Faedo" (ISTI) of the Italian National Research Council (CNR). This event features an opening talk by the Director of the Dept. DIITET (Emilio F. Campana) as well as an overview of the Institute's activities presented by the ISTI Director (Roberto Scopigno). Those institutional segments are complemented by dedicated presentations and round tables featuring former staff members, as well as internal and external collaborators. To foster a network of knowledge and collaboration among newcomers, the 2025 ISTI-Day edition also includes a large poster session that provides a comprehensive overview of current research activities. Each of the 13 laboratories contributes 1–3 posters, highlighting the most innovative work and offering early-career researchers a platform for discussion. These proceedings therefore include the posters selected for ISTI-Day 2025, reflecting the diverse and innovative nature of the Institute's research.

See at: CNR IRIS Open Access | www.isti.cnr.it Open Access | CNR IRIS Restricted


2025 Conference article Open Access
How humans and LLMs organize conceptual knowledge: exploring subordinate categories in Italian
Pedrotti A., Rambelli G., Villani C., Bolognesi M.
People can categorize the same entity at multiple taxonomic levels, such as basic (bear), superordinate (animal), and subordinate (grizzly bear). While prior research has focused on basic-level categories, this study is the first attempt to examine the organization of categories by analyzing exemplars produced at the subordinate level. We present a new Italian psycholinguistic dataset of human-generated exemplars for 187 concrete words. We then leverage these data to evaluate whether textual and vision LLMs produce meaningful exemplars that align with human category organization across three key tasks: exemplar generation, category induction, and typicality judgment. Our findings show a low alignment between humans and LLMs, consistent with previous studies. However, their performance varies notably across different semantic domains. Ultimately, this study highlights both the promises and the constraints of using AI-generated exemplars to support psychological and linguistic research.
DOI: 10.18653/v1/2025.acl-long.224
Project(s): SoBigData via OpenAIRE

See at: aclanthology.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Conference article Open Access
Multimodal heterogeneous transfer learning for multilingual image-text classification
Pedrotti A., Moreo Fernandez A., Sebastiani F.
The Multilingual Image-Text Classification (MITC) task is a specific instance of the Image-Text Classification (ITC) task, where each item to be classified consists of a visual representation and a textual description written in one of several possible languages. In this paper we propose MM-gFun, an extension of the gFun learning architecture originally developed for cross-lingual text classification. We extend its original text-only implementation to handle perceptual modalities.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3928. Pisa, Italy, 14-16/10/2024
Project(s): SoBigData via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Other Open Access
AIMH Research Activities 2024
Aloia N., Amato G., Bartalesi Lenzi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Cassese M., Ciampi L., Coccomini D. A., Concordia C., Connor R., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Iannello L., Negi K., Lagani G., Lenzi E., Leocata M., Malvaldi M., Meghini C., Messina N., Moreo Fernandez A., Nardi A., Pacini G., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Scotti F., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L., Volpi L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is committed to advancing the field of Artificial Intelligence, with a special emphasis on its applications in digital media and the humanities. The lab aims to improve AI technologies, particularly in areas such as deep learning, text analysis, computer vision, multimedia information retrieval, content analysis, recognition, and retrieval. This report summarizes the laboratory’s achievements and activities over the course of 2024.
DOI: 10.32079/isti-ar-2024/001

See at: CNR IRIS Open Access | CNR IRIS Restricted


2024 Conference article Open Access
ViLMA: a zero-shot benchmark for linguistic and temporal grounding in video-language models
Kesen I., Pedrotti A., Dogan M., Cafagna M., Can Acikgoz E., Parcalabescu L., Calixto I., Frank A., Gatt A., Erdem A., Erdem E.
With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm footing. Task-based evaluations, while valuable, fail to capture the complexities and specific temporal aspects of moving images that VidLMs need to process. Through carefully curated counterfactuals, ViLMA offers a controlled evaluation suite that sheds light on the true potential of these models, as well as their performance gaps compared to human-level understanding. ViLMA also includes proficiency tests, which assess basic capabilities deemed essential to solving the main counterfactual tests. We show that current VidLMs' grounding abilities are no better than those of vision-language models which use static images. This is especially striking once the performance on proficiency tests is factored in. Our benchmark serves as a catalyst for future research on VidLMs, helping to highlight areas that still need to be explored.
Project(s): AI4Media via OpenAIRE

See at: CNR IRIS Open Access | openreview.net Open Access | CNR IRIS Restricted


2023 Other Open Access
AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi Lenzi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. The lab's objective is to enhance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
DOI: 10.32079/isti-ar-2023/001

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Journal article Open Access
Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification
Moreo A., Pedrotti A., Sebastiani F.
Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLTC systems in which these correlations cannot be brought to bear. In this paper we describe Generalized Funnelling (gFun), a generalisation of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the (monolingual) document. We describe an instance of gFun in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated to other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings), word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings), and word-context correlations (as encoded by multilingual BERT). We show that this instance of gFun substantially improves over Fun and over state-of-the-art baselines, by reporting experimental results obtained on two large, standard datasets for multilingual multilabel text classification. Our code that implements gFun is publicly available.
Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 41 (issue 2)
DOI: 10.1145/3544104
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted | CNR IRIS Restricted


2022 Other Open Access
AIMH research activities 2022
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A. D., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2022 activities of the research group.
DOI: 10.32079/isti-ar-2022/002

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2021 Conference article Open Access
Heterogeneous document embeddings for cross-lingual text classification
Moreo A., Pedrotti A., Sebastiani F.
Funnelling (Fun) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In Fun, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLC systems where these correlations cannot be leveraged. We here describe Generalized Funnelling (gFun), a learning ensemble where the meta-classifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show that gFun improves on Fun by describing experiments on two large, standard multilingual datasets for multi-label text classification.
DOI: 10.1145/3412841.3442093
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | ZENODO Open Access | dl.acm.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2021 Conference article Open Access
Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification
Moreo A., Pedrotti A., Sebastiani F.
Funnelling (Fun) is a method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input. In this paper we describe Generalized Funnelling (gFun), a generalization of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the document. We describe an instance of gFun in which the metaclassifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated to other embedded representations that embody other types of correlations. We describe preliminary results that we have obtained on a large standard dataset for multilingual multilabel text classification.
Source: IIR 2021 - 11th Italian Information Retrieval Workshop, Bari, Italy, 13-15/09/21

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2021 Other Open Access
AIMH research activities 2021
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Coccomini D., Ciampi L., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2021 activities of the research group.
Source: ISTI Annual Report, ISTI-2021-AR/003, pp.1–34, 2021
DOI: 10.32079/isti-ar-2021/003

See at: ISTI Repository Open Access | CNR ExploRA


2020 Other Open Access
AIMH research activities 2020
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.
DOI: 10.32079/isti-ar-2020/001

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted