303 result(s)
2026 Conference article Open Access
ViSketch-GPT: collaborative multi-scale feature extraction for hand-drawn sketch retrieval
Federico Giulio, Carrara Fabio, Gennaro Claudio, Di Benedetto Marco
Understanding the nature of hand-drawn sketches is challenging due to the wide variation in their creation. Federico et al. [10] demonstrated that recognizing complex structural patterns enhances both sketch recognition and generation. Building on this foundation, we explore how the extracted features can also be leveraged for hand-drawn sketch retrieval. In this work, we extend ViSketch-GPT, a multi-scale context extraction model originally designed for classification and generation, to the task of retrieval. The model’s ability to capture intricate details at multiple scales allows it to learn highly discriminative representations, making it well-suited for retrieval applications. Through extensive experiments on the QuickDraw and TU-Berlin datasets, we show that ViSketch-GPT surpasses state-of-the-art methods in sketch retrieval, achieving substantial improvements across multiple evaluation metrics. Our results show that the extracted feature representations, originally designed for classification and generation, are also highly effective for retrieval tasks. This highlights ViSketch-GPT as a versatile and powerful framework for various applications in computer vision and sketch analysis.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 16134, pp. 3-13. Reykjavik, Iceland, 1–3 October 2025
DOI: 10.1007/978-3-032-06069-3_1
Project(s): Italian Strengthening of ESFRI RI RESILIENCE, SUN via OpenAIRE
See at: CNR IRIS Open Access | link.springer.com Open Access | CNR IRIS Restricted
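
At query time, the retrieval task described in this entry reduces to nearest-neighbor search over the learned embeddings. A minimal sketch of that final stage, assuming features have already been extracted by the model (the arrays below are hypothetical placeholders, not the authors' code):

    import numpy as np

    def retrieve(query_emb, gallery_embs, k=10):
        """Return indices of the k most similar gallery sketches (cosine similarity)."""
        q = query_emb / np.linalg.norm(query_emb)
        g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
        sims = g @ q                      # cosine similarity against every gallery item
        return np.argsort(-sims)[:k]      # best-first

    # Hypothetical usage: 10k gallery sketches, 512-d features from the model.
    gallery = np.random.randn(10_000, 512).astype(np.float32)
    query = np.random.randn(512).astype(np.float32)
    print(retrieve(query, gallery, k=5))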


2026 Journal article Open Access
ViSketch-GPT: a novel multi-scale and context-aware representation for sketch generation and classification
Federico Giulio, Amato Giuseppe, Carrara Fabio, Gennaro Claudio, Di Benedetto Marco
Human sketches exhibit substantial variability across individuals in terms of line style, abstraction level and drawing conventions. Unlike realistic images, they provide limited contextual information and rely on highly simplified concept representations. Recognizing and generating sketches therefore requires efficient use of the available information, identification of the most informative local features, interpretation of their meaning within a minimal context, and understanding of the spatial relationships that define the overall structure. In this study, we introduce ViSketch-GPT, a representation and model that can extract these local features, contextualize them within the sketch and encode spatial relationships, thereby enabling a deeper understanding of the sketch structure. Guided by the intuition of the void as information, we leverage Signed Distance Functions (SDF) to reveal this potentially hidden information, organizing it via quadtree decomposition and processing it with a hierarchical Transformer to capture multi-scale dependencies. This structured representation allows the model to support both high-fidelity generation and accurate classification. Experiments on the QuickDraw and TU-Berlin datasets demonstrated that the model classifies sketches with high accuracy while generating outputs that preserve structural coherence, respect part relationships, and capture essential conceptual patterns despite the scarcity of information in the original sketches.
Source: IEEE ACCESS
DOI: 10.1109/access.2026.3659732
Project(s): Italian Strengthening of ESFRI RI RESILIENCE, SUN via OpenAIRE
See at: CNR IRIS Open Access | CNR IRIS Restricted
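
The two preprocessing steps named in this abstract, an SDF to expose the "void as information" and a quadtree decomposition to organize it, can be illustrated on a toy sketch. A minimal sketch under those assumptions (the split criterion and the tokenization feeding the hierarchical Transformer are omitted; this is not the authors' implementation):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def signed_distance(binary_sketch):
        """SDF of a binary stroke mask: negative on strokes, positive in empty space."""
        outside = distance_transform_edt(binary_sketch == 0)
        inside = distance_transform_edt(binary_sketch == 1)
        return outside - inside

    def quadtree(sdf, x, y, size, thresh, min_size, cells):
        """Recursively split cells whose SDF varies too much, collecting leaf cells."""
        patch = sdf[y:y + size, x:x + size]
        if size <= min_size or patch.std() < thresh:
            cells.append((x, y, size, patch.mean()))
            return
        h = size // 2
        for dx, dy in [(0, 0), (h, 0), (0, h), (h, h)]:
            quadtree(sdf, x + dx, y + dy, h, thresh, min_size, cells)

    sketch = np.zeros((64, 64), dtype=np.uint8)
    sketch[20, 10:50] = 1                      # a single horizontal stroke
    cells = []
    quadtree(signed_distance(sketch), 0, 0, 64, thresh=2.0, min_size=4, cells=cells)
    print(len(cells), "leaf cells")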


2026 Journal article Open Access
Decentralized edge learning: a comparative study of distillation strategies and dissimilarity measures
Molo Mbasa J., Vadicamo Lucia, Gennaro Claudio, Carlini Emanuele
Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative where model outputs are used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of various knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study provides key insights by examining the impact of network topology and distillation strategies in KD-based decentralized learning approaches. Our evaluation spans several dissimilarity measures, including Cross-Entropy, Kullback-Leibler divergence, Triangular Divergence, Jensen-Shannon divergence, Structural Entropic Distance, and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the client-wise dissimilarities between a client's output and each neighbor's prediction individually, while the holistic approach computes dissimilarity with respect to the average of the output predictions received from neighboring clients. We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the pairwise-sum approach, especially when employing alternative measures like TD, SED, and JS. These measures offer improved performance over conventional metrics such as CE and KL divergence.
Source: FUTURE GENERATION COMPUTER SYSTEMS, vol. 176
DOI: 10.1016/j.future.2025.108171
Project(s): National Centre for HPC, Big Data and Quantum Computing, Sustainable Mobility Center
See at: CNR IRIS Open Access | www.sciencedirect.com Open Access | Future Generation Computer Systems Restricted | CNR IRIS Restricted
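
The pairwise and holistic schemes compared in this entry differ only in where the aggregation happens. A minimal sketch with KL divergence standing in for the dissimilarity (the paper also evaluates CE, TD, JS, SED, and Multi-way SED; the probability vectors below are hypothetical client outputs):

    import numpy as np

    def kl(p, q, eps=1e-12):
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    def pairwise_loss(student_probs, neighbor_probs):
        """Sum of dissimilarities to each neighbor's prediction, taken individually."""
        return sum(kl(student_probs, n) for n in neighbor_probs)

    def holistic_loss(student_probs, neighbor_probs):
        """Dissimilarity to the average of the neighbors' predictions."""
        avg = np.mean(neighbor_probs, axis=0)
        return kl(student_probs, avg)

    student = np.array([0.7, 0.2, 0.1])
    neighbors = [np.array([0.6, 0.3, 0.1]), np.array([0.1, 0.8, 0.1])]
    print(pairwise_loss(student, neighbors), holistic_loss(student, neighbors))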


2025 Conference article Restricted
Towards identity-aware cross-modal retrieval: a dataset and a baseline
Messina N., Vadicamo L., Maltese L., Gennaro C.
Recent advancements in deep learning have significantly enhanced content-based retrieval methods, notably through models like CLIP that map images and texts into a shared embedding space. However, these methods often struggle with domain-specific entities and long-tail concepts absent from their training data, particularly in identifying specific individuals. In this paper, we explore the task of identity-aware cross-modal retrieval, which aims to retrieve images of persons in specific contexts based on natural language queries. This task is critical in various scenarios, such as for searching and browsing personalized video collections or large audio-visual archives maintained by national broadcasters. We introduce a novel dataset, COCO Person FaceSwap (COCO-PFS), derived from the widely used COCO dataset and enriched with deepfake-generated faces from VGGFace2. This dataset addresses the lack of large-scale datasets needed for training and evaluating models for this task. Our experiments assess the performance of different CLIP variations repurposed for this task, including our architecture, Identity-aware CLIP (Id-CLIP), which achieves competitive retrieval performance through targeted fine-tuning. Our contributions lay the groundwork for more robust cross-modal retrieval systems capable of recognizing long-tail identities and contextual nuances. Data and code are available at .
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15572, pp. 437-452. Lucca, Italy, April 6–10, 2025
DOI: 10.1007/978-3-031-88708-6_28
Project(s): Future Artificial Intelligence Research, a MUltimedia platform for Content Enrichment and Search in audiovisual archives
See at: CNR IRIS Restricted
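
One illustrative way to read the identity-aware retrieval task is as late fusion of two matching signals: caption-to-image similarity and query-identity-to-face similarity. The sketch below is a hypothetical baseline under that reading, not the Id-CLIP architecture, which instead fine-tunes CLIP itself:

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def identity_aware_score(text_emb, img_emb, query_id_emb, img_face_emb, alpha=0.5):
        """Blend cross-modal similarity with face-identity similarity."""
        return alpha * cosine(text_emb, img_emb) + (1 - alpha) * cosine(query_id_emb, img_face_emb)

    # Hypothetical embeddings: CLIP-style text/image (512-d) and face descriptors (128-d).
    rng = np.random.default_rng(0)
    print(identity_aware_score(rng.normal(size=512), rng.normal(size=512),
                               rng.normal(size=128), rng.normal(size=128)))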


2025 Other Open Access
ISTI-day 2025 Proceedings
Del Corso G., Pedrotti A., Federico G., Gennaro C., Carrara F., Amato G., Di Benedetto M., Gabrielli E., Belli D., Matrullo Z., Miori V., Tolomei G., Waheed T., Marchetti E., Calabrò A., Rossetti G., Stella M., Cazabet R., Abramski K., Cau E., Citraro S., Failla A., Mesina V., Morini V., Pansanella V., Colantonio S., Germanese D., Pascali M. A., Bianchi L., Messina N., Falchi F., Barsellotti L., Pacini G., Cassese M., Puccetti G., Esuli A., Volpi L., Moreo A., Sebastiani F., Sperduti G., Nguyen D., Broccia G., Ter Beek M. H., Ferrari A., Massink M., Belmonte G., Ciancia V., Papini O., Canapa G., Catricalà B., Manca M., Paternò F., Santoro C., Zedda E., Gallo S., Maenza S., Mattioli A., Simeoli L., Rucci D., Carlini E., Dazzi P., Kavalionak H., Mordacchini M., Rulli C., Muntean Cristina Ioana, Nardini F. M., Perego R., Rocchietti G., Lettich F., Renso C., Pugliese C., Casini G., Haldimann J., Meyer T., Assante M., Candela L., Dell'Amico A., Frosini L., Mangiacrapa F., Oliviero A., Pagano P., Panichi G., Peccerillo B., Procaccini M., Mannocci A., Manghi P., Lonetti F., Kang D., Di Giandomenico F., Jee E., Lazzini G., Conti F., Scopigno R., D'Acunto M., Moroni D., Cafiso M., Paradisi P., Callieri M., Pavoni G., Corsini M., De Falco A., Sala F., Saraceni Q., Gattiglia G.
ISTI-Day is an annual information and networking event organized by the Institute of Information Science and Technologies "A. Faedo" (ISTI) of the Italian National Research Council (CNR). This event features an opening talk by the Director of the Dept. DIITET (Emilio F. Campana) as well as an overview of the Institute's activities presented by the ISTI Director (Roberto Scopigno). Those institutional segments are complemented by dedicated presentations and round tables featuring former staff members, as well as internal and external collaborators. To foster a network of knowledge and collaboration among newcomers, the 2025 ISTI-Day edition also includes a large poster session that provides a comprehensive overview of current research activities. Each of the 13 laboratories contributes 1–3 posters, highlighting the most innovative work and offering early-career researchers a platform for discussion. Thus, these proceedings include the posters selected for ISTI-Day 2025, reflecting the diverse and innovative nature of the Institute's research.

See at: CNR IRIS Open Access | www.isti.cnr.it Open Access | CNR IRIS Restricted


2025 Contribution to book Open Access
Adversarial magnification to deceive deepfake detection through super resolution
Coccomini D. A., Caldelli R., Amato G., Falchi F., Gennaro C.
Deepfake technology is rapidly advancing, posing significant challenges to the detection of manipulated media content. Parallel to that, some adversarial attack techniques have been developed to fool the deepfake detectors and make deepfakes even more difficult to detect. This paper explores the application of super resolution techniques as a possible adversarial attack in deepfake detection. Through our experiments, we demonstrate that minimal changes made by these methods in the visual appearance of images can have a profound impact on the performance of deepfake detection systems. We propose a novel attack using super resolution as a quick, black-box and effective method to camouflage fake images and/or generate false alarms on pristine images. Our results indicate that the usage of super resolution can significantly impair the accuracy of deepfake detectors, thereby highlighting the vulnerability of such systems to adversarial attacks. The code to reproduce our experiments is available at: https://github.com/davide-coccomini/Adversarial-Magnification-to-Deceive-Deepfake-Detection-through-Super-Resolution.
Source: COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE, vol. 2134, pp. 491-501
DOI: 10.1007/978-3-031-74627-7_41
Project(s): AI4Media via OpenAIRE
See at: CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | CNR IRIS Restricted
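
Operationally, the attack is a perceptually small resampling step applied before detection. A minimal stand-in that uses bicubic interpolation in place of a learned super-resolution network (the paper uses actual SR models; detector below is a hypothetical classifier):

    from PIL import Image

    def sr_attack(img, scale=2):
        """Magnify then restore size; pixel statistics change, content barely does."""
        w, h = img.size
        up = img.resize((w * scale, h * scale), Image.BICUBIC)
        return up.resize((w, h), Image.BICUBIC)

    # Hypothetical usage:
    # attacked = sr_attack(Image.open("suspect_frame.png").convert("RGB"))
    # score = detector(attacked)   # placeholder deepfake detector; expect a score shift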


2025 Conference article Restricted
Cross-modal distillation by additive importance measure in HITL autonomous driving
Bano S., Cassarà P., Gennaro C., Gotta A.
With the advent of Advanced Driver Assistance Systems (ADAS) and intelligent transport system applications, recognizing driver emotions has become essential for a decision support system (DSS) with humans in the loop (HITL). Multimodal approaches using visual cues, speech, physiological signals, and driving patterns improve emotion recognition but are challenging in resource-constrained environments where only a subset of modalities is available. This work addresses these challenges by combining multi-modal benefits with single-modality inference for emotion recognition using unlabeled external road condition data. Unlike traditional methods that average the teachers' contributions, the proposed cross-modal distillation (CMD) weights each teacher with the aid of Shapley additive global explanations (SAGE), which improves the student model's accuracy and provides an interpretation of it. Experimental evaluations on the PPBEmo dataset show that XA-CMD improves emotion recognition accuracy over other baselines and provides deeper insights into decision-making.
Source: IEEE VTS ... VEHICULAR TECHNOLOGY CONFERENCE, pp. 1-5. Oslo, Norway, 17–20 June 2025
DOI: 10.1109/vtc2025-spring65109.2025.11174460
See at: doi.org Restricted | CNR IRIS Restricted | ieeexplore.ieee.org Restricted
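
The departure from standard multi-teacher distillation here is that teachers are combined by importance rather than averaged uniformly. A minimal sketch under the assumption that SAGE-style importances have already been computed (the weights values are placeholders for those importances):

    import numpy as np

    def weighted_teacher_target(teacher_probs, weights):
        """Importance-weighted combination of teacher predictions."""
        w = np.asarray(weights, dtype=float)
        return np.average(teacher_probs, axis=0, weights=w / w.sum())

    def distill_loss(student_probs, teacher_probs, weights, eps=1e-12):
        """Cross-entropy of the student against the weighted teacher target."""
        target = weighted_teacher_target(teacher_probs, weights)
        return float(-np.sum(target * np.log(student_probs + eps)))

    teachers = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]])
    print(distill_loss(np.array([0.5, 0.4, 0.1]), teachers, weights=[0.5, 0.3, 0.2]))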


2025 Conference article Open Access
Exploring strengths and weaknesses of super-resolution attack in deepfake detection
Coccomini D. A., Caldelli R., Falchi F., Gennaro C., Amato G.
Image manipulation is rapidly evolving, allowing the creation of credible content that can be used to bend reality. Although the results of deepfake detectors are promising, deepfakes can be made even more complicated to detect through adversarial attacks. They aim to further manipulate the image to camouflage deepfakes’ artifacts or to insert signals making the image appear pristine. In this paper, we further explore the potential of super-resolution attacks based on different super-resolution techniques and scales, which can impact the performance of deepfake detectors with varying intensity. We also evaluated the impact of the attack on more diverse datasets, discovering that the super-resolution process is effective in hiding the artifacts introduced by deepfake generation models but fails in hiding the traces contained in fully synthetic images. Finally, we propose some changes to the detectors’ training process to improve their robustness to this kind of attack.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15643, pp. 351-362. Milan, Italy, 29/09-04/10/2024
DOI: 10.1007/978-3-031-92648-8_21
DOI: 10.48550/arxiv.2410.04205
See at: arXiv.org e-Print Archive Open Access | doi.org Restricted | CNR IRIS Restricted | link.springer.com Restricted


2025 Journal article Open Access
Training-free sparse representations of dense vectors for scalable information retrieval
Carrara F., Vadicamo L., Amato G., Gennaro C.
In this paper, we propose and analyze Vec2Doc, a novel training-free method to transform dense vectors into sparse integer vectors, facilitating the use of inverted indexes for information retrieval (IR). The exponential growth of deep learning and artificial intelligence has revolutionized scientific problem-solving in areas such as computer vision, natural language processing, and automatic content generation. These advances have also significantly impacted IR, with a better understanding of natural language and multimodal content analysis leading to more accurate information retrieval. Despite these developments, modern IR relies primarily on the similarity evaluation of dense vectors from the latent spaces of deep neural networks. This dependence introduces substantial challenges in performing similarity searches on large collections containing billions of vectors. Traditional IR methods, which employ inverted indexes and vector space models, are adept at handling sparse vectors but do not work well with dense ones. Vec2Doc attempts to fill this gap by converting dense vectors into a format compatible with conventional inverted index techniques. Our preliminary experimental evaluations show that Vec2Doc is a promising solution to overcome the scalability problems inherent in vector-based IR, offering an alternative method for efficient and accurate large-scale information retrieval.
Source: INFORMATION SYSTEMS, vol. 133 (issue 102567)
DOI: 10.1016/j.is.2025.102567
Project(s): Empowering Knowledge Extraction to Empower Learners, National Centre for HPC, Big Data and Quantum Computing, SUN via OpenAIRE, a MUltimedia platform for Content Enrichment and Search in audiovisual archives
See at: CNR IRIS Open Access | www.sciencedirect.com Open Access | CNR IRIS Restricted
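
The core transformation, from a dense real vector to a sparse non-negative integer vector that an inverted index can treat as term counts, can be pictured as follows. This is one plausible, training-free reading of such a mapping, not necessarily the exact Vec2Doc construction:

    import numpy as np

    def dense_to_sparse_terms(v, scale=10, keep=32):
        """Map a dense vector to {term_id: count} with non-negative integer weights.

        Negative components become separate "terms" (id offset by the dimension),
        only the `keep` largest magnitudes are retained, and values are quantized
        to integer counts usable as term frequencies in an inverted index.
        """
        expanded = np.concatenate([np.maximum(v, 0), np.maximum(-v, 0)])
        top = np.argsort(-expanded)[:keep]
        counts = np.rint(expanded[top] * scale).astype(int)
        return {int(t): int(c) for t, c in zip(top, counts) if c > 0}

    vec = np.random.randn(512).astype(np.float32)
    terms = dense_to_sparse_terms(vec)
    print(len(terms), "postings, e.g.", list(terms.items())[:3])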


2025 Conference article Open Access
CA3D: Convolutional-Attentional 3D nets for efficient video activity recognition on the edge
Lagani G., Falchi F., Gennaro C., Amato G.
In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a competitive computing cost.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15633, pp. 235-251. Milan, Italy, 29/09/2024
DOI: 10.1007/978-3-031-91979-4_18
DOI: 10.48550/arxiv.2505.19928
Project(s): AI4Media via OpenAIRE, SUN via OpenAIRE
See at: arXiv.org e-Print Archive Open Access | CNR IRIS Open Access | link.springer.com Open Access | doi.org Restricted | CNR IRIS Restricted
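
A common way to obtain the linear-complexity attention mentioned in this abstract is to replace the softmax with a kernel feature map so that keys and values are aggregated once, in O(n). The sketch below shows that generic construction (in the style of Katharopoulos et al.); it is not necessarily the exact mechanism used in CA3D:

    import torch

    def linear_attention(q, k, v):
        """O(n) attention: phi(q) @ (phi(k)^T v), with phi(x) = elu(x) + 1."""
        phi = lambda x: torch.nn.functional.elu(x) + 1
        q, k = phi(q), phi(k)
        kv = torch.einsum("bnd,bne->bde", k, v)              # aggregate keys/values once
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)     # normalized outputs

    q = k = v = torch.randn(2, 196, 64)        # 2 clips, 196 tokens, 64-d heads
    print(linear_attention(q, k, v).shape)     # torch.Size([2, 196, 64])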


2024 Conference article Open Access
Information dissimilarity measures in decentralized knowledge distillation: a comparative analysis
Molo M. B., Vadicamo L., Carlini E., Gennaro C., Connor R.
Knowledge distillation (KD) is a key technique for transferring knowledge from a large, complex “teacher” model to a smaller, more efficient “student” model. Although initially developed for model compression, it has found applications across various domains due to the benefits of its knowledge transfer mechanism. While Cross Entropy (CE) and Kullback-Leibler (KL) are commonly used in KD, this work investigates the applicability of loss functions based on underexplored information dissimilarity measures, such as Triangular Divergence (TD), Structural Entropic Distance (SED), and Jensen-Shannon Divergence (JS), for both independent and identically distributed (iid) and non-iid data distributions. The primary contributions of this study include an empirical evaluation of these dissimilarity measures within a decentralized learning context, i.e., where independent clients collaborate without a central server coordinating the learning process. Additionally, the paper assesses the performance of clients by comparing pairwise distillation averaging among clients to conventional peer-to-peer pairwise distillation. Results indicate that while dissimilarity measures perform comparably in iid settings, non-iid distributions favor SED and JS, which also demonstrated consistent performance across clients.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15268, pp. 140-154. Providence, USA, 4-6/11/2024
DOI: 10.1007/978-3-031-75823-2_12
Project(s): National Centre for HPC, Big Data and Quantum Computing, SUN via OpenAIRE
See at: IRIS Cnr Open Access | doi.org Restricted | CNR IRIS Restricted
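
Two of the less familiar measures compared in this entry have compact closed forms for discrete probability vectors. A minimal sketch of Jensen-Shannon and Triangular Divergence (SED is omitted here rather than risk misstating its entropy-based formulation):

    import numpy as np

    def kl(p, q, eps=1e-12):
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    def jensen_shannon(p, q):
        """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the midpoint."""
        m = 0.5 * (p + q)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def triangular(p, q, eps=1e-12):
        """Triangular discrimination: sum of (p_i - q_i)^2 / (p_i + q_i)."""
        return float(np.sum((p - q) ** 2 / (p + q + eps)))

    p, q = np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.4, 0.3])
    print(jensen_shannon(p, q), triangular(p, q))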


2024 Conference article Open Access
Teacher-student models for AI vision at the edge: a car parking case study
Molo M. J., Carlini E., Ciampi L., Gennaro C., Vadicamo L.
The surge of the Internet of Things has sparked a multitude of deep learning-based computer vision applications that extract relevant information from the deluge of data coming from Edge devices, such as smart cameras. Nevertheless, this promising approach introduces new obstacles, including the constraints posed by the limited computational resources on these devices and the challenges associated with the generalization capabilities of the AI-based models against novel scenarios never seen during the supervised training, a situation frequently encountered in this context. This work proposes an efficient approach for detecting vehicles in parking lot scenarios monitored by multiple smart cameras that train their underlying AI-based models by exploiting knowledge distillation. Specifically, we consider an architectural scheme comprising a powerful and large detector used as a teacher and several shallow models acting as students, more appropriate for computationally bounded devices and designed to run onboard the smart cameras. The teacher is pre-trained over general-context data and behaves like an oracle, transferring its knowledge to the smaller nodes; on the other hand, the students learn to localize cars in new specific scenarios without using further labeled data, relying solely on the distilled loss coming from the oracle. Preliminary results show that student models trained only with distillation loss increase their performance, sometimes even outperforming the results achieved by the same models supervised with the ground truth.
DOI: 10.5220/0012376900003660
Project(s): AI4Media via OpenAIRE, National Centre for HPC, Big Data and Quantum Computing, Sustainable Mobility Center
See at: CNR IRIS Open Access | www.scitepress.org Open Access | CNR IRIS Restricted


2024 Conference article Open Access
Will VISIONE remain competitive in lifelog image search?
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
VISIONE is a versatile video retrieval system supporting diverse search functionalities, including free-text, similarity, and temporal searches. Its recent success in securing first place in the 2024 Video Browser Showdown (VBS) highlights its effectiveness. Originally designed for analyzing, indexing, and searching diverse video content, VISIONE can also be adapted to images from lifelog cameras thanks to its reliance on frame-based representations and retrieval mechanisms. In this paper, we present an overview of VISIONE's core characteristics and the adjustments made to accommodate lifelog images. These adjustments primarily focus on enhancing result visualization within the GUI, such as grouping images by date or hour to align with lifelog dataset imagery. It's important to note that while the GUI has been updated, the core search engine and visual content analysis components remain unchanged from the version presented at VBS 2024. Specifically, metadata such as local time, GPS coordinates, and concepts associated with images are not indexed or utilized in the system. Instead, the system relies solely on the visual content of the images, with date and time information extracted from their filenames, which are utilized exclusively within the GUI for visualization purposes. Our objective is to evaluate the system's performance within the Lifelog Search Challenge, emphasizing reliance on visual content analysis without additional metadata.
DOI: 10.1145/3643489.3661122
Project(s): AI4Media via OpenAIRE
See at: IRIS Cnr Open Access | doi.org Restricted | CNR IRIS Restricted
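
Since date and time are extracted from filenames and used only for visualization, the GUI grouping reduces to parsing and bucketing. A minimal sketch of that step (the filename pattern below is hypothetical; lifelog datasets vary):

    from collections import defaultdict
    from datetime import datetime

    def group_by_hour(filenames, pattern="%Y%m%d_%H%M%S"):
        """Bucket lifelog images by (date, hour) parsed from their filenames."""
        groups = defaultdict(list)
        for name in filenames:
            stamp = name.rsplit(".", 1)[0]          # strip the extension
            t = datetime.strptime(stamp, pattern)
            groups[(t.date(), t.hour)].append(name)
        return groups

    files = ["20240101_091502.jpg", "20240101_093010.jpg", "20240101_101500.jpg"]
    for key, names in group_by_hour(files).items():
        print(key, names)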


2024 Journal article Open Access
Detecting images generated by diffusers
Coccomini D. A., Esuli A., Falchi F., Gennaro C., Amato G.
In recent years, the field of artificial intelligence has witnessed a remarkable surge in the generation of synthetic images, driven by advancements in deep learning techniques. These synthetic images, often created through complex algorithms, closely mimic real photographs, blurring the lines between reality and artificiality. This proliferation of synthetic visuals presents a pressing challenge: how to accurately and reliably distinguish between genuine and generated images. This article, in particular, explores the task of detecting images generated by text-to-image diffusion models, highlighting the challenges and peculiarities of this field. To evaluate this, we consider images generated from captions in the MSCOCO and Wikimedia datasets using two state-of-the-art models: Stable Diffusion and GLIDE. Our experiments show that it is possible to detect the generated images using simple multi-layer perceptrons (MLPs), starting from features extracted by CLIP or RoBERTa, or using traditional convolutional neural networks (CNNs). The latter models achieve remarkable performance, in particular when pretrained on large datasets. We also observe that models trained on images generated by Stable Diffusion can occasionally detect images generated by GLIDE, but only on the MSCOCO dataset. However, the reverse is not true. Lastly, we find that incorporating the associated textual information with the images in some cases can lead to a better generalization capability, especially if textual features are closely related to visual ones. We also discovered that the type of subject depicted in the image can significantly impact performance. This work provides insights into the feasibility of detecting generated images and has implications for security and privacy concerns in real-world applications.
Source: PEERJ COMPUTER SCIENCE, vol. 10
DOI: 10.7717/peerj-cs.2127
Project(s): AI4Media via OpenAIRE
See at: CNR IRIS Open Access | peerj.com Open Access | CNR IRIS Restricted
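
The simplest detector evaluated in this entry is an MLP over frozen CLIP features. A minimal sketch of such a classifier head in PyTorch (feature extraction is stubbed with random tensors, and the 512-dimensional feature size is an assumption):

    import torch
    import torch.nn as nn

    # Binary real-vs-generated head on top of frozen 512-d image features.
    mlp = nn.Sequential(
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 1),            # logit: > 0 means "generated"
    )

    feats = torch.randn(8, 512)       # stand-in for CLIP embeddings of 8 images
    labels = torch.randint(0, 2, (8, 1)).float()
    loss = nn.BCEWithLogitsLoss()(mlp(feats), labels)
    loss.backward()                   # one training step's gradient
    print(float(loss))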


2024 Conference article Open Access
Visione 5.0: toward evaluation with novice users
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C.
VISIONE is a video search system that integrates multiple search functionalities, allowing users to search for video segments using textual and visual queries, complemented by temporal search capabilities. It exploits state-of-the-art Artificial Intelligence approaches for visual content analysis and highly efficient indexing techniques to ensure fast response and scalability. In the recently concluded Video Browser Showdown (VBS2024) - a well-established international competition in interactive video retrieval - VISIONE ranked first and scored as the best interactive video search system in four out of seven tasks carried out in the competition. This paper provides an overview of the VISIONE system, emphasizing the improvements made to the system in the last year to improve its usability for novice users. A demonstration video showcasing the system's capabilities across 2,300 hours of diverse video content is available online, as well as a simplified demo of VISIONE.
DOI: 10.1109/cbmi62980.2024.10859203
Project(s): AI4Media via OpenAIRE, National Centre for HPC, Big Data and Quantum Computing, a MUltimedia platform for Content Enrichment and Search in audiovisual archives
See at: CNR IRIS Open Access | ieeexplore.ieee.org Open Access | CNR IRIS Restricted


2024 Conference article Open Access
The devil is in the fine-grained details: evaluating open-vocabulary object detectors for fine-grained understanding
Bianchi L., Carrara F., Messina N., Gennaro C., Falchi F.
Recent advancements in large vision-language models enabled visual object detection in open-vocabulary scenarios, where object classes are defined in free-text formats during inference. In this paper, we aim to probe the state-of-the-art methods for open-vocabulary object detection to determine to what extent they understand fine-grained properties of objects and their parts. To this end, we introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects in the presence of hard-negative classes. We contribute a benchmark suite of increasing difficulty, probing different properties like color, pattern, and material. We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol and find that most existing solutions, which shine in standard open-vocabulary benchmarks, struggle to accurately capture and distinguish finer object details. We conclude the paper by highlighting the limitations of current methodologies and exploring promising research directions to overcome the discovered drawbacks. Data and code are available at https://lorebianchi98.github.io/FG-OVD/.
Source: PROCEEDINGS IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, pp. 22520-22529. Seattle (USA), 17-21/06/2024
DOI: 10.1109/cvpr52733.2024.02125
DOI: 10.48550/arxiv.2311.17518
Project(s): SUN via OpenAIRE, a MUltimedia platform for Content Enrichment and Search in audiovisual archives
See at: arXiv.org e-Print Archive Open Access | IRIS Cnr Open Access | ieeexplore.ieee.org Open Access | doi.org Restricted | CNR IRIS Restricted
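
The dynamic vocabulary protocol can be pictured as attribute substitution: keep the object, swap one fine-grained property to mint hard negatives. A toy generator under that reading (the attribute pool below is made up, not the benchmark's):

    COLORS = ["red", "blue", "green", "striped"]

    def hard_negatives(description, attribute, pool=COLORS):
        """Mint hard-negative class names by swapping one attribute in the positive."""
        return [description.replace(attribute, alt) for alt in pool if alt != attribute]

    positive = "a red wooden chair"
    vocabulary = [positive] + hard_negatives(positive, "red")
    print(vocabulary)
    # ['a red wooden chair', 'a blue wooden chair',
    #  'a green wooden chair', 'a striped wooden chair']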


2024 Other Open Access
AIMH Research Activities 2024
Aloia N., Amato G., Bartalesi Lenzi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Cassese M., Ciampi L., Coccomini D. A., Concordia C., Connor R., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Iannello L., Negi K., Lagani G., Lenzi E., Leocata M., Malvaldi M., Meghini C., Messina N., Moreo Fernandez A., Nardi A., Pacini G., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Scotti F., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L., Volpi L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is committed to advancing the field of Artificial Intelligence, with a special emphasis on its applications in digital media and the humanities. The lab aims to improve AI technologies, particularly in areas such as deep learning, text analysis, computer vision, multimedia information retrieval, content analysis, recognition, and retrieval. This report summarizes the laboratory’s achievements and activities over the course of 2024.
DOI: 10.32079/isti-ar-2024/001
See at: CNR IRIS Open Access | CNR IRIS Restricted


2024 Journal article Open Access
In the wild video violence detection: an unsupervised domain adaptation approach
Ciampi L., Santiago C., Falchi F., Gennaro C., Amato G.
This work addresses the challenge of video violence detection in data-scarce scenarios, focusing on bridging the domain gap that often hinders the performance of deep learning models when applied to unseen domains. We present a novel unsupervised domain adaptation (UDA) scheme designed to effectively mitigate this gap by combining supervised learning in the train (source) domain with unlabeled test (target) data. We employ single-image classification and multiple instance learning (MIL) to select frames with the highest classification scores, and, upon this, we exploit UDA techniques to adapt the model to unlabeled target domains. We perform an extensive experimental evaluation, using general-context data as the source domain and target domain datasets collected in specific environments, such as violent/non-violent actions in hockey matches and public transport. The results demonstrate that our UDA pipeline substantially enhances model performances, improving their generalization capabilities in novel scenarios without requiring additional labeled data.
Source: SN COMPUTER SCIENCE, vol. 5 (issue 7)
DOI: 10.1007/s42979-024-03126-3
Project(s): "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", AI4Media via OpenAIRE, SUN via OpenAIRE
See at: CNR IRIS Open Access | link.springer.com Open Access | CNR IRIS Restricted
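
The MIL step described here treats each video as a bag of frames and keeps only the highest-scoring ones for adaptation. A minimal sketch of that selection (the per-frame scores would come from the single-image classifier; here they are random placeholders):

    import numpy as np

    def select_top_frames(frame_scores, k=4):
        """Pick the k frames with the highest classification scores from one video (bag)."""
        idx = np.argsort(-frame_scores)[:k]
        return idx, frame_scores[idx]

    scores = np.random.rand(120)          # stand-in per-frame scores for one clip
    idx, top = select_top_frames(scores)
    print(idx, top)                       # these frames feed the UDA adaptation step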


2024 Conference article Open Access
Robustness and generalization of synthetic images detectors
Coccomini D. A., Caldelli R., Gennaro C., Fiameni G., Amato G., Falchi F.
In recent times, the increasing spread of synthetic media, known as deepfakes, has been made possible by the rapid progress in artificial intelligence technologies, especially deep learning algorithms. Growing worries about the increasing availability and believability of deepfakes have spurred researchers to concentrate on developing methods to detect them. In this field, researchers at ISTI CNR’s AIMH Lab, in collaboration with researchers from other organizations, have conducted research, investigations, and projects to contribute to combating this trend, exploring new solutions and threats. This article summarizes the most recent efforts made in this area by our researchers and in collaboration with other institutions and experts.

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Journal article Open Access
Scalable bio-inspired training of Deep Neural Networks with FastHebb
Lagani G., Falchi F., Gennaro C., Fassold H., Amato G.
Recent work on sample efficient training of Deep Neural Networks (DNNs) proposed a semi-supervised methodology based on biologically inspired Hebbian learning, combined with traditional backprop-based training. Promising results were achieved on various computer vision benchmarks, in scenarios of scarce labeled data availability. However, current Hebbian learning solutions can hardly address large-scale scenarios due to their demanding computational cost. In order to tackle this limitation, in this contribution, we investigate a novel solution, named FastHebb (FH), based on the reformulation of Hebbian learning rules in terms of matrix multiplications, which can be executed more efficiently on GPU. Starting from Soft-Winner-Takes-All (SWTA) and Hebbian Principal Component Analysis (HPCA) learning rules, we formulate their improved FH versions: SWTA-FH and HPCA-FH. We experimentally show that the proposed approach accelerates training speed up to 70 times, allowing us to gracefully scale Hebbian learning experiments on large datasets and network architectures such as ImageNet and VGG.
Source: NEUROCOMPUTING, vol. 595
DOI: 10.1016/j.neucom.2024.127867
See at: CNR IRIS Open Access | www.sciencedirect.com Open Access | CNR IRIS Restricted
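
The reformulation behind the reported speedup, writing a batched Hebbian update as matrix multiplications, can be sketched for a generic soft-winner-takes-all rule. This is an illustrative instar-style update, not the paper's exact SWTA-FH/HPCA-FH formulations:

    import numpy as np

    def swta_hebbian_step(W, X, lr=0.01, temp=0.1):
        """Batched soft-WTA Hebbian update as two matrix multiplications.

        Per sample the instar rule is dw_i = y_i * (x - w_i); summed over a batch
        this collapses to Y^T X - diag(sum_b Y) W, which maps well to GPU matmuls.
        """
        S = X @ W.T / temp                                   # similarities (batch, units)
        Y = np.exp(S - S.max(axis=1, keepdims=True))
        Y /= Y.sum(axis=1, keepdims=True)                    # soft winner activations
        dW = Y.T @ X - Y.sum(axis=0)[:, None] * W
        return W + lr * dW

    W = np.random.randn(16, 64) * 0.1                        # 16 units, 64-d inputs
    X = np.random.randn(32, 64)                              # batch of 32 samples
    W = swta_hebbian_step(W, X)
    print(W.shape)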