Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks Lagani G., Falchi F., Gennaro C., Amato G. In this paper, we investigate Hebbian learning strategies applied to Convolutional Neural Network (CNN) training. We consider two unsupervised learning approaches: Hebbian Winner-Takes-All (HWTA) and Hebbian Principal Component Analysis (HPCA). The Hebbian learning rules are used to train the layers of a CNN in order to extract features that are then used for classification, without requiring backpropagation (backprop). Experimental comparisons are made with state-of-the-art unsupervised (but backprop-based) Variational Auto-Encoder (VAE) training. For completeness, we consider two supervised Hebbian learning variants (Supervised Hebbian Classifiers--SHC, and Contrastive Hebbian Learning--CHL) for training the final classification layer, which are compared to Stochastic Gradient Descent training. We also investigate hybrid learning methodologies, where some network layers are trained following the Hebbian approach, and others are trained by backprop. We tested our approaches on the MNIST, CIFAR10, and CIFAR100 datasets. Our results suggest that Hebbian learning is generally suitable for training early feature extraction layers, or for retraining higher network layers in fewer training epochs than backprop. Moreover, our experiments show that Hebbian learning outperforms VAE training, with HPCA generally performing better than HWTA. Source: Neural computing & applications (Print) (2022). DOI: 10.1007/s00521-021-06701-4 Project(s): AI4EU, AI4Media
AIMH Lab for Trustworthy AI Messina N., Carrara F., Coccomini D., Falchi F., Gennaro C., Amato G. In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Trustworthy AI. Artificial Intelligence is becoming more and more pervasive in our society, controlling recommendation systems in social platforms as well as safety-critical systems like autonomous vehicles. In order to be safe and trustworthy, these systems need to be easily interpretable and transparent. On the other hand, it is important to spot fake examples forged by malicious AI generative models to fool humans (through fake news or deepfakes) or other AI systems (through adversarial examples). This is required to enforce an ethical use of these powerful new technologies. Driven by these concerns, this paper presents three crucial research directions contributing to the study and the development of techniques for reliable, resilient, and explainable deep learning methods. Namely, we report the laboratory activities on the detection of adversarial examples, the use of attentive models as a way towards explainable deep learning, and the detection of deepfakes in social platforms. Source: Ital-IA 2022 - Workshop su AI Responsabile ed Affidabile, Online conference, 10/02/2022
AIMH Lab for Cybersecurity Vairo C., Coccomini D. A., Falchi F., Gennaro C., Massoli F. V., Messina N., Amato G. In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Cybersecurity. We discuss our active research fields, their applications, and their challenges. We focus on face recognition and on the detection of adversarial examples and deepfakes. We also present our activities on the detection of persuasion techniques combining image and text analysis. Source: Ital-IA 2022 - Workshop su AI per Cybersecurity, 10/02/2022
AIMH Lab for Healthcare and Wellbeing Di Benedetto M., Carrara F., Ciampi L., Falchi F., Gennaro C., Amato G. In this work we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Healthcare and Wellbeing. By exploiting the advances of recent machine learning methods and the compute power of desktop and mobile platforms, we show how artificial intelligence tools can be used to improve healthcare systems at various stages of disease treatment. In particular, we show how deep neural networks can assist doctors, from diagnosis (e.g., cell counting, pupil and brain analysis) to communication with patients through Augmented Reality. Source: Ital-IA 2022 - Workshop AI per la Medicina e la Salute, Online conference, 10/02/2022
AIMH Lab for the Industry Carrara F., Ciampi L., Di Benedetto M., Falchi F., Gennaro C., Massoli F. V., Amato G. In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Industry. The massive digitalization affecting all the stages of product design, production, and control calls for data-driven algorithms helping in the coordination of humans, machines, and digital resources in Industry 4.0. In this context, we developed AI-based Computer-Vision technologies of general interest in the emergent digital paradigm of the fourth industrial revolution, focusing on anomaly detection and object counting for computer-assisted testing and quality control. Moreover, in the automotive sector, we explore the use of virtual worlds to develop AI systems in otherwise practically unfeasible scenarios, showing an application for accident avoidance in self-driving car AI agents. Source: Ital-IA 2022 - Workshop su AI per l'Industria, Online conference, 10/02/2022
AIMH Lab: Smart Cameras for Public Administration Ciampi L., Cafarelli D., Carrara F., Di Benedetto M., Falchi F., Gennaro C., Massoli F. V., Messina N., Amato G. In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Public Administration. In particular, we present some AI-based public services for citizens that help achieve common goals beneficial to society, putting humans at the center. Through the automatic analysis of images gathered from city cameras, we provide AI applications ranging from smart parking and smart mobility to human activity monitoring. Source: Ital-IA 2022 - Workshop su AI per la Pubblica Amministrazione, Online conference, 10/02/2022
Counting or localizing? Evaluating cell counting and detection in microscopy images Ciampi L., Carrara F., Amato G., Gennaro C. Image-based automatic cell counting is an essential yet challenging task, crucial for the diagnosis of many diseases. Current solutions rely on Convolutional Neural Networks and provide astonishing results. However, their performance is often measured only considering counting errors, which can mask erroneous estimations: a low counting error can be obtained with a high but equal number of false positives and false negatives. Consequently, it is hard to determine which solution truly performs best. In this work, we investigate three general counting approaches that have been successfully adopted in the literature for counting several different categories of objects. Through an experimental evaluation over three public collections of microscopy images containing marked cells, we assess not only their counting performance compared to several state-of-the-art methods but also their ability to correctly localize the counted cells. We show that commonly adopted counting metrics do not always agree with the localization performance of the tested models, and thus we suggest integrating the proposed evaluation protocol when developing novel cell counting solutions. Source: VISIGRAPP 2022 - 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 887–897, Online conference, 6-8/2/2022 DOI: 10.5220/0010923000003124 Project(s): AI4Media
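The masking effect described in this abstract can be made concrete with a small sketch (illustrative only, not the paper's evaluation code): when false positives and false negatives cancel out, a count-only metric reports a perfect score while a localization-aware metric such as detection F1 exposes the mistakes.

```python
# Illustrative sketch: equal numbers of false positives and false
# negatives yield zero counting error, while detection-based F1
# (computed over matched detections) reveals the localization errors.

def counting_error(pred_count, true_count):
    """Absolute counting error used by count-only metrics."""
    return abs(pred_count - true_count)

def detection_f1(true_positives, false_positives, false_negatives):
    """F1 score over matched detections (localization-aware metric)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# 10 cells in the image; the model finds 7 of them plus 3 spurious blobs.
tp, fp, fn = 7, 3, 3
pred_count = tp + fp   # 10 predicted cells
true_count = tp + fn   # 10 ground-truth cells

print(counting_error(pred_count, true_count))  # 0 -> "perfect" count
print(round(detection_f1(tp, fp, fn), 2))      # 0.7 -> imperfect localization
```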
Training convolutional neural networks with competitive hebbian learning approaches Lagani G., Falchi F., Gennaro C., Amato G. We explore competitive Hebbian learning strategies to train feature detectors in Convolutional Neural Networks (CNNs), without supervision. We consider variants of the Winner-Takes-All (WTA) strategy explored in previous works, i.e. k-WTA, e-soft-WTA and p-soft-WTA, performing experiments on different object recognition datasets. Results suggest that the Hebbian approaches are effective for training early feature extraction layers, or for re-training higher layers of a pre-trained network, with soft competition generally performing better than the other Hebbian approaches explored in this work. Our findings encourage a path of cooperation between neuroscience and computer science towards a deeper investigation of biologically inspired learning principles. Source: Machine Learning, Optimization, and Data Science, edited by Nicosia G., Ojha V., La Malfa E., La Malfa G., Jansen G., Pardalos P.M., Giuffrida G., Umeton R., pp. 25–40, 2022 DOI: 10.1007/978-3-030-95467-3_2 Project(s): AI4EU , AI4Media
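To clarify the general idea behind the WTA family of rules discussed above, here is a minimal sketch of a hard Winner-Takes-All Hebbian update (an illustration of the generic rule, not the authors' implementation; the learning rate, dimensions, and update form are assumptions). Each input pattern is assigned to the neuron whose weight vector matches it best, and only that winner moves its weights toward the input, so neurons compete to specialize on different input clusters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_neurons, lr = 8, 4, 0.1
W = rng.normal(size=(n_neurons, n_inputs))  # one weight vector per neuron

def wta_step(W, x, lr):
    """One hard-WTA Hebbian step: only the most active neuron learns."""
    similarities = W @ x                 # pre-activation of each neuron
    winner = int(np.argmax(similarities))
    # Hebbian update with decay toward the input, applied to the winner
    # only: dw = lr * (x - w), pulling the winner's weights toward x.
    W[winner] += lr * (x - W[winner])
    return winner

for _ in range(100):
    x = rng.normal(size=n_inputs)
    wta_step(W, x, lr)
```

Soft-competition variants (such as the soft-WTA rules named in the abstract) replace the single hard winner with a graded update in which every neuron learns in proportion to a softmax of its activation.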
Evaluating hebbian learning in a semi-supervised setting Lagani G., Falchi F., Gennaro C., Amato G. We propose a semi-supervised learning strategy for deep Convolutional Neural Networks (CNNs) in which an unsupervised pre-training stage, performed using biologically inspired Hebbian learning algorithms, is followed by supervised end-to-end backprop fine-tuning. We explored two Hebbian learning rules for the unsupervised pre-training stage: soft-Winner-Takes-All (soft-WTA) and nonlinear Hebbian Principal Component Analysis (HPCA). Our approach was applied in sample efficiency scenarios, where the amount of available labeled training samples is very limited, and unsupervised pre-training is therefore beneficial. We performed experiments on the CIFAR10, CIFAR100, and Tiny ImageNet datasets. Our results show that Hebbian pre-training outperforms Variational Auto-Encoder (VAE) pre-training in almost all the cases, with HPCA generally performing better than soft-WTA. Source: Machine Learning, Optimization, and Data Science, edited by Nicosia G.; Ojha V.; La Malfa E.; La Malfa G.; Jansen G.; Pardalos P.M.; Giuffrida G.; Umeton R., pp. 365–379, 2022 DOI: 10.1007/978-3-030-95470-3_28 Project(s): AI4EU , AI4Media
An embedded toolset for human activity monitoring in critical environments Di Benedetto M., Carrara F., Ciampi L., Falchi F., Gennaro C., Amato G. In many working and recreational activities, there are scenarios where both individual and collective safety have to be constantly checked and properly signaled, as occurring in dangerous workplaces or during pandemic events like the recent COVID-19 disease. From wearing personal protective equipment to filling physical spaces with an adequate number of people, it is clear that a possibly automatic solution would help to check compliance with the established rules. Based on off-the-shelf, compact, and low-cost hardware, we present an embedded system, deployed in a real use case, capable of perceiving people's behavior and aggregations and supervising the application of a set of rules relying on a configurable plug-in framework. Working in indoor and outdoor environments, we show that our implementation of counting people aggregations, measuring their reciprocal physical distances, and checking the proper usage of protective equipment is an effective yet open framework for monitoring human activities in critical conditions. Source: Expert systems with applications 199 (2022). DOI: 10.1016/j.eswa.2022.117125 Project(s): AI4EU , AI4Media
Deep Learning techniques for visual counting Ciampi L. In this thesis, I investigated and enhanced Deep Learning (DL)-based techniques for the visual counting task, which automatically estimates the number of objects, such as people or vehicles, present in images and videos. Specifically, I tackled the problem related to the lack of data needed for training current DL-based solutions by exploiting synthetic data gathered from video games, employing Domain Adaptation strategies between different data distributions, and taking advantage of the redundant information characterizing datasets labeled by multiple annotators.
Furthermore, I addressed the engineering challenges arising from the adoption of DL-based techniques in environments with limited power resources, mainly due to the high computational budget that AI-based algorithms require.
Night and day instance segmented park (NDISPark) dataset: a collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas Ciampi L., Santiago C., Costeira J. P., Gennaro C., Amato G. NDIS Park is a collection of images of parking lots for vehicle detection, segmentation, and counting.
Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances.
The dataset includes 259 images depicting several parking areas, covering most of the problematic situations found in real scenarios: seven different cameras capture the images under various weather conditions and viewing angles. Another challenging aspect is the presence of partial occlusion patterns in many scenes, caused by obstacles (trees, lampposts, other cars) and shadowed cars.
The main peculiarity is that images are taken both during the day and at night, showing utterly different lighting conditions. Project(s): AI4EU , AI4Media
MOBDrone: a drone video dataset for Man OverBoard Rescue Cafarelli D., Ciampi L., Vadicamo L., Gennaro C., Berton A., Paterni M., Benvenuti C., Passera M., Falchi F. Modern Unmanned Aerial Vehicles (UAV) equipped with cameras can play an essential role in speeding up the identification and rescue of people who have fallen overboard, i.e., man overboard (MOB). To this end, Artificial Intelligence techniques can be leveraged for the automatic understanding of visual data acquired from drones. However, detecting people at sea in aerial imagery is challenging primarily due to the lack of specialized annotated datasets for training and testing detectors for this task. To fill this gap, we introduce and publicly release the MOBDrone benchmark, a collection of more than 125K drone-view images in a marine environment under several conditions, such as different altitudes, camera shooting angles, and illumination. We manually annotated more than 180K objects, about 113K of which are people overboard, precisely localizing them with bounding boxes. Moreover, we conduct a thorough performance analysis of several state-of-the-art object detectors on the MOBDrone data, serving as baselines for further research. Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 633–644, Lecce, Italia, 23-27/05/2022 DOI: 10.1007/978-3-031-06430-2_53
MOBDrone: a large-scale drone-view dataset for man overboard detection Cafarelli D., Ciampi L., Vadicamo L., Gennaro C., Berton A., Paterni M., Benvenuti C., Passera M., Falchi F. The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
Learning to count biological structures with raters' uncertainty Ciampi L., Carrara F., Totaro V., Mazziotti R., Lupori L., Santiago C., Amato G., Pizzorusso T., Gennaro C. Exploiting well-labeled training sets has led deep learning models to astonishing results for counting biological structures in microscopy images. However, dealing with weak multi-rater annotations, i.e., when multiple human raters disagree due to non-trivial patterns, remains a relatively unexplored problem. More reliable labels can be obtained by aggregating and averaging the decisions given by several raters to the same data. Still, the scale of the counting task and the limited budget for labeling prohibit this. As a result, making the most of small quantities of multi-rater data is crucial. To this end, we propose a two-stage counting strategy in a weakly labeled data scenario. First, we detect and count the biological structures; then, in the second step, we refine the predictions, increasing the correlation between the scores assigned to the samples and the raters' agreement on the annotations. We assess our methodology on a novel dataset comprising fluorescence microscopy images of mice brains containing extracellular matrix aggregates named perineuronal nets. We demonstrate that we significantly enhance counting performance, improving confidence calibration by taking advantage of the redundant information characterizing the small sets of available multi-rater data. Source: Medical image analysis (Print) 80 (2022). DOI: 10.1016/j.media.2022.102500 Project(s): AI4Media
Combining EfficientNet and vision transformers for video deepfake detection Coccomini D. A., Messina N., Gennaro C., Falchi F. Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that most methods are becoming extremely accurate in the generation of realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining results comparable with some very recent methods that use Vision Transformers. Differently from the state-of-the-art approaches, we use neither distillation nor ensemble methods. Furthermore, we present a straightforward inference procedure based on a simple voting scheme for handling multiple faces in the same video shot. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC). The code for reproducing our results is publicly available here: https://tinyurl.com/cnn-vit-dfd. Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 219–229, Lecce, Italy, 23-27/05/2022 DOI: 10.1007/978-3-031-06433-3_19
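The voting scheme mentioned in the abstract can be sketched as follows (our illustration of the general idea, not the paper's exact procedure; the function name and labels are hypothetical): each detected face in the video shot receives a per-face real/fake prediction, and the video-level label is the majority vote over all of them.

```python
from collections import Counter

def video_label(per_face_predictions):
    """Majority vote over per-face predictions for one video shot.

    per_face_predictions: list of 'fake' / 'real' strings, one entry
    per face detected across the frames of the shot.
    """
    votes = Counter(per_face_predictions)
    return votes.most_common(1)[0][0]

print(video_label(["fake", "fake", "real", "fake"]))  # fake
```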
Multi-camera vehicle counting using edge-AI Ciampi L., Gennaro C., Carrara F., Falchi F., Vairo C., Amato G. This paper presents a novel solution to automatically count vehicles in a parking lot using images captured by smart cameras. Unlike most of the literature on this task, which focuses on the analysis of single images, this paper proposes the use of multiple visual sources to monitor a wider parking area from different perspectives. The proposed multi-camera system is capable of automatically estimating the number of cars present in the entire parking lot directly on board the edge devices. It comprises an on-device deep learning-based detector that locates and counts the vehicles from the captured images and a decentralized geometric-based approach that can analyze the inter-camera shared areas and merge the data acquired by all the devices. We conducted the experimental evaluation on an extended version of the CNRPark-EXT dataset, a collection of images taken from the parking lot on the campus of the National Research Council (CNR) in Pisa, Italy. We show that our system is robust and takes advantage of the redundant information deriving from the different cameras, improving the overall performance without requiring any extra geometrical information about the monitored scene. Source: Expert systems with applications (2022). DOI: 10.1016/j.eswa.2022.117929 Project(s): AI4EU , AI4Media
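The merging step can be illustrated with a simple sketch (hypothetical numbers and function names; the paper's geometric approach to shared areas is more involved): each camera counts the vehicles it sees, and vehicles lying in an area covered by two cameras would be counted twice, so the inter-camera duplicates are subtracted once from the naive sum.

```python
def merge_counts(per_camera_counts, shared_area_duplicates):
    """Merge per-camera vehicle counts, removing double-counted vehicles.

    per_camera_counts: dict camera_id -> vehicles counted by that camera
    shared_area_duplicates: dict (cam_a, cam_b) -> vehicles detected by
    both cameras in their shared area (each such vehicle was counted twice)
    """
    total = sum(per_camera_counts.values())
    total -= sum(shared_area_duplicates.values())
    return total

counts = {"cam1": 12, "cam2": 9, "cam3": 15}
duplicates = {("cam1", "cam2"): 3, ("cam2", "cam3"): 2}
print(merge_counts(counts, duplicates))  # 31
```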
VISIONE at Video Browser Showdown 2022 Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C., Messina N., Vadicamo L., Vairo C. VISIONE is a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). It uses a full-text search engine as a search backend. In the latest version of our system, we modified the user interface, and we made some changes to the techniques used to analyze and search for videos. Source: MMM 2022 - 28th International Conference on Multimedia Modeling, pp. 543–548, Phu Quoc, Vietnam, 06-10/06/2022 DOI: 10.1007/978-3-030-98355-0_52 Project(s): AI4EU , AI4Media
Recurrent vision transformer for solving visual reasoning problems Messina N., Amato G., Carrara F., Gennaro C., Falchi F. Although convolutional neural networks (CNNs) showed remarkable results in many vision tasks, they still struggle with simple yet challenging visual reasoning problems. Inspired by the recent success of the Transformer network in computer vision, in this paper, we introduce the Recurrent Vision Transformer (RViT) model. Leveraging recurrent connections and spatial attention, this network achieves competitive results on the same-different visual reasoning problems from the SVRT dataset. The weight-sharing in both the spatial and depth dimensions regularizes the model, allowing it to learn with far fewer free parameters, from only 28k training samples. A comprehensive ablation study confirms the importance of a hybrid CNN + Transformer architecture and the role of the feedback connections, which iteratively refine the internal representation until a stable prediction is obtained. This study can thus lay the basis for a deeper understanding of the role of attention and recurrent connections in solving visual abstract reasoning tasks. The code for reproducing our results is publicly available here: https://tinyurl.com/recvit Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 50–61, Lecce, Italy, 23-27/05/2022 DOI: 10.1007/978-3-031-06433-3_5 Project(s): AI4EU , AI4Media
A spatio-temporal attentive network for video-based crowd counting Avvenuti M., Bongiovanni M., Ciampi L., Falchi F., Gennaro C., Messina N. Automatic people counting from images has recently drawn attention for urban monitoring in modern Smart Cities due to the ubiquity of surveillance camera networks. Current computer vision techniques rely on deep learning-based algorithms that estimate pedestrian densities in still, individual images. Only a few works take advantage of temporal consistency in video sequences. In this work, we propose a spatio-temporal attentive neural network to estimate the number of pedestrians from surveillance videos. By taking advantage of the temporal correlation between consecutive frames, we lowered the state-of-the-art count error by 5% and the localization error by 7.5% on the widely-used FDST benchmark. Source: ISCC 2022 - 27th IEEE Symposium on Computers and Communications, Rhodes Island, Greece, 30/06/2022-03/07/2022 DOI: 10.1109/iscc55528.2022.9913019 Project(s): AI4Media
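The density-map paradigm underlying this line of work can be sketched in a few lines (a minimal illustration of the general technique, not the paper's attentive network): the count in a frame is the integral, i.e. the sum, of its predicted density map, and averaging maps over a short window of consecutive frames is one simple way to exploit temporal consistency.

```python
import numpy as np

def count_from_density(density_map):
    """People count = integral of the predicted density map."""
    return float(np.sum(density_map))

def smoothed_counts(density_maps, window=3):
    """Counts from density maps averaged over a sliding temporal window."""
    counts = []
    for i in range(len(density_maps)):
        lo = max(0, i - window + 1)
        avg_map = np.mean(density_maps[lo:i + 1], axis=0)
        counts.append(count_from_density(avg_map))
    return counts

# Four frames, each with a density map integrating to 1 person.
maps = [np.full((2, 2), 0.25) for _ in range(4)]
print(smoothed_counts(maps))  # [1.0, 1.0, 1.0, 1.0]
```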