210 result(s)
2022 Journal article Open Access OPEN

Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks
Lagani G., Falchi F., Gennaro C., Amato G.
In this paper, we investigate Hebbian learning strategies applied to Convolutional Neural Network (CNN) training. We consider two unsupervised learning approaches, Hebbian Winner-Takes-All (HWTA) and Hebbian Principal Component Analysis (HPCA). The Hebbian learning rules are used to train the layers of a CNN in order to extract features that are then used for classification, without requiring backpropagation (backprop). Experimental comparisons are made with state-of-the-art unsupervised (but backprop-based) Variational Auto-Encoder (VAE) training. For completeness, we consider two supervised Hebbian learning variants (Supervised Hebbian Classifiers--SHC, and Contrastive Hebbian Learning--CHL) for training the final classification layer, which are compared to Stochastic Gradient Descent training. We also investigate hybrid learning methodologies, where some network layers are trained following the Hebbian approach, and others are trained by backprop. We tested our approaches on the MNIST, CIFAR10, and CIFAR100 datasets. Our results suggest that Hebbian learning is generally suitable for training early feature extraction layers, or for retraining higher network layers in fewer training epochs than backprop. Moreover, our experiments show that Hebbian learning outperforms VAE training, with HPCA generally performing better than HWTA.
Source: Neural computing & applications (Print) (2022). doi:10.1007/s00521-021-06701-4
DOI: 10.1007/s00521-021-06701-4
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA Restricted
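The HWTA rule discussed in this abstract can be sketched as a competitive update in which only the best-matching unit moves its weights toward the input. This is a minimal illustration under assumed details (the learning rate, the hard-argmax competition, and the unit/input sizes are not the paper's exact formulation):

```python
import numpy as np

def hwta_update(weights, x, lr=0.1):
    """One Hebbian Winner-Takes-All step (illustrative sketch): the unit
    most similar to the input x wins the competition, and only its
    weight vector moves toward x."""
    scores = weights @ x                            # similarity of each unit to x
    winner = int(np.argmax(scores))                 # hard competition: one winner
    weights[winner] += lr * (x - weights[winner])   # Hebbian move toward the input
    return winner

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8))   # 4 competing units, 8-dimensional inputs
before = weights.copy()
x = rng.normal(size=8)
winner = hwta_update(weights, x)
```

Repeated over many inputs, each unit's weight vector drifts toward a cluster of similar inputs, which is how the trained weights can act as feature detectors for a later classification layer.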


2022 Conference article Open Access OPEN

AIMH Lab for Trustworthy AI
Messina N., Carrara F., Coccomini D., Falchi F., Gennaro C., Amato G.
In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Trustworthy AI. Artificial Intelligence is becoming more and more pervasive in our society, controlling recommendation systems in social platforms as well as safety-critical systems like autonomous vehicles. In order to be safe and trustworthy, these systems need to be easily interpretable and transparent. On the other hand, it is important to spot fake examples forged by malicious AI generative models to fool humans (through fake news or deep-fakes) or other AI systems (through adversarial examples). This is required to enforce an ethical use of these powerful new technologies. Driven by these concerns, this paper presents three crucial research directions contributing to the study and the development of techniques for reliable, resilient, and explainable deep learning methods. Namely, we report the laboratory activities on the detection of adversarial examples, the use of attentive models as a way towards explainable deep learning, and the detection of deepfakes in social platforms.
Source: Ital-IA 2022 - Workshop su AI Responsabile ed Affidabile, Online conference, 10/02/2022

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.ital-ia2022.it Open Access


2022 Conference article Open Access OPEN

AIMH Lab for Cybersecurity
Vairo C., Coccomini D. A., Falchi F., Gennaro C., Massoli F. V., Messina N., Amato G.
In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Cybersecurity. We discuss our active research fields, their applications, and challenges. We focus on face recognition and on the detection of adversarial examples and deep fakes. We also present our activities on the detection of persuasion techniques combining image and text analysis.
Source: Ital-IA 2022 - Workshop su AI per Cybersecurity, 10/02/2022

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.ital-ia2022.it Open Access


2022 Conference article Open Access OPEN

AIMH Lab for Healthcare and Wellbeing
Di Benedetto M., Carrara F., Ciampi L., Falchi F., Gennaro C., Amato G.
In this work, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Healthcare and Wellbeing. By exploiting the advances of recent machine learning methods and the compute power of desktop and mobile platforms, we show how artificial intelligence tools can be used to improve healthcare systems at various stages of disease treatment. In particular, we show how deep neural networks can assist doctors from diagnosis (e.g., cell counting, pupil and brain analysis) to communication with patients through Augmented Reality.
Source: Ital-IA 2022 - Workshop AI per la Medicina e la Salute, Online conference, 10/02/2022

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.ital-ia2022.it Open Access


2022 Conference article Open Access OPEN

AIMH Lab for the Industry
Carrara F., Ciampi L., Di Benedetto M., Falchi F., Gennaro C., Massoli F. V., Amato G.
In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Industry. The massive digitalization affecting all the stages of product design, production, and control calls for data-driven algorithms helping in the coordination of humans, machines, and digital resources in Industry 4.0. In this context, we developed AI-based Computer-Vision technologies of general interest in the emergent digital paradigm of the fourth industrial revolution, focusing on anomaly detection and object counting for computer-assisted testing and quality control. Moreover, in the automotive sector, we explore the use of virtual worlds to develop AI systems in otherwise practically unfeasible scenarios, showing an application for accident avoidance in self-driving car AI agents.
Source: Ital-IA 2022 - Workshop su AI per l'Industria, Online conference, 10/02/2022

See at: CNR ExploRA Open Access | www.ital-ia2022.it Open Access


2022 Conference article Open Access OPEN

AIMH Lab: Smart Cameras for Public Administration
Ciampi L., Cafarelli D., Carrara F., Di Benedetto M., Falchi F., Gennaro C., Massoli F. V., Messina N., Amato G.
In this short paper, we report the activities of the Artificial Intelligence for Media and Humanities (AIMH) laboratory of the ISTI-CNR related to Public Administration. In particular, we present some AI-based public services for citizens that help achieve common goals beneficial to society, putting humans at the center. Through the automatic analysis of images gathered from city cameras, we provide AI applications ranging from smart parking and smart mobility to human activity monitoring.
Source: Ital-IA 2022 - Workshop su AI per la Pubblica Amministrazione, Online conference, 10/02/2022

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.ital-ia2022.it Open Access


2022 Contribution to book Open Access OPEN

Training convolutional neural networks with competitive hebbian learning approaches
Lagani G., Falchi F., Gennaro C., Amato G.
We explore competitive Hebbian learning strategies to train feature detectors in Convolutional Neural Networks (CNNs), without supervision. We consider variants of the Winner-Takes-All (WTA) strategy explored in previous works, i.e. k-WTA, e-soft-WTA and p-soft-WTA, performing experiments on different object recognition datasets. Results suggest that the Hebbian approaches are effective for training early feature extraction layers, or for re-training higher layers of a pre-trained network, with soft competition generally performing better than the other Hebbian approaches explored in this work. Our findings encourage a path of cooperation between neuroscience and computer science towards a deeper investigation of biologically inspired learning principles.
Source: Machine Learning, Optimization, and Data Science, edited by Nicosia G., Ojha V., La Malfa E., La Malfa G., Jansen G., Pardalos P.M., Giuffrida G., Umeton R., pp. 25–40, 2022
DOI: 10.1007/978-3-030-95467-3_2
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | ZENODO Open Access | link.springer.com Restricted | CNR ExploRA Restricted
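The soft competition variants mentioned above replace the hard winner with a softmax over unit responses, so every unit updates in proportion to how strongly it responds. A minimal sketch, assuming a temperature-controlled softmax; the exact k-WTA, e-soft-WTA, and p-soft-WTA formulations differ in how the competition weights are computed:

```python
import numpy as np

def soft_wta_update(weights, x, lr=0.1, temperature=0.5):
    """One soft-WTA step (illustrative sketch): each unit moves toward
    the input x, weighted by a softmax over its response, so stronger
    responders learn more but no unit is fully excluded."""
    scores = weights @ x
    scores = scores - scores.max()     # stabilize the exponentials
    r = np.exp(scores / temperature)
    r = r / r.sum()                    # soft competition weights, sum to 1
    weights += lr * r[:, None] * (x - weights)
    return r

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8))      # 4 competing units, 8-dimensional inputs
before = weights.copy()
x = rng.normal(size=8)
r = soft_wta_update(weights, x)
```

Lowering the temperature sharpens the competition toward hard WTA; raising it spreads the update more evenly across units.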


2022 Contribution to book Open Access OPEN

Evaluating hebbian learning in a semi-supervised setting
Lagani G., Falchi F., Gennaro C., Amato G.
We propose a semi-supervised learning strategy for deep Convolutional Neural Networks (CNNs) in which an unsupervised pre-training stage, performed using biologically inspired Hebbian learning algorithms, is followed by supervised end-to-end backprop fine-tuning. We explored two Hebbian learning rules for the unsupervised pre-training stage: soft-Winner-Takes-All (soft-WTA) and nonlinear Hebbian Principal Component Analysis (HPCA). Our approach was applied in sample efficiency scenarios, where the amount of available labeled training samples is very limited, and unsupervised pre-training is therefore beneficial. We performed experiments on CIFAR10, CIFAR100, and Tiny ImageNet datasets. Our results show that Hebbian outperforms Variational Auto-Encoder (VAE) pre-training in almost all the cases, with HPCA generally performing better than soft-WTA.
Source: Machine Learning, Optimization, and Data Science, edited by Nicosia G., Ojha V., La Malfa E., La Malfa G., Jansen G., Pardalos P.M., Giuffrida G., Umeton R., pp. 365–379, 2022
DOI: 10.1007/978-3-030-95470-3_28
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | ZENODO Open Access | link.springer.com Restricted | CNR ExploRA Restricted


2022 Journal article Open Access OPEN

An embedded toolset for human activity monitoring in critical environments
Di Benedetto M., Carrara F., Ciampi L., Falchi F., Gennaro C., Amato G.
In many working and recreational activities, there are scenarios where both individual and collective safety have to be constantly checked and properly signaled, as occurring in dangerous workplaces or during pandemic events like the recent COVID-19 disease. From wearing personal protective equipment to filling physical spaces with an adequate number of people, it is clear that a possibly automatic solution would help to check compliance with the established rules. Based on off-the-shelf, compact, and low-cost hardware, we present a deployed, real use-case embedded system capable of perceiving people's behavior and aggregations and supervising the application of a set of rules relying on a configurable plug-in framework. Working in indoor and outdoor environments, we show that our implementation of counting people aggregations, measuring their reciprocal physical distances, and checking the proper usage of protective equipment is an effective yet open framework for monitoring human activities in critical conditions.
Source: Expert systems with applications 199 (2022). doi:10.1016/j.eswa.2022.117125
DOI: 10.1016/j.eswa.2022.117125
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA Restricted


2022 Doctoral thesis Open Access OPEN

Relational Learning in computer vision
Messina N.
The increasing interest in social networks, smart cities, and Industry 4.0 is encouraging the development of techniques for processing, understanding, and organizing vast amounts of data. Recent important advances in Artificial Intelligence brought to life a subfield of Machine Learning called Deep Learning, which can automatically learn common patterns directly from raw data, without relying on manual feature selection. This framework transformed many computer science fields, like Computer Vision and Natural Language Processing, obtaining astonishing results. Nevertheless, many challenges are still open. Although deep neural networks obtained impressive results on many tasks, they cannot perform non-local processing by explicitly relating potentially interconnected visual or textual entities. This relational aspect is fundamental for capturing high-level semantic interconnections in multimedia data or understanding the relationships between spatially distant objects in an image. This thesis tackles the relational understanding problem in Deep Neural Networks, considering three different yet related tasks: Relational Content-based Image Retrieval (R-CBIR), Visual-Textual Retrieval, and the Same-Different task. We use state-of-the-art deep learning methods for relational learning, such as Relation Networks and Transformer Networks, for relating the different entities in an image or in a text.

See at: etd.adm.unipi.it Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2022 Conference article Open Access OPEN

MOBDrone: a drone video dataset for Man OverBoard Rescue
Cafarelli D., Ciampi L., Vadicamo L., Gennaro C., Berton A., Paterni M., Benvenuti C., Passera M., Falchi F.
Modern Unmanned Aerial Vehicles (UAV) equipped with cameras can play an essential role in speeding up the identification and rescue of people who have fallen overboard, i.e., man overboard (MOB). To this end, Artificial Intelligence techniques can be leveraged for the automatic understanding of visual data acquired from drones. However, detecting people at sea in aerial imagery is challenging primarily due to the lack of specialized annotated datasets for training and testing detectors for this task. To fill this gap, we introduce and publicly release the MOBDrone benchmark, a collection of more than 125K drone-view images in a marine environment under several conditions, such as different altitudes, camera shooting angles, and illumination. We manually annotated more than 180K objects, about 113K of which depict a man overboard, precisely localizing them with bounding boxes. Moreover, we conduct a thorough performance analysis of several state-of-the-art object detectors on the MOBDrone data, serving as baselines for further research.
Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 633–644, Lecce, Italia, 23-27/05/2022
DOI: 10.1007/978-3-031-06430-2_53

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA Restricted


2022 Dataset Open Access OPEN

MOBDrone: a large-scale drone-view dataset for man overboard detection
Cafarelli D., Ciampi L., Vadicamo L., Gennaro C., Berton A., Paterni M., Benvenuti C., Passera M., Falchi F.
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.

See at: ISTI Repository Open Access | CNR ExploRA | zenodo.org


2022 Journal article Open Access OPEN

Multi-camera vehicle counting using edge-AI
Ciampi L., Gennaro C., Carrara F., Falchi F., Vairo C., Amato G.
This paper presents a novel solution to automatically count vehicles in a parking lot using images captured by smart cameras. Unlike most of the literature on this task, which focuses on the analysis of single images, this paper proposes the use of multiple visual sources to monitor a wider parking area from different perspectives. The proposed multi-camera system is capable of automatically estimating the number of cars present in the entire parking lot directly on board the edge devices. It comprises an on-device deep learning-based detector that locates and counts the vehicles from the captured images and a decentralized geometric-based approach that can analyze the inter-camera shared areas and merge the data acquired by all the devices. We conducted the experimental evaluation on an extended version of the CNRPark-EXT dataset, a collection of images taken from the parking lot on the campus of the National Research Council (CNR) in Pisa, Italy. We show that our system is robust and takes advantage of the redundant information deriving from the different cameras, improving the overall performance without requiring any extra geometrical information of the monitored scene.
Source: Expert systems with applications (2022). doi:10.1016/j.eswa.2022.117929
DOI: 10.1016/j.eswa.2022.117929
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA Restricted | www.sciencedirect.com Restricted


2022 Conference article Open Access OPEN

Recurrent vision transformer for solving visual reasoning problems
Messina N., Amato G., Carrara F., Gennaro C., Falchi F.
Although convolutional neural networks (CNNs) showed remarkable results in many vision tasks, they still struggle with simple yet challenging visual reasoning problems. Inspired by the recent success of the Transformer network in computer vision, in this paper, we introduce the Recurrent Vision Transformer (RViT) model. Thanks to the impact of recurrent connections and spatial attention in reasoning tasks, this network achieves competitive results on the same-different visual reasoning problems from the SVRT dataset. The weight-sharing both in spatial and depth dimensions regularizes the model, allowing it to learn with far fewer free parameters from only 28k training samples. A comprehensive ablation study confirms the importance of a hybrid CNN + Transformer architecture and the role of the feedback connections, which iteratively refine the internal representation until a stable prediction is obtained. In the end, this study can lay the basis for a deeper understanding of the role of attention and recurrent connections in solving visual abstract reasoning tasks. The code for reproducing our results is publicly available here: https://tinyurl.com/recvit
Source: ICIAP 2022 - 21st International Conference on Image Analysis and Processing, pp. 50–61, Lecce, Italy, 23-27/05/2022
DOI: 10.1007/978-3-031-06433-3_5
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA Restricted


2021 Journal article Restricted

Re-ranking via local embeddings: A use case with permutation-based indexing and the nSimplex projection
Vadicamo L., Gennaro C., Falchi F., Chavez E., Connor R., Amato G.
Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility, being applicable to the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Permutations allow the efficient selection of a candidate set of results, but to achieve high recall or precision this set typically has to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being recalled, along with many expensive distance calculations. To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces which possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower bounded and upper bounded by the Euclidean distance of objects sharing the same set of references. Our approach is particularly advantageous for extensive databases or expensive metric functions. We reuse the distances computed in the first (permutation) stage, and hence the memory footprint of the index is not increased. An extensive experimental evaluation of our approach is presented, demonstrating excellent results even on a set of hundreds of millions of objects.
Source: Information systems (Oxf.) 95 (2021). doi:10.1016/j.is.2020.101506
DOI: 10.1016/j.is.2020.101506
Project(s): AI4EU via OpenAIRE

See at: Information Systems Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted
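The permutation-based filter-and-refine scheme this abstract builds on can be sketched as follows: candidates are first selected with a cheap permutation distance (here the Spearman footrule over pivot rankings) and only those are re-ranked with the true metric. This sketch omits the nSimplex projection itself; the pivot counts, candidate-set size, and Euclidean data are all assumptions for illustration.

```python
import numpy as np

def permutation(point, pivots):
    """Rank of each pivot (reference object) by distance from the point."""
    d = np.linalg.norm(pivots - point, axis=1)
    return np.argsort(np.argsort(d))   # position of each pivot in the ranking

def footrule(p, q):
    """Spearman footrule distance between two permutations."""
    return int(np.abs(p - q).sum())

def search(query, db, pivots, n_candidates=10, k=3):
    """Filter-and-refine: a cheap permutation distance selects candidates,
    then the true metric re-ranks only those candidates."""
    pq = permutation(query, pivots)
    perm_d = np.array([footrule(pq, permutation(x, pivots)) for x in db])
    cand = np.argsort(perm_d)[:n_candidates]            # filter step
    true_d = np.linalg.norm(db[cand] - query, axis=1)   # refine step
    return cand[np.argsort(true_d)[:k]]

rng = np.random.default_rng(1)
db = rng.normal(size=(200, 16))        # toy database of 200 objects
pivots = rng.normal(size=(8, 16))      # 8 reference objects
query = rng.normal(size=16)
top = search(query, db, pivots)
```

The paper's contribution replaces the refine step's full metric evaluations with cheaper nSimplex bounds derived from the pivot distances already computed in the filter step.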


2021 Journal article Open Access OPEN

Solving the same-different task with convolutional neural networks
Messina N., Amato G., Carrara F., Gennaro C., Falchi F.
Deep learning has demonstrated major abilities in solving many kinds of real-world problems in the computer vision literature. However, deep models still struggle with simple reasoning tasks that humans consider easy to solve. In this work, we probe current state-of-the-art convolutional neural networks on a difficult set of tasks known as the same-different problems. All the problems require the same prerequisite to be solved correctly: understanding if two random shapes inside the same image are the same or not. With the experiments carried out in this work, we demonstrate that residual connections, and more generally the skip connections, seem to have only a marginal impact on the learning of the proposed problems. In particular, we experiment with DenseNets, and we examine the contribution of residual and recurrent connections in already tested architectures, ResNet-18 and CorNet-S respectively. Our experiments show that older feed-forward networks, AlexNet and VGG, are almost unable to learn the proposed problems, except in some specific scenarios. We show that recently introduced architectures can converge even in the cases where the important parts of their architecture are removed. We finally carry out some zero-shot generalization tests, and we discover that in these scenarios residual and recurrent connections can have a stronger impact on the overall test accuracy. On four difficult problems from the SVRT dataset, we reach state-of-the-art results with respect to the previous approaches, obtaining super-human performance on three of the four problems.
Source: Pattern recognition letters 143 (2021): 75–80. doi:10.1016/j.patrec.2020.12.019
DOI: 10.1016/j.patrec.2020.12.019
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Pattern Recognition Letters Open Access | ISTI Repository Open Access | Pattern Recognition Letters Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted


2021 Conference article Open Access OPEN

Defending Neural ODE Image Classifiers from Adversarial Attacks with Tolerance Randomization
Carrara F., Caldelli R., Falchi F., Amato G.
Deep learned models are now largely adopted in different fields, and they generally provide superior performance with respect to classical signal-based approaches. Notwithstanding this, their actual reliability when working in an unprotected environment is still far from proven. In this work, we consider a novel deep neural network architecture, named Neural Ordinary Differential Equations (N-ODE), that is getting particular attention due to an attractive property--a test-time tunable trade-off between accuracy and efficiency. This paper analyzes the robustness of N-ODE image classifiers when faced with a strong adversarial attack, and how their effectiveness changes when varying such a tunable trade-off. We show that adversarial robustness is increased when the networks operate in different tolerance regimes during test time and training time. On this basis, we propose a novel adversarial detection strategy for N-ODE nets based on the randomization of the adaptive ODE solver tolerance. Our evaluation performed on standard image classification benchmarks shows that our detection technique provides high rejection of adversarial examples while maintaining most of the original samples under white-box attacks and zero-knowledge adversaries.
Source: International Conference on Pattern Recognition ICPR 2021, pp. 425–438, Milano (Virtuale), 10-15/01/2021
DOI: 10.1007/978-3-030-68780-9_35
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA Restricted


2021 Journal article Open Access OPEN

TweepFake: about detecting deepfake tweets
Fagni T., Falchi F., Gambini M., Martella A., Tesconi M.
The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to enhance social bots that will have the ability to write plausible deepfake messages, hoping to contaminate public debate. To prevent this, it is crucial to develop deepfake social media message detection systems. However, to the best of our knowledge no one has ever addressed the detection of machine-generated texts on social networks like Twitter or Facebook. With the aim of helping the research in this detection field, we collected the first dataset of real deepfake tweets, TweepFake. It is real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to have an overall balanced dataset of 25,572 tweets (half human- and half bot-generated). The dataset is publicly available on Kaggle. Lastly, we evaluated 13 deepfake text detection methods (based on various state-of-the-art approaches) to both demonstrate the challenges that TweepFake poses and create a solid baseline of detection techniques. We hope that TweepFake can offer the opportunity to tackle deepfake detection on social media messages as well.
Source: PloS one 16 (2021). doi:10.1371/journal.pone.0251415
DOI: 10.1371/journal.pone.0251415
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: journals.plos.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access | ZENODO Open Access


2021 Journal article Open Access OPEN

The VISIONE video search system: exploiting off-the-shelf text search engines for large-scale video retrieval
Amato G., Bolettieri P., Carrara F., Debole F., Falchi F., Gennaro C., Vadicamo L., Vairo C.
This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined together to express complex queries and meet users' needs. The peculiarity of our approach is that we encode all information extracted from the keyframes, such as visual deep features, tags, color and object locations, using a convenient textual encoding that is indexed in a single text retrieval engine. This offers great flexibility when results corresponding to various parts of the query (visual, text and locations) need to be merged. In addition, we report an extensive analysis of the retrieval performance of the system, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This allowed us to fine-tune the system by choosing the optimal parameters and strategies from those we tested.
Source: Journal of Imaging 7 (2021). doi:10.3390/jimaging7050076
DOI: 10.3390/jimaging7050076

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.mdpi.com Open Access


2021 Conference article Open Access OPEN

Transformer reasoning network for image-text matching and retrieval
Messina N., Falchi F., Esuli A., Amato G.
Image-text matching is an interesting and fascinating task in modern AI research. Despite the evolution of deep-learning-based image and text processing systems, multi-modal matching remains a challenging problem. In this work, we consider the problem of accurate image-text matching for the task of multi-modal large-scale information retrieval. State-of-the-art results in image-text matching are achieved by jointly processing image and text features from the two different pipelines, usually using mutual attention mechanisms. However, this prevents the extraction of the separate visual and textual features needed for later indexing steps in large-scale retrieval systems. In this regard, we introduce the Transformer Encoder Reasoning Network (TERN), an architecture built upon one of the modern relationship-aware self-attentive architectures, the Transformer Encoder (TE). This architecture is able to separately reason on the two different modalities and to enforce a final common abstract concept space by sharing the weights of the deeper transformer layers. Thanks to this design, the implemented network is able to produce compact and very rich visual and textual features available for the successive indexing step. Experiments are conducted on the MS-COCO dataset, and we evaluate the results using a discounted cumulative gain metric with relevance computed by exploiting caption similarities, in order to assess possibly non-exact but relevant search results. We demonstrate that on this metric we are able to achieve state-of-the-art results in the image retrieval task. Our code is freely available at https://github.com/mesnico/TERN
Source: ICPR 2021 - International Conference on Pattern Recognition, pp. 5222–5229, Online conference, 10-15/01/2021
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: link.springer.com Open Access | ISTI Repository Open Access | CNR ExploRA Open Access