2021 Journal article Open Access OPEN

Solving the same-different task with convolutional neural networks
Messina N., Amato G. Carrara F., Gennaro C., Falchi F.
Deep learning has demonstrated major abilities in solving many kinds of real-world problems in the computer vision literature. However, deep models are still strained by simple reasoning tasks that humans consider easy to solve. In this work, we probe current state-of-the-art convolutional neural networks on a difficult set of tasks known as the same-different problems. All the problems require the same prerequisite to be solved correctly: understanding whether two random shapes inside the same image are the same or not. With the experiments carried out in this work, we demonstrate that residual connections, and more generally skip connections, seem to have only a marginal impact on the learning of the proposed problems. In particular, we experiment with DenseNets, and we examine the contribution of residual and recurrent connections in already tested architectures, ResNet-18 and CorNet-S respectively. Our experiments show that older feed-forward networks, AlexNet and VGG, are almost unable to learn the proposed problems, except in some specific scenarios. We show that recently introduced architectures can converge even when important parts of their architecture are removed. We finally carry out some zero-shot generalization tests, and we discover that in these scenarios residual and recurrent connections can have a stronger impact on the overall test accuracy. On four difficult problems from the SVRT dataset, we reach state-of-the-art results with respect to previous approaches, obtaining super-human performance on three of the four problems. Source: Pattern Recognition Letters 143 (2021): 75–80. doi:10.1016/j.patrec.2020.12.019
DOI: 10.1016/j.patrec.2020.12.019
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Pattern Recognition Letters Open Access | ISTI Repository Open Access | Pattern Recognition Letters Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted
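
The entry above evaluates standard CNN architectures on the same-different problems. As a rough illustration of how such an experiment can be set up, the following sketch fine-tunes a ResNet-18 as a binary same/different classifier on SVRT-style images; the architecture choice and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: fine-tune a ResNet-18 to answer the same-different question
# (are the two shapes in the image identical?) as a binary classification task.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None)          # or a pretrained checkpoint
model.fc = nn.Linear(model.fc.in_features, 2)  # classes: {different, same}

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """images: (B, 3, H, W) SVRT-style renderings; labels: 0=different, 1=same."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```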


2021 Conference article Open Access OPEN

Defending Neural ODE Image Classifiers from Adversarial Attacks with Tolerance Randomization
Carrara F., Caldelli R., Falchi F., Amato G.
Deep learned models are now largely adopted in different fields, and they generally provide superior performance with respect to classical signal-based approaches. Notwithstanding this, their actual reliability when working in an unprotected environment is still far from being proven. In this work, we consider a novel deep neural network architecture, named Neural Ordinary Differential Equations (N-ODE), that is getting particular attention due to an attractive property: a test-time tunable trade-off between accuracy and efficiency. This paper analyzes the robustness of N-ODE image classifiers when facing a strong adversarial attack and how their effectiveness changes when varying such a tunable trade-off. We show that adversarial robustness is increased when the networks operate in different tolerance regimes during test time and training time. On this basis, we propose a novel adversarial detection strategy for N-ODE nets based on the randomization of the adaptive ODE solver tolerance. Our evaluation, performed on standard image classification benchmarks, shows that our detection technique provides high rejection of adversarial examples while maintaining most of the original samples under white-box attacks and zero-knowledge adversaries. Source: International Conference on Pattern Recognition ICPR 2021, pp. 425–438, Milan (virtual), 10-15/01/2021
DOI: 10.1007/978-3-030-68780-9_35
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA Restricted
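
The entry above defends Neural ODE classifiers by randomizing the adaptive solver tolerance at test time. Below is a minimal sketch of the idea, assuming a toy N-ODE classifier built with the torchdiffeq package; the network sizes, tolerance values and rejection rule are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch: an ODE-based classifier whose solver tolerance can be varied
# at test time; disagreement between predictions obtained under different tolerances
# is used as a signal to reject suspected adversarial inputs.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumes the torchdiffeq package is installed

class ODEFunc(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
    def forward(self, t, h):
        return self.net(h)

class ODEClassifier(nn.Module):
    def __init__(self, in_dim=784, hid=64, classes=10):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid)
        self.func = ODEFunc(hid)
        self.head = nn.Linear(hid, classes)
    def forward(self, x, tol=1e-3):
        h0 = self.enc(x)
        t = torch.tensor([0.0, 1.0])
        h1 = odeint(self.func, h0, t, rtol=tol, atol=tol)[-1]  # hidden state at t=1
        return self.head(h1)

def randomized_tolerance_predict(model, x, tols=(1e-1, 1e-3, 1e-5)):
    """Predict under several solver tolerances; flag the input if predictions disagree."""
    preds = [model(x, tol=t).argmax(dim=-1) for t in tols]
    agree = all(torch.equal(preds[0], p) for p in preds[1:])
    return preds[0], agree  # agree=False -> possible adversarial example
```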


2021 Journal article Open Access OPEN

The VISIONE video search system: exploiting off-the-shelf text search engines for large-scale video retrieval
Amato G., Bolettieri P., Carrara F., Debole F., Falchi F., Gennaro C., Vadicamo L., Vairo C.
This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined to express complex queries and meet users' needs. The peculiarity of our approach is that we encode all information extracted from the keyframes, such as visual deep features, tags, color and object locations, using a convenient textual encoding that is indexed in a single text retrieval engine. This offers great flexibility when results corresponding to various parts of the query (visual, text and locations) need to be merged. In addition, we report an extensive analysis of the retrieval performance of the system, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This allowed us to fine-tune the system by choosing the optimal parameters and strategies from those we tested. Source: Journal of Imaging 7 (2021). doi:10.3390/jimaging7050076
DOI: 10.3390/jimaging7050076

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.mdpi.com Open Access
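
The entry above describes encoding tags, colors and object locations as text so that a single full-text engine can index and search them. The sketch below illustrates one way such a spatial-to-text encoding could look, mapping detected bounding boxes to grid-cell tokens; the token scheme and grid size are assumptions for illustration, not VISIONE's actual encoding.

```python
# Minimal sketch of an "everything as text" encoding: object detections become
# grid-cell tokens, so spatial queries reduce to standard full-text term matching.
def location_tokens(label, box, grid=7):
    """box = (x0, y0, x1, y1) in normalized [0,1] coordinates; one token per covered cell."""
    x0, y0, x1, y1 = box
    cols = range(int(x0 * grid), int(x1 * grid) + 1)
    rows = range(int(y0 * grid), int(y1 * grid) + 1)
    return [f"{label}_r{r}c{c}" for r in rows for c in cols]

# A keyframe document becomes a plain "bag of tokens" indexable by any text engine:
doc = " ".join(location_tokens("dog", (0.1, 0.5, 0.4, 0.9)) + ["tag_outdoor", "tag_grass"])
# e.g. 'dog_r3c0 dog_r3c1 dog_r3c2 ... tag_outdoor tag_grass'
```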


2021 Journal article Open Access OPEN

MEYE: Web-app for translational and real-time pupillometry
Mazziotti R., Carrara F., Viglione A., Lupori L., Lo Verde L., Benedetto A., Ricci G., Sagona G., Amato G., Pizzorusso T.
Pupil dynamics alterations have been found in patients affected by a variety of neuropsychiatric conditions, including autism. Studies in mouse models have used pupillometry for phenotypic assessment and as a proxy for arousal. Both in mice and humans, pupillometry is non-invasive and allows for longitudinal experiments supporting temporal specificity; however, its measurement requires dedicated setups. Here, we introduce a Convolutional Neural Network that performs online pupillometry in both mice and humans in a web app format. This solution dramatically simplifies the usage of the tool for non-specialist and non-technical operators. Because a modern web browser is the only software requirement, this choice is of great interest given its ease of deployment and reduced set-up time. The tested model performances indicate that the tool is sensitive enough to detect both locomotor-induced and stimulus-evoked pupillary changes, and its output is comparable with state-of-the-art commercial devices. Source: eNeuro 8 (2021). doi:10.1523/ENEURO.0122-21.2021
DOI: 10.1523/eneuro.0122-21.2021
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.eneuro.org Open Access
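
The entry above describes a CNN that performs pupillometry from camera frames. As a small, hedged illustration of the measurement step only (the network and web app themselves are not shown), the sketch below turns a predicted pupil segmentation mask into a scalar pupil-size estimate; the thresholding and equivalent-circle assumption are mine.

```python
# Illustrative sketch: derive a pupil-size measurement from a segmentation mask.
import numpy as np

def pupil_size_from_mask(mask, threshold=0.5):
    """mask: (H, W) array of per-pixel pupil probabilities from a segmentation CNN.
    Returns pupil area in pixels and the diameter of a circle with the same area."""
    binary = mask > threshold
    area = float(binary.sum())
    diameter = 2.0 * np.sqrt(area / np.pi)
    return area, diameter
```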


2020 Conference article Open Access OPEN

Scalar Quantization-Based Text Encoding for Large Scale Image Retrieval
Amato G., Carrara F., Falchi F., Gennaro C., Rabitti F., Vadicamo L.
The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. This paper presents an approach to transform neural network features into text codes suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea is to provide a transformation of neural network features with the twofold aim of promoting sparsity without the need for unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-the-art descriptor for image retrieval. An extensive experimental evaluation conducted on standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes. Source: 28th Italian Symposium on Advanced Database Systems, pp. 258–265, virtual (online) due to COVID-19, 21-24/06/2020

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access
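
The entry above turns dense neural features into text codes that a full-text engine can index. The sketch below illustrates one common way scalar quantization can produce such a surrogate text: each dimension becomes a term repeated proportionally to its quantized value, so zero components yield no terms (sparsity) and term frequencies approximate the feature. Term names and the quantization factor are illustrative assumptions.

```python
# Hedged sketch of a scalar-quantization surrogate-text encoding for retrieval.
import numpy as np

def feature_to_surrogate_text(feature, factor=30):
    """feature: 1-D non-negative vector (e.g., a rectified R-MAC descriptor)."""
    terms = []
    for i, v in enumerate(feature):
        reps = int(round(v * factor))      # scalar quantization of component i
        terms.extend([f"f{i}"] * reps)     # term 'f<i>' repeated 'reps' times
    return " ".join(terms)

surrogate = feature_to_surrogate_text(np.array([0.00, 0.12, 0.41]))
# -> 'f1 f1 f1 f1 f2 f2 ...': the resulting document can be indexed and scored
# by Elasticsearch (or any text engine) like ordinary text.
```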


2020 Journal article Open Access OPEN

Large-scale instance-level image retrieval
Amato G., Carrara F., Falchi F., Gennaro C., Vadicamo L.
The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. Nevertheless, their usage in large-scale Web applications of content-based retrieval is still challenged by their high dimensionality. To overcome this issue, some image retrieval systems employ the product quantization method to learn a large-scale visual dictionary from a training set of global neural network features. These approaches are implemented in main memory, preventing their usage in big-data applications. The contribution of the work is mainly devoted to investigating some approaches to transform neural network features into text forms suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea of our approaches relies on a transformation of neural network features with the twofold aim of promoting sparsity without the need for unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-the-art descriptor for image retrieval. Its effectiveness has been proved through several instance-level retrieval benchmarks. An extensive experimental evaluation conducted on the standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes. Source: Information Processing & Management 57 (2020). doi:10.1016/j.ipm.2019.102100
DOI: 10.1016/j.ipm.2019.102100
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | Information Processing & Management Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted


2020 Conference article Open Access OPEN

Continuous ODE-defined image features for adaptive retrieval
Carrara F., Amato G., Falchi F., Gennaro C.
In the last years, content-based image retrieval has largely benefited from representations extracted from deeper and more complex convolutional neural networks, which became more effective but also more computationally demanding. Despite existing hardware acceleration, query processing times may be easily saturated by deep feature extraction in high-throughput or real-time embedded scenarios, and usually, a trade-off between efficiency and effectiveness has to be accepted. In this work, we experiment with the recently proposed continuous neural networks defined by parametric ordinary differential equations, dubbed ODE-Nets, for adaptive extraction of image representations. Given the continuous evolution of the network hidden state, we propose to approximate the exact feature extraction by taking a previous "near-in-time" hidden state as features, with a reduced computational cost. To understand the potential and the limits of this approach, we also evaluate an ODE-only architecture in which we minimize the number of classical layers in order to delegate most of the representation learning process, and thus the feature extraction process, to the continuous part of the model. Preliminary experiments on standard benchmarks show that we are able to dynamically control the trade-off between efficiency and effectiveness of feature extraction at inference time by controlling the evolution of the continuous hidden state. Although ODE-only networks provide the best fine-grained control on the effectiveness-efficiency trade-off, we observed that mixed architectures perform better or comparably to standard residual nets in both the image classification and retrieval setups, while using fewer parameters and retaining the controllability of the trade-off. Source: ICMR '20 - International Conference on Multimedia Retrieval, pp. 198–206, Dublin, Ireland, 8-11 June, 2020
DOI: 10.1145/3372278.3390690
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | CNR ExploRA Restricted
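
The entry above proposes taking a "near-in-time" ODE hidden state as a cheaper approximation of the final feature. A minimal sketch of this early-stopping idea with torchdiffeq follows; the stopping time and tolerances are illustrative assumptions.

```python
# Sketch of adaptive ODE-defined feature extraction: stop the integration early
# and use that hidden state as the image representation, trading effectiveness
# for a reduced computational cost.
import torch
from torchdiffeq import odeint

def extract_features(ode_func, h0, t_stop=0.6):
    """ode_func: dynamics f(t, h); h0: initial hidden state from the stem layers.
    t_stop < 1.0 yields a cheaper, approximate feature; t_stop = 1.0 the exact one."""
    t = torch.tensor([0.0, t_stop])
    return odeint(ode_func, h0, t, rtol=1e-3, atol=1e-3)[-1]
```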


2020 Master thesis Unknown

A Computer Vision Approach for Pass Detection on Soccer Broadcast Video
Sorano D.
The annotation of the events that occur during a soccer match is a primary issue for companies that produce data for analytical purposes. Nowadays, the annotation is mostly manual, i.e., human operators use proprietary software to annotate the events. This thesis aims to automate part of the annotation process with a computer vision approach that can recognize one of the most frequent events in soccer: the pass. To achieve this purpose, we combine soccer broadcast videos and event data. Broadcast videos are the input of the models, while the event data define the labels of the videos. We propose a model that combines the pre-trained ResNet18 model, applied to extract features from single frames, and a bidirectional LSTM model that analyzes the temporal evolution of the extracted features. Moreover, we use the real-time object detection method YOLO to extract the positional information of the ball and the players inside each frame. This information is concatenated to the features extracted by the ResNet18 model and used as input to the bidirectional LSTM. Our results show a significant improvement in the accuracy of pass detection with respect to baseline classifiers applied to the same task, highlighting that our approach is a first step towards the automation of event annotation in soccer.
Project(s): SoBigData via OpenAIRE

See at: etd.adm.unipi.it | CNR ExploRA
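
The entry above combines per-frame ResNet18 features, YOLO-derived positional information and a bidirectional LSTM. Below is a hedged structural sketch of such a pipeline; feature dimensions, the positional-vector size and the clip-level classification choice are assumptions, not the thesis' exact configuration.

```python
# Hedged sketch: per-frame CNN features concatenated with detector positions,
# fed to a bidirectional LSTM that labels the clip as pass / no pass.
import torch
import torch.nn as nn
from torchvision import models

class PassClassifier(nn.Module):
    def __init__(self, pos_dim=20, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc head
        self.lstm = nn.LSTM(512 + pos_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # pass / no pass

    def forward(self, frames, positions):
        # frames: (B, T, 3, 224, 224); positions: (B, T, pos_dim) from a detector (e.g. YOLO)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1).view(b, t, 512)
        out, _ = self.lstm(torch.cat([feats, positions], dim=-1))
        return self.head(out[:, -1])  # classify the clip from the last time step
```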


2020 Report Open Access OPEN

Automatic Pass Annotation from Soccer Video Streams Based on Object Detection and LSTM
Sorano D., Carrara F., Cintia P., Falchi F., Pappalardo L.
Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of data that describe all the spatio-temporal events that occur in each match. These events (e.g., passes, shots, fouls) are collected by human operators manually, constituting a considerable cost for data providers in terms of time and economic resources. In this paper, we describe PassNet, a method to recognize the most frequent event in soccer, i.e., passes, from video streams. Our model combines a set of artificial neural networks that perform feature extraction from video streams, object detection to identify the positions of the ball and the players, and classification of frame sequences as passes or not passes. We test PassNet on different scenarios, depending on the similarity of conditions to the match used for training. Our results show good classification results and a significant improvement in the accuracy of pass detection with respect to baseline classifiers, even when the video conditions of the test and training matches are considerably different. PassNet is the first step towards an automated event annotation system that may cut down the time and costs of event annotation, enabling data collection for minor and non-professional divisions, youth leagues and, in general, competitions whose matches are not currently annotated by data providers. Source: Research report, H2020 SoBigData++, 871042, 2020
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2020 Journal article Open Access OPEN

Detection of Face Recognition Adversarial Attacks
Massoli F. V., Carrara F., Amato G., Falchi F.
Deep Learning methods have become state-of-the-art for solving tasks such as Face Recognition (FR). Unfortunately, despite their success, it has been pointed out that these learning models are exposed to adversarial inputs - images to which an amount of noise imperceptible to humans is added to maliciously fool a neural network - thus limiting their adoption in sensitive real-world applications. While it is true that an enormous effort has been spent to train robust models against this type of threat, adversarial detection techniques have recently started to draw attention within the scientific community. The advantage of using a detection approach is that it does not require re-training any model; thus, it can be added to any system. In this context, we present our work on adversarial detection in forensics, mainly focused on detecting attacks against FR systems in which the learning model is typically used only as a feature extractor. Thus, training a more robust classifier might not be enough to counteract the adversarial threats. In this frame, the contribution of our work is four-fold: (i) we test our proposed adversarial detection approach against classification attacks, i.e., adversarial samples crafted to fool an FR neural network acting as a classifier; (ii) using a k-Nearest Neighbor (k-NN) algorithm as a guide, we generate deep features attacks against an FR system based on a neural network acting as a feature extractor, followed by a similarity-based procedure which returns the query identity; (iii) we use the deep features attacks to fool an FR system on the 1:1 face verification task, and we show their superior effectiveness with respect to classification attacks in evading such type of system; (iv) we use the detectors trained on the classification attacks to detect the deep features attacks, thus showing that such an approach is generalizable to different classes of attacks. Source: Computer Vision and Image Understanding (Print) 202 (2020). doi:10.1016/j.cviu.2020.103103
DOI: 10.1016/j.cviu.2020.103103
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Computer Vision and Image Understanding Open Access | ISTI Repository Open Access | Computer Vision and Image Understanding Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted
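
The entry above attacks FR systems used as feature extractors on the 1:1 verification task. As a small, hedged illustration of that verification setting (the threshold and normalization are assumptions, not the paper's protocol), the sketch below declares two faces the same identity when the cosine similarity of their deep features exceeds a threshold.

```python
# Illustrative sketch of 1:1 face verification on deep features.
import torch
import torch.nn.functional as F

def verify(feature_extractor, face_a, face_b, threshold=0.5):
    fa = F.normalize(feature_extractor(face_a), dim=-1)
    fb = F.normalize(feature_extractor(face_b), dim=-1)
    similarity = (fa * fb).sum(dim=-1)   # cosine similarity of the deep features
    return similarity > threshold        # True -> same identity
```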


2020 Journal article Open Access OPEN

Learning accurate personal protective equipment detection from virtual worlds
Di Benedetto M., Carrara F., Meloni E., Amato G., Falchi F., Gennaro C.
Deep learning has achieved impressive results in many machine learning tasks such as image recognition and computer vision. Its applicability to supervised problems is however constrained by the availability of high-quality training data consisting of large numbers of human-annotated examples (e.g. millions). To overcome this problem, the AI world has recently been increasingly exploiting artificially generated images or video sequences using realistic photo-rendering engines such as those used in entertainment applications. In this way, large sets of training images can be easily created to train deep learning algorithms. In this paper, we generated photo-realistic synthetic image sets to train deep learning models to recognize the correct use of personal safety equipment (e.g., worker safety helmets, high-visibility vests, ear protection devices) during at-risk work activities. Then, we performed domain adaptation to real-world images using a very small set of real images. We demonstrated that training with the generated synthetic training set together with the domain adaptation phase is an effective solution for applications where no real training set is available. Source: Multimedia Tools and Applications (2020). doi:10.1007/s11042-020-09597-9
DOI: 10.1007/s11042-020-09597-9
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | Multimedia Tools and Applications Restricted | CNR ExploRA Restricted
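
The entry above trains on abundant synthetic images and then adapts the model on a very small real-world set. A minimal sketch of this two-phase recipe follows; the optimizer, learning rates and epoch counts are assumptions, not the paper's settings.

```python
# Hedged sketch: phase 1 trains on synthetic data, phase 2 fine-tunes (domain
# adaptation) on a small set of real images with a lower learning rate.
import torch

def train(model, loader, epochs, lr, criterion=torch.nn.CrossEntropyLoss()):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            criterion(model(images), labels).backward()
            opt.step()

# Phase 1: large synthetic set rendered by a game-like engine.
# train(model, synthetic_loader, epochs=30, lr=1e-2)
# Phase 2: domain adaptation on a very small real-world set.
# train(model, real_loader, epochs=5, lr=1e-4)
```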


2020 Conference article Open Access OPEN

NoR-VDPNet: a no-reference high dynamic range quality metric trained on HDR-VDP 2
Banterle F., Artusi A., Moreo A., Carrara F.
HDR-VDP 2 has been convincingly shown to be a reliable metric for image quality assessment, and it is currently playing a remarkable role in the evaluation of complex image processing algorithms. However, HDR-VDP 2 is known to be computationally expensive (both in terms of time and memory) and is constrained to the availability of a ground-truth image (the so-called reference) against which the quality of a processed image is quantified. These aspects impose severe limitations on the applicability of HDR-VDP 2 to real-world scenarios involving large quantities of data or requiring real-time responses. To address these issues, we propose the Deep No-Reference Quality Metric (NoR-VDPNet), a deep-learning approach that learns to predict the global image quality feature (i.e., the mean-opinion-score index Q) that HDR-VDP 2 computes. NoR-VDPNet is no-reference (i.e., it operates without a ground-truth reference) and its computational cost is substantially lower when compared to HDR-VDP 2 (by more than an order of magnitude). We demonstrate the performance of NoR-VDPNet in a variety of scenarios, including the optimization of the parameters of a denoiser and JPEG-XT. Source: IEEE International Conference on Image Processing (ICIP 2020), pp. 126–130, Abu Dhabi, United Arab Emirates, 25/10/2020-28/10/2020
DOI: 10.1109/icip40778.2020.9191202
Project(s): EVOCATION via OpenAIRE, ENCORE via OpenAIRE

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA Restricted | xplorestaging.ieee.org Restricted
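
The entry above trains a network to regress, from the distorted image alone, the quality index Q that HDR-VDP 2 would compute against a reference. The sketch below shows the shape of such a learning setup; the tiny CNN and the MSE objective are assumptions for illustration, not the actual NoR-VDPNet architecture.

```python
# Hedged sketch: a small CNN regressor trained to predict HDR-VDP 2's Q score,
# so no reference image is needed at test time.
import torch
import torch.nn as nn

class QRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)  # predicted Q

    def forward(self, distorted):
        return self.head(self.features(distorted)).squeeze(-1)

# Training targets: Q scores precomputed with HDR-VDP 2 on (reference, distorted) pairs.
# loss = nn.functional.mse_loss(model(distorted_batch), q_targets)
```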


2020 Conference article Open Access OPEN

Learning distance estimators from pivoted embeddings of metric objects
Carrara F., Gennaro C., Falchi F., Amato G.
Efficient indexing and retrieval in generic metric spaces often translate into the search for approximate methods that can retrieve samples relevant to a query while performing the fewest possible distance computations. To this end, when indexing and fulfilling queries, distances are computed and stored only against a small set of reference points (also referred to as pivots) and then adopted in geometrical rules to estimate real distances and include or exclude elements from the result set. In this paper, we propose to learn a regression model that estimates the distance between a pair of metric objects starting from their distances to a set of reference objects. We explore architectural hyper-parameters and compare with the state-of-the-art geometrical method based on the n-simplex projection. Preliminary results show that our model provides a comparable or slightly degraded performance while being more efficient and applicable to generic metric spaces. Source: SISAP 2020: the 13th International Conference on Similarity Search and Applications, pp. 361–368, Copenhagen, Denmark (virtual), 30/09/2020 - 02/10/2020
DOI: 10.1007/978-3-030-60936-8_28
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | link.springer.com Restricted | CNR ExploRA Restricted
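
The entry above learns to estimate the distance between two objects from their distances to a fixed set of pivots. A minimal sketch of such a regressor follows; the layer sizes, the Softplus output and the MSE training objective are assumptions, not the paper's exact hyper-parameters.

```python
# Hedged sketch: an MLP that predicts d(x, y) from the pivot-distance vectors of x and y.
import torch
import torch.nn as nn

class PivotDistanceEstimator(nn.Module):
    def __init__(self, n_pivots, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_pivots, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # distances are non-negative
        )

    def forward(self, dist_x_to_pivots, dist_y_to_pivots):
        # Both inputs: (B, n_pivots) precomputed distances to the pivots.
        return self.net(torch.cat([dist_x_to_pivots, dist_y_to_pivots], dim=-1)).squeeze(-1)

# Trained e.g. with MSE against true distances computed offline on a sample of pairs.
```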


2020 Contribution to book Open Access OPEN

Preface - SISAP 2020
Satoh S., Vadicamo L., Zimek A., Carrara F., Bartolini I., Aumüller M., Jónsson B. Þór, Pagh R.
Preface of Volume 12440 LNCS, 2020, pages v-vi, 13th International Conference on Similarity Search and Applications, SISAP 2020. Source: Similarity Search and Applications, pp. v–vi. New York: Springer Science and Business Media, 2020
DOI: 10.1007/978-3-030-60936-8

See at: link.springer.com Open Access | CNR ExploRA Open Access | link.springer.com Restricted


2020 Report Open Access OPEN

AIMH research activities 2020
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.

See at: ISTI Repository Open Access | CNR ExploRA Open Access


2019 Conference article Open Access OPEN

An Image Retrieval System for Video
Bolettieri P., Carrara F., Debole F., Falchi F., Gennaro C., Vadicamo L., Vairo C.
Since the 1970s, Content-Based Image Indexing and Retrieval (CBIR) has been an active research area. Nowadays, the rapid increase of video data has paved the way for advances in many different communities towards Content-Based Video Indexing and Retrieval (CBVIR). However, greater attention needs to be devoted to the development of effective tools for video search and browsing. In this paper, we present VISIONE, a system for large-scale video retrieval. The system integrates several content-based analysis and retrieval modules, including a keyword search, a spatial object-based search, and a visual similarity search. From the tests carried out by users when they needed to find as many correct examples as possible, the similarity search proved to be the most promising option. Our implementation is based on state-of-the-art deep learning approaches for content analysis and leverages highly efficient indexing techniques to ensure scalability. Specifically, we encode all the visual and textual descriptors extracted from the videos into (surrogate) textual representations that are then efficiently indexed and searched using an off-the-shelf text search engine with similarity functions. Source: International Conference on Similarity Search and Applications (SISAP), pp. 332–339, Newark, NJ, USA, 2-4/10/2019
DOI: 10.1007/978-3-030-32047-8_29

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | link.springer.com Restricted | CNR ExploRA Restricted | rd.springer.com Restricted


2019 Software Unknown

VISIONE Content-Based Video Retrieval System, VBS 2019
Amato G., Bolettieri P., Carrara F., Debole F., Falchi F., Gennaro C., Vadicamo L., Vairo C.
VISIONE is a content-based video retrieval system that participated in VBS for the very first time in 2019. It is mainly based on state-of-the-art deep learning approaches for visual content analysis and exploits highly efficient indexing techniques to ensure scalability. The system supports query by scene tag, query by object location, query by color sketch, and visual similarity search.

See at: bilioso.isti.cnr.it | CNR ExploRA


2019 Conference article Open Access OPEN

Adversarial Examples Detection in Features Distance Spaces
Carrara F., Becarelli R., Caldelli R., Falchi F., Amato G.
Maliciously manipulated inputs for attacking machine learning methods -- in particular deep neural networks -- are emerging as a relevant issue for the security of recent artificial intelligence technologies, especially in computer vision. In this paper, we focus on attacks targeting image classifiers implemented with deep neural networks, and we propose a method for detecting adversarial images which focuses on the trajectory of internal representations (i.e., hidden-layer neuron activations, also known as deep features) from the very first up to the last. We argue that the representations of adversarial inputs follow a different evolution with respect to genuine inputs, and we define a distance-based embedding of features to efficiently encode this information. We train an LSTM network that analyzes the sequence of deep features embedded in a distance space to detect adversarial examples. The results of our preliminary experiments are encouraging: our detection scheme is able to detect adversarial inputs targeted at the ResNet-50 classifier pre-trained on the ILSVRC'12 dataset and generated by a variety of crafting algorithms. Source: ECCV: European Conference on Computer Vision, pp. 313–327, Munich, Germany, 8-14 September 2018
DOI: 10.1007/978-3-030-11012-3_26

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | link.springer.com Restricted | openaccess.thecvf.com Restricted | CNR ExploRA Restricted | rd.springer.com Restricted
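
The entry above embeds the per-layer deep-feature trajectory of an image into a distance space and classifies it with an LSTM. The sketch below gives a simplified picture of that scheme; the choice of class centroids as reference points and the layer/centroid sizes are assumptions, not the paper's exact embedding.

```python
# Hedged sketch: each layer's features are embedded as distances to reference
# centroids, and an LSTM classifies the resulting trajectory as genuine vs adversarial.
import torch
import torch.nn as nn

def distance_embedding(layer_feature, centroids):
    """layer_feature: (B, D); centroids: (C, D) -> (B, C) distances to the centroids."""
    return torch.cdist(layer_feature, centroids)

class TrajectoryDetector(nn.Module):
    def __init__(self, n_centroids, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_centroids, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # genuine / adversarial

    def forward(self, embedded_sequence):
        # embedded_sequence: (B, n_layers, n_centroids), one embedding per network layer
        _, (h, _) = self.lstm(embedded_sequence)
        return self.head(h[-1])
```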


2019 Conference article Open Access OPEN

Learning relationship-aware visual features
Messina N., Amato G., Carrara F., Falchi F., Gennaro C.
Relational reasoning in Computer Vision has recently shown impressive results on visual question answering tasks. On the challenging dataset called CLEVR, the recently proposed Relation Network (RN), a simple plug-and-play module and one of the state-of-the-art approaches, has obtained very good accuracy (95.5%) in answering relational questions. In this paper, we define a sub-field of Content-Based Image Retrieval (CBIR) called Relational-CBIR (R-CBIR), in which we are interested in retrieving images with given relationships among objects. To this aim, we employ the RN architecture in order to extract relation-aware features from CLEVR images. To prove the effectiveness of these features, we extended both the CLEVR and Sort-of-CLEVR datasets, generating a ground truth for R-CBIR by exploiting relational data embedded into scene graphs. Furthermore, we propose a modification of the RN module - a two-stage Relation Network (2S-RN) - that enabled us to extract relation-aware features by using a preprocessing stage able to focus on the image content, leaving the question aside. Experiments show that our RN features, especially the 2S-RN ones, outperform the state-of-the-art R-MAC features on this new challenging task. Source: ECCV 2018 - European Conference on Computer Vision, pp. 486–501, Munich, Germany, 8-14 September 2018
DOI: 10.1007/978-3-030-11018-5_40

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | doi.org Restricted | link.springer.com Restricted | openaccess.thecvf.com Restricted | CNR ExploRA Restricted
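
The entry above builds on the Relation Network module, which aggregates information over all pairs of object features. A generic sketch of that core computation follows (a plain RN, not the paper's two-stage 2S-RN); the layer sizes are assumptions.

```python
# Hedged sketch of a Relation Network: g() processes every ordered pair of object
# features and f() maps the summed pair representations to a relation-aware feature.
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    def __init__(self, obj_dim=256, hidden=256, out_dim=256):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, objects):
        # objects: (B, N, obj_dim), e.g. cells of a CNN feature map
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)   # first element of each pair
        oj = objects.unsqueeze(1).expand(b, n, n, d)   # second element of each pair
        pairs = torch.cat([oi, oj], dim=-1)            # (B, N, N, 2*obj_dim)
        relations = self.g(pairs).sum(dim=(1, 2))      # aggregate over all pairs
        return self.f(relations)                       # relation-aware feature
```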


2019 Report Open Access OPEN

SmartPark@Lucca - D5. Integration and field testing
Amato G., Bolettieri P., Carrara F., Ciampi L., Gennaro C., Leone G. R., Moroni D., Pieri G., Vairo C.
This deliverable describes the activities carried out within WP3, in particular those related to Task 3.1 - Integration and Task 3.2 - Field testing. Source: Project report, SmartPark@Lucca, Deliverable D5, pp. 1–24, 2019

See at: ISTI Repository Open Access | CNR ExploRA Open Access